AFAIK web scraping (the act of grabbing and downloading any data you see available on the internet) isn't illegal, and I would assume downloading PDFs provided to you online would fall under that. Since it is copyrighted it would probably be illegal to share it, though.
This. In a case involving LinkedIn, US courts ruled that it’s legal to scrape publicly available data. The company doing the scraping was selling that data to corporate customers, but ultimately what you can do with it might depend on the information you’re accessing and under what permissions. (Not a lawyer)
What if I web scraped something like a pirate site full of all the good media?
According to big tech, it's OK if you're training a large language model with it.
You're confusing the law that applies for the ruling class with the one that applies to common people
There's a law for the ruling class? I always figured they gotta just cut their political buddies in.
My brain is essentially an enormous language model.
Unironically yes. You would not know who Spiderman was without viewing a copyrighted work demonstrating what he looks like, and now you understand why generative AI fundamentally has to ingest copyrighted works.
If you can see it, you've already downloaded it. You're just choosing to retain it.
As with everything with the law, it depends.
In Australia, distribution is the illegal part, seeding/sharing is where they get you. Not the actual download itself.
It's usually not a question of legality, but efficiency.
It's easy and efficient to bust someone for seeding, but busting hundreds for the odd file you can prove they downloaded is expensive and takes forever.
busting hundreds for the odd file you can prove they downloaded is expensive and takes forever.
And might well not be legally possible if all you have is an IP address, because lest we forget:
An IP is not an ID
viewable for free online
If you are viewing it on your computer, you have already downloaded it.
Don't let anyone tell you otherwise.
already downloaded onto your computer and can be found in the browser cache
Exactly.
Ask the AI companies who scraped my sites while the media companies were DMCA-ing everything in sight and working with enforcement paid for with public funds to prosecute/persecute the "pirates".
It's ridiculous that Homeland Security is spending resources taking down pirate sites. That's a department specifically created to prevent terrorism, and instead they're operating as Pinkertons for broadcasting companies.
I'd say if the copyright holder says you're not allowed to then you're not. It's piracy.
People will tell you that you've already downloaded the data so saving it is fundamentally, technically no different, but that doesn't matter to the law, it's still piracy.
Like yeah, it's absurd and pointless and anti-consumer and anti-knowledge and unenforceable and unsustainable, but that's copyright. It's always been that way.
Copyright destroys culture and piracy is our ethical duty in the face of that. The only reason to care about it is so you don't get caught.
The laws are bullshit and shouldn't be followed. Information should be free to all
I never said I follow the law, I'm just wondering what the law says ;)
Not an expert, but in the U.S. making a copy of a broadcast for personal use is legal under fair-use. Anything that loads up on your computer screen you can make a copy and save it for personal use. So screen captures are by definition legal.
How exactly you copy the material on your screen gets tricky under the DMCA clusterfuck. Breaking encryption to copy the material is illegal unless there is a valid exception for fair use. What exactly those valid exceptions are is above my paygrade.
Laws of course differ from country to country, but generally if it is legally publicly available then no, at worst you violate their EULA or something if you scrape such data. A company trying to prevent direct downloads cannot really charge you for finding ways around that, because from a technical point of view the data was already cached onto your PC anyway.
As a tip, use the Network tab of the browser's developer tools (F12) instead of inspect element. For videos you may also try the Absolute Right Click add-on. It breaks the video player controls when enabled, but often you can just right click and save the video if it hasn't timed out, and you can also enable the regular controls via right click -> show controls. Tools like JDownloader2 can also often scrape various files, but the former methods may work better.
There's also the Video DownloadHelper add-on for Firefox that will allow you to download streams that aren't just media files your browser can fetch with a plain HTTP GET. Your browser can still access those streams, but it needs a script component to handle them, so the built-in file downloader/saver won't even see them as something to download.
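To illustrate why the built-in downloader doesn't see such a stream as a single file, here's a rough Python sketch. It assumes the stream is an HLS playlist (a `.m3u8` text file listing many small segments), which is only one common case and not something the comment above specifies. The function name `list_segments` and the example URLs are made up for illustration.

```python
from urllib.parse import urljoin


def list_segments(playlist_text, playlist_url):
    """Return absolute URLs for the entries listed in an HLS (.m3u8) playlist.

    Assumption: the stream is HLS. Lines starting with '#' are tags or
    comments; every other non-empty line is a URI, possibly relative to
    the playlist's own URL.
    """
    lines = [line.strip() for line in playlist_text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an M3U8 playlist")
    return [urljoin(playlist_url, line) for line in lines if not line.startswith("#")]


if __name__ == "__main__":
    # Hypothetical playlist text, as you might copy it from the Network tab.
    example = "#EXTM3U\n#EXTINF:9.0,\nseg0.ts\n#EXTINF:9.0,\nseg1.ts\n#EXT-X-ENDLIST\n"
    for url in list_segments(example, "https://example.org/video/index.m3u8"):
        print(url)
```

The point is that the "video" is really a list of many small files, so something (a player script or an add-on) has to fetch and join them. Encrypted or DRM-protected streams are a different matter and this sketch doesn't touch them.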
Everything on the Internet can be downloaded, copied etc
It might be illegal to post it without permission, but you can download it all you damn well please and they can't stop you. Unless it's like government top secret something or other. In that case you probably don't want it anywhere near your computer and should probably tell somebody where you found it.
Warthunder discord server lol
Astonishing listening to the news coverage of that story where the anchors were reading some terminally online nonsense from the teleprompter about Discord "Thug Shakers"
should probably tell somebody where you found it
Somebody, as in your lawyer. Who can then inform the correct authorities, while making sure you don't become their scapegoat.
You care more than all of the ‘AI’ companies combined
Depends on where you are. Usually if it's a legal source, you can save it. But you're not supposed to share it unless given permission. If you downloaded it from a source that's not legal, things might change, depending on the specifics of your law.
Mind posting a guide on how you tinker with those inspect element tools?
Right click -> inspect element (Q) works.
You can also press F12.
And if right click is blocked, on Firefox holding SHIFT will unblock right click. There is also a plugin that does this for you.
Often websites will put an invisible element in front of the content to intercept this trick, but you can navigate through the elements to find the one they were trying to obfuscate.
Also you can just block elements you right click on in Firefox (though this might be an option added by an add-on). If there are hidden elements, you just need to go through each of those until you can click on the one you want directly (and you can tell by what is highlighted in the inspect element mode).
You can also hit delete in inspect element mode to remove that element. You can also edit whatever you want in the element. Makes me wish it existed back when I was doing more web dev work, would have made things a lot easier when debugging.
(Sorry for the late response.) Well it depends a lot on the site. Since I focus on books and scholarly articles, the ideal way is to find the URL of the original PDF. The website might show you just individual pages as images, but it might hide the link to the PDF somewhere in the code. Alternatively, you might just obtain all the URLs of the individual page images, put them all into a download manager, and later bundle them all into a new PDF. (When you open the "inspect element" window, you just have to figure out which part of the code is meant to display the pages/images to you.) Sometimes the PDFs and page images can be found in your browser cache, as I mention in the OP. There's quite some variety among the different sites, but with even the most rudimentary knowledge of web design you should be able to figure out most of them.
If you need help with ripping something in particular, DM me and I'll give it a try.
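For anyone who'd rather not dig through the inspector by hand, the "find the PDF link or the page image URLs" step described above can be roughly sketched in Python. This is a minimal sketch using only the standard library. It assumes you've saved the page's HTML yourself, and that the links show up in `href`, `src`, or `data-src` attributes, which won't hold for every site. The names `LinkCollector` and `find_document_urls` and the example URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".webp")


class LinkCollector(HTMLParser):
    """Collect candidate PDF links and page-image URLs from a page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: links live in one of these three attributes.
            if not value or name not in ("href", "src", "data-src"):
                continue
            url = urljoin(self.base_url, value)  # resolve relative links
            path = urlparse(url).path.lower()    # ignore ?query and #fragment
            if path.endswith(".pdf"):
                self.pdf_urls.append(url)
            elif tag == "img" and path.endswith(IMAGE_SUFFIXES):
                self.image_urls.append(url)


def find_document_urls(html, base_url):
    """Return (pdf_urls, image_urls) found in the given HTML, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.pdf_urls, parser.image_urls


if __name__ == "__main__":
    # Hypothetical snippet of a viewer page.
    sample = '<a href="/files/book.pdf?dl=1">x</a><img src="pages/001.png">'
    pdfs, images = find_document_urls(sample, "https://example.org/viewer/")
    print(pdfs)
    print(images)
```

If a PDF link turns up, that's the easy case. Otherwise the image list can go into a download manager as described above. Sites that build the page with scripts won't have these URLs in the saved HTML, and that's where the Network tab is the better tool.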
If it's in the public domain, it's almost certainly legal. I don't have the general answer to your question.
Really this question shows how outdated copyright law is; in many countries it prohibits "copying", but in the age of computers nearly all accessing of information involves "copying" it in some way.
They aren’t going after the hoarders, they are going after the sharers.
If something is in the public domain, there is no copyright covering it, so you should make as many copies as you feel like. Many public domain books are posted on the Internet Archive, where you can easily download them in various formats. Then you won't have to work hard to get the data. Public domain artwork, likewise, is often available on Wikimedia Commons.