As I understand it, some people have donated their academic login credentials, effectively giving Sci-Hub access to their institutions' subscriptions.
When somebody requests an article, Sci-Hub first checks whether it already has a copy. If not, it attempts to use some of the donated credentials to gain access, downloads the article, and stores it for future requests.
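The "check the local copy first, otherwise fetch once and keep it" flow described above is a generic read-through cache. Here is a minimal sketch of that pattern only. The class name, the `fetch_upstream` callable and the use of a DOI string as the key are all assumptions made for illustration; nothing here is taken from how Sci-Hub is actually implemented.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Generic read-through cache keyed by an identifier (hypothetical: a DOI)."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        # fetch_upstream is a placeholder for "whatever retrieves the document".
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, bytes] = {}
        self.upstream_requests = 0  # how often we had to go to the origin

    def get(self, key: str) -> bytes:
        # Serve from the local store when a copy already exists.
        if key in self._store:
            return self._store[key]
        # Otherwise fetch once, keep the copy, and return it.
        self.upstream_requests += 1
        data = self._fetch_upstream(key)
        self._store[key] = data
        return data
```

With this shape, repeated requests for the same key reach the origin only once, which is the effect on publisher-side traffic that another comment in this thread mentions.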
I don't think you'll have to. Both hidden and conspicuous watermarks are common in trade book PDF downloads. RIAA members poisoned the BitTorrent well.
I'm neutral on the matter, but there are some obvious paths of recourse for publishers who don't want people sharing their credentials.
(Disclaimer: I work for Crossref, which is an organisation in the scholarly publishing space.)
In any case, some researchers have claimed that Sci-Hub obtained their credentials through phishing. I don't know if that's true, but such claims provide plausible deniability against that kind of watermarking proof, don't they?
I'm asking because a friend of mine who's enrolled in 2 colleges just downloaded the same scholarly PDF (one that's also available through Sci-Hub) with 2 different college credentials, and both copies had the same sha256sum.
Then when he tried it through sci-hub, he also got the same sha256sum.
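For anyone who wants to repeat that check, here is a small sketch that hashes several downloaded copies and reports whether they are byte-identical. The function names are made up for this example. Matching SHA-256 digests mean the files are the same byte for byte, so they cannot carry a per-download watermark; differing digests would not by themselves prove a watermark, since embedded timestamps or metadata could also differ.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths: Iterable[Union[str, Path]]) -> bool:
    """True if every file in a non-empty collection has the same digest."""
    return len({sha256_of_file(p) for p in paths}) == 1
```

Usage would be something like `all_identical(["college_a.pdf", "college_b.pdf", "scihub.pdf"])`, with whatever file names you saved the copies under.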
Crossref doesn't do anything at all in relation to Sci-Hub or watermarking PDFs. We don't host content. (I'm happy to talk about what we do do, but it's off-topic.)
Sorry if my disclaimer wasn't clear. I was just pointing out my affiliation, as is standard on HN.
Nah, that doesn't work for academic papers. You usually access them anonymously anyway through your institution (which has a subscription). So even if they watermark (and I don't think they do), they can only watermark the institution.
I'm still curious as to how this system works. Don't institutions have logs that would show someone researching topics wildly outside of their field of study?
We do keep logs for a period of time, but the library administration's policy is to only investigate them when a publisher contacts us with a claim of abuse (we do not proactively monitor for unusual activity, although we do take steps like limiting the number of concurrent sessions per user and blacklisting IP addresses/ranges with a history of suspicious activity).
The publisher generally supplies examples of timestamps and URLs that were part of the alleged abuse. We use that information to identify the "abusing" user in the log.
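To make the lookup concrete, here is a rough sketch of matching publisher-reported (timestamp, URL) pairs against an access log. The `LogEntry` fields, the function name and the 5-second tolerance are assumptions for illustration. Real proxy log formats vary, and this is not a description of any particular library's tooling.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    # Hypothetical, already-parsed log record.
    timestamp: datetime
    user: str
    url: str


def users_matching_reports(
    log: List[LogEntry],
    reports: Iterable[Tuple[datetime, str]],
    tolerance: timedelta = timedelta(seconds=5),
) -> Counter:
    """Count, per user, the log entries that match a reported (time, URL) pair."""
    hits: Counter = Counter()
    for reported_time, reported_url in reports:
        for entry in log:
            same_url = entry.url == reported_url
            close_in_time = abs(entry.timestamp - reported_time) <= tolerance
            if same_url and close_in_time:
                hits[entry.user] += 1
    return hits
```

A user who accounts for most or all of the reported requests is the likely source. The tolerance is there because publisher and library clocks rarely agree to the second.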
Usually there is pretty clear evidence that the user is not conducting legitimate personal research (e.g. the user is a freshman early childhood education major at the local rural branch campus, but they're downloading thousands of chemistry papers from an IP address in China or Russia). Typically the user does not seem like an information freedom warrior, or even to have a clue what is going on, so it seems most likely the credentials were phished.
These cases may or may not be phishing. When corporations are hacked for their user credentials, those databases sometimes end up in dark web markets. It would be easy to extract email addresses with .edu domains ... so if a student used their university address for some service and reused the password, there's your login.
Moral of the story: Encourage students to use a password manager and 2FA.
Academic librarians, who negotiate terms with publishers, are obsessed with privacy and academic freedom. Bless 'em. In theory, publishers don't know who's downloading what. The librarians I've talked to say they delete their logs daily, if not several times a day.
As for geographic location, academics tend to travel a lot. Last I heard (from reading the court docs in Elsevier's lawsuit against Sci-Hub), Sci-Hub stopped using proxy connections a few years ago. They just log in using stored credentials, grab an auth token that lasts X minutes/hours, and download articles from whatever IP is convenient.
Usually, institutions don't subscribe to everything a publisher has on offer, but rather a subset that is interesting for them. Therefore, if a download works with a set of credentials, it should not look suspicious.
Storing the documents on Sci-Hub-controlled machines also ensures that repeated requests don't actually hit the publishers, which significantly lowers what would otherwise be suspiciously high traffic.