Ok I am confused, 60k hits in a day? What brought down the website? The 72MB file size, network congestion, or the 60k hits? Even with authentication for the download, what could bring the system down? I have handled more traffic on an RPi with a 100MBps connection. I really don't get it.
Love the irony on that!
Which makes me wonder how much valuable content and research is locked away on technologically ancient servers in Cambridge and Oxford.
More modern and up to date than you'd think :) Some departments here in Oxford have a server life-cycle of 3-5 years. It's just that nobody bothers to plan for higher-than-expected volumes of traffic, unfortunately (a practice that can be extrapolated to many things in academia).
If your server is configured for a low number of concurrent connections, it can easily seem swamped if a small number of people are concurrently downloading a large file slowly. All that's happening is that it's not accept()ing new connections until existing ones finish.
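A minimal sketch of that failure mode, assuming a plain blocking single-threaded TCP server (the backlog value and the `send_file` helper are hypothetical, not anything Cambridge actually runs):

```python
import socket

backlog = 2  # deliberately tiny, to illustrate the point

# A blocking server: the kernel queues at most `backlog` connections that
# have not yet been accept()ed; anything beyond that waits or is refused.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # ephemeral port, just for the sketch
srv.listen(backlog)

# In the serving loop, accept() is only reached again once the previous
# response has been fully written -- so one client trickling a 72 MB file
# at dial-up speed stalls every connection queued behind it:
#
#   while True:
#       conn, _ = srv.accept()
#       send_file(conn, "thesis.pdf")   # hypothetical slow, blocking send
#       conn.close()

srv.close()
```

The fix is the usual one: raise the backlog and serve connections concurrently (threads, an event loop, or a reverse proxy in front) so a slow download never blocks accept().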
Heh, I don't know what the current hardware is for the CUDL DSpace repository, but having helped upgrade it a few years ago, I can't say the software is the fastest.
It's a Java application with XSLT in the page-rendering pipeline, so not exactly optimised for speed.
If you're hitting a single box, this seems perfectly possible - especially if the server or the network is poorly configured. This is what CDNs are for.
This isn't Joe's blog running from a desktop found in a dumpster. It's a university with actual infrastructure. They should be able to handle this. Registration is probably more load.
A 72MB file being served 500,000 times over a 24hr period is 3-4Gb/sec.
The department probably has one or more web servers serving content off a network drive. Your average "rack of network storage" won't even blink at that.
There was probably a gigabit switch somewhere that was acting as a bottleneck, or the web server was simply misconfigured for this task.
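As a sanity check on the figures in this subthread (taking the 72MB file and the 500,000-downloads-per-day number quoted above at face value):

```python
# Back-of-the-envelope check of the quoted bandwidth figure.
file_bytes = 72 * 1024 ** 2      # 72 MB (MiB here; MB vs MiB barely matters)
downloads = 500_000              # per-day figure from the parent comment
seconds = 24 * 3600              # one day

gbps = file_bytes * downloads * 8 / seconds / 1e9
print(f"average rate: {gbps:.1f} Gb/s")  # ~3.5 Gb/s sustained
```

About 3.5 Gb/s sustained, so a single gigabit switch or uplink anywhere on the path would indeed be saturated several times over.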
With a half-life of interest of 3 days and a 100 MBps connection, the average download time would be 3 hours. That would indeed seem unusual today. But 30k page hits spaced out over 60 hours is not too much of a burden on top of that.
Guilty as charged, as well. I download just about any programming manual/book that's free in PDF or similar form and I've probably read maybe 10% of them.
Was just thinking, "This sounds like a job for IPFS!" I spent most of Sunday reading about and playing with IPFS. Great idea - I hope it gets some traction.
But even so, while 1966 was indeed early for "regular" use of fax - the first "user-friendly" Xerox fax machines hit the market around then - the first transmission of facsimiles of images dates to the 1840s, and the first fax that used methods similar to "modern" fax machines, scanning line by line (the "scanning phototelegraph"), dates to 1880. Commercial fax machines have been around since around 1900.
So it would indeed be possible.
One weird and wonderful product of early faxes (fax over radio predates "wired" fax machines): Finch Facsimile machines [1] were used to transmit "newspapers" via AM radio in the 1930s, which were then printed on thermal paper in the home of the subscriber.
From [1]: "Six hours overnight was enough time to print a six page two column news bulletin, delivered in time for breakfast."
I think the point was that he entered the information in digitally, so it should have been easier to provide a digital copy rather than scan actual printed pages.
I quoted UNIX to show how primitive technology was at the time, rather than to say he could've possibly used a computer. With ALS, using a computer would have been as hard as writing, I would presume.
Not only did he type it with a typewriter, all the math notation in the thesis is hand-written! There was no math typesetting available in 1966 outside major printing houses. (Or, I guess, any reasonably priced option for getting your PhD thesis properly typeset and printed at all, math or not.)
It's possible Hawking was able to type/write his thesis. It was published in 1965, two years after his ALS diagnosis. Wikipedia says he didn't begin to use crutches until the late 1960s.
"I've gone through early Chaplin work. I've seen Metropolis 17 times."
"I'm sure there's something"
(grabs his friend)
"You don't understand, Paul. I've been reading Shannon. A Mathematical Theory of Communication. I've run out. I've taken to begging strangers for a fix."
"That bad?"
(Guilty), "I just... " (resigned) "I just asked someone to put up a torrent of Stephen Hawking's Ph.D. thesis..."
I just have to ask why this is news? Is this really something newsworthy?
Cambridge's network probably isn't hardened against traffic spikes, since it doesn't normally see that much traffic. But still, it isn't 1995. They should have some form of load balancing or distributed/clustered web/data/file systems to handle temporary spikes in traffic and data requests. Serving simple static data isn't something that should "crash the site".
The technology behind the repository itself is not great (DSpace [1]). Add to that the fact that it is not actually built to handle this many requests, and scaling quickly is out of the question too because of the server setup.
Even without issues, it often felt a bit sluggish when serving locally. The pages are quite large, and the whole pipeline from content to webpage is rather tedious (Java, XSLT -> HTML).
It shouldn't have happened - but I assumed it would.
Suggestions on getting this in audio form? I guess it requires transcribing the handwritten parts. The Chrome OCR fails there. Is there a better one?
Sample:
This implies that the universe is spatially homogeneous and isotropic
since there is no direction defined in the 3- space orthogonal to Ua.
In this universe we consider small perturbations of the motion
of tl1e fluid and of the '.ifeyl tensore 1
Ne neglect products of small
quantities and perform derivatives with respect to the undisturbed
metric. Since all the quantities we are interested in with the
exception of the scalars, µ, ~' e have unperturbed value zero, we
avoid perturbations that merely represent coordinate transformation
and have no physical significance.
To the first order the equations (1) - (4) and (7) - (9) are
Stephen Hawking and his dissertation are high-profile as these things go. The NPR piece mentions other popular items generating 100s of requests per month. I've frequently run across items with lifetime request counts in the double or triple digits (and suspect I doubled the count on one particular item).
More often, though, the truth is that this material simply isn't available online. There are several thesis repositories (either Michigan State or University of Michigan are one, as I recall), and I can frequently turn up a shelf reference via WorldCat ... somewhere.
But there's work from surprisingly prominent names in numerous fields that simply isn't available in electronic format. The worst case is for materials from roughly 1924 - 1980: too late to be out of copyright, and too early to have been composed in, or converted to, digital formats (and 1980 is an early cut-off date for that, though it's when material seems to start appearing in bulk).
This includes PhD dissertations, Masters theses, and numerous academic or other writings, often including government documents not under copyright. Thankfully with Sci-Hub, actual published academic journal articles can be found, freely, with a very high success rate. Particularly painful for me are popular magazine and newspaper items, for which even the indices are very frequently locked behind site-restricted or affiliate-only access.
The time-and-effort differential of being able to look something up online, vs. travelling many miles to a facility for access, is tremendous. And it absolutely stops a great many incidental queries dead.
See Rick Falkvinge's excellent rant about how the KRACK vulnerability was blocked behind corporate-only paywalls for over a decade:
Note that the issues here are twofold. One element is the task of scanning and making available documents, and organising the results in a manner useful for search.
But much of the harm is the direct consequence of the present regime of copyright and paid access to information; on top of that, the perverse incentives of advertising-backed media and media manipulation have created a media regime that is actively harmful to society.
I'd really like to see the elements of this addressed.