Fernando Tricas always has interesting things to say. In a recent post he talks about The life of links and digital content (Spanish):
«We tend to assume that digital [content] is forever. But anyone who accumulates enough information also knows that sometimes its difficult to find it, in other cases it breaks and, of course, there is a non-zero probability that things go wrong when hosted by third-party services. It is an old topic here, remember Will we have all this information in the future? . The topic resurfaces as news in the light of Currently charged by the article that can be read at A Year After the Egyptian Revolution, 10% of Its Social Media Documentation Is Already Gone».
In the comments, Anónima said: «Given a time t and an interval Δt, the larger Δt, the more likely is that all information in a time t-Δt you want to find is gone». This sounded like an statement to check, Thus, I decided to do an experiment with del.icio.us' bookmarks.
In delicious.com/rvr I have archived around 4000 links from 2004. So, I downloaded the backup file, an HTML file with all links and metadata (date, title, tags). I developed a python script to process this file: go through the links and save its current status (whether the link is alive or not). With another script, the status were processed to generate the statistics. These are the results:
As can be seen, there is a correlation between the age of the links and the probability of being dead. For the 10% who cited the Egyptian revolution, in the case of my delicious, we must go back three years ago (2009). But at 6 years from now, a quarter of the links are now defunct. Of course, the sample is very small shouldn't be representative. It would be interesting to compare it with other accounts and to extend the time span: How many links are still alive after 10 or 15 years? Is it the same with information stored in other media? Are all this death links resting in peace in a forgotten Google's cache disk?
I imagine that sometime in the future, librarians will begin to worry not only to digitize remote past documents, but also to preserve those of the present.
In case you are interested, the code to generate such data is available at github.com/vrruiz/delicious-death-links. The spreadsheet is also available in Google Docs .
I would be supportive on all of your articles and blogs because they are just upto the mark.
http://www.daytipper.net
Posted by: Frank Weber | September 20, 2012 at 08:42 AM