My new job: Ubuntu
HOWTO: Upgrade the BIOS in Aspire One (Linux)

The life of links

Fernando Tricas always has interesting things to say. In a recent post he talks about The life of links and digital content (Spanish):

«We tend to assume that digital [content] is forever. But anyone who accumulates enough information also knows that sometimes its difficult to find it, in other cases it breaks and, of course, there is a non-zero probability that things go wrong when hosted by third-party services. It is an old topic here, remember Will we have all this information in the future? . The topic resurfaces as news in the light of Currently charged by the article that can be read at A Year After the Egyptian Revolution, 10% of Its Social Media Documentation Is Already Gone».

In the comments, Anónima said: «Given a time t and an interval Δt, the larger Δt, the more likely is that all information in a time t-Δt you want to find is gone». This sounded like an statement to check, Thus, I decided to do an experiment with del.icio.us' bookmarks.

In delicious.com/rvr I have archived around 4000 links from 2004. So, I downloaded the backup file, an HTML file with all links and metadata (date, title, tags). I developed a python script to process this file: go through the links and save its current status (whether the link is alive or not). With another script, the status were processed to generate the statistics. These are the results:

Captura de pantalla 2012-04-12 a la(s) 01.02.39

As can be seen, there is a correlation between the age of the links and the probability of being dead. For the 10% who cited the Egyptian revolution, in the case of my delicious, we must go back three years ago (2009). But at 6 years from now, a quarter of the links are now defunct. Of course, the sample is very small shouldn't be representative. It would be interesting to compare it with other accounts and to extend the time span: How many links are still alive after 10 or 15 years? Is it the same with information stored in other media? Are all this death links resting in peace in a forgotten Google's cache disk?

I imagine that sometime in the future, librarians will begin to worry not only to digitize remote past documents, but also to preserve those of the present.

In case you are interested, the code to generate such data is available at github.com/vrruiz/delicious-death-links. The spreadsheet is also available in Google Docs .

Comments

Frank Weber

I would be supportive on all of your articles and blogs because they are just upto the mark.
http://www.daytipper.net

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment

Your Information

(Name and email address are required. Email address will not be displayed with the comment.)