
Library of Congress Having Major Issues with Twitter Archive



Back in 2010, the Library of Congress announced that it would archive every public tweet posted to Twitter since 2006. How's it going? Not so great, according to a white paper released today by the Library of Congress, which details the many technological challenges of searching 170 billion tweets, including searches that take more than 24 hours. Yes, hours.

Here's the update:

In April, 2010, the Library of Congress and Twitter signed an agreement providing the Library the public tweets from the company's inception through the date of the agreement, an archive of tweets from 2006 through April, 2010. Additionally, the Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis under the same terms. The Library's first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date. This month, all those objectives will be completed. To date, the Library has an archive of approximately 170 billion tweets.

The Library's focus now is on confronting and working around the technology challenges to making the archive accessible to researchers and policymakers in a comprehensive, useful way. It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data. Even the private sector has not yet implemented cost-effective commercial solutions because of the complexity and resource requirements of such a task. The Library is now pursuing partnerships with the private sector to allow some limited access capability in our reading rooms. These efforts are ongoing and a priority for the Library.

This document summarizes the Library's work to date and outlines present-day progress and challenges.

Why the Twitter Collection is Important to the Nation's Library

Twitter is a new kind of collection for the Library of Congress, but an important one to its mission of serving both Congress and the public. As society turns to social media as a primary method of communication and creative expression, social media is supplementing and in some cases supplanting letters, journals, serial publications and other sources routinely collected by research libraries. Archiving and preserving outlets such as Twitter will enable future researchers access to a fuller picture of today's cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes.

The Library of Congress Agreement with Twitter

The Library's agreement with Twitter announced April 14, 2010 provided that:

• Twitter would donate a collection consisting of all public tweets from the Twitter service from its inception to the date of the agreement, an archive of 21 billion tweets that occurred between 2006 and 2010.

• Any additional materials Twitter provides to the Library would be governed by the terms of the agreement unless both parties agree to different terms in advance of receiving such additional materials.

• The Library could make available any portion of the collection six months after it was originally posted on Twitter to "bona fide" researchers.

• A researcher must sign a "notification" prohibiting commercial use and redistribution of the collection.

• The Library cannot provide a substantial portion of the collection on its web site in a form that can be easily downloaded.

Transfer of Data to the Library

In December, 2010, Twitter named a Colorado-based company, Gnip, as the delivery agent for moving data to the Library.

Shortly thereafter, the Library and Gnip began to agree on specifications and processes for the transfer of files - "current" tweets - on an ongoing basis.

In February 2011, transfer of "current" tweets was initiated and began with tweets from December 2010.

On February 28, 2012, the Library received the 2006-2010 archive through Gnip in three compressed files totaling 2.3 terabytes. When uncompressed the files total 20 terabytes. The files contained approximately 21 billion tweets, each with more than 50 accompanying metadata fields, such as place and description.
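The figures in this paragraph imply a compression ratio and an average size per tweet. The short Python calculation below works them out. It is only a back-of-the-envelope sketch: it assumes decimal terabytes (10^12 bytes) and treats "approximately 21 billion" as exactly 21 billion, neither of which the white paper specifies.

```python
# Figures reported for the 2006-2010 archive.
TWEETS = 21e9            # "approximately 21 billion tweets"
COMPRESSED_TB = 2.3      # three compressed files
UNCOMPRESSED_TB = 20.0   # size when uncompressed
BYTES_PER_TB = 1e12      # assumption: decimal terabytes


def bytes_per_tweet(total_tb, tweets=TWEETS):
    """Average bytes per tweet (text plus metadata) for a given archive size."""
    return total_tb * BYTES_PER_TB / tweets


compression_ratio = UNCOMPRESSED_TB / COMPRESSED_TB

print(f"compression ratio: about {compression_ratio:.1f}x")
print(f"uncompressed: about {bytes_per_tweet(UNCOMPRESSED_TB):.0f} bytes per tweet")
print(f"compressed:   about {bytes_per_tweet(COMPRESSED_TB):.0f} bytes per tweet")
```

Under those assumptions, each tweet averages a little under 1 KB uncompressed. Since a tweet's text was limited to 140 characters at the time, most of that is presumably the 50-plus metadata fields rather than the message itself.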

As of December 1, 2012, the Library has received more than 150 billion additional tweets and corresponding metadata, for a total including the 2006-2010 archive of approximately 170 billion tweets totaling 133.2 terabytes for two compressed copies.

Building a Stable, Sustainable Archive

The Library's first and most fundamental activities included developing a stable and sustainable way to acquire, preserve and organize the Twitter collection. Although the Library regularly acquires digital content, the Twitter stream is the first collection coming into the Library in a continuous stream. The Library leveraged the technical infrastructure and workflow established for other digital content in the transfer of Twitter data.

The Library runs a fully automated process for taking in these new files. Gnip, the designated delivery agent for Twitter, receives tweets in a single real-time stream from Twitter. Gnip organizes the stream of tweets into hour-long segments and uploads these files to a secure server throughout the day for retrieval by the Library.
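To make the idea of hour-long segments concrete, here is a minimal Python sketch of grouping a stream of tweets by the hour in which each was posted. It is not Gnip's actual process, which the white paper does not describe. The record layout, the `created_at` field name, and the helper names `hour_key` and `segment_by_hour` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_key(ts):
    """Return the start of the hour that contains the timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


def segment_by_hour(tweets):
    """Group tweet records into hour-long segments keyed by hour start."""
    segments = defaultdict(list)
    for tweet in tweets:
        segments[hour_key(tweet["created_at"])].append(tweet)
    return dict(segments)


# Hypothetical sample records; real tweets carry many more metadata fields.
stream = [
    {"id": 1, "created_at": datetime(2012, 12, 1, 9, 5, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2012, 12, 1, 9, 59, 59, tzinfo=timezone.utc)},
    {"id": 3, "created_at": datetime(2012, 12, 1, 10, 0, tzinfo=timezone.utc)},
]

segments = segment_by_hour(stream)
for start in sorted(segments):
    print(start.isoformat(), [t["id"] for t in segments[start]])
```

Keying each segment by the start of its hour also gives a natural way to organize files by date, since the keys sort chronologically.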

When a new file is available, the Library downloads the file to a temporary server space, checks the materials for completeness and transfer corruption, captures statistics about the number of tweets in each file, copies the file to tape, and deletes the file from the temporary server space.
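The intake steps in the paragraph above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Library's implementation. It assumes each hourly file is gzip-compressed with one JSON tweet per line, that a SHA-256 checksum is supplied with each file, and that copying to a directory stands in for writing to tape. All function and parameter names are hypothetical.

```python
import gzip
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_tweets(path):
    """Count non-empty lines; assumes one tweet per line in a gzip file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def ingest(source, expected_sha256, temp_dir, archive_dir, stats_log):
    """Run one file through the intake steps and return its tweet count."""
    temp_dir, archive_dir = Path(temp_dir), Path(archive_dir)
    temp_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # 1. "Download" to temporary space (a local copy stands in for the transfer).
    temp_path = temp_dir / Path(source).name
    shutil.copy2(source, temp_path)
    try:
        # 2. Check for transfer corruption against the supplied checksum.
        if sha256_of(temp_path) != expected_sha256:
            raise ValueError(f"checksum mismatch for {temp_path.name}")
        # 3. Capture statistics; a truncated gzip file raises an error here.
        n = count_tweets(temp_path)
        with open(stats_log, "a", encoding="utf-8") as log:
            log.write(json.dumps({"file": temp_path.name, "tweets": n}) + "\n")
        # 4. Copy to the archive (a directory stands in for tape).
        shutil.copy2(temp_path, archive_dir / temp_path.name)
    finally:
        # 5. Delete the file from temporary space, whether or not intake succeeded.
        temp_path.unlink()
    return n
```

A caller would invoke `ingest` once per newly available file. Verifying before archiving means a corrupted transfer is never written to long-term storage.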

The technical infrastructure for the Library's Twitter archive follows the same general practices for monitoring and managing other digital collection data at the Library. Tape archives are the Library's standard for preservation and long-term storage. Files are copied to two tape archives in geographically different locations as a preservation and security measure.
