v.10 no.2 (Summer 2009)

Permanent Electronic Access to Government Information: A Study of Federal, State, and Local Documents

Claudene Sproles, Government Information Reference Librarian
University of Louisville Libraries, USA

Angel Clemons, Government Publications Librarian
University of Louisville Libraries, USA

Even before the conception of electronic-only documents, providing permanent public access to government information was challenging. Since 1813, federal depository libraries have acquired and maintained tangible items to ensure continued access for the public. Fugitive documents, or government-produced information that escapes distribution through the Federal Depository Library Program (FDLP), have always been a major concern. In our current environment, where the vast majority of government information is distributed in electronic form only, this problem has only worsened. That, coupled with the ability to completely destroy electronic documents with the click of a mouse, has made the task of finding and ensuring permanent public access to this information even more daunting.

Agencies remove information for a variety of reasons--fear of the release of sensitive material, web site restructuring, or a lack of understanding of the historical significance of some materials. In 1998, an audit of the National Institutes of Health found that 78% of material suitable for inclusion in the FDLP was not submitted by government agencies.1 Former Public Printer Bruce James noted that agency self-publication which bypasses standard government distribution “deprives future generations from having an accurate record of the work of our government.”2 In this day and age, how permanent is electronic government information? This paper explores the permanency of selected electronic government information over a twelve-month period to determine its stability and accessibility.

Literature Review

In the early 1990s, the emergence of electronic information (e-information) prompted librarians and government bureaucrats to reevaluate the dissemination, access, and preservation of government documents. They came to the realization that in the near future, the majority of government information would be accessible only in electronic format. Partially in response to this issue, the Joint Committee on Printing held a hearing on “Government Information as a Public Asset” on April 25, 1991, in which librarians and other interested parties testified before Congress. These groups voiced their concerns that core titles in the depository program disappeared from public view once they became electronic-only.3 Adding to this dilemma, many of the electronic titles that existed in 1991 were not publicly accessible, or users had to pay a fee to access this taxpayer-funded information. As the number of electronic titles increased, librarians worried over a perceived increase in fugitive titles.

The Government Printing Office (GPO) Electronic Information Access Enhancement Act of 1993 mandated that GPO maintain electronic information and provide free access to depository libraries.4 To complement this, a 1993 report by the Information Industry Association (IIA) outlined policy principles to ensure continued permanent availability of government sources.5 This was one of the few documents that also addressed the importance of state and local document preservation. The report stressed that the public should have equal access rights “regardless of the media in which it exists.”6

In 1994, James P. Love summarized the current state of electronic government information by asserting that government policy ignored how federal agencies should provide access to the e-information they produce. Few mandates, he found, made e-information publicly accessible.7  At the same time, J.S. Walters noted that poor government policy, in part mandated by the Office of Management and Budget (OMB), allowed agencies too much autonomy over their electronic publications.   As a result, many electronic documents already were inaccessible to the public, “known only unto their creators and a handful of specialists.”8

By 1995, the GPO began assessing and evaluating technology options for providing permanent access to government information. The GPO developed an Electronic Transition Staff to “identify, assess, and implement information technology solutions.”9 At the Fall 1995 Depository Library Council Meeting, the group summarized its mission to the depository community and promised to work with federal agencies to ensure access to electronic products.10 By 1997, the GPO estimated that 50% of all publications failed to enter the depository system,11 particularly scientific and technical reports. GPO outlined some reasons many titles were escaping the depository system: exclusively electronic dissemination by agencies, reduced compliance with statutory provisions for publications, and copyright restrictions due to arrangements with the private sector. GPO knew of many titles that were bypassing the depository system, but was still struggling to secure agency cooperation. It noted that “extensive negotiations and even Congressional intervention have proved necessary to ensure compliance.”12

A 1998 paper by O’Mahony echoed the concern of the GPO.  He feared government information was disappearing at an alarming rate, due to lack of systems in place to find and archive electronic information.13  O’Mahony predicted that due to the fleeting nature of electronic information, “there will be little or no information to survive from this period of time.”14 To combat this situation, GPO began using PURLs (Persistent Uniform Resource Locators) that directed users to a source when the URL had changed, which was happening frequently.  In a study of broken links, the GPO found that of “367 links checked, 213 were invalid.”15 PURLs helped address the question of authenticity, as the GPO determined the official version.  Additionally, the GPO began developing partnerships, such as with the DOE Information Bridge, giving users multiple access points for electronic information.16

In 1999, the U.S. National Commission on Libraries and Information Science issued the “Report on the Assessment of Government Information Products.”17 Key findings of this report stated that there was still an overall lack of government information policy and coordination of initiatives at all levels. Electronic publishing remained very decentralized and murky. Agencies had difficulty identifying within their specific agency “person(s) responsible for the product”18 and often failed to grasp the importance of providing permanent public access to those products.

Concern continued to mount about the amount of material escaping dissemination. In 2001, the American Association of Law Libraries Government Documents Special Interest Section formed the Fugitive & Electronic-Only Documents Committee over this concern. The Committee’s mission involved “identify[ing] and report[ing]” fugitive documents and aiding in the tangible distribution of some of these materials.19 Robert Slater addressed policy options that promoted electronic publishing solely for dissemination of government information.20 He foresaw that electronic access would increase access to government information rather than restrict it. Like O’Mahony, Slater noted that the problem of permanence was still the major issue, with many documents disappearing shortly after their conception. Slater also predicted that “a vast amount of information may be gone forever.”21 Peterson, Cowell, and Jacobs addressed additional concerns about e-only government information.22 Beyond the ever-present worry over continued access, they were troubled that documents were available only from a single digital collection, such as the GPO server. While electronic information resides with the government, government entities theoretically could modify the original document at any time, in essence destroying access to the original material. They called for the depository community to maintain a watchful eye over government dissemination practices to ensure authenticity and true permanent access.

In 2002, John Shuler advised depositories to “rethink” their role in electronic information dissemination.23 He suggested that librarians place too much emphasis on the idea of fugitive information and not enough on how to provide information in a useful fashion to the public. Shuler prophetically stated that libraries would play an instrumental role in the preservation and promotion of electronic government information, but they would need to develop new approaches to the access and organization of this material. In a related piece, George Barnum of the GPO outlined the agency’s model to create permanent public access to digital information by expanding current bibliographic control techniques and creating new metadata practices.24 This, along with the creation of a digital archive, would complement the agency’s new models for identifying, cataloging, and providing access to e-documents. The GPO devoted much time to ensuring permanent access by developing technologies to authenticate information and investing in technologies to deliver permanent access to e-information harvested by the GPO. In agreement with Shuler, Barnum predicted that electronic information would make depositories more relevant in the role they play in access, albeit in a new way.

The impacts on public access to government information in the wake of September 11th were far-reaching, and the task of identifying items that disappeared approached the impossible. A well-known example of this trend is the CD-ROM “Source Area Characteristics of Large Public Surface-Water Supplies in the Conterminous United States: an Information Resource for Source-Water Assessment, 1999” which the GPO ordered destroyed due to security concerns.25 The White House directed agencies to review departmental websites and remove “sensitive but unclassified information.”26 The Department of Education totally restructured its website and removed hundreds of reports for “reevaluation.”27 The Department of Energy and the Defense Technical Information Center deleted “thousands of documents” from their web sites, citing security threats posed by the information.28 Many non-library sources began noting increased government secrecy in dissemination of information. OMB Watch, the Federation of American Scientists, and even the New York Times all commented on the new level of secrecy employed by the government.29 30 As Harnett & Solomon noted, the new question of access became “Why would you need to know that about your government?”31

However, in the face of this disappearing data, Paula Wilson noted many depositories began withdrawing from the FDLP, citing the free availability of internet sources.32 Robin Haun-Mohamed estimated that, in 2003, 66% of depository titles were electronic.33 Public Printer Bruce James predicted that by 2008, only “five percent of government documents will be printed.”34 Some libraries began preservation projects to harvest the data themselves. The University of North Texas, for example, created the CyberCemetery to archive material from defunct agencies.35 Complementing this effort, Heintz suggested that librarians, instead of “fighting” to preserve access to government information, should form individual partnerships with agencies to promote dissemination of useful information on the internet and direct patrons to the agencies’ websites.36 He also recommended that the GPO streamline the process to submit electronic documents, thus reducing the burden on the agency and making compliance easier.

In spite of attention to the problem, the trend of disappearing information continued. In 2004, Kennedy found that “more and more government information is becoming less and less publicly available.”37  Several projects were implemented to capture the massive amounts of digital data.  For example, the GPO partnered with Stanford University and other Federal depository library pilot partners on the LOCKSS (Lots of Copies Keep Stuff Safe) project.38  This program allows libraries to collect their own copy of a digital document, thus ensuring continued access.  GPO maintains a server of electronic files, which are harvested directly from the agency. In addition, the GPO worked on developing standards and specifications for digitizing material.  The GPO restructured its role and moved from dealing with print items to managing electronic information.

In 2006, Mart noted that, in spite of the government’s claim, information removed from the internet had little to do with national security.39  The biggest effect, she notes, is the “disproportionately high impact on citizens who need information.”40  She points out many sites are removed as a result of “web scrubbing” (i.e., documents being removed without reason).  The subject matter of documents that have disappeared ranged from workplace rights and women’s health to environmental contamination and civil rights issues.  Mart recommends the use of “watcher” organizations to monitor the status of information.  In late 2008, the Cline Center for Democracy released the report “Airbrushing History, American Style,” which reports on key documents either being replaced or altered on the White House Web site.41  The report asserts that “back-dating later documents and using them to replace the originals goes beyond irresponsible stewardship of the public record.  It is rewriting history.”42

Brian Rossman argued that the future of government information is an e-only environment.43 Librarians will need to adapt to a new system of bibliographic control to deal with “complex electronic collections.”44 Ensuring permanent access and the problem of fugitive information will be an ongoing issue for future government documents librarians. Jacobs summarized the current state of the issues in a presentation entitled “Citizens in the dark?”45 He argues that there have been fundamental changes in the way government interacts with citizens. Rather than these changes being technology driven, Jacobs contends they are “economic, political, and social[ly]” driven.46 He believes that we need to define the contents of the historical government record, establish who is responsible for the maintenance and preservation of this record, and determine how information will be preserved and accessed. He calls for a “multifaceted approach to preserving the historical record”47 through influencing the creation and identification of partners in the preservation process.

The literature shows a mounting concern in the last 20 years by librarians and other interested individuals that an electronic environment impedes permanent, unaltered access to electronic information.  Is this concern realistic for the average government document?  Are state and local documents disappearing at the same rate?  This paper will examine the permanency of federal, state, and local information and draw some conclusions about electronic access.

Methodology


To test the permanency of government information, the authors conducted a twelve-month study of selected electronic government documents at the federal, state, and local levels. The goal was to determine the amount and types of information that had remained permanent or ‘disappeared’ at the end of a twelve-month period. A sample of electronic government documents was selected and checked regularly for availability. Data about each document were loaded into a spreadsheet and maintained individually by each author.

For federal documents, the authors used the List of Classes48 to select federal agencies and their related Superintendent of Documents (SuDoc) stem. Within each federal department, a mixture of large and small agencies was chosen. The number of agencies chosen within each department was determined by the size of the department (i.e., more agencies were chosen to represent larger departments; smaller departments had fewer agencies in the study). All 15 Executive Branch departments, along with the Executive Office of the President, were included in the study, as well as 22 independent federal agencies. Within these 37 larger categories, 100 sub-agencies were selected for study (appendix 1).

The authors divided the 100 agencies equally, with each author assigned 50 agencies. It was left to the discretion of each author to decide which electronic government documents representing each agency were studied. It was agreed, however, that there should be a mix of formats which included pdf files, text files, and html documents. Data collected about each document included the corresponding agency name, the SuDoc stem, the title of the publication, its publication date, the URL, the PURL, whether the URL or the PURL changed and what it changed to, and whether or not it was available when checked (appendix 2).

To select state documents, the authors listed the fifty states and divided the list equally.  Again, it was the responsibility of each author to choose an electronic government document to represent each state.  Data collected about each document included the state name, title of the publication, its publication date, the URL, whether the URL changed and what it changed to, and whether or not it was available when checked (appendix 3).

The process of selecting local government documents was similar to that for the state documents. However, only 25 of the states were selected and then divided between the authors. One city from each of the states was chosen, and a document from that local government’s website was selected to represent it. The authors focused on selecting a mixture of small, medium, and large cities. Data collected about each document included the city and state name, title of the publication, its publication date, the URL, whether the URL changed and what it changed to, and whether or not it was available when checked.

All documents were selected and data input into a spreadsheet by mid-January 2008 (figure 1). Each author began checking their assigned documents on the last day of January and continued on the last day of each month until December 2008. The process of checking each document consisted of copying and pasting the URL into a browser and checking to see if the website and/or document loaded. If the website or document loaded properly, a “yes” was put into the spreadsheet. If the URL did not work or the document was no longer present at that site, the authors searched the parent site for the document to see if the URL had changed. If so, that information was input into the spreadsheet as a URL change. If the document could not be found either on the parent website or through an internet search, a “no” was entered into the spreadsheet for that particular month. The document was still checked in succeeding months even if it was unavailable the preceding month(s). If the publication date on the document changed, it was considered a different document and therefore not available. For those federal documents with PURLs, the URL and the PURL were checked separately each month. Each month, the authors also used the Government Printing Office’s (GPO) PURL search form49 to check for newly assigned PURLs for the federal documents in the study.

While the authors attempted to include a mix of pdf, text, and html documents, analysis of the documents selected reveals that 85% of the federal publications selected were pdf files, as were 92% of the state publications and 84% of the local publications.
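
The monthly check described above could be scripted roughly as follows. This is a minimal sketch, not the authors' procedure, which was carried out by hand in a browser. The names `url_loads` and `monthly_entry` are hypothetical, and recording a relocated document as available is an assumption, since the text says only that such a move was logged as a URL change.

```python
from urllib.request import Request, urlopen


def url_loads(url, timeout=10):
    """Return True if the URL answers with a 2xx status (hypothetical helper)."""
    try:
        request = Request(url, headers={"User-Agent": "link-check-sketch"})
        with urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (ValueError, OSError):
        # ValueError: malformed URL. OSError: HTTP errors, DNS failures, timeouts.
        return False


def monthly_entry(loaded, relocated_url=None, same_edition=True):
    """Build one month's spreadsheet entry for a single document.

    loaded        -- did the recorded URL load the document?
    relocated_url -- new address found by searching the parent site, if any
    same_edition  -- False when the publication found is a different edition
    """
    if not same_edition:
        # A changed publication was treated as a different document.
        return {"available": "no", "url_change": None}
    if loaded:
        return {"available": "yes", "url_change": None}
    if relocated_url:
        # Assumption: a document found at a new address counts as available.
        return {"available": "yes", "url_change": relocated_url}
    return {"available": "no", "url_change": None}
```

In this sketch, `monthly_entry(url_loads(url))` would produce the entry for a document whose address has not changed. Searching the parent site and comparing editions remain manual steps whose results are passed in as arguments.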

Discussion & Findings

Out of the 100 federal electronic documents sampled, at the end of 2008 the study found three documents had completely disappeared, or 3% of the sample. Of the three that were removed, two were Department of Defense documents, and the third was from the Farm Credit Administration. Of the 13 Defense documents surveyed, 15% (two) were gone by the end of the year. The Farm Credit Administration document was replaced with a newer edition. Five documents changed URLs, one of which disappeared for a time before reappearing under the new URL.

Of the 100 documents surveyed, 22 had a PURL. At the end of the year, 20 PURLs were still active. One PURL from the National Center for Education Statistics and another from the Federal Trade Commission did not link. Interestingly, the original documents were still available on the agency web sites. The GPO has made great strides in improving the stability of online information; none of the items with a PURL disappeared. However, the magnitude of electronic government information makes the task of archiving an electronic copy of each title virtually impossible.

The 50 state documents that were tracked showed a high rate of removal. Publications from six states--Arizona, Kansas, Massachusetts, Montana, Vermont, and Wyoming--were completely removed, or 12% of the total sample. Of these, the documents from Arizona and Vermont were replaced by a newer edition. A link to a federal document concerning the same subject replaced the Kansas document. The documents from California, New York, and Texas changed URLs, and the Hawaii and Colorado documents disappeared for a month.

Of the 25 local documents surveyed, only one (4%), from Troy, AL, disappeared. Three, from Billings, MT; Brigham City, UT; and Altus, OK, changed URLs. Even though only one of the local documents was removed, 12% underwent a URL change. However, the sample size is probably too small to support accurate assessments, although it does indicate that local documents also have problems with access.
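
The percentages reported in this section follow directly from the counts given in the text. As a quick arithmetic check (a sketch only: the function name `rate` and the dictionary labels are illustrative, and rounding to one decimal place is a choice made here, not by the study):

```python
def rate(count, total):
    """Percentage of `total` represented by `count`, to one decimal place."""
    return round(100 * count / total, 1)


# (count, sample size) pairs taken from the figures reported in the text.
counts = {
    "federal documents removed": (3, 100),
    "Defense documents removed": (2, 13),
    "PURLs still active": (20, 22),
    "state documents removed": (6, 50),
    "local documents removed": (1, 25),
    "local documents with a URL change": (3, 25),
}

rates = {label: rate(count, total) for label, (count, total) in counts.items()}
for label, value in rates.items():
    print(f"{label}: {value}%")
```

The Defense figure works out to 15.4%, which the text reports as 15%, and 20 of the 22 PURLs (90.9%) remained active.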

More in-depth examination is needed for particular federal agencies, to determine the types of information most at risk.  This study found that state and local documents displayed serious preservation issues, though more assessment is needed to determine the magnitude of the problem.  Longitudinal studies are needed to determine if the problem is improving or worsening over time and what effect watchdog groups are having.

Conclusion


Much attention has been paid to the disappearance of electronic government information. This study indicates that, despite efforts to preserve access, this information is still disappearing at an alarming rate. Our study found that 3% of our federal sample was completely gone in a year’s time. The study also shows that disappearance of information can be an agency problem. Alarmingly, of the three federal documents that disappeared, two were Department of Defense items. State and local documents have the least amount of control and stability. Little attention has been paid in the literature to these topics. Since such a large portion of state documents was removed, more attention needs to be focused on preserving these items.

Overall trends seem to indicate that older editions often are not being preserved. Often, when newer editions of documents appear, the older edition is removed, cutting off access to the older data. In addition, frequent URL changes make information harder for users to locate and reference. The reasons for URL changes seem to be web site restructuring or platform migration, but users will still have trouble locating the original material. The subject of the material did not appear to be a major factor in removal.

Documents from all levels of government lack stability in an electronic environment. This study shows that there needs to be more focus and research on the preservation of state and local information, as well as federal agency assessments of their electronically produced information. Interested parties, including the library community, watchdog groups, and frequent users, could better coordinate their efforts to make both the government and citizens aware of this continued problem. Questions that need to be addressed for the future include whether there is an acceptable level of loss of electronic information. Loss of electronic government information may never completely stop, but more needs to be done to lessen the rate of disappearance.


