The Federal Web Archiving Working Group

The volume of Federal Government information available online is immense, and ensuring public access to it can seem overwhelming. Dissemination has shifted from print distributed through the Federal Depository Library Program (FDLP) to agencies posting content directly to their own websites, and web-based publishing has made it harder to define what a “publication” is. These factors have required the U.S. Government Publishing Office (GPO) to reevaluate how we operate and to examine which tools and resources can help ensure public access to Government information.

One way in which we have increased our efforts to provide access to web content is through website-level harvesting using the Archive-It service. As we started working with this tool and thinking about collection development, many questions arose.

  • How can we maximize the size of our account?
  • What is the most appropriate content to target first?
  • What web content are other Federal agencies harvesting?

That last question came up consistently. We knew that it was very important for us to solidify our collaboration and communication with other agencies, so we formalized that collaboration as the Federal Web Archiving Working Group.

The group originated with GPO, the Library of Congress, and the National Archives and Records Administration. We all agreed that it was vital to improve communication in order to increase our understanding of each other’s web archiving programs, avoid duplication of effort, and evaluate ways in which group members could help one another. The group held its inaugural meeting, organized by the Library of Congress, in August 2014, and we have continued to meet on a regular basis since then. We are also reaching out to additional agencies to increase participation and awareness. The National Library of Medicine, the U.S. Department of Health and Human Services, and the Smithsonian Institution have since joined.

We have all enjoyed sharing experiences and ideas at our meetings and have found them informative. Discussions and outcomes of meetings have included:

  • Detailed presentations on each agency’s web archiving program as it relates to Federal Government web content
  • Monthly updates from each agency to keep the group informed of new activities
  • Development of a shared wiki to post information on agency programs, outreach, etc.
  • Development of contracts and Requests for Information (RFIs)
  • Processes for ingest and transfer of WARC files
  • Policy topics, such as “Should digital content collections be a part of web archiving?”
  • Technical topics, such as “How are you archiving issue publications?”
  • Scope topics, such as “Are agencies focusing on their own agency site or including content from other agencies?”

At GPO, we have additional goals alongside the Working Group’s. These focus on ensuring that Government information is available in perpetuity and that it is cataloged, findable, and disseminated. These goals are laid out in the National Plan for Access to U.S. Government Information. Our aim is to continue supporting the FDLP community and its users. We want to create effective access to Federal Government information by using our own resources and by pointing to other trusted repositories when they are already actively preserving their online resources.

At GPO, we have also conducted additional outreach to maximize access to online resources. Topics that often come up in the web archiving community are the archivability of websites and the need for better communication with content creators. We recently met with the Director of Digital Collections and his team at the United States Holocaust Memorial Museum, whose website we have been archiving since 2012. This has always been a complex site to archive. In the meeting, we sat down and looked at reports together. This helped us learn more about the architecture of the website and determine the function of certain URLs that were unknown to us. In turn, we gave them an overview of the work that we do and a better understanding of how websites are archived. We look forward to having additional meetings like this with other content creators.

Recently, the National Endowment for the Humanities reached out to us to archive their website. We were pleased that they had heard of our web archiving program and, in turn, they were thankful that we could provide this service for them. All of this helps us provide continuous access to Federal Government information online that is not currently being disseminated through the FDLP, in support of Keeping America Informed.

FDLP Connection Archive

We sunsetted the FDLP Connection with the July / August 2018 issue and will no longer publish it. We’ve enjoyed bringing the FDLP Connection to the community over the years! You can still view past issues in the full archive (2011-2018).