New CGP Snapshot Files Available on GitHub

  • Last Updated: February 28, 2024
  • Published: February 28, 2024

As of February 2024, GPO’s Library Services and Content Management (LSCM) has posted a new snapshot of all 1,080,961 MARC bibliographic records in the Catalog of U.S. Government Publications (CGP) on GitHub. That is an increase of 58,079 records (about 6%) over the 2023 snapshot. Together with the CGP Records Monthly Files collection in the Catalog of U.S. Government Publications (CGP) Records Maintenance Files repository, these files represent essentially the entire CGP. LSCM will post new CGP snapshots annually.

The records are available in UTF-8 in the cataloging-records-all-cgp-utf8 repository and in MARCXML in the cataloging-records-all-cgp-marcxml repository. The total size of the UTF-8 files is 368 MB, and the total size of the MARCXML files is 379 MB. Each repository contains 28 zipped files, each of which holds approximately 40,000 records.

Please submit questions about the files via askGPO in the “Cataloging/Metadata (Policy and Records)” category.