PubMed Central Offers a Historical Treasure Trove

tags: libraries, Biomedical journal, PubMed Central

Jeffrey S. Reznick is Chief of the History of Medicine Division of the National Library of Medicine (NLM), National Institutes of Health (NIH); Christy Henshaw is Digital Production Manager at Wellcome Collection, London, UK, and manages Wellcome’s digitization program; Laura Randall is Technical Information Specialist for PubMed Central in the National Center for Biotechnology Information of the NLM/NIH; Rosalyn Leiderman manages and coordinates the PubMed Central Backfiles Project in the Preservation and Collection Management Section of the NLM/NIH; Kathryn Funk is the program manager for PubMed Central.



Where can you freely read, download, text mine, and use for your research and teaching the full text of millions of historically significant biomedical journal articles spanning three centuries, alongside millions more current biomedical journal articles? Look no further than PubMed Central (PMC) of the National Library of Medicine (NLM), National Institutes of Health (NIH). The NLM is the world’s largest biomedical library and one of the twenty-seven institutes and centers that constitute NIH, whose main campus is located in Bethesda, Maryland.


The NLM launched PMC in early 2000 as a digital counterpart to the library’s extensive print journal collection. In 2004, the NLM joined with Wellcome (a London-based biomedical charity, and one of the world’s largest providers of non-governmental funding for scientific research), the Jisc (a UK-based non-profit membership organization, providing digital solutions for UK education and research), and a number of medical journal publishers to agree that medical journals contain valuable information for research, education, and learning, and that journal archives should therefore be digitized and freely available to all who wish to consult them. Two years later, that agreement yielded public access to the full text of 160 journals spanning nearly two centuries. More recently, the NLM completed a multi-year partnership with Wellcome to expand the historical corpus of PMC with dozens more titles encompassing three centuries and hundreds of thousands of pages. You will find a hyperlinked list of these titles at the end of this article; clicking on each title will take you to its associated, digitized run in PMC.


While medical journals have always been invaluable resources, their digitization increases their accessibility and creates new opportunities to realize their research value. PMC makes available the machine-readable full text and metadata of the digitized journal articles, including titles, authors (and their affiliations where present), volume, issue, publication date, pagination, and license information. Such article-level digitization also enables us to link data, that is, to connect individual and associated articles with corresponding catalog records, and sometimes even Digital Object Identifiers (DOIs), to improve discovery and use of the articles by interested researchers.


In writing about one of these newly available titles (namely The Hospital, a journal published in London from 1886 to 1921) on the popular NLM History of Medicine Division blog Circulating Now, Dr. Ashley Bowen observed that “For researchers interested in the administration of British hospitals in the late 19th and early 20th century, [this journal] is a vital resource.” The Hospital “carried the tag-line ‘the modern newspaper of administrative medicine and institutional life,’ [and] published an enormous variety of items of interest to physicians, nurses, hospital administrators, and public health professionals—everything from medical research to notes on fire prevention and institutional kitchen management, reflections on ‘the dignity of medicine,’ opinions about housing policy, and much more.” Inspired by Dallas Liddle’s recent research, which “used [digitized] file size as a way to identify the rate of change in Victorian newspapers,” Bowen downloaded and analyzed every article in the entire run of The Hospital, including all the file names and file sizes, to study the changing content, trends, and sheer volume of this important journal over time, to appreciate the metadata created in the process of digitization, and to evaluate this metadata “in addition to…traditional content analysis.”
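For readers curious how a file-size analysis of this kind might look in practice, here is a minimal Python sketch. It is an illustration only, not Bowen’s actual method: it assumes the article files have already been downloaded, and that each file name contains a four-digit publication year (a hypothetical naming scheme such as hospital_1886_0001.pdf). The function names are invented for this example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: a standalone four-digit year (1800-1999) appears in each file name.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:18|19)\d{2}(?!\d)")


def year_from_name(file_name):
    """Return the first 18xx/19xx year found in a file name, or None."""
    match = YEAR_PATTERN.search(file_name)
    return int(match.group()) if match else None


def summarize_by_year(files):
    """Total the article count and file size per year.

    `files` is an iterable of (file_name, size_in_bytes) pairs.
    Files without a recognizable year are skipped.
    """
    totals = defaultdict(lambda: {"articles": 0, "bytes": 0})
    for file_name, size in files:
        year = year_from_name(file_name)
        if year is None:
            continue
        totals[year]["articles"] += 1
        totals[year]["bytes"] += size
    # Return a plain dict ordered by year for easy reading or plotting.
    return {year: totals[year] for year in sorted(totals)}


def scan_directory(root):
    """Collect (file_name, size_in_bytes) pairs for every PDF under `root`."""
    return [(path.name, path.stat().st_size) for path in Path(root).rglob("*.pdf")]
```

Calling summarize_by_year(scan_directory("downloads/hospital")) on a local folder (the path is a placeholder) would give one row per year, which can then be charted to see how the volume of the journal changed over its run.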


Bowen has also used PMC’s historical corpus to research Alfred Binet’s early 20th-century intelligence tests using The Psychological Clinic, and has drawn on the Bristol Medico-Chirurgical Journal and its semi-regular series of articles about “Health Resorts in the West of England and South Wales.”


Understandably, given the sheer size and scope of the overall PMC corpus, Bowen’s studies only scratch the surface of the archive, which currently encompasses nearly 5.5 million full-text articles. Nearly 500,000 of those articles were published in 1950 or earlier, and over 1 million date from 1951 to 1999.


Among the millions of articles you will find alongside those surfaced by Bowen are:

  • Sir Alexander Fleming’s discovery of the use of penicillin to fight bacterial infections, which appeared in the British Journal of Experimental Pathology, 1929;
  • Sir Richard Doll’s groundbreaking study that confirmed that smoking was a “major cause” of lung cancer, which appeared in the British Medical Journal, 1954;
  • Walter Reed’s paper proving that mosquitoes transmit yellow fever, which appeared in the Journal of Hygiene, 1902;
  • reports of centralized health and relief agencies in Massachusetts during the 1918 influenza pandemic;
  • an appeal for justice by Arthur Conan Doyle, related to the infamous case of the Parsi English solicitor George Edalji, which reflected contemporary racial prejudice;
  • a medical case report on America’s 20th president, James A. Garfield, following his assassination in 1881;
  • post-World War II thoughts about the future of the Army Medical Library by its director Frank Rogers; and
  • a paper by the bacteriologist Ida A. Bengtson, the first woman to work in the Hygienic Laboratory of the U.S. Public Health Service, the forerunner of the National Institutes of Health.


So, if we haven’t already tempted you to explore PMC for your own research and teaching—and explore its Open Access Subset and Historical OCR Collection, both ideal for text mining—what are you waiting for? Dive in! Encourage your colleagues and students to explore it. Be in touch and let us know what you discover in PMC. We would love to hear from you!


List of the historical journal titles made available freely in PMC through the multi-year partnership between Wellcome and the NLM/NIH.

Clicking on each title will take you to its associated, digitized run in PMC.


Learn more about PMC and the partnerships dedicated to growing its freely-available historical content:

“Public-Private Partnerships: Joining Together for a Win-Win,” Jeffrey S. Reznick and Simon Chaplin, The Public Manager, December 9, 2016.

“PubMed Central: Visualizing a Historical Treasure Trove,” Tyler Nix, Kathryn Funk, and Erin Zellers, Circulating Now, the blog of the National Library of Medicine’s History of Medicine Division, February 23, 2016.