Answered By: Bobray Bordelon
Last Updated: Dec 05, 2024

First see the Text Mining Guide.  Scraping is often not permitted.

 
  • Newspaper Navigator (Library of Congress - Chronicling America): selected coverage from before and after World War I, with most coverage from 1900-1925. Also see American Stories (https://huggingface.co/datasets/dell-research-harvard/AmericanStories), a collection of full article texts extracted from historical U.S. newspaper images. It includes nearly 20 million scans from the public domain Chronicling America collection maintained by the Library of Congress.
  • North American News Text, Complete. Includes:
    • Los Angeles Times & Washington Post, May 1994-August 1997
    • New York Times News & Syndicate, July 1994-December 1996
    • Reuters News Service (General & Financial), April 1994-December 1996
    • Wall Street Journal (not in General Release), July 1994-December 1996
  • News API: API service for querying online news sources from the past month, including major publications such as the New York Times, ABC News, and Al Jazeera. Register for a free API key to get started.
  • NY Times APIs. The Article Search API provides access to headlines, abstracts, lead paragraphs and more (but NOT full-text articles) from the New York Times, 1851+.
  • Integrum World Wide. Digital archive of the most influential Russian information sources, along with a range of analytical services for monitoring mass media and social networks.
  • Newswire. Contains 2.7 million unique public domain U.S. news wire articles, written between 1878 and 1977. Locations in these articles are georeferenced, topics are tagged using customized neural topic classification, named entities are recognized, and individuals are disambiguated to Wikipedia using a novel entity disambiguation model.
  • Stanford Cable TV News Analyzer. Includes near-24/7 recordings of CNN, Fox News, and MSNBC from January 1, 2010 onward. The dataset updates daily, with a lag of approximately 24-36 hours from the original content's air date. In total, the dataset consists of over 370,000 hours of video and includes both TV news programming and commercial segments.
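
For the API-based sources above, a request usually amounts to building a query URL with your key and reading a few fields from the JSON that comes back. The following is a minimal sketch for the NY Times Article Search API, not an official client. The endpoint URL, the parameter names (q, begin_date, end_date, page, api-key), and the response field names are assumptions based on the publicly documented version 2 of that API and should be checked against the current documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the NY Times Article Search API (v2).
# Verify against the current developer documentation before relying on it.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, begin_date=None, end_date=None, page=0):
    """Build a request URL. Dates are assumed to be YYYYMMDD strings."""
    params = {"q": query, "page": page, "api-key": api_key}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def extract_records(payload):
    """Pull headline, abstract, and lead paragraph from a decoded response.

    The field names used here are assumptions about the response layout;
    missing fields fall back to empty strings instead of raising.
    """
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "headline": (doc.get("headline") or {}).get("main", ""),
            "abstract": doc.get("abstract", ""),
            "lead_paragraph": doc.get("lead_paragraph", ""),
            "pub_date": doc.get("pub_date", ""),
            "web_url": doc.get("web_url", ""),
        }
        for doc in docs
    ]


def search(query, api_key, **kwargs):
    """Perform one request (needs network access and a valid API key)."""
    url = build_search_url(query, api_key, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_records(json.load(resp))
```

A call such as search("suffrage", "YOUR_KEY", begin_date="19190101", end_date="19201231") would return one page of records as a list of dictionaries. As noted above, the API does not return full article text, and each service's terms of use and rate limits should be checked before collecting results in bulk.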
