Sharing large corpora of data can also lead to new ways of exploring and discovering scholarship, effectively giving researchers another lens through which to view the published literature.
HathiTrust makes the texts of public domain works in its corpus available for research purposes. The data includes approximately 350,000 public domain volumes. The works fall into two categories: non-Google-digitized volumes, which are freely available, and Google-digitized volumes, which are available through an agreement with Google. HathiTrust can also assist in the creation of custom datasets.
The Data for Research (DfR) service is provided by JSTOR for use by the research community. It provides a set of web-based tools for selecting and interacting with content from the JSTOR archive, an archive that spans hundreds of years and billions of words of text.
The DfR service is completely free and available to the public.
With the standard Google Books interface, you can only do the most basic type of search -- just the simple frequency of a word or phrase. This interface helps unlock the potential of the Google Books data. It offers the same corpora as available in N-Grams.
This website allows you to quickly and easily search more than 100 million words of text of American English from 1923 to the present, as found in TIME magazine. You can see how words, phrases and grammatical constructions have increased or decreased in frequency and see how words have changed meaning over time.
When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books (e.g., "British English", "English Fiction", "French") over the selected years.
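To make the idea of charting a phrase over time concrete, the following is a minimal sketch of the kind of calculation such a graph rests on: for each year, count how often a phrase occurs and divide by the total number of same-length word sequences in that year's texts. It is not Google's implementation. The function names, the whitespace tokenization, and the two-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-token sequence in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_relative_frequency(corpus, phrase):
    """Map each year to the share of that year's n-grams equal to `phrase`.

    `corpus` is a dict of {year: [text, ...]}. Tokenization is a plain
    lowercase whitespace split, which is an illustrative simplification.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        # Guard against years with no n-grams of the requested length.
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus, invented for this example only.
toy_corpus = {
    1900: ["the steam engine changed the world"],
    1950: ["the jet engine and the steam engine"],
}

freqs = yearly_relative_frequency(toy_corpus, "steam engine")
for year, share in freqs.items():
    print(year, round(share, 4))
```

Plotting these per-year shares against the year would give a line comparable in spirit to the ones the viewer draws. Real corpora call for proper tokenization and far more text per year before the trend means anything.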
This interface helps unlock the potential of the Google Books data. Search by word, phrase, substring, lemma, part of speech, synonyms, and collocates (nearby words). You can copy the data to other applications for further analysis, and you can quickly and easily compare the data in two different sections of the corpus.