Advice on data mining, text mining, SAS Enterprise Miner and Text Miner: Damien Mather
Damien Mather | Senior Lecturer in Marketing
Damien can provide advice on both text and data mining. He has experience using SAS Enterprise Miner and Text Miner, and supports the teaching of data and text mining to postgraduate students in the Departments of Marketing and Information Science.
This gateway lists and reviews the tools used in sophisticated text analysis, manipulation and visualisation. Hosted at the University of Alberta, it provides a place for Humanities researchers to find tools, contribute their experiences using them, and even share tools they have developed.
A powerful open-source tool to search, visualize, review and tag up to hundreds of thousands of documents in any format.
Text & corpus analysis
Stanford Named Entity Recognizer (NER)
A Java-based Named Entity Recognizer (NER) which labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names.
Textometry, born in France in the 1980s, has developed powerful techniques for the analysis of large bodies of text. Building on lexicometry and statistical text analysis, it offers statistically well-founded tools and methods that have been tested in multiple branches of the humanities.
A freeware corpus analysis toolkit for concordancing and text analysis.
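For readers new to concordancing, the idea can be illustrated in a few lines of Python. This is only a minimal sketch of a keyword-in-context (KWIC) listing, not how any of the tools above work internally; the function name `kwic`, the `width` parameter and the sample sentence are all made up for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words either side of each hit."""
    # Crude tokenisation: lower-case the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical sample text, used only to show the shape of the output.
sample = "The whale surfaced. Ahab watched the whale, and the whale dived again."
for line in kwic(sample, "whale"):
    print(line)
```

Dedicated concordancers add sorting, collocation statistics and support for large corpora, which this sketch leaves out.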
Topic models are a type of statistical model used as a text-mining tool to discover the hidden semantic structures ("topics") occurring in a collection of documents. They are useful for analyzing large collections of unlabeled text.
The MALLET topic modeling toolkit (one among a number of MALLET tools) contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA. A useful tutorial is available - see the Learn More section.
Paper Machines is a plugin for the Zotero bibliographic management software that makes cutting-edge topic-modeling analysis in Computer Science accessible to humanities researchers without requiring extensive computational resources or technical knowledge. It synthesizes several approaches to visualization within a highly accessible user interface.
Developed by a literary scholar, this tool lets users see how a topic model attempts to classify words and documents through patterns of co-occurring words.
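To give a feel for what "sampling-based" topic modelling means, here is a toy collapsed Gibbs sampler for Latent Dirichlet Allocation in plain Python. It is a sketch for illustration only and is not the implementation used by MALLET or any other tool listed here; the function name `lda_gibbs`, the parameter defaults and the four tiny documents are assumptions chosen to keep the example short.

```python
import random

def lda_gibbs(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over already-tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            topic = rng.randrange(num_topics)
            doc_assignments.append(topic)
            doc_topic[d][topic] += 1
            topic_word[topic][word] += 1
            topic_total[topic] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                old = assignments[d][i]
                doc_topic[d][old] -= 1
                topic_word[old][word] -= 1
                topic_total[old] -= 1

                # Weight = (topic's share of this document) x (word's share of this topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][word] += 1
                topic_total[new] += 1

    return doc_topic, topic_word

# Hypothetical miniature corpus: two documents about the sea, two about farming.
docs = [
    "whale ship sea captain whale sea".split(),
    "ship sea sail captain sea whale".split(),
    "farm wheat harvest field wheat farm".split(),
    "field harvest farm wheat plough field".split(),
]
doc_topic, topic_word = lda_gibbs(docs)
for k, counts in enumerate(topic_word):
    top_words = sorted(counts, key=counts.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

Each "topic" is simply a set of word counts, and each document ends up with a count of how many of its words were assigned to each topic. Real toolkits add hyperparameter optimisation, stop-word handling and far more efficient sampling.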
Bookworm is a simple and powerful way to visualize trends in repositories of digitized texts.
Further Tools to Create Beautiful Visualisations
A brief introduction to distant reading
We want your feedback!
This guide continues to evolve, and we really welcome your feedback so we can continue to improve it. Please let us know if you find:
- Incorrect or irrelevant details, tools that don't work, dead links or otherwise unhelpful information
- Helpful details, tools, links or information that you think need to be on the guide, but aren't currently.
We'd also love to hear from you if you want to have your project featured on the guide, or would like to be profiled on the Connect&Collaborate@Otago page. Email Alexander Ritchie or Antje Lubcke with any comments or suggestions on how we can improve the guide.
Viva Digital Humanities!