Objectives:
In this tutorial we will look at some fundamental skills that will get you started on your journey with text mining. To kick off, we will learn how to tell the computer what to search for. We will start with simple search operations and explore their limitations. After that, we will look at more complex search operations. We will also introduce the first data wrangling steps, for example converting text data into other formats for further processing.
Topics:
First we will look at how a computer searches in a text. We then cover relatively simple searching in documents and identify its limits. Next comes an explanation of more complex search operations, followed by practical exercises. We end with reasoning about complex search and replace operations (and potentially a brief explanation of the theory behind them).
Target audience:
This tutorial is for people who want to improve their search skills. It is accessible to humanities and social sciences students and researchers with no prior exposure to programming. We will not be covering any advanced text mining strategies or tools. The skills learned will also be applicable in other aspects of research, e.g. literature reviews.
Format:
This tutorial will be a three-hour session combining a lecture (discussing background and theory) with hands-on, practical work.
Required resources:
Participants should bring a laptop they can work on. No special software is required; we will be using a web browser and a PDF viewer. We will need internet access, which will be provided by the conference organizers.
As a professor in Digital Humanities, Menno is particularly interested in incorporating computational techniques into the Humanities. His PhD in computer science dealt with building systems that learn (linguistic) grammars from plain sequences (sentences). These empirical grammatical inference systems result in patterns that can be used for further analysis of the data, for instance in applied machine learning, computational linguistics, or computational musicology. During his MA (computational linguistics) and MSc (computer science) studies, Menno took techniques from one field and applied them to situations in the other, such as proofing tools and error correction, machine translation, and multi-modal information retrieval.