The objective of this day is to familiarize you with the basic assumptions and techniques of text analysis as practiced in “text mining”, an area of Computer Science. Text mining is about finding patterns in, and building models of, text corpora. It overlaps with many other areas, such as corpus linguistics, and is therefore already widely used in the Digital Humanities. We will focus on techniques and tools (rather than on an overview of research in the area), with the aim of giving you hands-on experience with a number of techniques for exploring the contents, the sentiment, and the (sentential) argument structure of texts. In a third, equally important part, we will investigate some of the problems encountered during the first two parts: Can we, and may we, “just take the data as they are and expect insights from them”? What problems arise? How can we engage critically and ethically with data (texts being just one, albeit very important, type of data)? I will strive for generality but, where applicable, focus on online textual data, in particular blogs and microblogs as well as news.
Further information including preparatory reading and tools can be found here.