Introduction to Computer-Aided Content Analysis (CATA)

Introduction to Computer-aided Text Analysis (CATA):

Computer coding is the automated tabulation of variables for target content that has been prepared for the computer. Typically, this means having software analyze a set of texts, counting key words, phrases, or other text-only markers (Content Analysis Guidebook). Computer coding relies on dictionaries, which are lists of words or phrases that text-analysis programs use to analyze content. Researchers may need to develop their own dictionaries; however, some dictionaries that have been used in previous research are available and may be reused in future research.
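To make the idea of dictionary-based coding concrete, here is a minimal Python sketch. It is not taken from any program listed on this page; the category names, the word lists, and the function names (tokenize, code_text) are all hypothetical, and the sketch handles single words only, not phrases.

```python
import re
from collections import Counter

# Hypothetical dictionary: category name -> list of single words.
# Real research dictionaries are far larger and carefully validated.
DICTIONARY = {
    "positive": ["good", "happy", "excellent"],
    "negative": ["bad", "sad", "terrible"],
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def code_text(text, dictionary):
    """Count how many tokens in the text fall into each dictionary category."""
    token_counts = Counter(tokenize(text))
    return {
        category: sum(token_counts[word] for word in words)
        for category, words in dictionary.items()
    }

sample = "The service was good, the food was excellent, but the wait was bad."
counts = code_text(sample, DICTIONARY)
print(counts)  # {'positive': 2, 'negative': 1}
```

The output is one count per category for each text, which is the kind of tabulated variable that can then be carried into a statistical package.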

Evaluation Section for Computer Text Analysis Programs

Text Analysis Sites

CATPAC II – CATPAC II reads text files and produces a variety of outputs ranging from simple diagnostics (e.g., word and alphabetical frequencies) to a summary of the “main ideas” in a text. It uncovers patterns of word usage and produces such outputs as simple word counts, cluster analysis (with icicle plots), and interactive neural cluster analysis. A nifty add-on program called Thought View can generate two- and three-dimensional concept maps based on the results of CATPAC analyses (one especially neat feature of Thought View allows users to look at the results through 3-D glasses and experience MDS-style output in true, movie theater-style 3-D).

Concordance 3.3 – This program lets you make full concordances to texts of any size, limited only by available disk space and memory. You can also make fast concordances, picking your selection of words from text, and make Web Concordances: turn your concordance into linked HTML files, ready for publishing on the Web, with a single click.

Diction 7 – Listed in the Content Analysis Guidebook, this program is no longer available.

General Inquirer – The online version of the General Inquirer gets our vote for the simplest and quickest way to do a computer text analysis: simply visit the Internet General Inquirer site, type or paste some text into a box, click submit, and your text will be analyzed. The Internet General Inquirer codes and classifies text using the Harvard IV-4 dictionary, which assesses such features as valence, Osgood’s three semantic dimensions, language reflecting particular institutions, emotion-laden words, cognitive orientation, and more. The program also returns cumulative statistics (e.g., simple frequencies for words appearing in the text) at the end of each analysis. Though we could not find any information on a software-based version of the Inquirer, creator Philip J. Stone holds summer seminars on the program at the University of Essex.

Hamlet II 3.0 – “The main idea of HAMLET © is to search a text file for words in a given vocabulary list, and to count joint frequencies within any specified context unit, or as collocations within a given span of words. Individual word frequencies (fi), joint frequencies (fij) for pairs of words (i,j), both expressed in terms of the chosen unit of context, and the corresponding standardised joint frequencies are displayed in a similarities matrix, which can be submitted to a simple cluster analysis and multi-dimensional scaling. A further option allows comparison of the results of applying multi-dimensional scaling to matrices of joint frequencies derived from a number of texts, using Procrustean Individual Differences Scaling (PINDIS).”
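The word and joint frequencies described in the Hamlet entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hamlet's own code: the context unit is assumed to be a sentence, the standardisation is assumed to be the Jaccard coefficient (the quoted description does not say which one the program uses), and the function names are hypothetical.

```python
import re
from itertools import combinations

def joint_frequencies(text, vocabulary):
    """Return (f, fij): the number of context units containing each word,
    and the number of units containing both words of each pair."""
    vocab = sorted({w.lower() for w in vocabulary})
    # Assumption: one context unit = one sentence, split on ., ! or ?
    units = [
        set(re.findall(r"[a-z']+", sentence.lower()))
        for sentence in re.split(r"[.!?]+", text)
        if sentence.strip()
    ]
    f = {w: sum(w in unit for unit in units) for w in vocab}
    fij = {
        (a, b): sum(a in unit and b in unit for unit in units)
        for a, b in combinations(vocab, 2)
    }
    return f, fij

def jaccard(f, fij):
    """Standardise joint frequencies as fij / (fi + fj - fij).
    Assumption: Jaccard is used here purely for illustration."""
    sims = {}
    for (a, b), joint in fij.items():
        union = f[a] + f[b] - joint
        sims[(a, b)] = joint / union if union else 0.0
    return sims

text = "The king spoke to the queen. The queen left. The king and the prince argued."
f, fij = joint_frequencies(text, ["king", "queen", "prince"])
sims = jaccard(f, fij)
print(f)     # {'king': 2, 'prince': 1, 'queen': 2}
print(sims)  # similarity for each word pair
```

The resulting pairwise similarities form the kind of matrix that can be passed on to cluster analysis or multidimensional scaling.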

LIWC-22 – “Linguistic Inquiry and Word Count (LIWC) is the gold standard in software for analyzing word use. It can be used to study a single individual, groups of people over time, or all of social media.” The program also allows users to create custom dictionaries. The program seems especially useful to psychologists who wish to examine patient narratives.

MCCALite for Windows – Listed in the Content Analysis Guidebook, this program is no longer available.

PCAD 2000 – PCAD 2000 applies the Gottschalk-Gleser content analysis scales (which measure the magnitude of clearly defined and categorized mental or emotional states) to transcriptions of speech samples and other machine-readable texts. In addition to deriving scores on a variety of scales, including anxiety, hostility, social alienation, cognitive impairment, hope, and depression, the program compares scores on each scale to norms for the demographic groups of subjects. It can also explain the significance and clinical implications of scores and place subjects into clinical diagnostic classifications derived from the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), developed by the American Psychiatric Association.

PolyAnalyst – PolyAnalyst offers data mining and text mining capabilities. It derives actionable knowledge from large volumes of text and structured data and delivers custom reports and predictive models. Covering the complete data analysis cycle from data loading and integration to modeling and reporting, PolyAnalyst offers a comprehensive selection of algorithms for automated analysis of text and structured data. The system enables users to perform numerous knowledge discovery operations: categorization, clustering, prediction, link analysis, keyword and entity extraction, pattern discovery, and anomaly detection.

Profiler Plus – Description to come.

SALT (Systematic Analysis of Language Transcripts) – This program is designed mainly to help clinicians identify and document specific language problems. It executes a myriad of analyses, including types of utterances (e.g., incomplete, unintelligible, nonverbal), mean length of utterances, number and length of pauses and rate of speaking, and frequencies for sets of words (e.g., negatives, conjunctions, and custom dictionaries). The SALT Reference Database, described online, allows users to compare the results of their SALT analyses to normative language measures collected via a sample of more than 250 children of various ages, economic backgrounds, and abilities in the Madison, Wisconsin area.

SentiStrength – Description to come.

Social Science Automation – Social Science Automation, Inc. provides three programs for automated text analysis, along with related products and services. Offerings include solutions for media analysis, campaign and election media evaluation, athlete achievement, profiling, and forensic psycholinguistics.

TABARI (Text Analysis By Augmented Replacement Instructions) – The successor to KEDS, this program is specifically designed for analyzing short news stories, such as those found in wire service reports. It codes international event data (which are essentially codes recording the interactions between actors) using pattern recognition and simple grammatical parsing. The authors have developed a number of dictionaries to help code event data. The WEIS coding scheme, for example, can determine who acts against whom, as in the case of an Iraqi attack against Kuwait. When such an event is reported in a news story, the program can automatically code the aggressor, victim and action, as well as the date of the event.
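The idea of coding who acts against whom can be illustrated with a small Python sketch. It is not TABARI's algorithm or dictionary format: the actor and action dictionaries, the code labels, and the code_event function are all hypothetical, and the "parsing" is simply taking the nearest actor before the verb as the source and the first actor after it as the target.

```python
import re

# Hypothetical dictionaries, for illustration only. The labels are NOT
# WEIS codes, and real event-data dictionaries are far larger.
ACTORS = {"iraq": "IRQ", "iraqi": "IRQ", "kuwait": "KWT", "kuwaiti": "KWT"}
ACTIONS = {"attack": "ATTACK", "attacks": "ATTACK", "attacked": "ATTACK"}

def code_event(date, sentence):
    """Return (date, source, action, target), or None if no event is found."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    # Locate the first token that matches an action verb.
    verb_pos = next((i for i, t in enumerate(tokens) if t in ACTIONS), None)
    if verb_pos is None:
        return None
    # Source: nearest actor before the verb. Target: first actor after it.
    source = next(
        (ACTORS[t] for t in reversed(tokens[:verb_pos]) if t in ACTORS), None
    )
    target = next((ACTORS[t] for t in tokens[verb_pos + 1:] if t in ACTORS), None)
    if source is None or target is None:
        return None
    return (date, source, ACTIONS[tokens[verb_pos]], target)

event = code_event("1990-08-02", "Iraqi forces attacked Kuwait.")
print(event)  # ('1990-08-02', 'IRQ', 'ATTACK', 'KWT')
```

A sketch this simple would miscode passive constructions such as "Kuwait was attacked by Iraq", which is one reason dedicated event coders rely on grammatical parsing and much richer pattern dictionaries.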

TextAnalyst – See PolyAnalyst, above.

Text Analytics for Surveys 4.0.1 (IBM SPSS) – Description to come.

TextPack – Description to come.

TextQuest – TextQuest performs text analysis, indexing, concordance, KWIC, KWOC, readability analysis, personality structure analysis, word lists, word sequence, word permutation, stylistics, and more. It performs all of the INTEXT analyses, but through an easier-to-use Windows or Mac interface.

T-LAB Pro – Description to come.

WordSmith – Description to come.

WordStat – This add-on to the Simstat statistical analysis program includes several exploratory tools, such as cluster analysis and multidimensional scaling, for the analysis of open-ended survey responses and other texts. It also codes based on user-supplied dictionaries and generates word frequency and alphabetical lists, KWIC, multi-unit data file output, and bivariate comparisons between subgroups. The differences between subgroups or numeric variables (e.g., age, date of publication) can be displayed visually in high-resolution line and bar charts and through 2-D and 3-D correspondence analysis bi-plots. One particularly noteworthy feature of the program is a dictionary-building tool that uses the WordNet lexical database and other dictionaries (in English and five other languages) to help users build a comprehensive categorization system.

Yoshikoder – “The Yoshikoder is a cross-platform multilingual content analysis program developed as part of the Identity Project at Harvard’s Weatherhead Center for International Affairs. You can load documents, construct and apply content analysis dictionaries, examine keywords-in-context, and perform basic content analyses, in any language. The Yoshikoder works with text documents, whether in plain ASCII, Unicode (e.g. UTF-8), or national encodings (e.g. Big5 Chinese.) You can construct, view, and save keywords-in-context. You can write content analysis dictionaries. Yoshikoder provides summaries of documents, either as word frequency tables or according to a content analysis dictionary. You can also apply a dictionary analysis to the results of a concordance, which provides a flexible way to study local word contexts. Yoshikoder’s native file format is XML, so dictionaries and keyword-in-context files are non-proprietary and human readable.”
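Several of the programs above produce keyword-in-context (KWIC) displays or concordances. The Python sketch below shows the basic idea under simple assumptions; it does not reproduce the output format of any listed program, and the kwic function and its window parameter are hypothetical names chosen for illustration.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each match,
    with up to `window` words of context on each side."""
    tokens = re.findall(r"\w+", text)
    key = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == key:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows

text = "To be or not to be, that is the question."
rows = kwic(text, "be", window=2)
for left, kw, right in rows:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} | {kw} | {right}")
```

Because each row keeps the words around the keyword, a dictionary analysis can then be applied to those local contexts rather than to the whole document.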

Sample Analyses and Output

Sample Dictionaries (Custom and Internal/Standard)

Links to Audiovisual Software Supporting Content Analysis Tasks

FaceReader – FaceReader is the world’s first tool that is capable of automatically analyzing facial expressions, providing users with an objective assessment of a person’s emotion.

MoCA Project – The aim of the MoCA project is to extract the structural and semantic content of videos automatically. Different applications have been implemented, and the project has concentrated on the analysis of movie material such as can be found on TV, in cinemas, and in video-on-demand databases. Analysis features developed and used within the MoCA project fall into four categories: (1) features of single pictures (frames), such as brightness, colors, and text; (2) features of frame sequences, such as motion and video cuts; (3) features of the audio track, such as audio cuts and loudness; and (4) combinations of features from the first three categories, used to extract, for example, scenes.

The Observer XT – The Observer XT is a program for behavioral coding and analysis. It allows researchers to gather rich and meaningful data, record time automatically and accurately, integrate video and physiology in behavioral studies, calculate statistics, assess reliability, and create transition matrices.

Transana – Transana is software for researchers who want to analyze digital video or audio data. Transana lets you analyze and manage your data: transcribe it, identify analytically interesting clips, assign keywords to clips, arrange and rearrange clips, create complex collections of interrelated clips, explore relationships between applied keywords, and share your analysis with colleagues. Transana is free and open-source.

Links to Qualitative Computer Analysis Sites

ATLAS.ti – Computer software for the support of text interpretation, text management and the extraction of conceptual knowledge from documents (theory building). Also has the capability to handle video sequences, recorded interviews, photos, maps, music, movies, the nightly news, videocasts and podcasts. Application areas include social sciences, economics, educational sciences, criminology, market research, quality management, knowledge acquisition, and theology.

Computer Assisted Qualitative Data Analysis Software (CAQDAS) Networking Project – CAQDAS provides practical support, training and information in the use of a range of software programs designed to assist qualitative data analysis. They also provide platforms for debate concerning the methodological and epistemological issues arising from the use of such software packages and conduct research into methodological applications. Download demo versions of various qualitative analysis packages through this organization.

The Ethnograph v6.0 – Software for qualitative research and data analysis that facilitates the management and analysis of text-based data such as transcripts of interviews, focus groups, field notes, diaries, meeting minutes, and other documents. According to the Ethnograph homepage, it has been the most widely used software for qualitative data analysis since 1985.

Kwalitan – Kwalitan is a support program for the analysis of qualitative data, such as the protocols of interviews and observations, or existing written material, such as articles from newspapers, annual reports of enterprises, ancient manuscripts, and so on. In effect, Kwalitan is a special-purpose database program. The program has been developed in accordance with the narrowly elaborated procedures of the so-called grounded theory approach, in which the researcher tries to generate a theoretical framework by means of an interpretative analysis of the qualitative material.

MAXQDA – MAXQDA supports all individuals performing qualitative data analysis and helps to systematically evaluate and interpret texts. It is also a powerful tool for developing theories and testing the theoretical conclusions of the analysis. It is used in a wide range of academic and non-academic disciplines, such as in Sociology, Political Science, Psychology, Public Health, Anthropology, Education, Marketing, Economics and Urban Planning.

QDA Miner – QDA Miner is an easy-to-use qualitative data analysis software package for coding textual data, annotating, retrieving and reviewing coded data and documents. The program can manage complex projects involving large numbers of documents combined with numerical and categorical information. QDA Miner also provides a wide range of exploratory tools to identify patterns in codings and relationships between assigned codes and other numerical or categorical properties.

NVivo 15 – NVivo 15 removes many of the manual tasks associated with the analysis of audio, video, pictures or documents (classifying, sorting and arranging information), so researchers can explore trends, build and test theories and ultimately arrive at answers to questions.

Tropes – Designed for Semantic Classification, Keyword Extraction, Linguistic and Qualitative Analysis, Tropes software is a tool for content analysis research in the Information Science, Market Research, Sociological Analysis, Scientific and Medical studies fields.