Computer Content Analysis Programs
New Software
AnnoTape
AnnoTape is the solution for recording, analysing & transcribing audio, video, image and text data for qualitative research, marketing, journalism and broadcast, or archiving.
Crawdad Text Analysis System 1.2
"Crawdad 1.2 allows you to automatically create visual concept maps of your texts, browse for keywords, perform full-text searches, compare texts to one another, auto-sort texts into clusters, extract ontological themes, and export key word metrics for secondary analysis."
Profiler Plus
Profiler Plus, a general purpose text coding software, is a platform for building and using text analysis coding schemes.
Transana
"A freely available tool designed to facilitate the transcription and qualitative analysis of video data in a research setting."
Tropes
"Designed for Semantic Classification, Keyword Extraction, Linguistic and Qualitative Analysis, Tropes software is a perfect tool for Information Science, Market Research, Sociological Analysis, Scientific and Medical studies, and more."
The Yoshikoder
Yoshikoder is a cross-platform multilingual content analysis program developed as part of the Identity Project at Harvard's Center for International Affairs.
Quantitative Programs (list compiled by Paul Skalski)
CATPAC reads text files and produces a variety of outputs ranging from simple diagnostics (e.g., word and alphabetical frequencies) to a summary of the "main ideas" in a text. It uncovers patterns of word usage and produces such outputs as simple word counts, cluster analysis (with icicle plots), and interactive neural cluster analysis. A nifty add-on program called Thought View can generate two and three-dimensional concept maps based on the results of CATPAC analyses (one especially neat feature of Thought View allows users to look at the results through 3-D glasses and experience MDS-style output like never before, in true, movie theater-style, 3-D fashion!).
Computer Programs for Text Analysis (site down)
This is not a single computer program but rather a series of separate programs by Eric Johnson that each perform one or two basic functions, including analyzing appearances of characters in a play (ACTORS program), getting KWIC (CONCORD program), computing the amount of quotation in texts (DIALOG program), and comparing the vocabulary of two texts (IDENT program). The programs seem ideal for literary-type analyses.
Concordance 2.0
Concordance is a flexible text analysis program which lets you gain better insight into e-texts and analyse language in depth. You can make concordances, word lists, and indexes from electronic text. Count word frequencies, find phrases, lemmatise, see word collocations, and more. User-definable alphabet, contexts, and references. Works with most languages. With a single click you can turn your results into a Web Concordance ready for publishing on the web. See the original Web Concordances for examples.
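The core operation behind concordancers such as this one, building a keyword-in-context (KWIC) listing, can be sketched in a few lines of Python. This is a minimal illustration, not Concordance's actual implementation; the sample text and context width are invented for the example:

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: each hit of `keyword`
    shown with `width` characters of context on either side."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].rjust(width)
        right = text[m.end():m.end() + width].ljust(width)
        lines.append(f"{left} [{m.group()}] {right}")
    return lines

sample = "The cat sat on the mat. A cat and a dog met another cat."
for line in kwic(sample, "cat", width=15):
    print(line)
```

Real concordance programs add sorting by left or right context, lemmatization, and user-definable alphabets on top of this basic windowing step.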
Diction 5.0
Diction 5.0 contains a series of built-in dictionaries that search text documents for 5 main semantic features (Activity, Optimism, Certainty, Realism and Commonality) and 35 sub-features (including tenacity, blame, ambivalence, motion, and communication). After the user's text is analyzed, Diction compares the results for each of the 40 dictionary categories to a "normal range of scores" determined by running more than 20,000 texts through the program. Users can compare their text to either a general normative profile of all 20,000-plus texts OR to any of 7 specific sub-categories of texts (business, daily life, entertainment, journalism, literature, politics, scholarship) that can be further divided into 36 distinct types (e.g., financial reports, computer chat lines, music lyrics, newspaper editorials, novels and short stories, political debates, social science scholarship). In addition, Diction outputs raw frequencies (in alphabetical order), percentages, and standardized scores; custom dictionaries can be created for additional analyses.
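The "normal range of scores" comparison that programs of this kind perform amounts to checking a raw dictionary score against normative statistics. Here is a hedged sketch of that idea; the norm values and the two-standard-deviation band are hypothetical, not Diction's actual norms:

```python
# Hypothetical normative statistics (mean, standard deviation) for two
# dictionary categories; Diction's real norms come from its corpus of
# 20,000+ texts and are not reproduced here.
NORMS = {"Activity": (50.0, 5.0), "Optimism": (49.0, 4.0)}

def compare_to_norms(scores, norms=NORMS, band=2.0):
    """Flag each raw dictionary score as below, within, or above a
    'normal range' defined as mean +/- band standard deviations."""
    out = {}
    for category, raw in scores.items():
        mean, sd = norms[category]
        if raw < mean - band * sd:
            out[category] = "below normal range"
        elif raw > mean + band * sd:
            out[category] = "above normal range"
        else:
            out[category] = "within normal range"
    return out

print(compare_to_norms({"Activity": 62.0, "Optimism": 48.5}))
```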
DIMAP stands for DIctionary MAintenance Programs, and its primary purpose is dictionary development. The program includes a variety of tools for lexicon building rooted in computational linguistics and natural language processing (Litkowski, 1992). With DIMAP, users can build, manage, edit, maintain, search and compare custom and established dictionaries. The program also includes a text analysis module called MCCA (the lite version of which is described below).
General Inquirer
This program, created by Phillip J. Stone, "now provides English-language content analysis capabilities using both the "Harvard" and "Lasswell" general-purpose dictionaries [explained on the website] as well as any dictionary categories developed by the user. With today's PC's or Macs, the system, including its disambiguation routines for high-frequency English homographs, usually processes text files on the order of a million words an hour." The General Inquirer "is not packaged to be commercially available," but it is available for academic research, and seminars, workshops, and laboratories have been held on the program in the past at Harvard and University of Essex.
General Inquirer (Internet version)
The General Inquirer has found new life on the World Wide Web. The online version of the General Inquirer gets our vote for the simplest and quickest way to do a computer text analysis: simply visit the Internet General Inquirer site, type or paste some text into a box, click submit, and your text will be analyzed. The Internet General Inquirer codes and classifies text using the Harvard IV-4 dictionary, which assesses such features as valence, Osgood's three semantic dimensions, language reflecting particular institutions, emotion-laden words, cognitive orientation, and more. The program also returns cumulative statistics (e.g., simple frequencies for words appearing in the text) at the end of each analysis.
"The main idea of HAMLET © is to search a text file for words in a given vocabulary list, and to count joint frequencies within any specified context unit, or as collocations within a given span of words.  Individual word frequencies (fi), joint frequencies (fij) for pairs of words (i,j), both expressed in terms of the chosen unit of context, and the corresponding standardised joint frequencies are displayed in a similarities matrix, which can be submitted to a simple cluster analysis and multi-dimensional scaling.  A further option allows comparison of the results of applying multi-dimensional scaling to matrices of joint frequencies derived from a number of texts, using Procrustean Individual Differences Scaling (PINDIS)."
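The joint-frequency counting described above, individual frequencies (fi) and joint frequencies (fij) within a unit of context, can be sketched as follows. This is an illustration of the general technique rather than HAMLET's implementation; it treats each short text as the context unit, and the vocabulary and texts are invented:

```python
from itertools import combinations

def joint_frequencies(texts, vocabulary):
    """Count individual frequencies (f_i) and joint frequencies (f_ij)
    of vocabulary words, using each text as the unit of context."""
    vocab = sorted(vocabulary)
    f = {w: 0 for w in vocab}
    f_joint = {pair: 0 for pair in combinations(vocab, 2)}
    for text in texts:
        words = set(text.lower().split())
        present = sorted(words & set(vocab))
        for w in present:
            f[w] += 1
        for pair in combinations(present, 2):
            f_joint[pair] += 1
    return f, f_joint

units = ["the war began", "war and peace", "a lasting peace treaty"]
f, f_joint = joint_frequencies(units, {"war", "peace", "treaty"})
print(f)
print(f_joint)
```

The resulting joint-frequency matrix is the kind of similarities input that can then be passed to cluster analysis or multi-dimensional scaling.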
INTEXT/TextQuest--Text Analysis Software
INTEXT is a program designed for the analysis of texts in the humanities and the social sciences. It performs text analysis, indexing, concordance, KWIC, KWOC, readability analysis, personality structure analysis, word lists, word sequence, word permutation, stylistics, and more.  TextQuest is the Windows version of INTEXT.  It performs all of the INTEXT analyses, but through an easier-to-use Windows interface.
Lexa
Designed with linguists in mind, Lexa Corpus Processing Software is a suite of programs for tagging, lemmatization, type/token frequency counts, and several other computer text analysis functions. 
LIWC (Linguistic Inquiry and Word Count software)
LIWC has a series of 68 built-in dictionaries that search text files and calculate how often the words match each of the 68 pre-set dimensions (dictionaries), which include linguistic dimensions, word categories tapping psychological constructs, and personal concern categories. The program also allows users to create custom dictionaries. The program seems especially useful to psychologists who wish to examine patient narratives.
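Dictionary-based counting of this kind reduces to matching each word in a text against category word lists and reporting the matches as a percentage of total words. A minimal sketch of the technique; the mini-dictionaries below are invented stand-ins, since LIWC's actual dictionaries are not reproduced here:

```python
import re
from collections import Counter

# Invented mini-dictionaries standing in for a program's built-in
# category word lists.
DICTIONARIES = {
    "positive": {"good", "happy", "great"},
    "negative": {"bad", "sad", "awful"},
    "cognitive": {"think", "know", "because"},
}

def category_percentages(text):
    """Report, for each dictionary, the percentage of words in the
    text that match that category's word list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for category, vocab in DICTIONARIES.items():
            if w in vocab:
                counts[category] += 1
    return {c: 100.0 * counts[c] / len(words) for c in DICTIONARIES}

sample = "I think the trip was good because the weather was great"
print(category_percentages(sample))
```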
MCCA Lite
Though somewhat hampered by quirks such as limited function availability, the lite version of MCCA analyzes text by producing frequencies, alphabetical lists, KWIC, and coding with built-in dictionaries. The built-in dictionaries search for textual dimensions such as activity, slang, and humor expression. The program's window-driven output makes sorting and switching between results easy. MCCA also includes a multiple-person transcript analysis function suitable for examining plays, focus groups, interviews, hearings, TV scripts, and other such texts.
MECA (no website)
MECA, which stands for Map Extraction Comparison and Analysis, contains 15 routines for text analysis. Many of these routines are for doing cognitive mapping and focus on both concepts and the relations between them. There are also routines for doing more classic content analyses, including a multi-unit data file output routine that shows the number of times each concept appears in each map.
MonoConc
As the name suggests, MonoConc primarily produces concordance information. These results can be sorted and displayed in several different user-configurable ways. The program also produces frequency and alphabetical information about the words in a given corpus.
ParaConc
ParaConc is a bilingual/multilingual concordance program designed to be used for contrastive corpus-based language research. Available for Macintosh; a Windows version has been announced.
PCAD 2000
PCAD 2000 applies the Gottschalk-Gleser content analysis scales (which measure the magnitude of clearly defined and categorized mental or emotional states) to transcriptions of speech samples and other machine-readable texts. In addition to deriving scores on a variety of scales, including anxiety, hostility, social alienation, cognitive impairment, hope, and depression, the program compares scores on each scale to norms for the demographic groups of subjects. It can also explain the significance and clinical implications of scores and place subjects into clinical diagnostic classifications derived from the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), developed by the American Psychiatric Association.
PROTAN (site down)
PROTAN (for PROTocol ANalyzer) is a computer-aided content analysis system.  It addresses the question of what the text looks like.  To achieve this first task, PROTAN rests on a series of semantic dictionaries that are part of the system.  The second task to which PROTAN is tuned is to answer the question of what the text is talking about: What are the main themes in it?
SALT (Systematic Analysis of Language Transcripts)
This program is designed mainly to help clinicians identify and document specific language problems. It executes a myriad of analyses, including types of utterances (e.g., incomplete, unintelligible, nonverbal), mean length of utterances, number and length of pauses and rate of speaking, and frequencies for sets of words (e.g., negatives, conjunctions, and custom dictionaries). The Salt Reference Database, described online, allows users to compare the results of their SALT analyses to normative language measures collected via a sample of more than 250 children of various ages, economic backgrounds, and abilities in the Madison, Wisconsin area.
SWIFT Content Analysis Software (site down)
SWIFT stands for Structured Word Identification and Frequency Table, an interactive, keyword-based program for analyzing multiple, short texts. SWIFT is free and runs under DOS. The program seems best suited to coding open-ended text responses.
TABARI (Text Analysis By Augmented Replacement Instructions)
The successor to KEDS, this program is specifically designed for analyzing short news stories, such as those found in wire service reports. It codes international event data (which are essentially codes recording the interactions between actors) using pattern recognition and simple grammatical parsing. The authors have developed a number of dictionaries to help code event data. The WEIS coding scheme, for example, can determine who acts against whom, as in the case of an Iraqi attack against Kuwait. When such an event is reported in a news story, the program can automatically code the aggressor, victim and action, as well as the date of the event. TABARI is currently only available for Macintosh, but a Windows version is in the works. 
TextAnalyst
TextAnalyst is an intelligent text mining and semantic information search system.  TextAnalyst implements a unique neural network technology for structural processing of texts written in natural language. This technology automates the work with large volumes of textual information and can be applied effectively to perform the following tasks: creation of knowledge bases expressed in a natural language, as well as creation of hypertext, searchable, and expert systems; and automated indexing, topic assignment, and abstracting of texts.
The TEXTPACK program, which was originally designed for the analysis of open-ended survey responses, has been broadened in recent years to include features of use to content, literary and linguistic analysts. It now produces word frequencies, alphabetical lists, KWIC and KWOC (KeyWord Out of Context) searches, cross references, word comparisons between two texts, and coding according to user-created dictionaries. This multi-unit data file output can be imported into statistical analysis software. The new Windows version of the program takes full advantage of the Windows user interface.
TextSmart by SPSS Inc. (Program no longer supported)
This software, designed primarily for the analysis of open-ended survey responses, uses cluster analysis and multidimensional scaling techniques to automatically analyze key words and group texts into categories. Thus, it can "code" without the use of a user-created dictionary. TextSmart has a pleasant, easy-to-use Windows interface that allows for quick sorting of words into frequency and alphabetical lists. It also produces colorful, rich-looking graphics like bar charts and two-dimensional MDS plots.
VBPro (Program no longer supported)
Outputs frequency and alphabetical word lists, key words in context (KWIC), and coded strings of word-occurrence data based on user-defined dictionaries. In addition, it includes a multidimensional concept-mapping sub-program called VBMap that measures the degree to which words co-occur in a text or series of texts. Miller, Andsager and Riechert (1998), for example, used the program to compare the press releases sent by 1996 GOP presidential candidates to the coverage the candidates received in the press. The program helped the researchers (a) generate a list of key words appearing in the text and (b) generate a map showing the relative positions of candidates, in both press releases and media coverage, to each other and on key issues in the election (e.g., family values, education). The program runs under DOS and is available for free from the software author's website.
WordStat v5.0
This add-on to the Simstat statistical analysis program includes several exploratory tools, such as cluster analysis and multidimensional scaling, for the analysis of open-ended survey responses and other texts. It also codes based on user-supplied dictionaries and generates word frequency and alphabetical lists, KWIC, multi-unit data file output, and bivariate comparisons between subgroups. The differences between subgroups or numeric variables (e.g., age, date of publication) can be displayed visually in high resolution line and bar charts and through 2-D and 3-D correspondence analysis bi-plots. One particularly noteworthy feature of the program is a dictionary building tool that uses the WordNet lexical database and other dictionaries (in English and five other languages) to help users build a comprehensive categorization system.
Qualitative Programs (list compiled by Matthias Romppel and PS)
ATLAS/ti
Computer software for the support of text interpretation, text management and the extraction of conceptual knowledge from documents (theory building). Application areas include social sciences, economics, educational sciences, criminology, market research, quality management, knowledge acquisition, and theology. The software is available for IBM PCs and compatibles (DOS version).  A 32-bit Windows version (Windows 95, Windows/NT and Windows 3.1) is now available; a free trial version is provided via download.  You may want to join the discussion and support list about ATLAS/ti.
Code-A-Text is a software package that was written to help in the training of psychotherapists. It was originally designed to facilitate the analysis of therapeutic conversations where clinicians, teachers and research workers wanted to understand the ideas and structures underlying the "texts". Recently, Code-A-Text has been applied to other types of "texts", including process (field notes), responses to open ended questionnaires and metaplan analyses. Soon to be launched is a version which will support the coding of audio files. Code-A-Text was written by Dr Alan Cartwright, a psychotherapist who is Director of the Centre for the Study of Psychotherapy, University of Kent at Canterbury, UK.
Computer Assisted Qualitative Data Analysis Software (CAQDAS) Networking Project
Download demo versions of various qualitative analysis packages
The Ethnograph v4.0
Software for qualitative research and data analysis that facilitates the management and analysis of text-based data such as transcripts of interviews, focus groups, field notes, diaries, meeting minutes, and other documents. According to the Ethnograph homepage, it has been the most widely used software for qualitative data analysis since 1985. Over 5,000 copies of v3 are in use worldwide. Runs on IBM and compatible computers with a 286 or later processor; 2MB of hard disk space and a minimum of 2MB RAM are required.
Kwalitan 4.0
Kwalitan is a support program for the analysis of qualitative data, such as the protocols of interviews and observations, or existing written material, such as articles from newspapers, annual reports of enterprises, ancient manuscripts, and so on. In fact, Kwalitan is a special purpose database program. The program has been developed in accordance with the narrowly elaborated procedures of the so called grounded theory approach, in which the researcher tries to generate a theoretical framework by means of an interpretative analysis of the qualitative material.
NUD*IST
This program assists researchers handling Nonnumerical Unstructured Data by Indexing, Searching and Theorizing. Among other things, it automates tedious work by "auto coding" text and importing table data.
QDA Miner
"QDA Miner is an easy-to-use qualitative data analysis software package for coding textual data, annotating, retrieving and reviewing coded data and documents. The program can manage complex projects involving large numbers of documents combined with numerical and categorical information. QDA Miner also provides a wide range of exploratory tools to identify patterns in codings and relationships between assigned codes and other numerical or categorical properties."
Software for qualitative analysis by Udo Kuckartz, Berlin; demo version and tutorial are available for download
Other (unclassified) text-based programs (list compiled by PS)
This company focuses on "free-form" database and information management software.  They don't make any content analysis-specific programs, apparently.
This seems to be a complicated database programming language or database-type program.  It's definitely not a pure content analysis program, though it may perform useful CA functions.  See the Microsoft FoxPro page for more info.
This seems to be primarily a writing program to check style.   No Web site is currently online.
These are not programs per se but rather programming languages that can be used to perform text analyses such as word counts, indexes, and simple concordances.
Video Analysis Programs (list compiled by PS)
CAMERA 1.5
CAMERA is an event-recording system for behavioral observation and registration. Its hardware and software permit easy registration and encoding of complex behavioral interactions from video recordings. The system comprehensively registers and codes behavior from videotaped behavioral records as sequences of distinct behavioral items. Up to 1024 behavioral items can be coded per video frame. CAMERA is especially designed to improve accuracy, reliability, and training standards in behavioral registrations. The system supports several basic analyses of collected data.
EthoVision
EthoVision is a fully integrated system for automatic recording of activity, movement and interactions of animals. Combining the latest computer and video technology with powerful image processing and pattern recognition software, EthoVision offers a wide range of video tracking options, powerful analysis of locomotory tracks and automatic behavior recognition. Besides automating the data acquisition process, the Windows-based software provides easy-to-use tools for designing experiments, management of trial information in databases, as well as visualization and analysis of the collected data.
Excalibur Video Analysis Engine (VAE)
The Excalibur VAE is a software engine that analyzes analog and digital video. The VAE is able to "watch" video and detect pre-selected events. Each instance of the engine may contain any of a set of optional event detectors we call "cogs." Each cog "watches" the video for a particular type or class of events. Cogs can be mixed and matched depending upon the need of the application. On detecting a particular event, the engine will report the occurrence of the event to the controlling program along with other pertinent information about the event. The cogs currently supplied can detect cuts, fades in/out, shifts/pan/tilt, blank frames, dissolves, salient frames and aspect ratio. A storyboard "metacog" is also provided, which automatically controls the underlying cogs to generate a storyboard representation of the supplied video.
Executive Producer
The Executive Producer for Windows™ (WinTEP) is the professional's choice for video logging software. Save editing costs on your next project by starting with a complete, informative log. Whether you log in the field or the office, you can create comprehensive and easy-to-use logs, print storyboards, and output batch digitizing lists for any non-linear editor.
MacSHAPA is one of a number of software tools now emerging to help with the problems of ESDA (Exploratory Sequential Data Analysis). MacSHAPA does not help with every kind of ESDA--no software tool can--but it does help with certain kinds. MacSHAPA lets you do the following:  Enter or import data into a spreadsheet-like viewing medium; Annotate, manipulate, and visualize data in various ways; Carry out statistical analyses of various kinds; Export data and results to other applications.  MacSHAPA has simple multimedia capabilities: you can control a VCR from the Macintosh, using MacSHAPA to play, pause, stop, rewind, fast forward, jog, and shuttle at different speeds. MacSHAPA lets you capture timecodes from video, insert them into the data, and find a videotape location that corresponds to a timestamp selected in the data.
MoCA Project
The aim of the MoCA project is to extract structural and semantic content of videos automatically. Different applications have been implemented and the scope of the project has concentrated on the analysis of movie material such as can be found on TV, in cinemas and in video-on-demand databases. Analysis features developed and used within the MoCA project fall into four different categories: (1) features of single pictures (frames) like brightness, colors, text, (2) features of frame sequences like motion, video cuts, (3) features of the audiotrack like audio cuts, loudness and  (4) combination of features of the three classes to extract e.g. scenes. 
The Observer
The Observer is the ultimate system for the collection, analysis, presentation and management of observational data. You can use it to record activities, postures, movements, positions, facial expressions, social interactions or any other aspect of human or animal behavior.
Scene Stealer
Scene Stealer is a video scene detection/logging system.  It automates video logging and audio transcription.  Special features include color and monochrome video capture plus phrase cutpoint technology for closed captioning and subtitling operations.
TACT (Text Analysis Computing Tools) is a text-analysis and retrieval system for MS-DOS that permits inquiries on text databases in European languages. It has been developed by a team of programmers, designers, and textual scholars. It was begun under the IBM-University of Toronto Cooperative in the Humanities during 1986-89.  TACTweb is software developed by John Bradley and Geoffrey Rockwell. TACTweb connects TACT to the World Wide Web: by using WWW forms, users get access to some of the interactive services that TACT provides -- but without requiring them to use TACT itself or to have a copy of the TACT database on their own machine.  There is an interactive workbook that will teach you how to use the TACTweb environment and introduce you to computer-assisted text analysis.
Virage VideoLogger 4.0
VideoLogger 4.0 uses advanced media analysis technology to "watch, listen to and read" an analog or digital video signal. Looking for changes in visual content, such as pans and zooms, VideoLogger 4.0 generates a storyboard of browsable keyframe images. Simultaneously, it extracts any text in the video signal, such as closed captions.


Kimberly A. Neuendorf