|Quantitative Text Analysis Programs|
CATPAC reads text files and produces a variety of outputs ranging from simple diagnostics (e.g., word and alphabetical frequencies) to a summary of the "main ideas" in a text. It uncovers patterns of word usage and produces such outputs as simple word counts, cluster analysis (with icicle plots), and interactive neural cluster analysis. A nifty add-on program called Thought View can generate two- and three-dimensional concept maps based on the results of CATPAC analyses (one especially neat feature of Thought View allows users to look at the results through 3-D glasses and experience MDS-style output like never before, in true, movie theater-style, 3-D fashion!).
|Computer Programs for Text Analysis
This is not a single computer program but rather a series of separate programs by Eric Johnson that each perform one or two basic functions, including analyzing appearances of characters in a play (ACTORS program), getting KWIC (CONCORD program), computing the amount of quotation in texts (DIALOG program), and comparing the vocabulary of two texts (IDENT program). The programs seem ideal for literary-type analyses.
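To make the KWIC (key word in context) idea concrete, here is a minimal Python sketch. It is not Eric Johnson's CONCORD program; the function name `kwic`, the `width` parameter, and the simple tokenizer are assumptions chosen for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    Illustrative only: tokens are runs of letters and apostrophes,
    and matching is case-insensitive.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = "To be, or not to be, that is the question."
for left, word, right in kwic(sample, "be", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} [{word}] {right}")
```

Aligning the keyword in a fixed column is what makes a concordance easy to scan; real concordance programs add sorting by left or right context.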
This program lets you make full concordances to texts of any size, limited only by available disk space and memory. You can also make fast concordances, picking your selection of words from text, and make Web Concordances: turn your concordance into linked HTML files, ready for publishing on the Web, with a single click. See the original Web Concordances for examples.
Diction 5.0 contains a series of built-in dictionaries that search text documents for 5 main semantic features (Activity, Optimism, Certainty, Realism and Commonality) and 35 sub-features (including tenacity, blame, ambivalence, motion, and communication). After the user's text is analyzed, Diction compares the results for each of the 40 dictionary categories to a "normal range of scores" determined by running more than 20,000 texts through the program. Users can compare their text either to a general normative profile of all 20,000-plus texts or to any of 7 specific sub-categories of texts (business, daily life, entertainment, journalism, literature, politics, scholarship) that can be further divided into 36 distinct types (e.g., financial reports, computer chat lines, music lyrics, newspaper editorials, novels and short stories, political debates, social science scholarship). In addition, Diction outputs raw frequencies (in alphabetical order), percentages, and standardized scores; custom dictionaries can be created for additional analyses.
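The general idea of dictionary-based scoring against norms can be sketched in a few lines of Python. This is not Diction's actual procedure: the word lists, the norm values, and the per-100-words scaling below are all hypothetical and stand in for the program's proprietary dictionaries and normative data.

```python
import re

# Hypothetical mini-dictionaries; Diction's real word lists are not reproduced here.
DICTIONARIES = {
    "optimism": {"hope", "good", "success"},
    "certainty": {"always", "never", "must"},
}

# Hypothetical norms: (mean, standard deviation) of hits per 100 words.
NORMS = {"optimism": (2.0, 1.0), "certainty": (1.0, 0.5)}

def score(text):
    """Return raw count, rate per 100 words, and z-score for each category."""
    words = re.findall(r"[a-z']+", text.lower())
    results = {}
    for category, vocab in DICTIONARIES.items():
        raw = sum(1 for w in words if w in vocab)
        per_100 = 100.0 * raw / len(words) if words else 0.0
        mean, sd = NORMS[category]
        results[category] = {
            "raw": raw,
            "per_100": per_100,
            "z": (per_100 - mean) / sd,  # distance from the assumed norm
        }
    return results

print(score("We must always hope for success in all our work"))
```

A standardized score far from zero flags a category in which the text departs from the comparison profile, which is the kind of comparison the "normal range of scores" supports.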
DIMAP stands for DIctionary MAintenance Programs, and its primary purpose is dictionary development. The program includes a variety of tools for lexicon building rooted in computational linguistics and natural language processing (Litkowski, 1992). With DIMAP, users can build, manage, edit, maintain, search and compare custom and established dictionaries. The program also includes a text analysis module called MCCA (the lite version of which is described below).
Inquirer (Internet version) (http://www.wjh.harvard.edu/~inquirer/)
This venerable, still widely used program has found new life on the World Wide Web. The online version of the General Inquirer gets our vote for the simplest and quickest way to do a computer text analysis: simply visit the Internet General Inquirer site, type or paste some text into a box, click submit, and your text will be analyzed. The Internet General Inquirer codes and classifies text using the Harvard IV-4 dictionary, which assesses such features as valence, Osgood's three semantic dimensions, language reflecting particular institutions, emotion-laden words, cognitive orientation, and more. The program also returns cumulative statistics (e.g., simple frequencies for words appearing in the text) at the end of each analysis. Though we could not find any information on a software-based version of the Inquirer, creator Philip J. Stone holds summer seminars on the program at the University of Essex.
"The main idea of HAMLET © is to search a text file for words in a given vocabulary list, and to count joint frequencies within any specified context unit, or as collocations within a given span of words. Individual word frequencies (fi) , joint frequencies (fij) for pairs of words (i,j), both expressed in terms of the chosen unit of context, and the corresponding standardised joint frequencies are displayed in a similarities matrix, which can be submitted to a simple cluster analysis and multi-dimensional scaling. A further option allows comparison of the results of applying multi- dimensional scaling to matrices of joint frequencies derived from a number of texts, using Procrustean Individual Differences Scaling (PINDIS)."
|INTEXT/TextQuest--Text Analysis Software
INTEXT is a program designed for the analysis of texts in the humanities and the social sciences. It performs text analysis, indexing, concordance, KWIC, KWOC, readability analysis, personality structure analysis, word lists, word sequence, word permutation, stylistics, and more. TextQuest is the Windows version of INTEXT. It performs all of the INTEXT analyses, but through an easier-to-use Windows interface.
Designed with linguists in mind, Lexa Corpus Processing Software is a suite of programs for tagging, lemmatization, type/token frequency counts, and several other computer text analysis functions.
(Linguistic Inquiry and Word Count software) (https://www.erlbaum.com/shop/tek9.asp?pg=products&specific=1-56321-208-0)
LIWC has a series of 68 built-in dictionaries that search text files and calculate how often the words match each of the 68 pre-set dimensions (dictionaries), which include linguistic dimensions, word categories tapping psychological constructs, and personal concern categories. The program also allows users to create custom dictionaries. The program seems especially useful to psychologists who wish to examine patient narratives.
Though somewhat hampered by quirks such as limited function availability, the lite version of MCCA analyzes text by producing frequencies, alphabetical lists, KWIC, and coding with built-in dictionaries. The built-in dictionaries search for textual dimensions such as activity, slang, and humor expression. The program's window-driven output makes sorting and switching between results easy. MCCA also includes a multiple-person transcript analysis function suitable for examining plays, focus groups, interviews, hearings, TV scripts, and other such texts.
MECA, which stands for Map Extraction Comparison and Analysis, contains 15 routines for text analysis. Many of these routines are for doing cognitive mapping and focus on both concepts and the relations between them. There are also routines for doing more classic content analyses, including a multi-unit data file output routine that shows the number of times each concept appears in each map.
As the name suggests, MonoConc primarily produces concordance information. These results can be sorted and displayed in several different user-configurable ways. The program also produces frequency and alphabetical information about the words in a given corpus.
ParaConc is a bilingual/multilingual concordance program designed to be used for contrastive corpus-based language research. It runs on Macintosh; a Windows version has been announced.
PCAD 2000 applies the Gottschalk-Gleser content analysis scales (which measure the magnitude of clearly defined and categorized mental or emotional states) to transcriptions of speech samples and other machine-readable texts. In addition to deriving scores on a variety of scales, including anxiety, hostility, social alienation, cognitive impairment, hope, and depression, the program compares scores on each scale to norms for the demographic groups of subjects. It can also explain the significance and clinical implications of scores and place subjects into clinical diagnostic classifications derived from the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), developed by the American Psychiatric Association.
|PROTAN (site down)
PROTAN (for PROTocol ANalyzer) is a computer-aided content analysis system. Its first task is to address the question of what the text looks like. To achieve this, PROTAN relies on a series of semantic dictionaries that are part of the system. Its second task is to answer the question of what the text is talking about: what are the main themes in it? For more: http://www.psp.ucl.ac.be/~upso/protan/PROTANAE.html
|SALT (Systematic Analysis of Language Transcripts) (http://www.waisman.wisc.edu/salt/index.htm)
This program is designed mainly to help clinicians identify and document specific language problems. It executes a myriad of analyses, including types of utterances (e.g., incomplete, unintelligible, nonverbal), mean length of utterances, number and length of pauses and rate of speaking, and frequencies for sets of words (e.g., negatives, conjunctions, and custom dictionaries). The SALT Reference Database, described online, allows users to compare the results of their SALT analyses to normative language measures collected via a sample of more than 250 children of various ages, economic backgrounds, and abilities in the Madison, Wisconsin area.
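Two of the measures named above, mean length of utterance and frequencies for a set of words, are simple enough to sketch in Python. The sketch assumes plain strings with one utterance each and counts length in words; SALT's own transcription conventions and morpheme-level counts are not reproduced, and the function names and the `NEGATIVES` set are hypothetical.

```python
# Hypothetical word set; SALT's actual word lists are not reproduced here.
NEGATIVES = {"no", "not", "never", "don't", "can't"}

def mean_length_of_utterance(utterances):
    """Average number of words per non-empty utterance (word-based, not morpheme-based)."""
    lengths = [len(u.split()) for u in utterances if u.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0

def word_set_frequency(utterances, word_set):
    """Count tokens across all utterances that belong to word_set."""
    return sum(1 for u in utterances for w in u.lower().split() if w in word_set)

transcript = ["I want juice", "no more", "doggie go out now"]
print(mean_length_of_utterance(transcript))
print(word_set_frequency(transcript, NEGATIVES))
```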
|SWIFT Content Analysis Software
SWIFT stands for Structured Word Identification and Frequency Table, an interactive, keyword-based program for analyzing multiple short texts. SWIFT is free and runs under DOS. It seems best suited to coding open-ended text responses.
TABARI (Text Analysis By Augmented Replacement Instructions) (http://www.ku.edu/~keds/software.dir/tabari.html)
The successor to KEDS, this program is specifically designed for analyzing short news stories, such as those found in wire service reports. It codes international event data (which are essentially codes recording the interactions between actors) using pattern recognition and simple grammatical parsing. The authors have developed a number of dictionaries to help code event data. The WEIS coding scheme, for example, can determine who acts against whom, as in the case of an Iraqi attack against Kuwait. When such an event is reported in a news story, the program can automatically code the aggressor, victim and action, as well as the date of the event. TABARI is currently only available for Macintosh, but a Windows version is in the works.
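A toy Python sketch can show what coding an event into aggressor, action, and victim looks like. It is not TABARI's parser and does not use its dictionary format: the actor codes, the action label, and the first-actor/verb/next-actor rule are all hypothetical simplifications, and the label `ATTACK` is not a real WEIS code.

```python
import re

# Hypothetical dictionaries; not TABARI's actual actor or verb files.
ACTORS = {"iraq": "IRQ", "iraqi": "IRQ", "kuwait": "KUW", "kuwaiti": "KUW"}
VERBS = {"attack": "ATTACK", "attacks": "ATTACK", "attacked": "ATTACK"}

def code_event(date, sentence):
    """Return (date, source, action, target), or None if no full event is found.

    Assumed rule: the first actor is the source, the first known verb after
    it is the action, and the next actor after the verb is the target.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    source = action = target = None
    for tok in tokens:
        if tok in VERBS and source is not None and action is None:
            action = VERBS[tok]
        elif tok in ACTORS:
            if action is None:
                source = source or ACTORS[tok]
            elif target is None:
                target = ACTORS[tok]
    if source and action and target:
        return (date, source, action, target)
    return None

print(code_event("1990-08-02", "Iraqi troops attacked Kuwait"))
```

Real event coders also handle passive voice, compound actors, and multi-word verb phrases, none of which this sketch attempts.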
TextAnalyst is an intelligent text mining and semantic information search system. TextAnalyst implements a unique neural network technology for structural processing of texts written in natural language. This technology automates the work with large volumes of textual information and can be applied effectively to perform the following tasks: creation of knowledge bases expressed in a natural language, as well as creation of hypertext, searchable, and expert systems; and automated indexing, topic assignment, and abstracting of texts.
The TEXTPACK program, which was originally designed for the analysis of open-ended survey responses, has been broadened in recent years to include features of use to content, literary and linguistic analysts. It now produces word frequencies, alphabetical lists, KWIC and KWOC (KeyWord Out of Context) searches, cross references, word comparisons between two texts, and coding according to user-created dictionaries. The resulting multi-unit data file output can be imported into statistical analysis software. The new Windows version of the program takes full advantage of the Windows user interface.
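A multi-unit data file of the kind mentioned above is essentially a table with one row per text unit and one column per dictionary category. The Python sketch below shows one way such a file could be produced as CSV for import into a statistics package; it is not TEXTPACK's file format, and the dictionary contents and function names are hypothetical.

```python
import csv
import io
import re

def unit_by_category_counts(units, dictionary):
    """Build a header and one row of category counts per text unit."""
    categories = sorted(dictionary)
    rows = []
    for unit_id, text in enumerate(units, start=1):
        words = re.findall(r"[a-z']+", text.lower())
        counts = [sum(1 for w in words if w in dictionary[c]) for c in categories]
        rows.append([unit_id] + counts)
    return ["unit"] + categories, rows

def to_csv(header, rows):
    """Serialize the table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical user-created dictionary and survey responses.
dictionary = {"economy": {"jobs", "taxes"}, "family": {"children", "family"}}
responses = ["Lower taxes create jobs", "My family and children"]
header, rows = unit_by_category_counts(responses, dictionary)
print(to_csv(header, rows))
```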
|TextSmart by SPSS Inc.
This software, designed primarily for the analysis of open-ended survey responses, uses cluster analysis and multidimensional scaling techniques to automatically analyze key words and group texts into categories. Thus, it can "code" without the use of a user-created dictionary. TextSmart has a pleasant, easy-to-use Windows interface that allows for quick sorting of words into frequency and alphabetical lists. It also produces colorful, rich-looking graphics like bar charts and two-dimensional MDS plots.
"Designed for Semantic Classification, Keyword Extraction, Linguistic and Qualitative Analysis, Tropes software is a perfect tool for Information Science, Market Research, Sociological Analysis, Scientific and Medical studies, and more.."
This program outputs frequency and alphabetical word lists, key words in context (KWIC), and coded strings of word-occurrence data based on user-defined dictionaries. In addition, it includes a multidimensional concept-mapping sub-program called VBMap that measures the degree to which words co-occur in a text or series of texts. Miller, Andsager and Riechert (1998), for example, used the program to compare the press releases sent by 1996 GOP presidential candidates to the coverage the candidates received in the press. The program helped the researchers (a) generate a list of key words appearing in the text and (b) generate a map showing the relative positions of candidates, in both press releases and media coverage, to each other and on key issues in the election (e.g., family values, education). The program runs under DOS and is available for free from the software author's website.
This add-on to the Simstat statistical analysis program includes several exploratory tools, such as cluster analysis and multidimensional scaling, for the analysis of open-ended survey responses and other texts. It also codes based on user-supplied dictionaries and generates word frequency and alphabetical lists, KWIC, multi-unit data file output, and bivariate comparisons between subgroups. The differences between subgroups or numeric variables (e.g., age, date of publication) can be displayed visually in high resolution line and bar charts and through 2-D and 3-D correspondence analysis bi-plots. One particularly noteworthy feature of the program is a dictionary building tool that uses the WordNet lexical database and other dictionaries (in English and five other languages) to help users build a comprehensive categorization system.
Yoshikoder is a cross-platform multilingual content analysis program developed as part of the Identity Project at Harvard's Center for International Affairs.
Kimberly A. Neuendorf