A guide to text analysis with latent semantic analysis in R with annotated code: Studying online reviews and the Stack Exchange community

David Gefen, James E. Endicott, Jorge E. Fresneda, Jacob Miller, Kai R. Larsen

Research output: Contribution to journal › Review article › peer-review

42 Scopus citations

Abstract

In this guide, we introduce researchers in the behavioral sciences in general and MIS in particular to text analysis as done with latent semantic analysis (LSA). The guide contains hands-on annotated code samples in R that walk the reader through a typical process of acquiring relevant texts, creating a semantic space out of them, and then projecting words, phrases, or documents onto that semantic space to calculate their lexical similarities. R is an open source, popular programming language with extensive statistical libraries. We introduce LSA as a concept, discuss the process of preparing the data, and note its potential and limitations. We demonstrate this process through a sequence of annotated code examples: we start with a study of online reviews that extracts lexical insight about trust. That R code applies singular value decomposition (SVD). The guide next demonstrates a realistically large data analysis of Stack Exchange, a popular Q&A site for programmers. That R code applies an alternative sparse SVD method. All the code and data are available on github.com.

Original language: English (US)
Article number: 21
Pages (from-to): 450-496
Number of pages: 47
Journal: Communications of the Association for Information Systems
Volume: 41
Issue number: 1
State: Published - Nov 2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Information Systems

Keywords

  • IS research methods
  • Latent semantic analysis (LSA)
  • Measurement
  • Metrics
  • SVD
  • Sparse SVD
  • Text analysis
