Collocations
Compute significant bigrams and trigrams.
Inputs
- Corpus: A collection of documents.
Outputs
- Table: A list of bigrams or trigrams.
Collocations finds words that frequently co-occur in a corpus. It displays bigrams or trigrams ranked by the selected score.
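The idea of collecting co-occurring words can be sketched in a few lines of plain Python. This is only an illustration, not the widget's implementation: the helpers `ngrams` and `count_ngrams` and the two toy documents are hypothetical, and the sketch assumes documents are already tokenized and that n-grams do not span document boundaries.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(documents, n=2):
    """Count n-grams over tokenized documents (assumed not to span documents)."""
    counts = Counter()
    for tokens in documents:
        counts.update(ngrams(tokens, n))
    return counts


# Toy corpus, made up for illustration only.
docs = [
    "the old king had a golden ball".split(),
    "the old king lost the golden ball".split(),
]

bigram_counts = count_ngrams(docs, n=2)   # pairs of adjacent words
trigram_counts = count_ngrams(docs, n=3)  # triples of adjacent words
print(bigram_counts.most_common(3))
```

Raw counts like these are only the starting point; the widget ranks the n-grams by one of the scoring methods listed below.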
- Settings: choose between bigrams (two co-occurring words) and trigrams (three co-occurring words), and set the frequency threshold. N-grams that occur less often than the threshold are removed.
- Scoring method:
  - Pointwise Mutual Information (PMI)
  - Chi Square
  - Dice
  - Fisher
  - Jaccard
  - Likelihood ratio
  - Mi Like
  - Phi Square
  - Poisson Stirling
  - Raw Frequency
  - Student’s T
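To make the frequency threshold and the scoring step concrete, here is a minimal sketch of two of the listed measures, PMI and Dice, for bigrams. It is an illustration under stated assumptions, not the widget's code: `score_bigrams`, `min_freq`, and `measure` are hypothetical names, probabilities are estimated from raw counts over a single token sequence, and the normalization used by the widget may differ.

```python
import math
from collections import Counter


def score_bigrams(tokens, min_freq=1, measure="pmi"):
    """Score adjacent word pairs (hypothetical helper, not the widget's API)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # assumption: use the token count as the sample size
    scores = {}
    for (w1, w2), joint in bigrams.items():
        # Frequency threshold: drop bigrams seen fewer than min_freq times.
        if joint < min_freq:
            continue
        if measure == "pmi":
            # log2 of P(w1, w2) / (P(w1) * P(w2)), estimated from counts
            scores[(w1, w2)] = math.log2(joint * n / (unigrams[w1] * unigrams[w2]))
        elif measure == "dice":
            # 2 * joint count / (count of w1 + count of w2)
            scores[(w1, w2)] = 2 * joint / (unigrams[w1] + unigrams[w2])
        else:
            raise ValueError(f"unsupported measure: {measure}")
    # Highest score first, as in a ranked table of collocations.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy text, made up for illustration only.
tokens = "new york is big and new york is old and new ideas".split()
pmi_ranked = score_bigrams(tokens, min_freq=2, measure="pmi")
dice_ranked = score_bigrams(tokens, min_freq=2, measure="dice")
for bigram, score in pmi_ranked:
    print(bigram, round(score, 3))
```

PMI favors pairs whose words rarely appear apart, so it can rank very rare pairs highly. This is one reason a frequency threshold is useful before scoring.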
Example
Collocations is intended mainly for data exploration. Here, we observe bigrams that occur more than five times in the corpus, scored with the Pointwise Mutual Information (PMI) statistic.
We load the grimm-tales-selected data with the Corpus widget and send it to Collocations.
References
Manning, Christopher, and Hinrich Schütze. 1999. Collocations. Available at: https://nlp.stanford.edu/fsnlp/promo/colloc.pdf