Wikipedia
Fetches data from the MediaWiki RESTful web service API.
Inputs
- None
Outputs
- Corpus: A collection of documents from Wikipedia.
The Wikipedia widget retrieves texts from the Wikipedia API and is mostly useful for teaching and demonstration.
- Query parameters:
  - Query word list, with each query on a new line.
  - Language of the query. English is set by default.
  - Number of articles to retrieve per query (range 1-25). Note that querying is done recursively and that disambiguation pages are also retrieved, so the output sometimes contains more articles than the number set on the slider.
- Select which features to include as text features.
- Information on the output.
- Produce a report.
- Run query.
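The query parameters above map naturally onto a web request per query word. The sketch below is only an illustration and not the widget's actual code. It assumes the MediaWiki Action API search module (`action=query`, `list=search`, `srsearch`, `srlimit`) at a language-specific `https://{language}.wikipedia.org/w/api.php` endpoint, which may differ from what the widget really calls. The helper names `parse_queries` and `search_url` are hypothetical. It only builds the URLs, so no network access is needed.

```python
from urllib.parse import urlencode


def parse_queries(text: str) -> list[str]:
    """Hypothetical helper: one query per non-empty line of the query box."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def search_url(query: str, language: str = "en", limit: int = 10) -> str:
    """Hypothetical helper: build a MediaWiki Action API search request.

    Assumption: the endpoint and parameters are a stand-in for whatever
    the widget uses internally; limit mirrors the widget's 1-25 slider.
    """
    if not 1 <= limit <= 25:
        raise ValueError("limit must be between 1 and 25")
    params = {
        "action": "query",    # Action API query module
        "list": "search",     # full-text search submodule
        "srsearch": query,    # the query word
        "srlimit": limit,     # number of results for this query
        "format": "json",
    }
    # The language code selects the Wikipedia edition (en, de, sl, ...).
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    for q in parse_queries("Slovenia\nGermany\n\n"):
        print(search_url(q, language="en", limit=5))
```

Each query is handled independently, which matches the per-query article count in the widget. Follow-up requests for disambiguation pages, as described above, are not modelled here.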
Example
In this simple example, we use Wikipedia to retrieve the articles on ‘Slovenia’ and ‘Germany’. Then we apply default preprocessing with Preprocess Text and observe the most frequent words in those articles with Word Cloud.
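The idea behind "observe the most frequent words" can be sketched in plain Python. This is a rough approximation, not Orange's implementation. It assumes the default preprocessing amounts to lowercasing, regex word tokenization, and stop-word removal. The tiny `STOPWORDS` set and the helper names `preprocess` and `most_frequent` are hypothetical stand-ins, and the two sample sentences replace the real article texts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real one is much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "on", "with"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def most_frequent(docs: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Count tokens across all documents and return the n most common."""
    counts: Counter[str] = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts.most_common(n)


if __name__ == "__main__":
    docs = [
        "Slovenia is a country in Central Europe. The capital of Slovenia is Ljubljana.",
        "Germany is a country in Central Europe. The capital of Germany is Berlin.",
    ]
    print(most_frequent(docs, 3))
```

A word cloud then draws each token at a size that grows with its count. Without stop-word removal, function words such as "the" and "is" would crowd out the content words.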
Wikipedia works just like any other corpus widget (NY Times, Twitter) and can be used accordingly.