Orange Blog
By: BLAZ, Apr 25, 2017
Outliers in Traffic Signs
Say I am given a collection of images of traffic signs, and would like to find which signs stick out. That is, which traffic signs look substantially different from the others. I would assume that the traffic signs are not equally important and that some were designed to be noted before the others. I have assembled a small set of regulatory and warning traffic signs and stored the references to their images in a traffic-signs-w.
By: AJDA, Apr 3, 2017
Image Analytics: Clustering
Data does not always come in a nice tabular form. It can also be a collection of text, audio recordings, video materials or even images. However, computers can only work with numbers, so for any data mining, we need to transform such unstructured data into a vector representation. For retrieving numbers from unstructured data, Orange can use deep network embedders. We have just started to include various embedders in Orange, and for now, they are available for text and images.
By: AJDA, Feb 3, 2017
For When You Want to Transpose a Data Table...
Sometimes, you need something more. Something different. Something that helps you look at the world from a different perspective. Sometimes, you simply need to transpose your data. Since version 3.3.9, Orange has a Transpose widget that flips your data table around. Columns become rows and rows become columns. This is often useful if you have, say, biological data. (Related: Datasets in Orange Bioinformatics.) Today we will play around with brown-selected.tab, a data set on gene expression levels for 79 experiments.
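As a rough illustration of what transposing does, here is a minimal plain-Python sketch. It does not use Orange or the Transpose widget; the table, the gene names and the `transpose` helper are all made up for the example.

```python
# Toy table (invented values): rows are genes, columns are experiments.
header = ["gene", "exp1", "exp2", "exp3"]
rows = [
    ["G1", 0.1, 0.5, -0.2],
    ["G2", 1.2, -0.3, 0.4],
]


def transpose(header, rows):
    """Flip a table so that columns become rows and rows become columns."""
    table = [header] + rows
    flipped = [list(column) for column in zip(*table)]
    # The first flipped row holds the old first column and acts as the new header.
    return flipped[0], flipped[1:]


new_header, new_rows = transpose(header, rows)
print(new_header)
for row in new_rows:
    print(row)
```

Each experiment is now a row and each gene a column. Transposing the result again gives back the original table.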
By: AJDA, Jan 23, 2017
Preparing Scraped Data
One of the key questions of every data analysis is how to get the data and put it in the right form(at). In this post I’ll show you how to easily get the data from the web and transfer it to a file Orange can read. (Related: Creating a new data table in Orange through Python.) First, we’ll have to do some scripting. We’ll use a couple of Python libraries - urllib.
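To give a feel for the kind of scripting involved, here is a minimal sketch that is not the code from the post. It assumes the data sits in an HTML table and turns it into tab-delimited text, one of the formats Orange can load. The sample HTML, the `TableParser` class and the `html_table_to_tsv` helper are hypothetical. The page is an inline string so the example runs offline.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_tsv(html):
    """Return the table cells found in `html` as tab-delimited text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# In a real script the page would be downloaded first, for example with
#   html = urllib.request.urlopen(url).read().decode("utf-8")
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>year</th></tr>
  <tr><td>Alpha</td><td>2016</td></tr>
  <tr><td>Beta</td><td>2017</td></tr>
</table>
"""

tsv = html_table_to_tsv(SAMPLE_HTML)
print(tsv)
```

Writing `tsv` to a file then gives a table with one header row and one line per scraped record.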
By: AJDA, Jan 13, 2017
Data Preparation for Machine Learning
We’ve said it numerous times and we’re going to say it again. Data preparation is crucial for any data analysis. If your data is messy, there’s no way you can make sense of it, let alone a computer. Computers are great at handling large, even enormous data sets, speedy computing and recognizing patterns. But they fail miserably if you give them the wrong input. Also, some classification methods work better with binary values, others with continuous ones, so it is important to know how to treat your data properly.
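Two common ways of reshaping a column for a learner that prefers binary input can be sketched in plain Python. This is only an illustration under made-up data, not Orange's own preprocessing. The helper names `one_hot` and `binarize` and the threshold of 18 are assumptions.

```python
def one_hot(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, encoded


def binarize(values, threshold):
    """Turn a continuous column into 0/1 by comparing against a threshold."""
    return [1 if value >= threshold else 0 for value in values]


# Invented example columns.
colors = ["red", "green", "red", "blue"]
ages = [12, 35, 47, 18]

categories, encoded = one_hot(colors)
adult = binarize(ages, threshold=18)

print(categories)
print(encoded)
print(adult)
```

Which transformation is appropriate depends on the method that will consume the data, which is the point the paragraph above makes.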
By: AJDA, Dec 12, 2016
Dimensionality Reduction by Manifold Learning
The new Orange release (v. 3.3.9) welcomed a few wonderful additions to its widget family, including the Manifold Learning widget. The widget reduces the dimensionality of high-dimensional data and is thus wonderful in combination with visualization widgets. The Manifold Learning widget has a simple interface with powerful features. It offers five embedding techniques based on the scikit-learn library: t-SNE, MDS, Isomap, Locally Linear Embedding and Spectral Embedding. They each handle the mapping differently and also have a specific set of parameters.
By: AJDA, Nov 30, 2016
Data Mining for Political Scientists
Being a political scientist, I had not even heard of data mining before I joined Biolab. And naturally, as with all good things, data mining started to grow on me. Give me some data, connect a bunch of widgets and see the magic happen! But hold on! There are still many social scientists out there who haven’t yet heard about the wonderful world of data mining, text mining and machine learning.
By: AJDA, Jul 18, 2016
Network Analysis with Orange
Visualizing relations between data instances can tell us a lot about our data. Let’s see how this works in Orange. We have a data set on machine learning and data mining conferences and journals, with the number of shared authors for each publication venue reported. We can estimate similarity between two conferences using the author profile of a conference: two conferences would be similar if they attract the same authors. The data set is already 9 years old, but obviously, it’s about the principle.
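The idea of linking venues that attract the same authors can be sketched as follows. This is a toy illustration, not the post's workflow: the venue and author names are invented, and Jaccard similarity with a cut-off of 0.3 is an assumed choice of measure rather than the one used in Orange.

```python
from itertools import combinations

# Invented author profiles: the set of authors publishing at each venue.
authors = {
    "ConfA": {"ann", "bob", "cid"},
    "ConfB": {"bob", "cid", "dee"},
    "JournalC": {"eve", "fay"},
}


def jaccard(a, b):
    """Share of authors two venues have in common, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(profiles, threshold):
    """Return (venue, venue, similarity) for every pair at or above threshold."""
    edges = []
    for u, v in combinations(sorted(profiles), 2):
        score = jaccard(profiles[u], profiles[v])
        if score >= threshold:
            edges.append((u, v, score))
    return edges


edges = similarity_edges(authors, threshold=0.3)
print(edges)
```

The resulting edge list is the kind of structure a network visualization draws: venues are nodes, and an edge connects two venues whose author profiles overlap enough.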
By: AJDA, Jul 5, 2016
Rehaul of Text Mining Add-On
Google Summer of Code is progressing nicely and some major improvements are already live! Our students have been working hard and today we’re thanking Alexey for his work on the Text Mining add-on. Two major tasks before the midterms were to introduce the Twitter widget and rehaul Preprocess Text. The Twitter widget was designed to be a part of our summer school program and it worked beautifully. We’ve introduced youngsters to the world of data mining through social networks and one of the most exciting things was to see whether we can predict the author from the tweet content.
By: AJDA, Apr 25, 2016
Association Rules in Orange
Orange is welcoming back one of its more exciting add-ons: Associate! Association rules can help the user quickly and simply discover the underlying relationships and connections between data instances. Yeah! The add-on currently has two widgets: one for Association Rules and the other for Frequent Itemsets. With Frequent Itemsets we first check the frequency of items and itemsets in our transaction matrix. This tells us which items (products) and itemsets are the most frequent in our data, so it would make a lot of sense to focus on these products.
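Checking the frequency of items and itemsets can be illustrated with a small brute-force sketch. It is not the add-on's implementation: it simply counts every itemset up to a given size. The baskets, the `frequent_itemsets` helper and the 0.5 support threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Map each itemset (a sorted tuple) to its support if it is frequent enough.

    Support is the fraction of transactions that contain the whole itemset.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    total = len(transactions)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Invented shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Enumerating all combinations is only practical for tiny examples like this one; real frequent-itemset miners prune the search instead of counting everything.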