Orange Blog
By: BLAZ, Aug 11, 2017
It's Sailing Time (Again)
Every fall I teach an Introduction to Data Mining course. While the course is really about statistical learning and its applications, I also venture into classification trees, for several reasons. First, I can introduce information gain and, with it, feature scoring and ranking. Second, classification trees are one of the first machine learning approaches co-invented by engineers (Ross Quinlan) and statisticians (Leo Breiman, Jerome Friedman, Charles J. Stone, Richard A.
By: BLAZ, Dec 22, 2016
The Beauty of Random Forest
It is the time of the year when we adore Christmas trees. But these are not the only trees we at the Orange team think about. In fact, after an almost life-long professional habit of being a data scientist, when I think about trees I often think about classification and regression trees. And they can be beautiful as well, not only for their elegance in explaining hidden patterns, but also aesthetically, when rendered in Orange.
By: AJDA, Jul 29, 2016
Pythagorean Trees and Forests
Classification trees are great, but what happens when they outgrow even your 27" screen? Can we make the tree fit snugly onto the screen and still tell the whole story? Well, yes we can. The Pythagorean Tree widget shows the same information as Classification Tree, but far more concisely. Pythagorean trees represent nodes with squares whose size is proportional to the number of covered training instances. Once the data is split into two subsets, the corresponding new squares form a right triangle on top of the parent square.
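The geometry described above can be sketched in a few lines of Python. This is only an illustration, not the widget's code: it assumes that "size" means the area of a square, so that the two child squares and the parent satisfy the Pythagorean relation. The function name child_sides is hypothetical.

```python
import math


def child_sides(parent_side, n_left, n_right):
    """Return the side lengths of the two child squares.

    Assumption: a square's AREA is proportional to the number of
    training instances its node covers. The children then satisfy
    left**2 + right**2 == parent_side**2, which is why they form a
    right triangle on top of the parent square.
    """
    total = n_left + n_right
    if total <= 0:
        raise ValueError("a split must cover at least one instance")
    left = parent_side * math.sqrt(n_left / total)
    right = parent_side * math.sqrt(n_right / total)
    return left, right


# A node covering 25 instances, split into subsets of 9 and 16.
left, right = child_sides(5.0, 9, 16)
print(left, right)
```

A more uneven split produces one large and one small square, so the balance of each split is visible at a glance.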
By: AJDA, Aug 14, 2015
Classifying instances with Orange in Python
Last week we showed you how to create your own data table in the Python shell. Now we're going to take you a step further and show you how to easily classify data with Orange. First we're going to create a new data table with 10 fruits as our instances.
    import Orange
    from Orange.data import *
    color = DiscreteVariable("color", values=["orange", "green", "yellow"])
    calories = ContinuousVariable("calories")
    fiber = ContinuousVariable("fiber")
    fruit = DiscreteVariable("fruit", values=["orange", "apple", "peach"])
    domain = Domain([color, calories, fiber], class_vars=fruit)
    data = Table(domain, [
        ["green", 4, 1.
By: BIOLAB, Feb 5, 2012
Random decisions behind your back
When Orange builds a decision tree, candidate attributes are evaluated and the best candidate is chosen. But what if two or more share first place? Most machine learning systems don't care and always take the first one, which is unfair and, besides, has strange effects: the induced model and, consequently, its accuracy depend on the order of attributes, which they shouldn't. This is not an isolated problem. Another instance is when a classifier has to choose between two equally probable classes and there is no additional information (such as classification costs) to help make the prediction.
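One way to avoid the order dependence is to pick at random among the tied candidates. The following is a minimal sketch of that idea, not Orange's actual implementation; the function pick_best and the example scores are hypothetical.

```python
import random


def pick_best(scores, rng=None):
    """Pick the highest-scoring attribute, breaking ties at random.

    scores: dict mapping attribute name -> score.
    rng: optional random.Random instance; passing a seeded one makes
         the choice reproducible.
    Ties are detected by exact equality of scores (an assumption;
    real systems may compare within a tolerance).
    """
    rng = rng or random
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return rng.choice(tied)


# "color" and "fiber" share first place; either may be returned.
scores = {"color": 0.5, "fiber": 0.5, "calories": 0.1}
print(pick_best(scores, random.Random(42)))
```

Always taking the first of the tied attributes would instead return "color" every time, simply because it is listed first.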
By: BIOLAB, Aug 24, 2011
Faster classification and regression trees
SimpleTreeLearner is an implementation of classification and regression trees that sacrifices flexibility for speed. A benchmark on 42 different datasets reveals that SimpleTreeLearner is 11 times faster than the original TreeLearner. The motivation behind developing a new tree induction algorithm from scratch was to speed up the construction of random forests, but you can also use it as a standalone learner. SimpleTreeLearner uses gain ratio for classification and mean squared error (MSE) for regression, and it can handle unknown values.
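For readers unfamiliar with the two scores named above, here is a plain-Python sketch of their textbook definitions. It is not SimpleTreeLearner's code: the function names are hypothetical, the feature is assumed to be discrete, and unknown values are not handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the feature.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the feature's own values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


def mse(targets):
    """Mean squared error of numeric targets around their mean."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


color = ["green", "green", "orange", "orange"]
fruit = ["apple", "apple", "orange", "orange"]
print(gain_ratio(color, fruit))
print(mse([1.0, 2.0, 3.0]))
```

In a regression tree, a split is typically preferred when it lowers the instance-weighted MSE of the resulting subsets compared with the parent node.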