Orange Blog

By: BLAZ, Sep 15, 2016

Data Mining Course in Houston #2

This was already the second installment of Introduction to Data Mining Course at Baylor College of Medicine in Houston, Texas. Just like the last year, the course was packed. About 50 graduate students, post-docs and a few faculty attended, making the course one of the largest elective PhD courses from over a hundred offered at this prestigious medical school. The course was designed for students with little or no experience in data science.


By: PRIMOZGODEC, Aug 25, 2016

Visualizing Gradient Descent

This is a guest blog from the Google Summer of Code project. Gradient Descent was implemented as a part of my Google Summer of Code project and it is available in the Orange3-Educational add-on. It simulates gradient descent for either Logistic or Linear regression, depending on the type of the input data. Gradient descent is iterative approach to optimize model parameters that minimize the cost function. In machine learning, the cost function corresponds to prediction error when the model is used on the training data set.


By: SALVACARRION, Aug 19, 2016

Making recommendations

This is a guest blog from the Google Summer of Code project. Recommender systems are everywhere, we can find them on YouTube, Amazon, Netflix, iTunes,… This is because they are crucial component in a competitive retail services. How can I know what you may like if I have almost no information about you? The answer: taking Collaborative filtering (CF) approaches. Basically, this means to combine all the little knowledge we have about users and/or items in order to build a grid of knowledge with which we make recommendation.


By: PRIMOZGODEC, Aug 16, 2016

Visualization of Classification Probabilities

This is a guest blog from the Google Summer of Code project. Polynomial Classification widget is implemented as a part of my Google Summer of Code project along with other widgets in educational add-on (see my previous blog). It visualizes probabilities for two-class classification (target vs. rest) using color gradient and contour lines, and it can do so for any Orange learner. Here is an example workflow. The data comes from the File widget.


By: PRIMOZGODEC, Aug 12, 2016

Interactive k-Means

This is a guest blog from the Google Summer of Code project. As a part of my Google Summer of Code project I started developing educational widgets and assemble them in an Educational Add-On for Orange. Educational widgets can be used by students to understand how some key data mining algorithms work and by teachers to demonstrate the working of these algorithms. Here I describe an educational widget for interactive k-means clustering, an algorithm that splits the data into clusters by finding cluster centroids such that the distance between data points and their corresponding centroid is minimized.


By: MATEVZKREN, Aug 5, 2016

Rule Induction (Part I - Scripting)

This is a guest blog from the Google Summer of Code project. We’ve all heard the saying, “Rules are meant to be broken.” Regardless of how you might feel about the idea, one thing is certain. Rules must first be learnt. My 2016 Google Summer of Code project revolves around doing just that. I am developing classification rule induction techniques for Orange, and here describing the code currently available in the pull request and that will become part of official distribution in an upcoming release 3.


By: AJDA, Jul 29, 2016

Pythagorean Trees and Forests

Classification Trees are great, but how about when they overgrow even your 27’’ screen? Can we make the tree fit snugly onto the screen and still tell the whole story? Well, yes we can. Pythagorean Tree widget will show you the same information as Classification Tree, but way more concisely. Pythagorean Trees represent nodes with squares whose size is proportionate to the number of covered training instances. Once the data is split into two subsets, the corresponding new squares form a right triangle on top of the parent square.


By: AJDA, Jul 18, 2016

Network Analysis with Orange

Visualizing relations between data instances can tell us a lot about our data. Let’s see how this works in Orange. We have a data set on machine learning and data mining conferences and journals, with the number of shared authors for each publication venue reported. We can estimate similarity between two conferences using the author profile of a conference: two conference would be similar if they attract the same authors. The data set is already 9 years old, but obviously, it’s about the principle.


By: AJDA, Jul 5, 2016

Rehaul of Text Mining Add-On

Google Summer of Code is progressing nicely and some major improvements are already live! Our students have been working hard and today we’re thanking Alexey for his work on Text Mining add-on. Two major tasks before the midterms were to introduce Twitter widget and rehaul Preprocess Text. Twitter widget was designed to be a part of our summer school program and it worked beautifully. We’ve introduced youngsters to the world of data mining through social networks and one of the most exciting things was to see whether we can predict the author from the tweet content.


By: AJDA, Jun 10, 2016

Scripting with Time Variable

It’s always fun to play around with data. And since Orange can, as of a few months ago, read temporal data, we decided to parse some data we had and put it into Orange. TimeVariable is an extended class of continuous variable and it works with properly formated ISO standard datetime (Y-M-D h:m:s). Oftentimes our original data is not in the right format and needs to be edited first, so Orange can read it.