Introduction to Azure Machine Learning Studio – part 1: overview

Microsoft’s ML Studio democratizes Machine Learning: it is very easy to use and requires no programming, so everybody, not just Data Scientists and Machine Learning Engineers, can use it.


ML Studio is a web app from Microsoft, running on the Azure cloud computing platform, that lets you train Machine Learning models via simple drag-and-drop and then operationalize them by making them available as a web service. The only catch: it’s not free if the free tier is not enough for your needs.

There are generally 2 ways to use Azure Machine Learning Studio: the free tier, available if you access it via https://studio.azureml.net/Home/, and the paid tier, which requires that you give Microsoft your credit card details to buy an Azure subscription (remember: there is no spending limit with Azure, so you must watch your usage carefully, otherwise you may end up with thousands of dollars charged to your credit card overnight).

Let’s compare these 2 tiers:

Type | FREE | PAID (standard)
Price | Free | $9.99 per seat per month + $1 per studio experimentation hour + costs for Web API calls (see below)
Azure subscription | Not required | Required (no spending limit, can overcharge your credit card!)
Max number of modules per experiment | 100 | Unlimited
Max experiment duration | 1 hour per experiment | Up to 7 days per experiment, with a maximum of 24 hours per module
Max storage space | 10 GB | Unlimited (BYO: bring your own)
Production Web API (so that a trained model can be used via web service) | No | Yes (you need to pay additionally if more than 1000 calls per month)
SLA (service level agreement; commitment to provide service) | No (Microsoft doesn’t guarantee it will work) | Yes
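
To get a feel for the paid tier, here is a minimal Python sketch that adds up the two prices quoted above ($9.99 per seat per month and $1 per experimentation hour). The function name and the usage numbers are made up for illustration, and Web API call charges are left out because no rate is given here, so treat the result as a lower bound rather than a real bill.

```python
# Rough monthly cost estimate for the paid (standard) tier.
# Prices are the ones quoted in the comparison above; Web API call charges
# are NOT included because no per-call rate is stated in this article.

SEAT_PRICE_PER_MONTH = 9.99       # USD per seat per month
PRICE_PER_EXPERIMENT_HOUR = 1.0   # USD per studio experimentation hour


def estimate_monthly_cost(seats: int, experiment_hours: float) -> float:
    """Return the estimated monthly charge in USD, excluding Web API calls."""
    if seats < 0 or experiment_hours < 0:
        raise ValueError("seats and experiment_hours must be non-negative")
    total = seats * SEAT_PRICE_PER_MONTH + experiment_hours * PRICE_PER_EXPERIMENT_HOUR
    return round(total, 2)


if __name__ == "__main__":
    # Hypothetical usage: one seat and 40 hours of experiments in a month.
    print(estimate_monthly_cost(seats=1, experiment_hours=40))
```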

Note: each box that can be visually dragged and dropped is called a “module”. Running an experiment means pressing the “Run” button at the bottom of Azure Machine Learning Studio.

Now let’s start experimenting with Azure Machine Learning Studio. I will try to use the free tier only; as mentioned previously, it’s available at https://studio.azureml.net/Home/.

The first thing we need is a training data set: the input data. It’s usually provided as a file in CSV format (comma-separated, with column titles in the first line) or via a web service call from outside.
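
If you want to check that a file has the expected shape before uploading it, a few lines of Python are enough. This is only a sketch using the standard csv module; the sample columns (age, income, label) and the helper name read_rows are invented for illustration.

```python
import csv
import io

# Hypothetical sample data: the first line holds the column titles,
# every following line is one comma-separated record.
SAMPLE_CSV = """age,income,label
34,52000,yes
51,NA,no
27,38000,yes
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse comma-separated text whose first line is the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
print(list(rows[0].keys()))  # column titles taken from the first line
print(len(rows))             # number of data rows (header not counted)
```

Note that every value comes back as text, including the 'NA' placeholder, which is why a separate preparation step is needed before training.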

Then come data set preparation, model training, and finally model scoring and operationalisation (making the model available as a web service).
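
For readers who think better in code, the same sequence of stages can be sketched in plain Python. This is not what ML Studio runs internally: the toy data, the function names and the choice of a one-variable least-squares line are all assumptions made purely to show the order of the steps.

```python
from statistics import mean

# Hypothetical toy data: (feature, target) pairs; None marks a missing value.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, 7.8)]


def prepare(rows):
    """Data set preparation: drop rows that contain a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def train(rows):
    """Model training: fit y = a*x + b by ordinary least squares."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def score(model, x):
    """Model scoring: apply the trained model to a new input."""
    a, b = model
    return a * x + b


model = train(prepare(raw))
print(score(model, 5.0))
```

Operationalisation then amounts to putting something like score behind a web endpoint, which is the part the studio automates for you.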

Here is what Azure Machine Learning Studio looks like:

Generally, you upload datasets to the cloud, drag and drop modules from the left side to the right side, connect the modules with arrows, and then set the properties of the modules on the right side.

In the free tier you don’t have Azure blob storage, so you usually just upload CSV files directly to Azure ML Studio instead of specifying a location in Azure cloud storage.

Note that I took the above screenshot on an iPad, but in fact it’s impossible to work with Azure Machine Learning Studio on an iPad, as mouse and drag-and-drop events are required and neither is supported there. While Android supports a mouse, it doesn’t support drag-and-drop events for web apps running in a web browser (remember: Azure Machine Learning Studio is a web app, not a stand-alone program), so you can’t use Azure ML Studio on Android either. Boo! That’s not nice: sadly, you have to use a PC, Mac or Linux/Unix machine with a web browser and a mouse.

At first I thought I would write one big article on this topic, but after some pondering I concluded that it’s better to split it into parts. The next part will be Part 2: data set preparation, where I will discuss how to clean data (removing rows that have no data in some columns, including the use of SQL transformations for that purpose, like: select * from Something where column1 <> 'NA', etc.), how to select columns, and how to convert text/string values to numeric ones before the machine learning process proper can start. Clearly, preparing data is a huge task in itself!
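
As a small preview, here is how that kind of clean-up could look outside the studio, using Python with an in-memory SQLite database. SQLite is used only because it ships with Python. The table name Something and the column column1 come from the query quoted above; column2, the sample values and the integer coding scheme are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Something (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO Something VALUES (?, ?)",
    [("12.5", "red"), ("NA", "blue"), ("7", "green")],  # hypothetical rows
)

# Cleaning: keep only the rows where column1 is not the 'NA' placeholder.
clean = conn.execute("SELECT * FROM Something WHERE column1 <> 'NA'").fetchall()
conn.close()

# Conversion: numeric text becomes float, category text becomes an integer code.
categories = sorted({c2 for _, c2 in clean})
codes = {name: index for index, name in enumerate(categories)}
encoded = [(float(c1), codes[c2]) for c1, c2 in clean]

print(encoded)
```

The integer codes here are arbitrary labels assigned in alphabetical order; whether such a coding is appropriate depends on the model, which is a topic for the next part.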

In the next part I will also write about where to find datasets on the Internet. I will use a data set that comes not from the existing repositories of data sets for Machine Learning but from another source, so that I can better demonstrate how to prepare data (if data sets are pre-prepared, you learn nothing).

Stay tuned!

 
