Introduction to Machine Learning Development and Deployment in Bluemix

Introduction to Machine Learning

Machine learning is a branch of artificial intelligence in which a machine learns from the data provided to it instead of following explicitly programmed rules. Such systems are often tasked with diagnosis, prediction, and recognition. The data, called “training data,” can be either sample data or historical data that teaches the machine how to respond. The machine then produces predictions based on the data with which it was trained.

Machine learning is used in numerous applications in banking and finance, retail, healthcare, auto insurance, and many other industries. Google and Amazon use it to display advertisements based on users’ past behavior.

Implement Your First Machine Learning Model

In this blog post, you will learn how to build a small machine learning web app, deploy it in Bluemix, and host it in Cloud Foundry.

To start machine learning programming and deployment in Bluemix, you need the following software:

  1. Python: A popular and powerful general-purpose programming language that recently emerged as the preferred language among data scientists.
  2. Jupyter Notebook: An open-source web application that allows you to create and develop machine learning code, equations, and data visualization.
  3. Flask: A lightweight web application framework used to develop applications and manage HTTP requests and rendering.
  4. Cloud Foundry: An open-source cloud computing platform used for deploying applications on Bluemix.

Case Study

Here is an example of how to use the scikit-learn machine learning library to implement linear regression for the relationship between the fuel consumption and carbon dioxide emissions of cars.

We split the data into training and test sets, create a model using the training set, evaluate that model using the test set, and then use the model to predict an unknown value.

Below are the steps for generating the machine learning model using the Python programming language.

Step 1: Install Python

1. Download the latest version of Python from https://www.python.org/downloads/

2. On Windows, go to Control Panel –> System Properties –> Environment Variables and select the Path variable from the list.

3. Click Edit and add the folder where Python was installed, if it is not already listed.

4. To check whether Python has been installed successfully, try the following command:

$ python --version

Python 3.7.0

Step 2: Install Jupyter Notebook with pip

As an experienced Python user, you may wish to install Jupyter using Python’s package manager, pip, instead of Anaconda.

If you have Python 3 installed (which is recommended):

python3 -m pip install --upgrade pip

python3 -m pip install jupyter

Congratulations! You have installed Jupyter Notebook. To run the notebook, run the following command at the terminal (Mac/Linux) or command prompt (Windows):

$ jupyter notebook

Step 3: Writing Python code

To start your Python code, use the Jupyter Notebook editor to write the following code:

1.    Read the dataset that is related to fuel consumption and carbon dioxide emission from FuelConsumptionCo2.csv using these commands:

import pandas as pd

df = pd.read_csv("FuelConsumptionCo2.csv")

df.head()

Here is the sample dataset:

2.    Next, split the data into a training set (about 80% of the total dataset) and a test set (about 20% of the total dataset).

This involves splitting the dataset into mutually exclusive training and testing sets. Once this is done, you train with the training set and test with the testing set.

This provides a more accurate evaluation of out-of-sample accuracy because the testing dataset is not part of the dataset that has been used to train the model. It is more realistic for real-world problems.

To split the data into training and test sets (with roughly 20% held out for testing), use these commands (with NumPy imported as np):

msk = np.random.rand(len(df)) < 0.8  # True for roughly 80% of the rows

train, test = df[msk], df[~msk]
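
One way to make such a split, and the one used in the complete listing later in this post, is a random boolean mask. Here is a minimal sketch on a small synthetic frame. The column values and the seed are made up for illustration; the real data comes from FuelConsumptionCo2.csv. Seeding the generator makes the split repeatable:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: 1,000 made-up rows.
rng = np.random.default_rng(24)  # fixed seed so the split is repeatable
demo = pd.DataFrame({
    "ENGINESIZE": rng.uniform(1.0, 8.0, size=1000),
    "CO2EMISSIONS": rng.uniform(100.0, 480.0, size=1000),
})

# Each row independently lands in the training set with probability 0.8.
mask = rng.random(len(demo)) < 0.8
train_demo = demo[mask]
test_demo = demo[~mask]

print(len(train_demo), len(test_demo))  # roughly 800 and 200, not exact
```

Because every row is assigned independently, the two sets never overlap, but their sizes are only approximately 80% and 20%.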

3.    Create a model using the training set with these commands:

from sklearn import linear_model

regr = linear_model.LinearRegression()

train_x = np.asanyarray(train[['ENGINESIZE']])

train_y = np.asanyarray(train[['CO2EMISSIONS']])

regr.fit(train_x, train_y)
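
For a single feature, the line fitted by LinearRegression is the ordinary least-squares line, which you can also compute by hand. Here is a sketch with NumPy on made-up, noise-free data. The slope 39 and intercept 125 are illustrative values chosen for this example, not results from the real dataset:

```python
import numpy as np

# Made-up data that lies exactly on the line y = 39 * x + 125.
x = np.array([1.5, 2.0, 2.4, 3.5, 4.7, 5.3])
y = 39.0 * x + 125.0

# Ordinary least squares for one feature:
#   slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print("slope: %.2f, intercept: %.2f" % (slope, intercept))
```

With real, noisy data the recovered slope and intercept will only approximate the underlying trend.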

4.    Evaluate the model using the test set with these commands:

test_x = np.asanyarray(test[['ENGINESIZE']])

test_y_ = regr.predict(test_x)
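
The predictions are then compared with the true test values. Here is a sketch of three common error measures computed with NumPy on four made-up values. Note that the R2 score is not symmetric, so the true values and the predictions cannot be swapped:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # made-up "actual" values
y_pred = np.array([2.5, 5.0, 7.5, 9.0])   # made-up model predictions

mae = np.mean(np.abs(y_pred - y_true))            # mean absolute error
mse = np.mean((y_pred - y_true) ** 2)             # mean squared error
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination

print("MAE=%.3f MSE=%.3f R2=%.3f" % (mae, mse, r2))  # MAE=0.250 MSE=0.125 R2=0.975
```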

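5.    Finally, use the model to predict the unknown value for a car whose other attributes are known, such as the following (the simple model in this post uses only the engine size):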

  • Model Year (e.g. 2014)
  • Make (e.g. Acura)
  • Model (e.g. ILX)
  • Vehicle Class (e.g. SUV)
  • Engine Size (e.g. 4.7)
  • Cylinders (e.g. 6)
  • Transmission (e.g. A6)
  • Fuel Consumption (City) (L/100 km) (e.g. 9.9)
  • Fuel Consumption (Hwy) (L/100 km) (e.g. 8.9)
  • Fuel Consumption (Comb.) (L/100 km) (e.g. 9.2)

6.    The model will predict the value for CO2 emissions.

  • CO2 Emissions (g/km) (e.g. 182)
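
A fitted scikit-learn model predicts from rows of feature values. The sketch below trains on made-up points (not the real dataset) and predicts the emission for one hypothetical car with a 4.7-liter engine. Because the regression in this post is trained on the engine size alone, that is the only input passed:

```python
import numpy as np
from sklearn import linear_model

# Made-up training points on the line CO2 = 39 * size + 125 (illustrative only).
engine_size = np.array([[1.5], [2.0], [2.4], [3.5], [5.3]])
co2 = 39.0 * engine_size + 125.0

model = linear_model.LinearRegression()
model.fit(engine_size, co2)

# predict() expects a 2-D array: one row per car, one column per feature.
new_car = np.array([[4.7]])
predicted = model.predict(new_car)
print("Predicted CO2 emissions: %.1f g/km" % predicted[0][0])
```

A model trained on more columns would need the same columns, in the same order, in every row passed to predict().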

7.    Here is the complete Python code for the previous case study:

# To download the data, you can download it from IBM Object Storage https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/Cognit...

import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
get_ipython().run_line_magic('matplotlib', 'inline')

## Reading the data in
df = pd.read_csv("FuelConsumptionCo2.csv")

# Take a look at the dataset
df.head()

# Summarize the data
df.describe()

# Let’s select some features to explore more.
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
cdf.head(9)

# We can plot each of these features:
viz = cdf[['CYLINDERS','ENGINESIZE','CO2EMISSIONS','FUELCONSUMPTION_COMB']]
viz.hist()
plt.show()

# Now, let’s plot each of these features vs the emission, to see the linearity of their relationship:
plt.scatter(cdf.FUELCONSUMPTION_COMB, cdf.CO2EMISSIONS,  color='blue')
plt.xlabel("FUELCONSUMPTION_COMB")
plt.ylabel("Emission")
plt.show()

# Creating train and test datasets
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]

# Simple regression model
from sklearn import linear_model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)

# The coefficients
print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)

# Evaluate the model on the held-out test set
from sklearn.metrics import r2_score
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
test_y_ = regr.predict(test_x)
print("Mean absolute error: %.2f" % np.mean(np.absolute(test_y_ - test_y)))
print("Mean squared error (MSE): %.2f" % np.mean((test_y_ - test_y) ** 2))
print("R2-score: %.2f" % r2_score(test_y, test_y_))  # arguments are (y_true, y_pred)
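
The Flask application in Step 4 loads its model from a .pkl file with joblib. Here is a sketch of how a fitted scikit-learn model might be saved and reloaded, assuming the joblib package is installed. The data and the file name demo_model.pkl are hypothetical:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import linear_model

# Fit a small model on made-up data (illustrative only).
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])
fitted = linear_model.LinearRegression().fit(x, y)

# Save the fitted model to disk, then load it back as a serving app would.
model_path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")  # hypothetical file name
joblib.dump(fitted, model_path)
restored = joblib.load(model_path)

print(restored.predict(np.array([[5.0]])))  # same prediction as the original model
```

The environment that loads the file should use the same scikit-learn version that saved it, which is one reason to pin versions in requirements.txt.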
 

Step 4: Creating Web APIs with Python and Flask

1.    Install Flask in your virtualenv using pip.

$ pip install Flask

2.    Check the installed version of Flask.

$ flask --version
Flask 1.0.3

3.    Open a text editor (such as Notepad++) and enter the following sample code:

import flask

app = flask.Flask(__name__)

app.config["DEBUG"] = True

@app.route('/', methods=['GET'])
def home():
    return "<h1>Distant Reading Archive</h1><p>This site is a prototype API for distant reading of science fiction novels.</p>"

app.run()

4.    Save this code as api.py in the api folder you created for this application.

5.    Open a command line and navigate to the api folder.

6.    Run the Flask application with this command:

$ python api.py

7.    You should see output similar to this:

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

8.    You may also see some lines related to debugging. This message means that Flask is running your application locally (on your computer) at that address. Open http://127.0.0.1:5000/ in your web browser to see the running application.
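
The route can also be checked without a browser. Here is a sketch using Flask's built-in test client, with a shortened placeholder page body in place of the text above:

```python
import flask

app = flask.Flask(__name__)


@app.route('/', methods=['GET'])
def home():
    return "<h1>Hello</h1>"  # placeholder page body for this sketch


# The test client sends requests to the app in-process, so no server is needed.
client = app.test_client()
response = client.get('/')
print(response.status_code)              # 200
print(response.get_data(as_text=True))   # <h1>Hello</h1>
```

This is handy for quick checks before deploying, since it does not require app.run() or an open port.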

 

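9.    Below is a complete api.py used to deploy a model in Bluemix. This version serves a previously trained classifier, loaded from claim_model.pkl, and returns its prediction along with the class probabilities.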

# Dependencies
import os

import joblib  # sklearn.externals.joblib is deprecated; use the joblib package directly
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/predict', methods=['POST'])
def predict():
    # Parse the JSON request body and one-hot encode any categorical fields
    json_ = request.json
    query = pd.get_dummies(pd.DataFrame(json_))
    # Align the columns with those the model was trained on
    query = query.reindex(columns=model_columns, fill_value=0)
    prediction = classifier.predict(query)
    # Class probabilities (available for LogisticRegression)
    confidence = classifier.predict_proba(query)
    return jsonify({
        'Approved': str(prediction[0]),
        'Confidence_in_No': round(confidence[0][0] * 100, 3),
        'Confidence_in_Yes': round(confidence[0][1] * 100, 3),
    })


if __name__ == "__main__":
    classifier = joblib.load("claim_model.pkl")  # Load the trained model
    model_columns = joblib.load("model_columns.pkl")  # Load the training column names
    port = int(os.getenv('VCAP_APP_PORT', '5000'))  # Port assigned by Cloud Foundry
    app.run(host='0.0.0.0', use_reloader=False, port=port, debug=True)
    

10.    Learn more about how to set up a basic Application Programming Interface (API) to make your machine learning application and data more accessible to users.
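
The predict() function in api.py relies on two pandas calls to turn a JSON request into the columns the model expects. Here is a sketch of that alignment step on its own. The field names and the column list are hypothetical and stand in for whatever was saved in model_columns.pkl at training time:

```python
import pandas as pd

# Hypothetical column list saved at training time (after one-hot encoding).
model_columns = ["age", "vehicle_SUV", "vehicle_Sedan", "vehicle_Truck"]

# A JSON request body, as a list of records; field names are made up.
payload = [{"age": 42, "vehicle": "SUV"}]

# One-hot encode the request, then align it to the training columns:
# missing columns are added and filled with 0, unknown ones are dropped.
query = pd.get_dummies(pd.DataFrame(payload))
query = query.reindex(columns=model_columns, fill_value=0)

print(list(query.columns))
print(query.iloc[0].tolist())
```

Without the reindex step, a request that mentions only one category would produce fewer columns than the model was trained on, and the prediction call would fail.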

Deploying Your Web Application on Bluemix

Step 1: Install the Cloud Foundry Command Line Interface

1.     Install the Cloud Foundry command-line interface from https://github.com/cloudfoundry/cli

2.    Scroll down to the Downloads section of the README.md file, and then select and download the installer for your operating system.

3.    Extract and run the installer (for example, cf_installer-windows-386.zip on Windows).

4.    Open a command line on your operating system and use the cf -v command to verify that cf is working. You should see something similar to this:

$ cf -v
cf version 6.2.0-c9d4aaa-2014-06-19T22:04:01+00:00

Step 2: Prepare Your Data Upload

1.    Create a directory for your Python web server. For example, you could use your initials followed by something like <pythonsrv>.

2.    Add the api.py file you created in the Flask step.

3.    Create a file named requirements.txt that lists all the libraries used in your application. You should have something similar to this:

Flask==1.0.3
joblib==0.13.2
matplotlib==3.1.0
numpy==1.16.4
pandas==0.24.2
python-dateutil==2.8.0
scikit-learn==0.21.2

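4.    Finally, create a file named runtime.txt containing the Python version you wish to use as the runtime, for example python-3.4.1. Choose a version that both the buildpack and the libraries pinned in requirements.txt support.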
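
Each line of requirements.txt pins one package in the form name==version. Here is a small sketch, using only the Python standard library (3.8 or later), that parses such pins and compares them with what is installed locally. The helper names parse_pins and installed_version are hypothetical:

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the locally installed version of a package, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


pins = parse_pins("Flask==1.0.3\nnumpy==1.16.4\n")
for name, wanted in pins.items():
    print(name, "pinned:", wanted, "installed:", installed_version(name))
```

A mismatch between the pinned and installed versions is worth resolving before you push, because the deployed application is built from the pinned versions and not from your local environment.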

Step 3: Deploy Your Application on Bluemix

1.    Open a command line on your operating system and navigate to your server project folder.

2.    Using your IBM Bluemix ID and password, log in using the following cf commands:

$ cf api https://api.ng.bluemix.net
$ cf login

3.    Push a basic Python web application to the Bluemix runtime environment using the following command:

$ cf push <Application Name> -m 32M -b https://github.com/cloudfoundry/cf-buildpack-python.git -c "python api.py"

Step 4: Verify That Your Application Is Running

1.    Navigate to your web browser and locate the IBM Bluemix welcome screen.

2.    On the IBM Bluemix dashboard, assuming you did not get any errors when you pushed the basic Python server, you should see an <Application Name> application running.


 


About the Author

Khaled S. Moawad

IT Architect

Khaled is an IT architect and senior managing consultant in Prolifics’ IBM Cognitive Process Transformation practice. His consulting experience spans a wide range of IBM applications as well as machine learning.