Train a Text Classification Model With Create ML

Machine learning is great, but it can be hard to implement in mobile applications. This is especially true for people without a data science degree. With Core ML, however, Apple makes it easy to add machine learning to your existing iOS app. Learn how to use the all-new Create ML platform for training lightweight, custom machine learning models.

At a Glance

What Is Machine Learning?

Machine learning is the use of statistical analysis to help computers make decisions and predictions based on characteristics found in data. In other words, it's the act of having a computer parse a stream of data to form an abstract understanding of it (called a "model"), and then using that model to interpret newer data.

How Is It Used?

Many of your favorite apps on your phone likely use machine learning. For example, when you're typing a message, the keyboard predicts what you're about to type next using a machine learning model which is constantly updated as you type. Also, virtual assistants such as Siri, Alexa, and Google Assistant rely heavily on machine learning to mimic human-like behavior.

Getting Started

Let's experiment with machine learning by actually building a model! You'll need to make sure you have Xcode 10 installed, along with macOS Mojave running on your development Mac. In addition, I'll assume you already have experience with Swift, Xcode, and iOS development in general.

Image Recognition

If you haven't already, check out my other tutorial, which takes you step by step through training an image classification model for use in your app. After that tutorial, you'll be able to train a model which can differentiate between different varieties of plants and between breeds of dogs, and which can accurately identify objects. You can take a look here:

Training an Image Classification Model With Create ML (Vardhan Agrawal, 17 Oct 2018)

1. Dataset and JSON

You may wonder: in a text classification model, what would the dataset be? The answer to that question lies in your objective. For example, if you want to train a model which tells you whether a string of text is spam or not, your dataset would be a collection of strings, each already labeled as spam or not spam.

Downloading the File

To save us having to manually create the training data for our sentiment analysis model, our friends at Carnegie Mellon University have provided us with a wonderful free dataset. I have taken the time to convert it into JSON for you to use. (Of course, you can always use your own dataset if you want to make one yourself.)

Go ahead and download the training data JSON file from our GitHub repo. Click download and save the file to your own computer. (Note, we've redacted some offensive language from our version of this dataset. However, if you're training a production machine learning system, you will want to use the entire corpus, including the possibly offensive comments.)

Great! Now that it's on your computer, let's take a closer look at what the file shows you.

Dissecting the JSON

If you're not already familiar with JSON, it's simple. JSON is an acronym for JavaScript Object Notation, and as the name suggests, it is useful for representing objects and their respective properties.

In the file you just downloaded, you'll see that each item has two properties:

label tells you whether the specified sentence or phrase is positive or negative.
text is a string of text which the analysis is to be done on.

Some of the items' labels say Pos, and some say Neg. As you may have already guessed, these stand for Positive and Negative, respectively.
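
If you'd like to see that shape in code, here's a minimal sketch in plain Swift. It assumes the file is a top-level array of objects with exactly those two properties. The SentimentEntry type and the two sample sentences are made up for illustration and aren't rows from the real dataset.

```swift
import Foundation

// Hypothetical type mirroring the two properties of each item in the dataset.
struct SentimentEntry: Codable {
    let label: String
    let text: String
}

// Two made-up entries in the assumed shape of the file (not real rows from it).
let sampleJSON = """
[
  {"label": "Pos", "text": "I really enjoyed this."},
  {"label": "Neg", "text": "That was a waste of time."}
]
"""

// Decode the sample; fall back to an empty array if decoding fails.
let entries = (try? JSONDecoder().decode([SentimentEntry].self,
                                         from: Data(sampleJSON.utf8))) ?? []

for entry in entries {
    print("\(entry.label): \(entry.text)")
}
```

You won't need this type for training, since Create ML reads the JSON file directly, but it can be a handy way to sanity-check a dataset you've built yourself.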

2. Preparing Data

So let's dive in and actually create the model. You might be surprised to hear that most of your work is already done! Now, all we have left to do is write some code and put Swift and Xcode to work to do the magic for us.

Creating a New Playground

While most of us are used to creating actual iOS applications, we'll be headed to the playground this time to create our machine learning models. Interesting, isn't it? If you think about it, it actually makes sense—you don't need all those extra files, but instead you just need a clean slate to tell Swift how to create your model. Go ahead and create a macOS playground to start off.

First, open Xcode.

Then create a new playground.

And give it a useful name.

Importing Frameworks

Of course, before we begin, we'll need to import the appropriate frameworks. If you've seen my previous tutorial about training an image classification model, you'll see that our import statements will be slightly different. Why? Because we don't have a user interface to work with this time around—just pure code!

Remove all of the starter code in the playground and enter the following:

import CreateML
import Foundation

The reason we'll need both is that we'll be using URL to tell Create ML where our dataset is and where we want our resulting model to be stored, and URL is available in the Foundation framework.

Setting Up the Dataset

Now, it's time to get all of the data set up and ready to train the model.

Converting JSON to an MLDataTable

First and foremost, we'll need to tell Create ML where it can find our JSON file. In this example, mine is in my Downloads folder, and yours likely is, too.

Enter the following line of code:

let dataset = try MLDataTable(contentsOf: URL(fileURLWithPath: "/Users/vardhanagrawal/Downloads/sentiment_analysis.json"))

However, you'll need to make sure that your fileURLWithPath parameter is set to where your JSON file lives, not where mine does. Here, we're creating a data table using the information found in the provided JSON file and storing it in a constant called dataset.
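
If you'd rather not hard-code a user name at all, one option is to build the path from the current user's home directory. This is just a sketch: it assumes the file is still called sentiment_analysis.json and sits in your Downloads folder, and datasetURL is a name I've picked for illustration.

```swift
import Foundation

// Start from the current user's home directory instead of a hard-coded user name.
let downloadsFolder = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Downloads")

// Assumption: the JSON file kept its original name and lives in Downloads.
let datasetURL = downloadsFolder.appendingPathComponent("sentiment_analysis.json")

print(datasetURL.path)
```

You could then pass datasetURL to MLDataTable(contentsOf:) in place of the URL built from a literal path.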

Splitting the Data

As I mentioned in Training an Image Classification Model With Create ML, it's good practice to divide your datasets into two categories: one for training the model and one for testing. Since the model learns only from the training portion, most of the data should go there: 80% of your dataset should be used for training, and you should hold back the other 20% to check how the model performs on text it hasn't seen. After all, that's important too!

If you've seen the previous tutorial, you might find this concept familiar. It's simply included here in case you haven't. In essence, we'll be splitting up the data by using the randomSplit(by:seed:) method from MLDataTable.

Paste the following line of code into your playground:

let (trainingData, testingData) = dataset.randomSplit(by: 0.8, seed: 5)

Looking at the documentation, randomSplit(by:seed:) returns a tuple, which contains two MLDataTables. We'll be storing them as (trainingData, testingData), putting 80% of the dataset in trainingData and 20% in testingData.
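
To make the idea of a seeded split concrete, here's a small sketch in plain Swift. It's only an illustration: sketchSplit and SplitMix64 are names of my own, and Create ML's randomSplit(by:seed:) may well work differently under the hood. The point is that the same seed always produces the same split, which keeps your results repeatable.

```swift
// A tiny seeded random number generator (SplitMix64), so shuffles are repeatable.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Hypothetical helper: shuffles with the seed, then cuts the array at the given fraction.
func sketchSplit<T>(_ items: [T], by fraction: Double, seed: UInt64) -> (training: [T], testing: [T]) {
    var generator = SplitMix64(seed: seed)
    let shuffled = items.shuffled(using: &generator)
    let clamped = min(max(fraction, 0.0), 1.0)
    let cut = Int((Double(shuffled.count) * clamped).rounded())
    return (Array(shuffled[..<cut]), Array(shuffled[cut...]))
}

// Ten stand-in items: eight land in the first group and two in the second.
let (trainSample, testSample) = sketchSplit(Array(1...10), by: 0.8, seed: 5)
print(trainSample.count, testSample.count)
```

Running sketchSplit again with the same seed gives you the same two groups, while a different seed shuffles the items differently.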

3. Training and Testing

Now that your data is all set up and ready, it's time to finally train a model on it and test the result.

Metadata

Believe it or not, training is the easiest part of the whole process. First, you'll need to define the metadata using MLModelMetadata. You can do this by writing the following line of code:

let metadata = MLModelMetadata(author: "Vardhan Agrawal", shortDescription: "This model analyzes the sentiment of a given string.", version: "1.0")

Here, you can put your name as the author, a useful description, and the version number as well. This data will be shown when you preview the model in Xcode.

Training and Writing

Now, you'll need to create a classifier from the dataset. You'll need to enter the following line of code:

let sentimentAnalysisClassifier = try MLTextClassifier(trainingData: trainingData, textColumn: "text", labelColumn: "label")

This uses the try keyword to attempt to instantiate an MLTextClassifier and tells it that the text column is called "text" and the label column is called "label". These names refer to the fields present in our JSON file.

Lastly, you'll need to write the classifier to a location on your computer. The location that you choose for this step is where you'll find your .mlmodel file if everything goes smoothly. Enter the following lines of code:

do {
    try sentimentAnalysisClassifier.write(to: URL(fileURLWithPath:
        "/Users/vardhanagrawal/Desktop/SpamDetector.mlmodel"),
        metadata: metadata)
} catch {
    // Include the underlying error so you can see what actually failed.
    print("Something went wrong, please try again! \(error)")
}

We're wrapping the write(to:metadata:) method in a do-catch block so that we're aware if something goes wrong. Alternatively, you could call the method with a bare try, as in the earlier lines, since playgrounds allow throwing calls at the top level. Don't forget, you likely don't have a user named vardhanagrawal on your computer, so be sure to change the file path to where you want your machine learning model saved.

Testing the Model

After you're done training, you'll see some output in the console of your playground. It will look something like this:

Parsing JSON records from /Users/vardhanagrawal/Downloads/sentiment_analysis.json
Successfully parsed 479 elements from the JSON file /Users/vardhanagrawal/Downloads/sentiment_analysis.json
Automatically generating validation set from 5% of the data.
Tokenizing data and extracting features
50% complete
100% complete
Starting MaxEnt training with 360 samples
Iteration 1 training accuracy 0.341667
Iteration 2 training accuracy 0.858333
Iteration 3 training accuracy 0.991667
Iteration 4 training accuracy 0.994444
Finished MaxEnt training in 0.01 seconds
Trained model successfully saved at /Users/vardhanagrawal/Desktop/SpamDetector.mlmodel.

What this essentially tells you is that Create ML has parsed the data from your JSON file, trained the model to the reported accuracy, and saved it at the specified location. Keep in mind that the figures in this console snippet are training accuracy, measured on the data the model learned from. The 20% of the dataset you set aside earlier as testingData is what tells you how well the model handles text it hasn't seen.
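
To put the testingData you held back to use, you can ask the classifier to evaluate itself against it. Here's a minimal sketch, assuming the evaluation(on:) method and classificationError property as they shipped with the Xcode 10 SDK (later SDKs may offer a different signature). The accuracyPercent and testingAccuracy helpers are names I've made up for illustration.

```swift
import Foundation
#if canImport(CreateML) && os(macOS)
import CreateML
#endif

// Converts a classification error (0.0 to 1.0) into an accuracy percentage.
func accuracyPercent(fromError error: Double) -> Double {
    return (1.0 - error) * 100
}

#if canImport(CreateML) && os(macOS)
// Hypothetical helper: evaluates a trained classifier on rows it never saw in training.
// Assumption: evaluation(on:) as available in the Xcode 10 version of Create ML.
@available(macOS 10.14, *)
func testingAccuracy(of classifier: MLTextClassifier, on testingData: MLDataTable) -> Double {
    let metrics = classifier.evaluation(on: testingData)
    return accuracyPercent(fromError: metrics.classificationError)
}
#endif
```

In the playground, you could then call testingAccuracy(of: sentimentAnalysisClassifier, on: testingData) and print the result. If that number is much lower than the training accuracy in the console, the model has probably memorized its training data rather than learned to generalize.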

In an Xcode project, all you would need to do is drag in your model and enter the following lines of code to get output from it. Xcode names the generated class after the model file, so SpamDetector.mlmodel gives you a class called SpamDetector:

import NaturalLanguage

let sentimentAnalysisClassifier = try NLModel(mlModel:
    SpamDetector().model)
sentimentAnalysisClassifier.predictedLabel(for:
    "It was the best I've ever seen!")

This will allow you to test your model and use it in your apps.
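
Since predictedLabel(for:) returns an optional string, your app will need to decide what to do when there's no prediction. One way to handle that is a small helper which maps the dataset's Pos and Neg labels to readable text. This is only a sketch, and describeSentiment is a hypothetical name of my own.

```swift
// Hypothetical helper: turns the optional label from predictedLabel(for:) into readable text.
func describeSentiment(_ label: String?) -> String {
    // No prediction at all.
    guard let label = label else {
        return "Unknown"
    }

    // The labels match the ones used in the training JSON.
    switch label {
    case "Pos":
        return "Positive"
    case "Neg":
        return "Negative"
    default:
        return "Unknown"
    }
}

print(describeSentiment("Pos"))
print(describeSentiment(nil))
```

In your project, you'd pass the result of predictedLabel(for:) straight into describeSentiment and show the returned string in your interface.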

Conclusion

In this tutorial, you learned how to import a JSON file, create a custom text classifier, and then use it in Xcode projects. To learn more about this topic and review what you learned here, I recommend checking out Apple's documentation.

And while you're here, check out some of our other great machine learning content here on Envato Tuts+!