Apple Core ML Testing and Review


At WWDC 2017, Apple introduced Core ML, a new framework for working with machine learning models on device. Apple’s own iOS features are built on top of it: Siri, Camera, and QuickType.

  • Core ML simplifies the integration of machine learning into applications and lets you build various “smart” features with just a couple of lines of code.
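As a rough illustration (assuming the Inceptionv3 model used later in this article has already been added to the project, so Xcode has generated an Inceptionv3 class whose prediction method takes the model’s image input), a single classification can look roughly like this:

import CoreML
import CoreVideo

// A minimal sketch: `pixelBuffer` is assumed to be a 299x299 CVPixelBuffer,
// the input size expected by Inception v3.
func classify(_ pixelBuffer: CVPixelBuffer) {
    if let output = try? Inceptionv3().prediction(image: pixelBuffer) {
        print(output.classLabel) // the most likely label
    }
}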

Core ML features

Using Core ML in an application, you can implement features such as:

  • Real-time image recognition;
  • Predictive text input;
  • Pattern recognition;
  • Sentiment analysis;
  • Handwriting recognition;
  • Search ranking;
  • Image stylization;
  • Face recognition;
  • Voice identification;
  • Music identification;
  • Text summarization.

Core ML makes it easy to import various types of machine learning models into your application, such as:

  • Tree ensembles;
  • SVMs;
  • Generalized linear models.

It builds on low-level technologies such as:

  • Metal;
  • Accelerate;
  • BNNS.

The results of the calculations are almost instantaneous.

Vision

  • The Vision framework works on top of Core ML and helps with tracking and recognizing faces, text, objects, and barcodes. It can also detect the horizon and obtain a matrix for aligning the image.
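As a small sketch of what a Vision call looks like (assuming a UIImage named photo as input; the function name is illustrative), face rectangle detection can be set up like this:

import UIKit
import Vision

// Detect face rectangles in a still image and print how many were found.
func detectFaces(in photo: UIImage) {
    guard let cgImage = photo.cgImage else { return }

    let request = VNDetectFaceRectanglesRequest { request, error in
        let faces = request.results as? [VNFaceObservation] ?? []
        print("Found \(faces.count) face(s)")
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}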

NSLinguisticTagger

  • Back in iOS 5, Apple introduced NSLinguisticTagger, which lets you analyze natural language and supports many languages and scripts.
  • With the release of iOS 11, the class was improved: you can now feed it a string containing text in several languages and it will return the dominant language of that string, among many other improvements (see the sketch after this list).
  • NSLinguisticTagger also uses machine learning for a deeper understanding and analysis of text.
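A minimal sketch of the iOS 11 additions (the sample string is just for illustration):

import Foundation

let text = "Machine learning on device is fast"

// Dominant language of the string, e.g. "en".
if let language = NSLinguisticTagger.dominantLanguage(for: text) {
    print("Dominant language: \(language)")
}

// Enumerate word tokens in the same string.
let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
tagger.string = text
let range = NSRange(location: 0, length: text.utf16.count)
tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: [.omitWhitespace]) { _, tokenRange, _ in
    print((text as NSString).substring(with: tokenRange))
}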

Core ML Model

  • On the Core ML promo page, Apple provides four ready-made models, all of which analyze images. Core ML models run locally and are optimized for mobile devices, minimizing memory usage and power consumption.
  • You can generate your own models using Core ML Tools.

A way to load models at runtime:

  1. Put the model file in the target application.
  2. Compile a new model from .mlmodel to .mlmodelc without changing its interface.
  3. Put these files on the server.
  4. Download them inside the app.
  5. Initialize a new model, for example:

CoreMLModelClass(contentsOf: modelURL)
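As a hedged sketch of step 5 (here the raw .mlmodel is downloaded and compiled on the device with MLModel.compileModel(at:), an alternative to shipping a pre-compiled .mlmodelc from the server; the function and URL names are illustrative):

import CoreML

func loadModel(downloadedTo modelURL: URL) throws -> MLModel {
    // Compiling produces a temporary .mlmodelc directory.
    let compiledURL = try MLModel.compileModel(at: modelURL)

    // Move it somewhere permanent so it is not recompiled on every launch.
    let fileManager = FileManager.default
    let supportDir = try fileManager.url(for: .applicationSupportDirectory,
                                         in: .userDomainMask,
                                         appropriateFor: nil,
                                         create: true)
    let permanentURL = supportDir.appendingPathComponent(compiledURL.lastPathComponent)
    if fileManager.fileExists(atPath: permanentURL.path) {
        try fileManager.removeItem(at: permanentURL)
    }
    try fileManager.copyItem(at: compiledURL, to: permanentURL)

    // Load the compiled model; the Xcode-generated class also has
    // an init(contentsOf:) that accepts this URL.
    return try MLModel(contentsOf: permanentURL)
}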

Performance after the application is released to the App Store has not been tested.

Core ML limitations

  • Apple’s solution cannot take your data and train a model; it can only take certain types of already trained models, convert them into its own format, and make predictions.
  • Models are not compressed.
  • Models are not encrypted in any way; you will have to take care of protecting them yourself.

Testing Core ML

I prepared a test project using Core ML. We will build a simple cat detector that can tell a cat apart from everything else.

  • Create a project and select the Single View Application template. First, download the Core ML model that will analyze objects coming from the camera.
  • In this project we use Inception v3.
  • Next, drag the model into the Project Navigator; Xcode will automatically generate an interface for it.
  • On the storyboard, add a View that fills the entire screen; it will display the image from the camera. Also add a Visual Effect View and a Label, and connect outlets to them in the ViewController.
  • Do not forget to add the camera usage permission (the NSCameraUsageDescription key) to the Info.plist.

  • We need to display the image from the camera in real time. For this, we create an AVCaptureSession and a DispatchQueue for receiving new frames.
  • Add an AVCaptureVideoPreviewLayer to our View; it will display the image from the camera. We also need an array of VNRequest – these are the requests to Vision. Right in viewDidLoad, check that the camera is available.
import UIKit
import AVFoundation
import Vision

class ViewController: UIViewController {

    @IBOutlet var resultLabel: UILabel!
    @IBOutlet var resultView: UIView!

    let session = AVCaptureSession()
    var previewLayer: AVCaptureVideoPreviewLayer!
    let captureQueue = DispatchQueue(label: "captureQueue")
    var visionRequests = [VNRequest]()

    override func viewDidLoad() {
        super.viewDidLoad()

        guard let camera = AVCaptureDevice.default(for: .video) else {
            return
        }
        do {
            previewLayer = AVCaptureVideoPreviewLayer(session: session)
            previewLayer.frame = resultView.bounds
            resultView.layer.addSublayer(previewLayer)

            // The camera input/output configuration from the next snippet goes here,
            // inside this do block, because creating AVCaptureDeviceInput can throw.
        } catch {
            let alertController = UIAlertController(title: nil, message: error.localizedDescription, preferredStyle: .alert)
            alertController.addAction(UIAlertAction(title: "Ok", style: .default, handler: nil))
            present(alertController, animated: true, completion: nil)
        }
    }
}
  • Next, configure the camera input and video output, add them to the session, and start it to receive the data stream (this code goes inside the do block above).
let cameraInput = try AVCaptureDeviceInput(device: camera)
let videoOutput = AVCaptureVideoDataOutput()
videoOutput.setSampleBufferDelegate(self, queue: captureQueue)
videoOutput.alwaysDiscardsLateVideoFrames = true
videoOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]

session.sessionPreset = .high
session.addInput(cameraInput)
session.addOutput(videoOutput)

let connection = videoOutput.connection(with: .video)
connection?.videoOrientation = .portrait
session.startRunning()
  • Now we need to initialize the Core ML model for Vision and configure the classification request.
guard let visionModel = try? VNCoreMLModel(for: Inceptionv3().model) else {
    fatalError("Could not load model")
}

let classificationRequest = VNCoreMLRequest(model: visionModel, completionHandler: handleClassifications)
classificationRequest.imageCropAndScaleOption = .centerCrop
visionRequests = [classificationRequest]
  • Now create a method that will process the results. Allowing for some error, we take the three results the model considers most probable and look for the word “cat” among them.
private func handleClassifications(request: VNRequest, error: Error?) {
    if let error = error {
        print(error.localizedDescription)
        return
    }
    guard let results = request.results as? [VNClassificationObservation] else {
        print("No results")
        return
    }

    var resultString = "This is not a cat!"
    // Check the top three classifications for the word "cat".
    results.prefix(3).forEach {
        let identifier = $0.identifier.lowercased()
        if identifier.range(of: "cat") != nil {
            resultString = "This is a cat!"
        }
    }
    DispatchQueue.main.async {
        self.resultLabel.text = resultString
    }
}
  • The last thing we need to do is implement the AVCaptureVideoDataOutputSampleBufferDelegate method, which is called for every new frame received from the camera. In it we create an image request handler and perform our Vision requests.
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }

        var requestOptions: [VNImageOption: Any] = [:]
        if let cameraIntrinsicData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
            requestOptions = [.cameraIntrinsics: cameraIntrinsicData]
        }

        // The connection is already set to portrait, so the buffer orientation is .up.
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: requestOptions)
        do {
            try imageRequestHandler.perform(visionRequests)
        } catch {
            print(error)
        }
    }
}

That’s all! You have written an application that distinguishes cats from all other objects!

Link to the repository.

Conclusions

Despite its limitations, Core ML will find its audience. If you are not ready to accept its restrictions and limited capabilities, there are plenty of third-party frameworks, for example YOLO or Swift-AI.