Swift Package Introduction¶

This document gives a basic walkthrough of the XGBoost Swift package.

List of other helpful links

Install XGBoost¶

To install XGBoost, follow the instructions on GitHub.

Include in your project¶

SwiftXGBoost uses the Swift Package Manager. To use it in your project, add it as a dependency in your Package.swift file:

.package(url: "https://github.com/kongzii/SwiftXGBoost.git", from: "0.7.0")
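
For context, the dependency line above sits inside a full manifest. The following is a minimal sketch, not the project's official template: the target name `MyApp` is hypothetical, and the product name `XGBoost` is an assumption that should be checked against the package's own Package.swift.

```swift
// swift-tools-version:5.3
// Minimal manifest sketch. "MyApp" is a hypothetical target name and
// the product name "XGBoost" is an assumption, not confirmed by this page.
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // The dependency line from the text above.
        .package(url: "https://github.com/kongzii/SwiftXGBoost.git", from: "0.7.0"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Links the library product into the target.
                .product(name: "XGBoost", package: "SwiftXGBoost"),
            ]
        ),
    ]
)
```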

Python compatibility¶

With the PythonKit package, you can import Python modules:

import PythonKit

let numpy = Python.import("numpy")
let pandas = Python.import("pandas")

And use them in the same way as in Python:

let dataFrame = pandas.read_csv("Examples/Data/veterans_lung_cancer.csv")

You can then use them with SwiftXGBoost. See AftSurvival for a complete example.

TensorFlow compatibility¶

If you are using a Swift for TensorFlow (S4TF) toolchain, you can pass tensors directly:

let tensor = Tensor<Float>(shape: TensorShape([2, 3]), scalars: [1, 2, 3, 4, 5, 6])
let tensorData = try DMatrix(name: "tensorData", from: tensor)

Data Interface¶

The XGBoost Swift package can currently load data from:

LibSVM text format file
Comma-separated values (CSV) file
NumPy 2D array
Swift for TensorFlow 2D Tensor
XGBoost binary buffer file

The data is stored in a DMatrix object.

To load a LibSVM text file into a DMatrix:

let svmData = try DMatrix(name: "train", from: "Examples/Data/data.svm.txt", format: .libsvm)
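
For readers unfamiliar with the LibSVM text format, each line holds a label followed by sparse `index:value` pairs. The sketch below builds such lines in plain Swift. The helper name `libsvmLine` is hypothetical, and the zero-based feature indices are an assumption that should match however your data was exported.

```swift
// Formats one labeled sparse row as a LibSVM line:
// "<label> <index>:<value> <index>:<value> ...".
// `libsvmLine` is a hypothetical helper, not part of SwiftXGBoost.
func libsvmLine(label: Int, features: [(index: Int, value: Double)]) -> String {
    let pairs = features
        .sorted { $0.index < $1.index }      // keep feature indices ascending
        .map { "\($0.index):\($0.value)" }
    return ([String(label)] + pairs).joined(separator: " ")
}

let lines = [
    libsvmLine(label: 1, features: [(index: 0, value: 0.5), (index: 3, value: 1.0)]),
    libsvmLine(label: 0, features: [(index: 2, value: 2.0)]),
]
let libsvmText = lines.joined(separator: "\n")
print(libsvmText)
```

Features that are absent from a line are simply not listed, which is what makes the format suitable for sparse data.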

To load a CSV file into DMatrix:

// labelColumn specifies the index of the column containing the true label
let csvData = try DMatrix(name: "train", from: "Examples/Data/data.csv", format: .csv, labelColumn: 0)

Note

Use Pandas to load CSV files with headers.

Currently, the DMLC data parser cannot parse CSV files with headers, so read such files with Pandas (see below) instead.
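
If pulling in Pandas is not an option, one alternative is to strip the header line before handing the file to the parser. This is a different technique from the Pandas route recommended above. It is sketched here in plain Swift with a hypothetical helper name, and it assumes LF line endings and a single header row.

```swift
// Returns the CSV text without its first (header) line.
// `droppingHeader` is a hypothetical helper; it assumes "\n" line endings.
func droppingHeader(fromCSV text: String) -> String {
    // Keep empty subsequences so a trailing newline survives the round trip.
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    return lines.dropFirst().joined(separator: "\n")
}

let withHeader = "label,a,b\n1,0.5,2.0\n0,1.5,3.0\n"
let headerless = droppingHeader(fromCSV: withHeader)
print(headerless)
```

The resulting text can be written to a new file, for example with Foundation's `String.write(toFile:atomically:encoding:)`, and then loaded with the `.csv` format as shown above.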

To load a NumPy array into DMatrix:

let numpyData = try DMatrix(name: "train", from: numpy.random.rand(5, 10), label: numpy.random.randint(2, size: 5))

To load a Pandas data frame into DMatrix:

let pandasDataFrame = pandas.DataFrame(numpy.arange(12).reshape([4, 3]), columns: ["a", "b", "c"])
let pandasLabel = numpy.random.randint(2, size: 4)
let pandasData = try DMatrix(name: "data", from: pandasDataFrame.values, label: pandasLabel)

Saving a DMatrix to an XGBoost binary file makes subsequent loading faster:

try pandasData.save(to: "train.buffer")

The value that should be treated as missing can be specified in the DMatrix constructor:

let dataWithMissingValues = try DMatrix(name: "data", from: pandasDataFrame.values, missingValue: 999.0)

Various float fields and uint fields can be set when needed:

try dataWithMissingValues.set(field: .weight, values: [Float](repeating: 1, count: try dataWithMissingValues.rowCount()))

And read back:

let labelsFromData = try pandasData.get(field: .label)
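
A common reason to set the weight field is class imbalance. The sketch below computes one weight per row so that every class carries the same total weight. `balancedWeights` is a hypothetical helper written in plain Swift, and it assumes labels are available as `[Float]`.

```swift
// Computes one weight per row so that each class contributes equally in total.
// `balancedWeights` is a hypothetical helper, not part of SwiftXGBoost.
func balancedWeights(for labels: [Float]) -> [Float] {
    var counts: [Float: Int] = [:]
    for label in labels { counts[label, default: 0] += 1 }
    let classCount = Float(counts.count)
    let total = Float(labels.count)
    // weight = total rows / (number of classes * rows in this row's class)
    return labels.map { total / (classCount * Float(counts[$0]!)) }
}

let labels: [Float] = [0, 0, 0, 1]
let weights = balancedWeights(for: labels)
print(weights)

// With SwiftXGBoost, following the call shape shown above:
// try data.set(field: .weight, values: weights)
```

With three rows of class 0 and one row of class 1, each class ends up with a total weight of 2.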

Setting Parameters¶

Booster parameters can be set in two ways.

Using the set method:

let firstBooster = try Booster()
try firstBooster.set(parameter: "tree_method", value: "hist")

Or as a list at initialization:

let parameters = [Parameter(name: "tree_method", value: "hist")]
let secondBooster = try Booster(parameters: parameters)
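
When there are several parameters, it can be convenient to keep them in one dictionary and derive the list from it. The sketch below does this in plain Swift. The parameter names are standard XGBoost parameters, but the specific values are illustrative, and passing every value as a string is an assumption based on the `"hist"` example above.

```swift
// Settings kept in one place. The values are illustrative, not recommendations.
let settings: [String: String] = [
    "tree_method": "hist",
    "max_depth": "4",
    "eta": "0.1",
    "objective": "binary:logistic",
]

// Sort by name so the resulting order is deterministic.
let pairs = settings
    .sorted { $0.key < $1.key }
    .map { (name: $0.key, value: $0.value) }

for pair in pairs {
    print("\(pair.name) = \(pair.value)")
}

// With SwiftXGBoost, following the call shape shown above:
// let parameters = pairs.map { Parameter(name: $0.name, value: $0.value) }
// let booster = try Booster(parameters: parameters)
```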

Training¶

Training a model requires a booster and a dataset.

let trainingData = try DMatrix(name: "train", from: "Examples/Data/data.csv", format: .csv, labelColumn: 0)
let boosterWithCachedData = try Booster(with: [trainingData])
try boosterWithCachedData.train(iterations: 5, trainingData: trainingData)

After training, the model can be saved:

try boosterWithCachedData.save(to: "0001.xgboost")

The model can also be dumped to text:

let textModel = try boosterWithCachedData.dumped(format: .text)

A saved model can be loaded as follows:

let loadedBooster = try Booster(from: "0001.xgboost")

Prediction¶

A model that has been trained or loaded can perform predictions on data sets.

From a NumPy array:

let testDataNumpy = try DMatrix(name: "test", from: numpy.random.rand(7, 12))
let predictionNumpy = try loadedBooster.predict(from: testDataNumpy)

From a Swift array:

let testData = try DMatrix(name: "test", from: [69.0, 60.0, 7.0, 0, 0, 0, 1, 1, 0, 1, 0, 0], shape: Shape(1, 12))
let prediction = try loadedBooster.predict(from: testData)
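
The Swift-array initializer above takes a flat array plus a shape. If your data starts out as rows, a small helper can produce both. This is a plain-Swift sketch with a hypothetical name, assuming row-major layout and `Float` values.

```swift
// Flattens equally sized rows into one row-major array plus its dimensions.
// `flattened` is a hypothetical helper; it returns nil for empty or ragged input.
func flattened(_ rows: [[Float]]) -> (values: [Float], rowCount: Int, columnCount: Int)? {
    guard let columnCount = rows.first?.count,
          rows.allSatisfy({ $0.count == columnCount }) else {
        return nil
    }
    return (Array(rows.joined()), rows.count, columnCount)
}

let rows: [[Float]] = [[69, 60, 7], [1, 0, 1]]
if let flat = flattened(rows) {
    print(flat.values, flat.rowCount, flat.columnCount)
    // With SwiftXGBoost, following the call shape shown above:
    // let data = try DMatrix(name: "test", from: flat.values,
    //                        shape: Shape(flat.rowCount, flat.columnCount))
}
```

Rejecting ragged input up front avoids passing a shape that does not match the number of values.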

Plotting¶

You can also save a feature-importance plot to a file:

try boosterWithCachedData.saveImportanceGraph(to: "importance") // .svg extension will be added

C API¶

Both Booster and DMatrix expose pointers to the underlying C objects.

You can import a C-API library:

import CXGBoost

And use it directly in your Swift code:

try safe { XGBoosterSaveModel(boosterWithCachedData.booster, "0002.xgboost") }

safe is a helper function that throws an error if the C-API call fails.
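
To illustrate the idea behind such a helper, here is a minimal sketch of how a status-checking wrapper could be written. It is not the package's actual implementation: `checked` and `CAPIError` are hypothetical names, and the C calls are replaced with stand-in closures so the example runs on its own. It relies on the XGBoost C API convention of returning 0 on success and a nonzero status on failure.

```swift
// Error carrying the message reported for a failed C call.
struct CAPIError: Error {
    let message: String
}

// Runs `call` and throws if it reports a nonzero status.
// `lastError` stands in for a lookup such as XGBGetLastError().
func checked(lastError: () -> String = { "unknown error" },
             _ call: () -> Int32) throws {
    let status = call()
    if status != 0 {
        throw CAPIError(message: lastError())
    }
}

do {
    try checked { 0 }                                        // success: nothing thrown
    try checked(lastError: { "file not found" }) { -1 }      // failure: throws
} catch let error as CAPIError {
    print("C call failed:", error.message)
} catch {
    print("Unexpected error:", error)
}
```

Wrapping each call this way turns C status codes into Swift errors, so failures propagate through `try` instead of being silently ignored.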

More¶

For more details and examples, check out the GitHub repository.