fastmap.io BETA
faq docs github

How to train machine learning models with fastmap

What is fastmap?

Fastmap is a simple distributed computing framework that allows you to offload arbitrary Python code onto the cloud. Fastmap is useful for:

Fastmap is a cloud supercomputer that you deploy once. Once deployed, fastmap allows you to offload arbitrary Python functions to run on their own servers with one line of code. When those functions are done, the resident servers automatically delete themselves to save you money. Fastmap saves your results and logs which can download whenever you need them.

Installation

Fastmap is composed of two parts, the Python client library and the open source cloud service.

Client library installation

$ pip3 install fastmap

Cloud service installation

Ordinarily, this would be deployed by you or you would get an API key from the website. For now, here is your API key (Burner key here).

Quickstart

Let’s do some machine learning on some real-world data: California housing prices. This dataset contains the median housing prices of 20,640 California housing districts along with 8 independent variables like “median income”, “average number of bedrooms”, and “district population”. Lucky for us, this dataset is also very easy to download directly through the sklearn library.

For this example, we’ll train three different types of models and compare them against each other.

Let’s start by importing everything we need. Don’t worry too much about what is here. It might look like a lot but we are just importing fastmap and then all the other ML stuff we need to run the tutorial.

import fastmap
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

Next, let’s define functions to train three different types of models. Each of these functions takes your Xs and Ys and returns a trained model and its r^2. You’ll notice that they’re all basically the same.

def train_linear_model(X_train, y_train, X_test, y_test):
    print("Training linear model...")
    print("This will be very fast but not that accurate.")
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    r2 = r2_score(y_pred, y_test)
    return model, r2


def train_xgboost_model(X_train, y_train, X_test, y_test):
    print("Training xgboost model...")
    print("This will be a little slower but much more accurate.")
    model = XGBRegressor(verbosity=2)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    r2 = r2_score(y_pred, y_test)
    return model, r2


def train_svr_model(X_train, y_train, X_test, y_test):
    print("Training support vector regression model...")
    print("This is very slow but often produces great results.")
    model = SVR(kernel='rbf', verbose=True)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    r2 = r2_score(y_pred, y_test)
    return model, r2

Great, let’s run some code! First, setup fastmap globally. This takes just an API key which you should have obtained from the directions above.

fastmap.global_init(secret=your_api_key)

Next, download, the California housing dataset and split it into test/train groups

california_dataset = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(california_dataset.data,
                                                    california_dataset.target,
                                                    test_size=0.1)

And here is where the magic happens. Start all three tasks at once using fastmap. Fastmap tasks are asynchronous so you can train them simultaneously.

linear_model_task = fastmap.offload(
    train_linear_model,
    kwargs={"X_train": X_train, "y_train": y_train, "X_test": X_test, "y_test": y_test})
xgboost_model_task = fastmap.offload(
    train_xgboost_model,
    kwargs={"X_train": X_train, "y_train": y_train, "X_test": X_test, "y_test": y_test})
svr_model_task = fastmap.offload(
    train_svr_model,
    kwargs={"X_train": X_train, "y_train": y_train, "X_test": X_test, "y_test": y_test})

All three tasks have been uploaded to fastmap and the server is now busy spinning up three different virtual machines to run them. Let’s check on the first: the linear_model_task.

linear_model_task.poll()

This will return a dictionary. You will probably see {“task_state”: “PENDING”} or {“task_state”: “PROCESSING”}. A PENDING state means that the task hasn’t yet been placed on a server. A PROCESSING state means that your task is running.

We know that in machine learning, training linear models is fast. So let’s just go ahead and wait for that task to finish. The live_logs parameter below allows us to see the output on the server as it runs. You’ll notice that your dependencies were automatically installed without you having to do anything!

linear_model, r2 = linear_model_task.wait(live_logs=True)
print(f"California dataset LinearRegression model r^2: {r2}")

If everything went well, you should get an r^2 of about 0.36. This is pretty good but we can probably do better. Rather than waiting for the other tasks, to finish, let’s kill our Python shell. Trust me on this, it’s safe. You won’t lose your tasks!

On your command line, run:

$ fastmap poll

You should see something like this:

Found 3 task(s)
type     func_name            task_id    task_state    outcome    start_time           runtime    label    last_heartbeat
-------  -------------------  ---------  ------------  ---------  -------------------  ---------  -------  ----------------
OFFLOAD  train_svr_model      9696d4dc   PROCESSING               2021-07-01 21:22:42                      36 seconds ago
OFFLOAD  train_xgboost_model  7613491a   DONE          SUCCESS    2021-07-01 21:22:39  293.5s              2 minutes ago
OFFLOAD  train_linear_model   0a5d228b   DONE          SUCCESS    2021-07-01 21:22:35  316.1s              2 minutes ago

Chances are good that your first two tasks will be done while your third is still processing. SVMs take famously long to train!

We can get the logs of the finished “train_xgboost_model” task by finding it’s task_id in the fastmap poll output. For me, this is “7613491a”. Yours will be different. Once you’ve found yours, run:

$ fastmap logs <xgboost_task_id>

Then, we can get the function’s return value by running:

$ fastmap return_value <xgboost_task_id>

The return value should be something like:

XGBRegressor(base_score=0.5, ...), 0.80

Recall that the functions we wrote returned two values, the model and the r^2. For XGBRegressor, we got 0.8 compared to 0.3 for the LinearRegression model. This is a fantastic result!

Let’s say you need your excellent XGBRegressor model to do some more training. With fastmap, you can easily download it from the cloud. To do that, open a new shell and write:

import fastmap
fastmap.global_init(<your_api_key>)
fantastic_xgboost_model, r2 = fastmap.return_value(<xgboost_task_id>)

Worth noting - return values from your functions stay on the cloud until you delete them. So it doesn’t matter if you download them a minute later or year later - they will be there. But what about our third task, the svr_model_task? Let’s check in on it.

$ fastmap poll

Assuming you are doing this tutorial at a normal pace, it’s likely that your last task still hasn’t finished. Nothing is broken! Support vector machines are just very slow to train. Since we got such a great result from XGBoost, let’s kill this task.

$ fastmap kill <svr_task_id>

If you now run fastmap poll again, it should have a state of either “KILLING” or “KILLED_BY_REQUEST”. Once a task is killed, it is no longer running or consuming resources.

By default, fastmap keeps the logs and return values of every task you run on the cloud. Let’s say you’re on a super-secret-project. You can delete the logs and return values of a task by running:

$ fastmap clear <task_id>

Or, to clear all tasks at once:

$ fastmap clear

From here

This covers most of Fastmap’s offload capabilities. For more, check out the docs.