Insurance MLflow
Step 1: Pre-processing the Data
We will use the Practicus AI SDK to pre-process the current data.
import practicuscore as prt

# Assumption: this notebook runs on a Practicus AI Worker, so we attach to it.
worker = prt.get_local_worker()

conn_conf = {
    "connection_type": "WORKER_FILE",
    "sampling_method": "ALL",
    "file_path": "/home/ubuntu/samples/data/insurance.csv",
}

proc = worker.load(conn_conf)
proc.show_head()
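The later steps expect plain pandas DataFrames (`df` for building the model metadata, `df_model` for training). A minimal bridge, assuming the process object exposes `get_df_copy()` as in other Practicus AI samples:
# Copy the processed data out of the worker process as a pandas DataFrame.
# df_model feeds PyCaret below; df is reused when building model.json.
df = proc.get_df_copy()
df_model = df.copy()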
Step 2: Initializing the AutoML Experiment
PyCaret's regression module is utilized here for predicting a continuous target variable, i.e., insurance charges. We begin by initializing our AutoML experiment.
from pycaret.regression import RegressionExperiment, load_model, predict_model
exp = RegressionExperiment()
Step 3: Configuring the Experiment
We'll configure our experiment with a specific name, making it easier to manage and reference.
# You need to configure using the service unique key. You can find your key on the "Practicus AI Admin Console".
# service_key = 'mlflow-primary'
# Optionally, you can provide an experiment name to create a new experiment while configuring.
experiment_name = "insurance"
# prt.experiments.configure(service_key=service_key, experiment_name=experiment_name)
# No experiment service selected; MLflow inside the Worker will be used. To configure manually:
# prt.experiments.configure(experiment_name=experiment_name, service_name='Experiment service name')
Step 4: Preparing Data with PyCaret's Setup
This is a critical step where we specify the experiment's details, such as the target variable, a session ID for reproducibility, and whether to log the experiment for tracking purposes.
# setup_params = {'normalize': True, 'normalize_method': 'minmax',
#                 'remove_outliers': True, 'outliers_method': 'iforest'}
exp.setup(
    data=df_model,
    target="charges",
    session_id=42,
    log_experiment=True,
    feature_selection=True,
    experiment_name=experiment_name,
)
Step 5: Model Selection and Tuning
The compare step leverages AutoML to evaluate candidate models automatically, selecting the one that performs best according to a default or specified metric. It is a quick way to identify a strong baseline model without manual experimentation.
Once a baseline model is selected, the tuning step fine-tunes its hyperparameters to improve performance. Using tune-sklearn with hyperopt performs an advanced search across the hyperparameter space for optimal settings, which can significantly enhance model accuracy. A sketch of both calls follows.
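The original cells are not shown above, so here is a minimal sketch using PyCaret's documented `compare_models` and `tune_model` APIs; the iteration budget is an illustrative assumption:
# Rank candidate regressors and keep the best one by the default metric.
best = exp.compare_models()

# Refine the baseline with tune-sklearn + hyperopt (both must be installed).
tuned = exp.tune_model(
    best,
    search_library="tune-sklearn",
    search_algorithm="hyperopt",
    n_iter=20,  # illustrative search budget
)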
(Recommended) Adding model metadata to your API
- You can create and upload a model.json file that defines the input and output schema of your model, and potentially other metadata too.
- This explains how to consume your model efficiently and makes it accessible to more users.
- Practicus AI uses the MLflow model input/output standard to define the schema.
- You can build the model.json by hand, or let Practicus AI build it for you from the dataframe.
model_config = prt.models.create_model_config(
    df=df,
    target="charges",
    model_name="insurance-4",
    problem_type="Regression",
    version_name="2024-05-30",
    final_model="knn",
    score=4.2493,
)
model_config.save("model.json")
# You can also instantiate the ModelConfig class directly to provide more metadata elements
# model_config = prt.models.ModelConfig(...)
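To see the MLflow-style schema that was generated, you can simply read the file back. A quick inspection sketch (the exact layout of model.json is produced by Practicus AI and may contain additional fields):
import json

# Pretty-print the generated metadata, including the input/output schema
# that follows the MLflow model signature standard.
with open("model.json") as f:
    print(json.dumps(json.load(f), indent=2))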
Step 6: Model Deployment
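`prt.models.deploy` below is given `model_dir=None`, so the model files are expected at the SDK's default location (presumably the current directory). A hedged sketch of producing those files first with PyCaret's standard `finalize_model`/`save_model` calls, assuming `tuned` from Step 5; the file name "model" is an assumption:
# Retrain the tuned pipeline on the full dataset and persist it next to model.json.
final_model = exp.finalize_model(tuned)
exp.save_model(final_model, "model")  # writes model.pkl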
deployment_key = "automl-depl"
assert deployment_key, "Please select a deployment key"
prefix = "models"
model_name = "insurance-mlflow-test"
model_dir = None
# Deploy to current Practicus AI region
prt.models.deploy(deployment_key=deployment_key, prefix=prefix, model_name=model_name, model_dir=model_dir)
region = prt.current_region()
# *All* Practicus AI model APIs follow the URL convention below
api_url = f"{region.url}/{prefix}/{model_name}/"
# Important: for effective traffic routing, always terminate the URL with a trailing /
print("Model REST API Url:", api_url)
# We will use the SDK to get a session token (or reuse the existing one, if not expired).
# To learn how to get a token without the SDK, please view the 05_others/tokens sample notebook
token = None
token = prt.models.get_session_token(api_url, token=token)
print("API session token:", token)
from io import BytesIO

import pandas as pd
import requests

headers = {"authorization": f"Bearer {token}", "content-type": "text/csv"}
data_csv = df.head(5000).to_csv(index=False)

r = requests.post(api_url, headers=headers, data=data_csv)

if not r.ok:
    raise ConnectionError(f"{r.status_code} - {r.text}")

pred_df = pd.read_csv(BytesIO(r.content))
print("Prediction Result:")
pred_df.head()