Insurance MLflow
Step 1: Pre-processing the Data
We will use the Practicus AI SDK to pre-process the current data.
import practicuscore as prt

# Assumption: this notebook runs on a Practicus AI Worker, so we attach to it.
worker = prt.get_local_worker()

conn_conf = {
    "connection_type": "WORKER_FILE",
    "sampling_method": "ALL",
    "file_path": "/home/ubuntu/samples/data/insurance.csv",
}

proc = worker.load(conn_conf)
proc.show_head()
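The later steps expect plain pandas DataFrames (`df` for building the model metadata, `df_model` for training). A minimal bridge, assuming the process object exposes `get_df_copy()` as in other Practicus AI samples:
# Copy the processed data out of the worker process as a pandas DataFrame.
# df_model feeds PyCaret below; df is reused when building model.json.
df = proc.get_df_copy()
df_model = df.copy()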
Step 2: Initializing the AutoML Experiment
PyCaret's regression module is utilized here for predicting a continuous target variable, i.e., insurance charges. We begin by initializing our AutoML experiment.
from pycaret.regression import RegressionExperiment, load_model, predict_model
exp = RegressionExperiment()
Step 3: Configuring the Experiment
We'll configure our experiment with a specific name, making it easier to manage and reference.
# You need to configure using the service unique key. You can find your key on the "Practicus AI Admin Console".
# service_key = 'mlflow-primary'
# Optionally, you can provide an experiment name to create a new experiment while configuring.
experiment_name = "insurance"
# prt.experiments.configure(service_key=service_key, experiment_name=experiment_name)
# No experiment service selected; MLflow inside the Worker will be used. To configure manually:
# prt.experiments.configure(experiment_name=experiment_name, service_name='Experiment service name')
Step 4: Preparing Data with PyCaret's Setup
This is a critical step where we specify the experiment's details, such as the target variable, a session ID for reproducibility, and whether to log the experiment for tracking purposes.
# setup_params = {'normalize': True, 'normalize_method': 'minmax',
#                 'remove_outliers': True, 'outliers_method': 'iforest'}
exp.setup(
    data=df_model,
    target="charges",
    session_id=42,
    log_experiment=True,
    feature_selection=True,
    experiment_name=experiment_name,
)
Step 5: Model Selection and Tuning
The compare step leverages AutoML to evaluate candidate models automatically, selecting the one that performs best according to a default or specified metric. It is a quick way to identify a strong baseline model without manual experimentation.
Once a baseline model is selected, the tuning step fine-tunes its hyperparameters to improve performance. Using tune-sklearn with hyperopt performs an advanced search across the hyperparameter space for optimal settings, which can significantly enhance model accuracy. A sketch of both calls follows.
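The original cells are not shown above, so here is a minimal sketch using PyCaret's documented `compare_models` and `tune_model` APIs; the iteration budget is an illustrative assumption:
# Rank candidate regressors and keep the best one by the default metric.
best = exp.compare_models()

# Refine the baseline with tune-sklearn + hyperopt (both must be installed).
tuned = exp.tune_model(
    best,
    search_library="tune-sklearn",
    search_algorithm="hyperopt",
    n_iter=20,  # illustrative search budget
)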
(Recommended) Adding model metadata to your API
- You can create and upload a model.json file that defines the input and output schema of your model, and potentially other metadata too.
- This explains how to consume your model efficiently and makes it accessible to more users.
- Practicus AI uses the MLflow model input/output standard to define the schema.
- You can build the model.json by hand, or let Practicus AI build it for you from the dataframe.
model_config = prt.models.create_model_config(
    df=df,
    target="charges",
    model_name="insurance-4",
    problem_type="Regression",
    version_name="2024-05-30",
    final_model="knn",
    score=4.2493,
)
model_config.save("model.json")
# You can also instantiate the ModelConfig class directly to provide more metadata elements
# model_config = prt.models.ModelConfig(...)
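To see the MLflow-style schema that was generated, you can simply read the file back. A quick inspection sketch (the exact layout of model.json is produced by Practicus AI and may contain additional fields):
import json

# Pretty-print the generated metadata, including the input/output schema
# that follows the MLflow model signature standard.
with open("model.json") as f:
    print(json.dumps(json.load(f), indent=2))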
Step 6: Model Deployment
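`prt.models.deploy` below is given `model_dir=None`, so the model files are expected at the SDK's default location (presumably the current directory). A hedged sketch of producing those files first with PyCaret's standard `finalize_model`/`save_model` calls, assuming `tuned` from Step 5; the file name "model" is an assumption:
# Retrain the tuned pipeline on the full dataset and persist it next to model.json.
final_model = exp.finalize_model(tuned)
exp.save_model(final_model, "model")  # writes model.pkl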
deployment_key = "automl-depl"
assert deployment_key, "Please select a deployment key"
prefix = "models"
model_name = "insurance-mlflow-test"
model_dir = None
# Deploy to current Practicus AI region
prt.models.deploy(deployment_key=deployment_key, prefix=prefix, model_name=model_name, model_dir=model_dir)
region = prt.current_region()
# *All* Practicus AI model APIs follow the URL convention below
api_url = f"{region.url}/{prefix}/{model_name}/"
# Important: for effective traffic routing, always terminate the URL with a trailing /
print("Model REST API Url:", api_url)
# We will use the SDK to get a session token (or reuse the existing one, if not expired).
# To learn how to get a token without the SDK, please view the 05_others/tokens sample notebook
token = None
token = prt.models.get_session_token(api_url, token=token)
print("API session token:", token)
from io import BytesIO

import pandas as pd
import requests

headers = {"authorization": f"Bearer {token}", "content-type": "text/csv"}
data_csv = df.head(5000).to_csv(index=False)

r = requests.post(api_url, headers=headers, data=data_csv)

if not r.ok:
    raise ConnectionError(f"{r.status_code} - {r.text}")

pred_df = pd.read_csv(BytesIO(r.content))
print("Prediction Result:")
pred_df.head()