
Consume LLM API

This tutorial demonstrates how to interact with a PracticusAI LLM deployment using the PracticusAI SDK. It uses ChatPracticus to invoke the model endpoint and practicuscore to manage API session tokens.
The workflow covers obtaining a session token, invoking the LLM API endpoint, processing the response, and optionally fanning several queries out in parallel, as sketched after the example call below.
from langchain_practicus import ChatPracticus
import practicuscore as prt

Defining parameters

This section defines key parameters for the notebook. Parameters control the behavior of the code, making it easy to customize without altering the logic. By centralizing parameters at the start, we ensure better readability, maintainability, and adaptability for different use cases.

api_url = None # E.g. "https://company.practicus.com/llm-models/llama-3b-chain-test/"
assert api_url, "Please enter your model api url."
The test_langchain_practicus function is defined to interact with the PracticusAI model endpoint. It uses the ChatPracticus object to invoke the API with the provided URL, token, and input data. The response is printed in two formats: a raw dictionary and its content.
def test_langchain_practicus(api_url, token, inputs):
    chat = ChatPracticus(
        endpoint_url=api_url,
        api_token=token,
        model_id="current models ignore this",
    )

    response = chat.invoke(input=inputs)

    print("\n\nReceived response:\n", dict(response))
    print("\n\nReceived Content:\n", response.content)
We retrieve an API session token using the PracticusAI SDK. This token is required to authenticate and interact with the PracticusAI deployment.
The call below creates a token that is valid for 4 hours; longer-lived tokens can be obtained from the admin console.
token = prt.models.get_session_token(api_url)
print("API session token:", token)
We invoke the test_langchain_practicus function with the API URL, session token, and an example query, 'What is the capital of England?'. The function sends the query to the PracticusAI endpoint and prints the received response.
test_langchain_practicus(api_url, token, ['What is the capital of England?'])
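
If you need to send several queries at once, a thread pool is a simple way to fan them out. The sketch below reuses the test_langchain_practicus function and the session token from above; the example queries and worker count are illustrative assumptions, not part of the deployment API.

from concurrent.futures import ThreadPoolExecutor

# Hypothetical example queries; replace with your own.
queries = [
    ['What is the capital of England?'],
    ['What is the capital of France?'],
    ['What is the capital of Germany?'],
]

# Fan the requests out over a small thread pool, reusing the session token obtained above.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(test_langchain_practicus, api_url, token, query)
        for query in queries
    ]
    for future in futures:
        future.result()  # Re-raises any exception from the worker thread.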

Supplementary Files

model.json

{
    "download_files_from": "cache/llama-1b-instruct/",
    "_comment": "You can also define download_files_to; otherwise, /var/practicus/cache is used."
}

model.py

import sys
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from practicuscore.gen_ai import PrtLangMessage, PrtLangRequest, PrtLangResponse
import json

generator = None

async def init(model_meta=None, *args, **kwargs):
    global generator
    if generator is not None:
        print("generator exists, using")
        return

    print("generator is none, building")
    model_cache = "/var/practicus/cache"
    if model_cache not in sys.path:
        sys.path.insert(0, model_cache)

    try:
        from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    except Exception as e:
        raise RuntimeError(f"Failed to import required libraries: {e}") from e

    # Initialize the local LLM model using transformers:

    def load_local_llm(model_path):
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)
        model.to('cpu')  # Change 'cpu' to 'cuda' to run on a GPU, or pass device_map="auto" to from_pretrained.
        return pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=200)

    try:
        generator = load_local_llm(model_cache)
    except Exception as e:
        print(f"Failed to build generator: {e}")
        raise

async def cleanup(model_meta=None, *args, **kwargs):
    print("Cleaning up memory")

    global generator
    generator = None

    from torch import cuda
    if cuda.is_available():
        cuda.empty_cache()

async def predict(payload_dict: dict, **kwargs):

    from practicuscore.gen_ai import PrtLangRequest, PrtLangResponse

    # The payload dictionary is validated against PrtLangRequest.
    practicus_llm_req = PrtLangRequest.model_validate(payload_dict)

    # Serialize the validated request to JSON, then parse it back into a plain dictionary.
    data_js = practicus_llm_req.model_dump_json(indent=2, exclude_unset=True)
    payload = json.loads(data_js)

    # Joins the content field from all messages in the payload to form the prompt string.
    prompt = " ".join([item['content'] for item in payload['messages']])

    # Generate a response from the model
    response = generator(prompt)
    answer = response[0]['generated_text']

    # Creates a PrtLangResponse object with the generated content and metadata about the language model and token usage
    resp = PrtLangResponse(
        content=answer,
        lang_model=payload['lang_model'],
        input_tokens=0,
        output_tokens=0,
        total_tokens=0,
        # additional_kwargs={
        #     "some_additional_info": "test 123",
        # },
    )

    return resp
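
To smoke-test predict locally, you can call it with a hand-built payload, as in the sketch below. Only the lang_model field and the content of each message are read by the code above; everything else (the role field, the model name) is an assumption about the PrtLangRequest schema, so adjust it if validation fails.

import asyncio

# Hypothetical payload for a local test; only "lang_model" and the message
# "content" values are actually consumed by predict() above.
example_payload = {
    "lang_model": "llama-1b-instruct",
    "messages": [
        {"role": "user", "content": "What is the capital of England?"},
    ],
}

async def local_test():
    await init()  # Builds the text-generation pipeline from the local cache.
    response = await predict(example_payload)
    print(response.content)

asyncio.run(local_test())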

Previous: Deploy | Next: Combined Method > Model