Using Practicus AI Studio with Airflow
You can use Practicus AI Studio for the following tasks when building Airflow workflows.
Practicus AI Studio functionality for Airflow
- Explore data sources such as Data Lakes, Data Warehouses and Databases
- Transform data
- Join data from different data sources
- Export the result to any data source
- Perform these tasks on individual Workers or on a distributed Spark cluster
- Generate data processing steps as Python code
- Auto-detect dependencies between tasks
- Generate the DAG code
- Export data connection files separately so you can change them later
Sample scenario
- Load some_table from Database A
- Make changes
- Save to Database B
- Load some_other_table from a Data Lake C
- Make changes
- Save to Data Warehouse D
- Load final_table from Database E
- Join to some_table
- Join to some_other_table
- Make other changes
- Save to Data Lake F
- Export everything to Airflow
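As a rough sketch of what one of these steps could look like as a generated task (illustrative only; the connection strings, table names, and transformation logic are assumptions, not the code Practicus AI Studio actually produces), the "load some_table from Database A, make changes, save to Database B" step might resemble:

```python
# Illustrative sketch only: connection strings, table names, and the transform
# logic are placeholders, not the code Practicus AI Studio actually generates.
import pandas as pd
from sqlalchemy import create_engine

def process_some_table():
    # Read some_table from Database A (hypothetical connection string)
    source = create_engine("postgresql://user:pass@database-a:5432/analytics")
    df = pd.read_sql_table("some_table", source)

    # "Make changes": example transformation steps
    df = df.dropna(subset=["ID"])
    df["amount"] = df["amount"].round(2)

    # Save the result to Database B (hypothetical connection string)
    target = create_engine("postgresql://user:pass@database-b:5432/reporting")
    df.to_sql("some_table", target, if_exists="replace", index=False)

if __name__ == "__main__":
    process_some_table()
```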
Let's take a quick look at the experience.
Joining data sources
- Left join final_table on column ID with some_other_table on column ID
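Expressed in pandas terms (purely illustrative; the generated code may use a different engine, and the sample data below is made up), the left join behaves like this:

```python
import pandas as pd

# Made-up frames standing in for the two tables in the scenario
final_table = pd.DataFrame({"ID": [1, 2, 3], "value": [10, 20, 30]})
some_other_table = pd.DataFrame({"ID": [1, 3], "extra": ["a", "b"]})

# Left join on the ID column: every row of final_table is kept,
# and matching rows from some_other_table contribute the "extra" column
joined = final_table.merge(some_other_table, how="left", on="ID")
print(joined)
```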
Exporting to Airflow
- Practicus AI automatically detects the dependency:
- Operations on some_table and some_other_table can execute in parallel since they do not depend on each other
- If both succeed, operations on final_table, including the joins, can run (see the sketch below)
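A minimal sketch of the dependency structure the exported DAG expresses, assuming PythonOperator tasks with placeholder callables (the actual generated file will differ in operators, naming, and configuration):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables standing in for the exported task scripts
def process_some_table(): ...
def process_some_other_table(): ...
def process_final_table(): ...

with DAG(
    dag_id="sample_scenario",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # example schedule (Airflow 2.4+ style)
    catchup=False,
):
    some_table = PythonOperator(task_id="some_table", python_callable=process_some_table)
    some_other_table = PythonOperator(task_id="some_other_table", python_callable=process_some_other_table)
    final_table = PythonOperator(task_id="final_table", python_callable=process_final_table)

    # some_table and some_other_table have no dependency on each other, so they
    # run in parallel; final_table (with its joins) only runs if both succeed
    [some_table, some_other_table] >> final_table
```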
Viewing the exported code
- After the code export is completed, you can update the following types of files:

.py files:
Each is a task that includes the data processing steps, SQL, etc.

.._worker.json files:
Define the Worker that each task will run on: the container image to use, Worker capacity (CPU, GPU, RAM), etc.

.._conn.json files:
Define how to read data for each task. Note: data source credentials can be stored in the Practicus AI data catalog.

.._save_conn.json files:
Define how to write data for each task. Note: data source credentials can be stored in the Practicus AI data catalog.

.._join_.._conn.json files:
Define how each join operation will work: how to read the data and where to join.

.._dag.py file:
The DAG file that brings everything together.
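For illustration, the exported configuration files are plain JSON and can be edited by hand or from a script before deployment. The field names below are assumptions for the sake of the example, not the actual Practicus AI schema:

```python
import json

# Hypothetical contents of a .._worker.json file; the real schema may differ.
worker_config = {
    "worker_image": "practicusai/worker:latest",  # container image (placeholder)
    "cpu": 2,                                     # example capacity settings
    "memory_gb": 8,
    "gpu": 0,
}

# Write the edited config back before deploying the DAG to Airflow
with open("process_some_table_worker.json", "w") as f:
    json.dump(worker_config, f, indent=2)
```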
Sample view from the embedded Jupyter notebook inside Practicus AI Studio.
Airflow deployment options
You have two options for deploying to Airflow from Practicus AI Studio.
Self-service
- Select the schedule and deploy directly to an Airflow add-on service that an admin has given you access to.
- This will instantly start the Airflow schedule.
- You can then view your DAGs using Practicus AI and monitor the state of your workflows.
- You can also manually trigger DAGs.
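Triggering and monitoring happen inside Practicus AI Studio; for reference, the same manual trigger can also be issued against Airflow's stable REST API, assuming the API is enabled and you have credentials (the URL, DAG id, and auth below are placeholders):

```python
import requests

# Placeholders: adjust the Airflow base URL, credentials, and DAG id
airflow_url = "https://airflow.example.com/api/v1"
dag_id = "sample_scenario"

# POST /dags/{dag_id}/dagRuns creates (i.e. triggers) a new DAG run
response = requests.post(
    f"{airflow_url}/dags/{dag_id}/dagRuns",
    json={"conf": {}},
    auth=("airflow_user", "airflow_password"),
)
response.raise_for_status()
print(response.json()["state"])  # e.g. "queued"
```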
Working with a Data Engineer (recommended for sensitive data)
- Export the code and share it with a Data Engineer so they can:
- Validate your steps (.py files)
- Update data sources for production databases (conn.json files)
- Select appropriate Worker capacity (worker.json files)
- Select appropriate Worker user credentials (worker.json files)
- Deploy to Airflow
- Define the necessary monitoring steps with automation (e.g. with Practicus AI observability)