Using the interactive Spark Cluster Client
- This example demonstrates how to connect to the Practicus AI Spark cluster we created and run simple Spark operations.
- Please run this example on the Spark Coordinator (master).
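# Execute some code.
# Assumption: `spark` is an active Spark session for this cluster
# (for example, the object returned by prt.distributed.get_client()).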
data = [("Alice", 29), ("Bob", 34), ("Cathy", 23)]
columns = ["Name", "Age"]
# Create a DataFrame
df = spark.createDataFrame(data, columns)
# Perform a transformation
df_filtered = df.filter(df.Age > 30)
# Show results
df_filtered.show()
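As a sanity check on what `df_filtered.show()` should display, the same predicate can be sketched in plain Python, without a Spark session. The helper name `older_than` is hypothetical and used only for illustration. It mirrors the `Age > 30` filter on the same three rows.

```python
# Same rows as the Spark example, as plain Python tuples.
data = [("Alice", 29), ("Bob", 34), ("Cathy", 23)]


def older_than(rows, min_age):
    """Return the rows whose age is strictly greater than min_age."""
    return [(name, age) for name, age in rows if age > min_age]


# Mirrors df.filter(df.Age > 30): only Bob is older than 30.
expected_rows = older_than(data, 30)
print(expected_rows)  # [('Bob', 34)]
```

If the Spark output shows any row other than Bob, the DataFrame was probably built from different data than the list above.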
Terminating the cluster
- You can go back to the other worker where you created the cluster to run:
Troubleshooting
If you’re experiencing issues with an interactive cluster that doesn’t run job/train.py, please follow these steps:
- Agent Count Mismatch: If the number of distributed agents shown by prt.distributed.get_client() is less than what you expected, wait a moment and then run get_client() again. This is usually because the agents have not yet joined the cluster. Note: Batch jobs automatically wait for agents to join.
- Viewing Logs: To view logs, navigate to the ~/my/.distributed folder.
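The "wait a moment and run get_client() again" advice above can be automated with a small polling loop. The following is a minimal sketch, not part of the Practicus AI SDK: `wait_for_agents` and its `count_agents` argument are hypothetical names. How to read the agent count from the result of `prt.distributed.get_client()` is not specified here, so the sketch takes any zero-argument callable that returns the current count.

```python
import time
from typing import Callable


def wait_for_agents(
    count_agents: Callable[[], int],
    expected: int,
    timeout_s: float = 120.0,
    poll_interval_s: float = 5.0,
) -> int:
    """Poll count_agents() until it reports at least `expected` agents.

    Returns the last observed count, or raises TimeoutError if the
    deadline passes before enough agents have joined.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        seen = count_agents()
        if seen >= expected:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {seen} of {expected} agents joined within {timeout_s}s"
            )
        # Agents may still be joining the cluster; wait before checking again.
        time.sleep(poll_interval_s)
```

In practice, `count_agents` would wrap whatever call reports the number of joined agents on your cluster. The timeout keeps an interactive session from waiting forever if an agent never starts, in which case the logs folder mentioned above is the next place to look.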