Introduction to Data Visualization
This section requires a Practicus AI Cloud Worker. Please visit the introduction to Cloud Workers section of this tutorial to learn more.
Plotting a dataset aids data exploration by revealing patterns and relationships, which makes it valuable for decision-making, storytelling and insight generation. Practicus AI provides a Plot service for this purpose, which plots data on the worker as well as in the app.
Let's look at the basics of Plot by loading salary.csv. We will ignore the meaning of this dataset, since we only use it to explain the basics of plotting.
- Open Practicus AI app
- You will see the Explore tab, click on New Worker
- Select the worker which has been started
- Navigate to the samples directory and open salary.csv :
- samples > salary.csv
- Click on the file and you will see a preview
- Click Load
After loading the dataset, click the Plot button to start the plotting service.
Basics of Plot
The first thing we see on the Plot tab is Data Source. From this menu we can select the data sheet we want to visualize.
- Click Data Source drop down menu
- Select salary
After choosing the data sheet, we can choose the graphic style we want to use from the Graphic drop down menu.
- Click Graphic drop down menu
- Select Line
After choosing the graphic style we want to work with we will see the options listed down below:
- Sampling: This option selects a subset of a larger dataset to represent its characteristics. Smaller samples of large datasets can be plotted more quickly, which makes exploratory data analysis more efficient.
- X Coordinate: This option refers to the horizontal axis of the plot, representing column(s) of the dataset. With the Bar and H Bar graphic styles, this axis accepts string columns as well as numerical columns.
- Y Coordinate: This option refers to the vertical axis of the plot, also representing column(s).
- Color: This option sets the fill color of the shapes in the selected graphic style.
- Size: This option sets the size of the shapes in the selected graphic style, except for the Bar and H Bar graphic styles, where size refers to the spacing between bars.
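The idea behind the Sampling option can be sketched in plain Python. This is only an illustration of random sampling in general, not the Plot service's own implementation; the function name `sample_rows`, the fixed seed and the generated data are assumptions made for this example.

```python
import random

def sample_rows(rows, max_rows, seed=0):
    """Return at most max_rows rows chosen uniformly at random, in original order."""
    if max_rows >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the same sample is drawn every run
    chosen = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in chosen]

# Hypothetical data: 10,000 (x, y) points standing in for a large dataset.
rows = [(i, i * 2) for i in range(10_000)]
subset = sample_rows(rows, 500)
print(len(subset))  # 500
```

Plotting `subset` instead of `rows` draws 20 times fewer points while keeping the overall shape of the data.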
Let's take a quick look at these options with a simple example.
- Click the X Coordinate drop down menu and select YearsExperience
- Click the Y Coordinate drop down menu and select Salary
- Click Add Layer
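Outside the app, choosing an X and a Y column amounts to pairing the values of two columns. The sketch below shows this with the standard library only. The three rows are illustrative values, not the contents of samples/salary.csv, and sorting by X is an assumption made here so that a line would run from left to right.

```python
import csv
import io

# Illustrative rows only; the real samples/salary.csv contains different values.
CSV_TEXT = """YearsExperience,Salary
3.0,60000
1.5,40000
5.0,85000
"""

def xy_series(csv_text, x_col, y_col):
    """Pair two numeric columns and sort the points by their x value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    points = [(float(row[x_col]), float(row[y_col])) for row in reader]
    return sorted(points)

points = xy_series(CSV_TEXT, "YearsExperience", "Salary")
print(points)  # [(1.5, 40000.0), (3.0, 60000.0), (5.0, 85000.0)]
```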
In the end we will have the plot below:
Advanced Techniques in Plot
In this section we will look at more advanced techniques available in Plot, such as adding multiple visualization layers, dynamic size and color, transparency, tooltips and the Geo Map graphic style.
Dynamic Size & Color
One of the most illustrative datasets for demonstrating dynamic size and color options is the Titanic dataset. Let's load it into one of our worker environments.
- Open Practicus AI app
- You will see the Explore tab, click on New Worker
- Select the worker which has been started
- Navigate to the samples directory and open titanic.csv :
- samples > titanic.csv
- Click on the file and you will see a preview
- Click Load
The Titanic dataset is a popular dataset used in machine learning and data analysis. It contains information about passengers aboard the RMS Titanic, including whether they survived or not. Within this dataset we will use the Circle graphic style from Plot and the columns pclass, fare, age and survived. Let's describe what these columns mean for better understanding.
- Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
- Fare: Passenger fare.
- Age: Passenger's age in years.
- Survived: Indicates whether the passenger survived or not (0 = No, 1 = Yes).
Let's start plotting:
- Click on Plot
- Click Data Source drop down menu
- Select titanic
- Click Graphic drop down menu
- Select Circle
- Click Advanced
After opening the Advanced section you will see the Dynamic Size and Dynamic Color options. In a circle plot, dynamic size and color adjust the size and color of each circle based on additional variables beyond the X and Y coordinates. Let's look at an example with the Titanic dataset.
- Click the X Coordinate drop down menu and select age
- Click the Y Coordinate drop down menu and select fare
- Click the Dynamic Size drop down menu and select pclass
- Click the Dynamic Color drop down menu and select survived
- Click Add Layer
The plot below should appear:
Hence, we can deduce from the analysis that passengers with smaller data points (indicating lower values of "Pclass") paid higher fares and had a better chance of survival. Moreover, it's evident that passengers with lower ages (on the X-axis) had a higher likelihood of surviving.
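To make the idea of dynamic size and color concrete, here is a minimal sketch of how a third and a fourth column could be mapped to marker size and color. It does not reproduce Plot's actual scaling rules; the pixel range, the red/green palette and the three passenger rows are all assumptions chosen for illustration.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the range [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)

def blend(c0, c1, t):
    """Interpolate between two RGB colors; t=0 gives c0, t=1 gives c1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

# Illustrative passengers (not real rows): (age, fare, pclass, survived)
passengers = [(29, 211.0, 1, 1), (40, 13.0, 2, 0), (22, 7.25, 3, 0)]

RED, GREEN = (200, 30, 30), (30, 160, 60)  # hypothetical palette

markers = []
for age, fare, pclass, survived in passengers:
    markers.append({
        "x": age,
        "y": fare,
        # Dynamic size: pclass 1 maps to 5 px, pclass 3 maps to 15 px.
        "size": scale(pclass, 1, 3, 5, 15),
        # Dynamic color: survived 0 maps to RED, survived 1 maps to GREEN.
        "color": blend(RED, GREEN, scale(survived, 0, 1, 0.0, 1.0)),
    })

for m in markers:
    print(m)
```

With this mapping, first-class passengers get the smallest circles, which matches the reading of the plot described above.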
Analysis over Multiple Layers
One of the most illustrative datasets for demonstrating multi-layer analysis is the Iris dataset. Let's load it into one of our worker environments.
- Open Practicus AI app
- You will see the Explore tab, click on New Worker
- Select the worker which has been started
- Navigate to the samples directory and open iris.csv :
- samples > iris.csv
- Click on the file and you will see a preview
- Click Load
The Iris dataset is a popular dataset in machine learning and statistics, often used for classification tasks. It consists of 150 samples of iris flowers, each belonging to one of three species: Setosa, Versicolor, or Virginica. Within this dataset we will use both the Bar and Line graphic styles from Plot. The dataset comprises four features, representing the length and width of the petals and sepals of each flower.
Before we start, let's apply label encoding and group by on the species column for better visualization:
- Click Snippets
- Click Advanced
- Locate and select Label encoder
- Select species from Text columns drop down menu
- Click +
- Click OK
- Click Prepare
- Click Group By
- Select species from Group by drop down menu
- Select sepal_length and Mean (Average) from the Summarize drop down menus
- Select sepal_width and Mean (Average) from the Summarize drop down menus
- Select petal_length and Mean (Average) from the Summarize drop down menus
- Select petal_width and Mean (Average) from the Summarize drop down menus
- Click OK
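The label encoding and group-by steps above can be sketched in plain Python. This is a rough illustration, not the code the app generates. The four rows are made-up measurements, assigning codes in sorted order is an assumption, and the `<column>_mean` naming simply mirrors the column names used later in this tutorial.

```python
from collections import defaultdict
from statistics import mean

# Illustrative measurements only, not actual rows from iris.csv.
rows = [
    {"species": "setosa", "sepal_length": 5.0, "petal_length": 1.4},
    {"species": "setosa", "sepal_length": 5.2, "petal_length": 1.6},
    {"species": "virginica", "sepal_length": 6.5, "petal_length": 5.5},
    {"species": "virginica", "sepal_length": 6.7, "petal_length": 5.9},
]

def label_encode(values):
    """Map each distinct text value to an integer (assumed: sorted order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def group_mean(rows, key, columns):
    """Average the given columns per group; results are named <column>_mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        k: {f"{c}_mean": mean(r[c] for r in members) for c in columns}
        for k, members in groups.items()
    }

# Step 1: replace the species text with integer codes.
codes = label_encode(r["species"] for r in rows)
for r in rows:
    r["species"] = codes[r["species"]]

# Step 2: one summary row per species code.
summary = group_mean(rows, "species", ["sepal_length", "petal_length"])
print(codes)
print(summary)
```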
In the end we should have the table below:
Let's start plotting for multi-layer analysis:
- Click on Plot
- Click Data Source drop down menu
- Select iris
For first layer:
- Click Graphic drop down menu
- Select Bars
- Click the X Coordinate drop down menu and select species
- Click the Y Coordinate drop down menu and select sepal_length_mean
- Click Advanced
- Select a greenish color from the Color drop down menu
- Enter a value of 50 for the Transparency % input
- Click Add Layer
For second layer:
- Click Graphic drop down menu
- Select Line
- Click the X Coordinate drop down menu and select species
- Click the Y Coordinate drop down menu and select sepal_width_mean
- Click Advanced
- Select a darker greenish color from the Color drop down menu
- Click Add Layer
For third layer:
- Click Graphic drop down menu
- Select Bars
- Click the X Coordinate drop down menu and select species
- Click the Y Coordinate drop down menu and select petal_length_mean
- Click Advanced
- Select a blueish color from the Color drop down menu
- Enter a value of 50 for the Transparency % input
- Click Add Layer
For fourth layer:
- Click Graphic drop down menu
- Select Line
- Click the X Coordinate drop down menu and select species
- Click the Y Coordinate drop down menu and select petal_width_mean
- Click Advanced
- Select a darker blueish color from the Color drop down menu
- Click Add Layer
In the end we should have a plot like the one below:
As we hover over the bars and lines, data point values will be displayed. Additionally, on the right side of plot, there are options available for zooming in, zooming out, and saving the plot.
Observing this multi-layer graph, it becomes evident that both sepal length and petal length play a crucial role in distinguishing between classes. Similarly, the same differentiation can be observed for petal width.
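The Transparency % setting used for the bar layers can be understood through standard alpha compositing, where a semi-transparent layer is mixed with whatever lies beneath it. The sketch below assumes that a transparency of 50 corresponds to an opacity of 0.5; how Plot renders layers internally is not documented here, and the RGB values are hypothetical stand-ins for the greenish and blueish picks.

```python
def composite(top, bottom, transparency_pct):
    """Draw `top` over `bottom`; 0 means fully opaque, 100 fully transparent."""
    alpha = 1 - transparency_pct / 100
    return tuple(round(alpha * t + (1 - alpha) * b) for t, b in zip(top, bottom))

WHITE = (255, 255, 255)  # plot background
GREEN = (0, 128, 0)      # hypothetical "greenish" bar color
BLUE = (0, 0, 255)       # hypothetical "blueish" bar color

# First layer: a green bar at 50% transparency over the white background.
green_bar = composite(GREEN, WHITE, 50)
# Third layer: a blue bar at 50% transparency drawn over the green bar.
overlap = composite(BLUE, green_bar, 50)

print(green_bar, overlap)
```

Because neither bar is fully opaque, the region where they overlap shows a mix of both colors, so the lower layer stays visible.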
Geo-map Tutorial
To use the Geo Map feature of Plot, you first need to define a Google Maps API key, either through the Admin Console or within the application itself. If you don't know how to obtain a Google Maps API key, check Google's documentation.
Defining a Google Maps API key in the Admin Console:
- Open Admin Console of Practicus AI
- Click Definitions in the left menu to expand it
- Click Cluster Definitions
- Click GOOGLE_MAPS_API_KEY in the table
- Enter your key in the Value input
- (Optional) Enter a description in the Description input
- Click Save
Defining a Google Maps API key within the application:
- Click Settings from the top menu
- Click the Other tab in the window that opens
- Enter your Google Maps API key in the Personal API Key input at the bottom
- Click Save
After assigning the Google Maps API key, we can look at Geo Map using the car_insurance dataset. This dataset contains information about the insurance company's past customers who have purchased health insurance. The objective is to use this dataset to train a predictive model that can determine whether these past customers would also be interested in purchasing vehicle insurance from the same company.
The features can be listed as:
- id: A unique identifier assigned to each customer.
- Gender: The gender of the customer.
- Age: The age of the customer.
- Driving_License: Indicates whether the customer possesses a driving license.
- Region_Code: A distinct code assigned to denote the region of the customer.
- Previously_Insured: Indicates whether the customer already holds vehicle insurance.
- Vehicle_Age: The age of the vehicle.
- Vehicle_Damage: Indicates whether the customer's vehicle has been damaged in the past.
- Annual_Premium: The yearly premium amount that the customer is required to pay.
- Policy_Sales_Channel: An anonymized code representing the outreach channel used to contact the customer, including different agents, mail, phone, and in-person visits.
- Vintage: The duration, in days, for which the customer has been associated with the company.
- Response: Indicates customer interest. 1 indicates interest, while 0 signifies no interest.
Let's load the dataset to our worker:
- Open Practicus AI app
- You will see the Explore tab, click on New Worker
- Select the worker which has been started
- Navigate to the samples directory and open car_insurance.csv:
- samples > car_insurance.csv
- Click on the file and you will see a preview
- Click Load
Before we start, let's group the data on the Region_Code column for better visualization:
- Click Prepare
- Click Group By
- Select Region_Code from the Group by drop down menu
- Select Response and sum from Summarize drop down menus
- Select Previously_Insured and sum from Summarize drop down menus
- Select Lat and max from Summarize drop down menus
- Select Lon and max from Summarize drop down menus
- Click OK
Let's start plotting:
- Click on Plot
- Click Data Source drop down menu
- Select car_insurance
- Click Graphic drop down menu
- Select Geo Map
After selecting the "Geo Map" graphic style, we observe four distinct options that set it apart from other graphic styles:
- Latitude: Indicates distance north or south of the Equator.
- Longitude: Specifies distance east or west of the Prime Meridian.
- Map Type: Indicates Google Maps styles.
- Zoom: Provides an approximation of the number of miles/kilometers that fit into the area represented by the plot.
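To see how latitude and longitude translate into a position on a web map, the sketch below applies the Web Mercator projection, which Google Maps uses for its map tiles. It is meant as background only. The function name `web_mercator` is made up for this example, and whether Plot performs this projection itself or leaves it to Google Maps is not stated in this tutorial.

```python
import math

def web_mercator(lat_deg, lon_deg):
    """Project latitude/longitude to normalized Web Mercator coordinates.

    x runs from 0 (longitude -180) to 1 (longitude +180).
    y runs from 0 at the top of the map (far north) to 1 at the bottom.
    """
    lat = math.radians(lat_deg)
    x = (lon_deg + 180.0) / 360.0
    y = (1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0
    return x, y

# The Equator meets the Prime Meridian at the centre of the map.
print(web_mercator(0.0, 0.0))  # (0.5, 0.5)

# A point north of the Equator and west of the Prime Meridian
# lands in the upper-left quadrant (x < 0.5, y < 0.5).
print(web_mercator(40.0, -100.0))
```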
Let's try to visualize the relation between Response and Previously_Insured on Google Maps by plotting data from the car_insurance dataset:
- Select Lon_max from the Longitude drop down menu
- Select Lat_max from the Latitude drop down menu
- Enter 1500 in the Zoom input
- Click Advanced
- Select Response_sum from Dynamic Size drop down menu
- Select Previously_Insured_sum from Dynamic Color drop down menu
- Click Add Layer
Let's say we want to email this plot to someone:
- Click Save in the menu on the right side
- Choose a file name, e.g. us_flight.png
You will get a graphic file saved on your computer.