Skip to content

Data Profiling

This section requires a Practicus AI cloud node. Please visit the introduction to cloud nodes section of this tutorial to learn more.

Profiling your data is an extremely powerful way to get a good understanding of your data distribution and correlations.

  • Open Explore tab
  • Make sure a Cloud Node is selected (upper right)
  • Select Cloud Node Files and open the file below
  • Home > samples > insurance.csv
  • Select Analyze > Profile

After a few minutes, you will see the data profile completed.

  • Scroll down to find charges, and click its name to expand

You will see several statistical calculations about this data. On the right, you will also see correlations.

For instance, we can see that there is a very high correlation (0.79) between an individual smoking cigarettes and the insurance charges they pay.

  • Scroll up and click Associations

You will see a matrix view of the correlations and how strong the correlation is.

Categorical columns (e.g. male / female, smoker yes / no) are shown in squares and numerical columns (e.g. age, bmi) are displayed in circles.

In this example, all correlations are positive, so they are displayed with blue. Negative correlations are displayed in red. E.g. If we had a column about how fast someone can run, age column would probably have a negative correlation. I.e. if someone is younger, they would run faster.

Profiling for difference

You can make changes to your data and then create a profile to compare with the original.

  • Go back to the insurance dataset
  • Under smoker column click on a cell that has the no value
  • Select Filter to keep cell value
  • Select Analyze > Profile again
  • You will be asked if you would like to compare with the original data set, select yes

You will now see all the statistics for the original and current data and can compare them side-by-side.

< Previous | Next >