Skip to content

Introduction to Data Preparation

This section only requires Practicus AI app and can work offline.

Let's start by loading boston.csv. We will ignore the meaning of this dataset since we will only use it to manipulate values.

Click on Prepare button to view some common data preparation actions

Sorting Data

  • Click on CRIM column
  • Hit Ctrl (Cmd for macOS) + down arrow to sort ascending
  • Hit Ctrl (Cmd for macOS) + up arrow to sort descending

You can also open the advanced sort menu by clicking Prepare > Sort

Filtering Data

There are several ways to filter data:

  • Click on RM column to view the mini-histogram
  • Click on the left most distribution to view filter menu
  • Select Filter to keep >= 4.605

  • Click on a cell value in INDUS column, e.g. 6.2
  • Select Filter to keep <

This will remove all INDUS values greater than 6.2

Updating Wrong Values

  • Click on any cell with 12.5 in ZN column
  • Select Change Values

  • Enter 100, click ok

You will see that ALL cells with the value 12.5 in ZN column will be updated to 100

Please note that Practicus AI does not allow you to update the value of an individual cell only. All updates need to be rule-based. The reason for this is to be able to create production data pipelines. E.g. what you design can be used on fresh data every night, automatically. Individual cell updates do not work for this scenario.

Formulas

Practicus AI supports 200+ Excel compatible functions to write formulas. If you can use formulas in Excel, you can in Practicus AI.

  • Click on the Formula button to open up the formula designer

  • Select ISODD function under Math section to find odd numbers
  • You will be asked to choose a column, select RAD column as Number
  • Click Add

You can use the designer to build the formula or type by hand. You can also create them in Excel and copy / paste. Unlike Excel, you do not use cell references e.g. A5, D8, but column names directly like the below:

  • Leave the column name as suggested: ISODD_RAD
  • Click ok to run the formula
  • You will see a new column named ISODD_RAD added to the dataset
  • Click on ISODD_RAD column to select and hit Ctrl (Cmd in macOS) + left arrow key to move the column to left. Keep doing it until it is next to RAD column

Advanced filter

  • Click on INDUS column name to select and then Prepare > Filter
  • Advanced filter designer will open and INDUS column already selected
  • Select <= Less Than and Equal as criteria
  • Click on ... and choose 6.91 select ok
  • Click on Add
  • Now, select our newly created column ISODD_RAD instead of INDUS
  • Leave Is True and click Add

You will see a filter statement is getting created like the below. You can use brackets and keywords such as (, ), and, or, to build the filter you need.

After applying your filter you will see the new data set.

  • Click on INDUS column
  • Hit Ctrl (or Cmd) + up arrow to sort descending
  • Hit Ctrl (or Cmd) + right arrow to move the column next to RAD

You can view the columns we are working on listed together, like the below

Viewing and updating Steps

You can make mistakes while manipulating data, and it is possible to fix these without starting from scratch.

  • Click on Steps button

You will view the current steps so far

  • Select the filter step
  • Hit Edit step button
  • Hint: Practicus AI detects the types of variables and does type conversion automatically. You can see this in steps 2 and 3.

  • Change the filter value from 6.91 to 5
  • Click ok

You will see the updated step in green

  • Click ok to make the change

You will see that INDUS column is now less than 5

Please note that updating steps will reset the column order, such as moving columns left / right or hiding them.

Instead of opening the Steps dialog you can quickly undo / redo as well:

  • Hit Ctrl (or Cmd) + Z to undo a step.
  • Do it a few more times, you will see data is updated automatically
  • Now, hit Ctrl (or Cmd) + Y few times to redo the steps you undid

< Previous | Next >