Preprocessing Archives - Data Science from a Practical Perspective

Data Pipeline

By tienlinhle32 | March 20, 2023 | Data Science, Preprocessing | 2 Comments

The pipeline that we learned previously is very useful, but it only performs a fixed sequence of transformations on the input. However, more often than not, we want to apply…

Processing Pipeline

By tienlinhle32 | March 18, 2023 | Data Science, Preprocessing | 1 Comment

At this point, we have gone through quite a few preprocessing methods for different data issues, such as handling outliers, scaling, imputation, and encoding. So, I think it is now…

Train-test Split

By tienlinhle32 | March 15, 2023 | Data Science, Preprocessing | 1 Comment

Predictive analysis is a major branch of data analytics in which we apply knowledge learned from historical data to new data. However, there is one potential issue in this…

Encode Categorical Data

By tienlinhle32 | March 14, 2023 | Data Science, Preprocessing | 1 Comment

Categories are a big part of tabular data. You will see them more often than not; they are simply unavoidable. However, many analytical models cannot handle categorical…

Scale Numeric Data

By tienlinhle32 | March 10, 2023 | Data Science, Preprocessing | 1 Comment

Very often, we have numeric columns with very different scales in the same data set. For example, a data set may have people's income in the range…

Handle Missing Data

By tienlinhle32 | March 6, 2023 | Data Science, Preprocessing | 2 Comments

Missing data is prevalent in analytics. These are fields in your data without a valid value, and they must be addressed. Otherwise, most analytical models would omit data that has…

Handling Outliers

By tienlinhle32 | March 1, 2023 | Data Science, Preprocessing | 2 Comments

Now that we have a good idea of what can be done during an exploratory analysis, it is time to move on to data preprocessing! So,…

Handle Skewed Data

By tienlinhle32 | February 25, 2023 | Data Science, Preprocessing

A while ago, we discussed distributions of numeric columns. Depending on the type of analysis, a symmetrical distribution is sometimes preferred over a skewed one. So, in this post, I…