The pipeline we learned about previously is very useful, but it only performs a fixed sequence of transformations on the input. However, more often than not, we want to apply…
At this point, we have gone through quite a few preprocessing methods for different data issues, such as handling outliers, scaling, imputation, and encoding. So, I think it is now…
Predictive analysis is a major branch of data analytics in which we apply knowledge learned from historical data to new data. However, there is one potential issue in this…
Categories are a big part of tabular data. You will see them more often than not; they are simply unavoidable. However, a lot of analytical models cannot handle categorical…
Very often, we have numeric columns with very different scales in the same data set. For example, a data set may have people's income in the range…
Missing data is prevalent in analytics. Missing values are fields in your data without a valid value, and they must be addressed. Otherwise, most analytical models would omit data that has…
A while ago, we discussed distributions of numeric columns. Depending on the type of analysis, a symmetrical distribution is sometimes preferred over a skewed one. So, in this post, I…