So far, we have discussed distribution analysis and correlation analysis when initially exploring data. One important task in this phase is to determine whether there is anything that will cause…
Previously, we learned about tools for correlation analysis on numeric columns. By now, though, you should know that numeric is not the only type of data. In fact,…
Previously, we learned to analyze distributions of numeric and categorical columns. However, those techniques focus on only one column at a time. In exploratory analysis, we have a…
Surely after discussing distribution analysis on numeric data, we will move on to categorical data, right? Of course! In this post, we will discuss tools for performing analysis on categorical…
With a general understanding of distribution analysis, let us actually perform it, starting with numerical data. Naturally, we will be using a mixture of Pandas and Matplotlib - a powerful…
Sun Tzu once said "know your data, know your models, a hundred analyses, a hundred wins", or something along those lines. See, people in ancient times knew the importance…
At this point, we have gained a good understanding of, and hands-on experience with, NumPy arrays and Pandas dataframes. We can now start some analysis. And, the first one that…
While certainly useful in some cases, concatenating dataframes is fairly problematic because of its strict requirement on row order. You may end up with wrong and meaningless results even with…
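To make the row-order problem concrete, here is a minimal sketch. The `names` and `ages` dataframes and their values are made up for illustration, and the comparison with `merge` is one possible remedy rather than necessarily the one the post settles on.

```python
import pandas as pd

# Hypothetical data: the same three people, listed in different row orders.
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
ages = pd.DataFrame({"id": [3, 1, 2], "age": [35, 28, 41]})

# Column-wise concatenation pairs rows by index label (0, 1, 2 here),
# so Ann (id 1) ends up next to the age that belongs to id 3.
side_by_side = pd.concat([names, ages[["age"]]], axis=1)

# Merging on the shared key matches rows by id, regardless of row order.
merged = names.merge(ages, on="id")

print(side_by_side)
print(merged)
```

The concatenated result runs without any error or warning, which is what makes the mismatch easy to miss.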
Previously, we discussed basic data concatenation with NumPy arrays. In Pandas, concatenating dataframes is also possible, but with a few differences. The operation no longer requires equal shapes…
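As a rough illustration of the shape difference, the sketch below stacks two mismatched NumPy arrays and two mismatched dataframes. All arrays, column names, and values are invented for the example, and it assumes the point is about inputs whose columns do not line up.

```python
import numpy as np
import pandas as pd

# NumPy: stacking rows requires the same number of columns.
a = np.ones((2, 3))
b = np.ones((2, 2))
try:
    np.concatenate([a, b], axis=0)
    numpy_ok = True
except ValueError:
    numpy_ok = False  # shapes (2, 3) and (2, 2) cannot be stacked

# Pandas: columns are matched by name, and gaps are filled with NaN.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_b = pd.DataFrame({"x": [5], "z": [6]})
stacked = pd.concat([df_a, df_b], ignore_index=True)

print(numpy_ok)
print(stacked)
```

Here `ignore_index=True` simply renumbers the rows of the result; without it, the original index labels of both dataframes are kept.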
So far, we have only been discussing operations with numbers, so you may start wondering if we would ever talk about text data, right? Sure, why don't we do that…