Now that we have established the concept of tabular data, let's discuss what we can do with it. The goal of data science is to discover useful knowledge from mountains of data. So what kind of knowledge can we learn from our data, and how do we learn it? In the rest of this post, I will discuss three of the most common types of analysis that we can perform on tabular data.
Exploratory data analysis
Exploratory analysis is probably one of the most basic types of data analysis. It involves getting an overview of what you have in your data set. Here, “an overview” means things like the range in which each column's values lie, whether your data has extreme or weird values, and any other unusual issues that may be present in the data. Additionally, we look for useful information such as which features are strongly related to each other and any interesting patterns that occur in the data. Tools for exploratory analysis include figures, charts, summary statistics, and so on; basically, anything that describes the data in an intuitive way.
Given its purpose, we usually perform exploratory analysis at the beginning of a project. Its results then inform later steps such as data processing and the choice of which analyses to focus on.
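The kind of overview described above can be sketched with Python's standard library alone. This is a minimal illustration, assuming a small hypothetical table of made-up student records; the helper names (column, summarize, iqr_outliers, pearson) are illustrative and not from any particular library. It computes per-column summary statistics, flags extreme values with Tukey's 1.5 times IQR rule (one common convention among several), and measures how strongly two columns are linearly related with the Pearson correlation coefficient.

```python
import statistics

# Hypothetical toy table: each row is a made-up student record (not real data).
rows = [
    {"gpa": 3.2, "study_hours": 12},
    {"gpa": 3.8, "study_hours": 20},
    {"gpa": 2.9, "study_hours": 8},
    {"gpa": 3.5, "study_hours": 15},
    {"gpa": 3.1, "study_hours": 11},
    {"gpa": 3.6, "study_hours": 95},  # suspicious value, perhaps a data-entry error
]


def column(rows, name):
    """Extract one column of the table as a list."""
    return [r[name] for r in rows]


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


for name in ("gpa", "study_hours"):
    print(name, summarize(column(rows, name)))
print("study_hours outliers:", iqr_outliers(column(rows, "study_hours")))
print("corr(gpa, study_hours):",
      round(pearson(column(rows, "gpa"), column(rows, "study_hours")), 3))
```

In practice, libraries such as pandas (for example `DataFrame.describe()` and `DataFrame.corr()`) produce this kind of summary in a line or two; the hand-written version just makes each step visible. Note how the single suspicious value also distorts the correlation, which is exactly the sort of issue exploration is meant to surface before later steps.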
Hypothesis testing
In this type of analysis, you perform statistical tests to quantify how plausible a hypothesis about your data is. For example, you may test the hypothesis that students at a college have an average GPA above 3.0, or that students in the CS department have a higher average GPA than those in the IT department.
It is important to note that hypothesis tests evaluate the plausibility of a hypothesis; they do not establish with certainty whether it is true. For example, you cannot state that CS students definitely have a higher average GPA than IT students, even if the tests support that hypothesis. Instead, such a result should be interpreted as evidence that CS students' GPAs are statistically higher than those of IT students at a chosen level of confidence. Hypothesis testing is common in clinical trials, where a series of experiments and tests are performed to evaluate the effectiveness of new treatments.
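To make the CS vs. IT GPA example concrete, here is a minimal sketch of one possible test: a one-sided permutation test on the difference in mean GPA. The GPA numbers are invented for illustration, and the function name permutation_test is hypothetical. A two-sample t-test is the more common textbook choice for this question; the permutation test is swapped in here because it needs only the standard library and makes the logic explicit: if the department labels did not matter, how often would a random relabeling of students produce a difference at least as large as the one we observed?

```python
import random

# Hypothetical GPA samples (made-up numbers for illustration only).
cs_gpas = [3.4, 3.1, 3.7, 3.5, 3.0, 3.6, 3.3, 3.8]
it_gpas = [3.0, 3.2, 2.8, 3.1, 3.3, 2.9, 3.0, 3.4]


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test of H1: mean(a) > mean(b).

    Under the null hypothesis the group labels are exchangeable, so we shuffle
    the pooled values many times and count how often a random split produces a
    mean difference at least as large as the observed one.
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        # Small tolerance guards against float rounding when a shuffle
        # reproduces the observed split in a different order.
        if diff >= observed - 1e-12:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value


diff, p = permutation_test(cs_gpas, it_gpas)
print(f"observed difference = {diff:.3f}, estimated p-value = {p:.4f}")
```

A small p-value means the observed difference would be unlikely if there were no real difference between the departments. Consistent with the point above, it does not prove that CS students have a higher average GPA; it only quantifies how surprising the data would be otherwise.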
Predictive data analysis
This is one of the most common types of data analysis, with a huge tool set that evolves rapidly. In short, predictive analysis aims to learn useful patterns or knowledge from historically collected data. The learned knowledge is then applied to new or future data to make inferences that support decision making.
There are countless examples of predictive analysis in today's world. A retailer may analyze the purchase patterns of past customers to tailor its advertising strategies for current and future ones. Manufacturers can analyze historical sales trends of their products to make predictions for the near future and allocate their resources accordingly. Banks could analyze historical transactions to identify fraud patterns, which are then used to flag new fraudulent transactions. You yourself could analyze historical stock data to decide when to buy or sell. The list goes on and on.
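As a very small illustration of “learn from historical data, then apply to new data”, the sketch below echoes the manufacturer example: it fits a straight-line trend to past monthly sales using ordinary least squares and extrapolates it one month ahead. The sales figures are made up, the helper names fit_line and predict are hypothetical, and a straight-line trend is a deliberately simple stand-in; real forecasting work would typically use richer models and check them on held-out data.

```python
# Hypothetical monthly unit sales for the past 8 months (made-up numbers).
history = [120, 132, 129, 141, 150, 158, 163, 171]


def fit_line(ys):
    """Ordinary least squares fit of y = slope * t + intercept, with t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def predict(model, t):
    """Apply the learned trend to a (possibly future) time step t."""
    slope, intercept = model
    return slope * t + intercept


model = fit_line(history)                   # "learn" from historical data
next_month = predict(model, len(history))   # apply it to a new, unseen time step
print(f"forecast for next month: {next_month:.1f} units")
```

The split into a fitting step and a prediction step mirrors the two phases described above: the model is learned once from historical records and then reused on new inputs.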
Predictive analysis is probably one of the most active areas in data science. Researchers are putting effort into developing new predictive techniques every day, and companies are hiring more and more data scientists to perform predictive analysis. So, if you want to learn data science, this is definitely a topic you cannot ignore!
Conclusion
The three types of analysis that I discussed in this post are by no means the only ones. They are just among the most common ones that you may encounter as you start learning data science. Overall, I hope this post gives a good summary of these types. Details and hands-on examples for each will follow in later posts!