Definitions of data from different sources
How many definitions of data are there? A quick Google search for the term “data” gives you the definition “facts and statistics collected together for reference or analysis” as the first result. Similarly, Wikipedia defines data as a “collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted”.
I asked my wife, “What is data?” She said, “Data is a bunch of numbers.”
In this era where people digitize everything, my wife’s definition turns out to be the most accurate. At the lowest level, we store data in computer systems, so it is indeed a bunch of (binary) numbers. Nevertheless, in every definition, from Google’s to my wife’s, data is always a collection of something.
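To make the “bunch of (binary) numbers” point concrete, here is a minimal sketch in Python. It assumes the text is stored as UTF-8, which is only one of many possible encodings, and the helper name to_binary is made up for this illustration.

```python
def to_binary(text):
    """Return each UTF-8 byte of `text` as an 8-bit binary string."""
    # Assumption: the text is encoded as UTF-8; other encodings give other bytes.
    return [format(byte, "08b") for byte in text.encode("utf-8")]


word = "data"
# Each character becomes one byte here, i.e. one small integer.
print(list(word.encode("utf-8")))  # [100, 97, 116, 97]
# The same bytes written out as binary digits.
print(to_binary(word))  # ['01100100', '01100001', '01110100', '01100001']
```

Images, songs, and videos are stored the same way, only with far more bytes and a more elaborate layout.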
My personal view
Personally, I think that the answer to “what is data” has expanded quite a bit in modern data science. No longer does data consist only of numbers or values. A folder of images you have just downloaded from the Internet can be data. Your collection of favorite songs is data. The series of YouTube videos you have just watched during work hours is data. Even this blog, including this article, is data. So, what is data? In my very informal definition, data is a collection of informational objects belonging to any kind of entity. The items in your data are not necessarily numbers or values. They can be anything! For example, we may have a data set of images, videos, news articles, or locations and their interrelationships.
Let us come back to the ultimate goal of data analytics: discovering knowledge from data. How do we do that? Very fortunately, we now have a huge set of tools to analyze any type of data in any way you can think of. And slowly but surely, we will learn all (or most) of these tools together. We will start with the simplest form of data, tabular data, in my next post.