Now that we have some idea of what data is and what we can do with it, we can start getting into the action! Let us not rush, however, because there are a few things we need to do first. For a few posts now, I have been mentioning tools and models for analysis. In this post, we will set up our Python workbench so that we can use those tools. So, let's get started!
Why Python?
Before discussing "why Python", let us talk about what Python is. Essentially, Python is a general-purpose, interpreted programming language that you can also use interactively. Like every language, it has advantages and disadvantages, but right now Python is widely considered the language of data science. Why? There are many programming languages besides Python, like Java, C++, C#, Ruby, and Perl. Some of them, like SAS, R, and Scala, are even widely used specifically for data analytics. So how did Python become so popular in data science?
First, Python is one of the easiest languages to start learning. Its syntax is simpler and less verbose than that of languages like Java or C++, and you do not need to declare variable types. It can also be used interactively, so you can run your code line by line and see the results immediately. This is tremendously useful for beginners, who can observe and understand individual statements before piecing them together.
Second, Python has an enormous community that continuously develops new tools for data analytics. Sure, languages like R also receive a lot of effort from researchers and developers. Still, in my opinion, none has as much community support as Python. By using Python, you gain access to this practically endless source of assistance.
Overall, I do believe that Python is the language for data science. It is surely the one I love the most. Next, we will work on setting up your own Python workbench.
Setting up your Python workbench
First, go to the Python website (python.org) to download and install the correct version for your computer. On Linux or macOS, Python may already be installed, but there is no harm in installing a newer version. Installation is very straightforward: just click Install Now and follow the prompts until the end. Importantly, remember to select "Add Python x.x to PATH" if you are asked.
When the installation has finished, test it by opening a new CMD/terminal window and running the command "python" (on many Linux and Mac systems, the command is "python3" instead). If the Python shell opens with its >>> prompt, the installation was successful. To leave the Python shell, type exit() and press Enter, or simply close the CMD/terminal window.
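While you are in the Python shell, you can also check exactly which version you are running. The snippet below is a minimal sketch: MINIMUM and is_recent_enough are hypothetical names made up for this example, and the 3.8 threshold is an illustrative assumption rather than a requirement stated in this post (each library documents its own minimum).

```python
import sys

# Hypothetical minimum version, chosen only for illustration
MINIMUM = (3, 8)

def is_recent_enough(version, minimum=MINIMUM):
    """Compare the (major, minor) part of a version tuple against a minimum."""
    return tuple(version[:2]) >= minimum

# The full version string of the interpreter you just started
print(sys.version)
print("Recent enough:", is_recent_enough(sys.version_info))
```

You can type these lines at the >>> prompt before running exit().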
Installing libraries for your Python workbench
Base Python does not offer much for data analytics. Rather, its strength comes from the huge collection of publicly developed libraries for analysis and modeling. To start, we will install the libraries most commonly used in a Python workbench for data science:
– NumPy for numerical operations and array manipulation
– pandas for tabular data operations and manipulation
– Matplotlib for data visualization
– scikit-learn for advanced data processing and machine learning models
– Jupyter for writing and running Python code interactively and conveniently
To install the packages in Windows, open a new CMD window and run
pip install numpy
pip install pandas
pip install matplotlib
pip install scikit-learn
pip install notebook
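To install the packages in Linux or Mac, open a new terminal window and run the commands below. Using python3 -m pip makes sure the packages are installed for the same Python you just tested. Avoid prefixing these commands with sudo, since installing packages as root can conflict with the Python packages your operating system manages.
python3 -m pip install numpy
python3 -m pip install pandas
python3 -m pip install matplotlib
python3 -m pip install scikit-learn
python3 -m pip install notebook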
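Before moving on, you can also confirm that pip installed the packages for the Python you are actually using. The script below is a minimal sketch that relies only on the standard library (importlib.metadata needs Python 3.8 or newer); PACKAGES and check_packages are names made up for this example. Note that a package's pip name and its import name can differ: scikit-learn, for example, is imported as sklearn.

```python
import importlib.metadata
import importlib.util

# pip (distribution) name -> name used in an import statement
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "notebook": "notebook",
}

def check_packages(packages):
    """Map each pip name to its installed version, or None if it cannot be imported."""
    report = {}
    for dist_name, import_name in packages.items():
        # find_spec returns None when the module cannot be found on this interpreter
        if importlib.util.find_spec(import_name) is None:
            report[dist_name] = None
            continue
        try:
            report[dist_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no pip metadata under this name (installed some other way)
            report[dist_name] = "unknown"
    return report

for name, version in check_packages(PACKAGES).items():
    print(f"{name:<13} {version or 'NOT INSTALLED'}")
```

Save it to a file and run it with python (or python3). A package listed as NOT INSTALLED often means pip installed it for a different Python interpreter than the one you are running.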
Verifying your workbench
We will discuss Jupyter Notebook in the next post. For now, you can verify that numpy, pandas, matplotlib, and scikit-learn were installed successfully: open a CMD/terminal window, start the Python shell with the python (or python3) command, then paste and run the following code. Do not worry if you have no idea what the code does. We will get to know all of it later on.
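import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.svm import SVC
# Generate 50 random numbers from a standard normal distribution
X = np.random.normal(0, 1, 50)
# Label each number 1 if it is greater than 0.5, otherwise 0
Y = (X > 0.5) * 1
# Put both columns into a pandas DataFrame
data = pd.DataFrame({'X': X, 'Y': Y})
# Train a support vector classifier to predict Y from X, then print its accuracy on the same data
svc = SVC()
svc.fit(data[['X']], data['Y'])
print(svc.score(data[['X']], data['Y']))
# Plot the 50 random numbers and display the chart
plt.plot(X)
plt.show()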
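If you see results similar to the image below (an accuracy score between 0 and 1 printed in the shell, followed by a window showing a line chart of the 50 random values) instead of error messages, you have successfully set up your Python workbench! Close the chart window to return to the Python prompt.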
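If you are curious about one line of the verification script, Y = (X > 0.5) * 1 turns each comparison result into a 0/1 label. The plain-Python sketch below shows the same idea on a short, hypothetical list of values: True and False behave like the integers 1 and 0, so multiplying by 1 yields 1 or 0. NumPy does the same thing, except that it applies the comparison to every element of the array at once.

```python
# A short hypothetical list standing in for the random X values
values = [-1.2, 0.3, 0.7, 2.5]

# (v > 0.5) is True or False; multiplying by 1 turns it into 1 or 0
labels = [(v > 0.5) * 1 for v in values]

print(labels)  # [0, 0, 1, 1]
```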
What’s next!
You have done an excellent job of setting up your Python environment. Next, we will get used to Jupyter, basic Python programming, and the most exciting part: data analytics! See you in my future posts.