An illustration of basic NumPy operations, including adding arrays, functions, and the dot product

Previously, we talked about creating and slicing NumPy arrays. Now, let us see what else we can do with them. In short, a lot! The library has a huge number of tools for numbers, vectors, matrices, tensors, and more, and many of its operations are useful for tabular data analytics. In this post, I will introduce the basic elementwise operations and some common functions in NumPy. You can access the complete code in this notebook.

Basic elementwise NumPy operations

All the basic arithmetic operators in Python, including +, -, *, /, %, //, and **, can be applied to NumPy arrays. The comparison operators <, <=, ==, >=, >, and != work as well. All of these operations are elementwise, meaning that every item in the array goes through the same calculation, and the results are collected into a new array. You can see an illustration in the following figure.

An illustration of elementwise NumPy operations between arrays and numbers

For example, +10, -5, and **2 are each applied to the whole array below. (By the way, do not forget to import and alias NumPy in your new Jupyter session.)

In [1]:

import numpy as np

an_array = np.array([1, 5, 4, 9, 7])

In [3]:

an_array + 10

Out[3]:

array([11, 15, 14, 19, 17])

In [4]:

an_array - 5

Out[4]:

array([-4,  0, -1,  4,  2])

In [6]:

an_array ** 2

Out[6]:

array([ 1, 25, 16, 81, 49])
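The remaining arithmetic operators from the list above (/, //, and %) behave the same way. Below is a minimal sketch that reuses the same array; the variable names true_div, floor_div, and remainder are only illustrative.

```python
import numpy as np

an_array = np.array([1, 5, 4, 9, 7])

# True division always produces floats, even for integer input.
true_div = an_array / 2

# Floor division keeps the integer type and rounds down.
floor_div = an_array // 2

# The remainder after dividing each item by 2 (1 for odd, 0 for even).
remainder = an_array % 2

print(true_div.tolist())   # [0.5, 2.5, 2.0, 4.5, 3.5]
print(floor_div.tolist())  # [0, 2, 2, 4, 3]
print(remainder.tolist())  # [1, 1, 0, 1, 1]
```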

Likewise, a comparison between an array and a single number is evaluated for each item. The result is an array of Booleans.

In [10]:

an_array > 5

Out[10]:

array([False, False, False,  True,  True])

In [11]:

an_array < 5

Out[11]:

array([ True, False,  True, False, False])

In [14]:

an_array == 5

Out[14]:

array([False,  True, False, False, False])
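The other comparison operators follow the same pattern. As a small sketch with the same array (the names not_five and at_least_five are just for illustration), here are != and >=, together with a check that the result really has the Boolean type.

```python
import numpy as np

an_array = np.array([1, 5, 4, 9, 7])

# Each item is compared with 5, producing one Boolean per item.
not_five = an_array != 5
at_least_five = an_array >= 5

print(not_five.tolist())       # [True, False, True, True, True]
print(at_least_five.tolist())  # [False, True, False, True, True]
print(not_five.dtype)          # bool
```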

You can also use all of the previously mentioned operations between two arrays. In the simplest case, the two arrays must have the same shape (the same numbers of rows and columns). In general, this condition is less strict thanks to NumPy broadcasting, but that is a topic for another day. For now, let us assume that the two arrays have the same shape. The operation then generates a new array in which each item is the result computed from the items at the same position in the two inputs. An illustration and examples are below.

An illustration of elementwise operations between arrays

In [19]:

array1 = np.array([5,1,8,6,2])
array2 = np.array([7,4,3,0,5])

In [20]:

array1 + array2

Out[20]:

array([12,  5, 11,  6,  7])

In [21]:

array1 * array2

Out[21]:

array([35,  4, 24,  0, 10])

In [22]:

array1 > array2

Out[22]:

array([False, False,  True,  True, False])
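To see why the same-shape assumption matters, here is a small sketch of what happens when it is violated. The four-item short_array is a made-up example, not part of the original notebook. Because five items cannot be paired with four, NumPy raises a ValueError instead of guessing.

```python
import numpy as np

array1 = np.array([5, 1, 8, 6, 2])
short_array = np.array([7, 4, 3, 0])  # hypothetical array with only four items

try:
    array1 + short_array
    shapes_matched = True
except ValueError as err:
    # The items cannot be paired position by position.
    shapes_matched = False
    print('cannot add the arrays:', err)

print(shapes_matched)  # False
```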

Functions in NumPy operations

Besides the regular operators, NumPy also provides a large collection of functions: powers, logarithms, arithmetic, trigonometry, and much more. You can find a complete list in the library’s documentation. Some of the most common functions, from my perspective, are log(), exp(), sum(), mean(), std(), and var(). All of these functions come from NumPy, so we use the library’s name or alias to call them. For example:

In [40]:

an_array = np.array([6,4,3,5,2])

In [43]:

np.log(an_array)

Out[43]:

array([1.79175947, 1.38629436, 1.09861229, 1.60943791, 0.69314718])

In [44]:

np.sin(an_array)

Out[44]:

array([-0.2794155 , -0.7568025 ,  0.14112001, -0.95892427,  0.90929743])
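The exp() function mentioned above works the same way as log() and sin(). The sketch below applies it to the same array and then takes the logarithm of the result, which should recover the original values up to floating-point rounding. The names exp_values and round_trip are only illustrative.

```python
import numpy as np

an_array = np.array([6, 4, 3, 5, 2])

# e raised to the power of each item.
exp_values = np.exp(an_array)

# log() is the inverse of exp(), so this returns to the starting values.
round_trip = np.log(exp_values)

# allclose() compares with a small tolerance for floating-point error.
print(np.allclose(round_trip, an_array))  # True
```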

In short, trigonometric and other mathematical functions are elementwise: they return an array holding the result of applying the function to each item, as you can see above with log() and sin(). Statistical functions, on the other hand, summarize the input and produce one result for the whole array, or one result for each row or column, collected in a smaller array. I will first showcase these functions applied to the whole input.

In [46]:

np.sum(an_array)

Out[46]:

20

In [47]:

np.mean(an_array)

Out[47]:

4.0

In [48]:

np.median(an_array)

Out[48]:

4.0

In [49]:

np.var(an_array)

Out[49]:

2.0

In [50]:

np.std(an_array)

Out[50]:

1.4142135623730951
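The variance and standard deviation above can also be rebuilt from the elementwise operations covered earlier, which is a handy way to check what the functions compute. This is a sketch rather than how NumPy implements them internally. It also shows that var() divides by the number of items by default; passing ddof=1 gives the sample variance, which divides by one fewer.

```python
import numpy as np

an_array = np.array([6, 4, 3, 5, 2])

# Distance of each item from the mean (elementwise subtraction).
deviations = an_array - np.mean(an_array)

# Variance is the mean of the squared deviations.
manual_var = np.mean(deviations ** 2)

# Standard deviation is the square root of the variance.
manual_std = np.sqrt(manual_var)

# Sample variance divides by (n - 1) instead of n.
sample_var = np.var(an_array, ddof=1)

print(manual_var)  # 2.0, the same as np.var(an_array)
print(sample_var)  # 2.5
```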

The `axis` option

To use these functions to obtain statistics along rows or columns, we need to add the argument axis=. With axis=0, the function returns one value for each column; with axis=1, it returns one value for each row. Below is an illustration with mean() and the different options for axis.

An illustration of the summary function mean() applied to a NumPy array with axis=0 and axis=1

Now let us observe the code below. We can see that axis=0 generates four results whose magnitudes match those of the four columns, so these are the column means. In contrast, axis=1 generates five fairly similar numbers, one for each row, so these are the row means. You can do the same thing with the other statistical functions, so do try that out.

In [54]:

data = np.array([
    [3, 20, 100, 392],
    [2, 14, 89, 453],
    [5, 11, 153, 412],
    [1, 24, 121, 312],
    [3, 22, 90, 431]
])

In [55]:

np.mean(data, axis=0)

Out[55]:

array([  2.8,  18.2, 110.6, 400. ])

In [56]:

np.mean(data, axis=1)

Out[56]:

array([128.75, 139.5 , 145.25, 114.5 , 136.5 ])
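As a quick sketch of the same idea with other functions, the code below applies sum() along axis=0 and max() along axis=1 to the same data, and prints the shapes to confirm how many values each option returns. The names col_sums and row_max are only illustrative.

```python
import numpy as np

data = np.array([
    [3, 20, 100, 392],
    [2, 14, 89, 453],
    [5, 11, 153, 412],
    [1, 24, 121, 312],
    [3, 22, 90, 431]
])

# axis=0 collapses the rows: one sum per column.
col_sums = np.sum(data, axis=0)

# axis=1 collapses the columns: one maximum per row.
row_max = np.max(data, axis=1)

print(data.shape)         # (5, 4)
print(col_sums.tolist())  # [14, 91, 553, 2000]
print(row_max.tolist())   # [392, 453, 412, 312, 431]
```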

Applications in data analytics

Needless to say, all the operations and functions that I mentioned, in combination with array slicing, are widely used throughout data analysis. So, let me give a small data example and see what we can do with it. The data below contains three years of GPAs for several students:

Student ID	First year GPA	Second year GPA	Current GPA
0001252	3.12	3.31	3.54
0003215	2.57	2.59	2.55
0002324	2.39	2.78	3.11
0001012	3.21	2.91	2.73
0002151	3.52	3.55	3.62

First, we create the data as a NumPy array. Note that I do not include the student IDs here because they are not of interest at the moment.

In [57]:

gpas = np.array([
    [3.12, 3.31, 3.54],
    [2.57, 2.59, 2.55],
    [2.39, 2.78, 3.11],
    [3.21, 2.91, 2.73],
    [3.52, 3.55, 3.62]
])

We will talk about exploratory analysis later, but the means and standard deviations of columns are always nice to look at to get a general idea of their values. We can now compute them using NumPy. As an example of how to interpret them, the average first-year GPA is 2.962, and a student’s first-year GPA typically deviates from 2.962 by about 0.419. By the way, we can use the output of one function as the input to another, as you can see here. I put these two calls in print() so that their outputs both show up in the same cell.

In [62]:

print('means of GPAs:', gpas.mean(axis=0))
print('standard deviations of GPAs:', gpas.std(axis=0))

means of GPAs: [2.962 3.028 3.11 ]
standard deviations of GPAs: [0.41920878 0.35193181 0.42497059]
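Note that the cell above calls mean() and std() as methods of the array rather than as np.mean() and np.std(). As far as I know, the two forms give the same result, which the sketch below checks. It also switches to axis=1 to get one average per student instead of one per year. The name student_means is just for illustration.

```python
import numpy as np

gpas = np.array([
    [3.12, 3.31, 3.54],
    [2.57, 2.59, 2.55],
    [2.39, 2.78, 3.11],
    [3.21, 2.91, 2.73],
    [3.52, 3.55, 3.62]
])

# Method form and function form of the same calculation.
same_means = np.allclose(gpas.mean(axis=0), np.mean(gpas, axis=0))
same_stds = np.allclose(gpas.std(axis=0), np.std(gpas, axis=0))
print(same_means, same_stds)  # True True

# axis=1 averages across the three years: one value per student.
student_means = gpas.mean(axis=1)
print(student_means.shape)  # (5,)
```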

To look at how the students’ GPAs changed over the years, we slice the columns and take their differences, for example the change from year 1 to year 2 and from year 2 to year 3. We can then see which students improved the most and whose performance dropped.

In [63]:

gpas[:,1] - gpas[:,0]

Out[63]:

array([ 0.19,  0.02,  0.39, -0.3 ,  0.03])

In [65]:

gpas[:,2] - gpas[:,1]

Out[65]:

array([ 0.23, -0.04,  0.33, -0.18,  0.07])
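Instead of reading the differences by eye, we can let NumPy point to the rows in question. The sketch below uses argmax() and argmin(), two functions not covered in this post that return the position of the largest and smallest item. It applies them to the change from the first year to the current year. The variable names are only illustrative.

```python
import numpy as np

gpas = np.array([
    [3.12, 3.31, 3.54],
    [2.57, 2.59, 2.55],
    [2.39, 2.78, 3.11],
    [3.21, 2.91, 2.73],
    [3.52, 3.55, 3.62]
])

# Change from the first year to the current year for each student.
total_change = gpas[:, 2] - gpas[:, 0]

# Row positions (counting from 0) of the biggest gain and biggest loss.
most_improved = np.argmax(total_change)
largest_drop = np.argmin(total_change)

print(most_improved)  # 2, the third row (student 0002324)
print(largest_drop)   # 3, the fourth row (student 0001012)
```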

Finally, a log transformation is very commonly applied to data. We can perform it here very easily.

In [66]:

np.log(gpas)

Out[66]:

array([[1.137833  , 1.19694819, 1.26412673],
       [0.9439059 , 0.95165788, 0.93609336],
       [0.87129337, 1.02245093, 1.13462273],
       [1.16627094, 1.06815308, 1.00430161],
       [1.25846099, 1.2669476 , 1.28647403]])

Conclusion

In this post, I briefly introduced the basics of NumPy elementwise operations and some common functions. Of course, the library can do much more. Next, we will discuss the concept of concatenation. So, see you there!
