A collection is not a strange concept. It is just a bunch of objects put together for a certain reason. You may have a collection of pictures, songs, books, movies, stamps… almost anything. Collections in programming, and in Python specifically, follow the same idea: they are objects of some type put together so that they are easy to access and use later. There are many types of collections in Python. In this post, I will briefly describe three of them, namely lists, tuples, and dictionaries. You can download the complete notebook here.
Lists in Python
Lists are called ordered collections. This means the items in a list are kept in order, and we can access them using their indexes. First, let us talk about how to create a list. If you put items together, separated by commas (,), and wrap them inside a pair of square brackets ([]), you have just got yourself a list. More formally, the syntax is as follows
list_name = [<val_1>, <val_2>, ...]
with list_name being the variable that will store the list, and val_1, val_2, … being the items you want to put into the list. In Python, these values can be pretty much anything. Furthermore, you can have as many items in a list as you want (as long as the computer has enough memory). A list can also contain variables, as long as you have defined them beforehand. If you print the variable that refers to a list, you will see all of its items.
# a list of numbers
a_list = [1,2,3,4,5]
#we can use print() to print out all items in a list
print(a_list)
[1, 2, 3, 4, 5]
# a list of strings
another_list = ['a','b','c','d','e']
print(another_list)
['a', 'b', 'c', 'd', 'e']
# a list of variables; remember, you need to create them first
x = 10
y = 20
z = 30
list_4 = [x,y,z]
print(list_4)
[10, 20, 30]
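The following small sketch makes the "pretty much anything" point concrete. The names mixed_list and list_5 are made up for illustration. The sketch shows a list that mixes several types. It also shows that a list holds the objects the variables referred to when the list was created, so rebinding a variable afterwards does not change the list:

```python
# A list can mix items of different types, including another list
mixed_list = [1, 'two', 3.0, True, [4, 5]]
print(mixed_list)         # [1, 'two', 3.0, True, [4, 5]]
print(len(mixed_list))    # len() gives the number of items: 5

# The list refers to the objects x, y, z pointed to at creation time
x = 10
y = 20
z = 30
list_5 = [x, y, z]
x = 99                    # rebinding x to a new value afterwards...
print(list_5)             # ...leaves the list unchanged: [10, 20, 30]
```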
Indexing in lists
Lists index items using integer numbers starting from 0. You can access an item using its index value. The syntax is list_name[<item_index>]. For example
a_list = [10,4,6,6,12,61,78,34,90,73]
print(a_list[0])
print(a_list[1])
print(a_list[5])
print(a_list[9])
10
4
61
73
In the code above, a_list[0] takes the first item in the list a_list, which is 10, so the first print() displays 10 as output. Similarly, a_list[5] and a_list[9] give the sixth and tenth (also last) items, which are 61 and 73. So, the indexes of items in a list start from 0 and end at the list size minus 1. Additionally, Python has a negative index system to access a list from the end. It starts from -1 for the last item and ends at -(list size) for the first item. The cell below showcases some negative indexes.
print(a_list[-1])
print(a_list[-2])
print(a_list[-5])
73
90
61
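You can check the rules about the first and last indexes with len(), which returns the number of items in a list. The following minimal sketch reuses the same a_list. The name size is just a helper variable for this example:

```python
a_list = [10, 4, 6, 6, 12, 61, 78, 34, 90, 73]
size = len(a_list)  # 10 items

# Positive indexes run from 0 (first item) to size - 1 (last item)
print(a_list[0], a_list[size - 1])    # 10 73

# Negative indexes run from -1 (last item) to -size (first item)
print(a_list[-1], a_list[-size])      # 73 10

# Any index outside those ranges raises an IndexError
try:
    a_list[size]
except IndexError as err:
    print('IndexError:', err)         # IndexError: list index out of range
```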
Slicing lists
Indexes let us access individual items in a list. For multiple items, we use the slicing technique. The syntax of slicing is list_name[start:stop:step]. Here, the start:stop:step part generates an index sequence very similar to the one range() produces:
– It consists of integer numbers
– It begins from start
– It stops before reaching stop (stop itself is excluded)
– It is incremented by step
When step is positive, omitting start means slicing from the beginning of the list, and omitting stop means slicing until the end. Omitting step means the increment is 1. The few cells below demonstrate slicing of the a_list created previously.
a_list[0:5]
[10, 4, 6, 6, 12]
a_list[3:9]
[6, 12, 61, 78, 34, 90]
a_list[2:-1]
[6, 6, 12, 61, 78, 34, 90]
a_list[1:-2:2]
[4, 6, 61, 34]
a_list[8:3:-2]
[90, 78, 12]
a_list[::-1]
[73, 90, 34, 78, 61, 12, 6, 6, 4, 10]
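As you can see, step can be negative, in which case the slice walks from the end of the list back toward the beginning. With a negative step, an omitted start means the end of the list and an omitted stop means the beginning, which is why a_list[::-1] reverses the whole list.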
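Because start:stop:step behaves much like range(start, stop, step), you can rebuild a slice by looking up each index that range() generates. The sketch below checks this on a_list. It assumes non-negative start and stop values, so that range() and the slice agree directly. Note that a_list[1:-2:2] from above is the same slice as a_list[1:8:2], because -2 refers to index 8 in a 10-item list. The built-in slice() object stores the same three numbers:

```python
a_list = [10, 4, 6, 6, 12, 61, 78, 34, 90, 73]

# range(1, 8, 2) generates 1, 3, 5, 7 (8 itself is excluded)
by_hand = [a_list[i] for i in range(1, 8, 2)]
print(by_hand)            # [4, 6, 61, 34]
print(a_list[1:8:2])      # same items: [4, 6, 61, 34]

# A negative step walks backwards: range(8, 3, -2) generates 8, 6, 4
print([a_list[i] for i in range(8, 3, -2)])   # [90, 78, 12]
print(a_list[8:3:-2])                          # [90, 78, 12]

# slice() packages start, stop and step into one object;
# None for stop means "until the end", like omitting it
every_other = slice(0, None, 2)
print(a_list[every_other])   # [10, 6, 12, 78, 90]
```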
Lists and for loops
Lists match nicely with for loops, because we can use a for loop to iterate through each item in a list. We do that by simply replacing the range() part of a for loop with the list's name. Besides printing, you can use an accumulator to get the sum of all items. For example,
for item in a_list:
    print(item)
10
4
6
6
12
61
78
34
90
73
accumulator = 0
for item in a_list:
    accumulator = accumulator + item
print(accumulator)
374
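You can also write the accumulator with the += shorthand, and Python's built-in sum() does the whole accumulation in one call. Here is a short sketch. The average line is an extra illustration and does not come from the notebook:

```python
a_list = [10, 4, 6, 6, 12, 61, 78, 34, 90, 73]

# The accumulator pattern, written with the shorthand += operator
accumulator = 0
for item in a_list:
    accumulator += item   # same as accumulator = accumulator + item
print(accumulator)        # 374

# The built-in sum() does the same accumulation in a single call
print(sum(a_list))        # 374

# Dividing the total by len() gives the average of the items
average = accumulator / len(a_list)
print(average)            # 37.4
```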
Tuples in Python
Tuples are similar to lists in that they both store items in order. You can also access items in tuples using indexes and slices. The difference between tuples and lists is that lists are mutable and tuples are immutable. More specifically, you can change items in a list after creating it, but items in a tuple cannot be modified. The cells below demonstrate the mutability and immutability of lists and tuples. By the way, creating tuples is just like creating lists, except that you replace the square brackets [] with parentheses ().
a_list = [10,20,30]
a_list[1] = 100
print(a_list)
[10, 100, 30]
a_tuple = (10,20,30)
print(a_tuple)
(10, 20, 30)
a_tuple[1] = 100
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-99-91bc41c29b37> in <module>
----> 1 a_tuple[1] = 100

TypeError: 'tuple' object does not support item assignment
Due to their immutability, tuples are safer to use because you cannot accidentally change their contents. They can also be a bit faster than lists. So, depending on your needs, you can choose between lists and tuples.
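Here is a sketch of the other tuple operations mentioned above. Indexing and slicing work the same way as on lists, while assignment raises the TypeError shown in the traceback. If you really need a modified version, one common approach is to convert the tuple to a list, change it, and convert it back. The names temp and new_tuple are only for illustration:

```python
a_tuple = (10, 20, 30)

# Indexing and slicing work on tuples exactly as on lists
print(a_tuple[0])      # 10
print(a_tuple[-1])     # 30
print(a_tuple[0:2])    # (10, 20); slicing a tuple returns a tuple

# Assignment fails, so catch the TypeError and show its message
try:
    a_tuple[1] = 100
except TypeError as err:
    print(err)         # 'tuple' object does not support item assignment

# To "change" a tuple, copy it into a list, modify it, and convert back
temp = list(a_tuple)
temp[1] = 100
new_tuple = tuple(temp)
print(new_tuple)       # (10, 100, 30)
print(a_tuple)         # the original is untouched: (10, 20, 30)
```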
Dictionaries in Python
Dictionaries are another type of collection in Python. They store elements called key-value pairs. A key-value pair is similar to a word and its meaning that you can look up in a real dictionary. With Python dictionaries, you can use a key to locate and obtain a specific value. Dictionaries are not indexed by position, so we cannot use integer indexes or slices on them. (Since Python 3.7 they do remember the order in which items were inserted, but lookups always go through keys.) The syntax to create a dictionary is as follows
dictionary_name = {
    key_1: value_1,
    key_2: value_2,
    ...
}
In a key: value pair, the key is usually a string (it can be something else, but that is a story for another day), and the value can be pretty much anything. Like lists, dictionaries can hold as many items as you want. To obtain the value of a key, we use the syntax dictionary_name[key]. This looks like list indexing, except that the “index” is a key. Below are some examples of creating a dictionary and accessing its items.
state_capitals={
'New York': 'Albany',
'New Jersey': 'Trenton',
'Georgia' : 'Atlanta',
'Texas' : 'Austin',
'Washington' : 'Olympia'
}
state_capitals
{'New York': 'Albany', 'New Jersey': 'Trenton', 'Georgia': 'Atlanta', 'Texas': 'Austin', 'Washington': 'Olympia'}
print(state_capitals)
{'New York': 'Albany', 'New Jersey': 'Trenton', 'Georgia': 'Atlanta', 'Texas': 'Austin', 'Washington': 'Olympia'}
state_capitals['New York']
'Albany'
state_capitals['Georgia']
'Atlanta'
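Looking up a key that is not in a dictionary raises a KeyError. The following small sketch uses a shortened copy of state_capitals. It shows the error together with two safer alternatives. The in operator checks keys (not values), and the get() method returns a default value when the key is missing. 'Ohio' and 'unknown' are just example values:

```python
state_capitals = {
    'New York': 'Albany',
    'Georgia': 'Atlanta',
}

# Looking up a key that is not in the dictionary raises a KeyError
try:
    state_capitals['Ohio']
except KeyError as err:
    print('KeyError:', err)         # KeyError: 'Ohio'

# The in operator checks whether a key exists
print('Georgia' in state_capitals)  # True
print('Ohio' in state_capitals)     # False

# get() returns the given default instead of raising an error
print(state_capitals.get('Ohio', 'unknown'))     # unknown
print(state_capitals.get('Georgia', 'unknown'))  # Atlanta
```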
Finally, we can add new key-value pairs to a dictionary with the syntax dictionary_name[new_key] = new_value. Be careful, though: if new_key is already in the dictionary, its old value will be overwritten. For example
state_capitals['Florida'] = 'Tallahassee'
state_capitals['Alabama'] = 'Montgomery'
state_capitals['Florida']
'Tallahassee'
state_capitals['New York'] = "I don't know!"
state_capitals['New York']
"I don't know!"
Conclusion
In this post, I discussed three common types of collections in Python: lists, tuples, and dictionaries. You will see them pretty often in data analysis, so understanding them now will be helpful. Also, the concepts of indexing and slicing are highly important because we do the same things on data sets. Please do practice these two skills until you are comfortable with them. I will stop this long post now, so see you next time!