8. Basic Programming
Machine dreams hold a special vertigo.
–William Gibson[1]
Up to this point in the book I’ve tried hard to avoid using the word “programming” too much because – at least in my experience – it’s a word that can cause a lot of fear. For one reason or another, programming (like mathematics and statistics) is often perceived by people on the “outside” as a black art, a magical skill that can be learned only by some kind of super-nerd. I think this is a shame. It’s certainly true that advanced programming is a very specialised skill: several different skills actually, since there’s quite a lot of different kinds of programming out there. However, the basics of programming aren’t all that hard, and you can accomplish a lot of very impressive things just using those basics.
With that in mind, the goal of this chapter is to discuss a few basic programming concepts and how to apply them in Python. However, before I do, I want to make one further attempt to point out just how non-magical programming really is, via one very simple observation: you already know how to do it. Stripped to its essentials, programming is nothing more (and nothing less) than the process of writing out a bunch of instructions that a computer can understand. To phrase this slightly differently, when you write a computer program, you need to write it in a programming language that the computer knows how to interpret. Python is one such language. Although I’ve been having you type all your commands at the command prompt, and all the commands in this book so far have been shown as if that’s what I were doing, it’s also quite possible (and as you’ll see shortly, shockingly easy) to write a program using these Python commands. In other words, if this is the first time reading this book, then you’re only one short chapter away from being able to legitimately claim that you can program in Python, albeit at a beginner’s level.
8.1. Scripts
Computer programs come in quite a few different forms: the kind of program that we’re mostly interested in from the perspective of everyday data analysis using Python is known as a script. The idea behind a script is that, instead of typing your commands into the Python console one at a time, you write them all in a text file, or in a “notebook” if you are using Jupyter Notebooks. Then, once you’ve finished writing them and saved the file, you can get Python to execute all the commands at once. In a moment I’ll show you exactly how this is done, but first I’d better explain why you should care.
8.1.1. Why use scripts?
Before discussing scripting and programming concepts in any more detail, it’s worth stopping to ask why you should bother. After all, if you look at the Python commands that I’ve used everywhere else in this book, you’ll notice that they’re all formatted as if I were typing them at the command line. Outside this chapter you won’t actually see any scripts. Do not be fooled by this. I’ve done it that way purely for pedagogical reasons. My goal in this book is to teach statistics and to teach Python. To that end, what I’ve needed to do is chop everything up into tiny little slices: each section tends to focus on one kind of statistical concept, and only a smallish number of Python functions. As much as possible, I want you to see what each function does in isolation, one command at a time. Forcing myself to write everything as if it were being typed at the command line imposes a kind of discipline on me: it prevents me from piecing together lots of commands into one big script. From a teaching (and learning) perspective I think that’s the right thing to do… but from a data analysis perspective, it is not. When you start analysing real world data sets, you will rapidly find yourself needing to write scripts.
To understand why scripts are so very useful, it may be helpful to consider the drawbacks to typing commands directly at the command prompt. The approach that we’ve been adopting so far, in which you type commands one at a time, and Python sits there patiently in between commands, is referred to as the interactive style. Doing your data analysis this way is rather like having a conversation … a very annoying conversation between you and your data set, in which you and the data aren’t directly speaking to each other, and so you have to rely on Python to pass messages back and forth. This approach makes a lot of sense when you’re just trying out a few ideas: maybe you’re trying to figure out what analyses are sensible for your data, or maybe you’re just trying to remember how the various Python functions work, so you’re typing in a few commands until you get the one you want. In other words, the interactive style is very useful as a tool for exploring your data. However, it has a number of drawbacks:
It’s hard to save your work effectively. You can save the workspace, so that later on you can load any variables you created. You can save your plots as images. And you can even save the history or copy the contents of the Python console to a file. Taken together, all these things let you create a reasonably decent record of what you did. But it does leave a lot to be desired. It seems like you ought to be able to save a single file that Python could use (in conjunction with your raw data files) and reproduce everything (or at least, everything interesting) that you did during your data analysis.
It’s annoying to have to go back to the beginning when you make a mistake. Suppose you’ve just spent the last two hours typing in commands. Over the course of this time you’ve created lots of new variables and run lots of analyses. Then suddenly you realise that there was a nasty typo in the first command you typed, so all of your later numbers are wrong. Now you have to fix that first command, and then spend another hour or so combing through the Python history to try and recreate what you did.
You can’t leave notes for yourself. Sure, you can scribble down some notes on a piece of paper, or even save a Word document that summarises what you did. But what you really want to be able to do is write down an English translation of your Python commands, preferably right “next to” the commands themselves. That way, you can look back at what you’ve done and actually remember what you were doing. In the simple exercises we’ve engaged in so far, it hasn’t been all that hard to remember what you were doing or why you were doing it, but only because everything we’ve done could be done using only a few commands, and you’ve never been asked to reproduce your analysis six months after you originally did it! When your data analysis starts involving hundreds of variables, and requires quite complicated commands to work, then you really, really need to leave yourself some notes to explain your analysis to, well, yourself.
It’s nearly impossible to reuse your analyses later, or adapt them to similar problems. Suppose that, sometime in January, you are handed a difficult data analysis problem. After working on it for ages, you figure out some really clever tricks that can be used to solve it. Then, in September, you get handed a really similar problem. You can sort of remember what you did, but not very well. You’d like to have a clean record of what you did last time, how you did it, and why you did it the way you did. Something like that would really help you solve this new problem.
It’s hard to do anything except the basics. There’s a nasty side effect of these problems. Typos are inevitable. Even the best data analyst in the world makes a lot of mistakes. So the chances that you’ll be able to string together dozens of correct Python commands in a row are very small. So unless you have some way around this problem, you’ll never really be able to do anything other than simple analyses.
It’s difficult to share your work with other people. Because you don’t have this nice clean record of what Python commands were involved in your analysis, it’s not easy to share your work with other people. Sure, you can send them all the data files you’ve saved, and your history and console logs, and even the little notes you wrote to yourself, but odds are pretty good that no-one else will really understand what’s going on (trust me on this: I’ve been handed lots of random bits of output from people who’ve been analysing their data, and it makes very little sense unless you’ve got the original person who did the work sitting right next to you explaining what you’re looking at).
Ideally, what you’d like to be able to do is something like this… Suppose you start out with a data set myrawdata.csv. What you want is a single document – let’s call it mydataanalysis.py – that stores all of the commands that you’ve used in order to do your data analysis. It would only include the commands that you want to keep for later. Then, later on, instead of typing in all those commands again, you’d just tell Python to run all of the commands that are stored in mydataanalysis.py. Also, in order to help you make sense of all those commands, what you’d want is the ability to add some notes or comments within the file, so that anyone reading the document for themselves would be able to understand what each of the commands actually does. But these comments wouldn’t get in the way: when you try to get Python to run mydataanalysis.py it would be smart enough to recognise that these comments are for the benefit of humans, and so it would ignore them. Later on you could tweak a few of the commands inside the file (maybe in a new file called mynewdatanalaysis.py) so that you can adapt an old analysis to be able to handle a new problem. And you could email your friends and colleagues a copy of this file so that they can reproduce your analysis themselves.
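To make that wish list a little more concrete, here is a minimal sketch of what the inside of such a file might look like. The numbers are made up and typed directly into the script so that it runs on its own; a real mydataanalysis.py would load them from myrawdata.csv instead. Everything after a # on a line is a comment, which Python ignores.

```python
# mydataanalysis.py (a hypothetical script, for illustration only)

# Step 1: get the data. These scores are invented for this sketch;
# a real script would read them from myrawdata.csv.
scores = [12, 15, 9]

# Step 2: count how many scores we have
num_scores = len(scores)

# Step 3: calculate the mean score
mean_score = sum(scores) / num_scores

# Step 4: report the result
print('The mean of', num_scores, 'scores is', mean_score)
```

Running the whole file repeats all four steps in order, and the comments tell a reader (including future you) what each step is for.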
In other words, what you want is a script. The mechanics of exactly where you write your script, and how you run it are a bit beyond what I can cover here. You could write your script in a text file, save it with a file name that ends in .py, and then run it with a terminal command. You could use a so-called IDE (Integrated Development Environment), basically a program for writing programs. At the time I am writing this, a very popular IDE which can support many different programming languages is Visual Studio Code, but there are many other good ones out there. I’m not even going to try to list them, because I would be leaving too many good options out.
Another very popular way to write scripts is to use something called Jupyter Notebooks. Jupyter Notebooks are a way to write your code in little cells, rather than in one long text document. This allows you to run individual parts of your script separately, rather than running the whole thing at once, and this can be very useful for figuring stuff out. You can work on one cell until you get it to do what you want, then work on the next cell. Later, you may start combining cells so that you can run bigger and bigger chunks of code at once. Eventually you may find you want to simply copy all the code out of your Jupyter Notebook and paste it into a text document with a .py file extension, so you can just run it all at once. For learning coding, and for developing new ideas and analyses, Jupyter Notebooks are a very good option. Some programs, like Visual Studio Code, allow you to run Jupyter Notebooks inside an IDE. That’s what I’m doing as I type these words.
But I digress. There are 101 different ways to write, save, and run Python scripts. The key message here is to find a way that works for you, with the resources you have available to you. Let’s leave the question of where you will write your code behind, and turn to something more exciting: some of the basic concepts of programming.
8.2. Loops
Oh boy, loops. For some reason, the concept of loops is often very difficult for people to grasp when they are first exposed to it. I’m not sure why, because we use loops constantly in our daily life.
Imagine you are putting sugar in your tea or coffee. Let’s say you like it very sweet. So you dip your spoon in the sugar bowl, and pour the sugar in the tea. Then you dip your spoon in the sugar bowl, and then you pour the sugar into the tea. And then you do it one more time. After three spoonfuls, you’re done.
Here’s another example. You’re baking a cake, and the recipe calls for four eggs. You line the eggs up on the counter, and then, one by one, you crack each one and pour the yolk and egg white into the mixing bowl, and throw away the shell. You do the same thing for each egg.
Both of these are examples of loops: you perform the same action over and over, until you’re done. In programming terms, the sugar-in-the-tea loop is an example of a while loop. You could put as many or as few spoonfuls of sugar in, but you keep going until the tea is sweet enough for you. Put differently, while the tea is not sweet enough, you keep doing the sugar loop. As soon as the condition has been met (the tea is no longer not sweet enough), you stop. The eggs are an example of a for loop. There is a finite number of eggs (in this case four), and you keep performing the same action until you have made it through all of them. That is, for each egg, you do the same thing (crack, pour, dispose) until there are no more eggs.
Programming loops work exactly the same way. Let’s take a look.
8.2.1. While loops
I’ll stick with the tea example for demonstrating the syntax for while loops, because I’m not very imaginative. Also, I could totally go for a cup of tea right now, but I have to stay here and type this instead.
Take a look at the code below. Python needs to know when to stop doing the same thing again and again, so we start by defining an end point. Here, I have declared that my tea will be sweet enough when Python has put 3 spoonfuls of sugar in: I have set the variable sweet_enough to 3. In the next line, I have set a starting point. Before the loop starts, there is no sugar in my tea, so I have set the variable num_sugar_spoons to 0. Then comes the loop: as long as (while) the value of num_sugar_spoons is less than the stopping point sweet_enough, the loop continues adding 1 to num_sugar_spoons. Just to keep track of the progress of the loop, I have thrown a print statement in as well.
sweet_enough = 3
num_sugar_spoons = 0
while num_sugar_spoons < sweet_enough:
    num_sugar_spoons = num_sugar_spoons + 1
    print('I have added', num_sugar_spoons, 'spoons of sugar')
I have added 1 spoons of sugar
I have added 2 spoons of sugar
I have added 3 spoons of sugar
Pretty easy, right?
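In the tea example we knew ahead of time that the loop would run three times. A while loop also works when you do not know in advance how many rounds it will take. The sketch below is my own made-up illustration (the names amount and num_halvings are invented for it): it keeps halving a number for as long as the number is bigger than 1, and counts how many halvings that takes.

```python
amount = 100       # the starting value
num_halvings = 0   # how many times we have gone around the loop

# Keep going for as long as the condition is true
while amount > 1:
    amount = amount / 2
    num_halvings = num_halvings + 1

print('It took', num_halvings, 'halvings to get down to', amount)
```

One word of warning: if the condition never becomes false (say, if we forgot the line that changes amount), the loop would never stop on its own.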
8.2.2. for loops
For loops have a very similar structure. Let’s crack some eggs:
eggs = ['egg', 'egg', 'egg']
for egg in eggs:
    print('I cracked an egg!')
I cracked an egg!
I cracked an egg!
I cracked an egg!
Alright, I admit this was a pretty silly example. But hopefully you get the idea: I started with a finite number of items (in this case a list that contains three instances of the string ‘egg’), then I had Python do something for each item in the list. Let’s do a few more, slightly (but not very) more interesting ones:
ingredients = ['eggs', 'flour', 'sugar', 'cinnamon', 'salt']
for item in ingredients:
    print('I added', item)
I added eggs
I added flour
I added sugar
I added cinnamon
I added salt
words = ['eggs', 'flour', 'sugar', 'cinnamon', 'salt']
for word in words:
    print('There are', len(word), 'letters in', word.upper())
There are 4 letters in EGGS
There are 5 letters in FLOUR
There are 5 letters in SUGAR
There are 8 letters in CINNAMON
There are 4 letters in SALT
As these (admittedly silly) examples illustrate, we can use for loops to step through a series of items, and do the same thing with each one. In the first example, we just printed the same string each time. In the second one, we took each item from the list, and printed it as the final word in a sentence. In the third example, we took each word, counted the number of letters in it, and then printed the answer in a sentence in which we also converted each word to all upper case letters. These were just very, very simple examples, but I want to underscore how extremely powerful these loops are. Anything you could do to one item in a list, you can get Python to do to every item in the list. If you have a lot of actions to perform on a lot of items, this allows you to do things you would probably never be able to do manually.
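Sometimes you just want to repeat something a fixed number of times, without having a list of items handy. One common way to do this is with Python’s built-in range function, which hands the loop a series of whole numbers starting from 0. Here is a small sketch that redoes the three spoonfuls of sugar as a for loop; the list spoon_numbers is something I have added purely to keep a record of what the loop did.

```python
spoon_numbers = []  # a record of which spoonfuls we added

# range(3) supplies the numbers 0, 1 and 2, so the loop runs three times
for spoon in range(3):
    spoon_numbers.append(spoon + 1)  # add 1 so we count from 1, not 0
    print('Adding spoonful number', spoon + 1)

print(spoon_numbers)
```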
8.2.3. Loop syntax
A final thing about loops for now: if you look at both the for and while loops above, you may notice some similarities in the structure. First, there is a statement of the conditions (while num_sugar_spoons < sweet_enough, or for word in words), followed by a colon (:). Then, the next line or lines are indented. This syntax is crucial for telling Python that you want it to run a loop. It is customary in most programming languages to use indentation to show what things are “inside” the loop, and what things are “outside”. In Python, the indentation is mandatory. Without the proper indentation, Python will not run the loop; it will complain with an IndentationError instead.
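Here is a small sketch of how indentation decides what is inside the loop and what is outside it. The indented lines run once for every item; the final, un-indented print runs only once, after the loop has finished. The variable num_added is just something I made up to keep count.

```python
ingredients = ['eggs', 'flour', 'sugar']
num_added = 0

for item in ingredients:
    # These two lines are indented, so they are INSIDE the loop
    print('I added', item)
    num_added = num_added + 1

# This line is not indented, so it is OUTSIDE the loop and runs once
print('All done! I added', num_added, 'ingredients')
```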
Another thing to mention: when you define for loops, you also declare a new variable. As an example, when we said for word in words:, we declared a new variable, word. Like all variables in Python, we could have called it anything we liked. Writing for x in words:, for item in words:, or for flibbertigibbet in words: would all work the same. But the same advice that applies to other variables also applies here: don’t get too cute. Give the variable a name that makes sense to you.
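Because the loop variable is a perfectly ordinary variable, it does not vanish when the loop ends. It simply keeps the last value it was given. A quick sketch:

```python
words = ['eggs', 'flour', 'sugar']

for word in words:
    print(word)

# The loop is over, but the variable is still there,
# holding the last item from the list
print('After the loop, word is still', word)
```

This is worth knowing mostly so that it doesn’t surprise you: if you already had a variable called word before the loop, the loop will have overwritten it.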
Finally, be aware that this new variable that you declare at the beginning of your loop will inherit its type from the item in the list that it represents. This has an impact on what you can do with it in the loop. For example, suppose we have a list like this:
random_stuff = ['apple', ' pear', 42]
for thing in random_stuff:
    new_thing = thing + 's'
    print(new_thing)
apples
pears
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 4
1 random_stuff = ['apple', ' pear', 42]
3 for thing in random_stuff:
----> 4 new_thing = thing + 's'
5 print(new_thing)
TypeError: unsupported operand type(s) for +: 'int' and 'str'
For the first two items in the list, Python has no problem adding an “s” to the end of them, because they are strings, and so is “s”, and Python understands what it means to add a string to a string. But the final item in the list is an integer, and Python has no way to add a string to an integer, and so it complains when it gets to that point in the loop.
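One possible way around this, if you really do want an “s” stuck on the end of everything, is to convert each item to a string first using str(). This is only a sketch of one option; whether converting 42 into the text “42” makes sense depends entirely on what you are trying to do. I have also collected the results in a list called new_things (a name I made up) so we can look at them afterwards.

```python
random_stuff = ['apple', 'pear', 42]
new_things = []

for thing in random_stuff:
    # str() turns the integer 42 into the string '42';
    # items that are already strings are left unchanged
    new_thing = str(thing) + 's'
    new_things.append(new_thing)
    print(new_thing)
```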
8.3. Conditional statements
Together with variables and loops, conditional statements are probably the third most important programming concept for you to learn. Once you master variables, loops, and conditionals, there’s not much you can’t do. Like loops, conditionals are also all around us in everyday life. Unlike loops, they don’t seem to be as difficult for most people to grasp. This semester, I teach at 8:00 AM on Wednesdays. Yikes! That means that if it is a Wednesday, I need to leave the house earlier. Otherwise, I can leave at the normal time. It’s not hard to imagine this idea re-written as code:
day = 'Wednesday'
if day == 'Wednesday':
    print('Leave early!')
Leave early!
Ok, since I started by defining day as “Wednesday”, it was pretty clear what was going to happen. But still, did you notice the use of == to make the comparison between day and the string Wednesday?
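The difference between one equals sign and two is easy to trip over, so here is a tiny sketch. A single = stores a value in a variable. A double == asks a question (“are these two things equal?”) and gives back either True or False. The names is_wednesday and is_friday are made up for this illustration.

```python
day = 'Wednesday'                  # one equals sign: store a value

is_wednesday = day == 'Wednesday'  # two equals signs: compare
is_friday = day == 'Friday'

print(is_wednesday)
print(is_friday)
```

An if statement simply checks whether the answer to the question is True, and runs the indented code if it is.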
To make this example ever-so-slightly more realistic, we could define a list variable that encompasses the whole week, and place the if statement inside a for loop that does a daily check. The if statement tells Python what to do if it is Wednesday, and an else statement tells Python what to do if it is not Wednesday.
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
for day in week:
    if day == 'Wednesday':
        print('Today is', day + '.', 'You need to leave early!')
    else:
        print('Today is', day + '.', 'You can leave at the normal time today.')
Today is Monday. You can leave at the normal time today.
Today is Tuesday. You can leave at the normal time today.
Today is Wednesday. You need to leave early!
Today is Thursday. You can leave at the normal time today.
Today is Friday. You can leave at the normal time today.
Today is Saturday. You can leave at the normal time today.
Today is Sunday. You can leave at the normal time today.
Then again, it’s not just Wednesdays I need to be aware of. If it is Saturday or Sunday, then I don’t want to leave at all; it’s the weekend. So we can add an elif (“else if”) statement and an or operator to create a more complicated set of choices for Python to navigate.
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
for day in week:
    if day == 'Wednesday':
        print('Today is', day + '.', 'You need to leave early!')
    elif day == 'Saturday' or day == 'Sunday':
        print('Today is', day + '.', 'You can sleep late!')
    else:
        print('Today is', day + '.', 'You can leave at the normal time today.')
Today is Monday. You can leave at the normal time today.
Today is Tuesday. You can leave at the normal time today.
Today is Wednesday. You need to leave early!
Today is Thursday. You can leave at the normal time today.
Today is Friday. You can leave at the normal time today.
Today is Saturday. You can sleep late!
Today is Sunday. You can sleep late!
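When the list of alternatives gets longer, chaining lots of or comparisons together becomes tedious. Another way to write the weekend check, sketched below, is to put the special days in their own list and ask whether day is in that list. The names weekend and sleep_late_days are my own inventions for this example.

```python
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekend = ['Saturday', 'Sunday']
sleep_late_days = []  # a record of the days we can sleep late

for day in week:
    if day == 'Wednesday':
        print('Today is', day + '.', 'You need to leave early!')
    elif day in weekend:
        # "in" is True if day matches any item in the weekend list
        sleep_late_days.append(day)
        print('Today is', day + '.', 'You can sleep late!')
    else:
        print('Today is', day + '.', 'You can leave at the normal time today.')
```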
8.4. Functions
Strictly speaking, you won’t need to write any functions to do any of the exercises in this book, so if you’re not interested, you can skip this. On the other hand, functions are pretty cool, and once you get the hang of how they work, they can really speed things up for you, make your code simpler and easier to read, and maybe even save you from making some silly mistakes.
If you have been following along in the book, you’ve actually already been using functions; you just may not have known it. Consider the following code:
word = 'perspicacious'
print(word)
perspicacious
type(word)
str
len(word)
13
print, type, and len are all functions. They are little machines that accept an input (in this case we have given them the variable word, which contains the string “perspicacious”), and they use that input to provide some kind of output. These are all built-in functions in Python, and thank goodness for that, because they are very useful. But we can also write our own functions, and this becomes very useful when we want to do the same sort of task again and again.
Often functions are most useful when they do something fairly complicated, but just to illustrate how they work, let’s look at something quite simple. Let’s imagine that the len function didn’t exist, or we didn’t know about it, and we wanted a way to count the number of letters in a word. We could write our own function that would achieve this result.
Let’s think about how we might do this. If we couldn’t just tell Python to count the number of letters in a word directly, we could still get it to do it more manually, right? One way would be to use a loop:
letter_count = 0
for letter in 'perspicacious':
    letter_count = letter_count + 1
print('There are', letter_count, 'letters in PERSPICACIOUS')
There are 13 letters in PERSPICACIOUS
This works great, but if we want to use this to count the letters in any word other than “perspicacious”, we would need to alter our script in two places: once in the for loop where we tell it what word to count the letters in, and again in the print statement, where we would have to retype the new word with capital letters.
However, if we take this basic code, and put it into a function, we can re-use it again and again.
def measure_word(word):
    letter_count = 0
    for letter in word:
        letter_count = letter_count + 1
    print('There are', letter_count, 'letters in', word.upper())
The first line in the function above tells Python we want to define a function called measure_word, and that this function will accept one “argument” (input). Then, indented, inside the function, we can just put our code from the loop above. The only change is that now, instead of hard-coding the strings “perspicacious” and “PERSPICACIOUS” into the code, we just replace these with the variable which contains the input argument. If you, like me, are using a Jupyter Notebook and you run a cell with only the function definition in it, nothing will appear to happen. But now that the function has been defined, we can use it to measure words:
measure_word('cat')
There are 3 letters in CAT
measure_word('perspicacious')
There are 13 letters in PERSPICACIOUS
measure_word('supercalifragilisticexpialidocious')
There are 34 letters in SUPERCALIFRAGILISTICEXPIALIDOCIOUS
We can even use our new function inside a loop:
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
for word in words:
    measure_word(word)
There are 2 letters in IF
There are 3 letters in YOU
There are 3 letters in SAY
There are 2 letters in IT
There are 4 letters in LOUD
There are 6 letters in ENOUGH
There are 3 letters in YOU
There are 4 letters in WILL
There are 6 letters in ALWAYS
There are 5 letters in SOUND
There are 10 letters in PRECOCIOUS
Often we don’t want our functions to print something out; instead, we want them to return a value, which we can store in a variable and use later. To achieve this, we can modify our function slightly, so that instead of asking it to print the output, we ask it to return the output:
def measure_word(word):
    letter_count = 0
    for letter in word:
        letter_count = letter_count + 1
    return ('There are', letter_count, 'letters in', word.upper())
Then, when we use the function, we can assign the output to a variable, like so:
output = measure_word('cat')
In this case, the output is a tuple:
type(output)
tuple
And the contents look like this:
print(output)
('There are', 3, 'letters in', 'CAT')
If we wanted to, we could e.g. get only the number of letters out of the result, and skip the rest. The number representing the letter count is in position 1 of our function’s output, so if we only want the number of letters, and not all the surrounding text, we could write:
output = measure_word('cat')
output[1]
3
Or we could even just write
measure_word('cat')[1]
3
although at that point, we might as well just use len()!
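If the number is the only thing we care about, a simpler design is to have the function return just that number, rather than a tuple we then have to dig around in. Here is a sketch of that idea; count_letters is a name I have made up for it, and it is still just a home-made stand-in for len().

```python
def count_letters(word):
    # Count the letters one at a time, as before
    letter_count = 0
    for letter in word:
        letter_count = letter_count + 1
    # Hand back only the number
    return letter_count

num_letters = count_letters('cat')
print(num_letters)

# Because the result is an ordinary integer, we can do arithmetic with it
print(num_letters + count_letters('perspicacious'))
```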
Functions can easily be much more complex and useful than these examples suggest. For example, you can write your functions so that they accept more than one input, and return more than one output. It wouldn’t take much to modify our code to provide the word we input, the first letter of that word (in lowercase), and the number of letters in it:
def measure_word(word):
    first_letter = word[0].lower()
    letter_count = 0
    for letter in word:
        letter_count = letter_count + 1
    return (word, first_letter, letter_count)
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
for word in words:
    print(measure_word(word))
('If', 'i', 2)
('you', 'y', 3)
('say', 's', 3)
('it', 'i', 2)
('loud', 'l', 4)
('enough', 'e', 6)
('you', 'y', 3)
('will', 'w', 4)
('always', 'a', 6)
('sound', 's', 5)
('precocious', 'p', 10)
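When a function returns several things bundled together in a tuple like this, you don’t have to fish them out by position. Python lets you “unpack” the tuple straight into separate variables, as long as you supply one variable name for each item. A sketch, repeating the function definition so that the example runs on its own (the names original, first and length are made up):

```python
def measure_word(word):
    first_letter = word[0].lower()
    letter_count = 0
    for letter in word:
        letter_count = letter_count + 1
    return (word, first_letter, letter_count)

# Three items come back, so we give three variable names
original, first, length = measure_word('Precocious')

print(original)
print(first)
print(length)
```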
Or imagine if, for some reason, we only wanted to know the first letter of words that were longer than 3 letters….
for word in words:
    if measure_word(word)[2] > 3:
        print(measure_word(word)[1])
l
e
w
a
s
p
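I mentioned that functions can accept more than one input. As a sketch of what that might look like, here is a made-up function, first_letters, that takes two arguments: a list of words and a minimum length. It returns the first letter of every word that is longer than that minimum, so the cut-off is no longer hard-coded as 3.

```python
def first_letters(words, min_length):
    letters = []
    for word in words:
        # Only keep words that are longer than the cut-off
        if len(word) > min_length:
            letters.append(word[0].lower())
    return letters

words = ['If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']

print(first_letters(words, 3))
print(first_letters(words, 5))
```

The first call reproduces the letters printed above (as a list); the second uses the same function with a stricter cut-off.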
I’ll stop now. I hope you get the idea!
8.5. List comprehensions: A different kind of loop
Python has another way of doing loops, when working with lists. I don’t want to go too deep on these, but you are very likely to come across them if you spend any time looking at other people’s Python code, and they can be very handy, so I will briefly mention them. List comprehensions let you take the items in a list, and then make a new list based on the old list. I think the easiest way to describe how list comprehensions work is to give some examples, starting with some very simple ones, and then some ever-so-slightly more complicated ones. Here goes:
# make a new list that has all the same elements as the old list
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
new_words = [x for x in words]
print(new_words)
['If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
# make a new list that contains all the words from the old list that have an "o" in them
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
new_words = [x for x in words if 'o' in x]
print(new_words)
['you', 'loud', 'enough', 'you', 'sound', 'precocious']
# make a new list with only the words from the old list that are longer than 3 characters
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
new_words = [x for x in words if len(x)>3]
print(new_words)
['loud', 'enough', 'will', 'always', 'sound', 'precocious']
# make a new list with underscores before and after each word in the old list
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
new_words = ['_' + x + '_' for x in words]
print(new_words)
['_If_', '_you_', '_say_', '_it_', '_loud_', '_enough_', '_you_', '_will_', '_always_', '_sound_', '_precocious_']
# make a new list with underscores before and after each word in the old list if the word is longer than 4 characters
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
new_words = ['_' + x + '_' for x in words if len(x) > 4]
print(new_words)
['_enough_', '_always_', '_sound_', '_precocious_']
# compare two lists and make a new list that only has words that are in both lists
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
keywords = ['you', 'enough', 'sound', 'dog', 'bunny', 'nihilist']
new_words = [x for x in words if x in keywords]
print(new_words)
['you', 'enough', 'you', 'sound']
# compare two lists and make a new list with only words from the old list that are NOT in both lists
words = [ 'If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']
keywords = ['you', 'enough', 'sound', 'dog', 'bunny', 'nihilist']
new_words = [x for x in words if x not in keywords]
print(new_words)
['If', 'say', 'it', 'loud', 'will', 'always', 'precocious']
# add 100 to each number in the old list and convert the sum to a string in the new list
numbers = [1, 2, 3, 4, 5]
new_numbers = [str(x + 100) for x in numbers]
print(new_numbers)
['101', '102', '103', '104', '105']
# convert integers to strings within a list of lists while still maintaining the list of list structure
list_of_lists = [[1,2,3,4], [5,6,7,8]]
list_of_string_lists = [[str(y) for y in x] for x in list_of_lists]
print(list_of_string_lists)
[['1', '2', '3', '4'], ['5', '6', '7', '8']]
# "flatten" a list of lists into a single list
list_of_lists = [[1,2,3,4], [5,6,7,8]]
flattened_lists = [y for x in list_of_lists for y in x]
print(flattened_lists)
[1, 2, 3, 4, 5, 6, 7, 8]
Ok, I put that last one there for myself, because I can never remember how to do it, and now I have a place to look it up easily! Anyway, the basic idea is that you perform some function for every item in the old list, and put the output in the new list. That function could range from nothing (simply putting all the old items in the new list) to something quite complicated. Like I say, you don’t necessarily need to know about list comprehensions for the purposes of this book, but now you’ve seen them, so you can recognize them if you spot them in the wild, and you’ll have an idea what is going on. All of these could also be done with regular old for loops, but list comprehensions are short and cool and make you look super pythonic.
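To see how a list comprehension relates to a regular for loop, here is a sketch that rewrites one of the examples above (underscores around every word longer than 4 characters) the long way. We start with an empty list and use append to add each new item to the end of it. The variable same_result is only there to check that both versions agree.

```python
words = ['If', 'you', 'say', 'it', 'loud', 'enough', 'you', 'will', 'always', 'sound', 'precocious']

# The long way: a for loop with an if statement inside it
new_words = []
for x in words:
    if len(x) > 4:
        new_words.append('_' + x + '_')

print(new_words)

# The short way: the same thing as a list comprehension
same_result = new_words == ['_' + x + '_' for x in words if len(x) > 4]
print(same_result)
```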
By the way, you can also use list comprehensions to overwrite a list, if you so desire:
numbers = [1, 2, 3, 4]
numbers = [x*100 for x in numbers]
print(numbers)
[100, 200, 300, 400]
8.6. Nesting#
8.6.1. Nested conditionals#
We have already talked about loops and conditionals. These are powerful on their own, but you can really take them to the next level when you start nesting them, or combining them with other types of logic. A common situation is to check whether an item meets more than one condition. Maybe I have a list of numbers like this:
numbers = [4, 2, 54, 823452, 324, 2, 4435, 4, 9070878072634, 3421, 4345]
For reasons that are best known to myself[2], I want to print all of the even numbers from the list which are longer than one digit.
Now, to find numbers that are longer than one digit, we first need to change them from integers to strings, so we can use len() on them[3]:
for num in numbers:
    if len(str(num)) > 1:
        print(num)
54
823452
324
4435
9070878072634
3421
4345
To figure out whether a number is even or odd, an easy way is to use %. This is known by the fancy name of modulo operator, and you can follow the link if you want to learn more, but basically it just works like division, except it gives you the remainder of the answer. If you divide an even number by 2, there will be nothing left over (no remainder), and so modulo 2 of an even number will be 0. So, to check if the numbers from our list are even, we could do
for num in numbers:
    if num % 2 == 0:
        print(num)
4
2
54
823452
324
2
4
9070878072634
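If the remainder idea still feels a bit abstract, here is a small sketch that collects the remainder after dividing by 2 for a handful of numbers. The examples list and the remainders name are just picked for this illustration.

```python
examples = [7, 10, 4435, 823452]

# num % 2 is the remainder left over when num is divided by 2:
# 1 for odd numbers, 0 for even numbers
remainders = [num % 2 for num in examples]

print(remainders)
```

Each odd number should leave a 1 and each even number a 0, which is exactly what the num % 2 == 0 check relies on.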
Great! But we want to know which numbers are both even and more than one digit. We can use nested conditionals to combine these two conditions, to give us an IF AND logic:
for num in numbers:
    if len(str(num)) > 1:
        if num % 2 == 0:
            print(num)
54
823452
324
9070878072634
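As an aside, the same IF AND logic can also be written on a single line with Python’s and operator. The sketch below is an alternative to the nested version rather than a replacement for it, and the list name long_even_numbers is made up for the example.

```python
numbers = [4, 2, 54, 823452, 324, 2, 4435, 4, 9070878072634, 3421, 4345]

long_even_numbers = []
for num in numbers:
    # both conditions must be True for the number to be kept
    if len(str(num)) > 1 and num % 2 == 0:
        long_even_numbers.append(num)

print(long_even_numbers)
```

This should pick out the same four numbers as the nested conditionals. Nesting tends to be handier when you want to do something extra between the two checks; and is tidier when you don’t.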
You can keep adding nested ifs as much as you like. Say we wanted to print all the numbers from the list that are even, more than one digit, and end with a 4, why we could just do:
for num in numbers:
    if len(str(num)) > 1:
        if num % 2 == 0:
            if int(str(num)[-1]) == 4:
                print(num)
54
324
9070878072634
By the way, if you are wondering what’s going on with all that if int(str(num)[-1]) stuff, well… I wanted to get the last digit of the numbers, so I figured I’d use [-1]. But it turns out you can’t do that with integers, so I used str(num) to turn the integers into strings so I could use [-1] to get the last digit. But then I wanted to compare whatever the last digit was with the number 4. But by now the last digit was a string, because I had changed num into one so I could find the last digit, and 4 is an integer, so it told me none of the last digits were 4, which I knew was a baldfaced lie because I could see that three of them were, but then I realized that I could just use int() to turn the whole shebang back into an integer so I could compare it with 4. This is the way programming works! You keep futzing around until you get it to do the thing you want.
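For what it’s worth, there is more than one way to check a last digit. The sketch below compares three of them: the int(str(num)[-1]) approach from the text, comparing against the string '4' instead of the integer 4, and taking the remainder after dividing by 10. The variable names are invented for this example, and the modulo version assumes the numbers are positive whole numbers, as they are in our list.

```python
numbers = [4, 2, 54, 823452, 324, 2, 4435, 4, 9070878072634, 3421, 4345]

same_answer = []
for num in numbers:
    by_int = int(str(num)[-1]) == 4   # last character, turned back into an integer
    by_string = str(num)[-1] == '4'   # last character, compared with the string '4'
    by_modulo = num % 10 == 4         # remainder after dividing by 10 is the last digit
    # record whether all three checks agree for this number
    same_answer.append(by_int == by_string == by_modulo)

print(all(same_answer))
```

If this prints True, all three checks agree for every number in the list, so you can use whichever one you find easiest to read.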
8.6.2. Nested loops#
Just like we can nest conditionals within each other, we can also nest loops. Say we have a list of lists, like this:
list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
Maybe we want to add 10 to each number in the sublists. We can use nested loops to loop through the list of lists, pausing to loop through all the items of each sublist, and adding 10 to each one:
for i in list_of_lists:
    for j in i:
        print(j+10)
11
12
13
14
15
16
17
18
19
By the way, I used i and j here, instead of list and sublist, because list already has a meaning in Python, so you shouldn’t name your list list. i and j are often used for this sort of thing, so I just went with that.
We can of course combine nested loops with conditionals as well (and even with nested conditionals if we want), to create just about any kind of logic we want. Here is an example of a nested loop with a conditional that goes through all the numbers in each of the sublists and adds 10 to the odd ones, but leaves the even ones untouched:
for i in list_of_lists:
    for j in i:
        if j % 2 != 0:
            print(j+10)
        else:
            print(j)
11
2
13
4
15
6
17
8
19
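Printing is fine for checking your logic, but often you will want to keep the results. Here is a minimal sketch of the same nested loop that stores the numbers in a new list of lists instead of printing them, keeping the original structure. The names new_list_of_lists and new_sublist are made up for this example.

```python
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

new_list_of_lists = []
for i in list_of_lists:
    new_sublist = []  # a fresh, empty sublist for each pass of the outer loop
    for j in i:
        if j % 2 != 0:
            new_sublist.append(j + 10)  # odd numbers get 10 added
        else:
            new_sublist.append(j)       # even numbers are left untouched
    # once the inner loop is done, store the finished sublist
    new_list_of_lists.append(new_sublist)

print(new_list_of_lists)
```

Notice that new_sublist is created inside the outer loop, so each sublist starts out empty; if it were created before the outer loop, all the numbers would pile up in one long list.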
8.7. Abstraction, generalization, and patterns#
As you become more familiar with programming, you will inevitably find yourself reusing bits of code; you will begin to see how specific programming problems you have solved are really just cases of a larger set of problems. For example, in the section on functions we looked at this simple loop:
letter_count = 0
for letter in 'perspicacious':
    letter_count = letter_count + 1
This loop solves a very specific and fairly unusual problem: how to count the number of letters in the word “perspicacious”. But this is a specific case of a much more general problem: how to step through items in a series while also keeping track of how many items you have stepped through. The code above is an example of one way to solve this problem: before the loop, set up an empty “counter” variable (in this case letter_count, which we have set to 0), then every time the loop goes around, add 1 to the counter variable.
This is one example of an even more general pattern that will be very useful to you: setting up variables “outside” the loop, and then modifying them from within the loop.
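To show that the pattern is not tied to counting letters, here is a small sketch that applies it to a list of numbers, keeping both a count and a running total. The names item_count and total are invented for this illustration.

```python
numbers = [4, 2, 54, 823452, 324]

# set up the variables "outside" the loop...
item_count = 0
total = 0

for num in numbers:
    # ...and modify them from within the loop
    item_count = item_count + 1
    total = total + num

print(item_count)
print(total)
```

The counter goes up by 1 on every pass, while the total goes up by whatever the current number is; the structure of the code is the same in both cases.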
Another example of this type of pattern is the “append to list” pattern. You might[4] remember when we talked about the various methods associated with different variable types. One of the methods that list variables have available to them is .append(), the ability to stick something onto the end of a list. This comes in super handy in all sorts of situations. Here is an example.
Let’s say you have a list of all the different flowers in your garden. It looks like this:
all_flowers = ['Begonias', 'Lilacs', 'Roses', 'Pansies', 'Foxgloves', 'Buttercups', 'Sunflowers']
Now, say you want a list with only the flowers that start with the letter “B”. I don’t know why you want this, you just do, ok? To achieve this, you could start by setting up an empty b_flowers list outside of a loop, and then loop through your all_flowers list, checking each item to see if it starts with a “B”. If it does, you append it to the b_flowers list, like so:
all_flowers = ['Begonias', 'Lilacs', 'Roses', 'Pansies', 'Foxgloves', 'Buttercups', 'Sunflowers']
b_flowers = []
for flower in all_flowers:
    if flower.startswith('B'):
        b_flowers.append(flower)
print(b_flowers)
['Begonias', 'Buttercups']
Again, the specifics of the example are not important. But take a long look at the general pattern:
Empty list before the loop
Loop with a conditional
Append, from inside the loop, to the list you set up outside it
It will serve you well!
Now, I know what you’re thinking: couldn’t I solve this with a list comprehension? Sure, of course you could!
all_flowers = ['Begonias', 'Lilacs', 'Roses', 'Pansies', 'Foxgloves', 'Buttercups', 'Sunflowers']
b_flowers = [x for x in all_flowers if x.startswith('B')]
print(b_flowers)
['Begonias', 'Buttercups']
This is another great Python pattern, and it will also serve you well! The end result is exactly the same, so in this case you could use whichever you like better. The point is to learn to see past the immediate problem you are trying to solve, and begin to identify classes of problems, which will begin to match up with classes of solutions.
So that’s what I mean by patterns in coding. It could be things like appending to a list outside a loop, or it could be things like using the modulo operator to check whether a number is odd or even, like we did earlier. But what about abstraction and generalization?
Functions are a good example of what I am talking about here. Often when we write code[5], we start by writing code that does something very specific, but as we reuse that code again and again, it starts to get tedious to change all the variables to match the new situation. Eventually, it becomes more practical to write a function that will accept any variable of a certain kind, and do something to it.
Say, for instance, that we often find ourselves checking lists like the list of flowers from before. But sometimes they aren’t flowers, and sometimes we want to check for a different first letter, not just “B”. And maybe sometimes the words start with capital letters, and sometimes they start with lower-case letters. It might be time to take our specific code from before, and generalize it to a function:
Just as a reminder, this is the code from before:
b_flowers = []
for flower in all_flowers:
    if flower.startswith('B'):
        b_flowers.append(flower)
Now, we’ll take this code, but make it more generic, and dump the whole thing into a function. Let’s call our function check_first_letter:
def check_first_letter(input_list, target_letter):
    output_list = []
    for item in input_list:
        if item.lower().startswith(target_letter.lower()):
            output_list.append(item)
    print(output_list)
check_first_letter(all_flowers, 'b')
['Begonias', 'Buttercups']
foods = ['Pears', 'pickles', 'Beets', 'burgers', 'cheese', 'peppers', 'Pizza', 'bananas', 'cardamom', 'Cabbage']
check_first_letter(foods, 'P')
['Pears', 'pickles', 'peppers', 'Pizza']
check_first_letter(foods, 'p')
['Pears', 'pickles', 'peppers', 'Pizza']
check_first_letter(foods, 'B')
['Beets', 'burgers', 'bananas']
check_first_letter(foods, 'c')
['cheese', 'cardamom', 'Cabbage']
Now we have a generic, generalized and abstracted function that will check the first letter of strings in a list. All we have to do is tell it which list to look at, and what letter to check for. Additionally, it doesn’t matter whether we give it the capital or lower-case version of the letter as a target, and it doesn’t matter whether the item in the list starts with a capital or a lower-case letter. Our function will find them all!
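One possible next step, sketched here as a variation rather than as a correction, is to have the function hand the list back with return instead of printing it, so that the result can be stored in a variable and used later. The function name get_by_first_letter and the variable p_foods are made up for this example.

```python
def get_by_first_letter(input_list, target_letter):
    output_list = []
    for item in input_list:
        # compare lower-case versions so capitalisation doesn't matter
        if item.lower().startswith(target_letter.lower()):
            output_list.append(item)
    # hand the finished list back to whoever called the function
    return output_list

foods = ['Pears', 'pickles', 'Beets', 'burgers', 'cheese', 'peppers', 'Pizza', 'bananas', 'cardamom', 'Cabbage']

p_foods = get_by_first_letter(foods, 'p')
print(p_foods)
print(len(p_foods))
```

Because the result now lives in p_foods, you can keep working with it, for example by counting its items with len() or by passing it on to another function.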