Introduction to Python Programming

Interactive Jupyter Notebook

This notebook provides an introduction to Python programming fundamentals, including an overview of basic programming concepts, common data structures, and simple visualization. This notebook was created by Becky Vandewalle based on prior work by Dandong Yin.

Introduction

Python is a commonly used scripting language that is popular due to its accessibility. This notebook was originally written for Python 2.7, but the code and outputs below use Python 3. Many concepts are shared between the two versions.

General documentation: https://docs.python.org/2.7/
Python tutorial: https://docs.python.org/2.7/tutorial/index.html

Setup

Run this cell for the rest of the notebook to work!

In [1]:
# import required libraries

%matplotlib inline
import os
import json
import rasterio
import time

# run the helper script (this replaces Python 2's execfile, which does not exist in Python 3)
filename = os.path.join('pyintro_resources/', 'highlight_feats.py')
with open(filename, 'rb') as f:
    exec(compile(f.read(), filename, 'exec'))

Python Fundamentals

This section will provide a brief overview of basic concepts and operations in Python.

Python as a Calculator

A simple yet powerful feature of Python is that you can use it to calculate basic numeric operations. This is useful for getting a first introduction to the language. You can type a sequence of numbers and operators into a cell to get a result, like a calculator. Parentheses control the order of operations.

See a list of useful operators here.

Try running these cells to see how it works:

In [2]:
3 + 4
Out[2]:
7
In [3]:
2 * 4
Out[3]:
8
In [4]:
2 ** 4
Out[4]:
16
In [5]:
10 / 3
Out[5]:
3.3333333333333335
In [6]:
10.0 / 3
Out[6]:
3.3333333333333335
In [7]:
10 % 3
Out[7]:
1
In [8]:
250 / (5 + 5) * (7 - 3)
Out[8]:
100.0
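The two division cells above give the same result because in Python 3, / always performs true division and returns a float (in Python 2.7, 10 / 3 would return 3). A short sketch of the related operators, assuming Python 3: // for floor division, % for the remainder, and the built-in divmod(), which returns both at once:

```python
# true division always returns a float in Python 3
print(10 / 2)         # 5.0

# floor division rounds down to the nearest whole number
print(10 // 3)        # 3
print(-10 // 3)       # -4 (rounds down, not toward zero)

# the remainder from %, and both results together from divmod()
print(10 % 3)         # 1
print(divmod(10, 3))  # (3, 1)
```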

Pressing Return within a cell creates a new line in the cell's code. When you run a cell, Jupyter displays only the value of the last expression, unless you use Python's print function to display earlier values.

In [9]:
2 + 3
4 + 6
Out[9]:
10

In the cell above, the value of 2 + 3 isn't shown because 4 + 6 is calculated and returned after.

In [10]:
print(2 + 3)
4 + 6
5
Out[10]:
10

Now both values are shown because print was explicitly called for 2 + 3.

Note that some mathematical functions are not built into core Python. For example, did you see a square root operator above? You can access additional functions by importing Python libraries (more on libraries later).

In [11]:
import math
math.sqrt(16)
Out[11]:
4.0

The square root function is available in the math library.
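For a quick comparison, raising a number to the power 0.5 with ** also gives a square root without any import. A small sketch that also shows a couple of other things the math library provides:

```python
import math

# math.sqrt and raising to the power 0.5 give the same result here
print(math.sqrt(16))    # 4.0
print(16 ** 0.5)        # 4.0

# the math library also provides constants and other functions
print(math.pi)          # 3.141592653589793
print(math.floor(2.7))  # 2
```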

Comments

Using a pound sign (#) in a Python line creates a comment. Comments don't have to be at the start of a line, but everything after the pound sign is treated as a comment (unless the pound sign is inside a string). This means you can't put a comment in the middle of a line before a character Python still expects, such as the closing parenthesis or bracket of a list.

In [12]:
# this is a comment

4 + 2 # this is another comment
Out[12]:
6
In [13]:
# uncomment the '#mylist...' line below and try to run the cell
# this cell will fail to run!

#mylist = [1, 2, # comment]
In [14]:
# but this works (more on lists below)

mylist = [1, 2, # comment
         ]
In [15]:
# this pound sign does not start a comment
# it is within a string!

mystring = 'hi # comment?'
mystring
Out[15]:
'hi # comment?'

Creating variables

To create a simple variable, type the variable name, the equals (`=`) sign, and the variable value. For example:

In [16]:
a = 1
b = 4

Here a is a variable, and its value is set to 1. To see a variable's value, pass it to print, or type the variable name on the last line of the cell.

In [17]:
print(a)
b
1
Out[17]:
4

Variable names must begin with a letter (a-z or A-Z) or an underscore (_); after the first character, they can also contain digits (0-9). Variable names are case sensitive!

In [18]:
One = 1
one = 1.0
print(One) # these are different
print(one)
1
1.0
In [19]:
# uncomment the '#*hi...' line below and try to run the cell
# this will fail - not a valid variable name

#*hi = 'hi'
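Python can check these naming rules for you. A small sketch using the built-in str.isidentifier() method and the standard-library keyword module; keywords such as for follow the naming rules but are reserved, so they can't be used as variable names:

```python
import keyword

# candidate variable names to test
names = ['one', 'One', '_hidden', 'band1', '1band', '*hi', 'for']

for name in names:
    # isidentifier() checks the naming rules; iskeyword() flags reserved words
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, usable)
```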

Whitespace

Blank lines between code lines are ignored, so you can use blank lines to group things for ease of reading.

In [20]:
a = 1
b = 4

c = 'cat'

White space within a line is often ignored, so you can condense, align, or spread out code for clarity.

In [21]:
# these are all equivalent

a=3#comment
a = 3 # comment
a    =    3       #      comment

However, a space is needed between keywords and names so Python can tell where one word ends and the next begins.

In [22]:
# 'not' is a keyword, so this evaluates not a

not a
Out[22]:
False
In [23]:
# uncomment the '#nota' line below and try to run the cell
# fails - no keyword or variable called 'nota' exists

#nota

Whitespace at the start of a line (indentation) is very important for Python to work correctly. When typing code in a cell, make sure each regular line starts at the very beginning of the line, with no leading spaces.

In [24]:
# this works

a = 2
b = 3
In [25]:
# uncomment the '# b = 3' line below (keeping the space before the b) and try to run the cell
# this will fail

a = 2
# b = 3

Indentation marks a group of lines (called a code block) that belongs to the line just before it that ends in a colon (:), such as an if statement. Every line in the same block must be indented by exactly the same amount. The usual convention is four spaces per indentation level; in Jupyter, pressing Tab inserts four spaces for you. You can indent with a different number of spaces, or with actual tab characters, but you must be consistent within a block, and Python 3 raises an error if tabs and spaces are mixed inconsistently.

In [26]:
# example of indented block

a = 2
if a:
    print('a exists')
a exists
In [27]:
# you can have multiple indentation levels

a = 2
if a:
    print('a exists')
b = 3
if b:
    print('b exists')
if a:
    if b:
        print('a and b exist')
a exists
b exists
a and b exist
In [28]:
# indent with 2 spaces instead of 4
# this works, but Jupyter may highlight keywords in red because it expects 4-space indents
a = 2
if a:
  print('a exists')
b = 3
if b:
  print('b exists')
a exists
b exists
In [29]:
# this works but is --NOT-- recommended - make sure your indents match!

a = 2
if a:
  print('a exists')
b = 3
if b:
    print('b exists')
a exists
b exists
In [30]:
# uncomment the last line below (keeping the space before 'print') and try to run the cell

# this doesn't work
# indentation is not consistent within a code block

a = 2
if a:
  print('a exists')
#    print('not sure if b exists')
a exists
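To see the indentation error without breaking a cell, you can pass a small piece of code to the built-in compile() function and catch the IndentationError it raises. A minimal sketch; the helper name check_indentation is hypothetical:

```python
def check_indentation(source):
    """Return None if source compiles, or the indentation error message."""
    try:
        compile(source, '<example>', 'exec')
    except IndentationError as err:
        return err.msg
    return None

# consistent indentation inside the block: compiles fine
good = "a = 2\nif a:\n    print('a exists')\n    print('still in the block')\n"
# inconsistent indentation inside the block: fails
bad = "a = 2\nif a:\n  print('a exists')\n    print('not sure if b exists')\n"

print(check_indentation(good))  # None
print(check_indentation(bad))   # unexpected indent
```

compile() only checks and compiles the code; it does not run it, so nothing is printed from inside the test snippets.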

Basic Object Types

Python has a variety of basic object types, and we have already seen a few! See some further examples below. The built-in type function returns the type of an object or variable.

Basic numeric types:

In [31]:
1    # integer
1.0  # float
Out[31]:
1.0
In [32]:
print(type(1))
print(type(1.0))
<class 'int'>
<class 'float'>
In [33]:
# convert between types

print(float(1))
print(int(1.23)) # truncates
print(int(1.83)) # does not round
1.0
1
1
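int() always truncates toward zero, which matters for negative numbers. If you want the nearest whole number instead, the built-in round() is one option; note that in Python 3 it rounds values exactly halfway between two integers to the even neighbor:

```python
print(int(1.83))    # 1  (truncates)
print(int(-1.83))   # -1 (truncates toward zero, not down)
print(round(1.83))  # 2  (rounds to the nearest integer)
print(round(2.5))   # 2  (halfway values round to the even neighbor)
print(round(3.5))   # 4
print(int('42'))    # 42 (strings of digits can be converted too)
```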

None type:

None is a special value that indicates the absence of a value. It is often used to show that a variable exists but has not been given a meaningful value yet.

In [34]:
a = None
print(a)
None

Boolean types:

A special type is the Boolean type, whose only values are True and False (note the capitalization!).

In [35]:
a = True
b = False
In [36]:
# uncomment the last line below and try to run the cell
# this fails because a variable 'true' hasn't been defined

#a = true

You can check if a variable is True or False like this:

In [37]:
a is True
Out[37]:
True
In [38]:
print(a is False)
print(b is True)
print(b is False)
False
False
True

In a condition, values of other types also evaluate to True or False. Most values evaluate to True, but None, zero (0, 0.0, and equivalents), and empty values (such as '', [], and {}) evaluate to False. Note that evaluating to True or False is not the same as being the value True or False.

In [39]:
# the code block runs if a evaluates to True

a = 3
if a:
    print('a')
a
In [40]:
# here, b evaluates to False, so nothing prints

b = 0
if b:
    print('b')
In [41]:
# a evaluates to True, but it is not the value True

a = 3
a is True
Out[41]:
False
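The built-in bool() function shows what a value evaluates to in a condition. A quick sketch with a few common values:

```python
values = [3, 0, 0.0, -1, '', 'hi', [], [0], {}, None]

for v in values:
    # bool() returns the truth value that an if statement would use
    print(repr(v), bool(v))
```

Note that [0] evaluates to True: the list is not empty, even though its only item is 0.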

Strings:

A string is a sequence of characters (letters, digits, spaces, punctuation, and so on) enclosed in quotes:

In [42]:
'hello'
'cloud9'
Out[42]:
'cloud9'
In [43]:
type('hello')
Out[43]:
str

Accented and other Unicode characters are supported. In Python 3, every string is Unicode, but you may still need to handle text encodings carefully, for example when reading or writing files.

In [44]:
cafe_var = 'café'
cafe_var
Out[44]:
'café'
In [45]:
print(cafe_var)
café

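In Python 2, a 'u' in front of a string designates it as Unicode. In Python 3, every string is already Unicode, so the u prefix is allowed but has no effect. You can copy Unicode characters directly into a string like this: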

In [46]:
hello = u'你好'
hello
Out[46]:
'你好'
In [47]:
print(hello)
你好

Or you can define Unicode characters using an escape sequence. Here '\u4f60' refers to 你.

In [48]:
hello2 = u'\u4f60\u597d'
hello2
Out[48]:
'你好'
In [49]:
print(hello2)
你好
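Each character corresponds to a numeric Unicode code point. A small sketch using the built-in ord() and chr() to convert between characters and code points, and str.encode() to turn a string into UTF-8 bytes (the form text usually takes when written to a file):

```python
char = '你'

code_point = ord(char)
print(code_point)       # 20320
print(hex(code_point))  # 0x4f60, the same number as in the \u4f60 escape
print(chr(0x4f60))      # 你

# encoding to UTF-8 bytes, and decoding back to a string
utf8_bytes = char.encode('utf-8')
print(utf8_bytes)                  # b'\xe4\xbd\xa0'
print(utf8_bytes.decode('utf-8'))  # 你
```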

Escaping characters:

Escape characters represent special characters that are part of a string but can't be typed into it directly. For example, you cannot include a single quote (') in a string surrounded by single quotes unless it is escaped. To escape a special character, prefix it with a backslash (\).

See a list of escape characters here

In [50]:
# a new line \n is a common escape character

new_line_str = 'hi\nhi2'
new_line_str
Out[50]:
'hi\nhi2'
In [51]:
# it prints with the new line

print(new_line_str)
hi
hi2
In [52]:
print('don\'t', 'path\\to\\file')
don't path\to\file
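Escaping is not the only option. A minimal sketch of two alternatives: switching the outer quote type, and raw strings (prefixed with r), in which backslashes are kept as-is. Raw strings are handy for Windows-style paths, where sequences like \t would otherwise become a tab:

```python
# use double quotes on the outside to include a single quote without escaping
s1 = "don't"

# a raw string treats backslashes literally
path = r'path\to\file'

print(s1, path)                   # don't path\to\file
print(path == 'path\\to\\file')   # True
print(len('\n'), len(r'\n'))      # 1 2
```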

'Smart Quotes':

Be careful when copying text that uses 'smart quotes': quotation marks and apostrophes that are curved. Python doesn't recognize these characters as quotes!

Use this: "  and  '
Not this: “ ”  and  ‘ ’
In [53]:
# uncomment the last line below and try to run the cell
# this cell will fail because it has smart quotes

#copy_text = “Hello there”

Other types of sequences include lists and tuples. Lists are mutable: their elements can be changed after the list is created. Tuples are immutable: their elements can't be changed, so to "change" a tuple you have to create a new one.

In [54]:
# lists are created using square brackets

mylist = [1, 2, 3]
mylist
Out[54]:
[1, 2, 3]
In [55]:
# you can add a value to a list after making it

mylist.append(4)
mylist
Out[55]:
[1, 2, 3, 4]
In [56]:
# tuples are created using parentheses

mytuple = (1, 2, 3)
mytuple
Out[56]:
(1, 2, 3)
In [57]:
# uncomment the last line below and try to run the cell
# you can't add a value to a tuple

#mytuple.append(4)
In [58]:
# this works because newtuple is a new tuple, but mytuple ends up nested inside it, which may not be what you expect!

newtuple = (mytuple, 4)
newtuple
Out[58]:
((1, 2, 3), 4)
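To get a flat tuple that includes the extra element, one approach is to concatenate two tuples with +. A one-element tuple needs a trailing comma, since (4) is just the number 4 in parentheses:

```python
mytuple = (1, 2, 3)

# concatenation builds a brand-new tuple; mytuple itself is unchanged
newtuple = mytuple + (4,)
print(newtuple)    # (1, 2, 3, 4)
print(mytuple)     # (1, 2, 3)

print(type((4)))   # <class 'int'>
print(type((4,)))  # <class 'tuple'>
```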

You can select a specific value of a list or tuple using square brackets around the item's index. The brackets need to come directly after the variable name. Python index values start from 0.

In [59]:
# select by index

print(mylist[2])
print(mytuple[0])
3
1

A colon inside the square brackets creates a slice; [:], with no start or end index, selects all items.

In [60]:
# select all

print(mylist[:])
[1, 2, 3, 4]
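More generally, a slice has the form [start:stop:step]: it includes start, stops before stop, and any part can be left out. A few examples on a fresh list (the name letters is just for illustration):

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[1:3])   # ['b', 'c']  items at index 1 and 2
print(letters[:2])    # ['a', 'b']  from the start up to (not including) 2
print(letters[3:])    # ['d', 'e']  from index 3 to the end
print(letters[::2])   # ['a', 'c', 'e']  every second item
print(letters[::-1])  # ['e', 'd', 'c', 'b', 'a']  reversed

# [:] makes a (shallow) copy, so changing the copy leaves the original alone
copy = letters[:]
copy[0] = 'z'
print(letters[0])     # a
```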

A negative index counts from the end of the sequence, starting with -1 for the last item.

In [61]:
print(mytuple[-1])
3

It is possible to stack indices to select an item in a multi-level list.

In [62]:
# multi-level index

nested_list = [[1, 2], [3, 4]]
nested_list[0][1] # select first list, then second item
Out[62]:
2

You can change and delete values from a list using the index.

In [63]:
# change last list item

nested_list[-1] = [4, 5]
nested_list
Out[63]:
[[1, 2], [4, 5]]
In [64]:
# delete list value

del nested_list[0][0]
nested_list
Out[64]:
[[2], [4, 5]]

Dictionaries:

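Dictionaries are collections of key-value pairs. Since Python 3.7, dictionaries keep keys in the order they were inserted; in older versions (including Python 2.7), they are unordered.

In [65]:
# dictionaries are created using curly braces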

pet_list = {'alice':'cat', 'becky':'cat', 'chaoli': 'parrot', 'dan':'dog'}
pet_list
Out[65]:
{'alice': 'cat', 'becky': 'cat', 'chaoli': 'parrot', 'dan': 'dog'}
In [66]:
print(pet_list)
{'alice': 'cat', 'becky': 'cat', 'chaoli': 'parrot', 'dan': 'dog'}

Dictionaries have keys and values. This is similar to a physical dictionary - you look up a word to find its definition.

In [67]:
# list all keys

pet_list.keys()
Out[67]:
dict_keys(['alice', 'becky', 'chaoli', 'dan'])
In [68]:
# list all values

pet_list.values()
Out[68]:
dict_values(['cat', 'cat', 'parrot', 'dog'])

You can look up the value that goes with a specific key by using the key as the index.

In [69]:
pet_list['dan']
Out[69]:
'dog'

Like lists, dictionaries can be changed after they are created: you can add, change, and remove key/value pairs.

In [70]:
# add a key/value pair

pet_list['ewan'] = 'bunny'
pet_list
Out[70]:
{'alice': 'cat',
 'becky': 'cat',
 'chaoli': 'parrot',
 'dan': 'dog',
 'ewan': 'bunny'}

It's good to check if a key/value pair exists before deleting a value.

In [71]:
# delete a key/value pair

if 'alice' in pet_list:  # 'in' checks the dictionary's keys
    del pet_list['alice']
pet_list
Out[71]:
{'becky': 'cat', 'chaoli': 'parrot', 'dan': 'dog', 'ewan': 'bunny'}
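Two dictionary methods make the check-before-use pattern shorter: get() returns a default instead of raising a KeyError for a missing key, and pop() removes a key and returns its value (or a default, if one is given and the key is missing). A sketch using a small stand-alone dictionary:

```python
pets = {'becky': 'cat', 'dan': 'dog'}

# 'in' checks the keys directly; .keys() is not needed
print('dan' in pets)              # True

# get() avoids a KeyError for missing keys
print(pets.get('dan'))            # dog
print(pets.get('zoe', 'no pet'))  # no pet

# pop() deletes the key and returns its value; the default avoids a KeyError
removed = pets.pop('alice', None)
print(removed)                    # None
print(pets.pop('dan'))            # dog
print(pets)                       # {'becky': 'cat'}
```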

Dictionaries can be nested.

In [72]:
pet_list_ext = {'alice': {'type':'cat', 'age':3},
                'becky': {'type':'cat', 'age':9},
                'chaoli': {'type':'parrot', 'age':23},
                'dan': {'type':'dog', 'age':7.5}}
pet_list_ext
Out[72]:
{'alice': {'type': 'cat', 'age': 3},
 'becky': {'type': 'cat', 'age': 9},
 'chaoli': {'type': 'parrot', 'age': 23},
 'dan': {'type': 'dog', 'age': 7.5}}

Chain key lookups (one pair of square brackets per level) to retrieve values from nested dictionaries.

In [73]:
pet_list_ext['chaoli']['type']
Out[73]:
'parrot'

Boolean Operators and Comparisons

Boolean operators combine Boolean values, or other values according to whether they evaluate to True or False. The operators are `and`, `or`, and `not`.

Try to guess what will be returned for each combination below!

In [74]:
True and True
Out[74]:
True
In [75]:
True and False
Out[75]:
False
In [76]:
False and False
Out[76]:
False
In [77]:
True or True
Out[77]:
True
In [78]:
True or False
Out[78]:
True
In [79]:
not True
Out[79]:
False
In [80]:
not False
Out[80]:
True
In [81]:
if (1 and 'hi'): # through evaluation
    print('OK')
OK
In [82]:
if (0 and 'hi'): # through evaluation
    print('OK')
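Under the hood, and and or don't always return True or False: they return one of their operands, and they stop evaluating as soon as the answer is known (short-circuiting). That is why the conditions above work with 1, 0, and 'hi'. A quick sketch:

```python
print(1 and 'hi')           # hi  (1 is truthy, so the second value is returned)
print(0 and 'hi')           # 0   (0 is falsy, so 'hi' is never checked)
print('' or 'default')      # default
print('name' or 'default')  # name

# a common idiom: fall back to a default when a value is empty or None
user_input = ''
label = user_input or 'unnamed'
print(label)                # unnamed
```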

Comparisons are used to evaluate relative values (ex. is x greater than y), equivalence, or identity. A few examples are shown below.

In [83]:
1 > 2
Out[83]:
False
In [84]:
1 < 2
Out[84]:
True
In [85]:
1 >= 2
Out[85]:
False
In [86]:
1 <= 2
Out[86]:
True

NOTE! Testing for equivalence needs two equal signs, not one!

In [87]:
# are these equal?

1 == 1 
Out[87]:
True
In [88]:
# uncomment the last line below and try to run the cell
# this fails

#1 = 1
In [89]:
1 != 2 # is not equal to
Out[89]:
True

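`is` and `is not` test identity: whether two names refer to the very same object, not just to equal values. Use them mainly to compare with None (and, as above, with True and False); use `==` and `!=` to compare values. Recent Python versions warn when `is` is used with a literal, such as `1 is 2`.

In [90]:
a = None
a is None
Out[90]:
True
In [91]:
a is not None
Out[91]:
False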

You can use `in` and `not in` to see if a value is part of a sequence.

In [92]:
1 in (1, 2, 3)
Out[92]:
True
In [93]:
1 not in (1, 2, 3)
Out[93]:
False

Importing Libraries

There are a few different ways to import a Python library or specific function from a library.

If you import an entire library, you need to preface a function in that library with the library name. For some commonly used libraries or ones with long names, it is common to give it a nickname when importing. If you import a specific function from a library, you can use that function without prefixing it with the library name.

In [94]:
import time                        # import entire library
import numpy as np                 # call numpy using np
from math import sqrt              # just import square root function from math library
from math import factorial as fac  # just import factorial function from math library, call it fac

Be careful when choosing nicknames: if a nickname matches a name that already exists, it replaces that name in your code.

In [95]:
# prints current time (seconds since January 1, 1970)

print(time.time())
1612204013.2838109
In [96]:
# call numpy function using nickname np for numpy

np.array([2,3,4])
Out[96]:
array([2, 3, 4])
In [97]:
# can call sqrt function without having 'math.' in front

sqrt(16) 
Out[97]:
4.0
In [98]:
# can call factorial function by nickname without having 'math.' in front

fac(5)
Out[98]:
120
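Name conflicts can also happen with from imports. A minimal sketch using the built-in pow() and the math library's pow(), which return different types. If you run this in the notebook, pow stays replaced until you run del pow or restart the kernel:

```python
print(pow(2, 3))           # 8    built-in pow keeps integers as integers

from math import pow       # this replaces (shadows) the built-in pow from here on

print(pow(2, 3))           # 8.0  math.pow always returns a float

# the built-in version is still reachable through the builtins module
import builtins
print(builtins.pow(2, 3))  # 8
```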

Control Flow

Most of the time Python programs run line by line, executing each statement in order from top to bottom. However, there are cases when certain lines should be skipped if some condition occurs, or a certain section of code should be run many times. Control flow tools are used to change the order or number of times lines or code sections are run.

In [99]:
# print a if it evaluates to True

a = 3
if a:
    print('a =', a)
a = 3
In [100]:
# print elements in list

mylist = [1, 2, 3]
for i in mylist:
    print(i, end=" ")
1 2 3 

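The range function produces a sequence of integers from 0 up to, but not including, the specified number. In Python 3 it returns a range object rather than a list; wrap it in list() to see all the values at once.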

In [101]:
range(5)
Out[101]:
range(0, 5)
In [102]:
# print numbers in a certain range

for i in range(5):
    print(i, end=" ")
0 1 2 3 4 
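range() also accepts a start value and a step size, as range(start, stop, step). A short sketch that uses list() to show the values produced:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]  counting down
```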

The break, continue, and else keywords change how a loop runs:

In [103]:
# stop if 7 is reached

for i in range(10):
    if i == 7:
        break
    print(i, end=" ")
0 1 2 3 4 5 6 
In [104]:
# prints '- no break' if loop completed without break

for i in range(10):
    if i == 12:
        break
    print(i, end=" ")
else:
    print('- no break')
0 1 2 3 4 5 6 7 8 9 - no break
In [105]:
# skips even numbers, but continues through loop after

for i in range(10):
    if i % 2 == 0:
        continue
    print(i, end=" ")
1 3 5 7 9 

Sometimes it is useful to have a placeholder in a loop. Here the loop runs ten times, but because its body is only the pass keyword, it does nothing.

In [106]:
# do nothing

for i in range(10):
    pass

While loops repeat as long as a condition is True, which is useful when you don't know in advance how many times the loop needs to run. If the condition never becomes False (for example, because nothing inside the loop changes it), the loop will keep running forever!

In [107]:
# while loop

a = 0
while a < 10:
    print(a, end=" ")
    a += 1
0 1 2 3 4 5 6 7 8 9 

The try, except, and finally keywords are used to handle errors (called exceptions). The finally block always runs, but an except block runs only if the specified error occurred in the try block.

In [108]:
try:
    1 / 0
except ZeroDivisionError:
    print("that didn't work")
finally:
    print('end')
that didn't work
end
In [109]:
try:
    1 / 1
except ZeroDivisionError:
    print("that didn't work")
finally:
    print('end')
end

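You can also use a bare except clause (with no error type) to catch any type of error. Use this with care: it also catches errors you may not expect, such as a KeyboardInterrupt when you try to stop a running cell, which can hide bugs.

In [110]:
try:
    1 / 0
except:
    print("that didn't work")
finally:
    print('end')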
that didn't work
end
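One way to catch errors more selectively is sketched below: except ... as err keeps the error object so you can inspect it, except Exception catches ordinary errors without catching interrupts, and the optional else block runs only when no error occurred. The function name safe_divide is hypothetical:

```python
def safe_divide(x, y):
    """Divide x by y, returning None (and printing the error) on failure."""
    try:
        result = x / y
    except ZeroDivisionError as err:
        print('could not divide:', err)
        return None
    except Exception as err:
        # any other ordinary error, e.g. dividing a string by a number
        print('unexpected error:', type(err).__name__)
        return None
    else:
        # runs only if the try block raised no error
        print('success')
        return result
    finally:
        # always runs, even after a return above
        print('end')

safe_divide(1, 2)    # prints success, then end
safe_divide(1, 0)    # prints could not divide: division by zero, then end
safe_divide('a', 2)  # prints unexpected error: TypeError, then end
```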

List Comprehension

A list comprehension is a compact way to build a list with a loop. The following two cells create the same list.

In [111]:
mylist = []
for i in range(5):
    mylist.append(i * 2)
mylist
Out[111]:
[0, 2, 4, 6, 8]
In [112]:
mylist = [i*2 for i in range(5)]
mylist
Out[112]:
[0, 2, 4, 6, 8]
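A comprehension can also include an if clause to keep only some items, and the same syntax inside curly braces, with a key: value pair, builds a dictionary:

```python
# keep only even numbers, then double them
evens_doubled = [i * 2 for i in range(10) if i % 2 == 0]
print(evens_doubled)  # [0, 4, 8, 12, 16]

# dictionary comprehension: map each number to its square
squares = {i: i ** 2 for i in range(4)}
print(squares)        # {0: 0, 1: 1, 2: 4, 3: 9}
```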

Custom Functions

It is useful to create custom functions when you want to reuse sections of code many times.

The def keyword starts a function definition. The parameters (the arguments the function expects to receive) are listed between the parentheses, which are followed by a colon (:).

In [113]:
# define a function with no arguments

def myfunct():
    print('hello')
In [114]:
# define a function with one argument

def myfunct2(name):
    print('hello,', name)
In [115]:
# call the functions

myfunct()
myfunct2('Iekika')
hello
hello, Iekika

If you leave off the parentheses when calling a function, Python shows information about the function instead of calling it.

In [116]:
myfunct
Out[116]:
<function __main__.myfunct()>
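The functions above print their results but don't give anything back to the caller. A function can instead send a value back with return, and parameters can have default values that are used when an argument is left out. A short sketch; the function name greeting is illustrative:

```python
def greeting(name, punctuation='!'):
    """Build and return a greeting string instead of printing it."""
    return 'hello, ' + name + punctuation

message = greeting('Iekika')    # uses the default punctuation
print(message)                  # hello, Iekika!
print(greeting('Iekika', '?'))  # hello, Iekika?

# a function without a return statement returns None
def myfunct():
    print('hello')

print(myfunct())                # prints hello, then None
```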

File Operations

You can open, read, and write files using Python.

In [117]:
# open a file

myfile = open('test_file.txt')
myfile
Out[117]:
<_io.TextIOWrapper name='test_file.txt' mode='r' encoding='UTF-8'>
In [118]:
# read file lines

lines = myfile.readlines()
lines
Out[118]:
['line 1\n', 'second line\n', '\n', '4th line\n', 'this is a test file']
In [119]:
# print each line (each line already ends with '\n' and print adds another, so the output is double-spaced)

for line in lines:
    print(line)
line 1

second line



4th line

this is a test file
In [120]:
# print a specific line

print(lines[3])
4th line

It is important to close the file when you are finished accessing it!

In [121]:
# close file

myfile.close()

A safer approach is to open the file with the with statement. The file is closed automatically when the indented block ends, even if an error occurs inside it.

In [122]:
# open with 'with' statement

with open('test_file.txt') as newfile:
    newlines = newfile.read()
    
newlines
Out[122]:
'line 1\nsecond line\n\n4th line\nthis is a test file'
In [123]:
# get current time

nowtime = time.time()
nowtime
Out[123]:
1612204013.6704671
In [124]:
# write to a file

with open('write_me.txt', 'w') as wfile:

    wfile.write('Hi there! ' + str(nowtime))
In [125]:
# read written file

with open('write_me.txt') as rfile:
    rlines = rfile.read()
    
rlines
Out[125]:
'Hi there! 1612204013.6704671'
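Instead of reading the whole file at once, you can loop over the file object itself, which reads one line at a time; str.rstrip('\n') removes the trailing newline so the lines are not double-spaced. A self-contained sketch that first writes a temporary file (the file name demo.txt is arbitrary) and also shows 'a' (append) mode:

```python
import os
import tempfile

# write a small file in a temporary directory so the example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')
with open(path, 'w') as f:
    f.write('line 1\nsecond line\n')

# 'a' (append) mode adds to the end instead of overwriting like 'w'
with open(path, 'a') as f:
    f.write('appended line\n')

# iterate over the file object to read it line by line
cleaned = []
with open(path) as f:
    for line in f:
        cleaned.append(line.rstrip('\n'))

print(cleaned)  # ['line 1', 'second line', 'appended line']
```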

Geospatial Data Processing

This last section will briefly cover raster and vector data and show a few introductory ways to work with these data types.

Raster data

Raster data extends the idea of a digital photograph: a matrix (grid) of cells, or pixels, represents a continuous part of the world, with each cell storing a value. A GeoTIFF extends the TIFF image format by including the geospatial context of the image, such as its coordinate reference system and where it is located on the Earth.

Generic image readers/libraries ignore the geospatial info and retrieve only the image content. Geospatially-aware software/libraries are needed to extract complete information from this image format.

RasterIO

RasterIO is a light-weight raster processing library that provides enough utility and flexibility for a good range of common needs. Refer to this example as a start.

In [126]:
# load raster data

chicago_tif = rasterio.open(os.path.join('pyintro_resources/data','Chicago.tif'))
In [127]:
# see type

type(chicago_tif)
Out[127]:
rasterio.io.DatasetReader
In [128]:
# find the shape of the array (rows, columns)

chicago_tif.shape
Out[128]:
(929, 699)
In [129]:
# assign the first image band to a variable

band1 = chicago_tif.read(1)

Vector data

Vector data describe the world with explicit coordinates and attributes. GeoJSON is a straightforward format based on JSON that stores vector data in a way that is easy for both humans and machines to read and write.

In [130]:
# load chicago vector data

with open(os.path.join('pyintro_resources/data', 'Chicago_Community.geojson')) as f:
    chicago = json.load(f)
In [131]:
# JSON data is loaded into Python as a dictionary

type(chicago)
Out[131]:
dict
In [132]:
# we can see the dictionary keys

chicago.keys()
Out[132]:
dict_keys(['type', 'features'])
In [133]:
# the value of 'type' is 'FeatureCollection': a collection of vector features

chicago['type']
Out[133]:
'FeatureCollection'
In [134]:
# 'features' contains a list of feature values

type(chicago['features'])
Out[134]:
list
In [135]:
# what are the keys for the first feature in the list

chicago['features'][0].keys()
Out[135]:
dict_keys(['type', 'properties', 'geometry'])
In [136]:
# what are the properties for the first feature in the list

chicago['features'][0]['properties']
Out[136]:
{'community': 'DOUGLAS',
 'area': '0',
 'shape_area': '46004621.1581',
 'perimeter': '0',
 'area_num_1': '35',
 'area_numbe': '35',
 'comarea_id': '0',
 'comarea': '0',
 'shape_len': '31027.0545098'}
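Because GeoJSON is loaded as ordinary dictionaries and lists, the tools from earlier in this notebook (indexing, loops, list comprehensions) are enough to explore it. A minimal sketch on a tiny, made-up FeatureCollection; the features, properties, and point coordinates are invented for illustration, and with the real data you would use chicago in place of sample:

```python
import json

# a tiny made-up GeoJSON FeatureCollection, stored as a JSON string
text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"community": "DOUGLAS", "area_numbe": "35"},
    "geometry": {"type": "Point", "coordinates": [-87.61, 41.83]}},
   {"type": "Feature",
    "properties": {"community": "OAKLAND", "area_numbe": "36"},
    "geometry": {"type": "Point", "coordinates": [-87.60, 41.82]}}
 ]}
'''
sample = json.loads(text)  # json.loads parses a string; json.load reads a file

# collect one property from every feature with a list comprehension
names = [feature['properties']['community'] for feature in sample['features']]
print(names)  # ['DOUGLAS', 'OAKLAND']

# the geometry is another nested dictionary holding a list of coordinates
first_geom = sample['features'][0]['geometry']
print(first_geom['type'], first_geom['coordinates'])  # Point [-87.61, 41.83]
```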

Basic Image Visualization

Matplotlib is a general-purpose plotting library that can display raster data as images as well as plot vector data. The %matplotlib inline command displays plots as cell output.

In [137]:
%matplotlib inline
import matplotlib.pyplot as plt
In [138]:
# plot the band with Matplotlib

fig = plt.figure(figsize=(12,10))
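# imshow expects extent=(left, right, bottom, top), but rasterio's bounds are
# ordered (left, bottom, right, top), so reorder them
bounds = chicago_tif.bounds
plt.imshow(band1, cmap='gray', extent=(bounds.left, bounds.right, bounds.bottom, bounds.top))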
Out[138]:
<matplotlib.image.AxesImage at 0x7f1467a42490>

Basic Plotting

Matplotlib is powerful for generating graphs. Here is a simple example graph:

In [139]:
# with a single list, the values are plotted on the y-axis against x = 0, 1, 2, 3
plt.plot([1, 2, 3, 4])
plt.title('My Plot')
plt.ylabel('some numbers')
plt.show()

Python libraries optimized for visualizing geospatial vector data will be covered in a later notebook!

Enjoy getting to know Python through Jupyter Notebooks!