Further Python Features
Sets
Easy – e.g.
A = {1, 2, 3, 3}
B = {3, 4, 5, 6, 7}
A & B
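As a minimal sketch of what the expression above gives, the following builds the same two sets and applies intersection alongside union and difference. The names common, either and only_in_a are made up for illustration.

```python
# Duplicates are discarded when a set is built, so A ends up with three elements.
A = {1, 2, 3, 3}
B = {3, 4, 5, 6, 7}

common = A & B       # intersection: elements in both sets
either = A | B       # union: elements in at least one of the sets
only_in_a = A - B    # difference: elements of A that are not in B

print(len(A), common, either, only_in_a)
```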
Comprehension
A list comprehension creates a list from another sequence, using an optional test to filter values and an expression to transform (map) them.
list1 = [1, 2, 3, 4, 5]
print([x*2 for x in list1 if x < 4])
Here’s a longer example –
def quicksort(aList):
    if aList:  # i.e., not an empty list
        pivot = aList[len(aList)//2]  # In Python 3, // is integer divide.
        return (quicksort([x for x in aList if x < pivot])
                + [x for x in aList if x == pivot]
                + quicksort([x for x in aList if x > pivot]))
    else:
        return []

p = [3, 5, 6, 7, 1, 12, 9]
q = quicksort(p)
print(q)
You need to be careful with the notation.
my_list = [i * i for i in range(10)]
print(my_list)
produces a list
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
my_dict = {i: i * i for i in range(10)}
print(my_dict)
produces a dictionary
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
and
my_set = {i * i for i in range(10)}
print(my_set)
produces a set
{0, 1, 64, 4, 36, 9, 16, 49, 81, 25}
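One trap with this notation, sketched below with made-up variable names: a set and a dictionary both use braces, so empty braces give a dictionary and an empty set has to be written set(). The order in which a set prints is arbitrary, so sort it if you need a predictable order.

```python
empty_braces = {}     # this is an empty dictionary, not an empty set
empty_set = set()     # the only way to write an empty set

print(type(empty_braces).__name__)   # dict
print(type(empty_set).__name__)      # set

# A set has no defined order; sorted() returns a list in ascending order.
squares = {i * i for i in range(10)}
print(sorted(squares))
```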
Using defaultdict
Suppose you have this dictionary
citycountries={"Milano":"Italy","London":"England","Manchester":"England","Brighton":"England","Roma":"Italy"}
To find all the cities in each country you could do this, creating a new dictionary containing a list of cities for each country:
country_to_stationsdict = {}  # create empty dictionary
for city in citycountries:
    country = citycountries[city]
    if country not in country_to_stationsdict:
        # create a new entry containing a list with one city in it
        country_to_stationsdict[country] = [city]
    else:
        country_to_stationsdict[country].append(city)
The following is tidier
from collections import defaultdict

# The next line says that the value of each dictionary entry is a list,
# which avoids having to treat the first mention of a country as a special case
country_to_stations = defaultdict(list)
for city in citycountries:
    country = citycountries[city]
    country_to_stations[country].append(city)
In each case, the resulting dictionary can be printed using the following (substitute country_to_stations for the defaultdict version)
for country in country_to_stationsdict:
    print(country + "- ", end="")
    for city in country_to_stationsdict[country]:
        print(city + " ", end="")
    print()
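For comparison, here is a sketch of a third way to do the grouping that needs no import. It uses the built-in dict.setdefault method, and the name country_to_cities is made up for this example.

```python
citycountries = {"Milano": "Italy", "London": "England",
                 "Manchester": "England", "Brighton": "England",
                 "Roma": "Italy"}

country_to_cities = {}  # hypothetical name, an ordinary dictionary
for city, country in citycountries.items():
    # setdefault returns the existing list for this country,
    # or stores and returns a new empty list the first time it is seen
    country_to_cities.setdefault(country, []).append(city)

for country, cities in country_to_cities.items():
    print(country + " - " + " ".join(cities))
```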
Default values when accessing dictionaries
The following creates a dictionary defining the 3 entries of a sparse 2D array
matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}
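so that matrix would look something like this, reading each key as (row, column) and showing undefined entries as dots:
. | . | . | 1 |
. | . | . | . |
. | 2 | . | . |
. | . | . | . |
. | . | . | 3 |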
You’d like matrix[(1, 3)] (accessing an entry that hasn’t been defined) to have the value 0. Alas, it raises a KeyError. Fortunately the following is possible, returning 0 if the entry doesn’t exist
matrix.get((1, 3), 0)
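A short sketch of the difference between the two kinds of access, using the same matrix. Note that get only returns the default; it does not add the missing entry to the dictionary.

```python
matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}

try:
    matrix[(1, 3)]              # square brackets fail for a missing key
except KeyError as err:
    print("KeyError for key", err)

value = matrix.get((1, 3), 0)   # get returns the default instead
print(value, len(matrix))       # the dictionary still has 3 entries
```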
Collections
The collections module offers some facilities rather like those of the C++ Standard Template Library. Here’s how to create and use a double-ended queue with a maximum length of 3
import collections

last_three = collections.deque(maxlen=3)
for i in range(10):
    last_three.append(i)
    print(', '.join(str(x) for x in last_three))
And here’s how to find the most common number in a list
import collections

A = collections.Counter([1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7])
A.most_common(1)
outputs
[(3, 4)]
i.e. 3 is the most common number. It appears 4 times
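A Counter can also be indexed like a dictionary. The sketch below (the name counts is made up) shows that looking up a value that never appeared gives 0 rather than an error.

```python
import collections

counts = collections.Counter([1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7])

print(counts[3])              # how many times 3 appears
print(counts[99])             # a value that never appears counts as 0
print(counts.most_common(1))  # list of (value, count) pairs
```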
Iterators
Iterators are objects that hand out the values of a sequence one at a time. A simple example is
x = [0, 1, 2, 3, 4]
i = iter(x)
next(i)
next(i)
next(i)
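The sketch below continues that example to show what happens when the values run out: next raises StopIteration unless it is given a default to return instead. The names first_three, rest and sentinel are made up for illustration.

```python
x = [0, 1, 2, 3, 4]
i = iter(x)

first_three = [next(i), next(i), next(i)]   # consumes 0, 1 and 2
rest = list(i)                              # consumes whatever is left
sentinel = next(i, None)                    # default returned when exhausted

try:
    next(i)                                 # no default, so this raises
except StopIteration:
    print("iterator exhausted")

print(first_three, rest, sentinel)
```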
itertools
Some examples –
import itertools
from itertools import islice

# 0 to 15 in binary
for p in itertools.product([0, 1], repeat=4):
    print(''.join(str(x) for x in p))

# permutations
for p in itertools.permutations([1, 2, 3, 4]):
    print(''.join(str(x) for x in p))

# Sliding windows - from
# http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html
a = [1, 2, 3, 4]

def n_grams(a, n):
    z = (islice(a, i, None) for i in range(n))
    return list(zip(*z))

n_grams(a, 3)
n_grams(a, 2)
n_grams(a, 4)
An example of using iter and lambda functions
a = [1, 2, 3, 4]

# display non-overlapping sequences of length k
group_adjacent = lambda a, k: list(zip(*([iter(a)] * k)))

group_adjacent(a, 3)
group_adjacent(a, 2)
group_adjacent(a, 1)
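The trick in group_adjacent is that the list holds k references to the same iterator, so zip pulls consecutive values from it. A sketch of the idea written out longhand (the names it, pairs and triples are made up):

```python
a = [1, 2, 3, 4]

it = iter(a)
# Both arguments are the SAME iterator, so each tuple takes two
# consecutive values from it.
pairs = list(zip(it, it))

# The general form: k references to one iterator. zip stops at the first
# incomplete group, so the leftover 4 is silently dropped here.
triples = list(zip(*([iter(a)] * 3)))

print(pairs, triples)
```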
Range
In Python 2, range(0,3) directly produced a list, [0, 1, 2]. In Python 3 it produces a special range object. To get a list you need to use list(range(0,3)). The reason for this is efficiency. Consider this rather artificial piece of code
target = 3
for i in list(range(0, 1000)):
    if i == target:
        print("target found!")
        break
It creates a list of 1000 numbers and looks through them until it finds what it’s looking for. Compare that with
target = 3
for i in range(0, 1000):
    if i == target:
        print("target found!")
        break
This finds the target too. The difference is that a long list isn’t created first: each i is given a value only when the comparison with target is about to happen.
It’s for this reason that generators are useful too – they generate the values in a sequence only if they’re required.
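A rough way to see the saving is to compare the memory used by a range object with that of the equivalent list. This is only a sketch: sys.getsizeof reports the size of the container itself, and the names lazy and eager are made up.

```python
import sys

lazy = range(0, 1000000)   # stores only start, stop and step
eager = list(lazy)         # stores a million references

print(sys.getsizeof(lazy), sys.getsizeof(eager))

# A range still supports len, indexing and membership tests
# without ever building the list.
print(len(lazy), lazy[10], 999999 in lazy)
```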
Generators
Doing
mygenerator1 = (x*2 for x in range(3))
creates a generator, not a list. So does the following, because it uses yield rather than return
def createGenerator():
    mylist = range(3)
    for i in mylist:
        yield i*2

mygenerator1 = createGenerator()
Generators are iterators, but they calculate the values on the fly.
for i in mygenerator1: print(i)
works the first time it’s tried, but generators can only be used once.
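A minimal sketch of the "only once" behaviour (the names gen, first_pass and second_pass are made up): the second pass over the same generator yields nothing, so a new generator has to be created to go round again.

```python
gen = (x*2 for x in range(3))

first_pass = list(gen)    # consumes all the values
second_pass = list(gen)   # the generator is now exhausted

print(first_pass, second_pass)
```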
Closures
A closure is a function that remembers variables from the scope in which it was created, which makes it possible to write functions that return functions. In C++ you might use templates instead. Suppose you want to create a function add10 to return 10 more than its argument, and add5 to return 5 more than its argument. These 2 functions will be very similar, so rather than duplicate code, a closure can be used.
def add_number(num):
    def adder(number):
        return num + number
    return adder

add10 = add_number(10)
add5 = add_number(5)
add10(37)
add5(37)
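A closure can also update the variable it has captured, as long as the variable is declared nonlocal. The following sketch uses a made-up make_counter function; each counter it returns keeps its own private count.

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count      # rebind the variable in the enclosing scope
        count += 1
        return count
    return counter

c1 = make_counter()
c2 = make_counter()
results = [c1(), c1(), c2()]   # c2 has its own count, independent of c1
print(results)
```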
Maps, filters, vectorising etc
Many of these examples are from http://www.eg.bucknell.edu/~hyde/Python3/IntroductionToPython.pdf
The following finds the absolute value of the elements (note that abs([-5,-42, 20, -1]) doesn’t work, because abs expects a single number and raises a TypeError when given a list).
print(list(map(abs, [-5,-42, 20, -1])))
The filter function is also useful:
def even(x):
    return x % 2 == 0

a = [1, 2, 3, 4, 5]
print(list(filter(even, a)))
Here’s another way to run a function on each of the elements in a list.
import numpy as np

def H(x):
    return (0 if x < 0 else 1)

x = np.linspace(-10, 10, 5)
H_vec = np.vectorize(H)
print(H_vec(x))
reduce repeatedly applies the provided function to 2 arguments, producing a result that is used as the first argument for the next iteration. Both the uses of reduce below produce 10, the accumulated total.
import functools

def add2(x, y):
    return x + y

a = [1, 2, 3, 4]
print(functools.reduce(add2, a))
print(functools.reduce(lambda x, y: x + y, a))
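reduce also accepts an optional third argument, an initial value, which is what gets returned if the list is empty. A sketch using the standard operator module in place of hand-written functions (the result names are made up):

```python
import functools
import operator

a = [1, 2, 3, 4]

total = functools.reduce(operator.add, a)             # ((1+2)+3)+4
product = functools.reduce(operator.mul, a, 1)        # starts from 1
empty_total = functools.reduce(operator.add, [], 0)   # empty list: initial value

print(total, product, empty_total)
```

Without the initial value, calling reduce on an empty list raises a TypeError.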
functools
The functools module has some useful features, such as the caching decorators below (cache needs Python 3.9 or later).
cache
from functools import cache, lru_cache

@cache
def myfun(num):
    print("calculating")
    return num*2

@lru_cache(maxsize=5)
def myfun2(num):
    print("calculating")
    return num*2

range1 = range(10)
foo = list(range(10))
foo.reverse()
range2 = foo

print("Call myfun with a list of numbers using a cache")
for i in range1:
    print(myfun(i))

print("Call myfun with the same list of numbers, reversed")
print("Note that because of caching, myfun isn't called,")
print("which might save a lot of time if myfun is slow")
for i in range2:
    print(myfun(i))

print("Call myfun2 with a list of numbers using an lru_cache, size 5")
for i in range1:
    print(myfun2(i))

print("Call myfun2 with the same list of numbers, reversed.")
print("Note that because of the smaller cache, behaviour is different")
for i in range2:
    print(myfun2(i))
print(myfun2.cache_info())
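To make the eviction behaviour easier to follow, here is a smaller sketch with a cache of size 2. The function double and the list calls are made up for this example; calls records which arguments actually reached the function body.

```python
from functools import lru_cache

calls = []   # arguments for which the function body really ran

@lru_cache(maxsize=2)
def double(num):
    calls.append(num)
    return num*2

# 1 and 2 are misses, the second 1 is a hit, 3 evicts 2 (the least
# recently used entry), so the final 2 has to be calculated again.
for n in [1, 2, 1, 3, 2]:
    double(n)

info = double.cache_info()
print(calls, info)
```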
Eval
How do you make Python process a stored string as if you'd typed it in? Like this
s = "3*7-2"
eval(s)
outputs
19
Else
The use of else isn't restricted to being used with if. In
n = 5
while n != 0:
    print(n)
    n -= 1
else:
    print("what the...")
the else clause is executed when
- the loop condition becomes false (or, in a for loop, the items run out)
- a try block finishes without raising an exception.
It is not executed if
- you break or return out of the block
- you raise an exception.
It works not only for while and for loops, but also for try blocks. See
https://docs.python.org/3/reference/compound_stmts.html#the-while-statement
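A common use of a loop's else clause is a search where you need to know that nothing was found. A minimal sketch, using a made-up function find_first_even:

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = n
            break            # leaving via break skips the else clause
    else:
        result = None        # runs only if the loop finished without a break
    return result

print(find_first_even([1, 3, 4, 5]))
print(find_first_even([1, 3]))
```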
Finding out about an object
test = [1, 3, 5, 7]
print(dir(test))
will display the methods available for test. E.g. it shows you that test.reverse() is possible.
How much memory does an object use?
import sys

x = 1
print(sys.getsizeof(x))
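will display 28 on a typical 64-bit CPython, meaning that the integer object x refers to uses 28 bytes. The exact figure depends on the Python version and platform.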
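Be aware that sys.getsizeof is shallow: for a container it counts the container itself but not the objects it refers to. A sketch with made-up names:

```python
import sys

big_string = "a" * 10000
holder = [big_string]      # a list containing one large string

# The list only stores a reference, so it is far smaller than its contents.
print(sys.getsizeof(holder), sys.getsizeof(big_string))
```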
File handling
# this closes the file at the end of the 'with' scope
with open('notes') as fp:
    for line in fp:
        print(line)
# json - JavaScript Object Notation
import json

x = [1, 2, 3]
with open('test.json', 'w') as fp:
    json.dump(x, fp)
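Reading the data back uses json.load. The following sketch writes to a temporary directory so that it doesn't clutter the current one; the names data, path and loaded are made up.

```python
import json
import os
import tempfile

data = [1, 2, 3]
path = os.path.join(tempfile.mkdtemp(), "test.json")

with open(path, "w") as fp:
    json.dump(data, fp)       # write the list as JSON text

with open(path) as fp:
    loaded = json.load(fp)    # parse it back into a Python list

print(loaded == data)
```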
Performance
There are often several ways to perform a task. Especially with large data sets, there may be great differences in speed. Here's an example adapted from "Python for Finance".
import time
from math import *
import numpy as np
import numexpr

abignumber = 30000000
a = range(1, abignumber)
aa = np.arange(1, abignumber)
numexpr.set_num_threads(8)

def f(x):
    return 3*log(x) + cos(x)**2

# Method 1 - a pure Python list comprehension
start_time = time.time()
r = [f(x) for x in a]
end_time = time.time()
print("Time taken=", end_time - start_time, " seconds")

# Method 2 - numpy working on the whole array
start_time = time.time()
rr = 3*np.log(aa) + np.cos(aa)**2
end_time = time.time()
print("Time taken=", end_time - start_time, " seconds")

# Method 3 - numexpr evaluating the expression using several threads
start_time = time.time()
expr = '3*log(aa)+cos(aa)**2'  # a separate name, so that f isn't overwritten
rrr = numexpr.evaluate(expr)
end_time = time.time()
print("Time taken=", end_time - start_time, " seconds")
When I tried this on one of our terminals (in the "DPO") and on a multi-CPU machine ("ts-access") I got these results
Method | DPO time (seconds) | ts-access time (seconds) |
1 | 51 | 34 |
2 | 25 | 3 |
3 | 4 | 0.5 |
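For quick comparisons of small pieces of code, the standard timeit module is an alternative to calling time.time() by hand. The sketch below uses a much smaller range than the test above and only the standard library; the timings will differ from machine to machine, and the names loop_time and map_time are made up.

```python
import timeit
from math import log, cos

def f(x):
    return 3*log(x) + cos(x)**2

# Each statement is run 5 times; timeit returns the total time in seconds.
loop_time = timeit.timeit(lambda: [f(x) for x in range(1, 10000)], number=5)
map_time = timeit.timeit(lambda: list(map(f, range(1, 10000))), number=5)

print("list comprehension:", loop_time, "seconds")
print("map:", map_time, "seconds")
```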
Pandas
The pandas package is installed for data manipulation, analysis and data visualization (of numerical tables and time series). Here's an example adapted from "10 Minutes to pandas"
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# create some data
s = pd.Series([1, 3, 5, np.nan, 6, 8])
dates = pd.date_range('20130101', periods=6)

# Create a dataframe
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# View the data that's in the dataframe
df

# Get some stats and do some sorting
df.describe()
df.sort_index(axis=1, ascending=False)
df.sort_values(by='B')

# Create more data
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()