Creating new collections
We’ve seen collections like lists, strings and tuples that allow indexed access
mylist[0]
And we’ve seen collections like dict and set that allow keyed access
menu_prices_dict['hamburger'] = 59
In this chapter we will learn how the magic behind collections and access works and how to create our own containers or customize existing containers
Note:
In this lesson we’re going to rely heavily on decorators, duck-typing, protocols, and ABCs - which we’ve covered in lessons 02 and 03
Iterator protocol
Lets start with one of the simplest python protocols: the iterator protocol.
What are iterators in Python?
Iterators are everywhere in Python. They are elegantly implemented within for loops, comprehensions, generators etc. but hidden in plain sight.
Iterator in Python is simply an object that can be iterated upon. An object which will return data, one element at a time.
Technically speaking, Python iterator object must implement two special methods, __iter__()
and __next__()
, collectively called the iterator protocol.
An object is called iterable if we can get an iterator from it. Most of built-in containers in Python like: list, tuple, string etc. are iterables.
The iter() function (which in turn calls the __iter__()
method) returns an iterator from them.
Iterator Types
Python supports a concept of iteration over containers. This is implemented using two distinct methods; these are used to allow user-defined classes to support iteration. Sequences, described below in more detail, always support the iteration methods.
One method needs to be defined for container objects to provide iteration support:
container.__iter__()
Return an iterator object. The object is required to support the iterator protocol described below. If a container supports different types of iteration, additional methods can be provided to specifically request iterators for those iteration types. (An example of an object supporting multiple forms of iteration would be a tree structure which supports both breadth-first and depth-first traversal.) This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.
The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:
iterator.__iter__()
Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.
iterator.__next__()
Return the next item from the container. If there are no further items, raise the StopIteration exception. This method corresponds to the tp_iternext slot of the type structure for Python objects in the Python/C API.
Python defines several iterator objects to support iteration over general and specific sequence types, dictionaries, and other more specialized forms. The specific types are not important beyond their implementation of the iterator protocol.
Once an iterator’s next() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.
Example
Below we show an example class that implements the iterator protocol
class VersionedObject :
"""
VersionedObject is a type that remembers all the past values it held
and can return the history of its values with a for loop
"""
def __init__(self, value=None):
self.__values = [ value ]
def update(self, value):
self.__values.append(value)
def latest(self):
return self.__values[-1]
def __iter__(self):
return self.Iterator(self.__values)
class Iterator:
def __init__(self, values):
self.__index = 0
self.__values = values
def __next__(self):
# Return the next item from the container.
# If there are no further items, raise the StopIteration exception
if self.__index is None or len(self.__values) <= self.__index:
# Once an iterator’s next() method raises StopIteration, it must continue to do so
self.__index = None
raise StopIteration()
value = self.__values[self.__index]
self.__index += 1
return value
def __iter__(self):
# Return the iterator object itself.
# This is required to allow both containers and iterators to be used with the for and in statements
return self #
x = VersionedObject([1])
x.update(2)
x.update("third version")
x.update(4)
for older_value in x: # calls the __iter__ method on x
print(older_value) # calls the __next__ method on the iterator
"""
Lets simplify the code for VersionObject, by using the fact that lists []
also support the iterator protocol themselves
"""
class VersionedObject_2 :
def __init__(self, value=None):
self.__values = [ value ]
def update(self, value):
self.__values.append(value)
def latest(self):
return self.__values[-1]
def __iter__(self):
return iter(self.__values) # calls self.__values iterator
x = VersionedObject_2([1])
x.update(2)
x.update("third version")
x.update(4)
for older_value in x: # calls the __iter__ method on x
print(older_value) # calls the __next__ method on the iterator
Sequence protocol
There are three basic sequence types: lists, tuples, and range objects.
Sequences support the following operations:
Operation Result
x in s True if an item of s is equal to x, else False
x not in s False if an item of s is equal to x, else True
s + t the concatenation of s and t
s * n or n * s equivalent to adding s to itself n times
s[i] ith item of s, origin 0
s[i:j] slice of s from i to j
s[i:j:k] slice of s from i to j with step k
len(s) length of s
min(s) smallest item of s
max(s) largest item of s
s.index(x[, i[, j]]) index of the first occurrence of x in s (at or after index i and before index j)
s.count(x) total number of occurrences of x in s
This is a rather long list of operations … also it doesn’t tell us which methods to implement and how …
to help us with this difficult task, python provides an abstract base class(ABC) collections.abc.Sequence
that implements the most of the sequence protocol and only asks us to implement two abstract function: __len__
and __getitem__
- if you’re interested in the details of how the Sequence class implements other functions such as
index
,count
or__contains__
just by using__len__
and__getitem__
see the_collections_abc.py
module in the python standard library
Lets see collections.abc.Sequence
in action by implementing an incredibly simple sequence that we’re already familiar with - range.
import collections.abc
import math
class MyRange(collections.abc.Sequence):
def __init__(self, start, stop, step=1):
self.__start = start
self.__stop = stop
self.__step = step
self.__n = max(0, math.ceil((stop-start) / step))
super().__init__()
def __len__(self):
return self.__n
def __getitem__(self, offset):
if isinstance(offset, slice):
return itertools.islice(self, offset.start, offset.stop, offset.step)
if self.__n <= offset:
raise IndexError('range object index out of range')
return self.__start + offset * self.__step
def __repr__(self):
return f"{type(self).__name__}({self.__start},{self.__stop},{self.__step})"
# Let's use MyRange
range5 = MyRange(0, 5)
# convert to list
print(list(range5)) # [0, 1, 2, 3, 4]
# use indexing
print(range5[0], range5[1], range5[2]) # 0 1 2
# use 'in' keyword
print(3 in range5) # true
print(100 in range5) # false
# min/max/count
print(min(range5)) # 0
print(max(range5)) # 4
print(range5.count(4)) # 1
Mapping protocol
A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary.
A dictionary’s keys are almost arbitrary values. key have to be hashable, which means that keys must be immutable. that is, objects containing lists, dictionaries or other mutable types may not be used as keys.
mappings should support the following operations:
len(d)
Return the number of items in the dictionary d.
d[key]
Return the item of d with key key. Raises a KeyError if key is not in the map.
d[key] = value
Set d[key] to value.
del d[key]
Remove d[key] from d. Raises a KeyError if key is not in the map.
key in d
Return True if d has a key key, else False.
key not in d
Equivalent to not key in d.
iter(d)
Return an iterator over the keys of the dictionary. This is a shortcut for iter(d.keys()).
clear()
Remove all items from the dictionary.
copy()
Return a shallow copy of the dictionary.
@classmethod
fromkeys(iterable[, value])
Create a new dictionary with keys from iterable and values set to value.
get(key[, default])
Return the value for key if key is in the dictionary, else default.
If default is not given, it defaults to None, so that this method never raises a KeyError.
items()
Return a new view of the dictionary’s items ((key, value) pairs). See the documentation of view objects.
keys()
Return a new view of the dictionary’s keys. See the documentation of view objects.
pop(key[, default])
If key is in the dictionary, remove it and return its value, else return default.
If default is not given and key is not in the dictionary, a KeyError is raised.
popitem()
Remove and return a (key, value) pair from the dictionary. Pairs are returned in LIFO order.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default.
default defaults to None.
update([other])
Update the dictionary with the key/value pairs from other, overwriting existing keys. Return None.
values()
Return a new view of the dictionary’s values.
😱
Yes, that’s quite a big number of things. again, the collections
module has a class called MutableMapping
which takes care of most things and only asks that we implement __getitem__, __setitem__, __delitem__, __iter__, and __len__
functions
Example
Lets use MutableMapping
to create a new type of dictionary, one that sends a message to observers whenever it changes
from collections.abc import MutableMapping
class Observable: # from ex 02
def __init__(self):
self.handlers = []
def register(self, callable_):
self.handlers.append(callable_)
def notify(self, event, *args, **kwargs):
for handler in self.handlers:
handler(event, *args, **kwargs)
class ObservableMapping(MutableMapping):
def __init__(self, dict_ = {}, dicttype = None):
dicttype = dicttype or type(dict_)
self.data = dicttype(dict_)
self.events = Observable()
def __setitem__(self, key, value):
self.events.notify('set', self, key, value)
return self.data.__setitem__(key, value)
def __delitem__(self, key):
self.events.notify('del', self, key)
return self.data.__delitem__(key)
def __getitem__(self, key):
return self.data.__getitem__(key)
def __iter__(self):
return self.data.__iter__()
def __len__(self):
return self.data.__len__()
def handler(event, obj, key, *args, **kwargs):
print(event, repr(key), '-->>', *args, **kwargs)
d = ObservableMapping()
d.events.register(handler)
d['name'] = 'Sir Launcelot of Camelot'
d['favorite color'] = 'blue'
d.popitem()