Overview¶
Duck Typing¶
prefixtree provides two container classes, PrefixDict and PrefixSet. They are implementations of the MutableMapping and MutableSet Abstract Base Classes. Any modules that adhere to the principle of duck-typing should be able to accept a PrefixtDict or PrefixSet in place of a dict or set.
Compatability¶
prefixtree is implemented to be compatible with Python 2.x and Python 3.x. It has been tested against the following Python implementations:
- CPython 2.6
- CPython 2.7
- CPython 3.2
- PyPy 1.9.0
Continuous integration testing is provided by Travis CI.
Benchmarks¶
Benchmarks show that prefixtree is 200 times slower than the builtin dict and requires 10 times the memory.
Collection | Memory | Run Time |
---|---|---|
dict | 40MB | 0.34s |
PrefixDict | 453MB | 67s |
The following script was used to benchmark the memory usage and cpu utilisation of PrefixDict.
"""Test memory consumption and processing time with PrefixDict.
Use words from '/usr/share/dict/words' as keys for PrefixDict and measure
memory consumption and load time.
"""
import resource
import sys
from prefixtree import PrefixDict
if __name__ == '__main__':
start = resource.getrusage(resource.RUSAGE_SELF)
glossary = PrefixDict()
with open('/usr/share/dict/words') as words:
for word in words:
glossary[word.strip()] = None
stop = resource.getrusage(resource.RUSAGE_SELF)
rss_mb = (stop.ru_maxrss- start.ru_maxrss) / 1024.0 / 1024.0
tused = (stop.ru_utime + stop.ru_stime)
sys.stdout.write('{0} MB\n'.format(rss_mb))
sys.stdout.write('{0} seconds\n'.format(tused))
They are compared to the results for the following script testing the builtin dict:
"""Test memory consumption and processing time with builtin dict.
Use words from '/usr/share/dict/words' as keys for dict and measure memory
consumption and load time.
"""
import resource
import sys
if __name__ == '__main__':
start = resource.getrusage(resource.RUSAGE_SELF)
glossary = {}
with open('/usr/share/dict/words') as words:
for word in words:
glossary[word.strip()] = None
stop = resource.getrusage(resource.RUSAGE_SELF)
rss_mb = (stop.ru_maxrss- start.ru_maxrss) / 1024.0 / 1024.0
tused = (stop.ru_utime + stop.ru_stime)
sys.stdout.write('{0} MB\n'.format(rss_mb))
sys.stdout.write('{0} seconds\n'.format(tused))
The benchmarks where run using:
- CPython 3.2, 64-bit
- Max OSX 10.7.4
- 2010 Macbook Pro
The benchamrks values were averaged from three runs of each benchmark script.