Learn Python Series (#28) - Using Pickle and Shelve
Repository
https://github.com/python/cpython
What will I learn?
- In this episode of the Learn Python Series you will learn about two additional ways to serialize and de-serialize Python objects for persistent storage: pickle and shelve,
- you will learn when to (not) use pickle over JSON, and when to (not) use shelve over a "real" database environment,
- also we'll discuss some dangers to be aware of when using pickle.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution;
- The ambition to learn Python programming.
Difficulty
- Beginner
Curriculum (of the Learn Python Series):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
- Learn Python Series (#4) - Round-Up #1
- Learn Python Series (#5) - Handling Lists Part 1
- Learn Python Series (#6) - Handling Lists Part 2
- Learn Python Series (#7) - Handling Dictionaries
- Learn Python Series (#8) - Handling Tuples
- Learn Python Series (#9) - Using Import
- Learn Python Series (#10) - Matplotlib Part 1
- Learn Python Series (#11) - NumPy Part 1
- Learn Python Series (#12) - Handling Files
- Learn Python Series (#13) - Mini Project - Developing a Web Crawler Part 1
- Learn Python Series (#14) - Mini Project - Developing a Web Crawler Part 2
- Learn Python Series (#15) - Handling JSON
- Learn Python Series (#16) - Mini Project - Developing a Web Crawler Part 3
- Learn Python Series (#17) - Roundup #2 - Combining and analyzing any-to-any multi-currency historical data
- Learn Python Series (#18) - PyMongo Part 1
- Learn Python Series (#19) - PyMongo Part 2
- Learn Python Series (#20) - PyMongo Part 3
- Learn Python Series (#21) - Handling Dates and Time Part 1
- Learn Python Series (#22) - Handling Dates and Time Part 2
- Learn Python Series (#23) - Handling Regular Expressions Part 1
- Learn Python Series (#24) - Handling Regular Expressions Part 2
- Learn Python Series (#25) - Handling Regular Expressions Part 3
- Learn Python Series (#26) - pipenv & Visual Studio Code
- Learn Python Series (#27) - Handling Strings Part 3 (F-Strings)
Additional sample code files
The full - and working! - iPython tutorial sample code file is included for you to download and run for yourself right here:
https://github.com/realScipio/learn-python-series/blob/master/pickle-tut01.ipynb
GitHub Account
Learn Python Series (#28) - Using Pickle and Shelve
Welcome to episode #28 of the Learn Python Series! In episode #15 we focused our attention on the JSON file format and reading from and writing to .json files. JSON is language- and platform-independent and human-readable as well. It can be used to serialize / de-serialize Python objects to and from JSON data, and because .json files can be stored on disk they can also be shared among processes and computer systems. However, great as JSON might be for serializing / de-serializing Python objects, it does have its limitations, because not every Python object type can be "JSON-i-fied": JSON for example doesn't differentiate between lists and tuples, object keys are required to be strings, and datetime objects don't work with JSON "out-of-the-box" (they require custom (de)serialization). Also, there are situations where "human-readable" could be considered a security risk, so JSON is not the preferred choice in every situation.
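To make those JSON limitations a bit more tangible, here is a minimal sketch using only the standard json and datetime modules. The variable names and sample values are made up for illustration.

```python
import json
from datetime import datetime

# A tuple comes back as a list after a JSON round trip.
point = (1, 2, 3)
restored_point = json.loads(json.dumps(point))
print(type(restored_point).__name__, restored_point)

# Non-string dictionary keys are converted to strings.
scores = {1: "one", 2: "two"}
restored_scores = json.loads(json.dumps(scores))
print(restored_scores)

# datetime objects are rejected unless you add custom (de)serialization.
try:
    json.dumps({"created": datetime(2018, 7, 1, 12, 0)})
    datetime_error = None
except TypeError as err:
    datetime_error = err
    print(f"TypeError: {err}")
```

So the data survives, but the original Python types do not always survive with it.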
Pickle vs JSON
Pickle, a module in Python's standard library (so it is included in every regular Python distribution, Anaconda as well), can also be used for serializing and de-serializing Python objects. "Pickling" out-of-the-box converts almost any Python object (apart from a few edge-case scenarios, like generators and lambda functions, which we haven't discussed yet in earlier episodes) into a byte stream that can be saved to disk, where the byte stream contains all the information that's needed to rebuild the object by the same or another Python program.
You could pickle the following object types:
- strings, bytes and bytearrays
- integers, floats, complex numbers
- lists, dictionaries, tuples, sets (as long as their contents are picklable too)
- None, True and False
- built-in functions, and functions and classes defined at a module's top level
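As a quick check of the list above, the following sketch pickles a sample of each type in memory with pickle.dumps() and rebuilds it with pickle.loads(), and then shows that a generator is refused. The countdown() generator and the sample values are hypothetical, picked just for this illustration.

```python
import pickle

samples = [
    "text", b"raw bytes",
    42, 3.14, 2 + 3j,
    [1, 2], {"a": 1}, (1, 2), {1, 2},
    None, True, False,
    len,  # a built-in function is pickled by reference (by its name)
]

# Every sample should come back equal to, and of the same type as, the original.
for obj in samples:
    clone = pickle.loads(pickle.dumps(obj))
    assert clone == obj and type(clone) is type(obj)
print("all samples survived the round trip")

def countdown(n):
    """A tiny generator function, only used to show the edge case."""
    while n > 0:
        yield n
        n -= 1

# A generator object cannot be pickled.
try:
    pickle.dumps(countdown(3))
    generator_error = None
except (TypeError, pickle.PicklingError) as err:
    generator_error = err
    print(f"cannot pickle a generator: {err}")
```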
Nota bene:
As opposed to JSON however, Pickle is not language independent: it is Python-specific, and the available pickle protocols even vary per Python version. It can also be rather slow, uses a binary format (ergo not human-readable), and could be a security risk, because arbitrary code contained in the pickle can be executed while de-serializing. So while this last sentence might not sound like a great sales-pitch to make a case for using Pickle, if you don't have language interoperability requirements for exchanging serialized objects, if you don't have to deal with untrusted data sources and if a binary format is OK or even preferred, then Pickle works great!
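To see the difference in output format, this small sketch serializes the same (made-up) dictionary with both modules and prints the protocol numbers your own interpreter reports. The exact bytes and protocol numbers depend on your Python version, so I'm not showing expected output here.

```python
import json
import pickle

data = {"coin": "Steem", "rank": 1}  # sample data, for illustration only

as_json = json.dumps(data)      # a str containing readable text
as_pickle = pickle.dumps(data)  # a bytes object containing binary opcodes

print(type(as_json).__name__, as_json)
print(type(as_pickle).__name__, as_pickle[:20])

# Protocol numbers known to the running interpreter.
print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

# An older protocol can be requested explicitly, for example when the
# pickle must stay readable by older Python releases.
compatible = pickle.dumps(data, protocol=2)
print(pickle.loads(compatible) == data)
```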
Let's see how Pickle works!
Working with pickle
In order to work with pickle, first import it:
import pickle
Serializing (pickling, dumping)
Now, say, we want to pickle a list of our favorite cryptos, like these:
fav_cryptos = [
    "Steem",
    "Steem Backed Dollars",
    "Bitcoin",
    "IoTeX",
    "Litecoin",
    "Stellar",
    "Byteball",
    "Tether"
]
Like json, pickle also has two main functions: dump, to serialize and "dump" a Python object to file, and load, to de-serialize a pickled file object.
For writing the fav_cryptos list to file via pickle, we need to first specify the filename:
filename = "fav_cryptos.p"
Next we define the file object: we open the file for writing via the open() function, to which we pass in two arguments: the filename, and wb for writing in binary mode.
fileobject = open(filename, 'wb')
Now that the file is opened for writing to, use pickle.dump() and pass in the object you want to pickle (in this case our fav_cryptos list) as its first argument, and the fileobject as its second argument:
pickle.dump(fav_cryptos, fileobject)
Then close the file object.
fileobject.close()
At this point, the object fav_cryptos is saved on disk as the pickled file fav_cryptos.p!
You could also use the following shorthand notation via with, which will automatically close the file for you (for this example I'll use another list and pickle dumpfile):
import pickle
colors = [
    'Green',
    'Yellow',
    'Orange',
    'Red',
    'Blue',
    'Brown',
    'White',
    'Black'
]
with open('colors.p', 'wb') as f:
    pickle.dump(colors, f)
De-serializing (unpickling, loading)
Unpickling a pickled file is quite similar: open() the file again but now use the rb flag (for reading in binary mode), and use pickle.load() to assign the result to a new variable:
import pickle
fileobject = open('fav_cryptos.p', 'rb')
unpickled_cryptos = pickle.load(fileobject)
fileobject.close()
print(type(unpickled_cryptos), unpickled_cryptos)
<class 'list'> ['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether']
Or, again use the shorthand notation using with:
import pickle
with open('colors.p', 'rb') as f:
    unpickled_colors = pickle.load(f)
print(type(unpickled_colors), unpickled_colors)
<class 'list'> ['Green', 'Yellow', 'Orange', 'Red', 'Blue', 'Brown', 'White', 'Black']
As you can see, while printing the unpickled objects, I also checked their types, and they're both correctly typed as a list. You could go a step further and compare the original to the unpickled objects to see if they are equal:
print(fav_cryptos == unpickled_cryptos)
print(colors == unpickled_colors)
True
True
Again, a word of caution using pickle with untrusted data sources
As explained, because a pickle can contain instructions to call functions and instantiate classes while it is being unpickled, as a rule of thumb, simply never unpickle data coming from unknown or untrusted sources. Yet if you must for some reason, make sure to use an encrypted network connection, and/or cryptographically sign and verify the pickle, and/or restrict file system permissions.
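The "cryptographically sign and verify" advice could look roughly like the sketch below, which uses the standard hmac and hashlib modules. The helper names sign_pickle() and load_signed_pickle() and the SECRET_KEY value are hypothetical, invented for this example. Note that a valid signature only proves the pickle was created by someone holding the shared key; it doesn't make the pickle format itself any safer.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, for illustration only
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def sign_pickle(obj, key=SECRET_KEY):
    """Return the HMAC signature followed by the pickled payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload

def load_signed_pickle(blob, key=SECRET_KEY):
    """Unpickle only if the signature matches, otherwise raise ValueError."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch, refusing to unpickle")
    return pickle.loads(payload)

blob = sign_pickle(["Steem", "Bitcoin"])
print(load_signed_pickle(blob))

# Flip the last byte to simulate tampering: verification should now fail.
tampered = blob[:-1] + bytes([blob[-1] ^ 0xFF])
try:
    load_signed_pickle(tampered)
except ValueError as err:
    print(err)
```

The important design choice is that the signature is verified before pickle.loads() is ever called, so a tampered payload is never unpickled.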
Working with shelve
shelve is built on top of pickle, and it acts somewhat like a database. In fact, you can use shelve as a persistent Python object store when you don't want to or can't use a "real" database. Shelved objects are also pickled, but via shelve the objects are associated with a string key. This means you can access your pickled objects via their key, just like you would with a Python dictionary! shelve is pretty convenient when serializing many objects.
In order to work with shelve first import it:
import shelve
Serializing (shelving, dumping)
The shelve syntax is pretty similar to pickle. Let's shelve the objects that we unpickled before!
with shelve.open('test_shelf') as shelf:
    shelf['cryptos'] = unpickled_cryptos
    shelf['colors'] = unpickled_colors
At this point, a shelved database file is stored (test_shelf.db on macOS, but on other systems, depending on the specific DBM implementation that is used, you might get output files with no extension, or with the extensions .bak, .dat, .dir, or .pag).
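If you're curious which files and which DBM implementation you get on your own system, a sketch like this one can tell you. It writes a throw-away shelf into a temporary directory (the name demo_shelf is made up) and asks dbm.whichdb() to guess the backend. The output differs per platform and Python version, so none is shown here.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_shelf")  # hypothetical shelf name
    with shelve.open(path) as shelf:
        shelf["colors"] = ["Green", "Yellow"]

    # Inspect what was written before the temporary directory is removed.
    created_files = sorted(os.listdir(tmpdir))
    backend = dbm.whichdb(path)

print("files on disk:", created_files)
print("DBM backend:", backend)
```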
De-serializing (unshelving, loading)
In order to access the shelved data, just open the shelf via shelve.open() and use it like you would with a "normal" Python dictionary:
with shelve.open('test_shelf') as shelf:
    shelved_colors = shelf['colors']
    print(shelved_colors)
['Green', 'Yellow', 'Orange', 'Red', 'Blue', 'Brown', 'White', 'Black']
If you don't know the keys that exist within the shelf, you can of course list them, like so:
with shelve.open('test_shelf') as shelf:
    print(list(shelf.keys()))
['cryptos', 'colors']
Listing the values (although this can be quite slow) is done via the values() method:
with shelve.open('test_shelf') as shelf:
    print(list(shelf.values()))
[['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether'], ['Green', 'Yellow', 'Orange', 'Red', 'Blue', 'Brown', 'White', 'Black']]
Since shelves behave like dictionaries, if you want to iterate over all shelved items, you can:
with shelve.open('test_shelf') as shelf:
    for key in shelf:
        print(key, shelf[key])
cryptos ['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether']
colors ['Green', 'Yellow', 'Orange', 'Red', 'Blue', 'Brown', 'White', 'Black']
Updating / modifying shelves
By default, a shelf doesn't track any updates / modifications on a de-serialized object.
So if you would try to do the following (to add Monero to the shelved list of cryptos), the shelf itself isn't persistently updated:
with shelve.open('test_shelf') as shelf:
    shelf['cryptos'].append('Monero')
with shelve.open('test_shelf') as shelf:
    shelved_cryptos = shelf['cryptos']
    print(shelved_cryptos)
['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether']
As you can see, after re-loading the shelf, Monero is not contained in the shelved cryptos list.
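Why does that happen? Every time you read a key from a (default) shelf, the stored pickle is unpickled again into a brand new object, so the append only changes a temporary in-memory copy. This sketch makes that visible on a throw-away shelf in a temporary directory (the name demo_shelf is made up for the example):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_shelf")  # hypothetical shelf name
    with shelve.open(path) as shelf:
        shelf["cryptos"] = ["Steem", "Bitcoin"]

        first = shelf["cryptos"]
        second = shelf["cryptos"]
        same_object = first is second  # two separately unpickled copies

        first.append("Monero")         # changes only the in-memory copy
        stored = shelf["cryptos"]      # what the shelf itself still holds

print(same_object)
print(first)
print(stored)
```

This prints False, then ['Steem', 'Bitcoin', 'Monero'] and finally ['Steem', 'Bitcoin']: the shelf never saw the change.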
You can of course update a shelf persistently, in two ways:
- de-serialize the shelved list, append the item (Monero) to that in-memory copy, and then store the entire copy back to the shelf using its key:
with shelve.open('test_shelf') as shelf:
    cryptos = shelf['cryptos']
    cryptos.append('Monero')
    shelf['cryptos'] = cryptos
with shelve.open('test_shelf') as shelf:
    shelved_cryptos = shelf['cryptos']
    print(shelved_cryptos)
['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether', 'Monero']
And now Monero is added to the persistent shelf.
- The second way, which is less verbose but is slower and uses more RAM, is to open the shelf with the flag writeback=True, and to directly append the new item to the shelved list (let's now add Dash since Monero is already added):
with shelve.open('test_shelf', writeback=True) as shelf:
    shelf['cryptos'].append('Dash')
with shelve.open('test_shelf') as shelf:
    shelved_cryptos = shelf['cryptos']
    print(shelved_cryptos)
['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether', 'Monero', 'Dash']
Removing Dash is of course as simple as:
with shelve.open('test_shelf', writeback=True) as shelf:
    shelf['cryptos'].remove('Dash')
with shelve.open('test_shelf') as shelf:
    shelved_cryptos = shelf['cryptos']
    print(shelved_cryptos)
['Steem', 'Steem Backed Dollars', 'Bitcoin', 'IoTeX', 'Litecoin', 'Stellar', 'Byteball', 'Tether', 'Monero']
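The extra RAM usage of writeback=True comes from a cache: every entry you touch is kept in memory and written back when the shelf is closed, or earlier if you call sync() yourself. Here's a small sketch of that behaviour, again on a throw-away shelf in a temporary directory (the name demo_shelf is made up):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_shelf")  # hypothetical shelf name

    with shelve.open(path, writeback=True) as shelf:
        shelf["cryptos"] = ["Steem", "Bitcoin"]

        # With writeback=True, repeated reads return the same cached object.
        cached_identity = shelf["cryptos"] is shelf["cryptos"]

        shelf["cryptos"].append("Dash")
        shelf.sync()  # write cached entries to disk now, not only on close

    # Re-open without writeback to check what was really persisted.
    with shelve.open(path) as shelf:
        reloaded = shelf["cryptos"]

print(cached_identity)
print(reloaded)
```

This prints True and ['Steem', 'Bitcoin', 'Dash']. On a shelf with many large entries, calling sync() every now and then keeps the cache from growing too big.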
Deleting elements from a shelf
If you want to completely remove a shelf element, for example all cryptos stored in shelf['cryptos'], then use the del keyword:
with shelve.open('test_shelf') as shelf:
    del shelf['cryptos']
with shelve.open('test_shelf') as shelf:
    print(list(shelf.keys()))
['colors']
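Because a shelf behaves like a dictionary, you can also check whether a key exists before deleting it, or read it with a default value via get(). A minimal sketch on a throw-away shelf (the name demo_shelf and its contents are made up):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_shelf")  # hypothetical shelf name
    with shelve.open(path) as shelf:
        shelf["colors"] = ["Green", "Yellow"]

        # "cryptos" was never stored here, so guard the delete.
        had_cryptos = "cryptos" in shelf
        if had_cryptos:
            del shelf["cryptos"]

        # get() returns the default instead of raising a KeyError.
        missing = shelf.get("cryptos", [])
        remaining_keys = list(shelf.keys())

print(had_cryptos)
print(missing)
print(remaining_keys)
```

This prints False, [] and ['colors'].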
Shelf concurrency and Read-Only using flag='r'
Please note that the underlying DBM module powering shelf databases doesn't support concurrent writing, for example when multiple applications try to write to a shelve database at the same time / while it is opened.
DBM does however support concurrent reads, so a smart thing to do if you want to use concurrency, is to let the client that only wants to read from the shelf do so in read-only mode by passing flag='r' while opening the shelf.
For demonstration purposes, I'll now explicitly import the dbm module as well, so that its error can be caught and printed when the read-only shelf does try to write:
import dbm
with shelve.open('test_shelf', flag='r') as shelf:
    try:
        colors = shelf['colors']
        colors.append('Pink')
        shelf['colors'] = colors
        print(shelf['colors'])
    except dbm.error as err:
        print(f"Woops! There's an error: {err}")
Woops! There's an error: cannot add item to database
What did we learn, hopefully?
Using pickle and shelve, although they should be treated with care (i.e. security issues when unpickling untrusted data), is both pretty powerful and convenient, to me at least. If you would scroll back to my earlier #14 episode, on developing a mini-Steem crawler for account discovery, just imagine how much easier it would have been to simply apply a shelf and update the todo, done and all files!
I hope you enjoyed this tutorial as much as I have writing it!
Woah, I loved how expository your code was. I am also a python programmer but am not too familiar with this module. Thanks for the tutorial. A question though. Why is the dump function there when a suitable alternative would be to commit the dictionary to a file simply using such code for example. Wouldn't this append the file to the document or is it for a more peculiar purpose?
Hi, and thx! ;-)
I've discussed 2 modules, pickle and shelve (they're two different modules, where shelve is built on top of pickle). The key take-away of this episode is to explain that dumping via pickling is a way to "serialize" a Python object (for example a dictionary or a list), which is kept in RAM and only works "inside the program", and is then "pickled" to save it in a file in binary mode, where the pickle contains the full instructions to un-pickle it, ergo to read it back, by another or the same program at another time, after the program was closed, and to put it back in RAM.
Simple use case? When you're playing a game and you're saving your progress on level 14 before you go to sleep. Saving the "game state" == pickling ;-)
Ooh wow
That's elucidated things
I fully get the application now
It's like saving a data state and all the changes with a single module. That's awesome
Thanks
Thank you for your contribution.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]
Thx! It was pretty fun to write as well, especially this episode!