Understand How Much Memory Your Python Objects Use

Python is a fantastic programming language. It is also known for being pretty slow, due mostly to its enormous flexibility and dynamic features. For many applications and domains, it is not a problem due to their requirements and various optimization techniques. It is less known that Python object graphs (nested dictionaries of lists and tuples and primitive types) take a significant amount of memory. This can be a much more severe limiting factor due to its effects on caching, virtual memory, multi-tenancy with other programs, and in general exhausting the available memory, which is a scarce and expensive resource.

It turns out that it is not difficult to figure out how much memory is actually consumed. In this article, I'll walk you through the intricacies of a Python object's memory management and show how to measure the consumed memory accurately.

In this article, I focus solely on CPython—the primary implementation of the Python programming language. The experiments and conclusions here don't apply to other Python implementations like IronPython, Jython, and PyPy.

Depending on the Python version, the numbers are sometimes a little different (especially for strings, which are always Unicode), but the concepts are the same. In my case, I am using Python 3.10.

As of 1st January 2020, Python 2 is no longer supported, and you should have already upgraded to Python 3.

Hands-On Exploration of Python Memory Usage

First, let's explore a little bit and get a concrete sense of the actual memory usage of Python objects.

The `sys.getsizeof()` Built-in Function

The standard library's sys module provides the getsizeof() function. That function accepts an object (and an optional default), calls the object's `__sizeof__()` method, and returns the result (plus the garbage collector's overhead if the object is tracked by it), so you can make your own objects inspectable as well.
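As a minimal sketch of that last point, a class can define `__sizeof__()` so that sys.getsizeof() also accounts for data the object owns. The Buffer class and its payload attribute are hypothetical names used only for illustration:

```python
import sys


class Buffer:
    """Hypothetical wrapper that reports the size of the payload it owns."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def __sizeof__(self):
        # Size of the instance itself plus the bytes object it holds
        return object.__sizeof__(self) + sys.getsizeof(self.payload)


small = Buffer(b"")
large = Buffer(b"x" * 1000)

# The two reported sizes differ by exactly the extra payload bytes
print(sys.getsizeof(large) - sys.getsizeof(small))
```

Because getsizeof() may add garbage collector overhead on top of what `__sizeof__()` returns, the sketch compares two instances instead of relying on an absolute number.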

Measuring the Memory of Python Objects

Let's start with some numeric types:

import sys

sys.getsizeof(5)
28

Interesting. An integer takes 28 bytes.

sys.getsizeof(5.3)
24

Hmm… a float takes 24 bytes.

from decimal import Decimal
sys.getsizeof(Decimal(5.3))
104

Wow. 104 bytes! This really makes you think about whether you want to represent a large number of real numbers as floats or Decimals.

Let's move on to strings and collections:

sys.getsizeof('')
49
sys.getsizeof('1')
50
sys.getsizeof('12')
51
sys.getsizeof('123')
52
sys.getsizeof('1234')
53

OK. An empty string takes 49 bytes, and each additional ASCII character adds another byte (non-ASCII text can take up to 4 bytes per character). That says a lot about the tradeoffs of keeping multiple short strings, where you'll pay the 49 bytes of overhead for each one, vs. a single long string, where you pay the overhead only once.
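To make the tradeoff concrete, here is a small sketch that compares three short strings with a single joined string. The sample words are arbitrary, and the absolute numbers depend on the Python version, so only the difference is of interest:

```python
import sys

words = ["alpha", "beta", "gamma"]  # arbitrary ASCII sample strings

separate = sum(sys.getsizeof(w) for w in words)
joined = sys.getsizeof("".join(words))
per_string_overhead = sys.getsizeof("")

print(f"three separate strings: {separate} bytes")
print(f"one joined string:      {joined} bytes")
print(f"saved by joining:       {separate - joined} bytes")
print(f"overhead of one string: {per_string_overhead} bytes")
```

For ASCII text, the saving should equal the fixed overhead of the two strings that no longer exist.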

The bytes object has an overhead of only 33 bytes.

sys.getsizeof(bytes())
33

Let's look at lists.

sys.getsizeof([])
56
sys.getsizeof([1])
64
sys.getsizeof([1, 2])
72
sys.getsizeof([1, 2, 3])
80
sys.getsizeof([1, 2, 3, 4])
88

sys.getsizeof(['a long longlong string'])
64

What's going on? An empty list takes 56 bytes, but each additional int adds just 8 bytes, where the size of an int is 28 bytes. A list that contains a long string takes just 64 bytes.

The answer is simple. The list doesn't contain the int objects themselves. It just contains an 8-byte (on 64-bit versions of CPython) pointer to the actual int object. What that means is that the getsizeof() function doesn't return the actual memory of the list and all the objects it contains, but only the memory of the list and the pointers to its objects. In the next section I'll introduce the deep\_getsizeof() function, which addresses this issue.
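A quick way to check this explanation is to compare lists whose elements differ wildly in size, and to measure the cost of one slot. This is a sketch; it uses struct.calcsize("P") to get the platform's pointer size rather than assuming 8:

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of a C pointer on this build

# The shallow size of a one-element list does not depend on the element
big = "x" * 10_000
same_size = sys.getsizeof([big]) == sys.getsizeof([1])
print(same_size)

# Each slot of a list costs exactly one pointer
n = 1000
per_slot = (sys.getsizeof([None] * n) - sys.getsizeof([])) / n
print(per_slot, pointer_size)
```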

sys.getsizeof(())
40
sys.getsizeof((1,))
48
sys.getsizeof((1, 2))
56
sys.getsizeof((1, 2, 3))
64
sys.getsizeof((1, 2, 3, 4))
72
sys.getsizeof(('a long longlong string',))
48

The story is similar for tuples. The overhead of an empty tuple is 40 bytes vs. the 56 of a list. This 16-byte difference per sequence is low-hanging fruit if you have a data structure with a lot of small, immutable sequences.

sys.getsizeof(set())
216
sys.getsizeof(set([1]))
216
sys.getsizeof(set([1, 2, 3, 4]))
216

sys.getsizeof({})
64
sys.getsizeof(dict(a=1))
232
sys.getsizeof(dict(a=1, b=2, c=3))
232

Sets and dictionaries ostensibly don't grow at all when you add a few items, because their hash tables are allocated in larger chunks and resized only when they fill up. But note the enormous overhead.
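The following sketch records the shallow size of a dictionary after every insertion and reports the item counts at which the size actually changes. The exact counts and sizes vary between Python versions, so none are listed here:

```python
import sys

d = {}
sizes = [sys.getsizeof(d)]
for i in range(100):
    d[i] = None
    sizes.append(sys.getsizeof(d))

# Item counts at which the reported size changed
growth_points = [n for n in range(1, len(sizes)) if sizes[n] != sizes[n - 1]]
print(growth_points)
print(sizes[0], sizes[-1])
```

The size should change at only a handful of points, which is why adding a few items often appears to cost nothing.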

The bottom line is that Python objects have a huge fixed overhead. If your data structure is composed of a large number of collection objects like strings, lists and dictionaries that contain a small number of items each, you pay a heavy toll.

The `deep\_getsizeof()` Function

Now that I've scared you half to death and also demonstrated that sys.getsizeof() can only tell you how much memory a primitive object takes, let's take a look at a more adequate solution. The deep\_getsizeof() function drills down recursively and calculates the actual memory usage of a Python object graph.

from collections.abc import Mapping, Container
from sys import getsizeof

def deep_getsizeof(o, ids):
    """Find the memory footprint of a Python object

    This is a recursive function that drills down a Python object graph
    like a dictionary holding nested dictionaries with lists of lists
    and tuples and sets.

    The sys.getsizeof function reports the shallow size only. It counts each
    object inside a container as a pointer only, regardless of how big it
    really is.

    :param o: the object
    :param ids: set of ids of the objects that were already counted
    :return: the total size in bytes of o and everything it references
    """
    d = deep_getsizeof
    if id(o) in ids:
        return 0

    r = getsizeof(o)
    ids.add(id(o))

    # Strings and bytes are containers too, but they hold no other objects
    if isinstance(o, (str, bytes, bytearray)):
        return r

    if isinstance(o, Mapping):
        return r + sum(d(k, ids) + d(v, ids) for k, v in o.items())

    if isinstance(o, Container):
        return r + sum(d(x, ids) for x in o)

    return r

There are several interesting aspects to this function. It takes into account objects that are referenced multiple times and counts them only once by keeping track of object ids. The other interesting feature of the implementation is that it takes full advantage of the collections.abc module's abstract base classes. That allows the function to handle, very concisely, any collection that implements either the Mapping or Container base classes instead of dealing directly with myriad collection types like str, bytes, list, tuple, dict, OrderedDict, set, frozenset, etc.

Let's see it in action:

x = '1234567'
deep_getsizeof(x, set())
56

A string of length 7 takes 56 bytes (49 bytes of overhead + 1 byte for each of the 7 characters).

deep_getsizeof([], set())
56

An empty list takes 56 bytes (just overhead).

deep_getsizeof([x], set())
120

A list that contains the string x takes 120 bytes (56 + 8 + 56).

deep_getsizeof([x, x, x, x, x], set())
152

A list that contains the string x five times takes 152 bytes (56 + 5\*8 + 56).

The last example shows that deep\_getsizeof() counts references to the same object (the x string) just once, but each reference's pointer is counted.
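The same behavior shows up on a nested structure. The sketch below restates the recursive idea in a self-contained form; total_size and its seen parameter are hypothetical names, and the only difference is that the set of visited ids is created on the first call:

```python
from collections.abc import Container, Mapping
from sys import getsizeof


def total_size(obj, seen=None):
    """Hypothetical convenience variant of the recursive size function."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted through another reference
    seen.add(id(obj))

    size = getsizeof(obj)
    if isinstance(obj, (str, bytes, bytearray)):
        return size
    if isinstance(obj, Mapping):
        return size + sum(
            total_size(k, seen) + total_size(v, seen) for k, v in obj.items()
        )
    if isinstance(obj, Container):
        return size + sum(total_size(item, seen) for item in obj)
    return size


shared = "x" * 100
pair = [shared, shared]
record = {"name": shared, "tags": pair}

# The shared string is counted once, however many times it is referenced
print(total_size(pair), getsizeof(pair) + getsizeof(shared))
print(total_size(record))
```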

Treats or Tricks

It turns out that CPython has several tricks up its sleeve, so the numbers you get from deep\_getsizeof() don't fully represent the memory usage of a Python program.

Reference Counting

Python manages memory primarily using reference counting. Once an object is not referenced anymore, its memory is deallocated. But as long as there is a reference, the object will not be deallocated. Things like cyclical references can bite you pretty hard: objects in a cycle keep each other alive, so they are not freed until CPython's cyclic garbage collector runs.
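Here is a small sketch of both effects using only the standard library. The Node class is a hypothetical placeholder, and the automatic collector is switched off temporarily so that the cycle visibly survives until gc.collect() is called:

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical placeholder object."""


a = Node()
# getrefcount() reports one extra reference for its own argument
print(sys.getrefcount(a))

gc.disable()  # keep the automatic cycle collector out of the way
x = Node()
y = Node()
x.other = y
y.other = x  # x and y now reference each other
probe = weakref.ref(x)  # lets us check whether x is still alive

del x, y  # no names left, but the cycle keeps both objects alive
alive_before = probe() is not None

gc.collect()  # the cyclic garbage collector finds and frees the cycle
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```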

Small Objects

CPython manages small objects in special pools grouped by size class. The exact limits depend on the version and platform: older versions treat objects of up to 256 bytes as small and use 8-byte size classes, while recent 64-bit builds use a 512-byte threshold and 16-byte size classes. With 8-byte classes, there are pools for 1-8 bytes, 9-16 bytes, and so on. When an object of size 10 is allocated, it is allocated from the 16-byte pool. So, even though it contains only 10 bytes of data, it will cost 16 bytes of memory. If you allocate 1,000,000 objects of size 10, you actually use 16,000,000 bytes and not 10,000,000 bytes as you may assume. This 60% extra overhead is obviously not trivial.
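The arithmetic behind that example can be sketched as follows. The block_size function is purely illustrative and is not a CPython API; its alignment parameter defaults to the 8-byte boundary used in the example:

```python
def block_size(request: int, alignment: int = 8) -> int:
    """Round a request up to the next multiple of alignment.

    Illustrative arithmetic only, not how you query CPython's allocator.
    """
    return -(-request // alignment) * alignment


count = 1_000_000
requested = count * 10
allocated = count * block_size(10)
overhead_pct = 100 * (allocated - requested) / requested

print(requested, allocated, overhead_pct)
```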

Integers

CPython keeps a global list of all the integers in the range -5 to 256. This optimization strategy makes sense because small integers pop up all over the place, and given that each integer takes 28 bytes, it saves a lot of memory for a typical program.

It also means that CPython pre-allocates roughly 262 * 28 = 7336 bytes for all these integers (the range -5 to 256 contains 262 values), even if you don't use most of them. You can verify the caching by using the id() function, which in CPython gives the address of the actual object. If you call id(x) for any x in the range -5 to 256, you will get the same result every time (for the same integer). But if you try it for integers outside this range, a new object is created on the fly every time.

Here are a few examples within the range:

id(-3)
9788832

id(-3)
9788832

id(-3)
9788832

id(201)
9795360

id(201)
9795360

id(201)
9795360

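Here are some examples outside the range. Each call creates a temporary object that is freed right away, so CPython may reuse the same address for the next one, which is why some of the ids below repeat even for different integers: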

id(257)
140276939034224

id(301)
140276963839696

id(301)
140276963839696

id(-6)
140276963839696

id(-6)
140276963839696
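Since ids of short-lived objects can be reused, a more reliable check is to keep two objects alive at the same time and compare them with the is operator. This is a sketch; is_cached is a hypothetical helper, and the result is specific to CPython:

```python
def is_cached(text: str) -> bool:
    """Hypothetical helper: parse the same digits twice and report whether
    CPython hands back the very same object both times."""
    first = int(text)
    second = int(text)
    return first is second


for text in ("-5", "0", "256", "100000000000"):
    print(text, is_cached(text))
```

Parsing the number from a string at run time avoids the compiler reusing a single constant for both values, which would hide the effect.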

Python Memory vs. System Memory

CPython is kind of possessive. In many cases, when memory objects in your program are not referenced anymore, they are not returned to the system (e.g. the small objects). This is good for your program if you allocate and deallocate many objects that belong to the same pool because Python doesn't have to bother the system, which is relatively expensive. But it's not so great if your program normally uses X bytes and under some temporary condition it uses 100 times as much (e.g. parsing and processing a big configuration file only when it starts).

Now, that 100X memory may be trapped uselessly in your program, never to be used again and preventing the system from allocating it to other programs. The irony is that if you use the multiprocessing module to run multiple instances of your program, you'll severely limit the number of instances you can run on a given machine.

Memory Profiler

To gauge and measure the actual memory usage of your program, you can use the memory\_profiler module. I played with it a little bit and I'm not sure I trust the results. Using it is very simple. You decorate a function (could be the main function) with the @profile decorator, and when the program exits, the memory profiler prints to standard output a handy report that shows the total and changes in memory for every line. Here is a sample program I ran under the profiler:

from memory_profiler import profile

@profile
def main():
    a = []
    b = []
    c = []
    for i in range(100000):
        a.append(5)
    for i in range(100000):
        b.append(300)
    for i in range(100000):
        c.append('123456789012345678901234567890')
    del a
    del b
    del c

    print('Done!')
    
if __name__ == '__main__':
    main()

Here is the output:

Filename: python_obj.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     3     17.3 MiB     17.3 MiB           1   @profile
     4                                         def main():
     5     17.3 MiB      0.0 MiB           1       a = []
     6     17.3 MiB      0.0 MiB           1       b = []
     7     17.3 MiB      0.0 MiB           1       c = []
     8     18.0 MiB      0.0 MiB      100001       for i in range(100000):
     9     18.0 MiB      0.8 MiB      100000           a.append(5)
    10     18.7 MiB      0.0 MiB      100001       for i in range(100000):
    11     18.7 MiB      0.7 MiB      100000           b.append(300)
    12     19.5 MiB      0.0 MiB      100001       for i in range(100000):
    13     19.5 MiB      0.8 MiB      100000           c.append('123456789012345678901234567890')
    14     18.9 MiB     -0.6 MiB           1       del a
    15     18.2 MiB     -0.8 MiB           1       del b
    16     17.4 MiB     -0.8 MiB           1       del c
    17
    18     17.4 MiB      0.0 MiB           1       print('Done!')

As you can see, there is 17.3 MiB of baseline memory overhead. Each loop adds less than 1 MiB, even though it appends 100,000 items, because a single object is used in all cases (for the integers both inside and outside the [-5, 256] range and for the string); only the lists' 8-byte pointers take up new memory. It's not clear why the first loop of range(100000) on line 9 adds 0.8 MiB while the second on line 11 adds just 0.7 MiB and the third loop on line 13 adds 0.8 MiB. Finally, when deleting the a, b and c lists, 0.6 MiB is released for a, 0.8 MiB is released for b, and 0.8 MiB is released for c.
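As a rough cross-check that does not need the profiler, the list from one of those loops can be measured directly with sys.getsizeof(). This sketch assumes nothing beyond the standard library; the exact size depends on how much spare capacity the list reserved while growing:

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # 8 on 64-bit builds

items = []
for _ in range(100_000):
    items.append(300)  # the same int object is appended every time

list_bytes = sys.getsizeof(items)
minimum_bytes = 100_000 * pointer_size

print(f"{list_bytes / 2**20:.2f} MiB used by the list itself")
print(f"{minimum_bytes / 2**20:.2f} MiB is the minimum for 100,000 pointers")
```

On a 64-bit build, both figures should land in the same range as the per-loop increments in the report.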

How To Trace Memory Leaks in Your Python Application with tracemalloc

tracemalloc is a Python module that acts as a debug tool to trace memory blocks allocated by Python. Once tracemalloc is enabled, you can use it to:

identify where an object was allocated
get statistics on allocated memory
detect memory leaks by comparing snapshots

Consider the example below:

import tracemalloc

tracemalloc.start()

a = []
b = []
c = []
for i in range(100000):
    a.append(5)
for i in range(100000):
    b.append(300)
for i in range(100000):
    c.append('123456789012345678901234567890')
# del a
# del b
# del c


snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno'):
    print(stat)
    print(stat.traceback.format())
    

Explanation

tracemalloc.start(): starts the tracing of memory allocations
tracemalloc.take_snapshot(): takes a memory snapshot and returns the Snapshot object
Snapshot.statistics(): groups the traced memory blocks by the given key and returns a list of statistics (size and number of blocks), sorted from the largest to the smallest. The 'lineno' key groups the blocks by filename and line number.

When you run the code, the output will be:

['  File "python_obj.py", line 13', "    c.append('123456789012345678901234567890')"]
python_obj.py:11: size=782 KiB, count=1, average=782 KiB
['  File "python_obj.py", line 11', '    b.append(300)'] 
python_obj.py:9: size=782 KiB, count=1, average=782 KiB
['  File "python_obj.py", line 9', '    a.append(5)']    
python_obj.py:5: size=576 B, count=1, average=576 B
['  File "python_obj.py", line 5', '    a = []']
python_obj.py:12: size=28 B, count=1, average=28 B
['  File "python_obj.py", line 12', '    for i in range(100000):']
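The example above takes a single snapshot. To look for a leak, you can take two snapshots and compare them with Snapshot.compare_to(). The following is a minimal sketch in which the leaky list is a hypothetical stand-in for memory that a real program fails to release:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical "leak": a large structure that stays referenced
leaky = [str(i) * 10 for i in range(50_000)]

after = tracemalloc.take_snapshot()

# Differences between the snapshots, grouped by file and line,
# with the largest changes first
top = after.compare_to(before, 'lineno')
for stat in top[:3]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
print(f"current={current} bytes, peak={peak} bytes")
tracemalloc.stop()
```

Lines whose size keeps growing from one snapshot to the next are the first candidates to inspect.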

Conclusion

CPython uses a lot of memory for its objects. It also uses various tricks and optimizations for memory management. By keeping track of your objects' memory usage and being aware of the memory management model, you can significantly reduce the memory footprint of your program.

This post has been updated with contributions from Esther Vaati. Esther is a software developer and writer for Envato Tuts+.