PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: 6af6b504c635
Last-Modified: 2013-10-03 17:47:02 +0200 (Thu, 03 Oct 2013)
Author: Victor Stinner <victor.stinner at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4

Abstract

Add a new tracemalloc module to trace memory blocks allocated by Python.

Rationale

Common debug tools tracing memory allocations read the C filename and line number. Using such tools to analyze Python memory allocations does not help because most memory blocks are allocated in the same C function, in PyMem_Malloc() for example.

There are debug tools dedicated to the Python language like Heapy and PySizer. These tools analyze the type and/or content of objects. They are useful when most memory leaks are instances of the same type and this type is only instantiated in a few functions. The problem arises when the object type is very common, like str or tuple, and it is hard to identify where these objects are instantiated.

Finding reference cycles is also a difficult problem. There are different tools to draw a diagram of all references. These tools cannot be used on large applications with thousands of objects because the diagram is too large to be analyzed manually.

Proposal

With PEP 445, it becomes easy to set up a hook on Python memory allocators. A hook can inspect Python internals to retrieve the Python tracebacks.

This PEP proposes to add a new tracemalloc module. It is a debug tool to trace memory blocks allocated by Python. The module provides the following information:

  • Compute the differences between two snapshots to detect memory leaks
  • Statistics on allocated memory blocks per filename and per line number: total size, number and average size of allocated memory blocks
  • Traceback where a memory block was allocated

The API of the tracemalloc module is similar to the API of the faulthandler module: enable(), disable() and is_enabled() functions, an environment variable (PYTHONFAULTHANDLER and PYTHONTRACEMALLOC), a -X command line option (-X faulthandler and -X tracemalloc). See the documentation of the faulthandler module.

The tracemalloc module has been written for CPython. Other implementations of Python may not provide it.

API

To trace most memory blocks allocated by Python, the module should be enabled as early as possible by setting the PYTHONTRACEMALLOC environment variable to 1, or by using the -X tracemalloc command line option. The tracemalloc.enable() function can also be called to start tracing Python memory allocations.

By default, a trace of an allocated memory block only stores one frame. Use the set_traceback_limit() function to store more frames.

Python memory blocks allocated in the tracemalloc module are also traced by default. Use add_exclude_filter(tracemalloc.__file__) to ignore these memory allocations.

At fork, the module is automatically disabled in the child process.

Main Functions

cancel_tasks() function:

Cancel all scheduled tasks.

See also the get_tasks() function.

clear_traces() function:

Clear all traces and statistics on Python memory allocations, and reset the get_arena_size() and get_traced_memory() counters.

disable() function:

Stop tracing Python memory allocations and cancel scheduled tasks.

See also enable() and is_enabled() functions.

enable() function:

Start tracing Python memory allocations.

At fork, the module is automatically disabled in the child process.

See also disable() and is_enabled() functions.

get_stats() function:

Get statistics on traced Python memory blocks as a dictionary {filename (str): {line_number (int): stats}} where stats is a (size: int, count: int) tuple. filename and line_number can be None.

Return an empty dictionary if the tracemalloc module is disabled.

See also the get_traces() function.
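
The nested dictionary described above can be flattened to find the lines holding the most memory. The following is a minimal sketch, not part of the proposed module: sample_stats is hypothetical data written in the documented shape, and summarize() is an illustrative helper name.

```python
# Hypothetical data in the shape documented for get_stats():
# {filename: {line_number: (size, count)}}; both keys may be None.
sample_stats = {
    "app/models.py": {10: (4096, 4), 42: (1024, 8)},
    "app/views.py": {7: (512, 2)},
    None: {None: (256, 1)},
}


def summarize(stats):
    """Return (size, count, average, filename, lineno) rows, biggest first."""
    rows = []
    for filename, lines in stats.items():
        for lineno, (size, count) in lines.items():
            average = size // count if count else 0
            rows.append((size, count, average, filename, lineno))
    # Sort on size only, so None keys are never compared with str or int.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


for size, count, average, filename, lineno in summarize(sample_stats):
    print(f"{filename}:{lineno}: size={size} count={count} average={average}")
```

Sorting on the size alone sidesteps the fact that None cannot be ordered against str or int in Python 3.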

get_tasks() function:

Get the list of scheduled tasks, list of Task instances.

is_enabled() function:

True if the tracemalloc module is tracing Python memory allocations, False otherwise.

See also enable() and disable() functions.

Trace Functions

get_traceback_limit() function:

Get the maximum number of frames stored in the traceback of a trace of a memory block.

Use the set_traceback_limit() function to change the limit.

get_object_address(obj) function:

Get the address of the memory block of the specified Python object.

A Python object can be composed of multiple memory blocks; the function only returns the address of the main memory block.

See also get_object_trace() and gc.get_referrers() functions.

get_object_trace(obj) function:

Get the trace of a Python object obj as a (size: int, traceback) tuple where traceback is a tuple of (filename: str, lineno: int) tuples, filename and lineno can be None.

The function only returns the trace of the main memory block of the object. The size of the trace is smaller than the total size of the object if the object is composed of more than one memory block.

Return None if the tracemalloc module did not trace the allocation of the object.

See also get_object_address(), get_trace(), get_traces(), gc.get_referrers() and sys.getsizeof() functions.

get_trace(address) function:

Get the trace of a memory block as a (size: int, traceback) tuple where traceback is a tuple of (filename: str, lineno: int) tuples, filename and lineno can be None.

Return None if the tracemalloc module did not trace the allocation of the memory block.

See also get_object_trace(), get_stats() and get_traces() functions.

get_traces() function:

Get all traces of Python memory allocations as a dictionary {address (int): trace} where trace is a (size: int, traceback) tuple and traceback is a list of (filename: str, lineno: int) tuples. traceback can be empty, and filename and lineno can be None.

Return an empty dictionary if the tracemalloc module is disabled.

See also get_object_trace(), get_stats() and get_trace() functions.
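
To make the relation between the two dictionaries concrete, here is a rough sketch that derives a get_stats()-shaped dictionary from a get_traces()-shaped one. The sample data and the traces_to_stats() helper are hypothetical, and the sketch assumes that the first frame of a traceback is the most recent one, which the text above does not state.

```python
# Hypothetical traces in the documented shape:
# {address: (size, traceback)}, traceback is a list of (filename, lineno).
sample_traces = {
    0x1000: (100, [("a.py", 5), ("main.py", 20)]),
    0x2000: (50, [("a.py", 5)]),
    0x3000: (30, []),  # empty traceback: no frame was recorded
}


def traces_to_stats(traces):
    """Build a {filename: {lineno: (size, count)}} dictionary from traces.

    Assumption: the first frame of a traceback is the most recent one.
    """
    stats = {}
    for size, traceback in traces.values():
        filename, lineno = traceback[0] if traceback else (None, None)
        line_stats = stats.setdefault(filename, {})
        old_size, old_count = line_stats.get(lineno, (0, 0))
        line_stats[lineno] = (old_size + size, old_count + 1)
    return stats


print(traces_to_stats(sample_traces))
```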

set_traceback_limit(nframe: int) function:

Set the maximum number of frames stored in the traceback of a trace of a memory block.

Storing the traceback of each memory allocation adds a significant overhead to the memory usage. Use the get_tracemalloc_memory() function to measure the overhead and the add_filter() function to select which memory allocations are traced.

Use the get_traceback_limit() function to get the current limit.

Filter Functions

add_filter(filter) function:

Add a new filter on Python memory allocations, filter is a Filter instance.

All inclusive filters are applied at once: a memory allocation is only ignored if no inclusive filter matches its trace. A memory allocation is ignored if at least one exclusive filter matches its trace.

The new filter is not applied on already collected traces. Use the clear_traces() function to ensure that all traces match the new filter.
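
The combination rule for inclusive and exclusive filters can be sketched as follows. This is an illustration only: filters are modelled as hypothetical (include, predicate) pairs rather than Filter instances, and the sketch assumes that every allocation is traced when no inclusive filter is registered.

```python
def is_traced(filename, lineno, filters):
    """Decide whether an allocation is kept, given (include, predicate) pairs.

    Each predicate(filename, lineno) returns True on a match.
    """
    inclusive = [pred for include, pred in filters if include]
    exclusive = [pred for include, pred in filters if not include]
    # Ignored if at least one exclusive filter matches.
    if any(pred(filename, lineno) for pred in exclusive):
        return False
    # With inclusive filters present, at least one of them must match.
    if inclusive and not any(pred(filename, lineno) for pred in inclusive):
        return False
    return True


# Hypothetical filter set: keep app/ and lib/, but never app/debug.py.
sample_filters = [
    (True, lambda filename, lineno: filename.startswith("app/")),
    (True, lambda filename, lineno: filename.startswith("lib/")),
    (False, lambda filename, lineno: filename == "app/debug.py"),
]

for name in ("app/models.py", "lib/util.py", "app/debug.py", "other.py"):
    print(name, is_traced(name, 1, sample_filters))
```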

add_include_filter(filename: str, lineno: int=None, traceback: bool=False) function:

Add an inclusive filter: helper for the add_filter() method creating a Filter instance with the Filter.include attribute set to True.

Example: tracemalloc.add_include_filter(tracemalloc.__file__) only includes memory blocks allocated by the tracemalloc module.

add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False) function:

Add an exclusive filter: helper for the add_filter() method creating a Filter instance with the Filter.include attribute set to False.

Example: tracemalloc.add_exclude_filter(tracemalloc.__file__) ignores memory blocks allocated by the tracemalloc module.

clear_filters() function:

Reset the filter list.

See also the get_filters() function.

get_filters() function:

Get the filters on Python memory allocations as list of Filter instances.

See also the clear_filters() function.

Metric Functions

The following functions can be used to add metrics to a snapshot, see the Snapshot.add_metric() method.

get_allocated_blocks() function:

Get the current number of allocated memory blocks.

get_arena_size() function:

Get the size in bytes of traced arenas.

See also the get_pymalloc_stats() function.

get_process_memory() function:

Get the memory usage of the current process as a (rss: int, vms: int) tuple, where rss is the "Resident Set Size" in bytes and vms is the size of the virtual memory in bytes.

Return None if the platform is not supported.

get_pymalloc_stats() function:

Get statistics on the pymalloc allocator as a dictionary.

===============  ==========================================================
Key              Description
===============  ==========================================================
alignment        Alignment of addresses returned to the user.
threshold        Small block threshold in bytes: pymalloc uses
                 PyMem_RawMalloc() for allocation greater than threshold.
nalloc           Number of times object malloc called
arena_size       Arena size in bytes
total_arenas     Number of calls to new_arena(): total number of allocated
                 arenas, including released arenas
max_arenas       Maximum number of arenas
arenas           Number of arenas currently allocated
allocated_bytes  Number of bytes in allocated blocks
available_bytes  Number of bytes in available blocks in used pools
pool_size        Pool size in bytes
free_pools       Number of unused pools
pool_headers     Number of bytes wasted in pool headers
quantization     Number of bytes in used and full pools wasted due to
                 quantization, i.e. the necessarily leftover space at the
                 ends of used and full pools.
arena_alignment  Number of bytes for arena alignment padding
===============  ==========================================================

The function is not available if Python is compiled without pymalloc.

See also get_arena_size() and sys._debugmallocstats() functions.

get_traced_memory() function:

Get the current size and maximum size of memory blocks traced by the tracemalloc module as a tuple: (size: int, max_size: int).

get_tracemalloc_memory() function:

Get the memory usage in bytes of the tracemalloc module as a tuple: (size: int, free: int).

  • size: total size of bytes allocated by the module, including free bytes
  • free: number of free bytes available to store data

get_unicode_interned() function:

Get the size in bytes and the length of the dictionary of Unicode interned strings as a (size: int, length: int) tuple.

The size is the size of the dictionary, excluding the size of strings.

DisplayTop

DisplayTop() class:

Display the top of allocated memory blocks.

display(count=10, group_by="line", cumulative=False, file=None, callback=None) method:

Take a snapshot and display the top count biggest allocated memory blocks grouped by group_by.

callback is an optional callable object which can be used to add metrics to a snapshot. It is called with only one parameter: the newly created snapshot instance. Use the Snapshot.add_metric() method to add a new metric.

Return the snapshot, a Snapshot instance.

display_snapshot(snapshot, count=10, group_by="line", cumulative=False, file=None) method:

Display a snapshot of memory blocks allocated by Python, snapshot is a Snapshot instance.

display_top_diff(top_diff, count=10, file=None) method:

Display differences between two GroupedStats instances, top_diff is a StatsDiff instance.

display_top_stats(top_stats, count=10, file=None) method:

Display the top of allocated memory blocks grouped by the GroupedStats.group_by attribute of top_stats, top_stats is a GroupedStats instance.

average attribute:

If True (default value), display the average size of memory blocks.

color attribute:

If True, always use colors. If False, never use colors. The default value is None: use colors if the file parameter is a TTY device.

compare_to_previous attribute:

If True (default value), compare to the previous snapshot. If False, compare to the first snapshot.

filename_parts attribute:

Number of displayed filename parts (int, default: 3). Extra parts are replaced with '...'.

metrics attribute:

If True (default value), display metrics: see Snapshot.metrics.

previous_top_stats attribute:

Previous GroupedStats instance, or first GroupedStats instance if compare_to_previous is False, used to display the differences between two snapshots.

size attribute:

If True (default value), display the size of memory blocks.

DisplayTopTask

DisplayTopTask(count=10, group_by="line", cumulative=False, file=sys.stdout, callback=None) class:

Task taking temporary snapshots and displaying the top count memory allocations grouped by group_by.

DisplayTopTask is based on the Task class and so inherits all its attributes and methods, especially:

  • Task.cancel()
  • Task.schedule()
  • Task.set_delay()
  • Task.set_memory_threshold()

Modify the display_top attribute to customize the display.

display() method:

Take a snapshot and display the top count biggest allocated memory blocks grouped by group_by using the display_top attribute.

Return the snapshot, a Snapshot instance.

callback attribute:

callback is an optional callable object which can be used to add metrics to a snapshot. It is called with only one parameter: the newly created snapshot instance. Use the Snapshot.add_metric() method to add a new metric.

count attribute:

Maximum number of displayed memory blocks.

cumulative attribute:

If True, cumulate size and count of memory blocks of all frames of each trace, not only the most recent frame. The default value is False.

The option is ignored if the traceback limit is less than 2, see the get_traceback_limit() function.

display_top attribute:

Instance of DisplayTop.

file attribute:

The top is written into file.

group_by attribute:

Determine how memory allocations are grouped: see Snapshot.top_by() for the available values.

Filter

Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False) class:

Filter to select which memory allocations are traced. Filters can be used to reduce the memory usage of the tracemalloc module, which can be read using the get_tracemalloc_memory() function.

match(filename: str, lineno: int) method:

Return True if the filter matches the filename and line number, False otherwise.

match_filename(filename: str) method:

Return True if the filter matches the filename, False otherwise.

match_lineno(lineno: int) method:

Return True if the filter matches the line number, False otherwise.

match_traceback(traceback) method:

Return True if the filter matches the traceback, False otherwise.

traceback is a tuple of (filename: str, lineno: int) tuples.

include attribute:

If include is True, only trace memory blocks allocated in a file with a name matching the filename pattern at line number lineno.

If include is False, ignore memory blocks allocated in a file with a name matching the filename pattern at line number lineno.

lineno attribute:

Line number (int). If it is None or less than 1, it matches any line number.

pattern attribute:

The filename pattern can contain one or many * joker characters which match any substring, including an empty string. The .pyc and .pyo file extensions are replaced with .py. On Windows, the comparison is case insensitive and the alternative separator / is replaced with the standard separator \.
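
The matching rules above can be approximated with a regular expression. The sketch below is an assumption-laden illustration, not the module's implementation: pattern_matches() and normalize() are hypothetical names, the extension replacement is applied to both the pattern and the filename, the whole filename must match, and the Windows-specific rules are left out.

```python
import re


def normalize(filename):
    """Replace the .pyc and .pyo extensions with .py."""
    if filename.endswith((".pyc", ".pyo")):
        return filename[:-1]
    return filename


def pattern_matches(pattern, filename):
    """Return True if filename matches pattern, where * matches any substring."""
    # Escape everything except the * joker, which becomes ".*".
    parts = normalize(pattern).split("*")
    regex = ".*".join(re.escape(part) for part in parts)
    return re.fullmatch(regex, normalize(filename), re.DOTALL) is not None


print(pattern_matches("*/tracemalloc.py", "/usr/lib/python3.4/tracemalloc.pyc"))
print(pattern_matches("/tmp/*.py", "/home/user/script.py"))
```

Building the expression by hand, rather than using fnmatch, keeps characters such as ? and [ literal, since only * is described as a joker.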

traceback attribute:

If traceback is True, all frames of the traceback are checked. If traceback is False, only the most recent frame is checked.

This attribute is ignored if the traceback limit is less than 2. See the get_traceback_limit() function.

GroupedStats

GroupedStats(timestamp: datetime.datetime, stats: dict, group_by: str, cumulative=False, metrics: dict=None) class:

Top of allocated memory blocks grouped by group_by as a dictionary.

The Snapshot.top_by() method creates a GroupedStats instance.

compare_to(old_stats: GroupedStats=None) method:

Compare to an older GroupedStats instance. Return a StatsDiff instance.

The StatsDiff.differences list is not sorted: call the StatsDiff.sort method to sort the list.

None values are replaced with an empty string for filenames or zero for line numbers, because str and int cannot be compared to None.
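
A short sketch of that replacement, under the assumption that keys are either a filename or a (filename, lineno) tuple; sortable_key() is a hypothetical helper name:

```python
def sortable_key(key):
    """Replace None with '' for filenames and 0 for line numbers."""
    if isinstance(key, tuple):  # grouped by line: (filename, lineno)
        filename, lineno = key
        return (filename or "", lineno or 0)
    return key if key is not None else ""  # grouped by filename


keys = [("b.py", 3), (None, None), ("a.py", None)]
# Sorting the raw keys would raise TypeError in Python 3.
print(sorted(sortable_key(key) for key in keys))
# prints [('', 0), ('a.py', 0), ('b.py', 3)]
```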

cumulative attribute:

If True, cumulate size and count of memory blocks of all frames of the traceback of a trace, not only the most recent frame.

metrics attribute:

Dictionary storing metrics read when the snapshot was created: {name (str): metric} where metric type is Metric.

group_by attribute:

Determine how memory allocations were grouped: see Snapshot.top_by() for the available values.

stats attribute:

Dictionary {key: stats} where the key type depends on the group_by attribute and stats is a (size: int, count: int) tuple.

See the Snapshot.top_by() method.

timestamp attribute:

Creation date and time of the snapshot, datetime.datetime instance.

Metric

Metric(name: str, value: int, format: str) class:

Value of a metric when a snapshot is created.

name attribute:

Name of the metric.

value attribute:

Value of the metric.

format attribute:

Format of the metric:

  • 'int': a number
  • 'percent': percentage, 1.0 means 100%
  • 'size': a size in bytes
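
As an illustration of how the three formats might be rendered, here is a minimal sketch. format_metric() is a hypothetical helper, and the choice of binary units and of one decimal place is an assumption of the example.

```python
def format_metric(value, fmt):
    """Render a metric value according to its format name."""
    if fmt == "int":
        return str(value)
    if fmt == "percent":
        return "%.1f%%" % (value * 100.0)  # 1.0 means 100%
    if fmt == "size":
        if abs(value) < 1024:
            return "%d B" % value
        # Unit choice (binary prefixes) is an assumption of this sketch.
        for unit in ("KiB", "MiB", "GiB"):
            value /= 1024.0
            if abs(value) < 1024 or unit == "GiB":
                return "%.1f %s" % (value, unit)
    raise ValueError("unknown metric format: %r" % (fmt,))


print(format_metric(42, "int"))
print(format_metric(0.25, "percent"))
print(format_metric(2048, "size"))
```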

Snapshot

Snapshot(timestamp: datetime.datetime, pid: int, traces: dict=None, stats: dict=None, metrics: dict=None) class:

Snapshot of traces and statistics on memory blocks allocated by Python.

Use TakeSnapshotTask to take snapshots regularly.

add_gc_metrics() method:

Add a metric on the garbage collector:

  • gc.objects: total number of Python objects

See the gc module.

add_metric(name: str, value: int, format: str) method:

Helper to add a Metric instance to Snapshot.metrics. Return the newly created Metric instance.

Raise an exception if the name is already present in Snapshot.metrics.

add_process_memory_metrics() method:

Add metrics on the process memory:

  • process_memory.rss: Resident Set Size
  • process_memory.vms: Virtual Memory Size

These metrics are only available if the get_process_memory() function is available on the platform.

add_pymalloc_metrics() method:

Add metrics on the Python memory allocator (pymalloc):

  • pymalloc.blocks: number of allocated memory blocks
  • pymalloc.size: size of pymalloc arenas
  • pymalloc.max_size: maximum size of pymalloc arenas
  • pymalloc.allocated: number of allocated bytes
  • pymalloc.free: number of free bytes
  • pymalloc.fragmentation: fragmentation percentage of the arenas

These metrics are only available if Python is compiled in debug mode, except pymalloc.blocks which is always available.

add_tracemalloc_metrics() method:

Add metrics on the tracemalloc module:

  • tracemalloc.traced.size: size of memory blocks traced by the tracemalloc module
  • tracemalloc.traced.max_size: maximum size of memory blocks traced by the tracemalloc module
  • tracemalloc.traces: number of traces of Python memory blocks
  • tracemalloc.module.size: total size of bytes allocated by the tracemalloc module, including free bytes
  • tracemalloc.module.free: number of free bytes available for the tracemalloc module
  • tracemalloc.module.fragmentation: percentage of fragmentation of the memory allocated by the tracemalloc module
  • tracemalloc.arena_size: size of traced arenas

tracemalloc.traces metric is only present if the snapshot was created with traces.

add_unicode_metrics() method:

Add metrics on the Unicode interned strings:

  • unicode_interned.size: size of the dictionary, excluding size of strings
  • unicode_interned.len: length of the dictionary

apply_filters(filters) method:

Apply filters on the traces and stats dictionaries, filters is a list of Filter instances.

create(traces=False, metrics=True) classmethod:

Take a snapshot of traces and/or statistics of allocated memory blocks.

If traces is True, get_traces is called and its result is stored in the Snapshot.traces attribute. This attribute contains more information than Snapshot.stats and uses more memory and more disk space. If traces is False, Snapshot.traces is set to None.

If metrics is True, fill Snapshot.metrics with metrics using the following methods:

  • add_gc_metrics
  • add_process_memory_metrics
  • add_pymalloc_metrics
  • add_tracemalloc_metrics
  • add_unicode_metrics

If metrics is False, Snapshot.metrics is set to an empty dictionary.

Tracebacks of traces are limited to traceback_limit frames. Call set_traceback_limit() before calling Snapshot.create() to store more frames.

The tracemalloc module must be enabled to take a snapshot. See the enable() function.

get_metric(name, default=None) method:

Get the value of the metric called name. Return default if the metric does not exist.

load(filename, traces=True) classmethod:

Load a snapshot from a file.

If traces is False, don't load traces.

top_by(group_by: str, cumulative: bool=False) method:

Compute top statistics grouped by group_by as a GroupedStats instance:

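==========  ========================  ==========
group_by    description               key type
==========  ========================  ==========
'filename'  filename                  str
'line'      filename and line number  (str, int)
'address'   memory block address      int
==========  ========================  ==========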
If cumulative is True, cumulate size and count of memory blocks of all frames of the traceback of a trace, not only the most recent frame. The cumulative parameter is ignored if group_by is 'address' or if the traceback limit is less than 2.
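
The grouping and the cumulative option can be sketched on a traces dictionary as follows. This is only an illustration with hypothetical data. It assumes that the first frame of a traceback is the most recent one, and that a memory block is counted once per distinct key even if several of its frames share that key.

```python
def top_by(traces, group_by, cumulative=False):
    """Group {address: (size, traceback)} traces into {key: (size, count)}."""
    grouped = {}
    for address, (size, traceback) in traces.items():
        if group_by == "address":
            keys = [address]  # cumulative is ignored for 'address'
        else:
            # Non-cumulative: only the most recent frame (assumed first).
            frames = traceback if cumulative else traceback[:1]
            frames = frames or [(None, None)]
            if group_by == "filename":
                keys = {filename for filename, lineno in frames}
            elif group_by == "line":
                keys = set(frames)
            else:
                raise ValueError("unknown group_by: %r" % (group_by,))
        for key in keys:
            old_size, old_count = grouped.get(key, (0, 0))
            grouped[key] = (old_size + size, old_count + 1)
    return grouped


# Hypothetical traces: two blocks allocated from main.py line 20 via a.py.
sample_traces = {
    1: (100, [("a.py", 5), ("main.py", 20)]),
    2: (50, [("a.py", 9), ("main.py", 20)]),
}
print(top_by(sample_traces, "line"))
print(top_by(sample_traces, "line", cumulative=True))
```

With cumulative=True, the caller at main.py line 20 is credited with both blocks, which is what makes the option useful for finding the high-level code responsible for many small allocations.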

write(filename) method:

Write the snapshot into a file.

metrics attribute:

Dictionary storing metrics read when the snapshot was created: {name (str): metric} where metric type is Metric.

pid attribute:

Identifier of the process which created the snapshot, result of os.getpid().

stats attribute:

Statistics on traced Python memory, result of the get_stats() function.

traceback_limit attribute:

Maximum number of frames stored in a trace of a memory block allocated by Python.

traces attribute:

Traces of Python memory allocations, result of the get_traces() function, can be None.

timestamp attribute:

Creation date and time of the snapshot, datetime.datetime instance.

StatsDiff

StatsDiff(differences, old_stats, new_stats) class:

Differences between two GroupedStats instances.

The GroupedStats.compare_to method creates a StatsDiff instance.

sort() method:

Sort the differences list from the biggest difference to the smallest difference. Sort by abs(size_diff), size, abs(count_diff), count and then by key.

differences attribute:

Differences between old_stats and new_stats as a list of (size_diff, size, count_diff, count, key) tuples. size_diff, size, count_diff and count are int. The key type depends on the GroupedStats.group_by attribute of new_stats: see the Snapshot.top_by() method.
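
The shape of the differences list and the sort order can be sketched as follows. compare() is a hypothetical stand-in operating on plain {key: (size, count)} dictionaries, and the sketch assumes that a key missing from one side counts as (0, 0).

```python
def compare(old_stats, new_stats):
    """Return (size_diff, size, count_diff, count, key) tuples, biggest first."""
    differences = []
    for key in set(old_stats) | set(new_stats):
        old_size, old_count = old_stats.get(key, (0, 0))
        size, count = new_stats.get(key, (0, 0))
        differences.append(
            (size - old_size, size, count - old_count, count, key))
    # Sort by abs(size_diff), size, abs(count_diff), count and then by key.
    differences.sort(
        key=lambda d: (abs(d[0]), d[1], abs(d[2]), d[3], d[4]),
        reverse=True,
    )
    return differences


# Hypothetical statistics grouped by filename.
old = {"a.py": (100, 2), "b.py": (300, 3)}
new = {"a.py": (400, 5), "b.py": (250, 3), "c.py": (10, 1)}
for diff in compare(old, new):
    print(diff)
```

Sorting on the absolute value of the size difference puts large decreases next to large increases, so both kinds of change are visible at the top of the list.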

old_stats attribute:

Old GroupedStats instance, can be None.

new_stats attribute:

New GroupedStats instance.

Task

Task(func, *args, **kw) class:

Task calling func(*args, **kw). When scheduled, the task is called when the traced memory is increased or decreased by more than threshold bytes, or after delay seconds.

call() method:

Call func(*args, **kw) and return the result.

cancel() method:

Cancel the task.

Do nothing if the task is not scheduled.

get_delay() method:

Get the delay in seconds. If the delay is None, the timer is disabled.

get_memory_threshold() method:

Get the threshold of the traced memory. When scheduled, the task is called when the traced memory is increased or decreased by more than threshold bytes. The memory threshold is disabled if threshold is None.

See also the set_memory_threshold() method and the get_traced_memory() function.

schedule(repeat: int=None) method:

Schedule the task repeat times. If repeat is None, the task is rescheduled after each call until it is cancelled.

If the method is called twice, the task is rescheduled with the new repeat parameter.

The task must have a memory threshold or a delay: see the set_delay() and set_memory_threshold() methods. The tracemalloc module must be enabled to schedule a task: see the enable() function.

The task is cancelled if the call() method raises an exception. The task can be cancelled using the cancel() method or the cancel_tasks() function.

set_delay(seconds: int) method:

Set the delay in seconds before the task will be called. Set the delay to None to disable the timer.

The timer is based on the Python memory allocator; it is not real time. The task is called after at least delay seconds; it is not called exactly after delay seconds if no Python memory allocation occurs in the meantime. The timer has a resolution of 1 second.

The task is rescheduled if it was scheduled.
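
The consequence of a timer driven by memory allocations can be shown with a small simulation. The AllocationTimer class and its on_allocation() hook are hypothetical and use a simulated clock; they only model the behaviour described above.

```python
class AllocationTimer:
    """Timer that is only checked when an allocation happens."""

    def __init__(self, delay, func):
        self.delay = delay
        self.func = func
        self.next_call = None

    def schedule(self, now):
        self.next_call = now + self.delay

    def on_allocation(self, now):
        # Hypothetical hook: invoked by the allocator on every allocation.
        if self.next_call is not None and now >= self.next_call:
            self.func(now)
            self.next_call = now + self.delay


calls = []
timer = AllocationTimer(delay=10, func=calls.append)
timer.schedule(now=0)
# No allocation happens between t=9 and t=25, so the call due at t=10 is late.
for now in (3, 9, 25, 26, 40):
    timer.on_allocation(now)
print(calls)  # prints [25, 40]
```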

set_memory_threshold(size: int) method:

Set the threshold of the traced memory. When scheduled, the task is called when the traced memory is increased or decreased by more than threshold bytes. Set the threshold to None to disable it.

The task is rescheduled if it was scheduled.

See also the get_memory_threshold() method and the get_traced_memory() function.
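
A rough model of the threshold trigger, using hypothetical names. It assumes that the reference size is the traced size at scheduling time and that it is reset after each call, which the text does not specify.

```python
class MemoryThresholdTrigger:
    """Call func when traced memory moves by more than threshold bytes."""

    def __init__(self, threshold, func):
        self.threshold = threshold
        self.func = func
        self.reference = None

    def schedule(self, traced_size):
        self.reference = traced_size

    def on_traced_memory_change(self, traced_size):
        if self.threshold is None or self.reference is None:
            return  # threshold disabled or task not scheduled
        # Both increases and decreases count.
        if abs(traced_size - self.reference) > self.threshold:
            self.func(traced_size)
            self.reference = traced_size  # assumption: reset after a call


fired = []
trigger = MemoryThresholdTrigger(threshold=1000, func=fired.append)
trigger.schedule(traced_size=5000)
for size in (5500, 6001, 6500, 4900):
    trigger.on_traced_memory_change(size)
print(fired)  # prints [6001, 4900]
```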

func attribute:

Function, callable object.

func_args attribute:

Function arguments, tuple.

func_kwargs attribute:

Function keyword arguments, dict. It can be None.

TakeSnapshotTask

TakeSnapshotTask(filename_template: str="tracemalloc-$counter.pickle", traces: bool=False, metrics: bool=True, callback: callable=None) class:

Task taking snapshots of Python memory allocations and writing them into files.

TakeSnapshotTask is based on the Task class and so inherits all its attributes and methods, especially:

  • Task.cancel()
  • Task.schedule()
  • Task.set_delay()
  • Task.set_memory_threshold()

take_snapshot() method:

Take a snapshot and write it into a file. Return (snapshot, filename) where snapshot is a Snapshot instance and filename type is str.

callback attribute:

callback is an optional callable object which can be used to add metrics to a snapshot. It is called with only one parameter: the newly created snapshot instance. Use the Snapshot.add_metric() method to add a new metric.

filename_template attribute:

Template to create a filename. The template supports the following variables:

  • $pid: identifier of the current process
  • $timestamp: current date and time
  • $counter: counter starting at 1 and incremented at each snapshot, formatted as 4 decimal digits

The default template is 'tracemalloc-$counter.pickle'.
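
The $name syntax is the one used by string.Template in the standard library, so the expansion could look like the sketch below. snapshot_filename() is a hypothetical helper, and the layout of the timestamp is an assumption since the text does not specify it.

```python
import datetime
import os
import string


def snapshot_filename(template, counter, now=None, pid=None):
    """Expand $pid, $timestamp and $counter in a filename template."""
    now = now or datetime.datetime.now()
    pid = os.getpid() if pid is None else pid
    return string.Template(template).substitute(
        pid=pid,
        # The timestamp layout is an assumption of this sketch.
        timestamp=now.strftime("%Y%m%d-%H%M%S"),
        # The counter is formatted as 4 decimal digits.
        counter="%04d" % counter,
    )


print(snapshot_filename("tracemalloc-$counter.pickle", 1))
# prints tracemalloc-0001.pickle
```

Including $pid in the template avoids filename collisions when several processes write snapshots into the same directory.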

metrics attribute:

Parameter passed to the Snapshot.create() function.

traces attribute:

Parameter passed to the Snapshot.create() function.