| PEP: | 428 |
|---|---|
| Title: | The pathlib module -- object-oriented filesystem paths |
| Version: | 53a41d76241c |
| Last-Modified: | 2013-03-01 23:24:24 +0100 (Fri, 01 Mar 2013) |
| Author: | Antoine Pitrou <solipsis at pitrou.net> |
| Status: | Draft |
| Type: | Standards Track |
| Content-Type: | text/x-rst |
| Created: | 30-July-2012 |
| Python-Version: | 3.4 |
| Post-History: | http://mail.python.org/pipermail/python-ideas/2012-October/016338.html |
Abstract
This PEP proposes the inclusion of a third-party module, pathlib [1], in the standard library. The inclusion is proposed under the provisional label, as described in PEP 411. Therefore, API changes can be made, either as part of the PEP process, or after acceptance in the standard library (and until the provisional label is removed).
The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users perform on them.
Implementation
The implementation of this proposal is tracked in the pep428 branch of pathlib's Mercurial repository [6].
Why an object-oriented API
The rationale for representing filesystem paths using dedicated classes is the same as for other kinds of stateless objects, such as dates, times or IP addresses. Python has been slowly moving away from strictly replicating the C language's APIs to providing better, more helpful abstractions around all kinds of common functionality. Even if this PEP isn't accepted, it is likely that another form of filesystem handling abstraction will be adopted one day into the standard library.
Indeed, many people will prefer handling dates and times using the high-level objects provided by the datetime module, rather than using numeric timestamps and the time module API. Moreover, using a dedicated class makes it possible to enable desirable behaviours by default, for example the case insensitivity of Windows paths.
Proposal
Class hierarchy
The pathlib [1] module implements a simple hierarchy of classes:
                   +----------+
                   |          |
          ---------| PurePath |--------
          |        |          |       |
          |        +----------+       |
          |             |             |
          |             |             |
          v             |             v
   +---------------+    |       +------------+
   |               |    |       |            |
   | PurePosixPath |    |       | PureNTPath |
   |               |    |       |            |
   +---------------+    |       +------------+
          |             v             |
          |          +------+         |
          |          |      |         |
          |   -------| Path |------   |
          |   |      |      |     |   |
          |   |      +------+     |   |
          |   |                   |   |
          |   |                   |   |
          v   v                   v   v
     +-----------+              +--------+
     |           |              |        |
     | PosixPath |              | NTPath |
     |           |              |        |
     +-----------+              +--------+
This hierarchy divides path classes along two dimensions:
- a path class can be either pure or concrete: pure classes support only operations that don't need to do any actual I/O, which are most path manipulation operations; concrete classes support all the operations of pure classes, plus operations that do I/O.
- a path class is of a given flavour according to the kind of operating system paths it represents. pathlib [1] implements two flavours: NT paths for the filesystem semantics embodied in Windows systems, POSIX paths for other systems (os.name's terminology is re-used here).
Any pure class can be instantiated on any system: for example, you can manipulate PurePosixPath objects under Windows, PureNTPath objects under Unix, and so on. However, concrete classes can only be instantiated on a matching system: indeed, it would be error-prone to start doing I/O with NTPath objects under Unix, or vice-versa.
Furthermore, there are two base classes which also act as system-dependent factories: PurePath will instantiate either a PurePosixPath or a PureNTPath depending on the operating system. Similarly, Path will instantiate either a PosixPath or a NTPath.
It is expected that, in most uses, using the Path class is adequate, which is why it has the shortest name of all.
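As a rough illustration of the two dimensions, the sketch below uses the pathlib module as it later shipped in the standard library. One assumption to note: the shipped module names the Windows flavour PureWindowsPath and WindowsPath, not the PureNTPath and NTPath used in this draft.

```python
import os
from pathlib import (Path, PosixPath, PurePath, PurePosixPath,
                     PureWindowsPath, WindowsPath)

# Pure classes do no I/O, so either flavour can be created on any system.
# (PureWindowsPath is the shipped name for this draft's PureNTPath.)
posix_pure = PurePosixPath('/usr/bin/python')
windows_pure = PureWindowsPath('c:/Windows/system32')

# The two base classes act as factories for the flavour of the running system.
running_on_windows = os.name == 'nt'
expected_pure = PureWindowsPath if running_on_windows else PurePosixPath
expected_concrete = WindowsPath if running_on_windows else PosixPath

pure = PurePath('setup.py')      # a PurePosixPath or a PureWindowsPath
concrete = Path('setup.py')      # a PosixPath or a WindowsPath
```

Every concrete class is also an instance of the pure class of its flavour, which is why pure operations remain available on concrete paths.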
No confusion with builtins
In this proposal, the path classes do not derive from a builtin type. This contrasts with some other Path class proposals which were derived from str. They also do not pretend to implement the sequence protocol: if you want a path to act as a sequence, you have to look up a dedicated attribute (the parts attribute).
By not passing as builtin types, the path classes minimize the potential for confusion if they are combined by accident with genuine builtin types.
Immutability
Path objects are immutable, which makes them hashable and also prevents a class of programming errors.
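A small sketch of what hashability allows in practice, written against the pathlib module as shipped in the standard library (the dictionary contents are purely illustrative):

```python
from pathlib import PurePosixPath

# Equal paths hash equally, so different spellings of one path
# collapse into a single set element.
unique = {PurePosixPath('a/b'), PurePosixPath('a//b'), PurePosixPath('a/./b')}

# Paths can also be used as dictionary keys.
labels = {PurePosixPath('docs/Makefile'): 'build script'}
label = labels[PurePosixPath('docs', 'Makefile')]
```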
Sane behaviour
Little of the functionality from os.path is reused. Many os.path functions are tied by backwards compatibility to confusing or plain wrong behaviour (for example, the fact that os.path.abspath() simplifies ".." path components without resolving symlinks first).
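To make the abspath() complaint concrete, the sketch below compares the purely lexical collapsing done by posixpath.normpath (which abspath applies) with a pure path from the shipped pathlib module. The directory names are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

raw = 'site/current/../shared'

# normpath removes "current/.." textually. If "current" were a symlink
# to another directory, the result would name a different file than
# the one the operating system would actually reach.
collapsed = posixpath.normpath(raw)

# A pure path keeps ".." intact, leaving resolution to an operation
# that can consult the filesystem.
kept = PurePosixPath(raw)
```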
Comparisons
Paths of the same flavour are comparable and orderable, whether pure or not:
>>> PurePosixPath('a') == PurePosixPath('b')
False
>>> PurePosixPath('a') < PurePosixPath('b')
True
>>> PurePosixPath('a') == PosixPath('a')
True
Comparing and ordering Windows path objects is case-insensitive:
>>> PureNTPath('a') == PureNTPath('A')
True
Paths of different flavours always compare unequal, and cannot be ordered:
>>> PurePosixPath('a') == PureNTPath('a')
False
>>> PurePosixPath('a') < PureNTPath('a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: PurePosixPath() < PureNTPath()
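The same rules can be exercised with the pathlib module as shipped in the standard library. This is only a sketch, and it assumes the shipped name PureWindowsPath in place of this draft's PureNTPath.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Paths of one flavour can be sorted.
ordered = sorted([PurePosixPath('c'), PurePosixPath('a'), PurePosixPath('b')])

# Case only matters for the POSIX flavour.
same_on_windows = PureWindowsPath('Docs/README') == PureWindowsPath('docs/readme')
same_on_posix = PurePosixPath('Docs/README') == PurePosixPath('docs/readme')

# Different flavours are never equal, and ordering them raises TypeError.
cross_flavour_equal = PurePosixPath('a') == PureWindowsPath('a')
try:
    PurePosixPath('a') < PureWindowsPath('a')
    cross_flavour_orderable = True
except TypeError:
    cross_flavour_orderable = False
```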
Useful notations
The API tries to provide useful notations while avoiding magic. Some examples:
>>> p = Path('/home/antoine/pathlib/setup.py')
>>> p.name
'setup.py'
>>> p.ext
'.py'
>>> p.root
'/'
>>> p.parts
<PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
>>> list(p.parents())
[PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
>>> p.exists()
True
>>> p.st_size
928
Pure paths API
The philosophy of the PurePath API is to provide a consistent array of useful path manipulation operations, without exposing a hodge-podge of functions like os.path does.
Definitions
First a couple of conventions:
- All paths can have a drive and a root. For POSIX paths, the drive is always empty.
- A relative path has neither drive nor root.
- A POSIX path is absolute if it has a root. A Windows path is absolute if it has both a drive and a root. A Windows UNC path (e.g. \\some\share\myfile.txt) always has a drive and a root (here, \\some\share and \, respectively).
- A path which has either a drive or a root is said to be anchored. Its anchor is the concatenation of the drive and root. Under POSIX, "anchored" is the same as "absolute".
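These conventions can be checked with a short sketch against the pathlib module as shipped in the standard library. It assumes the shipped names: PureWindowsPath stands in for PureNTPath, and is_absolute() is used to test absoluteness. The describe() helper is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

def describe(path):
    """Return (drive, root, anchor, is_absolute) for a pure path."""
    return (path.drive, path.root, path.anchor, path.is_absolute())

posix_abs = describe(PurePosixPath('/etc/passwd'))
relative = describe(PurePosixPath('docs/Makefile'))
win_abs = describe(PureWindowsPath('c:/Windows'))
# Anchored but not absolute: a drive without a root, a root without a drive.
win_drive_only = describe(PureWindowsPath('c:Windows'))
win_root_only = describe(PureWindowsPath('/Windows'))
# A UNC path always has both a drive and a root.
win_unc = describe(PureWindowsPath('//some/share/myfile.txt'))
```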
Construction
We will present construction and joining together since they expose similar semantics.
The simplest way to construct a path is to pass it its string representation:
>>> PurePath('setup.py')
PurePosixPath('setup.py')
Extraneous path separators and "." components are eliminated:
>>> PurePath('a///b/c/./d/')
PurePosixPath('a/b/c/d')
If you pass several arguments, they will be automatically joined:
>>> PurePath('docs', 'Makefile')
PurePosixPath('docs/Makefile')
Joining semantics are similar to os.path.join, in that anchored paths ignore the information from the previously joined components:
>>> PurePath('/etc', '/usr', 'bin')
PurePosixPath('/usr/bin')
However, with Windows paths, the drive is retained as necessary:
>>> PureNTPath('c:/foo', '/Windows')
PureNTPath('c:\\Windows')
>>> PureNTPath('c:/foo', 'd:')
PureNTPath('d:')
Also, path separators are normalized to the platform default:
>>> PureNTPath('a/b') == PureNTPath('a\\b')
True
Extraneous path separators and "." components are eliminated, but not ".." components:
>>> PurePosixPath('a//b/./c/')
PurePosixPath('a/b/c')
>>> PurePosixPath('a/../b')
PurePosixPath('a/../b')
Multiple leading slashes are treated differently depending on the path flavour. They are always retained on Windows paths (because of the UNC notation):
>>> PureNTPath('//some/path')
PureNTPath('\\\\some\\path\\')
On POSIX, they are collapsed except if there are exactly two leading slashes, which is a special case in the POSIX specification on pathname resolution [7] (this is also necessary for Cygwin compatibility):
>>> PurePosixPath('///some/path')
PurePosixPath('/some/path')
>>> PurePosixPath('//some/path')
PurePosixPath('//some/path')
Calling the constructor without any argument creates a path object pointing to the logical "current directory":
>>> PurePosixPath()
PurePosixPath('.')
Representing
To represent a path (e.g. to pass it to third-party libraries), just call str() on it:
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> str(p)
'/home/antoine/pathlib/setup.py'
>>> p = PureNTPath('c:/windows')
>>> str(p)
'c:\\windows'
To force the string representation with forward slashes, use the as_posix() method:
>>> p.as_posix()
'c:/windows'
To get the bytes representation (which might be useful under Unix systems), call bytes() on it, or use the as_bytes() method:
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> bytes(p)
b'/home/antoine/pathlib/setup.py'
To represent the path as a file URI, call the as_uri() method:
>>> p = PurePosixPath('/etc/passwd')
>>> p.as_uri()
'file:///etc/passwd'
>>> p = PureNTPath('c:/Windows')
>>> p.as_uri()
'file:///c:/Windows'
Properties
Seven simple properties are provided on every path (each can be empty):
>>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
>>> p.drive
'c:'
>>> p.root
'\\'
>>> p.anchor
'c:\\'
>>> p.name
'pathlib.tar.gz'
>>> p.basename
'pathlib.tar'
>>> p.suffix
'.gz'
>>> p.suffixes
['.tar', '.gz']
Deriving new paths
Joining
A path can be joined with another using the / operator:
>>> p = PurePosixPath('foo')
>>> p / 'bar'
PurePosixPath('foo/bar')
>>> p / PurePosixPath('bar')
PurePosixPath('foo/bar')
>>> 'bar' / p
PurePosixPath('bar/foo')
As with the constructor, multiple path components can be specified, either collapsed or separately:
>>> p / 'bar/xyzzy'
PurePosixPath('foo/bar/xyzzy')
>>> p / 'bar' / 'xyzzy'
PurePosixPath('foo/bar/xyzzy')
A joinpath() method is also provided, with the same behaviour. It can serve as a factory function:
>>> path_factory = p.joinpath
>>> path_factory('bar')
PurePosixPath('foo/bar')
Changing the path's final component
The with_name() method returns a new path, with the name changed:
>>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
>>> p.with_name('setup.py')
PureNTPath('c:\\Downloads\\setup.py')
It fails with a ValueError if the path doesn't have an actual name:
>>> p = PureNTPath('c:/')
>>> p.with_name('setup.py')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pathlib.py", line 875, in with_name
raise ValueError("%r has an empty name" % (self,))
ValueError: PureNTPath('c:\\') has an empty name
>>> p.name
''
The with_suffix() method returns a new path with the suffix changed. However, if the path has no suffix, the new suffix is added:
>>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
>>> p.with_suffix('.bz2')
PureNTPath('c:\\Downloads\\pathlib.tar.bz2')
>>> p = PureNTPath('README')
>>> p.with_suffix('.bz2')
PureNTPath('README.bz2')
Making the path relative
The relative() method computes the relative difference of a path to another:
>>> PurePosixPath('/usr/bin/python').relative('/usr')
PurePosixPath('bin/python')
ValueError is raised if the method cannot return a meaningful value:
>>> PurePosixPath('/usr/bin/python').relative('/etc')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pathlib.py", line 926, in relative
.format(str(self), str(formatted)))
ValueError: '/usr/bin/python' does not start with '/etc'
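When a failed computation should not be fatal, the ValueError can be caught. The sketch below assumes the pathlib module as shipped in the standard library, where this operation is spelled relative_to() instead of relative(). The helper name relative_or_none is hypothetical.

```python
from pathlib import PurePosixPath

def relative_or_none(path, base):
    """Return path relative to base, or None if path is not under base."""
    try:
        # relative_to() is the shipped spelling of this draft's relative().
        return PurePosixPath(path).relative_to(base)
    except ValueError:
        return None

inside = relative_or_none('/usr/bin/python', '/usr')
outside = relative_or_none('/usr/bin/python', '/etc')
```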
Sequence-like access
The parts property provides read-only sequence access to a path object:
>>> p = PurePosixPath('/etc/init.d')
>>> p.parts
<PurePosixPath.parts: ['/', 'etc', 'init.d']>
Simple indexing returns the individual path component as a string, while slicing returns a new path object constructed from the selected components:
>>> p.parts[-1]
'init.d'
>>> p.parts[:-1]
PurePosixPath('/etc')
Windows paths handle the drive and the root as a single path component:
>>> p = PureNTPath('c:/setup.py')
>>> p.parts
<PureNTPath.parts: ['c:\\', 'setup.py']>
>>> p.root
'\\'
>>> p.parts[0]
'c:\\'
(separating them would be wrong, since C: is not the parent of C:\\).
The parent() method returns an ancestor of the path:
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> p.parent()
PureNTPath('c:\\python33\\bin')
>>> p.parent(2)
PureNTPath('c:\\python33')
>>> p.parent(3)
PureNTPath('c:\\')
The parents() method automates repeated invocations of parent(), until the anchor is reached:
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> for parent in p.parents(): parent
...
PureNTPath('c:\\python33\\bin')
PureNTPath('c:\\python33')
PureNTPath('c:\\')
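For comparison, here is a sketch of the same navigation using the pathlib module as shipped in the standard library. The spellings there differ from this draft and are assumptions relative to it: parts is a plain tuple, while parent and parents are properties, not methods.

```python
from pathlib import PureWindowsPath  # shipped name for PureNTPath

p = PureWindowsPath('c:/python33/bin/python.exe')

# The drive and root still form a single leading component.
parts = p.parts

# Immediate ancestor, then every ancestor up to the anchor.
parent = p.parent
ancestors = list(p.parents)
```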
Querying
is_relative() returns True if the path is relative (see definition above), False otherwise.
is_reserved() returns True if a Windows path is a reserved path such as CON or NUL. It always returns False for POSIX paths.
match() matches the path against a glob pattern:
>>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
True
normcase() returns a case-folded version of the path for NT paths:
>>> PurePosixPath('CAPS').normcase()
PurePosixPath('CAPS')
>>> PureNTPath('CAPS').normcase()
PureNTPath('caps')
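A few more match() cases, sketched with the pathlib module as shipped in the standard library (PureWindowsPath being the shipped name for PureNTPath). The behaviour shown is that of the shipped module and may differ in detail from this draft.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A relative pattern is matched against the right-hand end of the path.
tail_match = PurePosixPath('/a/b/c.py').match('b/*.py')
wrong_dir = PurePosixPath('/a/b/c.py').match('a/*.py')

# An absolute pattern has to match the whole path.
absolute_match = PurePosixPath('/a.py').match('/*.py')
absolute_miss = PurePosixPath('a/b.py').match('/*.py')

# Case sensitivity follows the flavour of the path.
windows_case = PureWindowsPath('b.py').match('*.PY')
posix_case = PurePosixPath('b.py').match('*.PY')
```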
Concrete paths API
In addition to the operations of the pure API, concrete paths provide additional methods which actually access the filesystem to query or mutate information.
Constructing
The classmethod cwd() creates a path object pointing to the current working directory in absolute form:
>>> Path.cwd()
PosixPath('/home/antoine/pathlib')
File metadata
The stat() method caches and returns the file's stat() result; restat() forces refreshing of the cache. lstat() is also provided, but doesn't have any caching behaviour:
>>> p.stat()
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
For ease of use, direct attribute access to the fields of the stat structure is provided over the path object itself:
>>> p.st_size
928
>>> p.st_mtime
1328287308.889562
Higher-level methods help examine the kind of the file:
>>> p.exists()
True
>>> p.is_file()
True
>>> p.is_dir()
False
>>> p.is_symlink()
False
The file owner and group names (rather than numeric ids) are queried through matching properties:
>>> p = Path('/etc/shadow')
>>> p.owner
'root'
>>> p.group
'shadow'
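A self-contained sketch of the metadata queries, using the pathlib module as shipped in the standard library and a temporary directory so that it can run anywhere. It assumes the shipped API, in which stat fields are read from the result of stat() and not from the path object itself. The inspect_file helper is hypothetical.

```python
import tempfile
from pathlib import Path

def inspect_file(payload):
    """Write payload to a temporary file and report what pathlib sees."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / 'setup.py'
        target.write_bytes(payload)
        info = target.stat()  # one stat() call, several fields
        return {
            'size': info.st_size,
            'exists': target.exists(),
            'is_file': target.is_file(),
            'is_dir': target.is_dir(),
            'is_symlink': target.is_symlink(),
            'missing_exists': (Path(tmp) / 'absent').exists(),
        }

report = inspect_file(b'#!/usr/bin/env python3\n')
```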
Path resolution
The resolve() method makes a path absolute, resolving any symlink on the way. It is the only operation which will remove ".." path components.
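The following sketch shows resolve() removing a ".." component, using the pathlib module as shipped in the standard library. The file and directory names are hypothetical, and no symlinks are created so that the example stays portable.

```python
import tempfile
from pathlib import Path

def resolve_demo():
    """Return (has_dotdot_before, resolved_matches_direct_path)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the base first, since the temporary directory may
        # itself sit behind a symlink on some systems.
        base = Path(tmp).resolve()
        (base / 'docs').mkdir()
        (base / 'setup.py').touch()

        indirect = base / 'docs' / '..' / 'setup.py'
        has_dotdot = '..' in indirect.parts     # kept by pure operations
        resolved = indirect.resolve()           # consults the filesystem
        return has_dotdot, resolved == base / 'setup.py'

dotdot_kept, resolved_ok = resolve_demo()
```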
Directory walking
Simple (non-recursive) directory access is done by iteration:
>>> p = Path('docs')
>>> for child in p: child
...
PosixPath('docs/conf.py')
PosixPath('docs/_templates')
PosixPath('docs/make.bat')
PosixPath('docs/index.rst')
PosixPath('docs/_build')
PosixPath('docs/_static')
PosixPath('docs/Makefile')
This allows simple filtering through list comprehensions:
>>> p = Path('.')
>>> [child for child in p if child.is_dir()]
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
Simple and recursive globbing is also provided:
>>> for child in p.glob('**/*.py'): child
...
PosixPath('test_pathlib.py')
PosixPath('setup.py')
PosixPath('pathlib.py')
PosixPath('docs/conf.py')
PosixPath('build/lib/pathlib.py')
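A runnable sketch of listing, filtering and globbing, written against the pathlib module as shipped in the standard library. One assumption to note: in the shipped module, directory listing is done with iterdir() and not by iterating over the path object as described in this draft. The tree built here is hypothetical.

```python
import tempfile
from pathlib import Path

def walk_demo():
    """Build a small tree and return (children, directories, python_files)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / 'docs').mkdir()
        (root / 'setup.py').touch()
        (root / 'docs' / 'conf.py').touch()
        (root / 'docs' / 'index.rst').touch()

        # Non-recursive listing; order is arbitrary, so sort for stability.
        children = sorted(child.name for child in root.iterdir())
        # Filtering with a comprehension.
        directories = [child.name for child in root.iterdir() if child.is_dir()]
        # Recursive globbing with "**".
        python_files = sorted(
            found.relative_to(root).as_posix() for found in root.glob('**/*.py')
        )
        return children, directories, python_files

children, directories, python_files = walk_demo()
```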
File opening
The open() method provides a file opening API similar to the builtin open() function:
>>> p = Path('setup.py')
>>> with p.open() as f: f.readline()
...
'#!/usr/bin/env python3\n'
The raw_open() method, on the other hand, is similar to os.open:
>>> fd = p.raw_open(os.O_RDONLY)
>>> os.read(fd, 15)
b'#!/usr/bin/env '
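Both styles of opening can be sketched in a self-contained way with the pathlib module as shipped in the standard library. An assumption to flag: the shipped module has no raw_open() method, so the low-level half of the example passes the path object to os.open() directly. The read_both_ways helper is hypothetical.

```python
import os
import tempfile
from pathlib import Path

def read_both_ways(text):
    """Return (first line via Path.open, first 15 bytes via os.open)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / 'setup.py'
        with script.open('w', encoding='utf-8') as f:
            f.write(text)

        # High-level: a file object, as with the builtin open().
        with script.open(encoding='utf-8') as f:
            line = f.readline()

        # Low-level: a raw file descriptor that must be closed by hand.
        fd = os.open(script, os.O_RDONLY)
        try:
            head = os.read(fd, 15)
        finally:
            os.close(fd)
        return line, head

line, head = read_both_ways('#!/usr/bin/env python3\nprint("hi")\n')
```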
Filesystem alteration
Several common filesystem operations are provided as methods: touch(), mkdir(), rename(), replace(), unlink(), rmdir(), chmod(), lchmod(), symlink_to(). More operations could be provided, for example some of the functionality of the shutil module.
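A short sketch chaining several of these methods, using the pathlib module as shipped in the standard library and a temporary directory. The file names are hypothetical.

```python
import tempfile
from pathlib import Path

def alteration_demo():
    """Return (rename_succeeded, directory_still_exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        build = Path(tmp) / 'build'
        build.mkdir()

        draft = build / 'draft.txt'
        final = build / 'final.txt'
        draft.touch()            # create an empty file
        draft.rename(final)      # move it to its new name
        renamed = final.exists() and not draft.exists()

        final.unlink()           # remove the file
        build.rmdir()            # rmdir() requires an empty directory
        return renamed, build.exists()

renamed, build_exists = alteration_demo()
```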
Experimental openat() support
On compatible POSIX systems, the concrete PosixPath class can take advantage of *at() functions (openat() [8] and friends), and manages the bookkeeping of open file descriptors as necessary. Support is enabled by passing the use_openat argument to the constructor:
>>> p = Path(".", use_openat=True)
Then all paths constructed by navigating this path (either by iteration or indexing) will also use the openat() family of functions. The point of using these functions is to avoid race conditions whereby a given directory is silently replaced with another (often a symbolic link to a sensitive system location) between two accesses.
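The use_openat argument is specific to this draft and is not assumed to exist elsewhere. To illustrate the underlying mechanism only, the sketch below uses the dir_fd parameter of os.open(), which exposes openat() on platforms that support it. The read_via_dir_fd helper is hypothetical.

```python
import os
import tempfile

def read_via_dir_fd(name, payload):
    """Write payload to name, then read it back relative to a directory fd.

    Returns None on platforms where os.open() does not support dir_fd.
    """
    if os.open not in os.supports_dir_fd:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name), 'wb') as f:
            f.write(payload)

        # Holding the directory open pins it: later lookups go through
        # this descriptor, not through a path that could be swapped out.
        dir_fd = os.open(tmp, os.O_RDONLY)
        try:
            fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
            try:
                return os.read(fd, len(payload))
            finally:
                os.close(fd)
        finally:
            os.close(dir_fd)

content = read_via_dir_fd('data.txt', b'hello')
```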
References
| [1] | http://pypi.python.org/pypi/pathlib/ |
| [2] | https://github.com/jaraco/path.py |
| [3] | http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html |
| [4] | http://wiki.python.org/moin/AlternativePathClass |
| [5] | https://bitbucket.org/sluggo/unipath/overview |
| [6] | https://bitbucket.org/pitrou/pathlib/ |
| [7] | http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_11 |
| [8] | http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html |
Copyright
This document has been placed into the public domain.
