7 changes: 6 additions & 1 deletion Doc/library/binascii.rst
@@ -283,11 +283,16 @@ The :mod:`!binascii` module defines the following functions:

.. versionadded:: 3.15

.. function:: a2b_qp(data, header=False)
.. function:: a2b_qp(data, header=False, strip_ws=False)

Convert a block of quoted-printable data back to binary and return the binary
data. More than one line may be passed at a time. If the optional argument
*header* is present and true, underscores will be decoded as spaces.
If the optional argument *strip_ws* is true,
trailing whitespace is stripped from each line, as required by :rfc:`2045`.

.. versionchanged:: next
Added the *strip_ws* parameter.
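To make the documented behavior concrete, here is a minimal sketch of a wrapper that asks for whitespace stripping and still runs on interpreters that predate this change. The helper name `a2b_qp_stripped` and the regular-expression fallback are assumptions for illustration, not part of the patch; the fallback only approximates the new keyword by removing spaces and tabs that sit directly before LF, CRLF, or the end of the input.

```python
import binascii
import re

# Spaces/tabs directly before LF, CRLF, or the end of the input.
# A bare CR is deliberately not treated as a line ending.
_TRAILING_WS = re.compile(rb"[ \t]+(?=\r?\n|\Z)")


def a2b_qp_stripped(data: bytes) -> bytes:
    """Decode quoted-printable data, dropping unencoded trailing whitespace."""
    try:
        # Interpreters that include this change accept the keyword directly.
        return binascii.a2b_qp(data, strip_ws=True)
    except TypeError:
        # Hypothetical fallback for older interpreters: strip, then decode.
        return binascii.a2b_qp(_TRAILING_WS.sub(b"", data))


print(a2b_qp_stripped(b"foo \n"))    # unencoded trailing space is dropped
print(a2b_qp_stripped(b"a=20 \n"))   # the encoded space (=20) survives
```

Stripping before decoding is what keeps `=20` and `=09` intact: they are still escapes, not whitespace, when the strip runs.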


.. function:: b2a_qp(data, quotetabs=False, istext=True, header=False)
8 changes: 8 additions & 0 deletions Doc/library/email.compat32-message.rst
@@ -214,6 +214,14 @@ Here are the methods of the :class:`Message` class:
defect property (:class:`~email.errors.InvalidBase64PaddingDefect` or
:class:`~email.errors.InvalidBase64CharactersDefect`, respectively).

.. note::

A ``quoted-printable`` payload is decoded without stripping
trailing whitespace, contrary to :rfc:`2045` but matching
common mail clients.
Use :func:`binascii.a2b_qp` with ``strip_ws=True``
(or ``email.quoprimime.decode``) for RFC-compliant decoding.
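The difference described in the note can be seen with a small message. This is a sketch under the assumption that the message is parsed with the legacy `email.message_from_string` API; the sample message text is made up for illustration.

```python
import email
from email import quoprimime

raw = (
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "hello  \n"
    "world\n"
)
msg = email.message_from_string(raw)

# get_payload(decode=True) returns bytes and keeps the unencoded
# trailing spaces on the first line.
lenient = msg.get_payload(decode=True)

# email.quoprimime.decode works on str and strips trailing whitespace
# from each line by default.
strict = quoprimime.decode(msg.get_payload())

print(lenient.splitlines()[0])
print(strict.splitlines()[0])
```

Note that the two calls also differ in type: `get_payload(decode=True)` yields `bytes`, while `email.quoprimime.decode` takes and returns `str`.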

When *decode* is ``False`` (the default) the body is returned as a string
without decoding the :mailheader:`Content-Transfer-Encoding`. However,
for a :mailheader:`Content-Transfer-Encoding` of 8bit, an attempt is made
12 changes: 10 additions & 2 deletions Doc/library/quopri.rst
@@ -20,14 +20,19 @@ few nonprintable characters; the base64 encoding scheme available via the
:mod:`base64` module is more compact if there are many such characters, as when
sending a graphics file.

.. function:: decode(input, output, header=False)
.. function:: decode(input, output, header=False, strip_ws=False)

Decode the contents of the *input* file and write the resulting decoded binary
data to the *output* file. *input* and *output* must be :term:`binary file objects
<file object>`. If the optional argument *header* is present and true, underscore
will be decoded as space. This is used to decode "Q"-encoded headers as
described in :rfc:`1522`: "MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text".
If the optional argument *strip_ws* is true,
trailing whitespace is stripped from each line, as required by :rfc:`2045`.

.. versionchanged:: next
Added the *strip_ws* parameter.


.. function:: encode(input, output, quotetabs, header=False)
@@ -43,11 +48,14 @@ sending a graphics file.
as underscores as per :rfc:`1522`.


.. function:: decodestring(s, header=False)
.. function:: decodestring(s, header=False, strip_ws=False)

Like :func:`decode`, except that it accepts a source :class:`bytes` and
returns the corresponding decoded :class:`bytes`.

.. versionchanged:: next
Added the *strip_ws* parameter.
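As a rough illustration of the two decoding entry points documented above, the sketch below uses the file-object API with in-memory streams and only passes *strip_ws* when the running interpreter knows the parameter. The feature check through `inspect.signature` is an assumption about how calling code might stay compatible with older releases; it is not part of the patch.

```python
import inspect
import io
import quopri

encoded = b"caf=C3=A9 \nna=C3=AFve\n"

# File-object API: both streams must be binary.
src, dst = io.BytesIO(encoded), io.BytesIO()
quopri.decode(src, dst)
default_result = dst.getvalue()  # the trailing space on line one is kept

# header=True additionally decodes "_" as a space ("Q"-encoded headers).
header_result = quopri.decodestring(b"a_b", header=True)

# Only pass strip_ws where the running interpreter accepts it.
has_strip_ws = "strip_ws" in inspect.signature(quopri.decodestring).parameters
kwargs = {"strip_ws": True} if has_strip_ws else {}
maybe_stripped = quopri.decodestring(encoded, **kwargs)
```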


.. function:: encodestring(s, quotetabs=False, header=False)

10 changes: 10 additions & 0 deletions Doc/whatsnew/3.16.rst
@@ -270,6 +270,16 @@ os
(Contributed by Maurycy Pawłowski-Wieroński in :gh:`149464`.)


quopri
------

* :func:`quopri.decode`, :func:`quopri.decodestring` and
:func:`binascii.a2b_qp` gained a *strip_ws* parameter. When true, trailing
whitespace is stripped from each line while decoding, as required by
:rfc:`2045` for a quoted-printable body.
(Contributed by Serhiy Storchaka in :gh:`62222`.)


re
--

1 change: 1 addition & 0 deletions Include/internal/pycore_global_objects_fini_generated.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Include/internal/pycore_global_strings.h
@@ -826,6 +826,7 @@ struct _Py_global_strings {
STRUCT_FOR_ID(strict)
STRUCT_FOR_ID(strict_mode)
STRUCT_FOR_ID(string)
STRUCT_FOR_ID(strip_ws)
STRUCT_FOR_ID(sub_key)
STRUCT_FOR_ID(subcalls)
STRUCT_FOR_ID(symmetric_difference_update)
1 change: 1 addition & 0 deletions Include/internal/pycore_runtime_init_generated.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions Include/internal/pycore_unicodeobject_generated.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions Lib/email/quoprimime.py
@@ -229,10 +229,14 @@ def body_encode(body, maxlinelen=76, eol=NL):

# BAW: I'm not sure if the intent was for the signature of this function to be
# the same as base64MIME.decode() or not...
def decode(encoded, eol=NL):
def decode(encoded, eol=NL, strip_ws=True):
"""Decode a quoted-printable string.

Lines are separated with eol, which defaults to \\n.

If strip_ws is true (the default), whitespace at the end of a line is
stripped, as required by RFC 2045 when decoding a quoted-printable body.
Pass strip_ws=False to keep it.
"""
if not encoded:
return encoded
@@ -242,7 +246,8 @@ def decode(encoded, eol=NL):
decoded = ''

for line in encoded.splitlines():
line = line.rstrip()
if strip_ws:
line = line.rstrip()
if not line:
decoded += eol
continue
34 changes: 21 additions & 13 deletions Lib/quopri.py
@@ -109,14 +109,16 @@ def encodestring(s, quotetabs=False, header=False):



def decode(input, output, header=False):
def decode(input, output, header=False, strip_ws=False):
"""Read 'input', apply quoted-printable decoding, and write to 'output'.
'input' and 'output' are binary file objects.
If 'header' is true, decode underscore as space (per RFC 1522)."""
If 'header' is true, decode underscore as space (per RFC 1522).
If 'strip_ws' is true, strip whitespace at the end of a line (per
RFC 2045)."""

if a2b_qp is not None:
data = input.read()
odata = a2b_qp(data, header=header)
odata = a2b_qp(data, header=header, strip_ws=strip_ws)
output.write(odata)
return

@@ -125,11 +127,19 @@ def decode(input, output, header=False):
i, n = 0, len(line)
if n > 0 and line[n-1:n] == b'\n':
partial = 0; n = n-1
# Strip trailing whitespace
while n > 0 and line[n-1:n] in b" \t\r":
n = n-1
# Separate off the line ending (keeping it to re-add after
# decoding) so that a trailing "=" -- possibly before the "\r" of
# a "\r\n" pair -- is recognized as a soft line break.
if n > 0 and line[n-1:n] == b'\r':
n = n-1; eol = b'\r\n'
else:
eol = b'\n'
else:
partial = 1
partial = 1; eol = b''
if strip_ws:
# Strip trailing whitespace (RFC 2045).
while n > 0 and line[n-1:n] in b" \t":
n = n-1
while i < n:
c = line[i:i+1]
if c == b'_' and header:
@@ -138,25 +148,23 @@ def decode(input, output, header=False):
new = new + c; i = i+1
elif i+1 == n and not partial:
partial = 1; break
elif i+1 < n and line[i+1:i+2] == ESCAPE:
new = new + ESCAPE; i = i+2
elif i+2 < n and ishex(line[i+1:i+2]) and ishex(line[i+2:i+3]):
new = new + bytes((unhex(line[i+1:i+3]),)); i = i+3
else: # Bad escape sequence -- leave it in
new = new + c; i = i+1
if not partial:
output.write(new + b'\n')
output.write(new + eol)
new = b''
if new:
output.write(new)

def decodestring(s, header=False):
def decodestring(s, header=False, strip_ws=False):
if a2b_qp is not None:
return a2b_qp(s, header=header)
return a2b_qp(s, header=header, strip_ws=strip_ws)
from io import BytesIO
infp = BytesIO(s)
outfp = BytesIO()
decode(infp, outfp, header=header)
decode(infp, outfp, header=header, strip_ws=strip_ws)
return outfp.getvalue()
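The line-ending handling in the pure-Python fallback can be looked at in isolation. The sketch below is a hypothetical standalone helper (`split_line` is not a name from the patch) that mirrors the hunk above: it separates the line ending from the content so that a trailing "=" before "\r\n" can be recognized as a soft line break, and it strips spaces and tabs only when asked.

```python
def split_line(line: bytes, strip_ws: bool = False) -> tuple[bytes, bytes]:
    """Return (content, eol) for one line as produced by readline()."""
    n = len(line)
    if line.endswith(b"\n"):
        n -= 1
        # Treat "\r\n" as one line ending so "=\r\n" is a soft line break.
        if n > 0 and line[n - 1:n] == b"\r":
            n -= 1
            eol = b"\r\n"
        else:
            eol = b"\n"
    else:
        eol = b""  # last line of the input, no terminator
    if strip_ws:
        # Only space and tab count as whitespace here (RFC 2045).
        while n > 0 and line[n - 1:n] in b" \t":
            n -= 1
    return line[:n], eol


print(split_line(b"foo=\r\n"))
print(split_line(b"foo \t\n", strip_ws=True))
```

Because the input is split on "\n" only, a bare "\r" in the middle of a line stays part of the content, and whitespace before it is not stripped.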


19 changes: 19 additions & 0 deletions Lib/test/test_binascii.py
@@ -1418,6 +1418,10 @@ def test_qp(self):
self.assertEqual(a2b_qp(type2test(b"=")), b"")
self.assertEqual(a2b_qp(type2test(b"= ")), b"= ")
self.assertEqual(a2b_qp(type2test(b"==")), b"=")
# A stray "=" is left in place and the next character is rescanned,
# so "=41" after it is decoded as a fresh escape (gh-62222).
self.assertEqual(a2b_qp(type2test(b"==41")), b"=A")
self.assertEqual(a2b_qp(type2test(b"==g")), b"==g")
self.assertEqual(a2b_qp(type2test(b"=\nAB")), b"AB")
self.assertEqual(a2b_qp(type2test(b"=\r\nAB")), b"AB")
self.assertEqual(a2b_qp(type2test(b"=\rAB")), b"") # ?
@@ -1431,6 +1435,21 @@ def test_qp(self):
self.assertEqual(a2b_qp(type2test(b'_')), b'_')
self.assertEqual(a2b_qp(type2test(b'_'), header=True), b' ')

# strip_ws strips whitespace at the end of a line (RFC 2045), but
# leaves whitespace that was encoded (=20/=09) untouched. By default
# trailing whitespace is kept.
self.assertEqual(a2b_qp(type2test(b"foo \n")), b"foo \n")
self.assertEqual(a2b_qp(type2test(b"foo \n"), strip_ws=True), b"foo\n")
self.assertEqual(a2b_qp(type2test(b"foo "), strip_ws=True), b"foo")
self.assertEqual(a2b_qp(type2test(b"a b \nc\n"), strip_ws=True), b"a b\nc\n")
self.assertEqual(a2b_qp(type2test(b"a=20 \n"), strip_ws=True), b"a \n")
self.assertEqual(a2b_qp(type2test(b"= \n"), strip_ws=True), b"")
self.assertEqual(a2b_qp(type2test(b"foo =\n"), strip_ws=True), b"foo ")
self.assertEqual(a2b_qp(type2test(b"foo \r\n"), strip_ws=True), b"foo\r\n")
# A bare CR is not a line separator (RFC 2045: CR occurs only in CRLF),
# so whitespace before it is kept.
self.assertEqual(a2b_qp(type2test(b"foo \rbar"), strip_ws=True), b"foo \rbar")

self.assertRaises(TypeError, b2a_qp, foo="bar")
self.assertEqual(a2b_qp(type2test(b"=00\r\n=00")), b"\x00\r\n\x00")
self.assertEqual(b2a_qp(type2test(b"\xff\r\n\xff\n\xff")),
8 changes: 8 additions & 0 deletions Lib/test/test_email/test_email.py
@@ -4813,6 +4813,14 @@ def test_decode_one_line_trailing_spaces(self):
def test_decode_two_lines_trailing_spaces(self):
self._test_decode('hello \r\nworld \r\n', 'hello\nworld\n')

def test_decode_keep_trailing_spaces(self):
# strip_ws=False keeps trailing whitespace (the default strips it).
self.assertEqual(quoprimime.decode('hello \r\n', strip_ws=False),
'hello \n')
self.assertEqual(quoprimime.decode('hello \r\nworld \r\n',
strip_ws=False),
'hello \nworld \n')

def test_decode_quoted_word(self):
self._test_decode('=22quoted=20words=22', '"quoted words"')

30 changes: 28 additions & 2 deletions Lib/test/test_quopri.py
@@ -140,8 +140,10 @@ def test_decodestring(self):
@withpythonimplementation
def test_decodestring_double_equals(self):
# Issue 21511 - Ensure that byte string is compared to byte string
# instead of int byte value
decoded_value, encoded_value = (b"123=four", b"123==four")
# instead of int byte value.
# A stray "=" not starting a valid escape is left in place, so the
# following "=four" is decoded as a fresh (invalid) escape too.
decoded_value, encoded_value = (b"123==four", b"123==four")
self.assertEqual(quopri.decodestring(encoded_value), decoded_value)

@withpythonimplementation
@@ -171,6 +173,30 @@ def test_embedded_ws(self):
self.assertEqual(quopri.encodestring(p, quotetabs=True), e)
self.assertEqual(quopri.decodestring(e), p)

@withpythonimplementation
def test_decode_strip_ws(self):
# By default trailing whitespace is kept; with strip_ws it is
# removed at the end of a line (RFC 2045), while encoded whitespace
# (=20/=09) is preserved.
self.assertEqual(quopri.decodestring(b"foo \n"), b"foo \n")
self.assertEqual(quopri.decodestring(b"foo \n", strip_ws=True), b"foo\n")
self.assertEqual(quopri.decodestring(b"a b \nc\n", strip_ws=True), b"a b\nc\n")
self.assertEqual(quopri.decodestring(b"a=20 \n", strip_ws=True), b"a \n")
self.assertEqual(quopri.decodestring(b"= \n", strip_ws=True), b"")
self.assertEqual(quopri.decodestring(b"foo =\n", strip_ws=True), b"foo ")
self.assertEqual(quopri.decodestring(b"foo \r\n", strip_ws=True), b"foo\r\n")
# A bare CR is not a line separator, so whitespace before it is kept.
self.assertEqual(quopri.decodestring(b"foo \rbar", strip_ws=True), b"foo \rbar")

@withpythonimplementation
def test_decode_soft_line_break(self):
# A "=" at the end of a line is a soft line break, for both "\n" and
# "\r\n" line endings; other line endings are preserved as-is.
self.assertEqual(quopri.decodestring(b"=\nAB"), b"AB")
self.assertEqual(quopri.decodestring(b"=\r\nAB"), b"AB")
self.assertEqual(quopri.decodestring(b"foo=\r\nbar"), b"foobar")
self.assertEqual(quopri.decodestring(b"foo\r\nbar"), b"foo\r\nbar")
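For readers less familiar with soft line breaks, a short sketch of what these assertions exercise, assuming the C-accelerated `binascii` is available (as it is in standard CPython builds). The sample strings are made up for illustration.

```python
import quopri

# A trailing "=" joins the line with the next one; the hard CRLF at the
# end of the text is preserved.
wrapped = b"The quick brown fox jumps over=\r\n the lazy dog.\r\n"
unfolded = quopri.decodestring(wrapped)

# Round trip: encodestring() inserts soft breaks into long lines,
# and decodestring() removes them again.
long_line = b"abcdefghij" * 12  # 120 bytes, longer than the 76-column limit
encoded = quopri.encodestring(long_line)
restored = quopri.decodestring(encoded)

print(unfolded)
print(restored == long_line)
```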

@withpythonimplementation
def test_encode_header(self):
for p, e in self.HSTRINGS:
@@ -0,0 +1,9 @@
:func:`binascii.a2b_qp` and :mod:`quopri` now leave a stray ``=`` (one not
followed by two hexadecimal digits or a soft line break) in place and rescan
the following character, instead of consuming and discarding it; for example
``b'==41'`` now decodes to ``b'=A'``, matching the :mod:`email` package and
other quoted-printable decoders. A *strip_ws* parameter is also added to
:func:`binascii.a2b_qp`, :func:`quopri.decode`, :func:`quopri.decodestring`
and ``email.quoprimime.decode`` to strip trailing whitespace from each line
while decoding, as required by :rfc:`2045`; it is false by default everywhere
except ``email.quoprimime.decode``, so each function's existing whitespace
handling is unchanged.
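The "leave in place and rescan" rule for a stray ``=`` can be modeled with a regular expression. The sketch below is a simplified stand-in, not the patched implementation: `decode_escapes` is a hypothetical helper that handles only ``=XX`` escapes and ignores soft line breaks, the *header* flag, and whitespace stripping.

```python
import re

# "=" followed by exactly two hexadecimal digits (either case).
_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def decode_escapes(data: bytes) -> bytes:
    """Decode =XX escapes; any other "=" is left in place."""
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


print(decode_escapes(b"==41"))       # first "=" is stray, "=41" decodes to "A"
print(decode_escapes(b"123==four"))  # no valid escape, returned unchanged
```

A left-to-right, non-overlapping regex substitution gives the same result as the scanner for these inputs: a ``=`` that does not start a valid escape is copied, and matching resumes at the very next byte.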