Commit c70a6ae

#17403: urllib.robotparser normalizes the urls before adding to ruleline.
This helps in handling certain types of invalid urls in a conservative manner.
1 parent eb4c9c7 commit c70a6ae

3 files changed: 17 additions, 0 deletions

Lib/test/test_robotparser.py

Lines changed: 12 additions & 0 deletions
@@ -234,6 +234,18 @@ def RobotTest(index, robots_txt, good_urls, bad_urls,
 
 RobotTest(15, doc, good, bad)
 
+# 16. Empty query (issue #17403). Normalizing the url first.
+doc = """
+User-agent: *
+Allow: /some/path?
+Disallow: /another/path?
+"""
+
+good = ['/some/path?']
+bad = ['/another/path?']
+
+RobotTest(16, doc, good, bad)
+
 
 class NetworkTestCase(unittest.TestCase):
 
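
The new test case can also be tried outside the test suite. The following is a minimal sketch, not part of the commit, that feeds the same rules to `urllib.robotparser.RobotFileParser` directly. The agent name `example-bot` is a hypothetical placeholder, and the results assume a Python version that includes this change.

```python
import urllib.robotparser

# Same rules as test 16 in the diff above.
robots_txt = """
User-agent: *
Allow: /some/path?
Disallow: /another/path?
"""

parser = urllib.robotparser.RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(robots_txt.splitlines())

# 'example-bot' is a made-up agent name; the '*' entry applies to it.
allowed = parser.can_fetch('example-bot', '/some/path?')
blocked = parser.can_fetch('example-bot', '/another/path?')

print(allowed)  # True
print(blocked)  # False
```

Both URLs carry an empty query string, which is the situation the test is meant to cover.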

Lib/urllib/robotparser.py

Lines changed: 1 addition & 0 deletions
@@ -157,6 +157,7 @@ def __init__(self, path, allowance):
         if path == '' and not allowance:
             # an empty value means allow all
             allowance = True
+        path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
         self.path = urllib.parse.quote(path)
         self.allowance = allowance
 
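
To illustrate what the added line does, the sketch below (not part of the commit) compares the rule path that would be stored with and without the normalization step. The helper names `rule_path_before` and `rule_path_after` are hypothetical; they only mirror the two versions of the `__init__` body shown in the diff.

```python
import urllib.parse


def rule_path_before(path):
    """Hypothetical helper: rule path as stored before this commit."""
    return urllib.parse.quote(path)


def rule_path_after(path):
    """Hypothetical helper: rule path as stored after this commit."""
    # The urlparse/urlunparse round trip drops an empty query string,
    # so the trailing '?' is gone before the path is quoted.
    path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
    return urllib.parse.quote(path)


print(rule_path_before('/some/path?'))  # /some/path%3F
print(rule_path_after('/some/path?'))   # /some/path
```

Without the normalization, the trailing `?` is percent-encoded into the stored rule. If the URL being checked goes through the same parse and unparse round trip, which is an assumption here since that code is not shown in this diff, such a rule could not match it as a prefix.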

Misc/NEWS

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,10 @@ Core and Builtins
 Library
 -------
 
+- Issue #17403: urllib.parse.robotparser normalizes the urls before adding to
+  ruleline. This helps in handling certain types invalid urls in a conservative
+  manner.
+
 - Issue #18025: Fixed a segfault in io.BufferedIOBase.readinto() when raw
   stream's read() returns more bytes than requested.
 
