
crawl_google_results.py update, modularize, documentation and doctest #4847


Closed
wants to merge 24 commits into from

Conversation

@appledora appledora commented Oct 1, 2021

Made large changes to web_programming/crawl_google_results.py, adding documentation and doctests. Modularized the script and made it more customizable. Fixed formatting using black and flake8.

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms have a URL in its comments that points to Wikipedia or other similar explanation.
  • [ ] If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

@appledora appledora requested a review from cclauss as a code owner October 1, 2021 14:35
@ghost ghost added awaiting reviews This PR is ready to be reviewed enhancement This PR modified some existing files labels Oct 1, 2021
@appledora appledora changed the title Crawl google results crawl_google_results.py update, modularize, documentation and doctest Oct 1, 2021
@ghost ghost added the tests are failing Do not merge until tests pass label Oct 1, 2021
appledora and others added 2 commits October 1, 2021 20:51
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
@appledora appledora requested a review from cclauss October 1, 2021 17:37
@cclauss (Member) commented Oct 2, 2021

Please undo the changes to requirements.txt.

@appledora (Author)

> Please undo the changes to requirements.txt.

Done.
[screenshot]

@cclauss (Member) commented Oct 2, 2021

I think that this is a different algorithm than the original so let's keep both. Please rename this file to get_google_search_results.py with an algorithmic function get_google_search_results() which returns a list or tuple of search results. Put the writing of those results to a file in a separate function. Crawling is traversing each search result to go deeper, which this algorithm does not do. Please do not print in the get_google_search_results() function.
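
For readers following along, a minimal sketch of the structure being requested could look like this. It is not the PR's actual code: the CSS selector, the constants, and the write function's parameters are illustrative assumptions.

    from __future__ import annotations

    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://www.google.com/search"
    HEADERS = {"User-Agent": "Mozilla/5.0"}  # placeholder user-agent string


    def get_google_search_results(query: str = "potato") -> list[dict[str, str]]:
        """Return search results as a list of dicts; no printing, no file I/O."""
        response = requests.get(
            BASE_URL, params={"q": query}, headers=HEADERS, timeout=10
        )
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        results = []
        for anchor in soup.select("div.yuRUbf > a"):  # selector is illustrative only
            title = anchor.find("h3")
            if title:
                results.append({"title": title.text, "link": anchor["href"]})
        return results


    def write_google_search_results(results: list[dict[str, str]], filename: str) -> str:
        """Write the results to a file in a separate, easily testable step."""
        with open(filename, "w") as file:
            for result in results:
                file.write(f"{result['title']}\t{result['link']}\n")
        return filename

Keeping the network call in one function and the file writing in another isolates the part that cannot be doctested offline, which becomes relevant later in this thread.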

@appledora (Author)

> I think that this is a different algorithm than the original so let's keep both. Please rename this file to get_google_search_results.py with an algorithmic function get_google_search_results() which returns a list or tuple of search results. Put the writing of those results to a file in a separate function. Crawling is traversing each search result to go deeper, which this algorithm does not do. Please do not print in the get_google_search_results() function.

On it!

@ghost ghost added the require type hints https://docs.python.org/3/library/typing.html label Oct 2, 2021
@ghost ghost left a comment

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.


@ghost ghost removed the require type hints https://docs.python.org/3/library/typing.html label Oct 2, 2021
@ghost ghost removed the tests are failing Do not merge until tests pass label Oct 2, 2021
@ghost ghost added the require tests Tests [doctest/unittest/pytest] are required label Oct 2, 2021
@appledora appledora requested a review from cclauss October 5, 2021 14:00
@ghost ghost mentioned this pull request Oct 5, 2021
@appledora (Author)

@cclauss, any comments on this one?

@appledora (Author)

@poyea I think I fixed the technical and conventional errors on this one. Could you take a look?

Co-authored-by: John Law <johnlaw.po@gmail.com>
@ghost ghost removed the tests are failing Do not merge until tests pass label Oct 18, 2021
@appledora appledora requested a review from poyea October 18, 2021 04:21
@poyea (Member) commented Oct 19, 2021

So if we add

    import doctest

    doctest.testmod()

Then every single time we run the tests, it fires requests to Google, and those .txt files will be generated.

@poyea (Member) commented Oct 19, 2021

Please see https://github.com/TheAlgorithms/Python/blob/master/web_programming/instagram_crawler.py for how this is currently handled. Ideally we would have to mock the requests, but in this case we may further factor out the processing functions, and let's not test the request part.
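
In other words, a sketch only (not the PR's code): pull the HTML processing out into a pure function that can be doctested offline, and leave the network call untested. The function name and the CSS class below are made-up examples.

    from __future__ import annotations

    from bs4 import BeautifulSoup


    def extract_links(html: str) -> list[str]:
        """Pure processing step with no network access, so it is safe to doctest.

        >>> extract_links('<a class="result" href="https://example.com">x</a>')
        ['https://example.com']
        """
        soup = BeautifulSoup(html, "html.parser")
        return [anchor["href"] for anchor in soup.find_all("a", class_="result")]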

@poyea poyea added the hacktoberfest-accepted Accepted to be counted towards Hacktoberfest label Oct 19, 2021
changing constant name

Co-authored-by: John Law <johnlaw.po@gmail.com>
@ghost ghost added the tests are failing Do not merge until tests pass label Oct 19, 2021
@appledora (Author)

@poyea, for the sake of clarification, are you suggesting I follow the coding pattern in https://github.com/TheAlgorithms/Python/blob/master/web_programming/instagram_crawler.py?

@poyea (Member) commented Oct 19, 2021

> @poyea, for the sake of clarification, are you suggesting I follow the coding pattern in https://github.com/TheAlgorithms/Python/blob/master/web_programming/instagram_crawler.py?

Not necessarily. Now it's just a matter of how we write our test suite.

@appledora (Author) commented Oct 20, 2021

@poyea, forgive me, but I believe I am still a little confused about the testing requirements. :|
What I think you are asking for is to test only the write_google_search_results() method, without actually calling the parse_results() method, which sends the actual request to Google. Hence, I could write a mock test method like test_instagram_user() as given in this script, and by using the doctest library I would call this mock method?
Am I going in the right direction here?
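
For what it's worth, that kind of mock test might look roughly like the following; the module path, the HTML markup, and the assumption that parse_results() calls requests.get() are taken from this conversation, not from the PR's actual code.

    from unittest.mock import MagicMock, patch

    import requests

    from web_programming.get_google_search_results import parse_results  # path assumed

    # Canned HTML imitating whatever structure parse_results() expects;
    # the markup here is purely illustrative.
    FAKE_HTML = '<div class="yuRUbf"><a href="https://example.com"><h3>Example</h3></a></div>'


    def test_parse_results_without_network() -> None:
        """Exercise parse_results() without ever sending a real request to Google."""
        fake_response = MagicMock(status_code=200, text=FAKE_HTML, content=FAKE_HTML.encode())
        with patch.object(requests, "get", return_value=fake_response):
            results = parse_results("potato")
        assert results  # the parser should find the canned result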

@cclauss (Member) commented Oct 20, 2021

web_programming/get_google_search_results.py:30: error: Name "headers" is not defined
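
One likely fix is sketched below; it assumes the code at line 30 meant to use the module-level HEADERS constant that appears elsewhere in this file, and the header value shown is a placeholder.

    # Define the headers once at module level and reference the constant,
    # instead of an undefined local name `headers`.
    HEADERS = {"User-Agent": "Mozilla/5.0"}  # placeholder user-agent string

    # ...inside the request helper:
    # response = requests.get(url, headers=HEADERS)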

@ghost ghost removed the tests are failing Do not merge until tests pass label Oct 21, 2021
@poyea (Member) commented Oct 26, 2021

> @poyea, forgive me, but I believe I am still a little confused about the testing requirements. :| What I think you are asking for is to test only the write_google_search_results() method, without actually calling the parse_results() method, which sends the actual request to Google. Hence, I could write a mock test method like test_instagram_user() as given in this script, and by using the doctest library I would call this mock method? Am I going in the right direction here?

@appledora You may try to run the tests locally.

@stale stale bot commented Apr 28, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Used to mark an issue or pull request stale. label Apr 28, 2022
@stale stale bot removed stale Used to mark an issue or pull request stale. labels Mar 18, 2023
@Isskta404

This comment was marked as off-topic.

}


def parse_results(query: str = "") -> list:
Contributor

Suggested change:
- def parse_results(query: str = "") -> list:
+ def parse_results(query: str = "") -> list[dict[str, str | None]]:

Annotate the list elements' type.

Comment on lines +54 to +57

Contributor

Suggested change:
- next_page = []
-
- for item in table_data:
-     next_page.append(item["href"])
+ next_page = (item["href"] for item in table_data)

new_link = BASE_URL + next_page_link
try:
    response = requests.get(new_link, headers=HEADERS)

Contributor

Suggested change: remove the blank line after the requests.get() call.
Comment on lines +106 to +107

Contributor

Suggested change:
- if filename == "":
-     filename = query + "-query.txt"
+ if not filename:
+     filename = query + "-query.txt"

if filename == "":
    filename = query + "-query.txt"
elif not filename.endswith(".txt"):
    filename = filename + ".txt"

Contributor

Suggested change:
- filename = filename + ".txt"
+ filename += ".txt"

Comment on lines +104 to +105

Contributor

Suggested change:
- if query == "":
-     query = "potato"
+ if not query:
+     query = "potato"

Comment on lines +89 to +102

Contributor

Make the doctests more strict by testing the outputs for equality:

Suggested change:
- >>> write_google_search_results("python", "test") != None
- True
- >>> write_google_search_results("", "tet.html") != None
- True
- >>> write_google_search_results("python", "") != None
- True
- >>> write_google_search_results("", "") != None
- True
- >>> "test" in write_google_search_results("python", "test")
- True
- >>> "test1" in write_google_search_results("", "test1")
- True
- >>> "potato" in write_google_search_results("", "")
- True
+ >>> write_google_search_results("python", "test") == "test.txt"
+ True
+ >>> write_google_search_results("", "tet.html") == "tet.html.txt"
+ True
+ >>> write_google_search_results("python", "") == "python.txt"
+ True
+ >>> write_google_search_results("", "test1") == "test1.txt"
+ True
+ >>> write_google_search_results("", "") == "potato.txt"
+ True

@tianyizheng02 tianyizheng02 (Contributor) left a comment

Closing this PR because the code no longer works. I tried running this file locally and it wasn't able to get any Google results. I believe this is because the CSS identifiers have changed, so this file is no longer able to find the results in the HTTP response.

Labels
  • awaiting reviews: This PR is ready to be reviewed
  • enhancement: This PR modified some existing files
  • hacktoberfest-accepted: Accepted to be counted towards Hacktoberfest
Projects
None yet

5 participants