Commit b54566b

Added Custom-search CLI script (wasmerio#460)
1 parent 0b09cde commit b54566b

5 files changed: +148 -0 lines changed

Custom-search CLI/output.csv

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
#,Title,Link

Custom-search CLI/readme.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Custom-search CLI
A simple Python script that uses the **Google Custom Search API** to fetch search results and export them to a CSV file.


## Requirements
- Python 3.8+
- A Google API key
- A Google Custom Search Engine (CX) ID
- Install dependencies (`requests` is the only third-party package the script imports; `csv`, `json`, and `argparse` ship with Python):
```bash
pip install requests
```

## Setup
1. Get a Google API key from [Google Cloud Console](https://console.cloud.google.com/)
2. Create a Custom Search Engine at [Google CSE](https://cse.google.com/cse/all) and copy its ID into the `CX` field of `setting.json`
3. Run the script with your API key to store it in `setting.json` (the file is created if it does not exist):

   `python scraper.py -sq [SEARCH_QUERY] --add_api_key [YOUR_API_KEY]`

## Usage
Search with a query:
```bash
python scraper.py -sq "github"
```
Fetch multiple pages (10 results per page):
```bash
python scraper.py -sq "github" --pages 3
```
## Output
- Results are saved in `output.csv` with the following columns:

  `#`, `Title`, `Link`

> [!NOTE]
> Free quota: 100 queries/day (10 results per query), so each page fetched with `--pages` uses one query.<br>
> If `setting.json` is missing or doesn’t have an API key, use `--add_api_key`.
---
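
The output layout described in the readme (a header row of `#`, `Title`, `Link`) can be read back with the standard `csv` module. This is a minimal sketch and not part of the commit; `read_results` is a hypothetical helper name, and it assumes the file was written as UTF-8 with a header row, as `export_to_csv` does.

```python
import csv
from typing import Dict, List


def read_results(path: str = "output.csv") -> List[Dict[str, str]]:
    """Hypothetical helper: return each row as a dict keyed by the header row."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

Each returned dict has the keys `#`, `Title`, and `Link`, all as strings, so the rank column needs an explicit `int(...)` if it is used numerically.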

Custom-search CLI/scraper.py

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
import requests
import json
import os
import csv
import argparse
from typing import List, Dict, Optional, Tuple, Any

SETTING_ROUTE = 'setting.json'
DEFAULT_CX = 'b0264518c3d104eda'
API_URL = "https://www.googleapis.com/customsearch/v1"
RESULTS_PER_PAGE = 10


# Optional[str] instead of `str | None` keeps the signature valid on Python 3.8+.
def load_settings(api_key: Optional[str] = None) -> Dict[str, str]:
    """
    Load API settings from setting.json, or create it if missing.
    """
    if os.path.exists(SETTING_ROUTE):
        with open(SETTING_ROUTE, 'r', encoding="utf-8") as f:
            settings = json.load(f)

        if not settings.get("API_KEY"):
            if api_key:
                # Persist the key passed via --add_api_key.
                settings["API_KEY"] = api_key
                with open(SETTING_ROUTE, 'w', encoding="utf-8") as f:
                    json.dump(settings, f, indent=4)
            else:
                raise ValueError("API_KEY is missing in setting.json. Use --add_api_key to add one.")
    else:
        if not api_key:
            raise FileNotFoundError("No setting.json found. Please run with --add_api_key to create one.")
        settings = {"API_KEY": api_key, "CX": DEFAULT_CX}
        with open(SETTING_ROUTE, 'w', encoding="utf-8") as f:
            json.dump(settings, f, indent=4)

    # setting.json may exist with an empty CX; fall back to the default engine.
    if not settings.get("CX"):
        settings["CX"] = DEFAULT_CX

    return settings


def scrape(search_query: str, api_key: str, cx: str, pages: int = 1) -> Tuple[List[Dict[str, Any]], float]:
    """
    Perform a Google Custom Search and return results.
    """
    results = []
    search_time = 0.0

    for page in range(pages):
        # The API takes the 1-based index of the first result: 1, 11, 21, ...
        start = page * RESULTS_PER_PAGE + 1
        params = {"key": api_key, "q": search_query, "cx": cx, "start": start}

        # Passing params lets requests URL-encode the query (spaces, '&', etc.).
        response = requests.get(API_URL, params=params, timeout=10)
        if response.status_code != 200:
            raise RuntimeError(f"API request failed: {response.status_code} {response.text}")

        data = response.json()

        if "items" not in data:
            print("No results found or error:", data)
            break

        results.extend(data["items"])
        search_time += float(data['searchInformation']['searchTime'])

    return results, search_time


def export_to_csv(results: List[Dict[str, Any]], filename: str = "output.csv") -> None:
    """
    Export search results to a CSV file.
    """
    rows = [[i + 1, item.get("title", ""), item.get("link", "")] for i, item in enumerate(results)]

    with open(filename, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["#", "Title", "Link"])
        writer.writerows(rows)

    print(f"Exported {len(results)} results to {filename}")


def main():
    parser = argparse.ArgumentParser(description="Google Custom Search scraper")
    parser.add_argument("-sq", "--search_query", required=True, help="Search query to search for")
    parser.add_argument("--add_api_key", type=str, help="Your Google API key")
    parser.add_argument("--pages", type=int, default=1, help="Number of pages of results to fetch")
    args = parser.parse_args()

    settings = load_settings(args.add_api_key)
    api_key = settings["API_KEY"]
    cx = settings["CX"]

    # Show only the last four characters so the full key is not printed to the terminal.
    print(f"Using API key: ...{api_key[-4:]}")

    results, elapsed_time = scrape(args.search_query, api_key, cx, args.pages)

    export_to_csv(results)
    # elapsed_time is the sum of the search times reported by the API, not wall-clock time.
    print(f"Completed in {elapsed_time:.2f} seconds.")


if __name__ == "__main__":
    main()
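
The paging arithmetic in `scrape` can be checked without any network access. The following is a rough sketch, not part of the commit: `start_indices` and `build_url` are hypothetical helper names, and the URL is assembled with the standard library's `urlencode` so that queries containing spaces or `&` are escaped rather than interpolated raw.

```python
from typing import List
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/customsearch/v1"
RESULTS_PER_PAGE = 10  # page size the script assumes


def start_indices(pages: int) -> List[int]:
    """Hypothetical helper: 1-based index of the first result on each page."""
    return [page * RESULTS_PER_PAGE + 1 for page in range(pages)]


def build_url(query: str, api_key: str, cx: str, start: int) -> str:
    """Hypothetical helper: build the request URL with an escaped query string."""
    params = {"key": api_key, "q": query, "cx": cx, "start": start}
    return f"{API_URL}?{urlencode(params)}"


for start in start_indices(3):
    print(build_url("c++ & rust", "YOUR_API_KEY", "YOUR_CX", start))
```

With `--pages 3` this yields start values 1, 11, and 21, which is one API request (and one unit of quota) per page.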

Custom-search CLI/setting.json

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
{
    "API_KEY": "",
    "CX": ""
}
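
The settings file is committed with both values empty, so a run against it needs them filled in first. A minimal sketch, assuming a hypothetical helper `missing_fields` (not part of the commit), of reporting which fields still need a value:

```python
import json
from typing import Dict, List

REQUIRED_FIELDS = ("API_KEY", "CX")  # the two keys the script reads


def missing_fields(settings: Dict[str, str]) -> List[str]:
    """Hypothetical helper: names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not settings.get(name)]


# The file as committed: both values are empty strings.
committed = json.loads('{"API_KEY": "", "CX": ""}')
print(missing_fields(committed))  # ['API_KEY', 'CX']
```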

README.md

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ More information on contributing and the general code of conduct for discussion
 | CSV to Excel | [CSV to Excel](https://github.com/DhanushNehru/Python-Scripts/tree/main/CSV%20to%20Excel) | A Python script to convert a CSV to an Excel file. |
 | CSV_TO_NDJSON | [CSV to Excel](https://github.com/DhanushNehru/Python-Scripts/tree/main/CSV_TO_NDJSON) | A Python script to convert a CSV to an NDJSON files file. |
 | Currency Script | [Currency Script](https://github.com/DhanushNehru/Python-Scripts/tree/main/Currency%20Script) | A Python script to convert the currency of one country to that of another. |
+| Custom-search CLI | [Custom-search CLI](https://github.com/DhanushNehru/Python-Scripts/tree/main/Custom-search%20CLI) | A Python script that runs a query through the Google Custom Search API and saves the results to a .csv file. |
 | Digital Clock | [Digital Clock](https://github.com/DhanushNehru/Python-Scripts/tree/main/Digital%20Clock) | A Python script to preview a digital clock in the terminal. |
 | Display Popup Window | [Display Popup Window](https://github.com/DhanushNehru/Python-Scripts/tree/main/Display%20Popup%20Window) | A Python script to preview a GUI interface to the user. |
 | Distance Calculator | [Distance Calculator](https://github.com/Mathdallas-code/Python-Scripts/tree/main/Distance%20Calculator) | A Python script to calculate the distance between two points.
