Commit b3128d3

dacharyc and cbullinger authored
(DOCSP-53139): Python code example comparison + sample data utilities (#14459)
* Create an implementation plan for Python comparison lib
* Scaffold comparison lib, existing tests pass
* Build out missing functionality, some tests fail
* Fix JSONL splitting bug
* Fix Python 2/3 normalization bugs
* Fix date normalization in parser
* Fix remaining test failure
* Properly handle date time parsing
* Add support for PyMongo Operations Objects
* Handle unordered arrays, PyMongo object ellipsis, search index models
* Handle positional args in Update operation
* Handle datetime comparison in structured text comparisons
* Fix failing test
* Add Python sample data utility
* Remove unneeded implementation plan
* Add sample data in CI workflow
* Use new comparison APIs for agg pipeline examples, fix missing newline in output
* Update README for new utils
* Add missed ComparisonOptions in preferred API, fix missed ellipsis requirements
* Add comprehensive code docs for new utilities
* Apply suggestions from code review

  Co-authored-by: cory <115956901+cbullinger@users.noreply.github.com>
* Cleanup based on review feedback
* Switch time series quick start to use comparison lib
* Add a Python formatting tool, format files and add to snip script

---------

Co-authored-by: cory <115956901+cbullinger@users.noreply.github.com>
1 parent dcbe215 commit b3128d3

62 files changed: +10195 -182 lines changed

.github/workflows/pymongo-driver-examples-check-formatting.yml

Lines changed: 7 additions & 3 deletions
@@ -17,8 +17,12 @@ jobs:
           python-version: '3.13'
       - name: Install dependencies
         run: |
-          pip install pylint
-      - name: Check code example formatting
+          pip install pylint black
+      - name: Check code example formatting with Black
         run: |
           cd code-example-tests/python/pymongo
-          pylint ./examples/
+          black --check ./examples/
+      - name: Lint code examples with Pylint
+        run: |
+          cd code-example-tests/python/pymongo
+          pylint ./examples/

.github/workflows/pymongo-driver-examples-test-in-docker.yml

Lines changed: 8 additions & 0 deletions
@@ -18,6 +18,14 @@ jobs:
       - name: Set up a local deployment using Atlas CLI
         run: |
           atlas deployments setup myLocalRs1 --type local --port 27017 --force
+      - name: Install MongoDB Database Tools to load sample data
+        run: |
+          curl https://fastdl.mongodb.org/tools/db/mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb --output mdb-db-tools.deb
+          sudo apt install ./mdb-db-tools.deb
+      - name: Download sample data
+        run: curl https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive
+      - name: Add sample data to database
+        run: mongorestore --archive=sampledata.archive --port=27017
       - name: Setup Python
         uses: actions/setup-python@v5
         with:

code-example-tests/python/pymongo/README.md

Lines changed: 230 additions & 11 deletions
@@ -127,18 +127,18 @@ to store the output alongside the example. For example:
 - `aggregation/pipelines/tutorial.py`
 - `aggregation/pipelines/tutorial-output.sh`
 
-### Check formatting with Pylint
+### Check formatting with Black
+
+This project uses [Black](https://github.com/psf/black) for Python code formatting.
 
 For consistency, code example formatting is enforced automatically by a workflow that
-runs Pylint on every change in the `/examples` directory. You can also check formatting
+runs Black on every change in the `/examples` directory. You can also fix formatting
 locally by running Pylint from the command line.
 
 ```
-pylint ./examples/
+black ./examples/
 ```
 
-Fix any errors.
-
 ## To add a test for a new code example
 
 To add a test for a new code example:
@@ -255,6 +255,40 @@ If you copied `test_example_stub.py`, make sure to do the following updates:
 the example code uses. This ensures the right database is being
 dropped between tests.
 
+#### Writing tests that use sample data
+
+If your code examples require MongoDB sample data, import the sample data utility:
+
+```python
+from utils.sample_data import requires_sample_data
+```
+
+Use the `@requires_sample_data()` decorator to require one or more databases or
+collections for test execution. Tests automatically skip when required sample
+databases are not available.
+
+```python
+class TestMovieQueries(unittest.TestCase):
+
+    @requires_sample_data("sample_mflix")
+    def test_find_movies(self):
+        # This test will be skipped if sample_mflix database is not available
+        # Your test implementation here
+        pass
+
+    @requires_sample_data("sample_mflix", collections=["movies", "theaters"])
+    def test_specific_collections(self):
+        # This test requires specific collections to be present
+        # Your test implementation here
+        pass
+
+    @requires_sample_data(["sample_mflix", "sample_restaurants"])
+    def test_multiple_databases(self):
+        # This test requires multiple sample databases
+        # Your test implementation here
+        pass
+```
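As a rough illustration of how a decorator of this kind could skip tests, the sketch below raises `unittest.SkipTest` when a required database is missing. It is not the implementation in `utils.sample_data`: the names `requires_databases`, `list_available`, and `fake_available` are hypothetical, and the database lookup is injected so the sketch runs without a MongoDB connection.

```python
import functools
import unittest


def requires_databases(names, list_available):
    """Hypothetical stand-in for a sample-data decorator (not utils.sample_data)."""
    required = [names] if isinstance(names, str) else list(names)

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            # Look up databases at call time so the check reflects the current deployment.
            missing = sorted(set(required) - set(list_available()))
            if missing:
                raise unittest.SkipTest("Missing sample data: " + ", ".join(missing))
            return test_func(*args, **kwargs)

        return wrapper

    return decorator


def fake_available():
    # Assumed lookup for illustration; a real one would query the deployment.
    return ["sample_mflix"]


@requires_databases("sample_mflix", fake_available)
def runs():
    return "ran"


@requires_databases(["sample_mflix", "sample_restaurants"], fake_available)
def skipped():
    return "ran"
```

Raising `unittest.SkipTest` inside the wrapper lets the test runner report the test as skipped rather than failed, which matches the skip behavior described above.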
+
 ### Define logic to verify the output
 
 You can verify the output in a few different ways:
@@ -290,11 +324,10 @@ filename matches the example - i.e. `tutorial-output.sh`. Then, read the
 contents of the file in the test and verify that the output matches what the
 test returns.
 
-First, import the helper function to read the output from file:
+First, import the helper function from the comparison library:
 
 ```py
-# Import the helper function to read the file and parse its output
-import utils.test_helper as test_helper
+from utils.comparison.assert_helpers import assert_expected_file_matches_output
 ```
 
 Then, validate the actual output against the output we expect based on the
@@ -304,11 +337,137 @@ file:
 # Run the example
 actual_output = example_stub.example(TestExampleStub.CONNECTION_STRING)
 
-# Read the content of the expected output
+# Use the comparison library to validate that the output matches
 output_filepath = 'examples/aggregation/pipelines/tutorial.sh'
-expected_output = test_helper.get_expected_output(output_filepath)
+assert_expected_file_matches_output(self, output_filepath, actual_output)
+```
+
+##### Use options to specify comparison behaviors
+
+First, import comparison options from the comparison library:
+
+```py
+from utils.comparison.comparison import ComparisonOptions
+```
+
+Choose the appropriate options based on your output characteristics:
+
+##### Verify unordered output (default behavior)
+
+For output that can be in any order (most common case):
+
+```py
+# Pass the `comparisonType` option explicitly:
+options = ComparisonOptions(comparison_type="unordered")
+assert_expected_file_matches_output(self, output_filepath, actual_output, options)
+
+# Omit the options object (unordered comparison is used by default)
+assert_expected_file_matches_output(self, output_filepath, actual_output)
+```
+
+##### Verify ordered output
+
+For output that must be in a specific order (e.g., when using sort operations):
+
+```py
+options = ComparisonOptions(comparison_type="ordered")
+assert_expected_file_matches_output(self, output_filepath, actual_output, options)
+```
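To make the difference between the two comparison types concrete, here is a minimal sketch that compares lists of documents by exact equality. The function name `lists_match` is hypothetical and this is not the comparison library's algorithm; it does not model ellipsis patterns, ignored fields, or the backtracking threshold mentioned in the options reference.

```python
def lists_match(expected, actual, comparison_type="unordered"):
    """Illustrative only: compare two lists of documents by exact equality."""
    if comparison_type == "ordered":
        # Ordered: same documents in the same positions.
        return expected == actual
    if len(expected) != len(actual):
        return False
    remaining = list(actual)
    for item in expected:
        if item not in remaining:
            return False
        remaining.remove(item)  # consume the match so duplicates are counted
    return True


docs = [{"name": "a"}, {"name": "b"}]
reversed_docs = [{"name": "b"}, {"name": "a"}]
```

With these inputs, `lists_match(docs, reversed_docs)` succeeds under the unordered default, while passing `comparison_type="ordered"` fails because the positions differ.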
+
+##### Handle variable field values
+
+When your output contains fields that will have different values between test runs
+(such as ObjectIds, timestamps, UUIDs, or other auto-generated values), ignore
+specific fields during comparison:
+
+```py
+options = ComparisonOptions(ignore_field_values={"_id"})
+assert_expected_file_matches_output(self, output_filepath, actual_output, options)
+```
+
+This ensures the comparison only validates that the field names are present,
+without checking if the values match exactly. This is particularly useful for:
+
+- **Database IDs**: `_id`, `userId`, `documentId`
+- **Timestamps**: `createdAt`, `updatedAt`, `timestamp`
+- **UUIDs and tokens**: `uuid`, `sessionId`, `apiKey`
+- **Auto-generated values**: Any field with dynamic content
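The idea of checking that a field is present while ignoring its value can be sketched as follows. The function `documents_match` and the sample values are hypothetical, and the sketch only handles top-level fields; how the comparison library treats nested documents is not shown in this README.

```python
def documents_match(expected, actual, ignore_field_values=frozenset()):
    """Illustrative only: top-level comparison that skips values of ignored fields."""
    # Field names must still line up, even for ignored fields.
    if expected.keys() != actual.keys():
        return False
    return all(
        key in ignore_field_values or expected[key] == actual[key]
        for key in expected
    )


expected_doc = {"_id": "placeholder", "title": "Back to the Future"}
actual_doc = {"_id": "abc123", "title": "Back to the Future"}
```

Here `documents_match(expected_doc, actual_doc, {"_id"})` passes because only the presence of `_id` is checked, while the same call without the ignore set fails on the differing `_id` values.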
+
+##### Handle flexible content in output files
+
+For output files that truncate the actual output to show only what's relevant
+to our readers, use ellipsis patterns (`...`) in your output files to enable
+flexible content matching. Our tooling automatically detects and handles these
+patterns.
+
+###### Shorten string values
+
+You can use an ellipsis at the end of a string value to shorten it in the
+example output. This will match any number of characters in the actual return
+after the `...`.
+
+In the expected output file, add an ellipsis to the end of a string value:
+
+```txt
+{
+  plot: 'A young man is accidentally sent 30 years into the past...',
+}
+```
+
+This matches the actual output of:
+
+```txt
+{
+  plot: 'A young man is accidentally sent 30 years into the past in a time-traveling DeLorean invented by his close friend, the maverick scientist Doc Brown.',
+}
+```
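The trailing-ellipsis rule for string values amounts to a prefix match. A minimal sketch, assuming the ellipsis only appears at the end of the expected string (the function name `string_matches` is hypothetical, not part of the comparison library):

```python
ELLIPSIS = "..."


def string_matches(expected, actual):
    """Illustrative only: treat a trailing '...' in expected as 'any remaining text'."""
    if expected.endswith(ELLIPSIS):
        prefix = expected[: -len(ELLIPSIS)]
        return actual.startswith(prefix)
    return expected == actual


expected_plot = "A young man is accidentally sent 30 years into the past..."
actual_plot = (
    "A young man is accidentally sent 30 years into the past in a "
    "time-traveling DeLorean invented by his close friend, the maverick "
    "scientist Doc Brown."
)
```

Calling `string_matches(expected_plot, actual_plot)` succeeds because everything before the ellipsis is a prefix of the actual value.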
+
+###### Omit unimportant values for keys
+
+If it's not important to show the value or type for a given key at all,
+replace the value with an ellipsis in the expected output file.
+
+```txt
+`{_id: ...}`
+```
+
+Matches any value for the key `_id` in the actual output.
+
+###### Omit any number of keys and values entirely
+
+If actual output contains many keys and values that are not necessary to show
+to illustrate an example, add an ellipsis as a standalone line in your
+expected output file:
 
-self.assertEqual(expected_output, actual_output)
+```txt
+{
+  full_name: 'Carmen Sandiego',
+  ...
+}
+```
+
+Matches actual output that contains any number of additional keys and values
+beyond the `full_name` field.
+
+You can also interject standalone `...` lines between properties, similar to:
+
+```txt
+{
+  full_name: 'Carmen Sandiego',
+  ...
+  address: 'Somewhere in the world...'
+}
+```
+
+##### Complete options reference
+
+The `ComparisonOptions` object supports these properties:
+
+```py
+comparison_type: Optional[str] = None  # "ordered", "unordered", or None (auto-select)
+ignore_field_values: Optional[set[str]] = None
+timeout_seconds: int = 30
+array_size_threshold: int = 50  # Max size for unordered backtracking
 ```
 
 #### Verify print output
@@ -339,6 +498,66 @@ Save the connection string for use in the next step. If needed, see
 [here](https://www.mongodb.com/docs/atlas/cli/current/atlas-cli-deploy-local/)
 for how to create a local deployment.
 
+### Load sample data
+
+Some of the tests in this project use the MongoDB sample data. The test suite
+automatically detects whether sample data is available and skips tests that
+require missing datasets, providing clear feedback about what's available.
+
+#### Automatic sample data detection
+
+The test suite includes built-in sample data detection that:
+
+- **Automatically skips tests** when required sample datasets are not available
+- **Shows a status summary** at the start of test runs indicating available databases
+- **Provides concise warnings** about which specific tests are being skipped
+- **Caches detection results** to avoid repeated database queries during test runs
+- **Works seamlessly** - no special commands or scripts needed
+
+When you run tests, you'll see a status summary like:
+
+```
+📊 Sample Data Status: 3 database(s) available
+   Found: sample_mflix, sample_restaurants, sample_analytics
+
+⚠️ Skipping "Advanced Movie Analysis" - Missing: sample_training
+```
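The caching behavior described above could look something like the following sketch, which memoizes the database lookup so it runs once per test run. This is an assumption about the approach, not the utility's actual code: `make_cached_detector` and `fake_list_database_names` are hypothetical names, and filtering on the `sample_` prefix is assumed from the dataset names shown in the status summary.

```python
import functools


def make_cached_detector(list_database_names):
    """Illustrative only: cache which sample databases exist for the test run."""

    @functools.lru_cache(maxsize=None)
    def available_sample_databases():
        # Assumption: sample datasets are identified by a "sample_" name prefix.
        return frozenset(
            name for name in list_database_names() if name.startswith("sample_")
        )

    return available_sample_databases


calls = []


def fake_list_database_names():
    # Assumed stand-in for a driver call such as MongoClient.list_database_names().
    calls.append(1)
    return ["admin", "sample_mflix", "sample_restaurants"]


detect = make_cached_detector(fake_list_database_names)
first = detect()
second = detect()
```

Because the result is cached, the second call returns the stored set without invoking the lookup again, which is the effect the "caches detection results" bullet describes.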
+
+#### Atlas
+
+To learn how to load sample data in Atlas, refer to this docs page:
+
+- [Atlas](https://www.mongodb.com/docs/atlas/sample-data/)
+
+#### Local deployment
+
+If you're running MongoDB locally in a docker container:
+
+1. Install the MongoDB Database Tools.
+
+   You must install the MongoDB Command Line Database Tools to access the
+   `mongorestore` command, which you'll use to load the sample data. Refer to
+   the Database Tools [Installation](https://www.mongodb.com/docs/database-tools/installation/)
+   docs for details.
+
+2. Download the sample database.
+
+   Run the following command in your terminal to download the sample data:
+
+   ```shell
+   curl https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive
+   ```
+
+3. Load the sample data.
+
+   Run the following command in your terminal to load the data into your
+   deployment, replacing `<port-number>` with the port where you're hosting the
+   deployment:
+
+   ```shell
+   mongorestore --archive=sampledata.archive --port=<port-number>
+   ```
+
 ### Create a .env file
 
 Create a file named '.env' at the root of the '/python' directory within this

code-example-tests/python/pymongo/examples/aggregation/pipelines/filter/filter_tutorial.py

Lines changed: 4 additions & 3 deletions
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:
@@ -110,12 +111,12 @@ def example(CONNECTION_STRING):
         aggregation_result = person_coll.aggregate(pipeline)
         # :snippet-end:
 
-        document_list = [] # :remove:
+        document_list = []  # :remove:
         for document in aggregation_result:
             print(document)
-            document_list.append(document) # :remove:
+            document_list.append(document)  # :remove:
 
-        return document_list # :remove:
+        return document_list  # :remove:
 
     finally:
         client.close()

code-example-tests/python/pymongo/examples/aggregation/pipelines/group/group_tutorial.py

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:

code-example-tests/python/pymongo/examples/aggregation/pipelines/join/multi_field_tutorial.py

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:

code-example-tests/python/pymongo/examples/aggregation/pipelines/join/one_to_one_tutorial.py

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:
