zigzagdev
diff --git a/‎.github/workflows/pymongo-driver-examples-check-formatting.yml‎
Lines changed: 7 additions & 3 deletions b/‎.github/workflows/pymongo-driver-examples-check-formatting.yml‎
Lines changed: 7 additions & 3 deletions
diff --git a/‎.github/workflows/pymongo-driver-examples-test-in-docker.yml‎
Lines changed: 8 additions & 0 deletions b/‎.github/workflows/pymongo-driver-examples-test-in-docker.yml‎
Lines changed: 8 additions & 0 deletions
diff --git a/‎code-example-tests/python/pymongo/README.md‎
Lines changed: 230 additions & 11 deletions b/‎code-example-tests/python/pymongo/README.md‎
Lines changed: 230 additions & 11 deletions
diff --git a/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/filter/filter_tutorial.py‎
Lines changed: 4 additions & 3 deletions b/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/filter/filter_tutorial.py‎
Lines changed: 4 additions & 3 deletions
diff --git a/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/group/group_tutorial.py‎
Lines changed: 1 addition & 0 deletions b/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/group/group_tutorial.py‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/join/multi_field_tutorial.py‎
Lines changed: 1 addition & 0 deletions b/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/join/multi_field_tutorial.py‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/join/one_to_one_tutorial.py‎
Lines changed: 1 addition & 0 deletions b/‎code-example-tests/python/pymongo/examples/aggregation/pipelines/join/one_to_one_tutorial.py‎
Lines changed: 1 addition & 0 deletions
@@ -17,8 +17,12 @@ jobs:
           python-version: '3.13'
       - name: Install dependencies
         run: |
-          pip install pylint
-      - name: Check code example formatting
+          pip install pylint black
+      - name: Check code example formatting with Black
         run: |
           cd code-example-tests/python/pymongo
-          pylint ./examples/
+          black --check ./examples/
+      - name: Lint code examples with Pylint
+        run: |
+          cd code-example-tests/python/pymongo
+          pylint ./examples/
@@ -18,6 +18,14 @@ jobs:
       - name: Set up a local deployment using Atlas CLI
         run: |
           atlas deployments setup myLocalRs1 --type local --port 27017 --force
+      - name: Install MongoDB Database Tools to load sample data
+        run: |
+          curl https://fastdl.mongodb.org/tools/db/mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb --output mdb-db-tools.deb
+          sudo apt install ./mdb-db-tools.deb
+      - name: Download sample data
+        run: curl  https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive
+      - name: Add sample data to database
+        run: mongorestore --archive=sampledata.archive --port=27017
       - name: Setup Python
         uses: actions/setup-python@v5
         with:
 
@@ -127,18 +127,18 @@ to store the output alongside the example. For example:
 - `aggregation/pipelines/tutorial.py`
 - `aggregation/pipelines/tutorial-output.sh`
 
-### Check formatting with Pylint
+### Check formatting with Black
+
+This project uses [Black](https://github.com/psf/black) for Python code formatting.
 
 For consistency, code example formatting is enforced automatically by a workflow that
-runs Pylint on every change in the `/examples` directory. You can also check formatting
+runs Black on every change in the `/examples` directory. You can also fix formatting
 locally by running Pylint from the command line.
 
 ```
-pylint ./examples/
+black ./examples/
 ```
 
-Fix any errors.
-
 ## To add a test for a new code example
 
 To add a test for a new code example:
@@ -255,6 +255,40 @@ If you copied `test_example_stub.py`, make sure to do the following updates:
    the example code uses. This ensures the right database is being
    dropped between tests.
 
+#### Writing tests that use sample data
+
+If your code examples require MongoDB sample data, import the sample data utility:
+
+```python
+from utils.sample_data import requires_sample_data
+```
+
+Use the `@requires_sample_data()` decorator to require one or more databases or
+collections for test execution. Tests automatically skip when required sample
+databases are not available.
+
+```python
+class TestMovieQueries(unittest.TestCase):
+
+  @requires_sample_data("sample_mflix")
+  def test_find_movies(self):
+      # This test will be skipped if sample_mflix database is not available
+      # Your test implementation here
+      pass
+
+  @requires_sample_data("sample_mflix", collections=["movies", "theaters"])
+  def test_specific_collections(self):
+      # This test requires specific collections to be present
+      # Your test implementation here
+      pass
+
+  @requires_sample_data(["sample_mflix", "sample_restaurants"])
+  def test_multiple_databases(self):
+      # This test requires multiple sample databases
+      # Your test implementation here
+      pass
+```
+
 ### Define logic to verify the output
 
 You can verify the output in a few different ways:
@@ -290,11 +324,10 @@ filename matches the example - i.e. `tutorial-output.sh`. Then, read the
 contents of the file in the test and verify that the output matches what the
 test returns.
 
-First, import the helper function to read the output from file:
+First, import the helper function from the comparison library:
 
 ```py
-# Import the helper function to read the file and parse its output
-import utils.test_helper as test_helper
+from utils.comparison.assert_helpers import assert_expected_file_matches_output
 ```
 
 Then, validate the actual output against the output we expect based on the
@@ -304,11 +337,137 @@ file:
 # Run the example
 actual_output = example_stub.example(TestExampleStub.CONNECTION_STRING)
 
-# Read the content of the expected output
+# Use the comparison library to validate that the output matches
 output_filepath = 'examples/aggregation/pipelines/tutorial.sh'
-expected_output = test_helper.get_expected_output(output_filepath)
+assert_expected_file_matches_output(self, output_filepath, actual_output)
+```
+
+##### Use options to specify comparison behaviors
+
+First, import comparison options from the comparison library:
+
+```py
+from utils.comparison.comparison import ComparisonOptions
+```
+
+Choose the appropriate options based on your output characteristics:
+
+##### Verify unordered output (default behavior)
+
+For output that can be in any order (most common case):
+
+```py
+# Pass the `comparisonType` option explicitly:
+options = ComparisonOptions(comparison_type="unordered")
+assert_expected_file_matches_output(self, output_filepath, actual_output, options)
+
+# Omit the options object (unordered comparison is used by default)
+assert_expected_file_matches_output(self, output_filepath, actual_output)
+```
+
+##### Verify ordered output
+
+For output that must be in a specific order (e.g., when using sort operations):
+
+```py
+options = ComparisonOptions(comparison_type="ordered")
+assert_expected_file_matches_output(self, output_filepath, actual_output, options)
+```
+
+##### Handle variable field values
+
+When your output contains fields that will have different values between test runs
+(such as ObjectIds, timestamps, UUIDs, or other auto-generated values), ignore
+specific fields during comparison:
+
+```py
+options = ComparisonOptions(ignore_field_values={"_id"})
+assert_expected_file_matches_output(self, output_filepath, actual_output, options)
+```
+
+This ensures the comparison only validates that the field names are present,
+without checking if the values match exactly. This is particularly useful for:
+
+- **Database IDs**: `_id`, `userId`, `documentId`
+- **Timestamps**: `createdAt`, `updatedAt`, `timestamp`
+- **UUIDs and tokens**: `uuid`, `sessionId`, `apiKey`
+- **Auto-generated values**: Any field with dynamic content
+
+##### Handle flexible content in output files
+
+For output files that truncate the actual output to show only what's relevant
+to our readers, use ellipsis patterns (`...`) in your output files to enable
+flexible content matching. Our tooling automatically detects and handles these
+patterns.
+
+###### Shorten string values
+
+You can use an ellipsis at the end of a string value to shorten it in the
+example output. This will match any number of characters in the actual return
+after the `...`.
+
+In the expected output file, add an ellipsis to the end of a string value:
+
+```txt
+{
+  plot: 'A young man is accidentally sent 30 years into the past...',
+}
+```
+
+This matches the actual output of:
+
+```txt
+{
+  plot: 'A young man is accidentally sent 30 years into the past in a time-traveling DeLorean invented by his close friend, the maverick scientist Doc Brown.',
+}
+```
+
+###### Omit unimportant values for keys
+
+If it's not important to show the value or type for a given key at all,
+replace the value with an ellipsis in the expected output file.
+
+```txt
+`{_id: ...}`
+```
+
+Matches any value for the key `_id` in the actual output.
+
+###### Omit any number of keys and values entirely
+
+If actual output contains many keys and values that are not necessary to show
+to illustrate an example, add an ellipsis as a standalone line in your
+expected output file:
 
-self.assertEqual(expected_output, actual_output)
+```txt
+{
+  full_name: 'Carmen Sandiego',
+  ...
+}
+```
+
+Matches actual output that contains any number of additional keys and values
+beyond the `full_name` field.
+
+You can also interject standalone `...` lines between properties, similar to:
+
+```txt
+{
+  full_name: 'Carmen Sandiego',
+  ...
+  address: 'Somewhere in the world...'
+}
+```
+
+##### Complete options reference
+
+The `ComparisonOptions` object supports these properties:
+
+```py
+comparison_type: Optional[str] = None  # "ordered", "unordered", or None (auto-select)
+ignore_field_values: Optional[set[str]] = None
+timeout_seconds: int = 30
+array_size_threshold: int = 50  # Max size for unordered backtracking
 ```
 
 #### Verify print output
@@ -339,6 +498,66 @@ Save the connection string for use in the next step. If needed, see
 [here](https://www.mongodb.com/docs/atlas/cli/current/atlas-cli-deploy-local/)
 for how to create a local deployment.
 
+### Load sample data
+
+Some of the tests in this project use the MongoDB sample data. The test suite
+automatically detects whether sample data is available and skips tests that
+require missing datasets, providing clear feedback about what's available.
+
+#### Automatic sample data detection
+
+The test suite includes built-in sample data detection that:
+
+- **Automatically skips tests** when required sample datasets are not available
+- **Shows a status summary** at the start of test runs indicating available databases
+- **Provides concise warnings** about which specific tests are being skipped
+- **Caches detection results** to avoid repeated database queries during test runs
+- **Works seamlessly** - no special commands or scripts needed
+
+When you run tests, you'll see a status summary like:
+
+```
+📊 Sample Data Status: 3 database(s) available
+   Found: sample_mflix, sample_restaurants, sample_analytics
+
+⚠️  Skipping "Advanced Movie Analysis" - Missing: sample_training
+```
+
+#### Atlas
+
+To learn how to load sample data in Atlas, refer to this docs page:
+
+- [Atlas](https://www.mongodb.com/docs/atlas/sample-data/)
+
+#### Local deployment
+
+If you're running MongoDB locally in a docker container:
+
+1. Install the MongoDB Database Tools.
+
+   You must install the MongoDB Command Line Database Tools to access the
+   `mongorestore` command, which you'll use to load the sample data. Refer to
+   the Database Tools [Installation](https://www.mongodb.com/docs/database-tools/installation/)
+   docs for details.
+
+2. Download the sample database.
+
+   Run the following command in your terminal to download the sample data:
+
+   ```shell
+   curl  https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive
+   ```
+
+3. Load the sample data.
+
+   Run the following command in your terminal to load the data into your
+   deployment, replacing `<port-number>` with the port where you're hosting the
+   deployment:
+
+   ```shell
+   mongorestore --archive=sampledata.archive --port=<port-number>
+   ```
+
 ### Create a .env file
 
 Create a file named '.env' at the root of the '/python' directory within this
 
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:
@@ -110,12 +111,12 @@ def example(CONNECTION_STRING):
         aggregation_result = person_coll.aggregate(pipeline)
         # :snippet-end:
 
-        document_list = [] # :remove:
+        document_list = []  # :remove:
         for document in aggregation_result:
             print(document)
-            document_list.append(document) # :remove:
+            document_list.append(document)  # :remove:
 
-        return document_list # :remove:
+        return document_list  # :remove:
 
     finally:
         client.close()
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:
 
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try:
 
@@ -1,6 +1,7 @@
 from datetime import datetime
 from pymongo import MongoClient
 
+
 def example(CONNECTION_STRING):
     client = MongoClient(CONNECTION_STRING)
     try: