ENH, DOC: Add JupyterLite-powered interactive examples for the `pandas` documentation #61061

agriyakhetarpal · 2025-03-05T19:08:25Z

follow-up for ENH: Add JupyterLite-powered shell for the website (reprise of #47428) #60758 and BLD, TST: Build and test Pyodide wheels for pandas in CI #57896; closes DOC: Add link to the "try pandas online" in the regular documentation #61060
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Description

This PR is the first step in a series of PRs to add WASM-based interactive elements to the pandas documentation. In particular, it relies on JupyterLite and jupyterlite-sphinx.

The interactive REPL added to the website has been configured to use the same JupyterLite deployment that jupyterlite-sphinx builds internally, so that we don't build a separate, duplicate deployment.

The TryExamples directive from jupyterlite-sphinx has been added globally to all Numpydoc-processed docstrings with an "Examples" section via a global_enable_try_examples configuration option. The buttons enabled by the directive have been styled accordingly.
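
As a rough illustration of the global option described above, the relevant part of a Sphinx `conf.py` might look like the sketch below. This is not the PR's actual diff: the extension list is trimmed, and the button and warning strings are placeholder wording chosen for illustration.

```python
# Sketch of a Sphinx conf.py fragment (illustrative; not copied from the PR).
extensions = [
    "numpydoc",            # processes the docstrings' "Examples" sections
    "jupyterlite_sphinx",  # provides the TryExamples and Replite directives
]

# Add a "try it" button to every Numpydoc-processed "Examples" section,
# instead of adding the directive to each docstring by hand.
global_enable_try_examples = True

# Optional text settings; the wording below is an assumption, not the PR's.
try_examples_global_button_text = "Try it in your browser!"
try_examples_global_warning_text = (
    "Interactive examples are experimental and may not always work as expected."
)
```

With the option enabled globally, individual docstrings need no per-example directive.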

Next steps

The follow-up steps after this PR will be to:

address the mismatch between the versions of pandas available to use from the Pyodide distribution and the version of pandas that users are viewing the documentation for (most likely when we release Pyodide 0.28)
expand interactive documentation elements for the long-form content in the "User Guide" section (this PR just makes the API examples interactive).

See also

SciPy's interactive docs effort: DOC: Add interactive examples with jupyterlite-sphinx scipy/scipy#19729, DOC: Add support for interactive examples with jupyterlite-sphinx scipy/scipy#20019
Similarly, NumPy's interactive docs effort: ENH, DOC: Add support for interactive examples for NumPy with jupyterlite-sphinx numpy/numpy#26745
JupyterLite support in sphinx-gallery: https://sphinx-gallery.github.io/stable/configuration.html#jupyterlite
The TryExamples directive, used to enable API reference examples: https://jupyterlite-sphinx.readthedocs.io/en/latest/directives/try_examples.html
Interactive documentation for Scientific Python projects scientific-python/summit-2024#19

agriyakhetarpal

Here is a self-review where I leave some comments that I decided not to inline in code.

Unfortunately, for the interactive examples to work, the policy listed on the https://pandas.pydata.org/docs/development/contributing_docstring.html#section-7-examples page against adding `import pandas as pd` or `import numpy` statements might need to be revisited by the pandas maintainers. This is because jupyterlite-sphinx converts each examples section to a mini-notebook that is provided to JupyterLite and rendered in place. It is thus imperative for all docstring-based examples to be "self-contained", i.e., copy-pasteable directly into a terminal or IPython shell where no modules have been pre-imported.
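
To make "self-contained" concrete, here is a minimal sketch that runs a doctest-style snippet in an empty namespace and reports whether it succeeds. The helper name `is_self_contained` is hypothetical, and `math` stands in for pandas so that the sketch runs with the standard library alone.

```python
import doctest


def is_self_contained(snippet: str) -> bool:
    """Return True if the doctest-style snippet passes with no pre-imported names."""
    parser = doctest.DocTestParser()
    # globs={} mimics a fresh kernel: nothing has been imported for the example.
    test = parser.get_doctest(snippet, globs={}, name="example", filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the pass/fail outcome matters here.
    results = runner.run(test, out=lambda _text: None)
    return results.failed == 0


# Relies on an import that the docs build would inject behind the scenes.
IMPLICIT = """
>>> math.sqrt(16)
4.0
"""

# Carries its own import, so it also works when pasted into a fresh session.
EXPLICIT = """
>>> import math
>>> math.sqrt(16)
4.0
"""

print(is_self_contained(IMPLICIT), is_self_contained(EXPLICIT))
```

The first snippet fails with a `NameError` in a fresh namespace, which is the situation a docstring example without `import pandas as pd` would hit inside a generated notebook.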

I performed the same change for numpy/numpy#26814. This was also in discussion over at SciPy in scipy/scipy#13049, where the outcome was to add the necessary imports. I'm curious to know if this can be a possibility here, too.

I'll open this up and mark it ready for review, though it isn't ready for merging.

agriyakhetarpal · 2025-03-05T19:12:56Z

doc/source/getting_started/index.rst

+Try pandas online (experimental)
+--------------------------------
+
+Try our experimental `JupyterLite <https://jupyterlite.readthedocs.io/en/stable/>`__ live shell with ``pandas``, powered by `Pyodide <https://pyodide.org/en/stable/>`__.
+
+**Please note it can take a while (>30 seconds) before the shell is initialized and ready to run commands.**
+
+**Running it requires a reasonable amount of bandwidth and resources (>70 MiB on the first load), so
+it may not work properly on all devices or networks.**
+
+
+.. replite::
+  :kernel: pyodide
+  :height: 600px
+  :prompt: Try pandas online!
+  :execute: False
+  :prompt_color: #E70288
+
+  import pandas as pd
+  df = pd.DataFrame({"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"])
+  df
+


Hi @Dr-Irv, this should address #61060 – I've marked this PR as closing the issue.

Specifically, what I've added here is jupyterlite-sphinx's Replite directive, which brings the same interface as the one that now exists on https://pandas.pydata.org/try.html, and both Getting Started pages will use the same JupyterLite live shell, sharing the JupyterLite assets. This directive was designed for this purpose. This shell in the docs, just like the one on the website, does not load or consume any bandwidth until a user explicitly interacts with the REPL button (and goes further by not loading even the JupyterLite assets until prompted).

I can switch this to just a link instead of a REPL, if you wish.

agriyakhetarpal · 2025-03-05T19:13:31Z

doc/source/try_examples.json

+{
+  "global_min_height": "600px"
+}


We've documented this option here: https://jupyterlite-sphinx.readthedocs.io/en/latest/directives/try_examples.html#global-min-height

agriyakhetarpal · 2025-03-05T19:14:04Z

environment.yml

  # see the following links for more context:
  # 1. https://jupyterlite-pyodide-kernel.readthedocs.io/en/stable/#compatibility
  # 2. https://pyodide.org/en/stable/usage/packages-in-pyodide.html
-  - jupyterlite-core
+  - jupyterlite-sphinx


jupyterlite-sphinx depends on jupyterlite-core.

agriyakhetarpal · 2025-03-06T14:09:55Z

The remaining CI failures here look unrelated; I merged main to see if that makes a difference.

datapythonista

-1 on this

Probably fine to have a link in the docs to the interactive terminal, but I don't think we should add the terminal itself to the docs.

The wasm terminal seems to be working much better than before, but adding it to the docs means that once we release, we can't change that page anymore. And I don't think we're confident enough in this for that.

Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.

Also, my personal suggestion was not to move this to a separate page in pandas. It was to move all this to a separate repository, publish it on GitHub Pages, and then link to it from our docs as needed. I think this is a very cool feature, but it's still something new that needs much faster iterations than pandas itself, and I don't think updates, fixes, etc. should happen in this repository.

Is there any reason why we shouldn't get rid of all the wasm stuff here, and simply have the links and the iframe in the website?

CC: @Dr-Irv

Dr-Irv · 2025-03-16T19:12:45Z

Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.

I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen.

I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.

agriyakhetarpal · 2025-03-17T06:11:42Z

Thanks for the feedback, @datapythonista and @Dr-Irv!

Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.

I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen.

This is answered here: #61061 (comment). No bandwidth is consumed when loading the Getting Started page in the docs, as the directive hides the shell behind a button that a user has to interact with. Also, the terminal is shared between the website and the docs, as everything is hosted on the same domain at https://pandas.pydata.org/. If someone uses the one on the website, the assets are cached for the one in the docs, should they use it (and vice versa).

my personal suggestion was to move all this to a separate repository, publish it on GitHub Pages, and then link to it from our docs as needed. I think this is a very cool feature, but it's still something new that needs much faster iterations than pandas itself, and I don't think updates, fixes, etc. should happen in this repository.

Yes, I suggested this in the previous PR at #60758 (comment). If we are to incorporate this suggestion I'll need someone from the pandas team to coordinate with me to create such a repository from the template at https://github.com/jupyterlite/demo, and I can help maintain it and subsequently move things here.

I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.

Oh, the "Try it" button is enabled across the API reference pages (for all valid Numpydoc-based "Examples" sections in a public API method's docstring) through the global_enable_try_examples configuration option, so there is no need to add it to each API example manually. I'm not sure if pandas has a documentation preview on its PRs where we can test it out. You may try out a sample from the NumPy devdocs here: https://numpy.org/devdocs/reference/generated/numpy.strings.replace.html#numpy.strings.replace (the corresponding PR which added this has been linked in the PR description). It's also possible to exclude certain pages through a regex pattern if it doesn't make sense for them to be interactive (for example, numpy.distutils doesn't need to be, of course).
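
The regex-based exclusion mentioned above could look roughly like the sketch below, which writes an `ignore_patterns` list into the runtime configuration and checks a few page paths against it. The patterns and paths are hypothetical, and the matching is done in the browser by jupyterlite-sphinx, so this Python version only approximates it (it assumes search-style matching against the page URL path).

```python
import json
import re

# Hypothetical runtime configuration, in the shape of doc/source/try_examples.json.
runtime_config = {
    "global_min_height": "600px",
    # Assumed patterns: pages whose examples should stay static.
    "ignore_patterns": [
        r"pandas\.read_excel\.html",
        r"reference/api/pandas\.io\..*",
    ],
}


def is_ignored(page_path: str, patterns: list[str]) -> bool:
    """Approximate the exclusion check: True if any pattern matches the path."""
    return any(re.search(pattern, page_path) for pattern in patterns)


# The serialized form is what would be stored in the JSON file.
print(json.dumps(runtime_config, indent=2))

for path in (
    "/docs/reference/api/pandas.read_excel.html",
    "/docs/reference/api/pandas.DataFrame.head.html",
):
    print(path, is_ignored(path, runtime_config["ignore_patterns"]))
```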

datapythonista · 2025-03-17T12:58:31Z

/preview

github-actions · 2025-03-17T12:59:34Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61061/

Removing the requested changes flag.

datapythonista

Sorry @agriyakhetarpal, my bad, I misunderstood what this was doing. I thought "try examples" meant only the terminal we've got in the getting started page, and clearly it didn't seem worth the added complexity and possible problems to me. Letting users run the API examples is something we wanted for a long time.

I created a new repo https://github.com/pandas-dev/pandas-jupyterlite so we can have the assets and the CI for the interactive terminal. I think it'll make your life easier, and also reduce the maintenance in this repo, which is always appreciated. I gave you and @jtpio permissions there.

To get a preview of the changes here you can run /preview in a comment as above. The preview is not automatically updated, you'll have to rerun with the /preview comment after making changes.

The examples rely on some magic: this code needs to be executed in the session for the examples to work: https://github.com/pandas-dev/pandas/blob/main/doc/source/conf.py#L373

I think it'd be good that the warning about reporting bugs points to the new repo. I wouldn't be surprised if we get a decent amount of things not fully supported in wasm. You'll know better than me. It will also help users see if an error has been reported.

Just a comment for the future, some examples need these files, which I guess should be placed in the wasm container eventually: https://github.com/pandas-dev/pandas/tree/main/doc/data

Thanks for your work here, and sorry for the misunderstanding.

agriyakhetarpal · 2025-03-18T17:27:11Z

Sorry @agriyakhetarpal, my bad, I misunderstood what this was doing. I thought "try examples" meant only the terminal we've got in the getting started page, and clearly it didn't seem worth the added complexity and possible problems to me. Letting users run the API examples is something we wanted for a long time.

I created a new repo pandas-dev/pandas-jupyterlite so we can have the assets and the CI for the interactive terminal. I think it'll make your life easier, and also reduce the maintenance in this repo, which is always appreciated. I gave you and @jtpio permissions there.

I think it'd be good that the warning about reporting bugs points to the new repo. I wouldn't be surprised if we get a decent amount of things not fully supported in wasm. You'll know better than me. It will also help users see if an error has been reported.

Thanks a lot for reconsidering your stance, and no worries about the misunderstanding! I accepted the invitation to collaborate on https://github.com/pandas-dev/pandas-jupyterlite yesterday. I'll initiate work on that repository in a moment, and I agree that issues should be directed towards it – will point to that in the warning text.

I do have to note that this repository would still be limited for use only in the REPL on the Getting Started pages at the moment, and not for the API examples as they use a notebook instead of a REPL.

Currently, all code snippets in Numpydoc-based examples sections are converted to notebooks and assigned UUIDs for filenames, and subsequently included as a part of the built docs. While we can get jupyterlite-sphinx to use an external JupyterLite site instead of it building its own as a part of the documentation builds, there has to be a way to include example snippets from an external site. @jtpio, can there be a way to support the REPL's &code URL parameter for the Notebook interface, too?
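
For context, passing a snippet to the REPL through its code URL parameter can be sketched as follows. The site root is a placeholder, the helper name `build_repl_url` is hypothetical, and the `kernel` value simply mirrors the `:kernel:` option used in the directive earlier in this PR; none of this is confirmed to work for the Notebook interface, which is the open question here.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_repl_url(site_root: str, code: str, kernel: str = "pyodide") -> str:
    """Build a REPL URL that carries the snippet in the `code` query parameter."""
    # quote_via=quote encodes spaces as %20 and newlines as %0A.
    query = urlencode({"kernel": kernel, "code": code}, quote_via=quote)
    return f"{site_root.rstrip('/')}/repl/index.html?{query}"


snippet = "import pandas as pd\npd.Series([1, 2, 3]).sum()"

# Placeholder host; a real deployment would use its own JupyterLite site root.
url = build_repl_url("https://example.invalid/lite", snippet)
print(url)

# The snippet survives the round trip through the query string.
print(parse_qs(urlsplit(url).query)["code"][0] == snippet)
```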

To get a preview of the changes here you can run /preview in a comment as above. The preview is not automatically updated, you'll have to rerun with the /preview comment after making changes.

Noted!

The examples rely on some magic: this code needs to be executed in the session for the examples to work: main/doc/source/conf.py#L373

Yes, this is also something I think @jtpio would have some better context on. Being able to run an example as-is would require executing code as a part of the kernel, which means we need to pre-install pandas into the environment, which in turn is a long-standing issue for downstream users: jupyterlite/pyodide-kernel#60. The Pyodide kernel doesn't support this, but the Xeus kernel does. Even if we switch to that kernel, we still need an import pandas as pd statement to be added to the notebooks¹.

I noted this above in my self-review at #61061 (review), and similar discussions have taken place in NumPy and SciPy. The simpler option would be for pandas to consider changing its policy and allow the examples to be self-contained. However, I wasn't involved in the previous discussions in this area, so I'm not sure why things are the way they are now.

Just a comment for the future, some examples need these files, which I guess should be placed in the wasm container eventually: main/doc/data

Yes, this should be possible to include – I'll add them. I see that most of these CSV files are needed as a part of the notebooks in the User Guide and not in the API examples, making things easier.

¹ I've been working on allowing notebook modifications for jupyterlite-sphinx lately, which should support this with the next release.

datapythonista · 2025-03-19T04:27:08Z

I assumed there was a generic wasm file for all examples, and that clicking on the try-out button would somehow pass the code to run to it. That's why I was proposing to use an external repo for it. If things are more complex, I guess it's OK to use whatever the Sphinx plugin does.

Good point about the data files. I was thinking they were mostly used for the read_whatever methods. They are in some cases, like here: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas.read_excel

But it's surely worth not including in the wasm the ones only used in the user guide if those won't be runnable.

Finding a solution to run the imports before the examples is clearly a must. It's surely fine to not run them, but include them in an initial cell in the notebook so users themselves run it.
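
A minimal sketch of the "initial cell" idea, assuming the generated notebooks follow the nbformat 4 JSON layout: the helper below (a hypothetical name, not part of jupyterlite-sphinx) prepends an unexecuted code cell with the imports, leaving it to the user to run it.

```python
import json

# Imports that the docstring examples assume are already available.
IMPORT_LINES = "import numpy as np\nimport pandas as pd"


def prepend_import_cell(notebook: dict, source: str = IMPORT_LINES) -> dict:
    """Return a copy of the notebook with an unexecuted import cell placed first."""
    import_cell = {
        "cell_type": "code",
        "execution_count": None,  # not run at build time; the user runs it
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    return {**notebook, "cells": [import_cell, *notebook.get("cells", [])]}


# Stand-in for a notebook generated from an "Examples" section.
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": 'df = pd.DataFrame({"num_legs": [2, 4]}, index=["falcon", "dog"])\ndf',
        }
    ],
}

patched = prepend_import_cell(example_notebook)
print(json.dumps(patched, indent=1))
```

The original notebook dictionary is left untouched, so the same helper could be applied at build time without mutating shared state.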

agriyakhetarpal added 9 commits March 5, 2025 22:58

Move runtime configuration to doc/source/

906d857

Move build-time configuration to doc/source/

d1ef0fc

Add jupyterlite-sphinx as a dependency

3e09e1a

Condigure jupyterlite_sphinx

e960a3a

Ignore generated contents by TryExamples

df33314

Enforce proper rendering of TryExamples notebooks

ebcb57c

Don't build the interactive terminal separately

20e4d55

Add some styling for the TryExamples buttons

1e9366d

Add REPL to the Getting Started page

741ca1b

agriyakhetarpal added 3 commits March 6, 2025 00:58

Fix pre-commit failures

5c3e805

Silence JupyterLite again

00cac61

Fix docs build failures due to warnings

1f99439

agriyakhetarpal marked this pull request as ready for review March 5, 2025 20:59

agriyakhetarpal requested review from MarcoGorelli and mroeschke as code owners March 5, 2025 20:59

agriyakhetarpal mentioned this pull request Mar 5, 2025

Add projects' interactive documentation Quansight-Labs/czi-scientific-python-mgmt#19

Open

Merge branch 'main' into feat/interactive-docs

4c3f29d

datapythonista previously requested changes Mar 15, 2025

datapythonista reviewed Mar 17, 2025
