ENH, DOC: Add JupyterLite-powered interactive examples for the pandas documentation #61061

Open
wants to merge 13 commits into
base: main

agriyakhetarpal
Contributor

Description

This PR is the first in a series of PRs that add WASM-based interactive elements to the pandas documentation. In particular, it relies on JupyterLite and jupyterlite-sphinx.

The interactive REPL added to the website has been set up to use the same JupyterLite deployment that jupyterlite-sphinx builds internally, so that we don't build a separate, duplicate deployment.

The TryExamples directive from jupyterlite-sphinx has been added globally to all Numpydoc-processed docstrings with an "Examples" section via a global_enable_try_examples configuration option. The buttons enabled by the directive have been styled accordingly.
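
As a rough illustration of the configuration described above, a minimal `conf.py` sketch might look like the following. Only `global_enable_try_examples` is named in this PR; the extension list and the two text options are assumptions based on the jupyterlite-sphinx documentation, and the string values are purely illustrative rather than what this PR sets.

```python
# Sketch of a Sphinx conf.py fragment (not the actual pandas configuration).
extensions = [
    "numpydoc",            # assumed: processes the "Examples" docstring sections
    "jupyterlite_sphinx",  # provides the TryExamples and Replite directives
]

# Add an interactive button to every numpydoc-processed "Examples" section.
global_enable_try_examples = True

# Assumed option names from the jupyterlite-sphinx docs; values are illustrative.
try_examples_global_button_text = "Try it in your browser!"
try_examples_global_warning_text = "Interactive examples are experimental."
```

With the global option set, individual docstrings would not need a per-example directive.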

Next steps

The follow-up steps after this PR will be to:

  • address the mismatch between the versions of pandas available to use from the Pyodide distribution and the version of pandas that users are viewing the documentation for (most likely when we release Pyodide 0.28)
  • expand interactive documentation elements for the long-form content in the "User Guide" section (this PR just makes the API examples interactive).

See also

Contributor Author

@agriyakhetarpal agriyakhetarpal left a comment

Here is a self-review where I leave some comments that I decided not to inline in code.

Unfortunately, for the interactive examples to work, the policy on the https://pandas.pydata.org/docs/development/contributing_docstring.html#section-7-examples page against adding import pandas as pd or import numpy statements to examples might need to be revisited by the pandas maintainers. This is because jupyterlite-sphinx converts each examples section into a mini-notebook that is provided to JupyterLite and rendered in place. It is thus imperative for all docstring-based examples to be "self-contained", i.e., copy-pasteable directly into a terminal or IPython shell where nothing has been pre-imported.
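
To make the "self-contained" requirement concrete, here is a small sketch that runs a doctest-style example in an empty namespace, which is roughly the situation a fresh notebook kernel is in. It uses the standard-library `math` module as a stand-in for pandas so that it runs anywhere; `run_example` is a hypothetical helper and not part of jupyterlite-sphinx or pandas.

```python
import doctest


def run_example(source: str) -> int:
    """Run a doctest-style example with empty globals; return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, globs={}, name="example", filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


# Relies on an implicit import, like examples that assume `pd` already exists.
implicit = ">>> math.sqrt(4)\n2.0\n"

# Self-contained: carries its own import statement.
self_contained = ">>> import math\n>>> math.sqrt(4)\n2.0\n"

implicit_failures = run_example(implicit)
self_contained_failures = run_example(self_contained)
```

The first example fails with a `NameError` because nothing was imported beforehand, while the second one passes.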

I performed the same change for numpy/numpy#26814. This was also in discussion over at SciPy in scipy/scipy#13049, where the outcome was to add the necessary imports. I'm curious to know if this can be a possibility here, too.

I'll open this up and mark it ready for review, though it isn't ready for merging.

Comment on lines +660 to +681
Try pandas online (experimental)
--------------------------------

Try our experimental `JupyterLite <https://jupyterlite.readthedocs.io/en/stable/>`__ live shell with ``pandas``, powered by `Pyodide <https://pyodide.org/en/stable/>`__.

**Please note it can take a while (>30 seconds) before the shell is initialized and ready to run commands.**

**Running it requires a reasonable amount of bandwidth and resources (>70 MiB on the first load), so
it may not work properly on all devices or networks.**


.. replite::
   :kernel: pyodide
   :height: 600px
   :prompt: Try pandas online!
   :execute: False
   :prompt_color: #E70288

   import pandas as pd
   df = pd.DataFrame({"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"])
   df

Contributor Author

Hi @Dr-Irv, this should address #61060 – I've marked this PR as closing the issue.

Specifically, what I've added here is jupyterlite-sphinx's Replite directive, which brings the same interface as the one that now exists on https://pandas.pydata.org/try.html, and both Getting Started pages will use the same JupyterLite live shell, sharing the JupyterLite assets. This directive was designed for this purpose. This shell in the docs, just like the one on the website, does not load or consume any bandwidth until a user explicitly interacts with the REPL button (and goes further by not loading even the JupyterLite assets until prompted).

I can switch this to just a link instead of a REPL, if you wish.

Comment on lines +1 to +3
{
  "global_min_height": "600px"
}
Contributor Author

# see the following links for more context:
# 1. https://jupyterlite-pyodide-kernel.readthedocs.io/en/stable/#compatibility
# 2. https://pyodide.org/en/stable/usage/packages-in-pyodide.html
- jupyterlite-core
- jupyterlite-sphinx
Contributor Author

jupyterlite-sphinx depends on jupyterlite-core.

@agriyakhetarpal
Contributor Author

The remaining CI failures here look unrelated; I merged main to see if that makes a difference.

Member

@datapythonista datapythonista left a comment

-1 on this

Probably fine to have a link in the docs to the interactive terminal, but I don't think we should add the terminal itself to the docs.

The wasm terminal seems to be working much better than before, but adding it to the docs means that once we release, we can't change that page anymore. And I don't think we're confident enough in this for that.

Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.

Also, my personal suggestion was not to move this to a separate page in pandas. It was to move all this to a separate repository, publish it on GitHub Pages, and then link to it from our docs as needed. I think this is a very cool feature, but it's still something new that needs much faster iterations than pandas itself, and I don't think updates and fixes should happen in this repository.

Is there any reason why we shouldn't get rid of all the wasm stuff here, and simply have the links and the iframe in the website?

CC: @Dr-Irv

@Dr-Irv
Contributor

Dr-Irv commented Mar 16, 2025

> Also, seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think it's great.

I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen.

I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.

@agriyakhetarpal
Contributor Author

Thanks for the feedback, @datapythonista and @Dr-Irv!

> Also, seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think it's great.

> I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen.

This is answered here: #61061 (comment). No bandwidth is consumed when loading the Getting Started page in the docs, as the directive hides the shell behind a button that a user would need to interact with. Also, the terminal is shared between the website and the docs, as everything is hosted on the same domain at https://pandas.pydata.org/. If someone uses the one on the website, the assets are cached for the one in the docs, should they use it (and vice versa).

> my personal suggestion [...] was to move all this to a separate repository, publish it in github pages, and then we can link to it from our docs as needed. I think this is a very cool feature, but I think it's still something new that needs much faster iterations than pandas itself, and I don't think updates, fixes... Should happen in this repository.

Yes, I suggested this in the previous PR at #60758 (comment). If we are to incorporate this suggestion I'll need someone from the pandas team to coordinate with me to create such a repository from the template at https://github.com/jupyterlite/demo, and I can help maintain it and subsequently move things here.

> I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.

Oh, the "Try it" button is enabled across the API reference pages (for all valid Numpydoc-based "Examples" sections for a public API method's docstring) through the global_enable_try_examples configuration option, so it's not required to add it per API example manually. I'm not sure if pandas has a documentation preview on its PRs where we can test it out. You may try out a sample from the NumPy devdocs here: https://numpy.org/devdocs/reference/generated/numpy.strings.replace.html#numpy.strings.replace (the corresponding PR which added this has been linked in the PR description). It's also possible to exclude certain pages through a regex pattern if they don't make sense with being interactive (for example, numpy.distutils doesn't need to be, of course).
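
As a hedged sketch of the regex-based exclusion mentioned above: the `try_examples.json` file in this PR only sets `global_min_height`; the `ignore_patterns` key and the example pattern below are assumptions based on the jupyterlite-sphinx documentation. The real matching happens in the browser against the page path, so the Python `re` check here only approximates it, and `is_interactive` is a hypothetical helper.

```python
import json
import re

# Hypothetical try_examples.json contents; only "global_min_height" is in this PR.
try_examples_config = {
    "global_min_height": "600px",
    # Assumed key: pages whose path matches any pattern get no interactive button.
    "ignore_patterns": [r"/reference/api/pandas\.errors\..*"],
}


def is_interactive(url_path: str, patterns: list[str]) -> bool:
    """Return True if no ignore pattern matches the given page path."""
    return not any(re.search(pattern, url_path) for pattern in patterns)


# Text that would be written to try_examples.json.
config_text = json.dumps(try_examples_config, indent=2)

patterns = try_examples_config["ignore_patterns"]
dataframe_page = is_interactive("/docs/reference/api/pandas.DataFrame.html", patterns)
errors_page = is_interactive("/docs/reference/api/pandas.errors.ParserError.html", patterns)
```

Here the `DataFrame` page keeps its button, while the illustrative `pandas.errors` pattern switches it off for the error classes.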

@datapythonista
Member

/preview

Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61061/

@datapythonista datapythonista dismissed their stale review March 17, 2025 13:22

Removing the requested changes flag.

Member

@datapythonista datapythonista left a comment

Sorry @agriyakhetarpal, my bad, I misunderstood what this was doing. I thought "try examples" meant only the terminal we've got in the getting started page, and that didn't seem worth the added complexity and possible problems to me. Letting users run the API examples is something we've wanted for a long time.

I created a new repo https://github.com/pandas-dev/pandas-jupyterlite so we can have the assets and the CI for the interactive terminal. I think it'll make your life easier, and also reduce the maintenance in this repo, which is always appreciated. I gave you and @jtpio permissions there.

To get a preview of the changes here you can run /preview in a comment as above. The preview is not automatically updated, you'll have to rerun with the /preview comment after making changes.

There is some magic that the examples do, this code needs to be executed in the session for the examples to work: https://github.com/pandas-dev/pandas/blob/main/doc/source/conf.py#L373

[Screenshot at 2025-03-17 20-10-43]

I think it'd be good that the warning about reporting bugs points to the new repo. I wouldn't be surprised if we get a decent amount of things not fully supported in wasm. You'll know better than me. It will also help users see if an error has been reported.

Just a comment for the future, some examples need these files, which I guess should be placed in the wasm container eventually: https://github.com/pandas-dev/pandas/tree/main/doc/data

Thanks for your work here, and sorry for the misunderstanding.

@agriyakhetarpal
Contributor Author

> Sorry @agriyakhetarpal, my bad, I misunderstood what this was doing. I thought "try examples" meant only the terminal we've got in the getting started page, and clearly it didn't seem worth the added complexity and possible problems to me. Letting users run the API examples is something we wanted for a long time.

> I created a new repo pandas-dev/pandas-jupyterlite so we can have the assets and the CI for the interactive terminal. I think it'll make your life easier, and also reduce the maintenance in this repo, which is always appreciated. I gave you and @jtpio permissions there.

> I think it'd be good that the warning about reporting bugs points to the new repo. I wouldn't be surprised if we get a decent amount of things not fully supported in wasm. You'll know better than me. It will also help users see if an error has been reported.

Thanks a lot for reconsidering your stance, and no worries about the misunderstanding! I accepted the invitation to collaborate on https://github.com/pandas-dev/pandas-jupyterlite yesterday. I'll initiate work on that repository in a moment, and I agree that issues should be directed towards it – will point to that in the warning text.

I do have to note that this repository would still be limited for use only in the REPL on the Getting Started pages at the moment, and not for the API examples as they use a notebook instead of a REPL.

Currently, all code snippets in Numpydoc-based examples sections are converted to notebooks and assigned UUIDs for filenames, and subsequently included as a part of the built docs. While we can get jupyterlite-sphinx to use an external JupyterLite site instead of it building its own as a part of the documentation builds, there has to be a way to include example snippets from an external site. @jtpio, can there be a way to support the REPL's &code URL parameter for the Notebook interface, too?
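
For context on the `&code` parameter mentioned above, the sketch below shows how a snippet could be passed to a JupyterLite REPL through the URL query string. The base URL is a placeholder, `build_repl_url` is a hypothetical helper, and the `kernel` value is an assumption; only the existence of a `code` parameter for the REPL comes from this discussion.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_repl_url(base_url: str, code: str, kernel: str = "python") -> str:
    """Build a REPL URL that carries the snippet in the `code` query parameter."""
    query = urlencode({"kernel": kernel, "code": code}, quote_via=quote)
    return f"{base_url}?{query}"


snippet = "import pandas as pd\npd.Series([1, 2, 3]).sum()"

# Placeholder deployment URL, not a real pandas endpoint.
url = build_repl_url("https://example.org/lite/repl/index.html", snippet)

# Round-trip check: decoding the query string recovers the original snippet.
decoded = parse_qs(urlsplit(url).query)["code"][0]
```

An equivalent parameter for the Notebook interface would let an external JupyterLite site receive example code without the docs build having to ship one notebook file per example.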

> To get a preview of the changes here you can run /preview in a comment as above. The preview is not automatically updated, you'll have to rerun with the /preview comment after making changes.

Noted!

> There is some magic that the examples do, this code needs to be executed in the session for the examples to work: main/doc/source/conf.py#L373

> [Screenshot at 2025-03-17 20-10-43]

Yes, this is also something I think @jtpio would have some better context on. Being able to run an example as-is would require executing code as a part of the kernel, which means we need to pre-install pandas into the environment, which in turn is a long-standing issue for downstream users: jupyterlite/pyodide-kernel#60. The Pyodide kernel doesn't support this, but the Xeus kernel does. Even if we switch to that kernel, we still need an import pandas as pd statement to be added to the notebooks [1].

I noted this above in my self-review at #61061 (review), and similar discussions have taken place in NumPy and SciPy. The simpler thing would be for pandas to consider changing its policy and allow the examples to be self-contained. However, I wasn't involved in previous discussions around this area, so I'm not sure why things are the way they are now.

> Just a comment for the future, some examples need these files, which I guess should be placed in the wasm container eventually: main/doc/data

Yes, this should be possible to include – I'll add them. I see that most of these CSV files are needed as a part of the notebooks in the User Guide and not in the API examples, making things easier.

Footnotes

  1. I've been working on allowing notebook modifications for jupyterlite-sphinx lately, which should support this with the next release.

@datapythonista
Member

I assumed there is a generic wasm file for all examples, and that clicking on the try out would somehow pass the code to run to it. That's why I was proposing to use an external repo for it. If things are more complex I guess it's OK to use whatever the sphinx plugin does.

Good point about the data files. I was thinking they were mostly used for the read_whatever methods. They are in some cases, like here: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas.read_excel

But it's surely worth not including in the wasm the ones only used in the user guide if they won't be runnable.

Finding a solution to run the imports before the examples is clearly a must. It's surely fine to not run them, but include them in an initial cell in the notebook so users themselves run it.
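
One way to picture the "initial cell" idea is the sketch below, which builds a minimal nbformat 4 notebook as a plain dictionary with the imports in the first cell and the example in the second. This is not how jupyterlite-sphinx generates its notebooks; `make_notebook` and the default preamble are hypothetical and shown only for illustration.

```python
import json


def make_notebook(example_source: str,
                  preamble: str = "import numpy as np\nimport pandas as pd") -> dict:
    """Return a minimal nbformat 4 notebook: an import cell, then the example."""

    def code_cell(source: str) -> dict:
        # An unexecuted code cell: no outputs and no execution count yet.
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        }

    return {
        "cells": [code_cell(preamble), code_cell(example_source)],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook('df = pd.DataFrame({"num_legs": [2, 4]})\ndf')

# Serialized form, as it would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

The imports are not executed automatically here; the user would run the first cell themselves before running the example.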


Successfully merging this pull request may close these issues.

DOC: Add link to the "try pandas online" in the regular documentation
3 participants