ENH, DOC: Add JupyterLite-powered interactive examples for the pandas
documentation
#61061
Here is a self-review where I leave some comments that I decided not to inline in code.
Unfortunately, for the interactive examples to work, the policy listed on the https://pandas.pydata.org/docs/development/contributing_docstring.html#section-7-examples page concerning the non-addition of `import pandas as pd` or `import numpy` statements might need to be revisited by the pandas maintainers. This is because `jupyterlite-sphinx` will convert the examples section to a mini-notebook that will be provided to JupyterLite, and rendered in-place. It is thus imperative for all docstring-based examples to be "self-contained", i.e., be copy-pasteable directly into a terminal or IPython shell where no other programs have been pre-imported.
I performed the same change for numpy/numpy#26814. This was also in discussion over at SciPy in scipy/scipy#13049, where the outcome was to add the necessary imports. I'm curious to know if this can be a possibility here, too.
I'll open this up and mark it ready for review, though it isn't ready for merging.
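To make the "self-contained" requirement concrete, here is a minimal sketch of a checker that reports which conventional aliases an "Examples" section uses without importing them. The function name `missing_imports` and the default alias list are hypothetical and are not part of pandas or jupyterlite-sphinx; only the standard-library `doctest` and `ast` modules are used.

```python
import ast
import doctest


def missing_imports(docstring, aliases=("np", "pd")):
    """Return the aliases that the doctest examples use but never bind.

    Hypothetical helper for illustration: it only inspects names
    syntactically and does not execute the examples.
    """
    # Collect the source of every ">>>" example in the docstring.
    examples = doctest.DocTestParser().get_examples(docstring)
    tree = ast.parse("".join(example.source for example in examples))

    bound, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import pandas as pd" binds "pd"; "import numpy" binds "numpy".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            else:
                used.add(node.id)
    return sorted(a for a in aliases if a in used and a not in bound)
```

An example that relies on a pre-imported `pd` would be flagged, while one that starts with `>>> import pandas as pd` would not.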
Try pandas online (experimental)
--------------------------------

Try our experimental `JupyterLite <https://jupyterlite.readthedocs.io/en/stable/>`__ live shell with ``pandas``, powered by `Pyodide <https://pyodide.org/en/stable/>`__.

**Please note it can take a while (>30 seconds) before the shell is initialized and ready to run commands.**

**Running it requires a reasonable amount of bandwidth and resources (>70 MiB on the first load), so
it may not work properly on all devices or networks.**

.. replite::
   :kernel: pyodide
   :height: 600px
   :prompt: Try pandas online!
   :execute: False
   :prompt_color: #E70288

   import pandas as pd
   df = pd.DataFrame({"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"])
   df
Hi @Dr-Irv, this should address #61060 – I've marked this PR as closing the issue.
Specifically, what I've added here is `jupyterlite-sphinx`'s `Replite` directive, which brings the same interface as the one that now exists on https://pandas.pydata.org/try.html, and both Getting Started pages will use the same JupyterLite live shell, sharing the JupyterLite assets. This directive was designed for this purpose. This shell in the docs, just like the one on the website, does not load or consume any bandwidth until a user explicitly interacts with the REPL button (and goes further by not loading even the JupyterLite assets until prompted).
I can switch this to just a link instead of a REPL, if you wish.
{
  "global_min_height": "600px"
}
We've documented this option here: https://jupyterlite-sphinx.readthedocs.io/en/latest/directives/try_examples.html#global-min-height
# see the following links for more context:
# 1. https://jupyterlite-pyodide-kernel.readthedocs.io/en/stable/#compatibility
# 2. https://pyodide.org/en/stable/usage/packages-in-pyodide.html
- jupyterlite-core
- jupyterlite-sphinx
`jupyterlite-sphinx` depends on `jupyterlite-core`.
The remaining CI failures here look unrelated; I merged
-1 on this
It's probably fine to have a link in the docs to the interactive terminal, but I don't think we should add the terminal itself to the docs.
The wasm terminal seems to be working much better than before, but adding it to the docs means that once we release, we can't change that page anymore. And I don't think we're confident enough in this for that.
Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.
Also, my personal suggestion was not to move this to a separate page in pandas. It was to move all this to a separate repository, publish it on GitHub Pages, and then link to it from our docs as needed. I think this is a very cool feature, but it's still something new that needs much faster iterations than pandas itself, and I don't think updates and fixes should happen in this repository.
Is there any reason why we shouldn't get rid of all the wasm stuff here, and simply have the links and the iframe in the website?
CC: @Dr-Irv
I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen. I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.
Thanks for the feedback, @datapythonista and @Dr-Irv!
This is answered here: #61061 (comment). No bandwidth consumption occurs on loading the Getting Started page in the docs, as the directive hides it behind a button that a user would need to interact with. Also, the terminal is shared between the website and the docs, as everything is hosted on the same domain at https://pandas.pydata.org/. If someone uses the one on the website, the assets are cached for the one in the docs, should they use it (and vice versa).
Yes, I suggested this in the previous PR at #60758 (comment). If we are to incorporate this suggestion I'll need someone from the
Oh, the "Try it" button is enabled across the API reference pages (for all valid Numpydoc-based "Examples" sections for a public API method's docstring) through the
/preview
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61061/
Removing the requested changes flag.
Sorry @agriyakhetarpal, my bad, I misunderstood what this was doing. I thought "try examples" meant only the terminal we've got in the getting started page, and clearly it didn't seem worth the added complexity and possible problems to me. Letting users run the API examples is something we wanted for a long time.
I created a new repo https://github.com/pandas-dev/pandas-jupyterlite so we can have the assets and the CI for the interactive terminal. I think it'll make your life easier, and also reduce the maintenance in this repo, which is always appreciated. I gave you and @jtpio permissions there.
To get a preview of the changes here you can run `/preview` in a comment as above. The preview is not automatically updated; you'll have to rerun with the `/preview` comment after making changes.
There is some magic that the examples do, this code needs to be executed in the session for the examples to work: https://github.com/pandas-dev/pandas/blob/main/doc/source/conf.py#L373
I think it'd be good that the warning about reporting bugs points to the new repo. I wouldn't be surprised if we get a decent amount of things not fully supported in wasm. You'll know better than me. It will also help users see if an error has been reported.
Just a comment for the future, some examples need these files, which I guess should be placed in the wasm container eventually: https://github.com/pandas-dev/pandas/tree/main/doc/data
Thanks for your work here, and sorry for the misunderstanding.
Thanks a lot for reconsidering your stance, and no worries about the misunderstanding! I accepted the invitation to collaborate on https://github.com/pandas-dev/pandas-jupyterlite yesterday. I'll initiate work on that repository in a moment, and I agree that issues should be directed towards it – will point to that in the warning text. I do have to note that this repository would still be limited for use only in the REPL on the Getting Started pages at the moment, and not for the API examples as they use a notebook instead of a REPL. Currently, all code snippets in Numpydoc-based examples sections are converted to notebooks and assigned UUIDs for filenames, and subsequently included as a part of the built docs. While we can get
Noted!
Yes, this is also something I think @jtpio would have some better context on. Being able to run an example as-is would require executing code as a part of the kernel, which means we need to pre-install pandas into the environment, which in turn is a long-standing issue for downstream users: jupyterlite/pyodide-kernel#60. The Pyodide kernel doesn't support this, but the Xeus kernel does. Even if we switch to that kernel, we still need an
I noted this above in my self-review at #61061 (review), and similar discussions have taken place in NumPy and SciPy. The simpler thing would be for pandas to consider changing its policy and allow the examples to be self-contained. However, I wasn't involved in the previous discussions around this area, so I'm not sure why things are the way they are now.
Yes, this should be possible to include – I'll add them. I see that most of these CSV files are needed as a part of the notebooks in the User Guide and not in the API examples, making things easier.
I assumed there was a generic wasm file for all examples, and that clicking the "try it" button would somehow pass the code to run to it. That's why I was proposing to use an external repo for it. If things are more complex, I guess it's OK to use whatever the Sphinx plugin does.
Good point about the data files. I was thinking they were mostly used for the read_whatever methods. They are in some cases, like here: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas.read_excel But it's surely worth not including in the wasm the ones only used in the user guide, if those won't be runnable.
Finding a solution to run the imports before the examples is clearly a must. It's surely fine to not run them, but include them in an initial cell in the notebook so users run it themselves.
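The "initial cell" idea above can be sketched as follows. This is only an illustration under stated assumptions, not how jupyterlite-sphinx builds its notebooks: the function name `examples_to_notebook` and the default preamble are hypothetical, and the notebook is a plain dict following the nbformat 4 JSON layout, built with the standard library only.

```python
import doctest
import json


def examples_to_notebook(docstring, preamble=("import numpy as np", "import pandas as pd")):
    """Turn doctest examples into a notebook dict with a leading import cell.

    Hypothetical helper: the import cell is included but not executed,
    so the user runs it before the example cells.
    """

    def code_cell(source):
        # Minimal nbformat 4 code cell with no outputs.
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        }

    examples = doctest.DocTestParser().get_examples(docstring)
    cells = [code_cell("\n".join(preamble))]
    cells += [code_cell(example.source.rstrip("\n")) for example in examples]
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


if __name__ == "__main__":
    doc = """
    Examples
    --------
    >>> s = pd.Series([1, 2, 3])
    >>> s.sum()
    6
    """
    print(json.dumps(examples_to_notebook(doc), indent=1))
```

Keeping the imports in their own first cell leaves the docstrings untouched, at the cost of one extra step for the user.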
pandas in CI #57896; closes DOC: Add link to the "try pandas online" in the regular documentation #61060
`doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.

Description
This PR is the first step in a series of PRs to add WASM-based interactive documentation elements for the pandas documentation. Particularly, this relies on JupyterLite and jupyterlite-sphinx.
The interactive REPL added to the website has been incorporated to use the same JupyterLite deployment that `jupyterlite-sphinx` internally builds, so that we don't build a separate/duplicate deployment.
The `TryExamples` directive from jupyterlite-sphinx has been added globally to all Numpydoc-processed docstrings with an "Examples" section via a `global_enable_try_examples` configuration option. The buttons enabled by the directive have been styled accordingly.

Next steps
The follow-up steps after this PR will be to:
pandas available to use from the Pyodide distribution and the version of pandas that users are viewing the documentation for (most likely when we release Pyodide 0.28)

See also
- `jupyterlite-sphinx`
- numpy/numpy#26745
- `sphinx-gallery`: https://sphinx-gallery.github.io/stable/configuration.html#jupyterlite
- `TryExamples` directive, used to enable API reference examples: https://jupyterlite-sphinx.readthedocs.io/en/latest/directives/try_examples.html
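For readers unfamiliar with the `global_enable_try_examples` option mentioned in the description, a minimal `conf.py` sketch might look like the following. The extension list and the button and warning texts are placeholder assumptions for illustration, not the values used in this PR; see the `TryExamples` documentation linked above for the authoritative list of options.

```python
# conf.py (sketch): enable jupyterlite-sphinx and its global "try examples" buttons.
# The extension list is an assumption; a real project will have many more entries.
extensions = [
    "numpydoc",
    "jupyterlite_sphinx",
]

# Add a TryExamples button to every docstring "Examples" section.
global_enable_try_examples = True

# Placeholder texts, not the ones used in the pandas documentation.
try_examples_global_button_text = "Try it in your browser!"
try_examples_global_warning_text = (
    "Interactive examples are experimental and may not always work as expected."
)
```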