diff --git a/doc/python/ml-pca.md b/doc/python/ml-pca.md
index cbd38e0fd9c..2160f671395 100644
--- a/doc/python/ml-pca.md
+++ b/doc/python/ml-pca.md
@@ -6,9 +6,9 @@ jupyter:
       extension: .md
       format_name: markdown
       format_version: '1.3'
-      jupytext_version: 1.14.1
+      jupytext_version: 1.16.1
   kernelspec:
-    display_name: Python 3
+    display_name: Python 3 (ipykernel)
     language: python
     name: python3
   language_info:
@@ -20,7 +20,7 @@ jupyter:
     name: python
     nbconvert_exporter: python
     pygments_lexer: ipython3
-    version: 3.8.8
+    version: 3.10.11
   plotly:
     description: Visualize Principle Component Analysis (PCA) of your high-dimensional
       data in Python with Plotly.
@@ -105,17 +105,18 @@ fig.show()
 
 When you will have too many features to visualize, you might be interested in only visualizing the most relevant components. Those components often capture a majority of the [explained variance](https://en.wikipedia.org/wiki/Explained_variation), which is a good way to tell if those components are sufficient for modelling this dataset.
 
-In the example below, our dataset contains 10 features, but we only select the first 4 components, since they explain over 99% of the total variance.
+In the example below, our dataset contains 8 features, but we only select the first 2 components.
 
 ```python
 import pandas as pd
 import plotly.express as px
 from sklearn.decomposition import PCA
-from sklearn.datasets import load_boston
+from sklearn.datasets import fetch_california_housing
 
-boston = load_boston()
-df = pd.DataFrame(boston.data, columns=boston.feature_names)
-n_components = 4
+housing = fetch_california_housing(as_frame=True)
+df = housing.data
+
+n_components = 2
 
 pca = PCA(n_components=n_components)
 components = pca.fit_transform(df)
@@ -127,7 +128,7 @@ labels['color'] = 'Median Price'
 
 fig = px.scatter_matrix(
     components,
-    color=boston.target,
+    color=housing.target,
     dimensions=range(n_components),
     labels=labels,
     title=f'Total Explained Variance: {total_var:.2f}%',
@@ -136,6 +137,10 @@ fig.update_traces(diagonal_visible=False)
 fig.show()
 ```
 
+```python
+df
+```
+
 ## PCA analysis in Dash
 
 [Dash](https://plotly.com/dash/) is the best way to build analytical apps in Python using Plotly figures. To run the app below, run `pip install dash`, click "Download" to get the code and run `python app.py`.
diff --git a/doc/requirements.txt b/doc/requirements.txt
index 3d1a83d96d5..b329e22271b 100644
--- a/doc/requirements.txt
+++ b/doc/requirements.txt
@@ -4,7 +4,7 @@ ipywidgets==7.7.2
 jupyter-client<7
 jupyter
 notebook
-pandas==1.1.5
+pandas==1.2.0
 statsmodels==0.12.1
 scipy==1.5.4
 patsy==0.5.1
diff --git a/packages/python/plotly/recipe/conda_build_config.yaml b/packages/python/plotly/recipe/conda_build_config.yaml
index abcf8353472..86b425eca79 100644
--- a/packages/python/plotly/recipe/conda_build_config.yaml
+++ b/packages/python/plotly/recipe/conda_build_config.yaml
@@ -1,2 +1,2 @@
 python:
-  - 3.6
+  - 3.8