Commit 013ae3d

datapythonista authored and TomAugspurger committed
DOC: Final reorganization of documentation pages (#24890)
* DOC: Final reorganization of documentation pages
* Move ecosystem to top level
1 parent bb86a9d commit 013ae3d

15 files changed: +103 -204 lines changed

doc/redirects.csv

+5
@@ -8,6 +8,10 @@ release,whatsnew/index
 # getting started
 10min,getting_started/10min
 basics,getting_started/basics
+comparison_with_r,getting_started/comparison/comparison_with_r
+comparison_with_sql,getting_started/comparison/comparison_with_sql
+comparison_with_sas,getting_started/comparison/comparison_with_sas
+comparison_with_stata,getting_started/comparison/comparison_with_stata
 dsintro,getting_started/dsintro
 overview,getting_started/overview
 tutorials,getting_started/tutorials
@@ -16,6 +20,7 @@ tutorials,getting_started/tutorials
 advanced,user_guide/advanced
 categorical,user_guide/categorical
 computation,user_guide/computation
+cookbook,user_guide/cookbook
 enhancingperf,user_guide/enhancingperf
 gotchas,user_guide/gotchas
 groupby,user_guide/groupby
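
Each row in the redirects file pairs an old page name with its new location. A minimal sketch of how such a file could be read into a lookup table follows. The helper name ``load_redirects`` and the inline sample are hypothetical, and this is not necessarily how the pandas doc build consumes the file:

```python
import csv
import io

# Sample rows copied from the hunk above (a subset, for illustration only).
SAMPLE = """\
# getting started
10min,getting_started/10min
basics,getting_started/basics
comparison_with_r,getting_started/comparison/comparison_with_r
cookbook,user_guide/cookbook
"""


def load_redirects(text):
    """Return {old_page: new_page}, skipping blank lines and '#' comments.

    Hypothetical helper: assumes every data row has exactly two fields.
    """
    redirects = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue
        old, new = row
        redirects[old.strip()] = new.strip()
    return redirects


redirects = load_redirects(SAMPLE)
print(redirects["cookbook"])
```

With a table like this, a request for the old top-level ``cookbook`` page could be pointed at ``user_guide/cookbook``, which is the move this commit records.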
@@ -0,0 +1,15 @@
+{{ header }}
+
+.. _comparison:
+
+===========================
+Comparison with other tools
+===========================
+
+.. toctree::
+    :maxdepth: 2
+
+    comparison_with_r
+    comparison_with_sql
+    comparison_with_sas
+    comparison_with_stata

doc/source/getting_started/index.rst

+1
@@ -13,4 +13,5 @@ Getting started
     10min
     basics
     dsintro
+    comparison/index
     tutorials

doc/source/getting_started/overview.rst

+74 -19
@@ -6,25 +6,80 @@
 Package overview
 ****************

-:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
-easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
-programming language.
-
-:mod:`pandas` consists of the following elements:
-
-* A set of labeled array data structures, the primary of which are
-  Series and DataFrame.
-* Index objects enabling both simple axis indexing and multi-level /
-  hierarchical axis indexing.
-* An integrated group by engine for aggregating and transforming data sets.
-* Date range generation (date_range) and custom date offsets enabling the
-  implementation of customized frequencies.
-* Input/Output tools: loading tabular data from flat files (CSV, delimited,
-  Excel 2003), and saving and loading pandas objects from the fast and
-  efficient PyTables/HDF5 format.
-* Memory-efficient "sparse" versions of the standard data structures for storing
-  data that is mostly missing or mostly constant (some fixed value).
-* Moving window statistics (rolling mean, rolling standard deviation, etc.).
+**pandas** is a `Python <https://www.python.org>`__ package providing fast,
+flexible, and expressive data structures designed to make working with
+"relational" or "labeled" data both easy and intuitive. It aims to be the
+fundamental high-level building block for doing practical, **real world** data
+analysis in Python. Additionally, it has the broader goal of becoming **the
+most powerful and flexible open source data analysis / manipulation tool
+available in any language**. It is already well on its way toward this goal.
+
+pandas is well suited for many different kinds of data:
+
+- Tabular data with heterogeneously-typed columns, as in an SQL table or
+  Excel spreadsheet
+- Ordered and unordered (not necessarily fixed-frequency) time series data.
+- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
+  column labels
+- Any other form of observational / statistical data sets. The data actually
+  need not be labeled at all to be placed into a pandas data structure
+
+The two primary data structures of pandas, :class:`Series` (1-dimensional)
+and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
+cases in finance, statistics, social science, and many areas of
+engineering. For R users, :class:`DataFrame` provides everything that R's
+``data.frame`` provides and much more. pandas is built on top of `NumPy
+<https://www.numpy.org>`__ and is intended to integrate well within a scientific
+computing environment with many other 3rd party libraries.
+
+Here are just a few of the things that pandas does well:
+
+- Easy handling of **missing data** (represented as NaN) in floating point as
+  well as non-floating point data
+- Size mutability: columns can be **inserted and deleted** from DataFrame and
+  higher dimensional objects
+- Automatic and explicit **data alignment**: objects can be explicitly
+  aligned to a set of labels, or the user can simply ignore the labels and
+  let `Series`, `DataFrame`, etc. automatically align the data for you in
+  computations
+- Powerful, flexible **group by** functionality to perform
+  split-apply-combine operations on data sets, for both aggregating and
+  transforming data
+- Make it **easy to convert** ragged, differently-indexed data in other
+  Python and NumPy data structures into DataFrame objects
+- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
+  of large data sets
+- Intuitive **merging** and **joining** data sets
+- Flexible **reshaping** and pivoting of data sets
+- **Hierarchical** labeling of axes (possible to have multiple labels per
+  tick)
+- Robust IO tools for loading data from **flat files** (CSV and delimited),
+  Excel files, databases, and saving / loading data from the ultrafast **HDF5
+  format**
+- **Time series**-specific functionality: date range generation and frequency
+  conversion, moving window statistics, moving window linear regressions,
+  date shifting and lagging, etc.
+
+Many of these principles are here to address the shortcomings frequently
+experienced using other languages / scientific research environments. For data
+scientists, working with data is typically divided into multiple stages:
+munging and cleaning data, analyzing / modeling it, then organizing the results
+of the analysis into a form suitable for plotting or tabular display. pandas
+is the ideal tool for all of these tasks.
+
+Some other notes
+
+- pandas is **fast**. Many of the low-level algorithmic bits have been
+  extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
+  anything else generalization usually sacrifices performance. So if you focus
+  on one feature for your application you may be able to create a faster
+  specialized tool.
+
+- pandas is a dependency of `statsmodels
+  <https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
+  statistical computing ecosystem in Python.
+
+- pandas has been used extensively in production in financial applications.

 Data Structures
 ---------------
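
Three of the features named in the overview text moved by this hunk (missing data, automatic data alignment, and group by) can be illustrated with a short sketch. The data and variable names below are made up for illustration and are not part of the commit:

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels (made-up data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Automatic alignment: values are matched by label, not by position.
# Label "x" has no partner in b, so the result there is NaN (missing).
total = a + b

# Missing-data handling: replace the NaN with a chosen fill value.
filled = total.fillna(0.0)

# Split-apply-combine: group rows by "key" and sum each group.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
sums = df.groupby("key")["value"].sum()

print(total)
print(filled)
print(sums)
```

The addition aligns on the union of the two indexes, which is why the unmatched label shows up as missing rather than raising an error.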

doc/source/index.rst.template

+6 -90
@@ -22,93 +22,15 @@ pandas: powerful Python data analysis toolkit

 **Developer Mailing List:** https://groups.google.com/forum/#!forum/pydata

-**pandas** is a `Python <https://www.python.org>`__ package providing fast,
-flexible, and expressive data structures designed to make working with
-"relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real world** data
-analysis in Python. Additionally, it has the broader goal of becoming **the
-most powerful and flexible open source data analysis / manipulation tool
-available in any language**. It is already well on its way toward this goal.
-
-pandas is well suited for many different kinds of data:
-
-- Tabular data with heterogeneously-typed columns, as in an SQL table or
-  Excel spreadsheet
-- Ordered and unordered (not necessarily fixed-frequency) time series data.
-- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
-  column labels
-- Any other form of observational / statistical data sets. The data actually
-  need not be labeled at all to be placed into a pandas data structure
-
-The two primary data structures of pandas, :class:`Series` (1-dimensional)
-and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
-cases in finance, statistics, social science, and many areas of
-engineering. For R users, :class:`DataFrame` provides everything that R's
-``data.frame`` provides and much more. pandas is built on top of `NumPy
-<https://www.numpy.org>`__ and is intended to integrate well within a scientific
-computing environment with many other 3rd party libraries.
-
-Here are just a few of the things that pandas does well:
-
-- Easy handling of **missing data** (represented as NaN) in floating point as
-  well as non-floating point data
-- Size mutability: columns can be **inserted and deleted** from DataFrame and
-  higher dimensional objects
-- Automatic and explicit **data alignment**: objects can be explicitly
-  aligned to a set of labels, or the user can simply ignore the labels and
-  let `Series`, `DataFrame`, etc. automatically align the data for you in
-  computations
-- Powerful, flexible **group by** functionality to perform
-  split-apply-combine operations on data sets, for both aggregating and
-  transforming data
-- Make it **easy to convert** ragged, differently-indexed data in other
-  Python and NumPy data structures into DataFrame objects
-- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
-  of large data sets
-- Intuitive **merging** and **joining** data sets
-- Flexible **reshaping** and pivoting of data sets
-- **Hierarchical** labeling of axes (possible to have multiple labels per
-  tick)
-- Robust IO tools for loading data from **flat files** (CSV and delimited),
-  Excel files, databases, and saving / loading data from the ultrafast **HDF5
-  format**
-- **Time series**-specific functionality: date range generation and frequency
-  conversion, moving window statistics, moving window linear regressions,
-  date shifting and lagging, etc.
-
-Many of these principles are here to address the shortcomings frequently
-experienced using other languages / scientific research environments. For data
-scientists, working with data is typically divided into multiple stages:
-munging and cleaning data, analyzing / modeling it, then organizing the results
-of the analysis into a form suitable for plotting or tabular display. pandas
-is the ideal tool for all of these tasks.
-
-Some other notes
-
-- pandas is **fast**. Many of the low-level algorithmic bits have been
-  extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
-  anything else generalization usually sacrifices performance. So if you focus
-  on one feature for your application you may be able to create a faster
-  specialized tool.
-
-- pandas is a dependency of `statsmodels
-  <https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
-  statistical computing ecosystem in Python.
-
-- pandas has been used extensively in production in financial applications.
-
-.. note::
-
-   This documentation assumes general familiarity with NumPy. If you haven't
-   used NumPy much or at all, do invest some time in `learning about NumPy
-   <https://docs.scipy.org>`__ first.
-
-See the package overview for more detail about what's in the library.
+:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
+easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
+programming language.

+See the :ref:`overview` for more detail about what's in the library.

 {% if single_doc and single_doc.endswith('.rst') -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2

     {{ single_doc[:-4] }}
 {% elif single_doc %}
@@ -118,21 +40,15 @@ See the package overview for more detail about what's in the library.
     {{ single_doc }}
 {% else -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 {% endif %}

 {% if not single_doc -%}
     What's New <whatsnew/v0.24.0>
     install
     getting_started/index
-    cookbook
     user_guide/index
-    r_interface
     ecosystem
-    comparison_with_r
-    comparison_with_sql
-    comparison_with_sas
-    comparison_with_stata
 {% endif -%}
 {% if include_api -%}
     api/index

doc/source/r_interface.rst

-94
This file was deleted.
File renamed without changes.

doc/source/user_guide/index.rst

+1
@@ -37,3 +37,4 @@ Further information on any specific method can be obtained in the
     enhancingperf
     sparse
     gotchas
+    cookbook

doc/source/user_guide/style.ipynb

+1 -1
@@ -1133,7 +1133,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"with open(\"template_structure.html\") as f:\n",
+"with open(\"templates/template_structure.html\") as f:\n",
 " structure = f.read()\n",
 " \n",
 "HTML(structure)"
File renamed without changes.
