Intel® Scalable Dataframe Compiler

Numba* Extension For Pandas* Operations Compilation

Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables compilation of Pandas* operations. It automatically vectorizes and parallelizes the code by leveraging modern hardware instructions and by utilizing all available cores.

Intel® SDC documentation can be found here.

Intel® SDC requires a special Numba build, based on the 0.48.0 tag, both to build and to run. The required Numba version can be installed from the intel/label/beta channel on the Anaconda Cloud.

Note

For maximum performance and stability, use Numba from the intel/label/beta channel.

Installing Binary Packages (conda and wheel)

Intel® SDC is available on the Anaconda Cloud intel/label/beta channel. The distribution includes Intel® SDC for Python 3.6 and Python 3.7 on Windows and Linux.

The Intel® SDC conda package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6>
> conda activate sdc-env
> conda install sdc -c intel/label/beta -c intel -c defaults -c conda-forge --override-channels

The Intel® SDC wheel package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6> pip
> conda activate sdc-env
> pip install --index-url https://pypi.anaconda.org/intel/label/beta/simple --extra-index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple sdc
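
After either installation route, a quick sanity check can confirm that the expected packages are visible to the interpreter in the active environment. The following is a minimal sketch, not part of Intel® SDC: the helper name check_packages is hypothetical, and it assumes the package imports under the module name sdc. It only checks that the modules can be located, not that their versions are compatible.

```python
import importlib.util


def check_packages(names=("sdc", "numba", "pandas")):
    """Return {name: bool} telling whether each top-level module can be located.

    Hypothetical helper for illustration; it does not import the modules,
    so it cannot detect version mismatches or broken binaries.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

Run it inside the activated sdc-env environment; any MISSING entry suggests the installation step did not complete in that environment.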

Building Intel® SDC from Source on Linux

We use the Anaconda distribution of Python to set up the Intel® SDC build environment.

If you do not have conda, we recommend using Miniconda3:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH

It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Linux.

Building on Linux with conda-build

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env python=$PYVER conda-build
source activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python $PYVER --numpy $NUMPYVER --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels buildscripts/sdc-conda-recipe

Building on Linux with setuptools

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER numba=0.48.0 pandas=0.25.3 pyarrow=0.15.1 gcc_linux-64 gxx_linux-64
source activate sdc-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

In case of issues, reinstalling in a new conda environment is recommended.

Building Intel® SDC from Source on Windows

Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)):

Install Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)).
Install Miniconda for Windows.
Start the Anaconda Prompt.

It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Windows.

Building on Windows with conda-build

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
conda activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python %PYVER% --numpy %NUMPYVER% --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels buildscripts\sdc-conda-recipe

Building on Windows with setuptools

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% numba=0.48.0 pandas=0.25.3 pyarrow=0.15.1
conda activate sdc-env
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

Troubleshooting Windows Build

If the cl compiler fails with fatal error LNK1158: cannot run 'rc.exe', add Windows Kits to your PATH (e.g. C:\Program Files (x86)\Windows Kits\8.0\bin\x86).
Some errors can be mitigated by running set DISTUTILS_USE_SDK=1.
To set up Visual Studio, you might need to open the registry at HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7 and add a string value named 14.0 whose data is C:\Program Files (x86)\Microsoft Visual Studio 14.0\.
If the conda or Visual Studio version in use is not the latest, building Intel® SDC can fail with a vague error about a keyword used in a file. Make sure you are using the latest versions of both.

Building documentation

Building the Intel® SDC User's Guide documentation requires a pre-installed Intel® SDC package, a compatible Pandas* version, and Sphinx* 2.2.1 or later.

The Intel® SDC documentation includes the output of the Intel® SDC examples, which is pasted into the function descriptions in the API Reference.

Use pip to install Sphinx* and extensions:

pip install sphinx sphinxcontrib-programoutput

Currently the build procedure is based on the make files located in the ./sdc/docs/ folder. While it is not generally required, we recommend that you clean up the results of any previous documentation build by running:

make clean

To build the HTML documentation, run:

make html

The built documentation will be located in the ./sdc/docs/build/html directory. To preview the documentation, open the index.html file.

Sphinx* Generation Internals

The documentation generation is controlled by the conf.py script, which Sphinx* invokes automatically. See the Sphinx documentation for details.

The API Reference for the Intel® SDC User's Guide is auto-generated by inspecting the pandas and sdc modules. That is why these modules must be pre-installed before generating documentation with Sphinx*. However, API Reference auto-generation can be skipped by setting the environment variable SDC_DOC_NO_API_REF_STR=1.

If the environment variable SDC_DOC_NO_API_REF_STR is unset, Sphinx's conf.py invokes the generate_api_reference() function located in the ./sdc/docs/source/buildscripts/apiref_generator module. This function parses the pandas and sdc docstrings for each API, combines them into a single docstring, and writes it into an RST file named after the respective Pandas* API. The auto-generated RST files are located in the ./sdc/docs/source/_api_ref directory.

Note

Sphinx* will automatically clean the _api_ref directory on the next invocation of the documentation build.
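
The flow described in this section (check the environment variable, combine the two docstrings, write one RST file per Pandas* API) can be sketched roughly as follows. This is an illustrative sketch only, not the actual apiref_generator code: the function names api_ref_enabled, combine_docstrings, and write_api_rst are hypothetical, the file name pattern <Pandas API name>.rst is an assumption, and treating any set value of SDC_DOC_NO_API_REF_STR as "skip" is also an assumption.

```python
import os
from pathlib import Path

UNSUPPORTED_NOTE = "This API is currently unsupported."


def api_ref_enabled(environ=None):
    """Assumption: generation runs only while SDC_DOC_NO_API_REF_STR is unset."""
    environ = os.environ if environ is None else environ
    return "SDC_DOC_NO_API_REF_STR" not in environ


def combine_docstrings(pandas_doc, sdc_doc):
    """Append the SDC text to the Pandas text, or a note if SDC has no docstring."""
    tail = sdc_doc.strip() if sdc_doc else UNSUPPORTED_NOTE
    return pandas_doc.rstrip() + "\n\n" + tail + "\n"


def write_api_rst(api_name, pandas_doc, sdc_doc, out_dir):
    """Write the combined text to <out_dir>/<api_name>.rst and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rst_path = out_dir / (api_name + ".rst")
    rst_path.write_text(combine_docstrings(pandas_doc, sdc_doc), encoding="utf-8")
    return rst_path
```

The real generator does more than concatenate text (it selects individual sections from each docstring, as described in the rules below), so the sketch should be read only as an outline of the control flow.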

Intel® SDC docstring decoration rules

Since the Intel® SDC API Reference is auto-generated from the respective Pandas* and Intel® SDC docstrings, certain rules must be followed for the API description to be generated accurately.

Every Intel® SDC API must have a docstring.

If the developer does not provide a docstring, Sphinx* cannot match the Pandas* docstring with the respective SDC one. In this situation Sphinx* assumes that SDC does not support the API and adds a note to the API Reference saying This API is currently unsupported.

Follow the 'one function - one docstring' rule.

You cannot have one docstring for multiple APIs, even if they are very similar. The auto-generator assumes every SDC API is covered by its own docstring. If Sphinx* does not find the docstring for a particular API, it assumes that SDC does not support the API and adds the same This API is currently unsupported note to the API Reference.

The description (the introductory section, that is, the first few paragraphs without a title) is taken from Pandas*.

Intel® SDC developers should not include an API description in the SDC docstring. Developers are, however, encouraged to follow the Pandas API description naming conventions so that the combined docstring appears consistent.

The descriptions in the Parameters, Returns, and Raises sections are taken from the Pandas* docstring.

Intel® SDC developers should not include such descriptions in their SDC docstrings. Rather, developers are encouraged to follow the Pandas naming conventions so that the combined docstring appears consistent.

Every SDC docstring must have the following structure:

"""
Intel Scalable Dataframe Compiler User Guide
********************************************
Pandas API: <full pandas name, e.g. pandas.Series.nlargest>

<Intel® SDC specific sections>

Intel Scalable Dataframe Compiler Developer Guide
*************************************************
<Developer's Guide specific sections>
"""

The first two lines must be the User Guide header. This tells Sphinx* that the section is intended for the public API and will be combined with the respective Pandas API docstring.

Line 3 must specify which Pandas API this Intel® SDC docstring corresponds to. It must start with Pandas API: followed by the full Pandas API name. Remember to include the full name: for example, nlargest is not sufficient for the auto-generator to perform the match. The full name must be pandas.Series.nlargest.

After the User Guide sections, the docstring may contain another header indicating that the remaining part of the docstring belongs to the Developer's Guide and must not be included in the User's Guide.
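
To make the layout rules above concrete, here is a minimal sketch of how a docstring with this structure could be split into its Pandas API name and User Guide part. It is not the parser used by the auto-generator: the function name split_sdc_docstring is hypothetical, and the exact matching rules (header text compared literally, name required to start with pandas.) are assumptions made for illustration.

```python
import inspect

USER_HEADER = "Intel Scalable Dataframe Compiler User Guide"
DEV_HEADER = "Intel Scalable Dataframe Compiler Developer Guide"
API_PREFIX = "Pandas API:"


def split_sdc_docstring(doc):
    """Return (pandas_api_name, user_guide_text), or None if the layout is wrong."""
    # cleandoc removes the common indentation and surrounding blank lines.
    lines = [line.rstrip() for line in inspect.cleandoc(doc).splitlines()]
    # Lines 1-2: the User Guide header and its row of asterisks.
    if len(lines) < 3 or lines[0] != USER_HEADER or set(lines[1]) != {"*"}:
        return None
    # Line 3: "Pandas API:" followed by the full Pandas name.
    if not lines[2].startswith(API_PREFIX):
        return None
    api_name = lines[2][len(API_PREFIX):].strip()
    if not api_name.startswith("pandas."):
        return None  # a short name such as "nlargest" is not enough
    # Everything up to the optional Developer Guide header is User Guide text.
    body = lines[3:]
    if DEV_HEADER in body:
        body = body[:body.index(DEV_HEADER)]
    return api_name, "\n".join(body).strip()
```

A docstring that passes this check yields the name to match against Pandas and the text destined for the User's Guide; anything after the Developer Guide header is dropped.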

The Examples, See Also, and References sections are NOT taken from the Pandas docstring. SDC developers are expected to complete these sections in SDC docstrings.

This is because the respective Pandas sections are sometimes too Pandas-specific and not relevant to SDC. SDC developers have to rewrite those sections in Intel® SDC style. Do not forget the User Guide header and the Pandas API name before adding SDC-specific sections.

The Examples section is mandatory for every SDC API. The 'one API - at least one example' rule applies.

Examples are an essential part of the user experience and must accompany every API docstring.

Embed examples from ./sdc/examples into the Examples section.

Rather than writing an example directly in the docstring (which is error-prone), embed the relevant example script into the docstring. For example, this is how to embed an example for the pandas.Series.get() function into the respective Intel® SDC docstring:
```
"""
...
Examples
--------
.. literalinclude:: ../../../examples/series_getitem.py
   :language: python
   :lines: 27-
   :caption: Getting Pandas Series elements
   :name: ex_series_getitem

.. code-block:: console

    > python ./series_getitem.py
    55
```
In the snippet above, the script series_getitem.py is embedded into the docstring. :lines: 27- skips the lengthy copyright header of the file. :caption: provides a meaningful description of the example; it is good practice to have a caption for every example. :name: is the Sphinx* name that allows the example to be referenced from other parts of the documentation; it is good practice to include this field as well. Please follow the naming convention ex_<example file name> for consistency.

Accompany every example with the expected output using the .. code-block:: console directive.

Every Examples section must come with one or more examples illustrating all major variations of supported API parameter combinations. It is highly recommended to illustrate SDC API limitations (e.g. unsupported parameters) in example script comments.
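
Because the embedding directives follow a fixed pattern, they can be generated rather than typed by hand. The sketch below reproduces the series_getitem.py snippet shown earlier; it is not part of the Intel® SDC tooling. The function name example_directive is hypothetical, and the default first_line=27 simply mirrors the :lines: 27- value used above, so adjust it if a file's copyright header has a different length.

```python
from pathlib import PurePosixPath


def example_directive(example_path, caption, output_lines=(), first_line=27):
    """Build the literalinclude and console blocks for one example script."""
    path = PurePosixPath(example_path)
    lines = [
        ".. literalinclude:: " + example_path,
        "   :language: python",
        "   :lines: {}-".format(first_line),  # skip the copyright header
        "   :caption: " + caption,
        "   :name: ex_" + path.stem,  # naming convention ex_<example file name>
        "",
        ".. code-block:: console",
        "",
        "    > python ./" + path.name,
    ]
    # Expected output goes under the command, indented like the command itself.
    lines.extend("    " + out for out in output_lines)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(example_directive("../../../examples/series_getitem.py",
                            "Getting Pandas Series elements", ["55"]))
```

The expected output still has to come from actually running the example script; the helper only handles the formatting.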
See Also sections are highly encouraged.

It is good practice to include relevant references in the See Also section. References that are not directly related to the topic may be distracting if they are scattered across the API description. A good style is to have a dedicated section for relevant topics.

The See Also section may include references to relevant SDC and Pandas topics as well as to external ones.

A special form of See Also section is References to publications. Pandas documentation sometimes uses References section to refer to external projects. While it is not prohibited to use References section in SDC docstrings, it is better to combine all references under See Also umbrella.
Notes and Warnings must be decorated with .. note:: and .. warning:: respectively. Do not use:
```
Notes
-----

Warning
-------
```
Pay attention to indentation and required blank lines. Sphinx* is very sensitive to them.
If an SDC API does not support all variations of the respective Pandas API, a Limitations section is mandatory. While there is no specific guideline for how the Limitations section must be written, a good style is to follow the description style and naming conventions of the Pandas Parameters section.
Before committing your code for a public SDC API you are expected to:
- have the SDC docstring implemented;
- have the respective SDC examples implemented and tested;
- have the API Reference documentation generated and visually inspected. New warnings in the documentation build are not allowed.
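
Part of this checklist can be automated before the documentation build. The following is a rough sketch of such a check, assuming the rules stated above; it is not an existing Intel® SDC tool, and the names find_heading and check_docstring are hypothetical. It cannot replace the visual inspection of the generated API Reference.

```python
import inspect
import re


def find_heading(doc, title):
    """True if `title` appears as a section heading underlined with dashes."""
    pattern = r"^[ \t]*{}[ \t]*\n[ \t]*-{{3,}}[ \t]*$".format(re.escape(title))
    return re.search(pattern, doc, flags=re.MULTILINE) is not None


def check_docstring(func):
    """Return a list of rule violations found in func's docstring (empty if none)."""
    doc = inspect.getdoc(func)
    if not doc:
        return ["missing docstring"]
    problems = []
    if "Pandas API: pandas." not in doc:
        problems.append("missing 'Pandas API: <full pandas name>' line")
    if not find_heading(doc, "Examples"):
        problems.append("missing Examples section")
    # Notes and Warnings must use directives, not section headings.
    for title, directive in (("Notes", ".. note::"), ("Warning", ".. warning::")):
        if find_heading(doc, title):
            problems.append("use '{}' instead of a '{}' heading".format(directive, title))
    return problems
```

Calling check_docstring on each public function and failing when the returned list is non-empty gives an early signal, before Sphinx* reports the API as unsupported.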

Running unit tests

python sdc/tests/gen_test_data.py
python -m unittest

References

Intel® SDC follows ideas and initial code base of High-Performance Analytics Toolkit (HPAT). These academic papers describe ideas and methods behind HPAT:
