Curated list of data science software in Python
- scikit-learn compatible (or inspired) API
- pandas compatible or based on
- Theano based project
- TensorFlow based project
- PyTorch based project
- CuPy based project
- R inspired/ported lib
- MXNet based project
- Apache Spark based project
- GPU-accelerated computations (if not based on Theano, TensorFlow, PyTorch, CuPy etc.)
- possible to run on AMD GPU
- Machine Learning
- Deep Learning
- Data manipulation
- Feature engineering
- Visualization
- Model explanation
- Reinforcement Learning
- Distributed computing systems
- Probabilistic methods
- Genetic Programming
- Optimization
- Natural Language Processing
- Computer Audition
- Computer Vision
- Statistics
- Experiments tools
- Evaluation
- Computations
- Spatial analysis
- Quantum computing
- Conversion
- scikit-learn
- machine learning in Python
- Shogun - machine learning toolbox
- xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package
- Reproducible Experiment Platform (REP)
- Machine Learning toolbox for Humans
- modAL
- a modular active learning framework for Python3
- Sparkit-learn
- PySpark + Scikit-learn = Sparkit-learn
- mlpack - a scalable C++ machine learning library (Python bindings)
- dlib - A toolkit for making real world machine learning and data analysis applications in C++ (Python bindings)
- MLxtend
- extension and helper modules for Python's data analysis and machine learning libraries
- scikit-multilearn
- multi-label classification for python
- seqlearn
- seqlearn is a sequence classification toolkit for Python
- pystruct
- Simple structured learning framework for python
- sklearn-expertsys
- Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
- RuleFit
- implementation of the rulefit
- metric-learn
- metric learning algorithms in Python
- pyGAM - Generalized Additive Models in Python
- Other...
- tslearn
- machine learning toolkit dedicated to time-series data
- tick
- module for statistical learning, with a particular emphasis on time-dependent modelling
- Prophet - Automatic Forecasting Procedure
- PyFlux - Open source time series library for Python
- bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
- luminol - Anomaly Detection and Correlation library
- TPOT
- Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
- auto-sklearn
- is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
- MLBox - a powerful Automated Machine Learning python library.
- ML-Ensemble
- high performance ensemble learning
- Stacking
- Simple and useful stacking library, written in Python.
- stacked_generalization
- library for machine learning stacking generalization.
- vecstack
- Python package for stacking (machine learning technique)
- imbalanced-learn
- module to perform under sampling and over sampling with various techniques
- imbalanced-algorithms
- Python-based implementations of algorithms for learning on imbalanced data.
- rpforest
- a forest of random projection trees
- Random Forest Clustering
- Unsupervised Clustering using Random Forests
- sklearn-random-bits-forest
- wrapper of the Random Bits Forest program written by (Wang et al., 2016)
- rgf_python
- Python Wrapper of Regularized Greedy Forest
- Python-ELM
- Extreme Learning Machine implementation in Python
- Python Extreme Learning Machine (ELM) - a machine learning technique used for classification/regression tasks
- hpelm
- High performance implementation of Extreme Learning Machines (fast randomized neural networks).
- pyFM
- Factorization machines in python
- fastFM
- a library for Factorization Machines
- tffm
- TensorFlow implementation of an arbitrary order Factorization Machine
- liquidSVM - an implementation of SVMs
- scikit-rvm
- Relevance Vector Machine implementation using the scikit-learn API
- ThunderSVM
- a fast SVM Library on GPUs and CPUs
- XGBoost
- Scalable, Portable and Distributed Gradient Boosting
- LightGBM
- a fast, distributed, high performance gradient boosting by Microsoft
- CatBoost
- an open-source gradient boosting on decision trees library by Yandex
- ThunderGBM
- Fast GBDTs and Random Forests on GPUs
- Other...
- Keras - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
- keras-contrib - Keras community contributions
- Hyperas - Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
- Elephas - Distributed Deep learning with Keras & Spark
- Hera - Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
- dist-keras
- Distributed Deep Learning, with a focus on distributed training
- Conx - The On-Ramp to Deep Learning
- Spektral - deep learning on graphs
- Keras add-ons...
- TensorFlow
- Computation using data flow graphs for scalable machine learning by Google
- TensorLayer
- Deep Learning and Reinforcement Learning Library for Researcher and Engineer.
- TFLearn
- Deep learning library featuring a higher-level API for TensorFlow
- Sonnet
- TensorFlow-based neural network library by DeepMind
- TensorForce
- a TensorFlow library for applied reinforcement learning
- tensorpack
- a Neural Net Training Interface on TensorFlow
- Polyaxon
- a platform that helps you build, manage and monitor deep learning models
- NeuPy
- NeuPy is a Python library for Artificial Neural Networks and Deep Learning
- Horovod
- Distributed training framework for TensorFlow
- tfdeploy
- Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy
- hiptensorflow
- ROCm/HIP enabled Tensorflow
- TensorFlow Fold
- Deep learning with dynamic computation graphs in TensorFlow
- tensorlm
- wrapper library for text generation / language models at char and word level with RNN
- TensorLight
- a high-level framework for TensorFlow
- Mesh TensorFlow
- Model Parallelism Made Easier
- Ludwig
- a toolbox that allows you to train and test deep learning models without the need to write code.
WARNING: Theano development has been stopped
- Theano
- is a Python library that allows you to define, optimize, and evaluate mathematical expressions
- Lasagne
- Lightweight library to build and train neural networks in Theano
- Lasagne add-ons...
- nolearn
- scikit-learn compatible neural network library (mainly for Lasagne)
- Blocks
- a Theano framework for building and training neural networks
- platoon
- Multi-GPU mini-framework for Theano
- scikit-neuralnetwork
- Deep neural networks without the learning cliff
- Theano-MPI
- MPI Parallel framework for training deep learning models built in Theano
- PyTorch
- Tensors and Dynamic neural networks in Python with strong GPU acceleration
- torchvision
- Datasets, Transforms and Models specific to Computer Vision
- torchtext
- Data loaders and abstractions for text and NLP
- torchaudio
- an audio library for PyTorch
- ignite
- high-level library to help with training neural networks in PyTorch
- PyToune - a Keras-like framework and utilities for PyTorch
- skorch
- a scikit-learn compatible neural network library that wraps pytorch
- PyTorchNet
- an abstraction to train neural networks
- Aorun
- intends to implement an API similar to Keras with PyTorch as backend.
- pytorch_geometric
- Geometric Deep Learning Extension Library for PyTorch
- MXNet
- Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
- Gluon
- a clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
- MXbox
- simple, efficient and flexible vision toolbox for mxnet framework.
- gluon-cv
- provides implementations of the state-of-the-art deep learning models in computer vision.
- gluon-nlp
- NLP made easy
- Xfer
- Transfer Learning library for Deep Neural Networks
- MXNet
- HIP Port of MXNet
- Caffe - a fast open framework for deep learning
- Caffe2 - a lightweight, modular, and scalable deep learning framework
- hipCaffe
- the HIP port of Caffe
- Chainer - a flexible framework for neural networks
- ChainerRL - a deep reinforcement learning library built on top of Chainer.
- ChainerCV - a Library for Deep Learning in Computer Vision
- ChainerMN - scalable distributed deep learning with Chainer
- scikit-chainer
- scikit-learn like interface to chainer
- chainer_sklearn
- Sklearn (Scikit-learn) like interface for Chainer
- CNTK - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
- Neon - Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
- Tangent - Source-to-Source Debuggable Derivatives in Pure Python
- autograd - Efficiently computes derivatives of numpy code
- Myia - deep learning framework (pre-alpha)
- nnabla - Neural Network Libraries by Sony
- pandas - powerful Python data analysis toolkit
- blaze
- NumPy and Pandas interface to Big Data
- pandasql
- allows you to query pandas DataFrames using SQL syntax
- pandas-gbq
- Pandas Google Big Query
- xpandas - universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute
- pysparkling
- a pure Python implementation of Apache Spark's RDD and DStream interfaces
- Arctic - high performance datastore for time series and tick data
- datatable
- data.table for Python
- koalas
- pandas API on Apache Spark
- Fuel - data pipeline framework for machine learning
- pdpipe - easy pipelines for pandas DataFrames.
- SSPipe - Python pipe (|) operator with support for DataFrames and Numpy and Pytorch
- meza - a Python toolkit for processing tabular data
- pandas-ply
- functional data manipulation for pandas
- Dplython
- Dplyr for Python
- sklearn-pandas
- Pandas integration with sklearn
- quinn
- pyspark methods to enhance developer productivity
- Dataset - helps you conveniently work with random or sequential batches of your data and define data processing
- swifter - a package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
- pyjanitor
- Clean APIs for data cleaning
- modin
- speed up your Pandas workflows by changing a single line of code
- Prodmodel - build system for data science pipelines
- Featuretools - automated feature engineering
- skl-groups
- scikit-learn addon to operate on set/"group"-based features
- Feature Forge
- a set of tools for creating and testing machine learning features
- few
- a feature engineering wrapper for sklearn
- scikit-mdr
- a sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
- tsfresh
- Automatic extraction of relevant features from time series
- scikit-feature - feature selection repository in python
- boruta_py
- implementations of the Boruta all-relevant feature selection method
- BoostARoota
- a fast xgboost feature selection algorithm
- scikit-rebate
- a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
- Alibi - Algorithms for monitoring and explaining machine learning models
- Auralisation - auralisation of learned features in CNN (for audio)
- CapsNet-Visualization - a visualization of the CapsNet layers to better understand how it works
- lucid - a collection of infrastructure and tools for research in neural network interpretability.
- Netron - visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
- FlashLight - visualization Tool for your NeuralNetwork
- tensorboard-pytorch - tensorboard for pytorch (and chainer, mxnet, numpy, ...)
- anchor - code for "High-Precision Model-Agnostic Explanations" paper
- aequitas - Bias and Fairness Audit Toolkit
- Contrastive Explanation
- Contrastive Explanation (Foil Trees)
- yellowbrick
- visual analysis and diagnostic tools to facilitate machine learning model selection
- scikit-plot
- an intuitive library to add plotting functionality to scikit-learn objects
- shap
- a unified approach to explain the output of any machine learning model
- ELI5 - a library for debugging/inspecting machine learning classifiers and explaining their predictions
- Lime
- Explaining the predictions of any machine learning classifier
- FairML
- FairML is a python toolbox auditing the machine learning models for bias
- L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
- PDPbox - partial dependence plot toolbox
- pyBreakDown
- Python implementation of R package breakDown
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox
- Skater - Python Library for Model Interpretation
- model-analysis
- Model analysis tools for TensorFlow
- themis-ml
- a library that implements fairness-aware machine learning algorithms
- treeinterpreter
- interpreting scikit-learn's decision tree and random forest predictions
- mxboard - Logging MXNet data for visualization in TensorBoard
- OpenAI Gym - a toolkit for developing and comparing reinforcement learning algorithms.
- PySpark
- exposes the Spark programming model to Python
- Veles - Distributed machine learning platform by Samsung
- Jubatus - Framework and Library for Distributed Online Machine Learning
- DMTK - Microsoft Distributed Machine Learning Toolkit
- PaddlePaddle - PArallel Distributed Deep LEarning by Baidu
- dask-ml
- Distributed and parallel machine learning
- Distributed - Distributed computation in Python
- pomegranate
- probabilistic and graphical models for Python
- pyro
- a flexible, scalable deep probabilistic programming library built on PyTorch.
- ZhuSuan
- Bayesian Deep Learning
- PyMC - Bayesian Stochastic Modelling in Python
- PyMC3
- Python package for Bayesian statistical modeling and Probabilistic Machine Learning
- sampled - Decorator for reusable models in PyMC3
- Edward
- A library for probabilistic modeling, inference, and criticism.
- InferPy
- Deep Probabilistic Modelling Made Easy
- GPflow
- Gaussian processes in TensorFlow
- PyStan - Bayesian inference using the No-U-Turn sampler (Python interface)
- gelato
- Bayesian dessert for Lasagne
- sklearn-bayes
- Python package for Bayesian Machine Learning with scikit-learn API
- skggm
- estimation of general graphical models
- pgmpy - a python library for working with Probabilistic Graphical Models.
- skpro
- supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute
- Aboleth
- a bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation
- PtStat
- Probabilistic Programming and Statistical Inference in PyTorch
- PyVarInf
- Bayesian Deep Learning methods with Variational Inference for PyTorch
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC
- hsmmlearn - a library for hidden semi-Markov models with explicit durations
- pyhsmm - bayesian inference in HSMMs and HMMs
- GPyTorch
- a highly efficient and modular implementation of Gaussian Processes in PyTorch
- MXFusion
- Modular Probabilistic Programming on MXNet
- sklearn-crfsuite
- scikit-learn inspired API for CRFsuite
- gplearn
- Genetic Programming in Python
- DEAP - Distributed Evolutionary Algorithms in Python
- karoo_gp
- A Genetic Programming platform for Python with GPU support
- monkeys - A strongly-typed genetic programming framework for Python
- sklearn-genetic
- Genetic feature selection module for scikit-learn
- Spearmint - Bayesian optimization
- BoTorch
- Bayesian optimization in PyTorch
- SMAC3 - Sequential Model-based Algorithm Configuration
- Optunity - is a library containing various optimizers for hyperparameter tuning.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python
- hyperopt-sklearn
- hyper-parameter optimization for sklearn
- sklearn-deap
- use evolutionary algorithms instead of gridsearch in scikit-learn
- sigopt_sklearn
- SigOpt wrappers for scikit-learn methods
- Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
- SafeOpt - Safe Bayesian Optimization
- scikit-optimize - Sequential model-based optimization with a scipy.optimize interface
- Solid - A comprehensive gradient-free optimization framework written in Python
- PySwarms - A research toolkit for particle swarm optimization in Python
- Platypus - A Free and Open Source Python Library for Multiobjective Optimization
- GPflowOpt
- Bayesian Optimization using GPflow
- POT - Python Optimal Transport library
- Talos - Hyperparameter Optimization for Keras Models
- nlopt - library for nonlinear optimization (global and local, constrained or unconstrained)
- NLTK - modules, data sets, and tutorials supporting research and development in Natural Language Processing
- CLTK - The Classical Language Toolkit
- gensim - Topic Modelling for Humans
- PSI-Toolkit - a natural language processing toolkit by Adam Mickiewicz University in Poznań
- pyMorfologik - Python binding for Morfologik (Polish morphological analyzer)
- skift
- scikit-learn wrappers for Python fastText.
- Phonemizer - Simple text to phonemes converter for multiple languages
- flair - very simple framework for state-of-the-art NLP by Zalando Research
- librosa - Python library for audio and music analysis
- Yaafe - Audio features extraction
- aubio - a library for audio and music analysis
- Essentia - library for audio and music analysis, description and synthesis
- LibXtract - is a simple, portable, lightweight library of audio feature extraction functions
- Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals
- muda - a library for augmenting annotated audio data
- madmom - Python audio and music signal processing library
- OpenCV - Open Source Computer Vision Library
- scikit-image - Image Processing SciKit (Toolbox for SciPy)
- imgaug - image augmentation for machine learning experiments
- imgaug_extension - additional augmentations for imgaug
- Augmentor - Image augmentation library in Python for machine learning
- albumentations - fast image augmentation library and easy to use wrapper around other libraries
- pandas_summary
- extension to pandas dataframes describe function
- Pandas Profiling
- Create HTML profiling reports from pandas DataFrame objects
- statsmodels - statistical modeling and econometrics in Python
- stockstats - Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
- simplestatistics - simple statistical functions implemented in readable Python.
- weightedcalcs - pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
- scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests
- pysie - provides python implementation of statistical inference engine
- Sacred - a tool to help you configure, organize, log and reproduce experiments by IDSIA
- Xcessiv - a web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling
- Persimmon
- A visual dataflow programming language for sklearn
- Ax - Adaptive Experimentation Platform
- Matplotlib - plotting with Python
- seaborn - statistical data visualization using matplotlib
- Bokeh - Interactive Web Plotting for Python
- HoloViews - stop plotting your data - annotate your data and let it visualize itself
- Alphalens - performance analysis of predictive (alpha) stock factors by Quantopian
- python-ternary - ternary plotting library for python with matplotlib
- missingno - Missing data visualization module for Python
- recmetrics - library of useful metrics and plots for evaluating recommender systems
- kaggle-metrics - Metrics for Kaggle competitions
- Metrics - machine learning evaluation metric
- sklearn-evaluation - scikit-learn model evaluation made easy: plots, tables and markdown reports
- numpy - the fundamental package needed for scientific computing with Python.
- Dask
- parallel computing with task scheduling
- bottleneck - Fast NumPy array functions written in C
- minpy - NumPy interface with mixed backend execution
- CuPy - NumPy-like API accelerated with CUDA
- scikit-tensor - Python library for multilinear algebra and tensor factorizations
- numdifftools - solve automatic numerical differentiation problems in one or more variables
- quaternion - Add built-in support for quaternions to numpy
- adaptive - Tools for adaptive and parallel sampling of mathematical functions
- QML - a Python Toolkit for Quantum Machine Learning
- sklearn-porter - transpile trained scikit-learn estimators to C, Java, JavaScript and others
- ONNX - Open Neural Network Exchange
- MMdnn - a set of tools to help users inter-operate among different deep learning frameworks.