Byblonomikon/Python-Data-Science-Software

Awesome Python Data Science

Curated list of data science software in Python

Legend:

  • scikit-learn compatible (or inspired) API
  • pandas compatible or based on pandas
  • Theano based project
  • TensorFlow based project
  • PyTorch based project
  • CuPy based project
  • R inspired/ported lib
  • MXNet based project
  • Apache Spark based project
  • GPU-accelerated computations (if not based on Theano, TensorFlow, PyTorch, CuPy etc.)
  • possible to run on AMD GPU
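Many entries in this list have a scikit-learn compatible (or inspired) API. That convention usually means: hyperparameters are only stored in `__init__`, `get_params`/`set_params` expose them, `fit` returns `self`, learned attributes end with a trailing underscore, and `predict` works on new data. The dependency-free sketch below shows those conventions. `MajorityClassifier` is a hypothetical class. A real compatible estimator would normally subclass `sklearn.base.BaseEstimator`, which provides `get_params`/`set_params` automatically.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical estimator that follows scikit-learn conventions.

    It predicts the most frequent training label among labels that occur
    at least `min_count` times. Sketch only: no input validation.
    """

    def __init__(self, min_count=1):
        # Hyperparameters are only stored here; all learning happens in fit().
        self.min_count = min_count

    def get_params(self, deep=True):
        # Used by scikit-learn tools such as clone() and grid search.
        return {"min_count": self.min_count}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        counts = Counter(y)
        eligible = {label: n for label, n in counts.items() if n >= self.min_count}
        if not eligible:
            raise ValueError("no label occurs at least min_count times")
        # Learned state uses a trailing underscore, and fit() returns self.
        self.majority_ = max(eligible, key=eligible.get)
        return self

    def predict(self, X):
        # One prediction per input row; the features are ignored here.
        return [self.majority_ for _ in X]


clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "a"])
print(clf.predict([[5], [6]]))  # ['a', 'a']
```

Because `fit` returns `self`, calls can be chained as in the last lines. Tools like pipelines and grid search rely on exactly this contract.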


Machine Learning

General purpose Machine Learning

  • scikit-learn - machine learning in Python
  • Shogun - machine learning toolbox
  • xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package
  • Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans
  • modAL - a modular active learning framework for Python 3
  • Sparkit-learn - PySpark + Scikit-learn = Sparkit-learn
  • mlpack - a scalable C++ machine learning library (Python bindings)
  • dlib - A toolkit for making real world machine learning and data analysis applications in C++ (Python bindings)
  • MLxtend - extension and helper modules for Python's data analysis and machine learning libraries
  • scikit-multilearn - multi-label classification for Python
  • seqlearn - a sequence classification toolkit for Python
  • pystruct - Simple structured learning framework for Python
  • sklearn-expertsys - Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black box models
  • RuleFit - implementation of the RuleFit algorithm
  • metric-learn - metric learning algorithms in Python
  • pyGAM - Generalized Additive Models in Python
  • Other...

Time series

  • tslearn - machine learning toolkit dedicated to time-series data
  • tick - module for statistical learning, with a particular emphasis on time-dependent modelling
  • Prophet - Automatic Forecasting Procedure
  • PyFlux - Open source time series library for Python
  • bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
  • luminol - Anomaly Detection and Correlation library
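One of the simplest approaches to time-series anomaly detection is to flag points whose z-score against a trailing window exceeds a threshold. The sketch below does that. The function `rolling_zscore_anomalies` and its defaults (`window=5`, `threshold=3.0`) are assumptions for this example. This is not the algorithm used by luminol or any other library listed here.

```python
import statistics


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window.

    Each point is compared with the mean and population standard deviation
    of the `window` points before it; points with |z| > threshold are flagged.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Perfectly flat history: flag any change at all.
            if series[i] != mean:
                anomalies.append(i)
            continue
        z = (series[i] - mean) / std
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


data = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
print(rolling_zscore_anomalies(data))  # [7]
```

The spike at index 7 also inflates the window statistics for the next few points. This is one reason production tools use more robust estimators.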

Automated machine learning

  • TPOT - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
  • auto-sklearn - an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
  • MLBox - a powerful Automated Machine Learning Python library.

Ensemble methods

  • ML-Ensemble - high performance ensemble learning
  • Stacking - Simple and useful stacking library, written in Python.
  • stacked_generalization - library for machine learning stacking generalization.
  • vecstack - Python package for stacking (machine learning technique)

Imbalanced datasets

  • imbalanced-learn - module to perform under-sampling and over-sampling with various techniques
  • imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data.
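Random oversampling is one of the simplest techniques for imbalanced data. Minority-class samples are duplicated at random until every class matches the largest one. The stdlib sketch below shows the idea. `random_oversample` is a hypothetical helper. In practice, imbalanced-learn's `RandomOverSampler` (via its `fit_resample` method) covers the same idea, alongside more sophisticated methods such as SMOTE.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples belonging to this class.
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X_res, y_res = random_oversample([[0], [1], [2], [3], [4]], [0, 0, 0, 0, 1])
print(sorted(y_res))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Exact duplicates can encourage overfitting to the few minority points. Synthetic methods such as SMOTE exist to address this.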

Random Forests

Extreme Learning Machine

  • Python-ELM - Extreme Learning Machine implementation in Python
  • Python Extreme Learning Machine (ELM) - a machine learning technique used for classification/regression tasks
  • hpelm - High performance implementation of Extreme Learning Machines (fast randomized neural networks).

Kernel methods

  • pyFM - Factorization machines in Python
  • fastFM - a library for Factorization Machines
  • tffm - TensorFlow implementation of an arbitrary order Factorization Machine
  • liquidSVM - an implementation of SVMs
  • scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API
  • ThunderSVM - a fast SVM Library on GPUs and CPUs
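Factorization machines (pyFM, fastFM, tffm) model pairwise feature interactions through low-rank latent vectors. The second-order score is w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The pairwise sum can be rewritten as ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²], which costs O(k·n) instead of O(k·n²). The sketch below computes one prediction both ways with plain Python lists. The function names are made up for illustration, and training (e.g. by SGD) is omitted.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one dense feature vector.

    x: n feature values, w0: global bias, w: n linear weights,
    V: n rows of k latent factors. Uses the O(k*n) reformulation.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, computed directly from the sum over feature pairs i < j."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


x = [1.0, 2.0, 0.0]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, 2.0]]
print(fm_predict(x, 0.5, [1.0, -1.0, 2.0], V))  # 0.5
```

Factorizing the interaction weights lets the model estimate interactions between features that rarely co-occur. This is why FMs are popular for sparse data.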

Gradient boosting

  • XGBoost - Scalable, Portable and Distributed Gradient Boosting
  • LightGBM - a fast, distributed, high performance gradient boosting framework by Microsoft
  • CatBoost - an open-source gradient boosting on decision trees library by Yandex
  • ThunderGBM - Fast GBDTs and Random Forests on GPUs
  • Other...

Deep Learning

Keras

  • Keras - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
  • keras-contrib - Keras community contributions
  • Hyperas - Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
  • Elephas - Distributed Deep learning with Keras & Spark
  • Hera - Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
  • dist-keras - Distributed Deep Learning, with a focus on distributed training
  • Conx - The On-Ramp to Deep Learning
  • Spektral - deep learning on graphs
  • Keras add-ons...

TensorFlow

  • TensorFlow - Computation using data flow graphs for scalable machine learning by Google
  • TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
  • TFLearn - Deep learning library featuring a higher-level API for TensorFlow
  • Sonnet - TensorFlow-based neural network library by DeepMind
  • TensorForce - a TensorFlow library for applied reinforcement learning
  • tensorpack - a Neural Net Training Interface on TensorFlow
  • Polyaxon - a platform that helps you build, manage and monitor deep learning models
  • NeuPy - a Python library for Artificial Neural Networks and Deep Learning (previously based on Theano)
  • Horovod - Distributed training framework for TensorFlow
  • tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy
  • hiptensorflow - ROCm/HIP enabled TensorFlow
  • TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow
  • tensorlm - wrapper library for text generation / language models at char and word level with RNN
  • TensorLight - a high-level framework for TensorFlow
  • Mesh TensorFlow - Model Parallelism Made Easier
  • Ludwig - a toolbox that allows you to train and test deep learning models without the need to write code.

Theano

WARNING: Theano development has been stopped

  • Theano - a Python library that allows you to define, optimize, and evaluate mathematical expressions
  • Lasagne - Lightweight library to build and train neural networks in Theano (see also: Lasagne add-ons...)
  • nolearn - scikit-learn compatible neural network library (mainly for Lasagne)
  • Blocks - a Theano framework for building and training neural networks
  • platoon - Multi-GPU mini-framework for Theano
  • scikit-neuralnetwork - Deep neural networks without the learning cliff
  • Theano-MPI - MPI Parallel framework for training deep learning models built in Theano

PyTorch

  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • torchvision - Datasets, Transforms and Models specific to Computer Vision
  • torchtext - Data loaders and abstractions for text and NLP
  • torchaudio - an audio library for PyTorch
  • ignite - high-level library to help with training neural networks in PyTorch
  • PyToune - a Keras-like framework and utilities for PyTorch
  • skorch - a scikit-learn compatible neural network library that wraps PyTorch
  • PyTorchNet - an abstraction to train neural networks
  • Aorun - intends to implement an API similar to Keras with PyTorch as backend.
  • pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch

MXNet

  • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
  • Gluon - a clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
  • MXbox - simple, efficient and flexible vision toolbox for the MXNet framework.
  • gluon-cv - provides implementations of state-of-the-art deep learning models in computer vision.
  • gluon-nlp - NLP made easy
  • Xfer - Transfer Learning library for Deep Neural Networks
  • MXNet - HIP Port of MXNet

Caffe

  • Caffe - a fast open framework for deep learning
  • Caffe2 - a lightweight, modular, and scalable deep learning framework
  • hipCaffe - the HIP port of Caffe

Chainer

  • Chainer - a flexible framework for neural networks
  • ChainerRL - a deep reinforcement learning library built on top of Chainer.
  • ChainerCV - a Library for Deep Learning in Computer Vision
  • ChainerMN - scalable distributed deep learning with Chainer
  • scikit-chainer - scikit-learn like interface to Chainer
  • chainer_sklearn - Sklearn (Scikit-learn) like interface for Chainer

Others

  • CNTK - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
  • Neon - Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
  • Tangent - Source-to-Source Debuggable Derivatives in Pure Python
  • autograd - Efficiently computes derivatives of numpy code
  • Myia - deep learning framework (pre-alpha)
  • nnabla - Neural Network Libraries by Sony
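autograd and Tangent compute derivatives of ordinary Python/NumPy code. The smallest self-contained way to see automatic differentiation at work is forward mode with dual numbers, sketched below. This is a different technique from the reverse-mode differentiation autograd primarily uses. `Dual` and `derivative` are illustrative names, and only `+`, `-` and `*` are supported.

```python
class Dual:
    """Dual number value + deriv*eps with eps**2 == 0; deriv carries d/dx."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).deriv


print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

Forward mode costs one pass per input variable. Reverse mode, as used for neural network training, instead costs one pass per output, which is why it dominates when there are many parameters and a single scalar loss.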

Data manipulation

Data Containers

  • pandas - powerful Python data analysis toolkit
  • blaze - NumPy and Pandas interface to Big Data
  • pandasql - allows you to query pandas DataFrames using SQL syntax
  • pandas-gbq - pandas interface to Google BigQuery
  • xpandas - universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute
  • pysparkling - a pure Python implementation of Apache Spark's RDD and DStream interfaces
  • Arctic - high performance datastore for time series and tick data
  • datatable - data.table for Python
  • koalas - pandas API on Apache Spark

Pipelines

  • Fuel - data pipeline framework for machine learning
  • pdpipe - easy pipelines for pandas DataFrames.
  • SSPipe - Python pipe (|) operator with support for DataFrames, NumPy and PyTorch
  • meza - a Python toolkit for processing tabular data
  • pandas-ply - functional data manipulation for pandas
  • Dplython - Dplyr for Python
  • sklearn-pandas - Pandas integration with sklearn
  • quinn - PySpark methods to enhance developer productivity
  • Dataset - helps you conveniently work with random or sequential batches of your data and define data processing
  • swifter - a package which efficiently applies any function to a pandas DataFrame or Series in the fastest available manner
  • pyjanitor - Clean APIs for data cleaning
  • modin - speed up your Pandas workflows by changing a single line of code
  • Prodmodel - build system for data science pipelines
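Pipe-style chaining, as in SSPipe, builds on Python's operator overloading. When the left operand of `|` cannot handle the right one, Python falls back to the right operand's `__ror__`. The sketch below shows that mechanism with a hypothetical `Pipe` wrapper. It is not SSPipe's actual implementation. Objects that define their own `|` (such as pandas DataFrames) may need extra handling.

```python
class Pipe:
    """Wrap a callable so that `value | Pipe(func, *args)` calls func(value, *args)."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, value):
        # Reached when the left operand has no usable __or__ for a Pipe
        # (e.g. list, float, str), so Python tries the reflected operation.
        return self.func(value, *self.args, **self.kwargs)


result = [3, 1, 2] | Pipe(sorted) | Pipe(lambda xs: [x * 10 for x in xs])
print(result)  # [10, 20, 30]
```

Because `|` is left-associative, each stage receives the output of the previous one. This gives the left-to-right reading order that pipeline libraries aim for.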

Feature engineering

General

  • Featuretools - automated feature engineering
  • skl-groups - scikit-learn addon to operate on set/"group"-based features
  • Feature Forge - a set of tools for creating and testing machine learning features
  • few - a feature engineering wrapper for sklearn
  • scikit-mdr - a sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
  • tsfresh - Automatic extraction of relevant features from time series

Feature selection

  • scikit-feature - feature selection repository in Python
  • boruta_py - implementations of the Boruta all-relevant feature selection method
  • BoostARoota - a fast XGBoost feature selection algorithm
  • scikit-rebate - a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning

Model explanation

  • Alibi - Algorithms for monitoring and explaining machine learning models
  • Auralisation - auralisation of learned features in CNN (for audio)
  • CapsNet-Visualization - a visualization of the CapsNet layers to better understand how it works
  • lucid - a collection of infrastructure and tools for research in neural network interpretability.
  • Netron - visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
  • FlashLight - visualization tool for your neural networks
  • tensorboard-pytorch - TensorBoard for PyTorch (and Chainer, MXNet, NumPy, ...)
  • anchor - code for "High-Precision Model-Agnostic Explanations" paper
  • aequitas - Bias and Fairness Audit Toolkit
  • Contrastive Explanation - Contrastive Explanation (Foil Trees)
  • yellowbrick - visual analysis and diagnostic tools to facilitate machine learning model selection
  • scikit-plot - an intuitive library to add plotting functionality to scikit-learn objects
  • shap - a unified approach to explain the output of any machine learning model
  • ELI5 - a library for debugging/inspecting machine learning classifiers and explaining their predictions
  • Lime - Explaining the predictions of any machine learning classifier
  • FairML - a Python toolbox for auditing machine learning models for bias
  • L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
  • PDPbox - partial dependence plot toolbox
  • pyBreakDown - Python implementation of R package breakDown
  • PyCEbox - Python Individual Conditional Expectation Plot Toolbox
  • Skater - Python Library for Model Interpretation
  • model-analysis - Model analysis tools for TensorFlow
  • themis-ml - a library that implements fairness-aware machine learning algorithms
  • treeinterpreter - interpreting scikit-learn's decision tree and random forest predictions
  • mxboard - Logging MXNet data for visualization in TensorBoard

Reinforcement Learning

  • OpenAI Gym - a toolkit for developing and comparing reinforcement learning algorithms.

Distributed computing systems

  • PySpark - exposes the Spark programming model to Python
  • Veles - Distributed machine learning platform by Samsung
  • Jubatus - Framework and Library for Distributed Online Machine Learning
  • DMTK - Microsoft Distributed Machine Learning Toolkit
  • PaddlePaddle - PArallel Distributed Deep LEarning by Baidu
  • dask-ml - Distributed and parallel machine learning
  • Distributed - Distributed computation in Python

Probabilistic methods

  • pomegranate - probabilistic and graphical models for Python
  • pyro - a flexible, scalable deep probabilistic programming library built on PyTorch.
  • ZhuSuan - Bayesian Deep Learning
  • PyMC - Bayesian Stochastic Modelling in Python
  • PyMC3 - Python package for Bayesian statistical modeling and Probabilistic Machine Learning
  • sampled - Decorator for reusable models in PyMC3
  • Edward - A library for probabilistic modeling, inference, and criticism.
  • InferPy - Deep Probabilistic Modelling Made Easy
  • GPflow - Gaussian processes in TensorFlow
  • PyStan - Bayesian inference using the No-U-Turn sampler (Python interface)
  • gelato - Bayesian dessert for Lasagne
  • sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API
  • skggm - estimation of general graphical models
  • pgmpy - a Python library for working with Probabilistic Graphical Models.
  • skpro - supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute
  • Aboleth - a bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation
  • PtStat - Probabilistic Programming and Statistical Inference in PyTorch
  • PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch
  • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC
  • hsmmlearn - a library for hidden semi-Markov models with explicit durations
  • pyhsmm - Bayesian inference in HSMMs and HMMs
  • GPyTorch - a highly efficient and modular implementation of Gaussian Processes in PyTorch
  • MXFusion - Modular Probabilistic Programming on MXNet
  • sklearn-crfsuite - scikit-learn inspired API for CRFsuite

Genetic Programming

  • gplearn - Genetic Programming in Python
  • DEAP - Distributed Evolutionary Algorithms in Python
  • karoo_gp - A Genetic Programming platform for Python with GPU support
  • monkeys - A strongly-typed genetic programming framework for Python
  • sklearn-genetic - Genetic feature selection module for scikit-learn

Optimization

  • Spearmint - Bayesian optimization
  • BoTorch - Bayesian optimization in PyTorch
  • SMAC3 - Sequential Model-based Algorithm Configuration
  • Optunity - a library containing various optimizers for hyperparameter tuning.
  • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python
  • hyperopt-sklearn - hyper-parameter optimization for sklearn
  • sklearn-deap - use evolutionary algorithms instead of grid search in scikit-learn
  • sigopt_sklearn - SigOpt wrappers for scikit-learn methods
  • Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
  • SafeOpt - Safe Bayesian Optimization
  • scikit-optimize - Sequential model-based optimization with a scipy.optimize interface
  • Solid - A comprehensive gradient-free optimization framework written in Python
  • PySwarms - A research toolkit for particle swarm optimization in Python
  • Platypus - A Free and Open Source Python Library for Multiobjective Optimization
  • GPflowOpt - Bayesian Optimization using GPflow
  • POT - Python Optimal Transport library
  • Talos - Hyperparameter Optimization for Keras Models
  • nlopt - library for nonlinear optimization (global and local, constrained or unconstrained)
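Many of the libraries above implement model-based (e.g. Bayesian) hyperparameter search, which is usually benchmarked against plain random search. A minimal random-search baseline is sketched below. `random_search`, its `space` format (each parameter name mapped to a uniform `(low, high)` interval) and the toy objective are assumptions for illustration. They are not taken from any listed library.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Minimize `objective` by sampling each parameter uniformly from `space`.

    space maps parameter names to (low, high) float intervals.
    Returns (best_params, best_value).
    """
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = objective(**params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


best, value = random_search(
    lambda x, y: (x - 1) ** 2 + (y + 2) ** 2,  # toy objective, minimum at (1, -2)
    {"x": (-5.0, 5.0), "y": (-5.0, 5.0)},
    n_trials=2000,
)
print(best, value)
```

Random search ignores the results of earlier trials. Bayesian optimizers instead fit a surrogate model to those results to decide where to sample next.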

Natural Language Processing

  • NLTK - modules, data sets, and tutorials supporting research and development in Natural Language Processing
  • CLTK - The Classical Language Toolkit
  • gensim - Topic Modelling for Humans
  • PSI-Toolkit - a natural language processing toolkit by Adam Mickiewicz University in Poznań
  • pyMorfologik - Python binding for Morfologik (Polish morphological analyzer)
  • skift - scikit-learn wrappers for Python fastText.
  • Phonemizer - Simple text to phonemes converter for multiple languages
  • flair - very simple framework for state-of-the-art NLP by Zalando Research

Computer Audition

  • librosa - Python library for audio and music analysis
  • Yaafe - Audio features extraction
  • aubio - a library for audio and music analysis
  • Essentia - library for audio and music analysis, description and synthesis
  • LibXtract - a simple, portable, lightweight library of audio feature extraction functions
  • Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals
  • muda - a library for augmenting annotated audio data
  • madmom - Python audio and music signal processing library

Computer Vision

  • OpenCV - Open Source Computer Vision Library
  • scikit-image - Image Processing SciKit (Toolbox for SciPy)
  • imgaug - image augmentation for machine learning experiments
  • imgaug_extension - additional augmentations for imgaug
  • Augmentor - Image augmentation library in Python for machine learning
  • albumentations - fast image augmentation library and easy to use wrapper around other libraries

Statistics

  • pandas_summary - an extension to the pandas DataFrame describe function
  • Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects
  • statsmodels - statistical modeling and econometrics in Python
  • stockstats - Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
  • simplestatistics - simple statistical functions implemented in readable Python.
  • weightedcalcs - pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
  • scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests
  • pysie - provides python implementation of statistical inference engine
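Weighted statistics of the kind weightedcalcs provides reduce to a few lines. The sketch below computes a weighted mean and a weighted median. Several conventions exist for the weighted median. This one returns the lower weighted median: the smallest value whose cumulative weight reaches half of the total. Results may therefore differ from a given library's. Both function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the total weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


print(weighted_mean([1, 2, 3, 4], [1, 1, 1, 5]))    # 3.25
print(weighted_median([1, 2, 3, 4], [1, 1, 1, 5]))  # 4
```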

Experiments tools

  • Sacred - a tool to help you configure, organize, log and reproduce experiments by IDSIA
  • Xcessiv - a web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling
  • Persimmon - A visual dataflow programming language for sklearn
  • Ax - Adaptive Experimentation Platform

Visualization

  • Matplotlib - plotting with Python
  • seaborn - statistical data visualization using matplotlib
  • Bokeh - Interactive Web Plotting for Python
  • HoloViews - stop plotting your data - annotate your data and let it visualize itself
  • Alphalens - performance analysis of predictive (alpha) stock factors by Quantopian
  • python-ternary - ternary plotting library for Python with matplotlib
  • missingno - Missing data visualization module for Python

Evaluation

  • recmetrics - library of useful metrics and plots for evaluating recommender systems
  • kaggle-metrics - Metrics for Kaggle competitions
  • Metrics - machine learning evaluation metrics
  • sklearn-evaluation - scikit-learn model evaluation made easy: plots, tables and markdown reports
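Two of the most common recommender-system metrics are precision@k and recall@k. The helpers below are a hypothetical sketch, not recmetrics' API. This version of precision divides by `k` even when fewer than `k` items were recommended, which is one of several conventions in use.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)


recommended = ["a", "b", "c", "d"]
relevant = {"b", "d", "e"}
print(precision_at_k(recommended, relevant, 4))  # 0.5
```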

Computations

  • numpy - the fundamental package needed for scientific computing with Python.
  • Dask - parallel computing with task scheduling
  • bottleneck - Fast NumPy array functions written in C
  • minpy - NumPy interface with mixed backend execution
  • CuPy - NumPy-like API accelerated with CUDA
  • scikit-tensor - Python library for multilinear algebra and tensor factorizations
  • numdifftools - solve automatic numerical differentiation problems in one or more variables
  • quaternion - Add built-in support for quaternions to numpy
  • adaptive - Tools for adaptive and parallel sampling of mathematical functions
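The core operation a quaternion type adds is the Hamilton product, which is associative but not commutative (i·j = k, but j·i = −k). The pure-Python `Quaternion` class below is illustrative only and unrelated to the listed package's API.

```python
from typing import NamedTuple


class Quaternion(NamedTuple):
    """Quaternion w + x*i + y*j + z*k."""
    w: float
    x: float
    y: float
    z: float

    def __mul__(self, other):
        # Hamilton product (replaces tuple repetition for this class).
        w1, x1, y1, z1 = self
        w2, x2, y2, z2 = other
        return Quaternion(
            w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        )

    def conjugate(self):
        return Quaternion(self.w, -self.x, -self.y, -self.z)


i, j = Quaternion(0, 1, 0, 0), Quaternion(0, 0, 1, 0)
print(i * j)  # Quaternion(w=0, x=0, y=0, z=1)
```

Multiplying a quaternion by its conjugate yields its squared norm in the real part. This is the basis for normalizing rotation quaternions.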

Spatial analysis

  • GeoPandas - Python tools for geographic data
  • PySal - Python Spatial Analysis Library

Quantum Computing

  • QML - a Python Toolkit for Quantum Machine Learning

Conversion

  • sklearn-porter - transpile trained scikit-learn estimators to C, Java, JavaScript and others
  • ONNX - Open Neural Network Exchange
  • MMdnn - a set of tools to help users inter-operate among different deep learning frameworks.

Deprecated libs

Waiting room
