@@ -28,6 +28,7 @@ General tricks: [link](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks
28
28
29
29
#### Extraction
30
30
[ textract] ( https://github.com/deanmalmgren/textract ) - Extract text from any document.
31
+ [ camelot] ( https://github.com/socialcopsdev/camelot ) - Extract tables from PDF.
31
32
32
33
#### Big Data
33
34
[ spark] ( https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html#work-with-dataframes ) - ` DataFrame ` for big data, [ cheatsheet] ( https://gist.github.com/crawles/b47e23da8218af0b9bd9d47f5242d189 ) , [ tutorial] ( https://github.com/ericxiao251/spark-syntax ) .
@@ -53,19 +54,26 @@ Visualizations - [Null Hypothesis Significance Testing (NHST)](https://rpsycholo
53
54
[ scikit-posthocs] ( https://github.com/maximtrp/scikit-posthocs ) - Statistical post-hoc tests for pairwise multiple comparisons.
54
55
55
56
#### Exploration and Cleaning
57
+ [ impyute] ( https://github.com/eltonlaw/impyute ) - Imputations.
56
58
[ fancyimpute] ( https://github.com/iskandr/fancyimpute ) - Matrix completion and imputation algorithms.
57
59
[ imbalanced-learn] ( https://github.com/scikit-learn-contrib/imbalanced-learn ) - Resampling for imbalanced datasets.
58
60
[ tspreprocess] ( https://github.com/MaxBenChrist/tspreprocess ) - Time series preprocessing: Denoising, Compression, Resampling.
61
+ [ Kaggler] ( https://github.com/jeongyoonlee/Kaggler ) - Utility functions (` OneHotEncoder(min_obs=100) ` )
62
+ [ pyupset] ( https://github.com/ImSoErgodic/py-upset ) - Visualizing intersecting sets.
63
+ [ pyemd] ( https://github.com/wmayner/pyemd ) - Earth Mover's Distance, similarity between histograms.
59
64
60
65
#### Feature Engineering
61
66
[ sklearn] ( https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html ) - Pipeline, [ examples] ( https://github.com/jem1031/pandas-pipelines-custom-transformers ) .
67
+ [ pdpipe] ( https://github.com/shaypal5/pdpipe ) - Pipelines for DataFrames.
62
68
[ few] ( https://github.com/lacava/few ) - Feature engineering wrapper for sklearn.
63
69
[ skoot] ( https://github.com/tgsmith61591/skoot ) - Pipeline helper functions.
64
70
[ categorical-encoding] ( https://github.com/scikit-learn-contrib/categorical-encoding ) - Categorical encoding of variables.
71
+ [ dirty_cat] ( https://github.com/dirty-cat/dirty_cat ) - Encoding dirty categorical variables.
65
72
[ patsy] ( https://github.com/pydata/patsy/ ) - R-like syntax for statistical models.
66
73
[ mlxtend] ( https://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/ ) - LDA.
67
74
[ featuretools] ( https://github.com/Featuretools/featuretools ) - Automated feature engineering, [ example] ( https://github.com/WillKoehrsen/automated-feature-engineering/blob/master/walk_through/Automated_Feature_Engineering.ipynb ) .
68
75
[ tsfresh] ( https://github.com/blue-yonder/tsfresh ) - Time series feature engineering.
76
+ [ pypeln] ( https://github.com/cgarciae/pypeln ) - Concurrent data pipelines.
69
77
70
78
#### Feature Selection
71
79
[ Tutorial] ( https://machinelearningmastery.com/feature-selection-machine-learning-python/ ) , [ Talk] ( https://www.youtube.com/watch?v=JsArBz46_3s )
@@ -93,6 +101,7 @@ Visualizations - [Null Hypothesis Significance Testing (NHST)](https://rpsycholo
93
101
[ yellowbrick] ( https://github.com/DistrictDataLabs/yellowbrick ) - Wrapper for matplotlib for diagnostic ML plots.
94
102
[ bokeh] ( https://bokeh.pydata.org/en/latest/ ) - Interactive visualization library, [ Examples] ( https://bokeh.pydata.org/en/latest/docs/user_guide/server.html ) , [ Examples] ( https://github.com/WillKoehrsen/Bokeh-Python-Visualization ) .
95
103
[ altair] ( https://altair-viz.github.io/ ) - Declarative statistical visualization library.
104
+ [ bqplot] ( https://github.com/bloomberg/bqplot ) - Plotting library for IPython/Jupyter Notebooks.
96
105
[ holoviews] ( http://holoviews.org/ ) - Visualization library.
97
106
[ dtreeviz] ( https://github.com/parrt/dtreeviz ) - Decision tree visualization and model interpretation.
98
107
[ chartify] ( https://github.com/spotify/chartify/ ) - Generate charts.
@@ -159,7 +168,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
159
168
[ faiss] ( https://github.com/facebookresearch/faiss ) - Approximate nearest neighbor search.
160
169
[ pysparnn] ( https://github.com/facebookresearch/pysparnn ) - Approximate nearest neighbor search.
161
170
[ infomap] ( https://github.com/mapequation/infomap ) - Cluster (word-)vectors to find topics, [ example] ( https://github.com/mapequation/infomap/blob/master/examples/python/infomap-examples.ipynb ) .
162
- [ textract] ( https://github.com/deanmalmgren/textract ) - Extract text from any document.
163
171
[ datasketch] ( https://github.com/ekzhu/datasketch ) - Probabilistic data structures for large data (MinHash, HyperLogLog).
164
172
[ flair] ( https://github.com/zalandoresearch/flair ) - NLP Framework by Zalando.
165
173
[ stanfordnlp] ( https://github.com/stanfordnlp/stanfordnlp ) - NLP Library.
@@ -206,6 +214,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
206
214
[ Augmentor] ( https://github.com/mdbloice/Augmentor ) - Image augmentation library.
207
215
[ tcav] ( https://github.com/tensorflow/tcav ) - Interpretability method.
208
216
217
+ #### Text Related
218
+ [ ktext] ( https://github.com/hamelsmu/ktext ) - Utilities for pre-processing text for deep learning in Keras.
219
+
209
220
##### Libs
210
221
[ keras] ( https://keras.io/ ) - Neural Networks on top of [ tensorflow] ( https://www.tensorflow.org/ ) .
211
222
[ keras-contrib] ( https://github.com/keras-team/keras-contrib ) - Keras community contributions.
@@ -216,13 +227,16 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
216
227
[ tensorforce] ( https://github.com/reinforceio/tensorforce ) - Tensorflow for applied reinforcement learning.
217
228
[ fastai] ( https://github.com/fastai/fastai ) - Neural Networks in pytorch.
218
229
[ ignite] ( https://github.com/pytorch/ignite ) - High-level library for pytorch.
230
+ [ skorch] ( https://github.com/dnouri/skorch ) - Scikit-learn compatible neural network library that wraps pytorch.
219
231
[ Detectron] ( https://github.com/facebookresearch/Detectron ) - Object Detection by Facebook.
220
232
[ autokeras] ( https://github.com/jhfjhfj1/autokeras ) - AutoML for deep learning.
221
233
[ simpledet] ( https://github.com/TuSimple/simpledet ) - Object Detection and Instance Recognition.
222
234
[ PlotNeuralNet] ( https://github.com/HarisIqbal88/PlotNeuralNet ) - Plot neural networks.
223
235
[ lucid] ( https://github.com/tensorflow/lucid ) - Neural network interpretability.
224
236
[ AdaBound] ( https://github.com/Luolc/AdaBound ) - Optimizer that trains as fast as Adam and as good as SGD.
225
- [ caffe] ( https://github.com/BVLC/caffe ) - Deep learning framework, [ pretrained models] ( https://github.com/BVLC/caffe/wiki/Model-Zoo ) .
237
+ [ caffe] ( https://github.com/BVLC/caffe ) - Deep learning framework, [ pretrained models] ( https://github.com/BVLC/caffe/wiki/Model-Zoo ) .
238
+ [ foolbox] ( https://github.com/bethgelab/foolbox ) - Adversarial examples that fool neural networks.
239
+ [ hiddenlayer] ( https://github.com/waleedka/hiddenlayer ) - Training metrics.
226
240
227
241
##### Snippets
228
242
[ Simple Keras models] ( https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24 )
@@ -316,11 +330,16 @@ RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests).
316
330
[ pomegranate] ( https://github.com/jmschrei/pomegranate ) - Probabilistic modelling, [ talk] ( https://www.youtube.com/watch?v=dE5j6NW-Kzg ) .
317
331
[ pmlearn] ( https://github.com/pymc-learn/pymc-learn ) - Probabilistic machine learning.
318
332
[ arviz] ( https://github.com/arviz-devs/arviz ) - Exploratory analysis of Bayesian models.
333
+ [ zhusuan] ( https://github.com/thu-ml/zhusuan ) - Bayesian Deep Learning, Generative Models.
319
334
320
- #### Stacking Models
335
+ #### Causal Inference
336
+ [ dowhy] ( https://github.com/Microsoft/dowhy ) - Estimate causal effects.
337
+
338
+ #### Stacking Models and Ensembles
321
339
[ mlxtend] ( https://github.com/rasbt/mlxtend ) - ` EnsembleVoteClassifier ` , ` StackingRegressor ` , ` StackingCVRegressor ` for model stacking.
322
340
[ vecstack] ( https://github.com/vecxoz/vecstack ) - Stacking ML models.
323
341
[ StackNet] ( https://github.com/kaz-Anova/StackNet ) - Stacking ML models.
342
+ [ mlens] ( https://github.com/flennerhag/mlens ) - Ensemble learning.
324
343
325
344
#### Model Evaluation
326
345
[ pycm] ( https://github.com/sepandhaghighi/pycm ) - Multi-class confusion matrix.
@@ -357,6 +376,9 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
357
376
[ optuna] ( https://github.com/pfnet/optuna ) - Hyperparameter optimization.
358
377
[ hypergraph] ( https://github.com/aljabr0/hypergraph ) - Global optimization methods and hyperparameter optimization.
359
378
379
+ #### Online Learning
380
+ [ Kaggler] ( https://github.com/jeongyoonlee/Kaggler ) - Online Learning algorithms.
381
+
360
382
#### Active Learning
361
383
[ Talk] ( https://www.youtube.com/watch?v=0efyjq5rWS4 )
362
384
[ modAL] ( https://github.com/modAL-python/modAL ) - Active learning framework.
@@ -411,6 +433,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
411
433
[ Awesome Network Embedding] ( https://github.com/chihming/awesome-network-embedding )
412
434
[ Awesome Python] ( https://github.com/vinta/awesome-python )
413
435
[ Awesome Python Data Science] ( https://github.com/krzjoa/awesome-python-datascience )
436
+ [ Awesome Python Data Science] ( https://github.com/thomasjpfan/awesome-python-data-science )
414
437
[ Awesome Semantic Segmentation] ( https://github.com/mrgloom/awesome-semantic-segmentation )
415
438
[ Awesome Sentence Embedding] ( https://github.com/Separius/awesome-sentence-embedding )
416
439
[ Awesome Time Series] ( https://github.com/MaxBenChrist/awesome_time_series_in_python )