Commit 25ad729

Update README.md
1 parent 48d598a commit 25ad729

1 file changed, +26 -3 lines changed

README.md

@@ -28,6 +28,7 @@ General ticks: [link](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks
 
 #### Extraction
 [textract](https://github.com/deanmalmgren/textract) - Extract text from any document.
+[camelot](https://github.com/socialcopsdev/camelot) - Extract text from PDF.
 
 #### Big Data
 [spark](https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html#work-with-dataframes) - `DataFrame` for big data, [cheatsheet](https://gist.github.com/crawles/b47e23da8218af0b9bd9d47f5242d189), [tutorial](https://github.com/ericxiao251/spark-syntax).
@@ -53,19 +54,26 @@ Visualizations - [Null Hypothesis Significance Testing (NHST)](https://rpsycholo
 [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons.
 
 #### Exploration and Cleaning
+[impyute](https://github.com/eltonlaw/impyute) - Imputations.
 [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms.
 [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Resampling for imbalanced datasets.
 [tspreprocess](https://github.com/MaxBenChrist/tspreprocess) - Time series preprocessing: Denoising, Compression, Resampling.
+[Kaggler](https://github.com/jeongyoonlee/Kaggler) - Utility functions (`OneHotEncoder(min_obs=100)`)
+[pyupset](https://github.com/ImSoErgodic/py-upset) - Visualizing intersecting sets.
+[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance, similarity between histograms.
 
 #### Feature Engineering
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) - Pipeline, [examples](https://github.com/jem1031/pandas-pipelines-custom-transformers).
+[pdpipe](https://github.com/shaypal5/pdpipe) - Pipelines for DataFrames.
 [few](https://github.com/lacava/few) - Feature engineering wrapper for sklearn.
 [skoot](https://github.com/tgsmith61591/skoot) - Pipeline helper functions.
 [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - Categorical encoding of variables.
+[dirty_cat](https://github.com/dirty-cat/dirty_cat) - Encoding dirty categorical variables.
 [patsy](https://github.com/pydata/patsy/) - R-like syntax for statistical models.
 [mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/) - LDA.
 [featuretools](https://github.com/Featuretools/featuretools) - Automated feature engineering, [example](https://github.com/WillKoehrsen/automated-feature-engineering/blob/master/walk_through/Automated_Feature_Engineering.ipynb).
 [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering.
+[pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines.
 
 #### Feature Selection
 [Tutorial](https://machinelearningmastery.com/feature-selection-machine-learning-python/), [Talk](https://www.youtube.com/watch?v=JsArBz46_3s)
@@ -93,6 +101,7 @@ Visualizations - [Null Hypothesis Significance Testing (NHST)](https://rpsycholo
 [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Wrapper for matplotlib for diagnostic ML plots.
 [bokeh](https://bokeh.pydata.org/en/latest/) - Interactive visualization library, [Examples](https://bokeh.pydata.org/en/latest/docs/user_guide/server.html), [Examples](https://github.com/WillKoehrsen/Bokeh-Python-Visualization).
 [altair](https://altair-viz.github.io/) - Declarative statistical visualization library.
+[bqplot](https://github.com/bloomberg/bqplot) - Plotting library for IPython/Jupyter Notebooks.
 [holoviews](http://holoviews.org/) - Visualization library.
 [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation.
 [chartify](https://github.com/spotify/chartify/) - Generate charts.
@@ -159,7 +168,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
 [faiss](https://github.com/facebookresearch/faiss) - Approximate nearest neighbor search.
 [pysparnn](https://github.com/facebookresearch/pysparnn) - Approximate nearest neighbor search.
 [infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics, [example](https://github.com/mapequation/infomap/blob/master/examples/python/infomap-examples.ipynb).
-[textract](https://github.com/deanmalmgren/textract) - Extract text from any document.
 [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog).
 [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando.
 [stanfordnlp](https://github.com/stanfordnlp/stanfordnlp) - NLP Library.
@@ -206,6 +214,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library.
 [tcav](https://github.com/tensorflow/tcav) - Interpretability method.
 
+#### Text Related
+[ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras.
+
 ##### Libs
 [keras](https://keras.io/) - Neural Networks on top of [tensorflow](https://www.tensorflow.org/).
 [keras-contrib](https://github.com/keras-team/keras-contrib) - Keras community contributions.
@@ -216,13 +227,16 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [tensorforce](https://github.com/reinforceio/tensorforce) - Tensorflow for applied reinforcement learning.
 [fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch.
 [ignite](https://github.com/pytorch/ignite) - High-level library for pytorch.
+[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch.
 [Detectron](https://github.com/facebookresearch/Detectron) - Object Detection by Facebook.
 [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning.
 [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition.
 [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks.
 [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability.
 [AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD.
-[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo).
+[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo).
+[foolbox](https://github.com/bethgelab/foolbox) - Adversarial examples that fool neural networks.
+[hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics.
 
 ##### Snippets
 [Simple Keras models](https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24)
@@ -316,11 +330,16 @@ RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests).
 [pomegranate](https://github.com/jmschrei/pomegranate) - Probabilistic modelling, [talk](https://www.youtube.com/watch?v=dE5j6NW-Kzg).
 [pmlearn](https://github.com/pymc-learn/pymc-learn) - Probabilistic machine learning.
 [arviz](https://github.com/arviz-devs/arviz) - Exploratory analysis of Bayesian models.
+[zhusuan](https://github.com/thu-ml/zhusuan) - Bayesian Deep Learning, Generative Models.
 
-#### Stacking Models
+#### Causal Inference
+[dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects.
+
+#### Stacking Models and Ensembles
 [mlxtend](https://github.com/rasbt/mlxtend) - `EnsembleVoteClassifier`, `StackingRegressor`, `StackingCVRegressor` for model stacking.
 [vecstack](https://github.com/vecxoz/vecstack) - Stacking ML models.
 [StackNet](https://github.com/kaz-Anova/StackNet) - Stacking ML models.
+[mlens](https://github.com/flennerhag/mlens) - Ensemble learning.
 
 #### Model Evaluation
 [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix.
@@ -357,6 +376,9 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
 [optuna](https://github.com/pfnet/optuna) - Hyperparameter optimization.
 [hypergraph](https://github.com/aljabr0/hypergraph) - Global optimization methods and hyperparameter optimization.
 
+#### Online Learning
+[Kaggler](https://github.com/jeongyoonlee/Kaggler) - Online Learning algorithms.
+
 #### Active Learning
 [Talk](https://www.youtube.com/watch?v=0efyjq5rWS4)
 [modAL](https://github.com/modAL-python/modAL) - Active learning framework.
@@ -411,6 +433,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
 [Awesome Network Embedding](https://github.com/chihming/awesome-network-embedding)
 [Awesome Python](https://github.com/vinta/awesome-python)
 [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience)
+[Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science)
 [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation)
 [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding)
 [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python)
