From 62475b36a351a765e02bae22f8eabdee4bb4cc67 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 8 Jun 2019 01:20:56 +0200
Subject: [PATCH 001/550] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index f7d5817..84210b5 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,9 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection
 [boruta_py](https://github.com/scikit-learn-contrib/boruta_py) - Feature selection, [explaination](https://stats.stackexchange.com/questions/264360/boruta-all-relevant-feature-selection-vs-random-forest-variables-of-importanc/264467), [example](https://www.kaggle.com/tilii7/boruta-feature-elimination).
 [linselect](https://github.com/efavdb/linselect) - Feature selection package.
 [mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection.
+
 #### Dimensionality Reduction
+[Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU)
 [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD).
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) - Multidimensional scaling (MDS).
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE).
From d61fbf5b8e920a86aa4db6978b5a0a0d5cfd3d2f Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 9 Jun 2019 22:48:00 +0200
Subject: [PATCH 002/550] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 84210b5..995c5a0 100644
--- a/README.md
+++ b/README.md
@@ -76,6 +76,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [Talk](https://www.youtube.com/watch?v=68ABAU_V8qI)
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) - Pipeline, [examples](https://github.com/jem1031/pandas-pipelines-custom-transformers).
 [pdpipe](https://github.com/shaypal5/pdpipe) - Pipelines for DataFrames.
+[scikit-lego](https://github.com/koaning/scikit-lego) - Custom transformers for pipelines.
 [few](https://github.com/lacava/few) - Feature engineering wrapper for sklearn.
 [skoot](https://github.com/tgsmith61591/skoot) - Pipeline helper functions.
 [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - Categorical encoding of variables, [vtreat (R package)](https://cran.r-project.org/web/packages/vtreat/vignettes/vtreat.html).
@@ -271,6 +272,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [Image Super-Resolution](https://github.com/idealo/image-super-resolution) - Super-scaling using a Residual Dense Network.
 Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Posts: [1](https://www.thomasjpfan.com/2018/07/nuclei-image-segmentation-tutorial/), [2](https://www.thomasjpfan.com/2017/08/hassle-free-unets/)
 [CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection.
+[deeplearning-models](https://github.com/rasbt/deeplearning-models) - Deep learning models.

 #### GPU
 [cuML](https://github.com/rapidsai/cuml) - Run traditional tabular ML tasks on GPUs.
From bc299cd4ede559755f26d53e6ab03c0d3e91fd98 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 12 Jun 2019 19:45:30 +0200
Subject: [PATCH 003/550] Update README.md

---
 README.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 995c5a0..b2c3e65 100644
--- a/README.md
+++ b/README.md
@@ -57,13 +57,16 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1

 #### Statistics
 [Common statistical tests explained](https://lindeloev.github.io/tests-as-linear/)
-[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement.
+[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement.
+[researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1).
+[ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/).
 [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons.

 ##### Visualizations
 [Null Hypothesis Significance Testing (NHST)](https://rpsychologist.com/d3/NHST/), [Correlation](https://rpsychologist.com/d3/correlation/), [Cohen's d](https://rpsychologist.com/d3/cohend/), [Confidence Interval](https://rpsychologist.com/d3/CI/), [Equivalence, non-inferiority and superiority testing](https://rpsychologist.com/d3/equivalence/), [Bayesian two-sample t test](https://rpsychologist.com/d3/bayes/), [Distribution of p-values when comparing two groups](https://rpsychologist.com/d3/pdist/), [Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/)

 #### Exploration and Cleaning
+[Checklist](https://github.com/r0f1/ml_checklist).
 [impyute](https://github.com/eltonlaw/impyute) - Imputations.
 [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms.
 [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Resampling for imbalanced datasets.
@@ -140,7 +143,8 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection
 [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook.
 [bowtie](https://github.com/jwkvam/bowtie/) - Dashboarding solution.
 [panel](https://panel.pyviz.org/index.html) - Dashboarding solution.
-[altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs)
+[altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs).
+[voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications.

 #### Geopraphical Tools
 [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet).
@@ -309,9 +313,10 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach
 [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn) - Multi-label classification, [talk](https://www.youtube.com/watch?v=m-tAASQA7XQ&t=18m57s).

 #### Signal Processing and Filtering
+[Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf).
+[The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html).
 [Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Baysian and various Kalman filters.
 [Interactive Tool](https://fiiir.com/) for FIR and IIR filters, [Examples](https://plot.ly/python/fft-filters/).
-[The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html).
 [filterpy](https://github.com/rlabbe/filterpy) - Kalman filtering and optimal estimation library.

 #### Time Series
@@ -490,6 +495,11 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
 [skll](https://github.com/EducationalTestingService/skll) - Command-line utilities to make it easier to run machine learning experiments.
 [BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production

+#### Math and Background
+Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm)
+Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
+](https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/)
+
 #### Other
 [dvc](https://github.com/iterative/dvc) - Versioning for ML projects.
 [daft](https://github.com/dfm/daft) - Render probabilistic graphical models using matplotlib.
From 4f63ee4d714fc1e98ef01376976e4712ca925f27 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 13 Jun 2019 23:28:49 +0200
Subject: [PATCH 004/550] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b2c3e65..c92a282 100644
--- a/README.md
+++ b/README.md
@@ -313,7 +313,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach
 [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn) - Multi-label classification, [talk](https://www.youtube.com/watch?v=m-tAASQA7XQ&t=18m57s).

 #### Signal Processing and Filtering
-[Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf).
+[Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf).
 [The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html).
 [Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Baysian and various Kalman filters.
 [Interactive Tool](https://fiiir.com/) for FIR and IIR filters, [Examples](https://plot.ly/python/fft-filters/).
From fa8666782f4cfae65315dfa14f99ece95a922bf6 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 15 Jun 2019 17:35:41 +0200
Subject: [PATCH 005/550] Update README.md

---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index c92a282..8946d0d 100644
--- a/README.md
+++ b/README.md
@@ -518,7 +518,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 [pyup](https://github.com/pyupio/pyup) - Python dependency management.

 #### Blogs
-[PocketCluster](https://blog.pocketcluster.io/) - Blog.
 [Distill.pub](https://distill.pub/) - Blog.

 #### Awesome Lists and Resources
From be1028e008b44df54130d0ef3407eff2ccb324cc Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 18 Jun 2019 08:38:31 +0200
Subject: [PATCH 006/550] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 8946d0d..18417f2 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA.
 [vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames.
 [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber.
+[zappy](https://github.com/lasersonlab/zappy) - Distributed numpy arrays.

 ##### Command line tools
 [ni](https://github.com/spencertipping/ni) - Command line tool for big data.
From 59956327c13de2363e17d042dc597fd1aba109d6 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 26 Jun 2019 00:30:49 +0200
Subject: [PATCH 007/550] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 18417f2..9e13d9f 100644
--- a/README.md
+++ b/README.md
@@ -469,7 +469,7 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645

 #### Incremental Learning, Online Learning
 sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html).
-[creme-ml](https://github.com/creme-ml/creme) - Incremental learning framework.
+[creme-ml](https://github.com/creme-ml/creme) - Incremental learning framework, [talk](https://www.youtube.com/watch?v=P3M6dt7bY9U).
 [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Online Learning algorithms.

 #### Active Learning
From 1c3f2580dd340270c3895c1fb6f90dff820651ea Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 26 Jun 2019 22:27:16 +0200
Subject: [PATCH 008/550] Update README.md

---
 README.md | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 9e13d9f..2e7fa77 100644
--- a/README.md
+++ b/README.md
@@ -56,15 +56,28 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [csvkit](https://csvkit.readthedocs.io/en/1.0.3/) - Another command line tool for CSV files.
 [csvsort](https://pypi.org/project/csvsort/) - Sort large csv files.

-#### Statistics
-[Common statistical tests explained](https://lindeloev.github.io/tests-as-linear/)
-[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement.
+#### Classical Statistics
 [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1).
-[ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/).
 [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons.
+[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement.
+
+#### Tests
+[Blog post](https://lindeloev.github.io/tests-as-linear/)
+[scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests.
+[ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/).

 ##### Visualizations
-[Null Hypothesis Significance Testing (NHST)](https://rpsychologist.com/d3/NHST/), [Correlation](https://rpsychologist.com/d3/correlation/), [Cohen's d](https://rpsychologist.com/d3/cohend/), [Confidence Interval](https://rpsychologist.com/d3/CI/), [Equivalence, non-inferiority and superiority testing](https://rpsychologist.com/d3/equivalence/), [Bayesian two-sample t test](https://rpsychologist.com/d3/bayes/), [Distribution of p-values when comparing two groups](https://rpsychologist.com/d3/pdist/), [Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/)
+[Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https://rpsychologist.com/d3/NHST/)
+[Correlation](https://rpsychologist.com/d3/correlation/)
+[Cohen's d](https://rpsychologist.com/d3/cohend/)
+[Confidence Interval](https://rpsychologist.com/d3/CI/)
+[Equivalence, non-inferiority and superiority testing](https://rpsychologist.com/d3/equivalence/)
+[Bayesian two-sample t test](https://rpsychologist.com/d3/bayes/)
+[Distribution of p-values when comparing two groups](https://rpsychologist.com/d3/pdist/)
+[Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/)
+
+#### Talks
+[Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs)

 #### Exploration and Cleaning
 [Checklist](https://github.com/r0f1/ml_checklist).
From 584aa02bbb3c67c9b70346a9da5d1a172a1d805b Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 30 Jun 2019 12:09:42 +0200
Subject: [PATCH 009/550] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 2e7fa77..4ff1502 100644
--- a/README.md
+++ b/README.md
@@ -162,6 +162,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection

 #### Geopraphical Tools
 [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet).
+[gmaps](https://github.com/pbugnion/gmaps) - Google Maps for Jupyter notebooks.
 [stadiamaps](https://stadiamaps.com/) - Plot geographical maps.
 [datashader](https://github.com/bokeh/datashader) - Draw millions of points on a map.
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html) - BallTree, [Example](https://tech.minodes.com/experiments-with-in-memory-spatial-radius-queries-in-python-e40c9e66cf63).
@@ -262,6 +263,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 ##### Libs
 [keras](https://keras.io/) - Neural Networks on top of [tensorflow](https://www.tensorflow.org/), [examples](https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24).
 [keras-contrib](https://github.com/keras-team/keras-contrib) - Keras community contributions.
+[keras-tuner](https://github.com/keras-team/keras-tuner) - Hyperparameter tuning for Keras.
 [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt: Convenient hyperparameter optimization wrapper.
 [elephas](https://github.com/maxpumperla/elephas) - Distributed Deep learning with Keras & Spark.
 [tflearn](https://github.com/tflearn/tflearn) - Neural Networks on top of tensorflow.
From 952032c5a3fdf7d275e18cef5fcaaf61126dc5b3 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 30 Jun 2019 12:12:05 +0200
Subject: [PATCH 010/550] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 4ff1502..e200463 100644
--- a/README.md
+++ b/README.md
@@ -78,6 +78,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1

 #### Talks
 [Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs)
+[Dealing with Selection Bias By Propensity Based Feature Selection](https://www.youtube.com/watch?reload=9&v=3ZWCKr0vDtc)

 #### Exploration and Cleaning
 [Checklist](https://github.com/r0f1/ml_checklist).
From 38c3432bf6d8a8becb75784e4d1d55602338499e Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 6 Jul 2019 17:07:48 +0200
Subject: [PATCH 011/550] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e200463..7dc38bc 100644
--- a/README.md
+++ b/README.md
@@ -278,7 +278,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition.
 [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks.
 [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability, [Activation Maps](https://openai.com/blog/introducing-activation-atlases/).
-[AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD.
+[AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https://github.com/titu1994/keras-adabound).
 [caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo).
 [foolbox](https://github.com/bethgelab/foolbox) - Adversarial examples that fool neural networks.
 [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics.
@@ -287,6 +287,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision.

 ##### Applications and Snippets
+[efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture.
 [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks.
 [SPADE](https://github.com/nvlabs/spade) - Semantic Image Synthesis.
 [Entity Embeddings of Categorical Variables](https://arxiv.org/abs/1604.06737), [code](https://github.com/entron/entity-embedding-rossmann), [kaggle](https://www.kaggle.com/aquatic/entity-embedding-neural-net/code)
From a6829c43a582d64668ea9e5ebcb52919fc2bc3fd Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 13 Jul 2019 16:52:23 +0200
Subject: [PATCH 012/550] Update README.md

---
 README.md | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 7dc38bc..1f36481 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,10 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/).
 [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations.

+#### Helpful
+[pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R.
+[intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s).
+
 #### Extraction
 [textract](https://github.com/deanmalmgren/textract) - Extract text from any document.
 [camelot](https://github.com/socialcopsdev/camelot) - Extract text from PDF.
@@ -37,6 +41,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [sparkit-learn](https://github.com/lensacom/sparkit-learn), [spark-deep-learning](https://github.com/databricks/spark-deep-learning) - ML frameworks for spark.
 [koalas](https://github.com/databricks/koalas) - Pandas API on Apache Spark.
 [dask](https://github.com/dask/dask), [dask-ml](http://ml.dask.org/) - Pandas `DataFrame` for big data and machine learning library, [resources](https://matthewrocklin.com/blog//work/2018/07/17/dask-dev), [talk1](https://www.youtube.com/watch?v=ccfsbuqsjgI), [talk2](https://www.youtube.com/watch?v=RA_2qdipVng), [notebooks](https://github.com/dask/dask-ec2/tree/master/notebooks), [videos](https://www.youtube.com/user/mdrocklin).
+[dask-gateway](https://github.com/jcrist/dask-gateway) - Managing dask clusters.
 [turicreate](https://github.com/apple/turicreate) - Helpful `SFrame` class for out-of-memory dataframes.
 [h2o](https://github.com/h2oai/h2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes.
 [datatable](https://github.com/h2oai/datatable) - Data Table for big data support.
@@ -230,13 +235,21 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
 ##### Papers
 [Search Engine Correlation](https://arxiv.org/pdf/1107.2691.pdf)

+#### Biology
+
+##### Sequencing
+[scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial).
+
+##### Image-related
+[mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb).
+[imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis.
+[CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis.
+[imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf).
+
 #### Image Processing
 [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk)
 [cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html).
 [scikit-image](https://github.com/scikit-image/scikit-image) - Image processing.
-[mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb).
-[imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis.
-[CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis.

 #### Neural Networks

From a0a02605d5543a630cc739fe967f403451fdf31a Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 13 Jul 2019 20:46:53 +0200
Subject: [PATCH 013/550] Update README.md

---
 README.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 1f36481..0d8921e 100644
--- a/README.md
+++ b/README.md
@@ -496,7 +496,6 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645
 [bbopt](https://github.com/evhub/bbopt) - Black box hyperparameter optimization.
 [dragonfly](https://github.com/dragonfly/dragonfly) - Scalable Bayesian optimisation.

-
 #### Incremental Learning, Online Learning
 sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html).
 [creme-ml](https://github.com/creme-ml/creme) - Incremental learning framework, [talk](https://www.youtube.com/watch?v=P3M6dt7bY9U).
@@ -519,12 +518,19 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
 [astroml](https://github.com/astroML/astroML) - ML for astronomical data.

 #### Deployment and Lifecycle Management
+
+##### General
+[pyup](https://github.com/pyupio/pyup) - Python dependency management.
+[pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past.
+
+##### Data Science Related
 [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages.
 [sklearn-porter](https://github.com/nok/sklearn-porter) - Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
 [mlflow](https://mlflow.org/) - Manage the machine learning lifecycle, including experimentation, reproducibility and deployment.
 [modelchimp](https://github.com/ModelChimp/modelchimp) - Experiment Tracking.
 [skll](https://github.com/EducationalTestingService/skll) - Command-line utilities to make it easier to run machine learning experiments.
 [BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production
+[dvc](https://github.com/iterative/dvc) - Versioning for ML projects.

 #### Math and Background
 Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm)
@@ -532,7 +538,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 ](https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/)

 #### Other
-[dvc](https://github.com/iterative/dvc) - Versioning for ML projects.
 [daft](https://github.com/dfm/daft) - Render probabilistic graphical models using matplotlib.
 [unyt](https://github.com/yt-project/unyt) - Working with units.
 [scrapy](https://github.com/scrapy/scrapy) - Web scraping library.
@@ -546,7 +551,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 [attrs](https://github.com/python-attrs/attrs) - Python classes without boilerplate.
 [dateparser](https://dateparser.readthedocs.io/en/latest/) - A better date parser.
 [jellyfish](https://github.com/jamesturk/jellyfish) - Approximate string matching.
-[pyup](https://github.com/pyupio/pyup) - Python dependency management.

 #### Blogs
 [Distill.pub](https://distill.pub/) - Blog.
From 2b285303b368ea873db5d218dcc4f7096b714684 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 14 Jul 2019 18:23:46 +0200
Subject: [PATCH 014/550] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0d8921e..b662efb 100644
--- a/README.md
+++ b/README.md
@@ -245,6 +245,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
 [imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis.
 [CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis.
 [imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf).
+[microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ).

 #### Image Processing
 [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk)
@@ -285,7 +286,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [tensorforce](https://github.com/reinforceio/tensorforce) - Tensorflow for applied reinforcement learning.
 [fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch.
 [ignite](https://github.com/pytorch/ignite) - Highlevel library for pytorch.
-[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch.
+[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk).
 [Detectron](https://github.com/facebookresearch/Detectron) - Object Detection by Facebook.
 [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning.
 [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition.
From 873bf1b6bf1517293b4589cf00ae9abcc9b7e544 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 15 Jul 2019 08:25:55 +0200
Subject: [PATCH 015/550] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b662efb..d61049b 100644
--- a/README.md
+++ b/README.md
@@ -131,7 +131,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE).
 [sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA).
 [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR).
-[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU).
+[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer).
 [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE.
 [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4).

From 94f7c866c807a4463371ffd56d7114bf172665c2 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 21 Jul 2019 15:25:06 +0200
Subject: [PATCH 016/550] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index d61049b..7300a16 100644
--- a/README.md
+++ b/README.md
@@ -523,6 +523,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
 ##### General
 [pyup](https://github.com/pyupio/pyup) - Python dependency management.
 [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past.
+[pypi2nix] - Fix package versions and create reproducible environments, [Talk](https://www.youtube.com/watch?v=USDbjmxEZ_I).

 ##### Data Science Related
 [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages.
From 66875dae4485cf14eef623e9ff9cf7a9c525bca0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 21 Jul 2019 21:00:57 +0200 Subject: [PATCH 017/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7300a16..026d07f 100644 --- a/README.md +++ b/README.md @@ -133,7 +133,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. -[scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4). +[scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). #### Visualization [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). From 8a11a990731ff845cf578392cafeab22b3c88fad Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 9 Aug 2019 16:38:40 +0200 Subject: [PATCH 018/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 026d07f..a83d652 100644 --- a/README.md +++ b/README.md @@ -246,6 +246,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis. 
[imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf). [microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ). +[cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets. #### Image Processing [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) From d2133d8d630f6018b882b212364412c95f1e2835 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 27 Aug 2019 13:40:55 +0200 Subject: [PATCH 019/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index a83d652..b0932e4 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,6 @@ General tricks: [link](https://www.dataquest.io/blog/jupyter-notebook-tips-trick Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/) [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) - Project template for data science projects. [nteract](https://nteract.io/) - Open Jupyter Notebooks with doubleclick. -[modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. [xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. [blackcellmagic](https://github.com/csurfer/blackcellmagic) - Code formatting for jupyter notebooks. 
@@ -27,6 +26,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [ipysheet](https://github.com/QuantStack/ipysheet) - Jupyter spreadsheet widget. [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/). [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations. +[papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). #### Helpful [pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R. @@ -43,6 +43,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [dask](https://github.com/dask/dask), [dask-ml](http://ml.dask.org/) - Pandas `DataFrame` for big data and machine learning library, [resources](https://matthewrocklin.com/blog//work/2018/07/17/dask-dev), [talk1](https://www.youtube.com/watch?v=ccfsbuqsjgI), [talk2](https://www.youtube.com/watch?v=RA_2qdipVng), [notebooks](https://github.com/dask/dask-ec2/tree/master/notebooks), [videos](https://www.youtube.com/user/mdrocklin). [dask-gateway](https://github.com/jcrist/dask-gateway) - Managing dask clusters. [turicreate](https://github.com/apple/turicreate) - Helpful `SFrame` class for out-of-memory dataframes. +[modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [h2o](https://github.com/h2oai/h2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes. [datatable](https://github.com/h2oai/datatable) - Data Table for big data support. [cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library. 
From 6abacec01fa742f1b52ee12a8c4a22580913d6b6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 1 Sep 2019 14:51:08 +0200 Subject: [PATCH 020/550] Update README.md --- README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index b0932e4..574055b 100644 --- a/README.md +++ b/README.md @@ -56,11 +56,12 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. [zappy](https://github.com/lasersonlab/zappy) - Distributed numpy arrays. -##### Command line tools +##### Command line tools, CSV [ni](https://github.com/spencertipping/ni) - Command line tool for big data. [xsv](https://github.com/BurntSushi/xsv) - Command line tool for indexing, slicing, analyzing, splitting and joining CSV files. [csvkit](https://csvkit.readthedocs.io/en/1.0.3/) - Another command line tool for CSV files. [csvsort](https://pypi.org/project/csvsort/) - Sort large csv files. +[tsv-utils](https://github.com/eBay/tsv-utils) - Tools for working with CSV files by ebay. #### Classical Statistics [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). @@ -394,6 +395,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. +[stockstats](https://github.com/jealous/stockstats) - Pandas DataFrame wrapper for working with stock data. #### Survival Analysis [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates). 
@@ -533,8 +535,9 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [mlflow](https://mlflow.org/) - Manage the machine learning lifecycle, including experimentation, reproducibility and deployment. [modelchimp](https://github.com/ModelChimp/modelchimp) - Experiment Tracking. [skll](https://github.com/EducationalTestingService/skll) - Command-line utilities to make it easier to run machine learning experiments. -[BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production +[BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production. [dvc](https://github.com/iterative/dvc) - Versioning for ML projects. +[dagster](https://github.com/dagster-io/dagster) - Tool with focus on dependency graphs. #### Math and Background Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm) From 1995cedebf54701c2ad17a19b16d5dbcfaae13af Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 7 Sep 2019 17:04:16 +0200 Subject: [PATCH 021/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 574055b..344095b 100644 --- a/README.md +++ b/README.md @@ -207,6 +207,7 @@ Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and- [scikit-garden](https://github.com/scikit-garden/scikit-garden) - Quantile Regression. [grf](https://github.com/grf-labs/grf) - Generalized random forest. [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation. +[Nuance](https://github.com/SauceCat/Nuance) - Decision tree visualization. [rfpimp](https://github.com/parrt/random-forest-importances) - Feature Importance for RandomForests using Permuation Importance. 
Why the default feature importance for random forests is wrong: [link](http://explained.ai/rf-importance/index.html) [treeinterpreter](https://github.com/andosa/treeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions. From 74fdb53a786f7b564fcfd0052ee360bcc3b196a3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 8 Sep 2019 18:30:52 +0200 Subject: [PATCH 022/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 344095b..f31fd6d 100644 --- a/README.md +++ b/README.md @@ -29,6 +29,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). #### Helpful +[icecream](https://github.com/gruns/icecream) - Simple debugging output. [pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R. [intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s). From 71a71fcbadda6560c47491ffee45e684e3aa1417 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 9 Sep 2019 15:44:00 +0200 Subject: [PATCH 023/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index f31fd6d..ba03e02 100644 --- a/README.md +++ b/README.md @@ -273,6 +273,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [imgaug_extension](https://github.com/cadenai/imgaug_extension) - Extension for imgaug. [albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library. +[Random-Erasing](https://github.com/zhunzhong07/Random-Erasing) - Image augmentation technique. 
[tcav](https://github.com/tensorflow/tcav) - Interpretability method. [cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. From f4e3f74cda873a492164a862569aeb35983d1b10 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 9 Sep 2019 15:45:25 +0200 Subject: [PATCH 024/550] Update README.md --- README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index ba03e02..179586a 100644 --- a/README.md +++ b/README.md @@ -268,12 +268,11 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [Visualization of optimization algorithms](https://vis.ensmallen.org/) ##### Image Related -[keras preprocessing](https://keras.io/preprocessing/image/) - Preprocess images. [imgaug](https://github.com/aleju/imgaug) - More sophisticated image preprocessing. [imgaug_extension](https://github.com/cadenai/imgaug_extension) - Extension for imgaug. -[albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library. -[Random-Erasing](https://github.com/zhunzhong07/Random-Erasing) - Image augmentation technique. +[keras preprocessing](https://keras.io/preprocessing/image/) - Preprocess images. +[albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. [tcav](https://github.com/tensorflow/tcav) - Interpretability method. [cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. 
From ec9e3821cdd6f22f2a3449144195f5aedc8ae5f5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 17 Sep 2019 00:11:25 +0200 Subject: [PATCH 025/550] Update README.md --- README.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 179586a..dad401b 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,6 @@ [pandas_summary](https://github.com/mouradmourafiq/pandas-summary) - Basic statistics using `DataFrameSummary(df).summary()`. [pandas_profiling](https://github.com/pandas-profiling/pandas-profiling) - Descriptive statistics using `ProfileReport`. [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Helpful `DataFrameMapper` class. -[janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. #### Pandas and Jupyter @@ -26,9 +25,11 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [ipysheet](https://github.com/QuantStack/ipysheet) - Jupyter spreadsheet widget. [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/). [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations. -[papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). +[papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). +[pixiedust](https://github.com/pixiedust/pixiedust) - Helper library for Jupyter. #### Helpful +[tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. [icecream](https://github.com/gruns/icecream) - Simple debugging output. [pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R. 
[intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s). @@ -90,6 +91,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). +[janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms. [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Resampling for imbalanced datasets. @@ -380,7 +382,7 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [pydlm](https://github.com/wwrechard/pydlm) - Bayesian time series modeling ([R package](https://cran.r-project.org/web/packages/bsts/index.html), [Blog post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html)) [PyAF](https://github.com/antoinecarme/pyaf) - Automatic Time Series Forecasting. [luminol](https://github.com/linkedin/luminol) - Anomaly Detection and Correlation library from Linkedin. -[matrixprofile-ts](https://github.com/target/matrixprofile-ts) - Detecting patterns and anomalies, [website](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html), [ppt](https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1.pdf). +[matrixprofile-ts](https://github.com/target/matrixprofile-ts) - Detecting patterns and anomalies, [website](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html), [ppt](https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1.pdf), [alternative](https://github.com/matrix-profile-foundation/mass-ts). [stumpy](https://github.com/TDAmeritrade/stumpy) - Another matrix profile library. [obspy](https://github.com/obspy/obspy) - Seismology package. Useful `classic_sta_lta` function. 
[RobustSTL](https://github.com/LeeDoYup/RobustSTL) - Robust Seasonal-Trend Decomposition. @@ -414,7 +416,9 @@ RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). [eif](https://github.com/sahandha/eif) - Extended Isolation Forest. [AnomalyDetection](https://github.com/twitter/AnomalyDetection) - Anomaly detection (R package). [luminol](https://github.com/linkedin/luminol) - Anomaly Detection and Correlation library from Linkedin. -Distances for comparing histograms and detecting outliers - [Talk](https://www.youtube.com/watch?v=U7xdiGc7IRU): [Kolmogorov-Smirnov](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ks_2samp.html), [Wasserstein](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html), [Energy Distance (Cramer)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.energy_distance.html), [Kullback-Leibler divergence](https://scipy.github.io/devdocs/generated/scipy.stats.entropy.html) +Distances for comparing histograms and detecting outliers - [Talk](https://www.youtube.com/watch?v=U7xdiGc7IRU): [Kolmogorov-Smirnov](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ks_2samp.html), [Wasserstein](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html), [Energy Distance (Cramer)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.energy_distance.html), [Kullback-Leibler divergence](https://scipy.github.io/devdocs/generated/scipy.stats.entropy.html). +[banpei](https://github.com/tsurubee/banpei) - Anomaly detection library based on singular spectrum transformation. +[telemanom](https://github.com/khundman/telemanom) - Detect anomalies in multivariate time series data using LSTMs. #### Ranking [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear classification, regression and ranking. 
@@ -540,6 +544,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production. [dvc](https://github.com/iterative/dvc) - Versioning for ML projects. [dagster](https://github.com/dagster-io/dagster) - Tool with focus on dependency graphs. +[knockknock](https://github.com/huggingface/knockknock) - Be notified when your training ends. #### Math and Background Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm) From 123168ec44055eb7b057f8be47c2dd16457f8f91 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 19 Sep 2019 00:09:27 +0200 Subject: [PATCH 026/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index dad401b..a35caff 100644 --- a/README.md +++ b/README.md @@ -127,7 +127,8 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [scikit-genetic](https://github.com/manuel-calzolari/sklearn-genetic) - Genetic feature selection. [boruta_py](https://github.com/scikit-learn-contrib/boruta_py) - Feature selection, [explaination](https://stats.stackexchange.com/questions/264360/boruta-all-relevant-feature-selection-vs-random-forest-variables-of-importanc/264467), [example](https://www.kaggle.com/tilii7/boruta-feature-elimination). [linselect](https://github.com/efavdb/linselect) - Feature selection package. -[mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection. +[mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection. +[BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. 
#### Dimensionality Reduction [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU) @@ -585,7 +586,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) -[Awesome Network Embedding](https://github.com/chihming/awesome-network-embedding) [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science) From 56e98cca863b19566a03cef109f62a367f330e86 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 19 Sep 2019 00:18:43 +0200 Subject: [PATCH 027/550] Update README.md --- README.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index a35caff..a3f5d89 100644 --- a/README.md +++ b/README.md @@ -114,6 +114,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [featuretools](https://github.com/Featuretools/featuretools) - Automated feature engineering, [example](https://github.com/WillKoehrsen/automated-feature-engineering/blob/master/walk_through/Automated_Feature_Engineering.ipynb). [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. +[feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. #### Feature Selection [Talk](https://www.youtube.com/watch?v=JsArBz46_3s) @@ -276,7 +277,6 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library. 
[keras preprocessing](https://keras.io/preprocessing/image/) - Preprocess images. [albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. -[tcav](https://github.com/tensorflow/tcav) - Interpretability method. [cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. #### Text Related @@ -295,11 +295,10 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. [ignite](https://github.com/pytorch/ignite) - Highlevel library for pytorch. [skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). -[Detectron](https://github.com/facebookresearch/Detectron) - Object Detection by Facebook. [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. -[simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition. [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks. [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability, [Activation Maps](https://openai.com/blog/introducing-activation-atlases/). +[tcav](https://github.com/tensorflow/tcav) - Interpretability method. [AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https://github.com/titu1994/keras-adabound). [caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). [foolbox](https://github.com/bethgelab/foolbox) - Adversarial examples that fool neural networks. 
@@ -308,6 +307,12 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. +#### Object detection +[Detectron](https://github.com/facebookresearch/Detectron) - Object Detection by Facebook. +[simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition. +[CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection. +[FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. + ##### Applications and Snippets [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture. [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. @@ -315,7 +320,6 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [Entity Embeddings of Categorical Variables](https://arxiv.org/abs/1604.06737), [code](https://github.com/entron/entity-embedding-rossmann), [kaggle](https://www.kaggle.com/aquatic/entity-embedding-neural-net/code) [Image Super-Resolution](https://github.com/idealo/image-super-resolution) - Super-scaling using a Residual Dense Network. Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Posts: [1](https://www.thomasjpfan.com/2018/07/nuclei-image-segmentation-tutorial/), [2](https://www.thomasjpfan.com/2017/08/hassle-free-unets/) -[CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection. [deeplearning-models](https://github.com/rasbt/deeplearning-models) - Deep learning models. 
#### GPU @@ -438,6 +442,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [edward](https://github.com/blei-lab/edward) - Probabilistic modeling, inference, and criticism, [Mixture Density Networks (MNDs)](http://edwardlib.org/tutorials/mixture-density-network), [MDN Explanation](https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca). [Pyro](https://github.com/pyro-ppl/pyro) - Deep Universal Probabilistic Programming. [tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb). +[bambi](https://github.com/bambinos/bambi) - High-level Bayesian model-building interface on top of PyMC3. #### Stacking Models and Ensembles [Model Stacking Blog Post](http://blog.kaggle.com/2017/06/15/stacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova/) From 3456de15879dc0e9982c1cbc51d636100673c23b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 19 Sep 2019 11:24:48 +0200 Subject: [PATCH 028/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index a3f5d89..6702c46 100644 --- a/README.md +++ b/README.md @@ -393,7 +393,8 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [RobustSTL](https://github.com/LeeDoYup/RobustSTL) - Robust Seasonal-Trend Decomposition. [seglearn](https://github.com/dmbee/seglearn) - Time Series library. [pyts](https://github.com/johannfaouzi/pyts) - Time series transformation and classification, [Imaging time series](https://pyts.readthedocs.io/en/latest/auto_examples/index.html#imaging-time-series). 
-Turn time series into images and use Neural Nets: [example](https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47), [example](https://github.com/kiss90/time-series-classification). +Turn time series into images and use Neural Nets: [example](https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47), [example](https://github.com/kiss90/time-series-classification). +[sktime](https://github.com/alan-turing-institute/sktime), [sktime-dl](https://github.com/uea-machine-learning/sktime-dl) - Toolbox for (deep) learning with time series. ##### Time Series Evaluation From 8f1f9f178e005411a58832fd432bd43b36a8e2e0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 27 Sep 2019 10:06:29 +0200 Subject: [PATCH 029/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6702c46..b13b8d2 100644 --- a/README.md +++ b/README.md @@ -537,7 +537,8 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe #### Deployment and Lifecycle Management -##### General +##### Dependency Management +[pipreqs](https://github.com/bndr/pipreqs) - Generate a requirements.txt from import statements. [pyup](https://github.com/pyupio/pyup) - Python dependency management. [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. [pypi2nix] - Fix package versions and create reproducible environments, [Talk](https://www.youtube.com/watch?v=USDbjmxEZ_I). 
From 4dd23fabf91c72f2a1306f401a234c7106592aba Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 29 Sep 2019 16:11:20 +0200 Subject: [PATCH 030/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b13b8d2..76f4215 100644 --- a/README.md +++ b/README.md @@ -165,7 +165,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. #### Dashboards -[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo), [example](https://github.com/ned2/slapdash) +[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo) [bokeh](https://github.com/bokeh/bokeh) - Dashboarding solution. [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. [bowtie](https://github.com/jwkvam/bowtie/) - Dashboarding solution. 
From ecff492fe0da2bad07473b9eaf5dbf88db9fe3c7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 29 Sep 2019 16:13:27 +0200 Subject: [PATCH 031/550] Update README.md --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 76f4215..00135b1 100644 --- a/README.md +++ b/README.md @@ -166,10 +166,9 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection #### Dashboards [dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo) +[panel](https://panel.pyviz.org/index.html) - Dashboarding solution. [bokeh](https://github.com/bokeh/bokeh) - Dashboarding solution. [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. -[bowtie](https://github.com/jwkvam/bowtie/) - Dashboarding solution. -[panel](https://panel.pyviz.org/index.html) - Dashboarding solution. [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs). [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. From 9683e4e3359f6d142f476402158a8bec6c01df85 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 29 Sep 2019 16:28:06 +0200 Subject: [PATCH 032/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 00135b1..6783381 100644 --- a/README.md +++ b/README.md @@ -165,7 +165,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. #### Dashboards -[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. 
Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo) +[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo), [resources](https://github.com/ucg8j/awesome-dash) [panel](https://panel.pyviz.org/index.html) - Dashboarding solution. [bokeh](https://github.com/bokeh/bokeh) - Dashboarding solution. [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. @@ -585,6 +585,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) [Awesome CSV](https://github.com/secretGeek/AwesomeCSV) [Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) +[Awesome Dash](https://github.com/ucg8j/awesome-dash) [Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning) [Awesome ETL](https://github.com/pawl/awesome-etl) [Awesome Financial Machine Learning](https://github.com/firmai/financial-machine-learning) From cf737385670037fecdbc9fad6ece3068fad23c73 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Sep 2019 10:39:11 +0200 Subject: [PATCH 033/550] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 6783381..e1ea983 100644 --- a/README.md +++ b/README.md @@ -64,6 +64,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [csvkit](https://csvkit.readthedocs.io/en/1.0.3/) - Another command line tool 
for CSV files. [csvsort](https://pypi.org/project/csvsort/) - Sort large csv files. [tsv-utils](https://github.com/eBay/tsv-utils) - Tools for working with CSV files by ebay. +[cheat](https://github.com/cheat/cheat) - Make cheatsheets for command line commands. #### Classical Statistics [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). @@ -566,12 +567,11 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [metric-learn](https://github.com/metric-learn/metric-learn) - Metric learning. #### General Python Programming -[funcy](https://github.com/Suor/funcy) - Fancy and practical functional tools. [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. -[dill](https://pypi.org/project/dill/) - Serialization, alternative to pickle. -[attrs](https://github.com/python-attrs/attrs) - Python classes without boilerplate. +[funcy](https://github.com/Suor/funcy) - Fancy and practical functional tools. [dateparser](https://dateparser.readthedocs.io/en/latest/) - A better date parser. [jellyfish](https://github.com/jamesturk/jellyfish) - Approximate string matching. +[coloredlogs](https://github.com/xolox/python-coloredlogs) - Colored logging output. #### Blogs [Distill.pub](https://distill.pub/) - Blog. From ed6a06e2b942fc68e8bab8dade655f44c76eb93b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Sep 2019 22:48:18 +0200 Subject: [PATCH 034/550] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e1ea983..ecaf5c7 100644 --- a/README.md +++ b/README.md @@ -280,8 +280,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. 
#### Text Related -[ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. -[textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. +[ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. +[textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. +[ctrl](https://github.com/salesforce/ctrl) - Text generation. ##### Libs [keras](https://keras.io/) - Neural Networks on top of [tensorflow](https://www.tensorflow.org/), [examples](https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24). From 5916ab334c910fcd155d1d01114e07de581e8f72 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Sep 2019 23:17:28 +0200 Subject: [PATCH 035/550] Update README.md --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index ecaf5c7..82ef87a 100644 --- a/README.md +++ b/README.md @@ -142,6 +142,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). +[ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. #### Visualization [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). 
@@ -594,6 +595,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) +[Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science) From de94eaf45eff852edb593fc13b5091121e6353eb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Sep 2019 23:22:03 +0200 Subject: [PATCH 036/550] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 82ef87a..a770369 100644 --- a/README.md +++ b/README.md @@ -578,6 +578,10 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin #### Blogs [Distill.pub](https://distill.pub/) - Blog. 
+#### Datasets and Repositories + +[academictorrents](http://academictorrents.com/) + #### Awesome Lists and Resources [Data Science Notebooks](https://github.com/donnemartin/data-science-ipython-notebooks) [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) From 56c68b46e6aa8989dc0d64ea0b7a4c9304d95e12 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Sep 2019 23:24:33 +0200 Subject: [PATCH 037/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a770369..e0b668c 100644 --- a/README.md +++ b/README.md @@ -581,6 +581,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin #### Datasets and Repositories [academictorrents](http://academictorrents.com/) +[datasetlist](https://www.datasetlist.com/) #### Awesome Lists and Resources [Data Science Notebooks](https://github.com/donnemartin/data-science-ipython-notebooks) From 862b11f2acca0a5b7c99f08f2d1f40b135c5a6af Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 5 Oct 2019 14:40:48 +0200 Subject: [PATCH 038/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e0b668c..da9df6a 100644 --- a/README.md +++ b/README.md @@ -27,6 +27,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations. [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). [pixiedust](https://github.com/pixiedust/pixiedust) - Helper library for Jupyter. +[pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. #### Helpful [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. 
From 8e512cdc33cebf31f79f8396f06c96e6df702969 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 5 Oct 2019 22:16:37 +0200
Subject: [PATCH 039/550] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index da9df6a..e7f6d8b 100644
--- a/README.md
+++ b/README.md
@@ -174,6 +174,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection
 [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook.
 [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs).
 [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications.
+[steamlit](https://github.com/streamlit/streamlit) - Dashboards.
 
 #### Geopraphical Tools
 [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet).
From 802a657d5000170e0e449647529429c7e03a260e Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 6 Oct 2019 11:02:25 +0200
Subject: [PATCH 040/550] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e7f6d8b..423aa0f 100644
--- a/README.md
+++ b/README.md
@@ -410,6 +410,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github.
 [zipline](https://github.com/quantopian/zipline) - Algorithmic trading.
 [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors.
 [stockstats](https://github.com/jealous/stockstats) - Pandas DataFrame wrapper for working with stock data.
+[pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data.
 
 #### Survival Analysis
 [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates).
From 0ad4a855ac90e12359d87da22b32eed23d4c34a1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 8 Oct 2019 16:25:49 +0200 Subject: [PATCH 041/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 423aa0f..2796b56 100644 --- a/README.md +++ b/README.md @@ -102,6 +102,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pyupset](https://github.com/ImSoErgodic/py-upset) - Visualizing intersecting sets. [pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance, similarity between histograms. +#### Train / Test Split +[iterative-stratification](https://github.com/trent-b/iterative-stratification) - Stratification of multilabel data. + #### Feature Engineering [Talk](https://www.youtube.com/watch?v=68ABAU_V8qI) [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) - Pipeline, [examples](https://github.com/jem1031/pandas-pipelines-custom-transformers). From b77d1c32d7df3ce48803b784148a31a28b6de9d7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 10 Oct 2019 14:57:07 +0200 Subject: [PATCH 042/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2796b56..065e63a 100644 --- a/README.md +++ b/README.md @@ -147,6 +147,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. +[trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. 
#### Visualization [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). From 2e2a8e65ae9eb084512392760aa4dcff1df1bc7e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 11 Oct 2019 09:41:53 +0200 Subject: [PATCH 043/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 065e63a..532eccd 100644 --- a/README.md +++ b/README.md @@ -316,7 +316,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. #### Object detection -[Detectron](https://github.com/facebookresearch/Detectron) - Object Detection by Facebook. +[detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition. [CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection. [FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. From c1367cf5bbf5798e30dc8dbd21abed1a54ca8bea Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 15 Oct 2019 12:46:28 +0200 Subject: [PATCH 044/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 532eccd..21f91fe 100644 --- a/README.md +++ b/README.md @@ -350,6 +350,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [DESlib](https://github.com/scikit-learn-contrib/DESlib) - Dynamic classifier and ensemble selection #### Clustering +[Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. 
[hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU). From a5a5330ca3322c272473e3b4dfd93ac9b9480fa8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 20 Oct 2019 14:31:08 +0200 Subject: [PATCH 045/550] Update README.md --- README.md | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 21f91fe..e06a646 100644 --- a/README.md +++ b/README.md @@ -583,16 +583,15 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [jellyfish](https://github.com/jamesturk/jellyfish) - Approximate string matching. [coloredlogs](https://github.com/xolox/python-coloredlogs) - Colored logging output. -#### Blogs +#### Resources [Distill.pub](https://distill.pub/) - Blog. - -#### Datasets and Repositories - -[academictorrents](http://academictorrents.com/) -[datasetlist](https://www.datasetlist.com/) - -#### Awesome Lists and Resources +[Machine Learning Videos](https://github.com/dustinvtran/ml-videos) [Data Science Notebooks](https://github.com/donnemartin/data-science-ipython-notebooks) +[Recommender Systems (Microsoft)](https://github.com/Microsoft/Recommenders) +[The GAN Zoo](https://deephunt.in/the-gan-zoo-79597dc8c347) - List of Generative Adversarial Networks +[Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) + +##### Other Awesome Lists [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) [Awesome AI Booksmarks](https://github.com/goodrahstar/my-awesome-AI-bookmarks) [Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) @@ -618,9 +617,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding) [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python) [Awesome Time 
Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) -[Recommender Systems (Microsoft)](https://github.com/Microsoft/Recommenders) -[The GAN Zoo](https://deephunt.in/the-gan-zoo-79597dc8c347) - List of Generative Adversarial Networks -[Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) #### Things I google a lot [Frequency codes for time series](https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) From b1d162ad2216b3b47db509ba60ba5f7e69f49ae8 Mon Sep 17 00:00:00 2001 From: Eyal Trabelsi Date: Mon, 28 Oct 2019 08:48:20 +0200 Subject: [PATCH 046/550] add pandas-log --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e06a646..e978e2e 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). [pixiedust](https://github.com/pixiedust/pixiedust) - Helper library for Jupyter. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. +[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - A package which allow to provide feedback about basic pandas operations and find both buisness logic and performance issues. #### Helpful [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. 
From b44536cf0c618b7ba5f39a99a91a19833014d95f Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 31 Oct 2019 13:28:51 +0100
Subject: [PATCH 047/550] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e978e2e..c67cc5f 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html).
 [pixiedust](https://github.com/pixiedust/pixiedust) - Helper library for Jupyter.
 [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`.
-[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - A package which allow to provide feedback about basic pandas operations and find both buisness logic and performance issues.
+[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas.
 
 #### Helpful
 [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops.
From 04f0d8834a8745d74946c1e6fbcc67689dbf9999 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 3 Nov 2019 18:35:43 +0100 Subject: [PATCH 048/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c67cc5f..2d95db7 100644 --- a/README.md +++ b/README.md @@ -589,7 +589,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Machine Learning Videos](https://github.com/dustinvtran/ml-videos) [Data Science Notebooks](https://github.com/donnemartin/data-science-ipython-notebooks) [Recommender Systems (Microsoft)](https://github.com/Microsoft/Recommenders) -[The GAN Zoo](https://deephunt.in/the-gan-zoo-79597dc8c347) - List of Generative Adversarial Networks +[The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks [Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) ##### Other Awesome Lists From 2fb24515c0ff6a9c59eafa9752437afbb5d0a581 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 7 Nov 2019 09:08:56 +0100 Subject: [PATCH 049/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2d95db7..d71b7c4 100644 --- a/README.md +++ b/README.md @@ -144,7 +144,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE). [sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA). [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). 
-[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer). +[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. From 9411f872ac1ce93d1a9e147839e0aaa9d632e46e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 17 Nov 2019 10:03:44 +0100 Subject: [PATCH 050/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d71b7c4..18fa575 100644 --- a/README.md +++ b/README.md @@ -598,6 +598,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) [Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) +[Awsome Causality](https://github.com/rguo12/awesome-causality-algorithms) [Awesome CSV](https://github.com/secretGeek/AwesomeCSV) [Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) [Awesome Dash](https://github.com/ucg8j/awesome-dash) From c663317739a41cf71cbd34da435dd78425de85c0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 21 Nov 2019 13:47:55 +0100 Subject: [PATCH 
051/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 18fa575..b9592cc 100644 --- a/README.md +++ b/README.md @@ -598,7 +598,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) [Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) -[Awsome Causality](https://github.com/rguo12/awesome-causality-algorithms) +[Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) [Awesome CSV](https://github.com/secretGeek/AwesomeCSV) [Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) [Awesome Dash](https://github.com/ucg8j/awesome-dash) From 208e64dd995cec0ef545d661e9ef3c806f2c4af1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 4 Dec 2019 11:02:14 +0100 Subject: [PATCH 052/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b9592cc..2a86fd5 100644 --- a/README.md +++ b/README.md @@ -564,6 +564,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [dvc](https://github.com/iterative/dvc) - Versioning for ML projects. [dagster](https://github.com/dagster-io/dagster) - Tool with focus on dependency graphs. [knockknock](https://github.com/huggingface/knockknock) - Be notified when your training ends. +[metaflow](https://github.com/Netflix/metaflow) - Lifecycle Management Tool by Netflix. 
#### Math and Background Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm) From 162f84bcbdcc9ddfb97f58d59a1291d10ccb9e0d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 6 Dec 2019 10:21:32 +0100 Subject: [PATCH 053/550] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 2a86fd5..184961f 100644 --- a/README.md +++ b/README.md @@ -271,21 +271,21 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Neural Networks -##### Tutorials +##### Tutorials & Viewer [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) fast.ai course - [Lessons 1-7](https://course.fast.ai/videos/?lesson=1), [Lessons 8-14](http://course18.fast.ai/lessons/lessons2.html) [Tensorflow without a PhD](https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd) - Neural Network course by Google. Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PPT](http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture12.pdf) [Tensorflow Playground](https://playground.tensorflow.org/) [Visualization of optimization algorithms](https://vis.ensmallen.org/) +[cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. ##### Image Related [imgaug](https://github.com/aleju/imgaug) - More sophisticated image preprocessing. -[imgaug_extension](https://github.com/cadenai/imgaug_extension) - Extension for imgaug. [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library. [keras preprocessing](https://keras.io/preprocessing/image/) - Preprocess images. [albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. -[cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. +[augmix](https://github.com/google-research/augmix) - Image augmentation from Google. 
#### Text Related [ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. From 8ad2dad0d86a5722a8f75875cdd8b81cdb5c7d33 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 13 Dec 2019 21:37:01 +0100 Subject: [PATCH 054/550] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 184961f..2846cf2 100644 --- a/README.md +++ b/README.md @@ -550,9 +550,10 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe ##### Dependency Management [pipreqs](https://github.com/bndr/pipreqs) - Generate a requirements.txt from import statements. -[pyup](https://github.com/pyupio/pyup) - Python dependency management. +[dephell](https://github.com/dephell/dephell) - Dependency management. +[poetry](https://github.com/python-poetry/poetry) - Dependency management. +[pyup](https://github.com/pyupio/pyup) - Dependency management. [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. -[pypi2nix] - Fix package versions and create reproducible environments, [Talk](https://www.youtube.com/watch?v=USDbjmxEZ_I). ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. 
From 7f9730b0be693d3164840e99da94503586a7d77f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 16 Dec 2019 13:30:07 +0100 Subject: [PATCH 055/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2846cf2..bd7ab1e 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ #### Pandas and Jupyter General tricks: [link](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) +Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/) Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/) [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) - Project template for data science projects. [nteract](https://nteract.io/) - Open Jupyter Notebooks with doubleclick. From 96700af65ffa9db67adccb3c6bc7e00259e854d7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 17 Dec 2019 17:54:29 +0100 Subject: [PATCH 056/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bd7ab1e..54ad974 100644 --- a/README.md +++ b/README.md @@ -180,7 +180,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs). [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. -[steamlit](https://github.com/streamlit/streamlit) - Dashboards. +[streamlit](https://github.com/streamlit/streamlit) - Dashboards. 
#### Geopraphical Tools [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet). From bc0503fcafe94088dcd572be793ab11fe286712f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 21 Dec 2019 11:03:27 +0100 Subject: [PATCH 057/550] Update README.md --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 54ad974..24e84cb 100644 --- a/README.md +++ b/README.md @@ -556,6 +556,10 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [pyup](https://github.com/pyupio/pyup) - Dependency management. [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. +##### Data Versioning +[dvc](https://github.com/iterative/dvc) - Version control for large files. +[hangar](https://github.com/tensorwerk/hangar-py) - Version control for tensor data. + ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. [sklearn-porter](https://github.com/nok/sklearn-porter) - Transpile trained scikit-learn estimators to C, Java, JavaScript and others. @@ -563,7 +567,6 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [modelchimp](https://github.com/ModelChimp/modelchimp) - Experiment Tracking. [skll](https://github.com/EducationalTestingService/skll) - Command-line utilities to make it easier to run machine learning experiments. [BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production. -[dvc](https://github.com/iterative/dvc) - Versioning for ML projects. [dagster](https://github.com/dagster-io/dagster) - Tool with focus on dependency graphs. [knockknock](https://github.com/huggingface/knockknock) - Be notified when your training ends. 
[metaflow](https://github.com/Netflix/metaflow) - Lifecycle Management Tool by Netflix. From c4ccacd67f4bb79c1caf20ea901f07e47ee147ca Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 21 Dec 2019 16:05:23 +0100 Subject: [PATCH 058/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 24e84cb..9d6cab7 100644 --- a/README.md +++ b/README.md @@ -354,6 +354,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. +[GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. From 7c10623d1f94e59d77705df144280f49cc162126 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 29 Dec 2019 11:06:03 +0100 Subject: [PATCH 059/550] Update README.md --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 9d6cab7..2762430 100644 --- a/README.md +++ b/README.md @@ -143,11 +143,12 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) - Multidimensional scaling (MDS). 
[sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE). -[sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA). -[mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). -[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. +[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/). +UMAP does not preserve global structure any better than t-SNE when using the same initialization - [paper](https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1). [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). +[mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). +[sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. 
[trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. From 9063c576604552ee2082e2eae6e66fc7a75d9e79 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Dec 2019 17:58:20 +0100 Subject: [PATCH 060/550] Added some papers --- README.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 2762430..ef8118d 100644 --- a/README.md +++ b/README.md @@ -70,12 +70,19 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [cheat](https://github.com/cheat/cheat) - Make cheatsheets for command line commands. #### Classical Statistics + +##### Texts +[Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) +[Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) +[Chatruc - The Central Limit Theorem and its misuse](https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) +[Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) +[Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) +[Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) + +#### Statistical Tests and Packages [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. -[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. 
- -#### Tests -[Blog post](https://lindeloev.github.io/tests-as-linear/) +[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). From 8d316ff86d9401d596bfcc7b54fb56847f4062aa Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Dec 2019 18:05:17 +0100 Subject: [PATCH 061/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ef8118d..41ea705 100644 --- a/README.md +++ b/README.md @@ -78,6 +78,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) [Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) +[Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) #### Statistical Tests and Packages [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). 
From c3c7e0ab3480d281bb9660ab18a9f5f34bb78c9d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Dec 2019 18:07:48 +0100 Subject: [PATCH 062/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 41ea705..450c5f8 100644 --- a/README.md +++ b/README.md @@ -72,11 +72,11 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Classical Statistics ##### Texts -[Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) +[Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) [Chatruc - The Central Limit Theorem and its misuse](https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) [Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) -[Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) +[Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) From 9b88fad8760c71b21ea014e8abdb1df9841250ba Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 3 Jan 2020 13:25:31 +0100 Subject: [PATCH 063/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 450c5f8..a1e666d 100644 --- a/README.md +++ b/README.md @@ 
-243,7 +243,7 @@ Why the default feature importance for random forests is wrong: [link](http://ex [talk](https://www.youtube.com/watch?v=6zm9NC9uRkk)-[nb](https://nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb), [nb2](https://ahmedbesbes.com/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html), [talk](https://www.youtube.com/watch?time_continue=2&v=sI7VpFNiy_I). [Text classification Intro](https://mlwhiz.com/blog/2018/12/17/text_classification/), [Preprocessing blog post](https://mlwhiz.com/blog/2019/01/17/deeplearning_nlp_preprocess/). [gensim](https://radimrehurek.com/gensim/) - NLP, doc2vec, word2vec, text processing, topic modelling (LSA, LDA), [Example](https://markroxor.github.io/gensim/static/notebooks/gensim_news_classification.html), [Coherence Model](https://radimrehurek.com/gensim/models/coherencemodel.html) for evaluation. -Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www.kaggle.com/jhoward/improved-lstm-baseline-glove-dropout)], [[2](https://www.kaggle.com/sbongo/do-pretrained-embeddings-give-you-the-extra-edge)]), [StarSpace](https://github.com/facebookresearch/StarSpace), [wikipedia2vec](https://wikipedia2vec.github.io/wikipedia2vec/pretrained/). +Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www.kaggle.com/jhoward/improved-lstm-baseline-glove-dropout)], [[2](https://www.kaggle.com/sbongo/do-pretrained-embeddings-give-you-the-extra-edge)]), [StarSpace](https://github.com/facebookresearch/StarSpace), [wikipedia2vec](https://wikipedia2vec.github.io/wikipedia2vec/pretrained/), [visualization](https://projector.tensorflow.org/). [magnitude](https://github.com/plasticityai/magnitude) - Vector embedding utility package. [pyldavis](https://github.com/bmabey/pyLDAvis) - Visualization for topic modelling. [spaCy](https://spacy.io/) - NLP. 
From 1d239d9194b3c36bdde208d7852f231ea64ad328 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 3 Jan 2020 21:18:04 +0100 Subject: [PATCH 064/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a1e666d..95afb09 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. #### Pandas and Jupyter -General tricks: [link](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) +[General tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/), [Clean Coding (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/) Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/) [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) - Project template for data science projects. From c9cc4b83f2d124fa3f6971081f2478a4e7dc826b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 3 Jan 2020 22:12:24 +0100 Subject: [PATCH 065/550] Update README.md --- README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 95afb09..b9b2c56 100644 --- a/README.md +++ b/README.md @@ -101,6 +101,12 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs) [Dealing with Selection Bias By Propensity Based Feature Selection](https://www.youtube.com/watch?reload=9&v=3ZWCKr0vDtc) +#### Frameworks +[scikit-learn](https://github.com/scikit-learn/scikit-learn) - General machine learning framework. 
+[h2o](https://github.com/h2oai/h2o-3) - Machine learning framework. +[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). +[mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html). + #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. @@ -318,8 +324,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks. [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability, [Activation Maps](https://openai.com/blog/introducing-activation-atlases/). [tcav](https://github.com/tensorflow/tcav) - Interpretability method. -[AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https://github.com/titu1994/keras-adabound). -[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). +[AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https://github.com/titu1994/keras-adabound). [foolbox](https://github.com/bethgelab/foolbox) - Adversarial examples that fool neural networks. [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics. [imgclsmob](https://github.com/osmr/imgclsmob) - Pretrained models. @@ -552,11 +557,6 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [RLLib](https://ray.readthedocs.io/en/latest/rllib.html) - Library for reinforcement learning. [Horizon](https://github.com/facebookresearch/Horizon/) - Facebook RL framework. -#### Frameworks -[h2o](https://github.com/h2oai/h2o-3) - Scalable machine learning. 
-[turicreate](https://github.com/apple/turicreate) - Apple Machine Learning Toolkit. -[astroml](https://github.com/astroML/astroML) - ML for astronomical data. - #### Deployment and Lifecycle Management ##### Dependency Management From 02af0cfd3190cb6e4a595d431d614060e3667144 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 5 Jan 2020 11:46:36 +0100 Subject: [PATCH 066/550] Update README.md --- README.md | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index b9b2c56..790deee 100644 --- a/README.md +++ b/README.md @@ -12,28 +12,29 @@ [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Helpful `DataFrameMapper` class. [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. -#### Pandas and Jupyter +#### Environment and Jupyter [General tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/), [Clean Coding (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/) Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/) [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) - Project template for data science projects. [nteract](https://nteract.io/) - Open Jupyter Notebooks with doubleclick. -[swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. -[xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. -[blackcellmagic](https://github.com/csurfer/blackcellmagic) - Code formatting for jupyter notebooks. -[pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for jupyter notebooks. 
-[qgrid](https://github.com/quantopian/qgrid) - Pandas `DataFrame` sorting. -[ipysheet](https://github.com/QuantStack/ipysheet) - Jupyter spreadsheet widget. +[papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/). [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations. -[papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html). -[pixiedust](https://github.com/pixiedust/pixiedust) - Helper library for Jupyter. -[pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. -[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. +[qgrid](https://github.com/quantopian/qgrid) - Pandas `DataFrame` sorting. +[pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for jupyter notebooks. +[itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. + +#### Pandas Additions +[xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. +[swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. +[pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. +[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. #### Helpful [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. [icecream](https://github.com/gruns/icecream) - Simple debugging output. +[loguru](https://github.com/Delgan/loguru) - Python logging. 
[pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R. [intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s). From 8b2b1041baa5cb495fe6bd17c93840f64db2b8a6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 5 Jan 2020 12:43:28 +0100 Subject: [PATCH 067/550] Update README.md --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 790deee..ad2c344 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,7 @@ [pandas_profiling](https://github.com/pandas-profiling/pandas-profiling) - Descriptive statistics using `ProfileReport`. [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Helpful `DataFrameMapper` class. [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. +[rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - Plugin to display .csv files with nice colors. #### Environment and Jupyter [General tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/), [Clean Coding (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) @@ -25,7 +26,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for jupyter notebooks. [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. -#### Pandas Additions +#### Pandas Alternatives and Additions +[modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. +[vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. [xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. 
[pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. @@ -49,7 +52,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [dask](https://github.com/dask/dask), [dask-ml](http://ml.dask.org/) - Pandas `DataFrame` for big data and machine learning library, [resources](https://matthewrocklin.com/blog//work/2018/07/17/dask-dev), [talk1](https://www.youtube.com/watch?v=ccfsbuqsjgI), [talk2](https://www.youtube.com/watch?v=RA_2qdipVng), [notebooks](https://github.com/dask/dask-ec2/tree/master/notebooks), [videos](https://www.youtube.com/user/mdrocklin). [dask-gateway](https://github.com/jcrist/dask-gateway) - Managing dask clusters. [turicreate](https://github.com/apple/turicreate) - Helpful `SFrame` class for out-of-memory dataframes. -[modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [h2o](https://github.com/h2oai/h2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes. [datatable](https://github.com/h2oai/datatable) - Data Table for big data support. [cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library. @@ -58,7 +60,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [bottleneck](https://github.com/kwgoodman/bottleneck) - Fast NumPy array functions written in C. [bolz](https://github.com/Blosc/bcolz) - A columnar data container that can be compressed. [cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA. -[vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. [zappy](https://github.com/lasersonlab/zappy) - Distributed numpy arrays. 
From dc33ba9d745571647fc25a15e95442312951c532 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 10 Jan 2020 10:08:16 +0100 Subject: [PATCH 068/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ad2c344..8560c2a 100644 --- a/README.md +++ b/README.md @@ -582,6 +582,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [dagster](https://github.com/dagster-io/dagster) - Tool with focus on dependency graphs. [knockknock](https://github.com/huggingface/knockknock) - Be notified when your training ends. [metaflow](https://github.com/Netflix/metaflow) - Lifecycle Management Tool by Netflix. +[cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. #### Math and Background Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm) From 20b7054bb0ad05065146388f0d91b717ff4d18e8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 10 Jan 2020 10:12:19 +0100 Subject: [PATCH 069/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 8560c2a..f21465f 100644 --- a/README.md +++ b/README.md @@ -632,6 +632,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science) +[Awesome Python Data Science](https://github.com/amitness/toolbox) [Awesome Pytorch](https://github.com/bharathgs/Awesome-pytorch-list) [Awesome Recommender Systems](https://github.com/grahamjenson/list_of_recommender_systems) [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation) From 319fb96e935a3d11edbf662b35b9841bf66c0e0a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 12 Jan 
2020 22:17:34 +0100 Subject: [PATCH 070/550] Update README.md --- README.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index f21465f..3faa8b1 100644 --- a/README.md +++ b/README.md @@ -333,14 +333,18 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. -#### Object detection +#### Object detection / Instance Segmentation +[yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. +[EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. [detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition. [CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection. [FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. -##### Applications and Snippets +#### Image Classification [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture. + +##### Applications and Snippets [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. [SPADE](https://github.com/nvlabs/spade) - Semantic Image Synthesis. 
[Entity Embeddings of Categorical Variables](https://arxiv.org/abs/1604.06737), [code](https://github.com/entron/entity-embedding-rossmann), [kaggle](https://www.kaggle.com/aquatic/entity-embedding-neural-net/code) From 2964767288cb54e82198cc80600a400c3018fdfa Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 17 Jan 2020 01:11:18 +0100 Subject: [PATCH 071/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3faa8b1..d4316c0 100644 --- a/README.md +++ b/README.md @@ -265,6 +265,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog). [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. [stanfordnlp](https://github.com/stanfordnlp/stanfordnlp) - NLP Library. +[Chatistics](https://github.com/MasterScrat/Chatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames. ##### Papers [Search Engine Correlation](https://arxiv.org/pdf/1107.2691.pdf) From ed189eabe79481a66c1359b9e23ee06031c00d65 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 22 Jan 2020 18:36:25 +0100 Subject: [PATCH 072/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d4316c0..d82b104 100644 --- a/README.md +++ b/README.md @@ -106,7 +106,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Frameworks [scikit-learn](https://github.com/scikit-learn/scikit-learn) - General machine learning framework. [h2o](https://github.com/h2oai/h2o-3) - Machine learning framework. -[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). 
+[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). [mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html). #### Exploration and Cleaning From e99ca96023d7e034c20863e6c8d563efb4a8c328 Mon Sep 17 00:00:00 2001 From: Benedek Rozemberczki Date: Sat, 25 Jan 2020 22:43:11 +0000 Subject: [PATCH 073/550] Added Graph Representation learning and awesome Added section about graph representation learning and 6 Awesome Repos. --- README.md | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index d82b104..919a7d4 100644 --- a/README.md +++ b/README.md @@ -523,7 +523,12 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. [nni](https://github.com/Microsoft/nni) - Toolkit for neural architecture search and hyper-parameter tuning by Microsoft. [automl-gs](https://github.com/minimaxir/automl-gs) - Automated machine learning. -[mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning. +[mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning. + +#### Graph Representation Learning +[Karate Club](https://github.com/rusty1s/pytorch_geometric) - Unsupervised learning on graphs. +[Pytorch Geometric](https://github.com/benedekrozemberczki/karateclub) - Graph representation learning with PyTorch. +[DLG](https://github.com/dmlc/dgl) - Graph representation learning with TensorFlow. #### Evolutionary Algorithms & Optimization [deap](https://github.com/DEAP/deap) - Evolutionary computation framework (Genetic Algorithm, Evolution strategies). 
@@ -622,17 +627,23 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) [Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) -[Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) -[Awesome CSV](https://github.com/secretGeek/AwesomeCSV) +[Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) +[Awesome Community Detection](https://github.com/benedekrozemberczki/awesome-community-detection) +[Awesome CSV](https://github.com/secretGeek/AwesomeCSV) [Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) [Awesome Dash](https://github.com/ucg8j/awesome-dash) -[Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning) +[Awesome Decision Trees](https://github.com/benedekrozemberczki/awesome-decision-tree-papers) +[Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning) [Awesome ETL](https://github.com/pawl/awesome-etl) -[Awesome Financial Machine Learning](https://github.com/firmai/financial-machine-learning) -[Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) +[Awesome Financial Machine Learning](https://github.com/firmai/financial-machine-learning) +[Awesome Fraud Detection](https://github.com/benedekrozemberczki/awesome-fraud-detection-papers) +[Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) +[Awesome Graph Classification](https://github.com/benedekrozemberczki/awesome-graph-classification) +[Awesome Gradient Boosting](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) [Awesome Machine Learning 
Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) -[Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) +[Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) +[Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) From 12d558854e7f6d374abf3ed248e218202b7348ba Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 25 Jan 2020 23:49:49 +0100 Subject: [PATCH 074/550] Update README.md --- README.md | 54 +++++++++++++++++++++++++++--------------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/README.md b/README.md index 919a7d4..90c1d6a 100644 --- a/README.md +++ b/README.md @@ -526,9 +526,9 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning. #### Graph Representation Learning -[Karate Club](https://github.com/rusty1s/pytorch_geometric) - Unsupervised learning on graphs. -[Pytorch Geometric](https://github.com/benedekrozemberczki/karateclub) - Graph representation learning with PyTorch. -[DGL](https://github.com/dmlc/dgl) - Graph representation learning with TensorFlow. +[Karate Club](https://github.com/rusty1s/pytorch_geometric) - Unsupervised learning on graphs. +[Pytorch Geometric](https://github.com/benedekrozemberczki/karateclub) - Graph representation learning with PyTorch. +[DGL](https://github.com/dmlc/dgl) - Graph representation learning with TensorFlow. 
#### Evolutionary Algorithms & Optimization [deap](https://github.com/DEAP/deap) - Evolutionary computation framework (Genetic Algorithm, Evolution strategies). @@ -614,7 +614,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [coloredlogs](https://github.com/xolox/python-coloredlogs) - Colored logging output. #### Resources -[Distill.pub](https://distill.pub/) - Blog. +[Distill.pub](https://distill.pub/) - Blog. [Machine Learning Videos](https://github.com/dustinvtran/ml-videos) [Data Science Notebooks](https://github.com/donnemartin/data-science-ipython-notebooks) [Recommender Systems (Microsoft)](https://github.com/Microsoft/Recommenders) @@ -622,28 +622,28 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) ##### Other Awesome Lists -[Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) -[Awesome AI Bookmarks](https://github.com/goodrahstar/my-awesome-AI-bookmarks) -[Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) -[Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) -[Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) -[Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) -[Awesome Community Detection](https://github.com/benedekrozemberczki/awesome-community-detection) -[Awesome CSV](https://github.com/secretGeek/AwesomeCSV) -[Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) -[Awesome Dash](https://github.com/ucg8j/awesome-dash) -[Awesome Decision Trees](https://github.com/benedekrozemberczki/awesome-decision-tree-papers) -[Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning) -[Awesome ETL](https://github.com/pawl/awesome-etl) -[Awesome Financial Machine 
Learning](https://github.com/firmai/financial-machine-learning) -[Awesome Fraud Detection](https://github.com/benedekrozemberczki/awesome-fraud-detection-papers) -[Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) -[Awesome Graph Classification](https://github.com/benedekrozemberczki/awesome-graph-classification) -[Awesome Gradient Boosting](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) -[Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) -[Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) -[Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) -[Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) +[Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) +[Awesome AI Bookmarks](https://github.com/goodrahstar/my-awesome-AI-bookmarks) +[Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) +[Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) +[Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) +[Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) +[Awesome Community Detection](https://github.com/benedekrozemberczki/awesome-community-detection) +[Awesome CSV](https://github.com/secretGeek/AwesomeCSV) +[Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) +[Awesome Dash](https://github.com/ucg8j/awesome-dash) +[Awesome Decision Trees](https://github.com/benedekrozemberczki/awesome-decision-tree-papers) +[Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning) +[Awesome ETL](https://github.com/pawl/awesome-etl) +[Awesome Financial Machine 
Learning](https://github.com/firmai/financial-machine-learning) +[Awesome Fraud Detection](https://github.com/benedekrozemberczki/awesome-fraud-detection-papers) +[Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) +[Awesome Graph Classification](https://github.com/benedekrozemberczki/awesome-graph-classification) +[Awesome Gradient Boosting](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) +[Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) +[Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) +[Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) +[Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) @@ -654,7 +654,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation) [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding) [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python) -[Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) +[Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) #### Things I google a lot [Frequency codes for time series](https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) From fc79c6f54024d1bf7a4bc65e8364351cf8afcaa9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 18 Feb 2020 13:07:52 +0100 Subject: [PATCH 075/550] Update README.md --- README.md | 3 
++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 90c1d6a..7479760 100644 --- a/README.md +++ b/README.md @@ -578,9 +578,10 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [pyup](https://github.com/pyupio/pyup) - Dependency management. [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. -##### Data Versioning +##### Data Versioning and Pipelines [dvc](https://github.com/iterative/dvc) - Version control for large files. [hangar](https://github.com/tensorwerk/hangar-py) - Version control for tensor data. +[kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines. ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. From dd676f293561525dc9ebae53cbae45dc74de6ea9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 23 Feb 2020 21:54:02 +0100 Subject: [PATCH 076/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 7479760..c57393c 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [qgrid](https://github.com/quantopian/qgrid) - Pandas `DataFrame` sorting. [pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for jupyter notebooks. [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. +[jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter. #### Pandas Alternatives and Additions [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. 
@@ -161,7 +162,6 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/). -UMAP does not preserve global structure any better than t-SNE when using the same initialization - [paper](https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1). [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). [sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA). @@ -381,6 +381,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. 
[merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) +[tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. #### Interpretable Classifiers and Regressors [skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. From abe38739d489415c931d49a05d300c76e2b1a510 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 28 Feb 2020 13:57:45 +0100 Subject: [PATCH 077/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c57393c..8d4d5c1 100644 --- a/README.md +++ b/README.md @@ -216,6 +216,7 @@ Plotting (Descartes, Cartopy) Predict economic indicators from Open Street Map [ipynb](https://github.com/njanakiev/osm-predict-economic-measurements/blob/master/osm-predict-economic-indicators.ipynb). [PySal](https://github.com/pysal/pysal) - Python Spatial Analysis Library. [geograpy](https://github.com/ushahidi/geograpy) - Extract countries, regions and cities from a URL or text. +[cartogram](https://go-cart.io/cartogram) - Distorted maps based on population. #### Recommender Systems Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/), [2](https://medium.com/@james_aka_yale/the-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223), [2-ipynb](https://github.com/khanhnamle1994/movielens/blob/master/Content_Based_and_Collaborative_Filtering_Models.ipynb), [3](https://www.kaggle.com/morrisb/how-to-recommend-anything-deep-recommender). 
From a0517aa8658d5595393bdc3186262d9c32b6017b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 28 Feb 2020 15:09:26 +0100 Subject: [PATCH 078/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 8d4d5c1..885c015 100644 --- a/README.md +++ b/README.md @@ -162,6 +162,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/). +[sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). [sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA). 
From d4beee8c0ab0b30627ab3e99c7ad9c1498b0ca8b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 4 Mar 2020 01:21:02 +0100 Subject: [PATCH 079/550] causalml --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 885c015..30eaac6 100644 --- a/README.md +++ b/README.md @@ -518,6 +518,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [innvestigate](https://github.com/albermax/innvestigate) - A toolbox to investigate neural network predictions. [dalex](https://github.com/pbiecek/DALEX) - Explanations for ML models (R package). [interpret](https://github.com/microsoft/interpret) - Fit interpretable models, explain models (Microsoft). +[causalml](https://github.com/uber/causalml) - Causal inference by Uber. #### Automated Machine Learning [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow. From 4437b85410f2962315e6543a9575a684867cad06 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 4 Mar 2020 17:34:45 +0100 Subject: [PATCH 080/550] pytorch-optimizer --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 30eaac6..2418b57 100644 --- a/README.md +++ b/README.md @@ -298,7 +298,7 @@ fast.ai course - [Lessons 1-7](https://course.fast.ai/videos/?lesson=1), [Lesson [Tensorflow without a PhD](https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd) - Neural Network course by Google. Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PPT](http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture12.pdf) [Tensorflow Playground](https://playground.tensorflow.org/) -[Visualization of optimization algorithms](https://vis.ensmallen.org/) +[Visualization of optimization algorithms](https://vis.ensmallen.org/), [Another visualization](https://github.com/jettify/pytorch-optimizer) [cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. 
##### Image Related @@ -323,6 +323,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [tensorlayer](https://github.com/tensorlayer/tensorlayer) - Neural Networks on top of tensorflow, [tricks](https://github.com/wagamamaz/tensorlayer-tricks). [tensorforce](https://github.com/reinforceio/tensorforce) - Tensorflow for applied reinforcement learning. [fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. +[pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for pytorch. [ignite](https://github.com/pytorch/ignite) - Highlevel library for pytorch. [skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. From f5eb1eba9a672ea51802b6347f14f2aec0e17feb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 6 Mar 2020 22:38:16 +0100 Subject: [PATCH 081/550] pandarallel --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2418b57..7c3ae9d 100644 --- a/README.md +++ b/README.md @@ -30,6 +30,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Pandas Alternatives and Additions [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. +[pandarallel](https://github.com/nalepae/pandarallel) - Parallelize pandas operations. [xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. 
From ea76b10d69b0a8bbf8aab1217c2b7a03f9f04fad Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 9 Mar 2020 21:19:39 +0100 Subject: [PATCH 082/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 7c3ae9d..38565fe 100644 --- a/README.md +++ b/README.md @@ -309,6 +309,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. [augmix](https://github.com/google-research/augmix) - Image augmentation from Google. +##### Lossfunction Related +[SegLoss](https://github.com/JunMa11/SegLoss) - List of loss functions for medical image segmentation. + #### Text Related [ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. [textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. From 295a9cad131b0be6d036e5657ca2650bedff7320 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 9 Mar 2020 21:24:31 +0100 Subject: [PATCH 083/550] Added kornia, SegLoss --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 38565fe..6a95448 100644 --- a/README.md +++ b/README.md @@ -308,6 +308,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [keras preprocessing](https://keras.io/preprocessing/image/) - Preprocess images. [albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. [augmix](https://github.com/google-research/augmix) - Image augmentation from Google. +[kornia](https://github.com/kornia/kornia) - Image augmentation, feature extraction and loss functions. ##### Lossfunction Related [SegLoss](https://github.com/JunMa11/SegLoss) - List of loss functions for medical image segmentation. 
From c4516908d37439cd87f94ecf5f28493640b878be Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 10 Mar 2020 23:42:16 +0100 Subject: [PATCH 084/550] automl_zero --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6a95448..5002174 100644 --- a/README.md +++ b/README.md @@ -533,7 +533,8 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. [nni](https://github.com/Microsoft/nni) - Toolkit for neural architecture search and hyper-parameter tuning by Microsoft. [automl-gs](https://github.com/minimaxir/automl-gs) - Automated machine learning. -[mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning. +[mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning. +[automl_zero](https://github.com/google-research/google-research/tree/master/automl_zero) - Automatically discover computer programs that can solve machine learning tasks from Google. #### Graph Representation Learning [Karate Club](https://github.com/rusty1s/pytorch_geometric) - Unsupervised learning on graphs. From 0013e16b9f32babf19f0b22d5ca5d8dd17dd0a42 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 11 Mar 2020 11:45:37 +0100 Subject: [PATCH 085/550] Update README.md --- README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5002174..084d844 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ [rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - Plugin to display .csv files with nice colors. 
#### Environment and Jupyter -[General tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/), [Clean Coding (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) +[General Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/) Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/) [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) - Project template for data science projects. @@ -27,7 +27,10 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. [jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter. -#### Pandas Alternatives and Additions +#### Pandas Tricks, Alternatives and Additions +[Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) +[Using df.pipe() (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) + [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. [pandarallel](https://github.com/nalepae/pandarallel) - Parallelize pandas operations. 
From 2d39b4f4b663d6153f9a268bd1d1b2c85af049b0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 12 Mar 2020 00:41:52 +0100 Subject: [PATCH 086/550] time series anomaly detection --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 084d844..c7b11d1 100644 --- a/README.md +++ b/README.md @@ -439,7 +439,8 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [seglearn](https://github.com/dmbee/seglearn) - Time Series library. [pyts](https://github.com/johannfaouzi/pyts) - Time series transformation and classification, [Imaging time series](https://pyts.readthedocs.io/en/latest/auto_examples/index.html#imaging-time-series). Turn time series into images and use Neural Nets: [example](https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47), [example](https://github.com/kiss90/time-series-classification). -[sktime](https://github.com/alan-turing-institute/sktime), [sktime-dl](https://github.com/uea-machine-learning/sktime-dl) - Toolbox for (deep) learning with time series. +[sktime](https://github.com/alan-turing-institute/sktime), [sktime-dl](https://github.com/uea-machine-learning/sktime-dl) - Toolbox for (deep) learning with time series. +[adtk](https://github.com/arundo/adtk) - Time Series Anomaly Detection. ##### Time Series Evaluation From eea12c9640301d65dcd6b9f392c9057a6332534c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 20 Mar 2020 23:37:43 +0100 Subject: [PATCH 087/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index c7b11d1..d4b17e0 100644 --- a/README.md +++ b/README.md @@ -345,6 +345,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. 
+#### Training-related +[livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook. + #### Object detection / Instance Segmentation [yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. [EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. From 1d10b124d0eac667d50c96294633e6f2046243b8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 28 Mar 2020 09:53:59 +0100 Subject: [PATCH 088/550] Visual debugger for Jupyter. --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d4b17e0..e7c5e41 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for jupyter notebooks. [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. [jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter. +[debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter. #### Pandas Tricks, Alternatives and Additions [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) From 2bd80c6b2e2938a8f305658435f27db562b906dd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 28 Mar 2020 16:48:28 +0100 Subject: [PATCH 089/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e7c5e41..e524734 100644 --- a/README.md +++ b/README.md @@ -615,6 +615,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. 
#### Math and Background +[All kinds of math and statistics resources](https://realnotcomplex.com/) Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm) Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning ](https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/) From 81a763c190590a3adad268a2addeba8c57a926f8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 6 Apr 2020 08:11:11 +0200 Subject: [PATCH 090/550] Update README.md --- README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e524734..38c3861 100644 --- a/README.md +++ b/README.md @@ -452,11 +452,14 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [tscv](https://github.com/WenjieZ/TSCV) - Evaluation with gap. #### Financial Data +[pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. +[ffn](https://github.com/pmorissette/ffn) - Financial functions. +[bt](https://github.com/pmorissette/bt) - Backtesting algorithms. +The Quantopian Stack (some features may require signup on their platform): [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. -[stockstats](https://github.com/jealous/stockstats) - Pandas DataFrame wrapper for working with stock data. -[pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. +[empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. #### Survival Analysis [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates). 
From bbe09a7abb6f48a602d5eb919c27467f97e651a7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 6 Apr 2020 08:44:56 +0200 Subject: [PATCH 091/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 38c3861..c06ed13 100644 --- a/README.md +++ b/README.md @@ -498,6 +498,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [Pyro](https://github.com/pyro-ppl/pyro) - Deep Universal Probabilistic Programming. [tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb). [bambi](https://github.com/bambinos/bambi) - High-level Bayesian model-building interface on top of PyMC3. +[neural-tangents](https://github.com/google/neural-tangents) - Infinite Neural Networks. #### Stacking Models and Ensembles [Model Stacking Blog Post](http://blog.kaggle.com/2017/06/15/stacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova/) From 75a0a2a705dd1e8a9480764896f3892871f8fb7f Mon Sep 17 00:00:00 2001 From: Francesco Murdaca Date: Mon, 6 Apr 2020 10:16:37 +0200 Subject: [PATCH 092/550] Correct inverted links Signed-off-by: Francesco Murdaca --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e524734..0bee9df 100644 --- a/README.md +++ b/README.md @@ -545,8 +545,8 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [automl_zero](https://github.com/google-research/google-research/tree/master/automl_zero) - Automatically discover computer programs that can solve machine learning tasks from Google. 
#### Graph Representation Learning -[Karate Club](https://github.com/rusty1s/pytorch_geometric) - Unsupervised learning on graphs. -[Pytorch Geometric](https://github.com/benedekrozemberczki/karateclub) - Graph representation learning with PyTorch. +[Karate Club](https://github.com/benedekrozemberczki/karateclub) - Unsupervised learning on graphs. +[Pytorch Geometric](https://github.com/rusty1s/pytorch_geometric) - Graph representation learning with PyTorch. [DGL](https://github.com/dmlc/dgl) - Graph representation learning with TensorFlow. #### Evolutionary Algorithms & Optimization From e0ebcab37b5d974f96b0cb0f72fa385acf166c95 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Apr 2020 18:49:37 +0200 Subject: [PATCH 093/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c8eca29..ae56c86 100644 --- a/README.md +++ b/README.md @@ -455,6 +455,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. [ffn](https://github.com/pmorissette/ffn) - Financial functions. [bt](https://github.com/pmorissette/bt) - Backtesting algorithms. +[alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. The Quantopian Stack (some features may require signup on their platform): [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. 
From 9661db1856aa0167e0b8c7ece9456b516f90ee11 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 9 Apr 2020 16:01:36 +0200 Subject: [PATCH 094/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index ae56c86..3701e6a 100644 --- a/README.md +++ b/README.md @@ -31,6 +31,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Pandas Tricks, Alternatives and Additions [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) [Using df.pipe() (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) +[pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. @@ -39,6 +40,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. +[pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. #### Helpful [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. @@ -417,6 +419,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [pyramid](https://github.com/tgsmith61591/pyramid), [pmdarima](https://github.com/tgsmith61591/pmdarima) - Wrapper for (Auto-) ARIMA. [pyflux](https://github.com/RJT1990/pyflux) - Time series prediction algorithms (ARIMA, GARCH, GAS, Bayesian). [prophet](https://github.com/facebook/prophet) - Time series prediction library. +[atspy](https://github.com/firmai/atspy) - Automated Time Series Models. 
[pm-prophet](https://github.com/luke14free/pm-prophet) - Time series prediction and decomposition library. [htsprophet](https://github.com/CollinRooney12/htsprophet) - Hierarchical Time Series Forecasting using Prophet. [nupic](https://github.com/numenta/nupic) - Hierarchical Temporal Memory (HTM) for Time Series Prediction and Anomaly Detection. From 80a76ad4731ae1e0697f0b132b44b145dabed0da Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Apr 2020 17:12:25 +0200 Subject: [PATCH 095/550] Update README.md --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 3701e6a..0edad42 100644 --- a/README.md +++ b/README.md @@ -455,7 +455,9 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [tscv](https://github.com/WenjieZ/TSCV) - Evaluation with gap. #### Financial Data +[Courses](https://quantecon.org/) [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. +[yfinance](https://github.com/ranaroussi/yfinance) - Read stock data from Yahoo Finance. [ffn](https://github.com/pmorissette/ffn) - Financial functions. [bt](https://github.com/pmorissette/bt) - Backtesting algorithms. [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. 
From 5de68c6c8669a824882d1d0173aabbe517dc41f4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 24 Apr 2020 14:53:33 +0200 Subject: [PATCH 096/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0edad42..53f92a4 100644 --- a/README.md +++ b/README.md @@ -586,6 +586,7 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645 sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html). [creme-ml](https://github.com/creme-ml/creme) - Incremental learning framework, [talk](https://www.youtube.com/watch?v=P3M6dt7bY9U). [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Online Learning algorithms. +[onelearn](https://github.com/onelearn/onelearn) - Online Random Forests. #### Active Learning [Talk](https://www.youtube.com/watch?v=0efyjq5rWS4) From 16c573695ccf8121956b984fd9f0cd153dc249d6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 1 May 2020 12:42:47 +0200 Subject: [PATCH 097/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 53f92a4..9d25ed0 100644 --- a/README.md +++ b/README.md @@ -455,11 +455,11 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [tscv](https://github.com/WenjieZ/TSCV) - Evaluation with gap. #### Financial Data -[Courses](https://quantecon.org/) [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. [yfinance](https://github.com/ranaroussi/yfinance) - Read stock data from Yahoo Finance. [ffn](https://github.com/pmorissette/ffn) - Financial functions. [bt](https://github.com/pmorissette/bt) - Backtesting algorithms. 
+[backtrader](https://github.com/mementum/backtrader) - Backtesting for trading strategies. [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. The Quantopian Stack (some features may require signup on their platform): [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. From 031c726f6581876540918c17b87b0ed5240d63cd Mon Sep 17 00:00:00 2001 From: Benedek Rozemberczki Date: Mon, 18 May 2020 12:44:12 +0000 Subject: [PATCH 098/550] Added Little Ball of Fur. Subsampling from large graphs. --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9d25ed0..59f6186 100644 --- a/README.md +++ b/README.md @@ -119,6 +119,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). +[littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms. From e4fdabdd2401bd22b77ff84e9d9a9ae75d362334 Mon Sep 17 00:00:00 2001 From: JustGlowing Date: Tue, 26 May 2020 11:40:09 +0100 Subject: [PATCH 099/550] adding MiniSom --- README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 59f6186..6921617 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,8 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. 
[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. -[pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. +[pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. +[MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. #### Helpful [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. @@ -394,12 +395,13 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). -[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. +[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) -[tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. +[tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. +[MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. 
#### Interpretable Classifiers and Regressors [skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. From d2b098b6dd572bc1d96ccc344ce1ae234d11b424 Mon Sep 17 00:00:00 2001 From: JustGlowing Date: Wed, 27 May 2020 10:47:55 +0100 Subject: [PATCH 100/550] refinements --- README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 6921617..8642a1d 100644 --- a/README.md +++ b/README.md @@ -40,8 +40,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. -[pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. -[MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. +[pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. #### Helpful [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. @@ -401,7 +400,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. -[MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. +[MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. 
#### Interpretable Classifiers and Regressors [skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. From e3ee533ac85018915d7684225d78bf70cde64e6d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 27 May 2020 12:51:34 +0200 Subject: [PATCH 101/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8642a1d..398e7b1 100644 --- a/README.md +++ b/README.md @@ -399,7 +399,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) -[tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. +[tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. #### Interpretable Classifiers and Regressors From b50a957be9a081431fe9bcf8bda86c4e64b49ab5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 27 May 2020 12:52:22 +0200 Subject: [PATCH 102/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 398e7b1..c6472f0 100644 --- a/README.md +++ b/README.md @@ -394,7 +394,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. 
[GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). -[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. +[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. From 1eae8a91ecaa82be5b64982130e37f46eea064d4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 16 Jul 2020 11:00:55 +0200 Subject: [PATCH 103/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c6472f0..618d093 100644 --- a/README.md +++ b/README.md @@ -477,6 +477,7 @@ The Quantopian Stack (some features may require signup on their platform): [survivalstan](https://github.com/hammerlab/survivalstan) - Survival analysis, [intro](http://www.hammerlab.org/2017/06/26/introducing-survivalstan/). [convoys](https://github.com/better/convoys) - Analyze time lagged conversions. RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). +[pysurvival](https://github.com/square/pysurvival) - Survival analysis . #### Outlier Detection & Anomaly Detection [sklearn](https://scikit-learn.org/stable/modules/outlier_detection.html) - Isolation Forest and others. 
From 773ec7162da5359a7bcdd19b56c2dc5f92683b20 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 16 Jul 2020 11:18:57 +0200 Subject: [PATCH 104/550] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 618d093..8403799 100644 --- a/README.md +++ b/README.md @@ -371,6 +371,10 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Posts: [1](https://www.thomasjpfan.com/2018/07/nuclei-image-segmentation-tutorial/), [2](https://www.thomasjpfan.com/2017/08/hassle-free-unets/) [deeplearning-models](https://github.com/rasbt/deeplearning-models) - Deep learning models. +#### Variational Autoencoders (VAE) +[disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE. + + #### GPU [cuML](https://github.com/rapidsai/cuml) - Run traditional tabular ML tasks on GPUs. [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. From 7354c8824e0e7514579434948360f54ec7bc141e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 16 Jul 2020 11:51:23 +0200 Subject: [PATCH 105/550] Update README.md --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index 8403799..3b2a90d 100644 --- a/README.md +++ b/README.md @@ -162,6 +162,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [linselect](https://github.com/efavdb/linselect) - Feature selection package. [mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection. [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. +[INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. 
#### Dimensionality Reduction [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU) @@ -513,6 +514,11 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [bambi](https://github.com/bambinos/bambi) - High-level Bayesian model-building interface on top of PyMC3. [neural-tangents](https://github.com/google/neural-tangents) - Infinite Neural Networks. +#### Gaussian Processes +[GPyOpt](https://github.com/SheffieldML/GPyOpt) - Gaussian process optimization. +[GPflow](https://github.com/GPflow/GPflow) - Gaussian processes (Tensorflow). +[gpytorch](https://gpytorch.ai/) - Gaussian processes (Pytorch). + #### Stacking Models and Ensembles [Model Stacking Blog Post](http://blog.kaggle.com/2017/06/15/stacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova/) [mlxtend](https://github.com/rasbt/mlxtend) - `EnsembleVoteClassifier`, `StackingRegressor`, `StackingCVRegressor` for model stacking. From ae0c6a34f5a5636b8fb9fd24544768abff307545 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 31 Jul 2020 10:20:15 +0200 Subject: [PATCH 106/550] pingouin --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3b2a90d..23225fb 100644 --- a/README.md +++ b/README.md @@ -91,6 +91,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) #### Statistical Tests and Packages +[pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. 
[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. From f2b31e76d10c0ff67a6011e97bf40bc1ba76e12d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 31 Jul 2020 10:22:47 +0200 Subject: [PATCH 107/550] Update README.md --- README.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 23225fb..14320be 100644 --- a/README.md +++ b/README.md @@ -81,21 +81,12 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Classical Statistics -##### Texts -[Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) -[Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) -[Chatruc - The Central Limit Theorem and its misuse](https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) -[Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) -[Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) -[Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) -[Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) - #### Statistical Tests and Packages [pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. -[researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). -[scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. 
-[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. -[scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. +[scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. +[researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). +[scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. +[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). 
##### Visualizations @@ -112,6 +103,15 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs) [Dealing with Selection Bias By Propensity Based Feature Selection](https://www.youtube.com/watch?reload=9&v=3ZWCKr0vDtc) +##### Texts +[Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) +[Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) +[Chatruc - The Central Limit Theorem and its misuse](https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) +[Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) +[Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) +[Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) +[Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) + #### Frameworks [scikit-learn](https://github.com/scikit-learn/scikit-learn) - General machine learning framework. [h2o](https://github.com/h2oai/h2o-3) - Machine learning framework. 
From 7561c1e8038d7921df14a254395ff42ef8be3f6f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 31 Jul 2020 10:23:37 +0200 Subject: [PATCH 108/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 14320be..2f3497f 100644 --- a/README.md +++ b/README.md @@ -81,7 +81,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Classical Statistics -#### Statistical Tests and Packages +##### Statistical Tests and Packages [pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). @@ -99,7 +99,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Distribution of p-values when comparing two groups](https://rpsychologist.com/d3/pdist/) [Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/) -#### Talks +##### Talks [Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs) [Dealing with Selection Bias By Propensity Based Feature Selection](https://www.youtube.com/watch?reload=9&v=3ZWCKr0vDtc) From 37e6a87cd196421bb1fedfff08c73041f19c0b6d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 3 Aug 2020 23:03:29 +0200 Subject: [PATCH 109/550] combo --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2f3497f..479fb71 100644 --- a/README.md +++ b/README.md @@ -526,6 +526,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [vecstack](https://github.com/vecxoz/vecstack) - Stacking ML models. [StackNet](https://github.com/kaz-Anova/StackNet) - Stacking ML models. [mlens](https://github.com/flennerhag/mlens) - Ensemble learning. 
+[combo](https://github.com/yzhao062/combo) - Combining ML models (stacking, ensembling). #### Model Evaluation [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix. From 89fa7977a20feedb8b0e664267e425fe8c077dc0 Mon Sep 17 00:00:00 2001 From: Bernardo da Eira Duarte Date: Tue, 4 Aug 2020 01:08:25 -0300 Subject: [PATCH 110/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 479fb71..6e44cf4 100644 --- a/README.md +++ b/README.md @@ -278,6 +278,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. [stanfordnlp](https://github.com/stanfordnlp/stanfordnlp) - NLP Library. [Chatistics](https://github.com/MasterScrat/Chatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames. +[textvec](https://github.com/textvec/textvec) - Supervised text vectorization tool. ##### Papers [Search Engine Correlation](https://arxiv.org/pdf/1107.2691.pdf) From 7092dba1b7fd959a46bdd4c578f22b32aced07f8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 25 Aug 2020 23:47:59 +0200 Subject: [PATCH 111/550] hummingbird --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6e44cf4..89d36a5 100644 --- a/README.md +++ b/README.md @@ -210,7 +210,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. [streamlit](https://github.com/streamlit/streamlit) - Dashboards. -#### Geopraphical Tools +#### Geographical Tools [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet). [gmaps](https://github.com/pbugnion/gmaps) - Google Maps for Jupyter notebooks. 
[stadiamaps](https://stadiamaps.com/) - Plot geographical maps. @@ -382,6 +382,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [cuML](https://github.com/rapidsai/cuml) - Run traditional tabular ML tasks on GPUs. [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. [thundersvm](https://github.com/Xtra-Computing/thundersvm) - Support Vector Machines. +[hummingbird](https://github.com/microsoft/hummingbird) - Convert ML models to models that run on the GPU (by Microsoft). #### Regression Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf), [forum](https://www.quora.com/How-does-support-vector-regression-work), [paper](http://alex.smola.org/papers/2003/SmoSch03b.pdf) From f5a45bd268e01b6dcb3b993297dd2f9bb7a12451 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 26 Aug 2020 12:54:36 +0200 Subject: [PATCH 112/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 89d36a5..229ef01 100644 --- a/README.md +++ b/README.md @@ -27,6 +27,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. [jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter. [debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter. +[nbcommands](https://github.com/vinayak-mehta/nbcommands) - View and search notebooks from terminal. 
#### Pandas Tricks, Alternatives and Additions [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) From 65e355854a07bfec451c7ab05c4aa67db50a08ba Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 26 Aug 2020 23:24:31 +0200 Subject: [PATCH 113/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 229ef01..497b24b 100644 --- a/README.md +++ b/README.md @@ -695,6 +695,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) [Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) +[Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline) [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science) From c5e054398e32ee1b7c748ac042a609345eb4f591 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 2 Sep 2020 11:46:42 +0200 Subject: [PATCH 114/550] handcalcs --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 497b24b..5c3d928 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter. [debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter. [nbcommands](https://github.com/vinayak-mehta/nbcommands) - View and search notebooks from terminal. 
+[handcalcs](https://github.com/connorferster/handcalcs) - More convenient way of writing mathematical equations in Jupyter. #### Pandas Tricks, Alternatives and Additions [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) From a79cbeaee03bca593a983b30f8d171240f1a1a35 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 9 Sep 2020 15:05:28 +0200 Subject: [PATCH 115/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 5c3d928..d378c73 100644 --- a/README.md +++ b/README.md @@ -191,6 +191,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [scikit-plot](https://github.com/reiinakano/scikit-plot) - ROC curves and other visualizations for ML models. [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visualizations for ML models (similar to scikit-plot). [bokeh](https://bokeh.pydata.org/en/latest/) - Interactive visualization library, [Examples](https://bokeh.pydata.org/en/latest/docs/user_guide/server.html), [Examples](https://github.com/WillKoehrsen/Bokeh-Python-Visualization). +[lets-plot](https://github.com/JetBrains/lets-plot/blob/master/README_PYTHON.md) - Plotting library. [animatplot](https://github.com/t-makaro/animatplot) - Animate plots build on matplotlib. [plotnine](https://github.com/has2k1/plotnine) - ggplot for Python. [altair](https://altair-viz.github.io/) - Declarative statistical visualization library. From 0f9ead9c536a5defd961d5c9c1354d76fe626c96 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 24 Sep 2020 09:44:27 +0200 Subject: [PATCH 116/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d378c73..5468850 100644 --- a/README.md +++ b/README.md @@ -479,6 +479,7 @@ The Quantopian Stack (some features may require signup on their platform): [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. 
[alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. [empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. +[eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. #### Survival Analysis [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates). From 4c9d71be8a47cb916b9605c0b20a70f4f2c30444 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 13 Oct 2020 21:27:43 +0200 Subject: [PATCH 117/550] Graph-Based Neural Networks --- README.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/README.md b/README.md index 5468850..9242483 100644 --- a/README.md +++ b/README.md @@ -380,6 +380,14 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po #### Variational Autoencoders (VAE) [disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE. +#### Graph-Based Neural Networks +[How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) +[Introduction To Graph Convolutional Networks](http://tkipf.github.io/graph-convolutional-networks/) +[ogb](https://ogb.stanford.edu/) - Open Graph Benchmark, Benchmark datasets. +[networkx](https://github.com/networkx/networkx) - Graph library. +[pytorch-geometric](https://github.com/rusty1s/pytorch_geometric) - Various methods for deep learning on graphs. +[dgl](https://github.com/dmlc/dgl) - Deep Graph Library. +[graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in Tensorflow, by deepmind. #### GPU [cuML](https://github.com/rapidsai/cuml) - Run traditional tabular ML tasks on GPUs. 
From 68b3a1a244fa79cb2c005a2556821c65c68933de Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 15 Oct 2020 17:54:31 +0200 Subject: [PATCH 118/550] Update README.md --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 9242483..fef4f43 100644 --- a/README.md +++ b/README.md @@ -469,6 +469,7 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor Turn time series into images and use Neural Nets: [example](https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47), [example](https://github.com/kiss90/time-series-classification). [sktime](https://github.com/alan-turing-institute/sktime), [sktime-dl](https://github.com/uea-machine-learning/sktime-dl) - Toolbox for (deep) learning with time series. [adtk](https://github.com/arundo/adtk) - Time Series Anomaly Detection. +[rocket](https://github.com/angus924/rocket) - Time Series classification using random convolutional kernels. ##### Time Series Evaluation @@ -488,6 +489,7 @@ The Quantopian Stack (some features may require signup on their platform): [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. [empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. +[tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. #### Survival Analysis [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates). 
From 09f156a3b37c222314ee3fde6402a85f31fb718f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 15 Oct 2020 17:55:52 +0200 Subject: [PATCH 119/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index fef4f43..e5067b0 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. #### Helpful -[tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. +[tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). [icecream](https://github.com/gruns/icecream) - Simple debugging output. [loguru](https://github.com/Delgan/loguru) - Python logging. [pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R. From 85e98a7a38f0622857e5a76550af43eeebda0a92 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 18 Oct 2020 19:18:33 +0200 Subject: [PATCH 120/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e5067b0..9f2788b 100644 --- a/README.md +++ b/README.md @@ -414,7 +414,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. -[hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU). 
+[hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) From 2262ee7d321783a26116985d4a193957eee376c8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 18 Oct 2020 19:51:31 +0200 Subject: [PATCH 121/550] Update README.md --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 9f2788b..eb6fdee 100644 --- a/README.md +++ b/README.md @@ -407,19 +407,20 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Talk](https://www.youtube.com/watch?v=DkLPYccEJ8Y), [Notebook](https://github.com/ianozsvald/data_science_delivered/blob/master/ml_creating_correct_capable_classifiers.ipynb) [Blog post: Probability Scoring](https://machinelearningmastery.com/how-to-score-probability-predictions-in-python/) [All classification metrics](http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf) -[DESlib](https://github.com/scikit-learn-contrib/DESlib) - Dynamic classifier and ensemble selection +[DESlib](https://github.com/scikit-learn-contrib/DESlib) - Dynamic classifier and ensemble selection. 
#### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) +[Assessing the quality of a clustering (video)](https://www.youtube.com/watch?v=Mf6MqIS2ql4) +[hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). -[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. -[hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. +[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. #### Interpretable Classifiers and Regressors [skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. 
From 3936cf7c1d8ffdb0b0b1d3e641980fa5ef8c17d0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 24 Oct 2020 16:05:51 +0200 Subject: [PATCH 122/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index eb6fdee..8cde979 100644 --- a/README.md +++ b/README.md @@ -408,6 +408,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Blog post: Probability Scoring](https://machinelearningmastery.com/how-to-score-probability-predictions-in-python/) [All classification metrics](http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf) [DESlib](https://github.com/scikit-learn-contrib/DESlib) - Dynamic classifier and ensemble selection. +[human-learn](https://github.com/koaning/human-learn) - Create and tune classifier based on your rule set. #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) From 261cb3c733eb5d3c5817fcc83bd02ca4962366c6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 24 Oct 2020 16:08:23 +0200 Subject: [PATCH 123/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 8cde979..d1e4bee 100644 --- a/README.md +++ b/README.md @@ -140,7 +140,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) - Pipeline, [examples](https://github.com/jem1031/pandas-pipelines-custom-transformers). [pdpipe](https://github.com/shaypal5/pdpipe) - Pipelines for DataFrames. [scikit-lego](https://github.com/koaning/scikit-lego) - Custom transformers for pipelines. -[few](https://github.com/lacava/few) - Feature engineering wrapper for sklearn. [skoot](https://github.com/tgsmith61591/skoot) - Pipeline helper functions. 
[categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - Categorical encoding of variables, [vtreat (R package)](https://cran.r-project.org/web/packages/vtreat/vignettes/vtreat.html). [dirty_cat](https://github.com/dirty-cat/dirty_cat) - Encoding dirty categorical variables. From 0e70b9ba0b596ffeefe8e71d3d0ffe1bf1181854 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 26 Oct 2020 14:25:01 +0100 Subject: [PATCH 124/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d1e4bee..7c51536 100644 --- a/README.md +++ b/README.md @@ -414,6 +414,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Assessing the quality of a clustering (video)](https://www.youtube.com/watch?v=Mf6MqIS2ql4) [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. +[fpc](https://cran.r-project.org/web/packages/fpc/index.html) - Various methods for clustering and cluster validation (R package). [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. 
From 68e1f8c5e8d51fd9592ffda8df247da503d53591 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 12 Nov 2020 10:18:32 +0100 Subject: [PATCH 125/550] Added list of cool machine learning books --- README.md | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 7c51536..d044e4c 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [icecream](https://github.com/gruns/icecream) - Simple debugging output. [loguru](https://github.com/Delgan/loguru) - Python logging. [pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R. -[intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s). +[intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s). #### Extraction [textract](https://github.com/deanmalmgren/textract) - Extract text from any document. @@ -122,7 +122,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). -[littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. +[littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms. @@ -164,7 +164,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [linselect](https://github.com/efavdb/linselect) - Feature selection package. 
[mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection. [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. -[INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. +[INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. #### Dimensionality Reduction [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU) @@ -204,7 +204,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. #### Dashboards -[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo), [resources](https://github.com/ucg8j/awesome-dash) +[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo), [resources](https://github.com/ucg8j/awesome-dash) [panel](https://panel.pyviz.org/index.html) - Dashboarding solution. [bokeh](https://github.com/bokeh/bokeh) - Dashboarding solution. [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. 
@@ -303,7 +303,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html). [scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. -#### Neural Networks +#### Neural Networks ##### Tutorials & Viewer [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) @@ -327,7 +327,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), #### Text Related [ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. -[textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. +[textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. [ctrl](https://github.com/salesforce/ctrl) - Text generation. ##### Libs @@ -337,7 +337,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt: Convenient hyperparameter optimization wrapper. [elephas](https://github.com/maxpumperla/elephas) - Distributed Deep learning with Keras & Spark. [tflearn](https://github.com/tflearn/tflearn) - Neural Networks on top of tensorflow. -[tensorlayer](https://github.com/tensorlayer/tensorlayer) - Neural Networks on top of tensorflow, [tricks](https://github.com/wagamamaz/tensorlayer-tricks). +[tensorlayer](https://github.com/tensorlayer/tensorlayer) - Neural Networks on top of tensorflow, [tricks](https://github.com/wagamamaz/tensorlayer-tricks). [tensorforce](https://github.com/reinforceio/tensorforce) - Tensorflow for applied reinforcement learning. 
[fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for pytorch. @@ -347,7 +347,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks. [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability, [Activation Maps](https://openai.com/blog/introducing-activation-atlases/). [tcav](https://github.com/tensorflow/tcav) - Interpretability method. -[AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https://github.com/titu1994/keras-adabound). +[AdaBound](https://github.com/Luolc/AdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https://github.com/titu1994/keras-adabound). [foolbox](https://github.com/bethgelab/foolbox) - Adversarial examples that fool neural networks. [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics. [imgclsmob](https://github.com/osmr/imgclsmob) - Pretrained models. @@ -363,7 +363,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition. [CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection. -[FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. +[FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. #### Image Classification [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture. 
@@ -468,7 +468,7 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [RobustSTL](https://github.com/LeeDoYup/RobustSTL) - Robust Seasonal-Trend Decomposition. [seglearn](https://github.com/dmbee/seglearn) - Time Series library. [pyts](https://github.com/johannfaouzi/pyts) - Time series transformation and classification, [Imaging time series](https://pyts.readthedocs.io/en/latest/auto_examples/index.html#imaging-time-series). -Turn time series into images and use Neural Nets: [example](https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47), [example](https://github.com/kiss90/time-series-classification). +Turn time series into images and use Neural Nets: [example](https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47), [example](https://github.com/kiss90/time-series-classification). [sktime](https://github.com/alan-turing-institute/sktime), [sktime-dl](https://github.com/uea-machine-learning/sktime-dl) - Toolbox for (deep) learning with time series. [adtk](https://github.com/arundo/adtk) - Time Series Anomaly Detection. [rocket](https://github.com/angus924/rocket) - Time Series classification using random convolutional kernels. 
@@ -686,6 +686,9 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks [Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) +##### List of Books +[Mat Kelcey's list of cool machine learning books](http://matpalm.com/blog/cool_machine_learning_books/) + ##### Other Awesome Lists [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) [Awesome AI Bookmarks](https://github.com/goodrahstar/my-awesome-AI-bookmarks) From 264aa426be0c1abbbaf52cd9f66370bec9cd0791 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 14 Nov 2020 01:54:46 +0100 Subject: [PATCH 126/550] Update README.md --- README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index d044e4c..aca45a8 100644 --- a/README.md +++ b/README.md @@ -486,10 +486,11 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [backtrader](https://github.com/mementum/backtrader) - Backtesting for trading strategies. [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. The Quantopian Stack (some features may require signup on their platform): -[pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. -[zipline](https://github.com/quantopian/zipline) - Algorithmic trading. -[alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. -[empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. +* [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. +* [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. +* [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. 
+* [empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. +* [trading_calendars](https://github.com/quantopian/trading_calendars) - Calendars for various securities exchanges. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. From 011975c7a420d7e3252fdd5089fcc3d0da9b927a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 14 Nov 2020 01:57:02 +0100 Subject: [PATCH 127/550] Update README.md --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index aca45a8..02a8bd9 100644 --- a/README.md +++ b/README.md @@ -485,14 +485,14 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [bt](https://github.com/pmorissette/bt) - Backtesting algorithms. [backtrader](https://github.com/mementum/backtrader) - Backtesting for trading strategies. [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. -The Quantopian Stack (some features may require signup on their platform): -* [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. -* [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. -* [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. -* [empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. -* [trading_calendars](https://github.com/quantopian/trading_calendars) - Calendars for various securities exchanges. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. 
+The Quantopian Stack (some features may require signup on their platform): +[pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. +[zipline](https://github.com/quantopian/zipline) - Algorithmic trading. +[alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. +[empyrical](https://github.com/quantopian/empyrical) - Financial risk metrics. +[trading_calendars](https://github.com/quantopian/trading_calendars) - Calendars for various securities exchanges. #### Survival Analysis [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates). From ad155af92ff3dc52a2d81ca7324f1dc74de45a5d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 15 Nov 2020 19:41:06 +0100 Subject: [PATCH 128/550] Added sequential analysis --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index 02a8bd9..c7ccd08 100644 --- a/README.md +++ b/README.md @@ -91,6 +91,12 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). +##### Interim Analyses / Sequential Analysis / Stopping +[Sequential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia. +[Treatment Effects Monitoring](https://online.stat.psu.edu/stat509/node/75/) - Design and Analysis of Clinical Trials PennState. 
+[sequential](https://cran.r-project.org/web/packages/Sequential/Sequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package). +[confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values. + ##### Visualizations [Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https://rpsychologist.com/d3/NHST/) [Correlation](https://rpsychologist.com/d3/correlation/) From f56cd77d56c8cf286ce8b450e6e0d6b4ae937f26 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 20 Nov 2020 19:20:25 +0100 Subject: [PATCH 129/550] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index c7ccd08..78f8361 100644 --- a/README.md +++ b/README.md @@ -485,6 +485,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [tscv](https://github.com/WenjieZ/TSCV) - Evaluation with gap. #### Financial Data +Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. [yfinance](https://github.com/ranaroussi/yfinance) - Read stock data from Yahoo Finance. [ffn](https://github.com/pmorissette/ffn) - Financial functions. @@ -599,6 +600,9 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [Pytorch Geometric](https://github.com/rusty1s/pytorch_geometric) - Graph representation learning with PyTorch. [DGL](https://github.com/dmlc/dgl) - Graph representation learning with TensorFlow. +#### Convex optimization +[cvxpy](https://github.com/cvxgrp/cvxpy) - Modeling language for convex optimization problems. 
Tutorial: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) + #### Evolutionary Algorithms & Optimization [deap](https://github.com/DEAP/deap) - Evolutionary computation framework (Genetic Algorithm, Evolution strategies). [evol](https://github.com/godatadriven/evol) - DSL for composable evolutionary algorithms, [talk](https://www.youtube.com/watch?v=68ABAU_V8qI&t=11m49s). From 644cfc67d69149d2e787267a2c41c8ba39a5b2f4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 20 Nov 2020 21:57:19 +0100 Subject: [PATCH 130/550] Update README.md --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 78f8361..8aeba6a 100644 --- a/README.md +++ b/README.md @@ -210,13 +210,12 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. #### Dashboards -[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. Tutorial: [1](https://www.youtube.com/watch?v=J_Cy_QjG6NE), [2](https://www.youtube.com/watch?v=hRH01ZzT2NI), [3](https://www.youtube.com/watch?v=wv2MXJIdKRY), [4](https://www.youtube.com/watch?v=37Zj955LFT0), [5](https://www.youtube.com/watch?v=luixWRpp6Jo), [resources](https://github.com/ucg8j/awesome-dash) -[panel](https://panel.pyviz.org/index.html) - Dashboarding solution. -[bokeh](https://github.com/bokeh/bokeh) - Dashboarding solution. +[streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). +[dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. [Resources](https://github.com/ucg8j/awesome-dash). 
[visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. +[panel](https://panel.pyviz.org/index.html) - Dashboarding solution. [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs). [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. -[streamlit](https://github.com/streamlit/streamlit) - Dashboards. #### Geographical Tools [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet). @@ -494,7 +493,9 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. -The Quantopian Stack (some features may require signup on their platform): +[surpriver](https://github.com/tradytics/surpriver) - Find high moving stocks before they move using anomaly detection and machine learning. + +##### Quantopian Stack [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. [zipline](https://github.com/quantopian/zipline) - Algorithmic trading. [alphalens](https://github.com/quantopian/alphalens) - Performance analysis of predictive stock factors. 
@@ -737,6 +738,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) #### Things I google a lot +[Color codes](https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#categorical-colors) [Frequency codes for time series](https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) [Date parsing codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) [Feature Calculators tsfresh](https://github.com/blue-yonder/tsfresh/blob/master/tsfresh/feature_extraction/feature_calculators.py) From dd48791610ba2085d488277254d81ea2e660ec37 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 20 Nov 2020 23:49:38 +0100 Subject: [PATCH 131/550] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 8aeba6a..75cb8de 100644 --- a/README.md +++ b/README.md @@ -487,13 +487,14 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. [yfinance](https://github.com/ranaroussi/yfinance) - Read stock data from Yahoo Finance. +[ta](https://github.com/bukosabino/ta) - Technical analysis library. +[backtrader](https://github.com/mementum/backtrader) - Backtesting for trading strategies. +[surpriver](https://github.com/tradytics/surpriver) - Find high moving stocks before they move using anomaly detection and machine learning. [ffn](https://github.com/pmorissette/ffn) - Financial functions. [bt](https://github.com/pmorissette/bt) - Backtesting algorithms. -[backtrader](https://github.com/mementum/backtrader) - Backtesting for trading strategies. 
[alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. -[surpriver](https://github.com/tradytics/surpriver) - Find high moving stocks before they move using anomaly detection and machine learning. ##### Quantopian Stack [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. From d47915eee91d0713d84d209b56dc9976434093c7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 2 Dec 2020 10:36:46 +0100 Subject: [PATCH 132/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 75cb8de..94d914e 100644 --- a/README.md +++ b/README.md @@ -648,7 +648,6 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe #### Deployment and Lifecycle Management ##### Dependency Management -[pipreqs](https://github.com/bndr/pipreqs) - Generate a requirements.txt from import statements. [dephell](https://github.com/dephell/dephell) - Dependency management. [poetry](https://github.com/python-poetry/poetry) - Dependency management. [pyup](https://github.com/pyupio/pyup) - Dependency management. From 2fe4bae8af28bcde34b0b0fb367402a85f3b3c63 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 25 Dec 2020 21:24:25 +0100 Subject: [PATCH 133/550] uncertainty-toolbox --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 94d914e..c0ae245 100644 --- a/README.md +++ b/README.md @@ -562,6 +562,9 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learning-curve/). 
[yellowbrick](http://www.scikit-yb.org/en/latest/api/model_selection/learning_curve.html) - Learning curve. +#### Model Uncertainty +[uncertainty-toolbox](https://github.com/uncertainty-toolbox/uncertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization. + #### Model Explanation, Interpretability, Feature Importance [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python) [shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao). From 3a0db0c312f1bc0937be81610c9ac306ec20d5b0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 30 Dec 2020 10:59:47 +0100 Subject: [PATCH 134/550] pytorch lightning, hiplot, norfair, netron, pycaret --- README.md | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index c0ae245..0da5b34 100644 --- a/README.md +++ b/README.md @@ -208,6 +208,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [pm](https://github.com/anvaka/pm) - Navigatable 3D graph visualization (JS package), [example](https://w2v-vis-dot-hcg-team-di.appspot.com/#/galaxy/word2vec?cx=5698&cy=-5135&cz=5923&lx=0.1127&ly=0.3238&lz=-0.1680&lw=0.9242&ml=150&s=1.75&l=1&v=hc). [python-ternary](https://github.com/marcharper/python-ternary) - Triangle plots. [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. +[hiplot](https://github.com/facebookresearch/hiplot) - High dimensional Interactive Plotting. #### Dashboards [streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). 
@@ -251,6 +252,7 @@ Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and- [lightgbm](https://github.com/Microsoft/LightGBM) - Gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, [doc](https://sites.google.com/view/lauraepp/parameters). [xgboost](https://github.com/dmlc/xgboost) - Gradient boosting (GBDT, GBRT or GBM) library, [doc](https://sites.google.com/view/lauraepp/parameters), Methods for CIs: [link1](https://stats.stackexchange.com/questions/255783/confidence-interval-for-xgb-forecast), [link2](https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b). [catboost](https://github.com/catboost/catboost) - Gradient boosting. +[pycaret](https://github.com/pycaret/pycaret) - Wrapper for xgboost, lightgbm, catboost etc. [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. [h2o](https://github.com/h2oai/h2o-3) - Gradient boosting. [forestci](https://github.com/scikit-learn-contrib/forest-confidence-interval) - Confidence intervals for random forests. @@ -330,7 +332,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Lossfunction Related [SegLoss](https://github.com/JunMa11/SegLoss) - List of loss functions for medical image segmentation. -#### Text Related +##### Text Related [ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. [textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. [ctrl](https://github.com/salesforce/ctrl) - Text generation. @@ -358,19 +360,24 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [imgclsmob](https://github.com/osmr/imgclsmob) - Pretrained models. [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. 
+[pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. -#### Training-related +##### Training-related [livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook. -#### Object detection / Instance Segmentation +##### Architecture Visualization +[netron](https://github.com/lutzroeder/netron) - Viewer for neural networks. + +##### Object detection / Instance Segmentation [yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. [EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. [detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. [simpledet](https://github.com/TuSimple/simpledet) - Object Detection and Instance Recognition. [CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection. [FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. +[norfair](https://github.com/tryolabs/norfair) - Real-time 2D object tracking. -#### Image Classification +##### Image Classification [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture. ##### Applications and Snippets @@ -381,10 +388,10 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Posts: [1](https://www.thomasjpfan.com/2018/07/nuclei-image-segmentation-tutorial/), [2](https://www.thomasjpfan.com/2017/08/hassle-free-unets/) [deeplearning-models](https://github.com/rasbt/deeplearning-models) - Deep learning models. 
-#### Variational Autoencoders (VAE) +##### Variational Autoencoders (VAE) [disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE. -#### Graph-Based Neural Networks +##### Graph-Based Neural Networks [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) [Introduction To Graph Convolutional Networks](http://tkipf.github.io/graph-convolutional-networks/) [ogb](https://ogb.stanford.edu/) - Open Graph Benchmark, Benchmark datasets. @@ -393,11 +400,13 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [dgl](https://github.com/dmlc/dgl) - Deep Graph Library. [graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in Tensorflow, by deepmind. +#### Model conversion +[hummingbird](https://github.com/microsoft/hummingbird) - Compile trained ML models into tensor computations (by Microsoft). + #### GPU [cuML](https://github.com/rapidsai/cuml) - Run traditional tabular ML tasks on GPUs. [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. [thundersvm](https://github.com/Xtra-Computing/thundersvm) - Support Vector Machines. -[hummingbird](https://github.com/microsoft/hummingbird) - Convert ML models to models that run on the GPU (by Microsoft). #### Regression Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf), [forum](https://www.quora.com/How-does-support-vector-regression-work), [paper](http://alex.smola.org/papers/2003/SmoSch03b.pdf) From 9994eb335b2612d46f7d2495480971c8dfd934cd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 3 Jan 2021 13:29:16 +0100 Subject: [PATCH 135/550] Added Epidemiology Section. 
--- README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 0da5b34..a021460 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. [zappy](https://github.com/lasersonlab/zappy) - Distributed numpy arrays. -##### Command line tools, CSV +#### Command line tools, CSV [ni](https://github.com/spencertipping/ni) - Command line tool for big data. [xsv](https://github.com/BurntSushi/xsv) - Command line tool for indexing, slicing, analyzing, splitting and joining CSV files. [csvkit](https://csvkit.readthedocs.io/en/1.0.3/) - Another command line tool for CSV files. @@ -86,7 +86,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 ##### Statistical Tests and Packages [pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. -[researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. [Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). 
@@ -120,8 +119,11 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) +#### Epidemiology +[researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). +[zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists). + #### Frameworks -[scikit-learn](https://github.com/scikit-learn/scikit-learn) - General machine learning framework. [h2o](https://github.com/h2oai/h2o-3) - Machine learning framework. [caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). [mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html). From d6ef37d588d878d46cf38b581f45507710af47fd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 3 Jan 2021 13:29:58 +0100 Subject: [PATCH 136/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index a021460..6708530 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) [Using df.pipe() (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY) [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. - [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. [vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. [pandarallel](https://github.com/nalepae/pandarallel) - Parallelize pandas operations. 
From 0e305544c8051c9426d1f981f0c97bd49b18b670 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 5 Jan 2021 21:44:09 +0100 Subject: [PATCH 137/550] snapml --- README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 6708530..0d1576e 100644 --- a/README.md +++ b/README.md @@ -122,11 +122,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists). -#### Frameworks -[h2o](https://github.com/h2oai/h2o-3) - Machine learning framework. -[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). -[mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html). - #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). [littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. @@ -253,6 +248,8 @@ Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and- [lightgbm](https://github.com/Microsoft/LightGBM) - Gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, [doc](https://sites.google.com/view/lauraepp/parameters). [xgboost](https://github.com/dmlc/xgboost) - Gradient boosting (GBDT, GBRT or GBM) library, [doc](https://sites.google.com/view/lauraepp/parameters), Methods for CIs: [link1](https://stats.stackexchange.com/questions/255783/confidence-interval-for-xgb-forecast), [link2](https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b). [catboost](https://github.com/catboost/catboost) - Gradient boosting. 
+[h2o](https://github.com/h2oai/h2o-3) - Gradient boosting and general machine learning framework. +[snapml](https://www.zurich.ibm.com/snapml/) - Gradient boosting and general machine learning framework by IBM, for CPU and GPU. [PyPI](https://pypi.org/project/snapml/) [pycaret](https://github.com/pycaret/pycaret) - Wrapper for xgboost, lightgbm, catboost etc. [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. [h2o](https://github.com/h2oai/h2o-3) - Gradient boosting. @@ -401,6 +398,10 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [dgl](https://github.com/dmlc/dgl) - Deep Graph Library. [graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in Tensorflow, by deepmind. +##### Other neural network and deep learning frameworks +[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo). +[mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html). + #### Model conversion [hummingbird](https://github.com/microsoft/hummingbird) - Compile trained ML models into tensor computations (by Microsoft). From 2eec2e665b6893a130586f94bef82be0109740aa Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 6 Jan 2021 13:04:26 +0100 Subject: [PATCH 138/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0d1576e..bee46af 100644 --- a/README.md +++ b/README.md @@ -174,7 +174,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) - Multidimensional scaling (MDS). [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). 
Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE). [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. -[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/). +[umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). From fd792455cd1caccc0051e89c56dc9394d8c4a0fb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 14 Jan 2021 09:45:59 +0100 Subject: [PATCH 139/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index bee46af..159b5ef 100644 --- a/README.md +++ b/README.md @@ -498,6 +498,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. 
[yfinance](https://github.com/ranaroussi/yfinance) - Read stock data from Yahoo Finance. +[findatapy](https://github.com/cuemacro/findatapy) - Read stock data from various sources. [ta](https://github.com/bukosabino/ta) - Technical analysis library. [backtrader](https://github.com/mementum/backtrader) - Backtesting for trading strategies. [surpriver](https://github.com/tradytics/surpriver) - Find high moving stocks before they move using anomaly detection and machine learning. From 3ad1ef9249add8ef13339317227b2a6f40186508 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 15 Jan 2021 10:48:16 +0100 Subject: [PATCH 140/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 159b5ef..3ff3d78 100644 --- a/README.md +++ b/README.md @@ -83,6 +83,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Classical Statistics ##### Statistical Tests and Packages +[Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb) +[Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html) +[statsmodels](https://www.statsmodels.org/stable/index.html) - Statistical tests. [pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. 
From 04279b97eac8e509ba06878e4de5f5799aa74816 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 15 Jan 2021 12:37:33 +0100 Subject: [PATCH 141/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 3ff3d78..9e73e29 100644 --- a/README.md +++ b/README.md @@ -665,6 +665,9 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe #### Deployment and Lifecycle Management +##### Docker +[Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) + ##### Dependency Management [dephell](https://github.com/dephell/dephell) - Dependency management. [poetry](https://github.com/python-poetry/poetry) - Dependency management. From 38a6574ce95b8a7b8e0a0c75a6b0d0369b4aacba Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 15 Jan 2021 16:35:55 +0100 Subject: [PATCH 142/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 9e73e29..ea56850 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [turicreate](https://github.com/apple/turicreate) - Helpful `SFrame` class for out-of-memory dataframes. [h2o](https://github.com/h2oai/h2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes. [datatable](https://github.com/h2oai/datatable) - Data Table for big data support. -[cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library. +[cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library, [Intro](https://www.youtube.com/watch?v=6XzS5XcpicM&t=2m50s). [ray](https://github.com/ray-project/ray/) - Flexible, high-performance distributed execution framework. [mars](https://github.com/mars-project/mars) - Tensor-based unified framework for large-scale data computation. [bottleneck](https://github.com/kwgoodman/bottleneck) - Fast NumPy array functions written in C. 
@@ -409,7 +409,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [hummingbird](https://github.com/microsoft/hummingbird) - Compile trained ML models into tensor computations (by Microsoft). #### GPU -[cuML](https://github.com/rapidsai/cuml) - Run traditional tabular ML tasks on GPUs. +[cuML](https://github.com/rapidsai/cuml) - RAPIDS, Run traditional tabular ML tasks on GPUs, [Intro](https://www.youtube.com/watch?v=6XzS5XcpicM&t=2m50s). [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. [thundersvm](https://github.com/Xtra-Computing/thundersvm) - Support Vector Machines. From 724992e2edb613e981eae2c9c86fd56a236f4053 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 15 Jan 2021 16:45:21 +0100 Subject: [PATCH 143/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ea56850..f753192 100644 --- a/README.md +++ b/README.md @@ -397,6 +397,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [Introduction To Graph Convolutional Networks](http://tkipf.github.io/graph-convolutional-networks/) [ogb](https://ogb.stanford.edu/) - Open Graph Benchmark, Benchmark datasets. [networkx](https://github.com/networkx/networkx) - Graph library. +[cugraph](https://github.com/rapidsai/cugraph) - RAPIDS, Graph library on the GPU. [pytorch-geometric](https://github.com/rusty1s/pytorch_geometric) - Various methods for deep learning on graphs. [dgl](https://github.com/dmlc/dgl) - Deep Graph Library. [graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in Tensorflow, by deepmind. 
From 32cf20c0fc4afa90e554d327b32982e81d97b1f9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 21 Jan 2021 11:14:40 +0100 Subject: [PATCH 144/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index f753192..6559681 100644 --- a/README.md +++ b/README.md @@ -210,6 +210,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [hiplot](https://github.com/facebookresearch/hiplot) - High dimensional Interactive Plotting. #### Dashboards +[superset](https://github.com/apache/superset) - Dashboarding solution by Apache. [streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). [dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. [Resources](https://github.com/ucg8j/awesome-dash). [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. From c5b8b44f0d091a6b58da4397091eba2bc9ee8ce3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 21 Jan 2021 21:51:36 +0100 Subject: [PATCH 145/550] pigeon --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 6559681..9741c7e 100644 --- a/README.md +++ b/README.md @@ -379,6 +379,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. [norfair](https://github.com/tryolabs/norfair) - Real-time 2D object tracking. +##### Image Annotation +[pigeon](https://github.com/agermanidis/pigeon) - Create annotations from within a Jupyter notebook. + ##### Image Classification [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture. 
From 5cb9dae01a0eaaf8c3c1b42e083e6afd75043ea6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 27 Jan 2021 10:03:10 +0100 Subject: [PATCH 146/550] visdom --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9741c7e..5cdd607 100644 --- a/README.md +++ b/README.md @@ -208,6 +208,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [python-ternary](https://github.com/marcharper/python-ternary) - Triangle plots. [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. [hiplot](https://github.com/facebookresearch/hiplot) - High dimensional Interactive Plotting. +[visdom](https://github.com/fossasia/visdom) - Live Visualizations. #### Dashboards [superset](https://github.com/apache/superset) - Dashboarding solution by Apache. From 8320182ce3311fe49b5eedbf766309bfe15c992a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 28 Feb 2021 11:13:58 +0100 Subject: [PATCH 147/550] nfnets, iterative-stratification --- README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 5cdd607..5484324 100644 --- a/README.md +++ b/README.md @@ -185,6 +185,10 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. [trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. +#### Training-related +[iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. +[livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook. + #### Visualization [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). 
[cufflinks](https://github.com/santosjorge/cufflinks) - Dynamic visualization library, wrapper for [plotly](https://plot.ly/), [medium](https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e), [example](https://github.com/WillKoehrsen/Data-Analysis/blob/master/plotly/Plotly%20Whirlwind%20Introduction.ipynb). @@ -365,9 +369,6 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. [pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. -##### Training-related -[livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook. - ##### Architecture Visualization [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks. @@ -384,7 +385,8 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [pigeon](https://github.com/agermanidis/pigeon) - Create annotations from within a Jupyter notebook. ##### Image Classification -[efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Promising neural network architecture. +[nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. +[efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Neural network. ##### Applications and Snippets [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. @@ -499,7 +501,6 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [rocket](https://github.com/angus924/rocket) - Time Series classification using random convolutional kernels. ##### Time Series Evaluation - [TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) - Sklearn time series split. [tscv](https://github.com/WenjieZ/TSCV) - Evaluation with gap. 
From 0ca3783b237a81b74c67bdddf93267672a49894a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 28 Feb 2021 12:05:45 +0100 Subject: [PATCH 148/550] segmentation_models --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 5484324..cfeaae4 100644 --- a/README.md +++ b/README.md @@ -373,6 +373,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks. ##### Object detection / Instance Segmentation +[segmentation_models](https://github.com/qubvel/segmentation_models) - Segmentation models with pretrained backbones: Unet, FPN, Linknet, PSPNet. [yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. [EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. [detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. From 846e94c8c0c49f9446ae8a16b8516aab02b365bb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 4 Mar 2021 14:25:18 +0100 Subject: [PATCH 149/550] flexflow, numpy legate --- README.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index cfeaae4..b99fcdb 100644 --- a/README.md +++ b/README.md @@ -186,8 +186,8 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. #### Training-related -[iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. -[livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook. 
+[iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. +[livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook. #### Visualization [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). @@ -369,6 +369,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. [pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. +##### Distributed Libs +[flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch. + ##### Architecture Visualization [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks. @@ -386,8 +389,8 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [pigeon](https://github.com/agermanidis/pigeon) - Create annotations from within a Jupyter notebook. ##### Image Classification -[nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. -[efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Neural network. +[nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. +[efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Neural network. ##### Applications and Snippets [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. @@ -421,6 +424,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [cuML](https://github.com/rapidsai/cuml) - RAPIDS, Run traditional tabular ML tasks on GPUs, [Intro](https://www.youtube.com/watch?v=6XzS5XcpicM&t=2m50s). [thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest. 
[thundersvm](https://github.com/Xtra-Computing/thundersvm) - Support Vector Machines. +Legate Numpy - Distributed Numpy arrays using multiple GPUs by Nvidia (not released yet) [video](https://www.youtube.com/watch?v=Jxxs_moibog). #### Regression Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf), [forum](https://www.quora.com/How-does-support-vector-regression-work), [paper](http://alex.smola.org/papers/2003/SmoSch03b.pdf) From df4777867f4a0e64d621a512473ff2aebae17283 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 4 Mar 2021 14:27:22 +0100 Subject: [PATCH 150/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b99fcdb..d0ba21a 100644 --- a/README.md +++ b/README.md @@ -70,7 +70,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [bolz](https://github.com/Blosc/bcolz) - A columnar data container that can be compressed. [cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA. [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. -[zappy](https://github.com/lasersonlab/zappy) - Distributed numpy arrays. +[zarr](https://github.com/zarr-developers/zarr-python) - Distributed numpy arrays. #### Command line tools, CSV [ni](https://github.com/spencertipping/ni) - Command line tool for big data. From a9b18a67ae865629e5919dc80bd7378e13fbb524 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 17 Mar 2021 11:42:20 +0100 Subject: [PATCH 151/550] luminaire --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index d0ba21a..9ac9e6b 100644 --- a/README.md +++ b/README.md @@ -504,6 +504,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. 
[sktime](https://github.com/alan-turing-institute/sktime), [sktime-dl](https://github.com/uea-machine-learning/sktime-dl) - Toolbox for (deep) learning with time series. [adtk](https://github.com/arundo/adtk) - Time Series Anomaly Detection. [rocket](https://github.com/angus924/rocket) - Time Series classification using random convolutional kernels. +[luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. ##### Time Series Evaluation [TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) - Sklearn time series split. @@ -549,6 +550,7 @@ RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). Distances for comparing histograms and detecting outliers - [Talk](https://www.youtube.com/watch?v=U7xdiGc7IRU): [Kolmogorov-Smirnov](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ks_2samp.html), [Wasserstein](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html), [Energy Distance (Cramer)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.energy_distance.html), [Kullback-Leibler divergence](https://scipy.github.io/devdocs/generated/scipy.stats.entropy.html). [banpei](https://github.com/tsurubee/banpei) - Anomaly detection library based on singular spectrum transformation. [telemanom](https://github.com/khundman/telemanom) - Detect anomalies in multivariate time series data using LSTMs. +[luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. #### Ranking [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear classification, regression and ranking. 
From e113c5c3def738db73494544f9557617a77c0799 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 00:27:16 +0100 Subject: [PATCH 152/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9ac9e6b..94ecd6f 100644 --- a/README.md +++ b/README.md @@ -706,7 +706,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. #### Math and Background -[All kinds of math and statistics resources](https://realnotcomplex.com/) +[All kinds of math and statistics resources](https://realnotcomplex.com/) Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm) Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning ](https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/) From 64e2a6007dc641f1fbbce146984e2e6e78d9e755 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 11:16:33 +0100 Subject: [PATCH 153/550] fast-histogram and mpl-scatter-density --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 94ecd6f..9014fed 100644 --- a/README.md +++ b/README.md @@ -193,6 +193,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). [cufflinks](https://github.com/santosjorge/cufflinks) - Dynamic visualization library, wrapper for [plotly](https://plot.ly/), [medium](https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e), [example](https://github.com/WillKoehrsen/Data-Analysis/blob/master/plotly/Plotly%20Whirlwind%20Introduction.ipynb). 
[physt](https://github.com/janpipek/physt) - Better histograms, [talk](https://www.youtube.com/watch?v=ZG-wH3-Up9Y), [notebook](https://nbviewer.jupyter.org/github/janpipek/pydata2018-berlin/blob/master/notebooks/talk.ipynb). +[fast-histogram](https://github.com/astrofrog/fast-histogram) - Fast histograms. [matplotlib_venn](https://github.com/konstantint/matplotlib-venn) - Venn diagrams, [alternative](https://github.com/penrose/penrose). [joypy](https://github.com/sbebo/joypy) - Draw stacked density plots. [mosaic plots](https://www.statsmodels.org/dev/generated/statsmodels.graphics.mosaicplot.mosaic.html) - Categorical variable visualization, [example](https://sukhbinder.wordpress.com/2018/09/18/mosaic-plot-in-python/). @@ -213,6 +214,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. [hiplot](https://github.com/facebookresearch/hiplot) - High dimensional Interactive Plotting. [visdom](https://github.com/fossasia/visdom) - Live Visualizations. +[mpl-scatter-density](https://github.com/astrofrog/mpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms. #### Dashboards [superset](https://github.com/apache/superset) - Dashboarding solution by Apache. From bd65cee5969d0a078f001f0a27f9656d351ade7e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 11:25:42 +0100 Subject: [PATCH 154/550] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 9014fed..5d05c57 100644 --- a/README.md +++ b/README.md @@ -216,6 +216,10 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [visdom](https://github.com/fossasia/visdom) - Live Visualizations. [mpl-scatter-density](https://github.com/astrofrog/mpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms. 
+#### Colors +[palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3). +[colorcet](https://github.com/holoviz/colorcet) - Collection of perceptually uniform colormaps. + #### Dashboards [superset](https://github.com/apache/superset) - Dashboarding solution by Apache. [streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). From 58aa39ae5a9879f9e901bfbb98d7b1a8e6f28d10 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 14:20:06 +0100 Subject: [PATCH 155/550] Update README.md --- README.md | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 5d05c57..676bd5e 100644 --- a/README.md +++ b/README.md @@ -448,11 +448,9 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [human-learn](https://github.com/koaning/human-learn) - Create and tune classifier based on your rule set. #### Clustering -[Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering) -[Assessing the quality of a clustering (video)](https://www.youtube.com/watch?v=Mf6MqIS2ql4) +[Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. 
-[fpc](https://cran.r-project.org/web/packages/fpc/index.html) - Various methods for clustering and cluster validation (R package). [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. @@ -461,6 +459,30 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. +##### Cluster Evalutation +* [Adjusted Rand Index](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html) +* [Normalized Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html) +* [Adjusted Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html) +* [Fowlkes-Mallows Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fowlkes_mallows_score.html) +* [Silhouette Coefficient](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html) + +[Assessing the quality of a clustering (video)](https://www.youtube.com/watch?v=Mf6MqIS2ql4) +[fpc](https://cran.r-project.org/web/packages/fpc/index.html) - Various methods for clustering and cluster validation (R package). +* Minimum distance between any two clusters +* Distance between centroids +* p-separation index: Like minimum distance. Look at the average distance to nearest point in different cluster for p=10% "border" points in any cluster. 
Measuring density, measuring mountains vs valleys +* Estimate density by weighted count of close points Other measures +* Within-cluster average distance +* Mean of within-cluster average distance over nearest-cluster average distance (silhouette score) +* Within-cluster similarity measure to normal/uniform +* Within-cluster (squared) distance to centroid (this is the k-Means loss function) +* Correlation coefficient between the original distances and the distances induced by the clustering (Hubert's Gamma) +* Entropy of cluster sizes +* Average largest within-cluster gap +* Variation of clusterings on bootstrapped data + + + #### Interpretable Classifiers and Regressors [skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. [sklearn-expertsys](https://github.com/tmadl/sklearn-expertsys) - Interpretable classifiers, Bayesian Rule List classifier. From b5d98fd2bae28dffb36cee5d42eaad815da0b11e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 14:31:25 +0100 Subject: [PATCH 156/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 676bd5e..54cb98c 100644 --- a/README.md +++ b/README.md @@ -459,8 +459,9 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. 
-##### Cluster Evalutation +##### Clustering Evalutation * [Adjusted Rand Index](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html) +* [Variation of Information](https://gist.github.com/jwcarr/626cbc80e0006b526688), [Julia](https://clusteringjl.readthedocs.io/en/latest/varinfo.html) * [Normalized Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html) * [Adjusted Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html) * [Fowlkes-Mallows Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fowlkes_mallows_score.html) From 5ff17055990e36f44c2733aea71e4e98d8490a2a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 14:58:07 +0100 Subject: [PATCH 157/550] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 54cb98c..a495dcf 100644 --- a/README.md +++ b/README.md @@ -460,12 +460,14 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. ##### Clustering Evalutation +* [Consensus Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.consensus_score.html#sklearn.metrics.consensus_score) - The similarity of two sets of biclusters. 
* [Adjusted Rand Index](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html) -* [Variation of Information](https://gist.github.com/jwcarr/626cbc80e0006b526688), [Julia](https://clusteringjl.readthedocs.io/en/latest/varinfo.html) * [Normalized Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html) * [Adjusted Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html) * [Fowlkes-Mallows Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fowlkes_mallows_score.html) * [Silhouette Coefficient](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html) +* [Variation of Information](https://gist.github.com/jwcarr/626cbc80e0006b526688), [Julia](https://clusteringjl.readthedocs.io/en/latest/varinfo.html) +* [Pair Confusion Matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cluster.pair_confusion_matrix.html) [Assessing the quality of a clustering (video)](https://www.youtube.com/watch?v=Mf6MqIS2ql4) [fpc](https://cran.r-project.org/web/packages/fpc/index.html) - Various methods for clustering and cluster validation (R package). From b0331ef73801ef1493565724f9822a5198cd6078 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 15:01:32 +0100 Subject: [PATCH 158/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index a495dcf..c4c2d4d 100644 --- a/README.md +++ b/README.md @@ -460,7 +460,6 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. ##### Clustering Evalutation -* [Consensus Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.consensus_score.html#sklearn.metrics.consensus_score) - The similarity of two sets of biclusters. 
* [Adjusted Rand Index](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html) * [Normalized Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html) * [Adjusted Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html) From 00408ecde883af0f991b0b3a0128f52717e91f3a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 15:22:21 +0100 Subject: [PATCH 159/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c4c2d4d..70040cf 100644 --- a/README.md +++ b/README.md @@ -467,6 +467,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach * [Silhouette Coefficient](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html) * [Variation of Information](https://gist.github.com/jwcarr/626cbc80e0006b526688), [Julia](https://clusteringjl.readthedocs.io/en/latest/varinfo.html) * [Pair Confusion Matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cluster.pair_confusion_matrix.html) +* [Consensus Score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.consensus_score.html) - The similarity of two sets of biclusters. [Assessing the quality of a clustering (video)](https://www.youtube.com/watch?v=Mf6MqIS2ql4) [fpc](https://cran.r-project.org/web/packages/fpc/index.html) - Various methods for clustering and cluster validation (R package). 
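Editor's note on the clustering evaluation measures listed in the patches above: two of them, the within-cluster squared distance to the centroid (the k-Means loss) and the entropy of cluster sizes, are simple enough to sketch in plain Python. This is only an illustrative sketch, not code from the README or from any of the linked libraries. The function names `within_cluster_sse` and `cluster_size_entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter


def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    # Group points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def cluster_size_entropy(labels):
    """Shannon entropy (in nats) of the cluster size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


# Hypothetical toy data: two well-separated clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

sse = within_cluster_sse(points, labels)
entropy = cluster_size_entropy(labels)
print(f"within-cluster SSE: {sse:.3f}, size entropy: {entropy:.3f}")
```

A lower within-cluster sum of squares means tighter clusters, while the size entropy is largest when all clusters have the same number of members. For the label-comparison scores in the list (Adjusted Rand Index, mutual information variants, Fowlkes-Mallows), the linked scikit-learn functions are the more practical choice.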
From 509f4fd310d6d8fc42bae23500e19e713753c9b2 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Mar 2021 15:23:07 +0100 Subject: [PATCH 160/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 70040cf..eda15f5 100644 --- a/README.md +++ b/README.md @@ -460,6 +460,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. ##### Clustering Evalutation +[Wagner, Wagner - Comparing Clusterings - An Overview](https://publikationen.bibliothek.kit.edu/1000011477/812079) * [Adjusted Rand Index](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html) * [Normalized Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html) * [Adjusted Mutual Information](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html) From eb2d770224eb641ba026a607a8535db51ad60c52 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 22 Mar 2021 16:53:50 +0100 Subject: [PATCH 161/550] pandasgui --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index eda15f5..c67cd24 100644 --- a/README.md +++ b/README.md @@ -127,7 +127,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). -[littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. +[pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames. [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms. 
@@ -136,6 +136,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Utility functions (`OneHotEncoder(min_obs=100)`) [pyupset](https://github.com/ImSoErgodic/py-upset) - Visualizing intersecting sets. [pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance, similarity between histograms. +[littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. #### Train / Test Split [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Stratification of multilabel data. From 942df19bbc01a7589ffb1085371976e61a3d7457 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 23 Mar 2021 15:53:01 +0100 Subject: [PATCH 162/550] Ridge plots --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c67cd24..8db7a63 100644 --- a/README.md +++ b/README.md @@ -196,7 +196,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [physt](https://github.com/janpipek/physt) - Better histograms, [talk](https://www.youtube.com/watch?v=ZG-wH3-Up9Y), [notebook](https://nbviewer.jupyter.org/github/janpipek/pydata2018-berlin/blob/master/notebooks/talk.ipynb). [fast-histogram](https://github.com/astrofrog/fast-histogram) - Fast histograms. [matplotlib_venn](https://github.com/konstantint/matplotlib-venn) - Venn diagrams, [alternative](https://github.com/penrose/penrose). -[joypy](https://github.com/sbebo/joypy) - Draw stacked density plots. +[joypy](https://github.com/sbebo/joypy) - Draw stacked density plots (=ridge plots), [Ridge plots in seaborn](https://seaborn.pydata.org/examples/kde_ridgeplot.html). [mosaic plots](https://www.statsmodels.org/dev/generated/statsmodels.graphics.mosaicplot.mosaic.html) - Categorical variable visualization, [example](https://sukhbinder.wordpress.com/2018/09/18/mosaic-plot-in-python/). 
[scikit-plot](https://github.com/reiinakano/scikit-plot) - ROC curves and other visualizations for ML models. [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visualizations for ML models (similar to scikit-plot). From 2cb43959872802051c64fafa67cb89508f5ee3cc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 26 Mar 2021 12:46:25 +0100 Subject: [PATCH 163/550] quantstats --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 8db7a63..b545ce2 100644 --- a/README.md +++ b/README.md @@ -556,6 +556,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. +[quantstats](https://github.com/ranaroussi/quantstats) - Portfolio management. ##### Quantopian Stack [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. From d735cb68a3d7596abc4873e6493aadc32c01f630 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 29 Mar 2021 00:19:01 +0200 Subject: [PATCH 164/550] Epidemiology --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index b545ce2..12d6bac 100644 --- a/README.md +++ b/README.md @@ -122,9 +122,13 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) #### Epidemiology +[R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). 
[Github](https://github.com/reconhub) +[incidence2](https://github.com/reconhub/incidence2) - Computation, handling, visualisation and simple modelling of incidence (R package). +[EpiEstim](https://github.com/mrc-ide/EpiEstim) - Estimate time varying instantaneous reproduction number R during epidemics (R package) [paper](https://academic.oup.com/aje/article/178/9/1505/89262). [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists). + #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). [pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames. From d9a5bcc6be6c2e783e83d3f417442c422c5b7e90 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 29 Mar 2021 11:10:09 +0200 Subject: [PATCH 165/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 12d6bac..0d97dd9 100644 --- a/README.md +++ b/README.md @@ -397,6 +397,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [norfair](https://github.com/tryolabs/norfair) - Real-time 2D object tracking. ##### Image Annotation +[cvat](https://github.com/openvinotoolkit/cvat) - Image annotation tool. [pigeon](https://github.com/agermanidis/pigeon) - Create annotations from within a Jupyter notebook. 
##### Image Classification From 69e83adde1309e75f25aac1b89adbb2278c31c5e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 6 Apr 2021 14:43:49 +0200 Subject: [PATCH 166/550] drawdata --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0d97dd9..d18b0f3 100644 --- a/README.md +++ b/README.md @@ -44,6 +44,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. #### Helpful +[drawdata](https://github.com/koaning/drawdata) - Quickly draw some points and export them as csv, [website](https://drawdata.xyz/). [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). [icecream](https://github.com/gruns/icecream) - Simple debugging output. [loguru](https://github.com/Delgan/loguru) - Python logging. From 707cfa5571014adab92dd85c81d708116659f4b1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 3 May 2021 10:14:15 +0200 Subject: [PATCH 167/550] Update README.md --- README.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index d18b0f3..c88d182 100644 --- a/README.md +++ b/README.md @@ -178,18 +178,17 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. #### Dimensionality Reduction -[Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU) +[Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). +[sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. 
[prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). -[sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) - Multidimensional scaling (MDS). -[sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) - t-distributed Stochastic Neighbor Embedding (t-SNE), [intro](https://distill.pub/2016/misread-tsne/). Faster implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE). -[FIt-SNE](https://github.com/KlugerLab/FIt-SNE) - Fast Fourier Transform-accelerated Interpolation-based t-SNE. +Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). -[sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) - Truncated SVD (aka LSA). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. [trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. 
+[scanpy](https://github.com/theislab/scanpy) - [Force-directed graph drawing](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html#scanpy.tl.draw_graph), [Diffusion Maps](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html). #### Training-related [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. From 7da2e2dece01b74f078f6edaac63fb2e15f25fd8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 3 May 2021 10:15:58 +0200 Subject: [PATCH 168/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c88d182..3a50e32 100644 --- a/README.md +++ b/README.md @@ -184,7 +184,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). -[scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ). 
+[scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf). [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. [trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. From c02ce5d78bd375f56870af2c3eb1091498b755ae Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 3 May 2021 23:34:48 +0200 Subject: [PATCH 169/550] samplics --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 3a50e32..073fe6b 100644 --- a/README.md +++ b/README.md @@ -234,6 +234,9 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs). [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. +#### Survey Tools +[samplics](https://github.com/samplics-org/samplics) - Sampling techniques for complex survey designs. + #### Geographical Tools [folium](https://github.com/python-visualization/folium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https://github.com/jupyter-widgets/ipyleaflet). [gmaps](https://github.com/pbugnion/gmaps) - Google Maps for Jupyter notebooks. 
@@ -491,8 +494,6 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach * Average largest within-cluster gap * Variation of clusterings on bootstrapped data - - #### Interpretable Classifiers and Regressors [skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. [sklearn-expertsys](https://github.com/tmadl/sklearn-expertsys) - Interpretable classifiers, Bayesian Rule List classifier. From fdc09f26efae647f4b5328b422b6e68f97d96c41 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 5 May 2021 22:24:00 +0200 Subject: [PATCH 170/550] Update README.md --- README.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 073fe6b..0bc0b06 100644 --- a/README.md +++ b/README.md @@ -140,7 +140,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [tspreprocess](https://github.com/MaxBenChrist/tspreprocess) - Time series preprocessing: Denoising, Compression, Resampling. [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Utility functions (`OneHotEncoder(min_obs=100)`) [pyupset](https://github.com/ImSoErgodic/py-upset) - Visualizing intersecting sets. -[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance, similarity between histograms. +[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html) [littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. #### Train / Test Split @@ -161,6 +161,12 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. 
+##### Feature Engineering Images +[skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. +[mahotas](https://github.com/luispedro/mahotas) - Zernike, Haralick, LBP, and TAS features. +[pyradiomics](https://github.com/AIM-Harvard/pyradiomics) - Radiomics features from medical imaging. +[pyefd](https://github.com/hbldh/pyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series. + #### Feature Selection [Talk](https://www.youtube.com/watch?v=JsArBz46_3s) Blog post series - [1](http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/), [2](http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/), [3](http://blog.datadive.net/selecting-good-features-part-iii-random-forests/), [4](http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/) From bffebf579d42872ff6374aa7bc3a15cf84264f1f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 6 May 2021 13:35:12 +0200 Subject: [PATCH 171/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0bc0b06..5b37ae0 100644 --- a/README.md +++ b/README.md @@ -191,6 +191,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). 
[scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf). +[giotto-tda](https://github.com/giotto-ai/giotto-tda) - Topological Data Analysis. [mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. [trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. From 319f1829ec42befe46b6f42ddf053f0f897e545d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 7 May 2021 14:38:26 +0200 Subject: [PATCH 172/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5b37ae0..6b42535 100644 --- a/README.md +++ b/README.md @@ -140,7 +140,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [tspreprocess](https://github.com/MaxBenChrist/tspreprocess) - Time series preprocessing: Denoising, Compression, Resampling. [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Utility functions (`OneHotEncoder(min_obs=100)`) [pyupset](https://github.com/ImSoErgodic/py-upset) - Visualizing intersecting sets. -[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html) +[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance / Wasserstein distance, similarity between histograms. 
[OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html), [POT implementation](https://pythonot.github.io/auto_examples/plot_OT_2D_samples.html) [littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. #### Train / Test Split From c23b6a4fe4cf0c1ef0bc37fd875fbcbad601b8ef Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 May 2021 16:06:49 +0200 Subject: [PATCH 173/550] Dimensionality Reduction Algorithms updated --- README.md | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6b42535..676ef25 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,7 @@ # Awesome Data Science with Python > A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks. +![image](https://user-images.githubusercontent.com/7324891/118276826-5fe16d00-b4c8-11eb-90bf-2722e7160d20.png) #### Core [pandas](https://pandas.pydata.org/) - Data structures built on top of [numpy](https://www.numpy.org/). @@ -184,6 +185,24 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. 
#### Dimensionality Reduction + +##### Selection + +* PCA - [link](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) +* Autoencoder - [link](https://blog.keras.io/building-autoencoders-in-keras.html) +* Isomaps - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html#sklearn.manifold.Isomap) +* LLE - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html) +* Force-directed graph drawing - [link](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html#scanpy.tl.draw_graph) +* MDS - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) +* Diffusion Maps - [link](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html) +* t-SNE - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE) +* NeRV - [link](https://github.com/ziyuang/pynerv), [paper](https://www.jmlr.org/papers/volume11/venna10a/venna10a.pdf) +* MDR - [link](https://github.com/EpistasisLab/scikit-mdr) +* UMAP - [link](https://github.com/lmcinnes/umap) +* Ivis - [link](https://github.com/beringresearch/ivis) + +##### Packages + [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). @@ -192,10 +211,11 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). 
[scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf). [giotto-tda](https://github.com/giotto-ai/giotto-tda) - Topological Data Analysis. -[mdr](https://github.com/EpistasisLab/scikit-mdr) - Dimensionality reduction, multifactor dimensionality reduction (MDR). [ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. [trimap](https://github.com/eamid/trimap) - Dimensionality reduction using triplets. [scanpy](https://github.com/theislab/scanpy) - [Force-directed graph drawing](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html#scanpy.tl.draw_graph), [Diffusion Maps](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html). +[direpack](https://github.com/SvenSerneels/direpack) - Projection pursuit, Sufficient dimension reduction, Robust M-estimators. +[DBS](https://cran.r-project.org/web/packages/DatabionicSwarm/vignettes/DatabionicSwarm.html) - DatabionicSwarm (R package). #### Training-related [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. @@ -467,6 +487,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. 
+[FCPS](https://github.com/Mthrun/FCPS) - Fundamental Clustering Problems Suite (R package). [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. [buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. From 559bbd7da7fe1a031e0f12349668a1fbe855497d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 May 2021 16:07:06 +0200 Subject: [PATCH 174/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 676ef25..7035d05 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,6 @@ # Awesome Data Science with Python > A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks. -![image](https://user-images.githubusercontent.com/7324891/118276826-5fe16d00-b4c8-11eb-90bf-2722e7160d20.png) #### Core [pandas](https://pandas.pydata.org/) - Data structures built on top of [numpy](https://www.numpy.org/). 
From fff5883513c6e1478256d955d00a668ba0f1bb14 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 May 2021 16:07:52 +0200 Subject: [PATCH 175/550] Update README.md --- README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 7035d05..3827b79 100644 --- a/README.md +++ b/README.md @@ -187,18 +187,18 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection ##### Selection -* PCA - [link](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) -* Autoencoder - [link](https://blog.keras.io/building-autoencoders-in-keras.html) -* Isomaps - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html#sklearn.manifold.Isomap) -* LLE - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html) -* Force-directed graph drawing - [link](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html#scanpy.tl.draw_graph) -* MDS - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) -* Diffusion Maps - [link](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html) -* t-SNE - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE) -* NeRV - [link](https://github.com/ziyuang/pynerv), [paper](https://www.jmlr.org/papers/volume11/venna10a/venna10a.pdf) -* MDR - [link](https://github.com/EpistasisLab/scikit-mdr) -* UMAP - [link](https://github.com/lmcinnes/umap) -* Ivis - [link](https://github.com/beringresearch/ivis) +PCA - [link](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) +Autoencoder - [link](https://blog.keras.io/building-autoencoders-in-keras.html) +Isomaps - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html#sklearn.manifold.Isomap) +LLE - 
[link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html) +Force-directed graph drawing - [link](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html#scanpy.tl.draw_graph) +MDS - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) +Diffusion Maps - [link](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html) +t-SNE - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE) +NeRV - [link](https://github.com/ziyuang/pynerv), [paper](https://www.jmlr.org/papers/volume11/venna10a/venna10a.pdf) +MDR - [link](https://github.com/EpistasisLab/scikit-mdr) +UMAP - [link](https://github.com/lmcinnes/umap) +Ivis - [link](https://github.com/beringresearch/ivis) ##### Packages From 3fb83212a6674f22d514c44fe3bab005ef7fc5f6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 May 2021 16:16:56 +0200 Subject: [PATCH 176/550] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 3827b79..772f6b3 100644 --- a/README.md +++ b/README.md @@ -186,7 +186,8 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection #### Dimensionality Reduction ##### Selection - +[Review](https://members.loria.fr/moberger/Enseignement/AVR/Exposes/TR_Dimensiereductie.pdf) + PCA - [link](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) Autoencoder - [link](https://blog.keras.io/building-autoencoders-in-keras.html) Isomaps - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html#sklearn.manifold.Isomap) @@ -484,6 +485,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). 
+[Clustering with Deep Learning: Taxonomy and New Methods](https://arxiv.org/pdf/1801.07648.pdf). [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. [FCPS](https://github.com/Mthrun/FCPS) - Fundamental Clustering Problems Suite (R package). From a61af5bf42e90f90f52dd54c4e94993d679a14f7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 7 Jun 2021 20:38:01 +0200 Subject: [PATCH 177/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 772f6b3..9f9b556 100644 --- a/README.md +++ b/README.md @@ -642,7 +642,8 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [neural-tangents](https://github.com/google/neural-tangents) - Infinite Neural Networks. #### Gaussian Processes -[GPyOpt](https://github.com/SheffieldML/GPyOpt) - Gaussian process optimization. +[Visualization](http://www.infinitecuriosity.org/vizgp/), [Article](https://distill.pub/2019/visual-exploration-gaussian-processes/) +[GPyOpt](https://github.com/SheffieldML/GPyOpt) - Gaussian process optimization. [GPflow](https://github.com/GPflow/GPflow) - Gaussian processes (Tensorflow). [gpytorch](https://gpytorch.ai/) - Gaussian processes (Pytorch). 
From 4333501400215ff9614badbd677cd97a135e3e07 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 10 Jun 2021 17:57:46 +0200 Subject: [PATCH 178/550] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 9f9b556..95086da 100644 --- a/README.md +++ b/README.md @@ -186,6 +186,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection #### Dimensionality Reduction ##### Selection +Check also the Clustering section for ideas! [Review](https://members.loria.fr/moberger/Enseignement/AVR/Exposes/TR_Dimensiereductie.pdf) PCA - [link](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) @@ -209,6 +210,7 @@ Ivis - [link](https://github.com/beringresearch/ivis) Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). +[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf). [giotto-tda](https://github.com/giotto-ai/giotto-tda) - Topological Data Analysis. 
[ivis](https://github.com/beringresearch/ivis) - Dimensionality reduction using Siamese Networks. @@ -495,7 +497,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. -[somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. +[phenograph](https://github.com/dpeerlab/phenograph) - Clustering by community detection. ##### Clustering Evalutation [Wagner, Wagner - Comparing Clusterings - An Overview](https://publikationen.bibliothek.kit.edu/1000011477/812079) From 41ad3d895e4214170fc1dd07f9174dfcb789d32d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 10 Jun 2021 18:00:49 +0200 Subject: [PATCH 179/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 95086da..616ae46 100644 --- a/README.md +++ b/README.md @@ -497,6 +497,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. +[distribution_clustering](https://github.com/EricElmoznino/distribution_clustering), [paper](https://arxiv.org/abs/1804.02624), [related paper](https://arxiv.org/abs/2003.07770), [alt](https://github.com/r0f1/distribution_clustering). [phenograph](https://github.com/dpeerlab/phenograph) - Clustering by community detection. 
##### Clustering Evalutation From 324c1ad93c58150deef5e2c3ed34c991ca35dc8f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 11 Jun 2021 13:18:21 +0200 Subject: [PATCH 180/550] Awesome Metric Learning --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 616ae46..c5f48bf 100644 --- a/README.md +++ b/README.md @@ -831,6 +831,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) +[Awesome Metric Learning](https://github.com/kdhht2334/Survey_of_Deep_Metric_Learning) [Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline) From a1a1723487e6109bcf1917aa387b76edee369986 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 12 Jun 2021 23:00:06 +0200 Subject: [PATCH 181/550] metric learning --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c5f48bf..d9043d8 100644 --- a/README.md +++ b/README.md @@ -485,6 +485,12 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [DESlib](https://github.com/scikit-learn-contrib/DESlib) - Dynamic classifier and ensemble selection. [human-learn](https://github.com/koaning/human-learn) - Create and tune classifier based on your rule set. +#### Metric Learning +[metric-learn](https://github.com/scikit-learn-contrib/metric-learn) - Supervised and weakly-supervised metric learning algorithms. 
+[pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning) - Pytorch metric learning. +[deep_metric_learning](https://github.com/ronekko/deep_metric_learning) - Methods for deep metric learning. +[ivis](https://bering-ivis.readthedocs.io/en/latest/supervised.html) - Metric learning using siamese neural networks. + #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). [Clustering with Deep Learning: Taxonomy and New Methods](https://arxiv.org/pdf/1801.07648.pdf). @@ -789,7 +795,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [unyt](https://github.com/yt-project/unyt) - Working with units. [scrapy](https://github.com/scrapy/scrapy) - Web scraping library. [VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) - ML Toolkit from Microsoft. -[metric-learn](https://github.com/metric-learn/metric-learn) - Metric learning. #### General Python Programming [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. From 26d4ceb014fc6ec99d703ad27fe8d66315863bae Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 17 Jun 2021 21:39:59 +0200 Subject: [PATCH 182/550] augly --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d9043d8..dabc127 100644 --- a/README.md +++ b/README.md @@ -378,6 +378,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [albumentations](https://github.com/albu/albumentations) - Wrapper around imgaug and other libraries. [augmix](https://github.com/google-research/augmix) - Image augmentation from Google. [kornia](https://github.com/kornia/kornia) - Image augmentation, feature extraction and loss functions. +[augly](https://github.com/facebookresearch/AugLy) - Image, audio, text, video augmentation from Facebook. 
##### Lossfunction Related [SegLoss](https://github.com/JunMa11/SegLoss) - List of loss functions for medical image segmentation. From ffa681a20f7e16722bc8ab24c74a79abd13f7624 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 22 Jun 2021 17:18:55 +0200 Subject: [PATCH 183/550] kats --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index dabc127..a663e46 100644 --- a/README.md +++ b/README.md @@ -549,9 +549,10 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach #### Time Series [statsmodels](https://www.statsmodels.org/dev/tsa.html) - Time series analysis, [seasonal decompose](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) [example](https://gist.github.com/balzer82/5cec6ad7adc1b550e7ee), [SARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.grangercausalitytests.html). +[kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook. +[prophet](https://github.com/facebook/prophet) - Time series prediction library by Facebook. [pyramid](https://github.com/tgsmith61591/pyramid), [pmdarima](https://github.com/tgsmith61591/pmdarima) - Wrapper for (Auto-) ARIMA. [pyflux](https://github.com/RJT1990/pyflux) - Time series prediction algorithms (ARIMA, GARCH, GAS, Bayesian). -[prophet](https://github.com/facebook/prophet) - Time series prediction library. [atspy](https://github.com/firmai/atspy) - Automated Time Series Models. [pm-prophet](https://github.com/luke14free/pm-prophet) - Time series prediction and decomposition library. [htsprophet](https://github.com/CollinRooney12/htsprophet) - Hierarchical Time Series Forecasting using Prophet. 
From 6a1949252cc58b4803e80314bbb083bfe66f37f2 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 24 Jun 2021 11:12:15 +0200 Subject: [PATCH 184/550] cleanlab --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a663e46..fa15ed1 100644 --- a/README.md +++ b/README.md @@ -132,6 +132,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). +[cleanlab](https://github.com/cgnorthcutt/cleanlab) - Imageing data: Machine learning with noisy labels and finding mislabeled data. [pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames. [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. From 7b76751231477179c4511b4ea07af8aac978e3c8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 13 Jul 2021 11:20:48 +0200 Subject: [PATCH 185/550] Contrastive Representation Learning --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index fa15ed1..51b55c6 100644 --- a/README.md +++ b/README.md @@ -488,6 +488,8 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [human-learn](https://github.com/koaning/human-learn) - Create and tune classifier based on your rule set. #### Metric Learning +[Contrastive Representation Learning](https://lilianweng.github.io/lil-log/2021/05/31/contrastive-representation-learning.html) + [metric-learn](https://github.com/scikit-learn-contrib/metric-learn) - Supervised and weakly-supervised metric learning algorithms. [pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning) - Pytorch metric learning. [deep_metric_learning](https://github.com/ronekko/deep_metric_learning) - Methods for deep metric learning. 
From 8eb79ac2896b2b2befea3f465353f9c29bc3ec02 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 14 Jul 2021 16:35:21 +0200 Subject: [PATCH 186/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 51b55c6..8ed237d 100644 --- a/README.md +++ b/README.md @@ -87,10 +87,10 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb) [Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html) [statsmodels](https://www.statsmodels.org/stable/index.html) - Statistical tests. -[pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. +[pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [Pairwise correlation between columns of pandas DataFrame](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html) [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. -[Bland-Altman Plot](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. +Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. 
[ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). ##### Interim Analyses / Sequential Analysis / Stopping From a230bdfcd87b0484a6065bbea9321ae13ce162b0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 4 Aug 2021 17:10:22 +0200 Subject: [PATCH 187/550] groot --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 8ed237d..755ee2a 100644 --- a/README.md +++ b/README.md @@ -318,6 +318,7 @@ Why the default feature importance for random forests is wrong: [link](http://ex [infiniteboost](https://github.com/arogozhnikov/infiniteboost) - Combination of RFs and GBDTs. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [rrcf](https://github.com/kLabUM/rrcf) - Robust Random Cut Forest algorithm for anomaly detection on streams. +[groot](https://github.com/tudelft-cda-lab/GROOT) - Robust decision trees. #### Natural Language Processing (NLP) / Text Processing [talk](https://www.youtube.com/watch?v=6zm9NC9uRkk)-[nb](https://nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb), [nb2](https://ahmedbesbes.com/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html), [talk](https://www.youtube.com/watch?time_continue=2&v=sI7VpFNiy_I). 
From 505069a68b08f710b5d72eef788989480f09dcb3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 13 Aug 2021 13:00:48 +0200 Subject: [PATCH 188/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 755ee2a..9494d38 100644 --- a/README.md +++ b/README.md @@ -169,7 +169,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [pyefd](https://github.com/hbldh/pyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series. #### Feature Selection -[Talk](https://www.youtube.com/watch?v=JsArBz46_3s) +[Talk](https://www.youtube.com/watch?v=JsArBz46_3s), [Repo](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection) Blog post series - [1](http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/), [2](http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/), [3](http://blog.datadive.net/selecting-good-features-part-iii-random-forests/), [4](http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/) Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection-with-sklearn), [2](https://machinelearningmastery.com/feature-selection-machine-learning-python/) [sklearn](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection) - Feature selection. 
From 93ba4e1867b05e841e3802bd3d9d9d457d2e50ce Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 15 Aug 2021 11:19:55 +0200 Subject: [PATCH 189/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9494d38..3acded8 100644 --- a/README.md +++ b/README.md @@ -453,6 +453,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po ##### Graph-Based Neural Networks [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) [Introduction To Graph Convolutional Networks](http://tkipf.github.io/graph-convolutional-networks/) +[An attempt at demystifying graph deep learning](https://ericmjl.github.io/essays-on-data-science/machine-learning/graph-nets/) [ogb](https://ogb.stanford.edu/) - Open Graph Benchmark, Benchmark datasets. [networkx](https://github.com/networkx/networkx) - Graph library. [cugraph](https://github.com/rapidsai/cugraph) - RAPIDS, Graph library on the GPU. From 5f96e87221fb8912fff1725f11841077a3409647 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 15 Aug 2021 21:25:33 +0200 Subject: [PATCH 190/550] Update README.md --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 3acded8..89918a5 100644 --- a/README.md +++ b/README.md @@ -745,6 +745,8 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645 [hypergraph](https://github.com/aljabr0/hypergraph) - Global optimization methods and hyperparameter optimization. [bbopt](https://github.com/evhub/bbopt) - Black box hyperparameter optimization. [dragonfly](https://github.com/dragonfly/dragonfly) - Scalable Bayesian optimisation. +[botorch](https://github.com/pytorch/botorch) - Bayesian optimization in PyTorch. +[ax](https://github.com/facebook/Ax) - Adaptive Experimentation Platform by Facebook. 
#### Incremental Learning, Online Learning sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html). From 0a29879e6e06caf42af18194aa42273b0b607109 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 19 Aug 2021 11:52:34 +0200 Subject: [PATCH 191/550] linear-tree --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 89918a5..5098955 100644 --- a/README.md +++ b/README.md @@ -318,7 +318,8 @@ Why the default feature importance for random forests is wrong: [link](http://ex [infiniteboost](https://github.com/arogozhnikov/infiniteboost) - Combination of RFs and GBDTs. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [rrcf](https://github.com/kLabUM/rrcf) - Robust Random Cut Forest algorithm for anomaly detection on streams. -[groot](https://github.com/tudelft-cda-lab/GROOT) - Robust decision trees. +[groot](https://github.com/tudelft-cda-lab/GROOT) - Robust decision trees. +[linear-tree](https://github.com/cerlymarco/linear-tree) - Trees with linear models at the leaves. #### Natural Language Processing (NLP) / Text Processing [talk](https://www.youtube.com/watch?v=6zm9NC9uRkk)-[nb](https://nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb), [nb2](https://ahmedbesbes.com/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html), [talk](https://www.youtube.com/watch?time_continue=2&v=sI7VpFNiy_I). 
From 5d6844e041e57eb4cb3f0558ed3e0470b41ac60f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 19 Aug 2021 17:58:03 +0200 Subject: [PATCH 192/550] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 5098955..aa54a0a 100644 --- a/README.md +++ b/README.md @@ -249,7 +249,9 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. [hiplot](https://github.com/facebookresearch/hiplot) - High dimensional Interactive Plotting. [visdom](https://github.com/fossasia/visdom) - Live Visualizations. -[mpl-scatter-density](https://github.com/astrofrog/mpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms. +[mpl-scatter-density](https://github.com/astrofrog/mpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms. +[ComplexHeatmap](https://github.com/jokergoo/ComplexHeatmap) - Complex heatmaps for multidimensional genomic data (R package). +[largeVis](https://github.com/elbamos/largeVis) - Visualize embeddings (t-SNE etc.) (R package). #### Colors [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3). 
From 02878abb6aa68dba1caaef91e46a68a3e0e11585 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 20 Aug 2021 10:19:54 +0200 Subject: [PATCH 193/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index aa54a0a..1cfdadf 100644 --- a/README.md +++ b/README.md @@ -852,6 +852,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline) +[Awesome Public APIs](https://github.com/public-apis/public-apis) [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science) From 95fc394567467d7a357b89f1ebbb42b1490c1b6f Mon Sep 17 00:00:00 2001 From: PawNep <89253870+PawNep@users.noreply.github.com> Date: Fri, 20 Aug 2021 14:11:23 +0200 Subject: [PATCH 194/550] Update README.md Added Neptune.ai --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 1cfdadf..6f6c869 100644 --- a/README.md +++ b/README.md @@ -795,6 +795,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [knockknock](https://github.com/huggingface/knockknock) - Be notified when your training ends. [metaflow](https://github.com/Netflix/metaflow) - Lifecycle Management Tool by Netflix. [cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. +[Neptune](https://neptune.ai) - Experiment tracking and model registry built for research and production teams that run a lot of experiments. 
#### Math and Background [All kinds of math and statistics resources](https://realnotcomplex.com/) From 44a8d21cc2345704a4f43a050cfdc0e1800276d3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 21 Aug 2021 12:04:36 +0200 Subject: [PATCH 195/550] lightly --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 1cfdadf..af67734 100644 --- a/README.md +++ b/README.md @@ -201,15 +201,19 @@ t-SNE - [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifol NeRV - [link](https://github.com/ziyuang/pynerv), [paper](https://www.jmlr.org/papers/volume11/venna10a/venna10a.pdf) MDR - [link](https://github.com/EpistasisLab/scikit-mdr) UMAP - [link](https://github.com/lmcinnes/umap) -Ivis - [link](https://github.com/beringresearch/ivis) +Random Projection - [link](https://scikit-learn.org/stable/modules/random_projection.html) +Ivis - [link](https://github.com/beringresearch/ivis) +SimCLR - [link](https://github.com/lightly-ai/lightly) ##### Packages [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. +[sklearn.random_projection](https://scikit-learn.org/stable/modules/random_projection.html) - Johnson-Lindenstrauss lemma, Gaussian random projection, Sparse random projection. [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). 
Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). +[lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf). @@ -417,6 +421,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. [pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. +[lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. ##### Distributed Libs [flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch. 
From a61defc769edec8ddf706e669e05da373141f89d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 21 Aug 2021 15:15:45 +0200 Subject: [PATCH 196/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index af67734..71c091b 100644 --- a/README.md +++ b/README.md @@ -399,6 +399,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Libs [keras](https://keras.io/) - Neural Networks on top of [tensorflow](https://www.tensorflow.org/), [examples](https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24). +[timm](https://github.com/rwightman/pytorch-image-models) - Pytorch image models. [keras-contrib](https://github.com/keras-team/keras-contrib) - Keras community contributions. [keras-tuner](https://github.com/keras-team/keras-tuner) - Hyperparameter tuning for Keras. [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt: Convenient hyperparameter optimization wrapper. From 2614fd707efc05675399ceb241fa043b22ff0ca4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 24 Aug 2021 10:33:21 +0200 Subject: [PATCH 197/550] distance functions --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 71c091b..c745ccf 100644 --- a/README.md +++ b/README.md @@ -129,7 +129,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists). - #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). [cleanlab](https://github.com/cgnorthcutt/cleanlab) - Imageing data: Machine learning with noisy labels and finding mislabeled data. 
@@ -506,6 +505,12 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [deep_metric_learning](https://github.com/ronekko/deep_metric_learning) - Methods for deep metric learning. [ivis](https://bering-ivis.readthedocs.io/en/latest/supervised.html) - Metric learning using siamese neural networks. +#### Distance Functions +[scipy.spatial](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) - All kinds of distance metrics. +[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance / Wasserstein distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html), [POT implementation](https://pythonot.github.io/auto_examples/plot_OT_2D_samples.html) +[dcor](https://github.com/vnmabus/dcor) - Distance correlation and related Energy statistics. +[GeomLoss](https://www.kernel-operations.io/geomloss/) - Kernel norms, Hausdorff divergences, Debiased Sinkhorn divergences (=approximation of Wasserstein distance). + #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). [Clustering with Deep Learning: Taxonomy and New Methods](https://arxiv.org/pdf/1801.07648.pdf). From 25f2474ae0cec77e83d43a695348c90f5217c23e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 25 Aug 2021 17:44:05 +0200 Subject: [PATCH 198/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c745ccf..06b97d7 100644 --- a/README.md +++ b/README.md @@ -446,6 +446,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Image Classification [nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Neural network. 
+[pycls](https://github.com/facebookresearch/pycls) - Pytorch image classification networks: ResNet, ResNeXt, EfficientNet, and RegNet (by Facebook). ##### Applications and Snippets [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. From 166dcfb1cd68feb360e0289f4c3601965bd29c69 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 26 Aug 2021 20:50:45 +0200 Subject: [PATCH 199/550] esvit --- README.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 06b97d7..554eb48 100644 --- a/README.md +++ b/README.md @@ -183,7 +183,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. -#### Dimensionality Reduction +#### Dimensionality Reduction / Representation Learning ##### Selection Check also the Clustering section for ideas! @@ -204,6 +204,11 @@ Random Projection - [link](https://scikit-learn.org/stable/modules/random_projec Ivis - [link](https://github.com/beringresearch/ivis) SimCLR - [link](https://github.com/lightly-ai/lightly) + +##### Neural-network Based +[lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. +[esvit](https://github.com/microsoft/esvit) - Vision Transformers for Representation Learning (Microsoft). + ##### Packages [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). @@ -212,7 +217,6 @@ SimCLR - [link](https://github.com/lightly-ai/lightly) [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). 
Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). -[lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package). [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map. [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf). From c877e387a43b0db40986d3fc63cebd9389d5e9f0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 28 Aug 2021 17:56:41 +0200 Subject: [PATCH 200/550] MCML --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 554eb48..a7d4028 100644 --- a/README.md +++ b/README.md @@ -208,6 +208,7 @@ SimCLR - [link](https://github.com/lightly-ai/lightly) ##### Neural-network Based [lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [esvit](https://github.com/microsoft/esvit) - Vision Transformers for Representation Learning (Microsoft). +[MCML](https://github.com/pachterlab/MCML) - Semi-supervised dimensionality reduction of Multi-Class, Multi-Label data (sequencing data) [paper](https://www.biorxiv.org/content/10.1101/2021.08.25.457696v1). 
##### Packages From 72fe44156593853ec67fba69a39bd563a932688f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Aug 2021 16:30:52 +0200 Subject: [PATCH 201/550] cellpose --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a7d4028..ae0b709 100644 --- a/README.md +++ b/README.md @@ -367,6 +367,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf). [microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ). [cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets. +[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. #### Image Processing [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) From e8a231fff878b9c15e3747f8005868cc1e9b6391 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 2 Sep 2021 16:49:35 +0200 Subject: [PATCH 202/550] textdistance --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ae0b709..d5017bd 100644 --- a/README.md +++ b/README.md @@ -351,6 +351,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [stanfordnlp](https://github.com/stanfordnlp/stanfordnlp) - NLP Library. [Chatistics](https://github.com/MasterScrat/Chatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames. [textvec](https://github.com/textvec/textvec) - Supervised text vectorization tool. +[textdistance](https://github.com/life4/textdistance) - Collection for comparing distances between two or more sequences. 
##### Papers [Search Engine Correlation](https://arxiv.org/pdf/1107.2691.pdf) From 29ad4eba80733cb6dbad5d24e17638337beb308a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Sep 2021 17:09:04 +0200 Subject: [PATCH 203/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index d5017bd..b8fc4ce 100644 --- a/README.md +++ b/README.md @@ -93,6 +93,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). +##### Comparing Two Populations +[torch-two-sample](https://github.com/josipd/torch-two-sample) - Friedman-Rafsky Test: Compare two populations based on a multivariate generalization of the runs test. [Explanation](https://www.real-statistics.com/multivariate-statistics/multivariate-normal-distribution/friedman-rafsky-test/), [Application](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5014134/) + ##### Interim Analyses / Sequential Analysis / Stopping [Squential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia. [Treatment Effects Monitoring](https://online.stat.psu.edu/stat509/node/75/) - Design and Analysis of Clinical Trials PennState. 
From 230f2090399439954856a1c035e5aa170d0ca11a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 10 Sep 2021 14:43:01 +0200 Subject: [PATCH 204/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b8fc4ce..7366c00 100644 --- a/README.md +++ b/README.md @@ -582,6 +582,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook. [prophet](https://github.com/facebook/prophet) - Time series prediction library by Facebook. [pyramid](https://github.com/tgsmith61591/pyramid), [pmdarima](https://github.com/tgsmith61591/pmdarima) - Wrapper for (Auto-) ARIMA. +[modeltime](https://cran.r-project.org/web/packages/modeltime/index.html) - Time series forecasting framework (R package). [pyflux](https://github.com/RJT1990/pyflux) - Time series prediction algorithms (ARIMA, GARCH, GAS, Bayesian). [atspy](https://github.com/firmai/atspy) - Automated Time Series Models. [pm-prophet](https://github.com/luke14free/pm-prophet) - Time series prediction and decomposition library. From 78d72ffe304cf4e682c1455061a20bd04cf1e5cb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 12 Sep 2021 17:19:38 +0200 Subject: [PATCH 205/550] Update README.md --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 7366c00..68644e9 100644 --- a/README.md +++ b/README.md @@ -846,6 +846,9 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks [Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) +##### Guidelines +[datasharing](https://github.com/jtleek/datasharing) - Guide to data sharing. 
+ ##### List of Books [Mat Kelceys list of cool machine learning books](http://matpalm.com/blog/cool_machine_learning_books/) From 9ba6e08c5c2d7b732a0f5c457eb2de3285df88c3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 13 Sep 2021 10:57:36 +0200 Subject: [PATCH 206/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 68644e9..9a9545c 100644 --- a/README.md +++ b/README.md @@ -171,7 +171,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [pyefd](https://github.com/hbldh/pyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series. #### Feature Selection -[Talk](https://www.youtube.com/watch?v=JsArBz46_3s), [Repo](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection) +[Overview Paper](https://www.sciencedirect.com/science/article/pii/S016794731930194X), [Talk](https://www.youtube.com/watch?v=JsArBz46_3s), [Repo](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection) Blog post series - [1](http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/), [2](http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/), [3](http://blog.datadive.net/selecting-good-features-part-iii-random-forests/), [4](http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/) Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection-with-sklearn), [2](https://machinelearningmastery.com/feature-selection-machine-learning-python/) [sklearn](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection) - Feature selection. 
From 6cfc780462b6814efc383d765443ac24d2e17fab Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Sep 2021 14:52:18 +0200 Subject: [PATCH 207/550] vissl, pytorch section --- README.md | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 9a9545c..d40f977 100644 --- a/README.md +++ b/README.md @@ -73,6 +73,10 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. [zarr](https://github.com/zarr-developers/zarr-python) - Distributed numpy arrays. +#### Distributed Systems +[nextflow](https://github.com/nextflow-io/nextflow) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch and others. +[dsub](https://github.com/DataBiosphere/dsub) - Run batch computing tasks in Docker image in the Google Cloud. + #### Command line tools, CSV [ni](https://github.com/spencertipping/ni) - Command line tool for big data. [xsv](https://github.com/BurntSushi/xsv) - Command line tool for indexing, slicing, analyzing, splitting and joining CSV files. @@ -186,10 +190,11 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. + #### Dimensionality Reduction / Representation Learning ##### Selection -Check also the Clustering section for ideas! +Check also the Clustering section and self-supervised learning section for ideas! 
[Review](https://members.loria.fr/moberger/Enseignement/AVR/Exposes/TR_Dimensiereductie.pdf) PCA - [link](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) @@ -207,14 +212,11 @@ Random Projection - [link](https://scikit-learn.org/stable/modules/random_projec Ivis - [link](https://github.com/beringresearch/ivis) SimCLR - [link](https://github.com/lightly-ai/lightly) - ##### Neural-network Based -[lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [esvit](https://github.com/microsoft/esvit) - Vision Transformers for Representation Learning (Microsoft). [MCML](https://github.com/pachterlab/MCML) - Semi-supervised dimensionality reduction of Multi-Class, Multi-Label data (sequencing data) [paper](https://www.biorxiv.org/content/10.1101/2021.08.25.457696v1). ##### Packages - [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. [sklearn.random_projection](https://scikit-learn.org/stable/modules/random_projection.html) - Johnson-Lindenstrauss lemma, Gaussian random projection, Sparse random projection. @@ -406,9 +408,8 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. [ctrl](https://github.com/salesforce/ctrl) - Text generation. -##### Libs +##### Libs General [keras](https://keras.io/) - Neural Networks on top of [tensorflow](https://www.tensorflow.org/), [examples](https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24). -[timm](https://github.com/rwightman/pytorch-image-models) - Pytorch image models. 
[keras-contrib](https://github.com/keras-team/keras-contrib) - Keras community contributions. [keras-tuner](https://github.com/keras-team/keras-tuner) - Hyperparameter tuning for Keras. [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt: Convenient hyperparameter optimization wrapper. @@ -416,10 +417,6 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [tflearn](https://github.com/tflearn/tflearn) - Neural Networks on top of tensorflow. [tensorlayer](https://github.com/tensorlayer/tensorlayer) - Neural Networks on top of tensorflow, [tricks](https://github.com/wagamamaz/tensorlayer-tricks). [tensorforce](https://github.com/reinforceio/tensorforce) - Tensorflow for applied reinforcement learning. -[fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. -[pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for pytorch. -[ignite](https://github.com/pytorch/ignite) - Highlevel library for pytorch. -[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks. [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability, [Activation Maps](https://openai.com/blog/introducing-activation-atlases/). @@ -429,9 +426,18 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics. [imgclsmob](https://github.com/osmr/imgclsmob) - Pretrained models. [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. 
+ +##### Libs Pytorch +[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). +[fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. +[timm](https://github.com/rwightman/pytorch-image-models) - Pytorch image models. +[ignite](https://github.com/pytorch/ignite) - Highlevel library for pytorch. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. +[pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for pytorch. [pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. [lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. +[MONAI](https://github.com/project-monai/monai) - Deep learning in healthcare imaging. +[kornia](https://github.com/kornia/kornia) - Image transformations, epipolar geometry, depth estimation. ##### Distributed Libs [flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch. @@ -522,6 +528,10 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [dcor](https://github.com/vnmabus/dcor) - Distance correlation and related Energy statistics. [GeomLoss](https://www.kernel-operations.io/geomloss/) - Kernel norms, Hausdorff divergences, Debiased Sinkhorn divergences (=approximation of Wasserstein distance). +#### Self-supervised Learning +[lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. +[vissl](https://github.com/facebookresearch/vissl) - Self-Supervised Learning with PyTorch: RotNet, Jigsaw, NPID, ClusterFit, PIRL, SimCLR, MoCo, DeepCluster, SwAV. + #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). 
[Clustering with Deep Learning: Taxonomy and New Methods](https://arxiv.org/pdf/1801.07648.pdf). From 61b4c10ef49f2462eb1340255640bcda407736ef Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Sep 2021 15:00:00 +0200 Subject: [PATCH 208/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d40f977..86c779c 100644 --- a/README.md +++ b/README.md @@ -521,6 +521,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning) - Pytorch metric learning. [deep_metric_learning](https://github.com/ronekko/deep_metric_learning) - Methods for deep metric learning. [ivis](https://bering-ivis.readthedocs.io/en/latest/supervised.html) - Metric learning using siamese neural networks. +[tensorflow similarity](https://github.com/tensorflow/similarity) - Metric learning. #### Distance Functions [scipy.spatial](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) - All kinds of distance metrics. From 6afff97135aac64626f090def4ab8e4c94978ea1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Sep 2021 15:03:57 +0200 Subject: [PATCH 209/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bf1f41f..348bbfa 100644 --- a/README.md +++ b/README.md @@ -829,7 +829,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [knockknock](https://github.com/huggingface/knockknock) - Be notified when your training ends. [metaflow](https://github.com/Netflix/metaflow) - Lifecycle Management Tool by Netflix. [cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. -[Neptune](https://neptune.ai) - Experiment tracking and model registry built for research and production teams that run a lot of experiments. +[Neptune](https://neptune.ai) - Experiment tracking and model registry. 
#### Math and Background [All kinds of math and statistics resources](https://realnotcomplex.com/) From 6f59e5235556c09a70d2e4e3449ec2d42f040c62 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 27 Sep 2021 13:03:32 +0200 Subject: [PATCH 210/550] GAN update --- README.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 348bbfa..9077e26 100644 --- a/README.md +++ b/README.md @@ -465,16 +465,22 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [pycls](https://github.com/facebookresearch/pycls) - Pytorch image classification networks: ResNet, ResNeXt, EfficientNet, and RegNet (by Facebook). ##### Applications and Snippets -[CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. [SPADE](https://github.com/nvlabs/spade) - Semantic Image Synthesis. [Entity Embeddings of Categorical Variables](https://arxiv.org/abs/1604.06737), [code](https://github.com/entron/entity-embedding-rossmann), [kaggle](https://www.kaggle.com/aquatic/entity-embedding-neural-net/code) [Image Super-Resolution](https://github.com/idealo/image-super-resolution) - Super-scaling using a Residual Dense Network. Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Posts: [1](https://www.thomasjpfan.com/2018/07/nuclei-image-segmentation-tutorial/), [2](https://www.thomasjpfan.com/2017/08/hassle-free-unets/) [deeplearning-models](https://github.com/rasbt/deeplearning-models) - Deep learning models. -##### Variational Autoencoders (VAE) +##### Variational Autoencoders (VAEs) [disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE. +##### Generative Adversarial Networks (GANs) +[Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) +[The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks. 
+[CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. +[Tensorflow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections). +[Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder). + ##### Graph-Based Neural Networks [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) [Introduction To Graph Convolutional Networks](http://tkipf.github.io/graph-convolutional-networks/) @@ -855,7 +861,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Machine Learning Videos](https://github.com/dustinvtran/ml-videos) [Data Science Notebooks](https://github.com/donnemartin/data-science-ipython-notebooks) [Recommender Systems (Microsoft)](https://github.com/Microsoft/Recommenders) -[The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks [Datascience Cheatsheets](https://github.com/FavioVazquez/ds-cheatsheets) ##### Guidelines From 43fa218604f45f5a7e0acc989364ce3fe0ca0842 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 28 Sep 2021 02:20:34 +0200 Subject: [PATCH 211/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9077e26..0cb2785 100644 --- a/README.md +++ b/README.md @@ -479,6 +479,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks. [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. [Tensorflow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections). 
+[Pytorch GAN implementations](https://github.com/znxlwm/pytorch-generative-model-collections). [Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder). ##### Graph-Based Neural Networks From 98fdcec13a922d4027b811529e44842d4d1a9be5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 28 Sep 2021 13:02:51 +0200 Subject: [PATCH 212/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0cb2785..1add317 100644 --- a/README.md +++ b/README.md @@ -312,7 +312,7 @@ Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and- [pywFM](https://github.com/jfloff/pywFM) - Factorization. #### Decision Tree Models -[Intro to Decision Trees and Random Forests](https://victorzhou.com/blog/intro-to-random-forests/), [Intro to Gradient Boosting](http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/) +[Intro to Decision Trees and Random Forests](https://victorzhou.com/blog/intro-to-random-forests/), Intro to Gradient Boosting [1](https://explained.ai/gradient-boosting/), [2](https://www.gormanalysis.com/blog/gradient-boosting-explained/), [Decision Tree Visualization](https://explained.ai/decision-tree-viz/index.html) [lightgbm](https://github.com/Microsoft/LightGBM) - Gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, [doc](https://sites.google.com/view/lauraepp/parameters). [xgboost](https://github.com/dmlc/xgboost) - Gradient boosting (GBDT, GBRT or GBM) library, [doc](https://sites.google.com/view/lauraepp/parameters), Methods for CIs: [link1](https://stats.stackexchange.com/questions/255783/confidence-interval-for-xgb-forecast), [link2](https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b). [catboost](https://github.com/catboost/catboost) - Gradient boosting. 
From 2098ffa1cd9aff4d3429a8c651e32144e98c30ea Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 30 Sep 2021 01:20:23 +0200 Subject: [PATCH 213/550] NVTabular --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 1add317..ab6b910 100644 --- a/README.md +++ b/README.md @@ -72,6 +72,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA. [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. [zarr](https://github.com/zarr-developers/zarr-python) - Distributed numpy arrays. +[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia. #### Distributed Systems [nextflow](https://github.com/nextflow-io/nextflow) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch and others. @@ -167,6 +168,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. +[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia. ##### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. 
From 5658d4c03055a1b5695d74c1cc65cbf25e7104f3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 30 Sep 2021 01:24:43 +0200 Subject: [PATCH 214/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ab6b910..b48f755 100644 --- a/README.md +++ b/README.md @@ -796,7 +796,7 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645 #### Incremental Learning, Online Learning sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html). -[creme-ml](https://github.com/creme-ml/creme) - Incremental learning framework, [talk](https://www.youtube.com/watch?v=P3M6dt7bY9U). +[river](https://github.com/online-ml/river) - Online machine learning. [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Online Learning algorithms. [onelearn](https://github.com/onelearn/onelearn) - Online Random Forests. 
From dfe3389b6ae55206d0e458ee0f1794fcf4147655 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 30 Sep 2021 12:01:40 +0200 Subject: [PATCH 215/550] Awesome Industry Machine Learning --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b48f755..d543ccd 100644 --- a/README.md +++ b/README.md @@ -890,6 +890,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Fraud Detection](https://github.com/benedekrozemberczki/awesome-fraud-detection-papers) [Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) [Awesome Graph Classification](https://github.com/benedekrozemberczki/awesome-graph-classification) +[Awesome Industry Machine Learning](https://github.com/firmai/industry-machine-learning) [Awesome Gradient Boosting](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) From 4160d62f42ec1309a7d61472269c8a51977a5832 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 4 Oct 2021 15:48:17 +0200 Subject: [PATCH 216/550] causallib --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index d543ccd..ecc35ec 100644 --- a/README.md +++ b/README.md @@ -689,6 +689,9 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Scoring [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. +#### Causal Inference +[causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). 
+ #### Probabilistic Modeling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) [PyMC3](https://docs.pymc.io/) - Baysian modelling, [intro](https://docs.pymc.io/notebooks/getting_started) From 45fb103285fd023555f81694a81fb9fffcb18333 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 11 Oct 2021 16:57:01 +0200 Subject: [PATCH 217/550] Update README.md --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index ecc35ec..4127b03 100644 --- a/README.md +++ b/README.md @@ -430,6 +430,8 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. ##### Libs Pytorch +[Good Pytorch Introduction](https://cs230.stanford.edu/blog/pytorch/) + [skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). [fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. [timm](https://github.com/rwightman/pytorch-image-models) - Pytorch image models. From cd7511c61c419fc03334cc75f24eada966eb60fd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 29 Oct 2021 09:34:26 +0200 Subject: [PATCH 218/550] apricot --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 4127b03..28a3de2 100644 --- a/README.md +++ b/README.md @@ -192,6 +192,8 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. 
+#### Subset Selection +[apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly. #### Dimensionality Reduction / Representation Learning From 871882cdacaaa74636e6ded7da80635334f093e7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 29 Oct 2021 11:06:36 +0200 Subject: [PATCH 219/550] lux --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 28a3de2..99a74f1 100644 --- a/README.md +++ b/README.md @@ -42,6 +42,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. +[lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter. #### Helpful [drawdata](https://github.com/koaning/drawdata) - Quickly draw some points and export them as csv, [website](https://drawdata.xyz/). From c11c9870a557d401b2c76e7e3b5e536b56e0c65e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 29 Oct 2021 11:16:26 +0200 Subject: [PATCH 220/550] Causal Inference update --- README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 99a74f1..9a45c79 100644 --- a/README.md +++ b/README.md @@ -625,7 +625,6 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [pastas](https://pastas.readthedocs.io/en/latest/examples.html) - Simulation of time series. [fastdtw](https://github.com/slaypni/fastdtw) - Dynamic Time Warp Distance. [fable](https://www.rdocumentation.org/packages/fable/versions/0.0.0.9000) - Time Series Forecasting (R package). 
-[CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). [pydlm](https://github.com/wwrechard/pydlm) - Bayesian time series modeling ([R package](https://cran.r-project.org/web/packages/bsts/index.html), [Blog post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html)) [PyAF](https://github.com/antoinecarme/pyaf) - Automatic Time Series Forecasting. [luminol](https://github.com/linkedin/luminol) - Anomaly Detection and Correlation library from Linkedin. @@ -695,7 +694,11 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. #### Causal Inference -[causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). +[CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). +[causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). +[causalml](https://github.com/uber/causalml) - Causal inference by Uber. +[upliftml](https://github.com/bookingcom/upliftml) - Causal inference by Booking.com. +[EconML](https://github.com/microsoft/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft. 
#### Probabilistic Modeling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) @@ -757,7 +760,6 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [innvestigate](https://github.com/albermax/innvestigate) - A toolbox to investigate neural network predictions. [dalex](https://github.com/pbiecek/DALEX) - Explanations for ML models (R package). [interpret](https://github.com/microsoft/interpret) - Fit interpretable models, explain models (Microsoft). -[causalml](https://github.com/uber/causalml) - Causal inference by Uber. #### Automated Machine Learning [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow. From 0b3e33050de9fd34cdaa57a2d9dea871f917e4bc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 1 Nov 2021 18:46:57 +0100 Subject: [PATCH 221/550] scikit-learn-intelex --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 9a45c79..b359668 100644 --- a/README.md +++ b/README.md @@ -44,6 +44,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. [lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter. +#### Scikit-Learn Alternatives +[scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex) - Intel extension for scikit-learn for speed. + #### Helpful [drawdata](https://github.com/koaning/drawdata) - Quickly draw some points and export them as csv, [website](https://drawdata.xyz/). [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). 
From f7ca594cdff350a97b3e72840bff3212c8e914cc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 4 Nov 2021 14:26:30 +0100 Subject: [PATCH 222/550] geomstats --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index b359668..6d28cd5 100644 --- a/README.md +++ b/README.md @@ -605,6 +605,9 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Interactive Tool](https://fiiir.com/) for FIR and IIR filters, [Examples](https://plot.ly/python/fft-filters/). [filterpy](https://github.com/rlabbe/filterpy) - Kalman filtering and optimal estimation library. +#### Geometry +[geomstats](https://github.com/geomstats/geomstats) - Computations and statistics on manifolds with geometric structures. + #### Time Series [statsmodels](https://www.statsmodels.org/dev/tsa.html) - Time series analysis, [seasonal decompose](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) [example](https://gist.github.com/balzer82/5cec6ad7adc1b550e7ee), [SARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.grangercausalitytests.html). [kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook. From 65329d591923f436e4257121a0d29ecbee9f75a3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 5 Nov 2021 14:07:37 +0100 Subject: [PATCH 223/550] SubTab --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 6d28cd5..93a7e08 100644 --- a/README.md +++ b/README.md @@ -195,6 +195,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection. [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. 
[INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. +[SubTab](https://github.com/AstraZeneca/SubTab) - Subsetting Features of Tabular Data for Self-Supervised Representation Learning, AstraZeneca. #### Subset Selection [apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly. From 51e059046bf9617216ae3b42e58fe434ace53180 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 5 Nov 2021 17:22:48 +0100 Subject: [PATCH 224/550] HypHC --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 93a7e08..96996b6 100644 --- a/README.md +++ b/README.md @@ -565,6 +565,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. [distribution_clustering](https://github.com/EricElmoznino/distribution_clustering), [paper](https://arxiv.org/abs/1804.02624), [related paper](https://arxiv.org/abs/2003.07770), [alt](https://github.com/r0f1/distribution_clustering). [phenograph](https://github.com/dpeerlab/phenograph) - Clustering by community detection. +[HypHC](https://github.com/HazyResearch/HypHC) - Hyperbolic Hierarchical Clustering. 
##### Clustering Evalutation [Wagner, Wagner - Comparing Clusterings - An Overview](https://publikationen.bibliothek.kit.edu/1000011477/812079) From 607f2d06edb69e23c0425cc652c7d5d1ffdca703 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 8 Nov 2021 13:19:39 +0100 Subject: [PATCH 225/550] Update README.md --- README.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 96996b6..20f5a85 100644 --- a/README.md +++ b/README.md @@ -174,7 +174,13 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. [NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia. -##### Feature Engineering Images +#### Image Cleanup +[Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. +[cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. +[BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. +[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Restoration of fluorescence microscopy images. + +#### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. [mahotas](https://github.com/luispedro/mahotas) - Zernike, Haralick, LBP, and TAS features. [pyradiomics](https://github.com/AIM-Harvard/pyradiomics) - Radiomics features from medical imaging. @@ -524,6 +530,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [pygam](https://github.com/dswah/pyGAM) - Generalized Additive Models (GAMs), [Explanation](https://multithreaded.stitchfix.com/blog/2015/07/30/gam/). [GLRM](https://github.com/madeleineudell/LowRankModels.jl) - Generalized Low Rank Models. 
[tweedie](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tweedie-regression-objective-reg-tweedie) - Specialized distribution for zero inflated targets, [Talk](https://www.youtube.com/watch?v=-o0lpHBq85I). +[MAPIE](https://github.com/scikit-learn-contrib/MAPIE) - Estimating prediction intervals. #### Classification [Talk](https://www.youtube.com/watch?v=DkLPYccEJ8Y), [Notebook](https://github.com/ianozsvald/data_science_delivered/blob/master/ml_creating_correct_capable_classifiers.ipynb) From 8e31824af8b10147d9f6871b9894e24a1f03ec00 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 8 Nov 2021 13:58:27 +0100 Subject: [PATCH 226/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 20f5a85..54d8bfa 100644 --- a/README.md +++ b/README.md @@ -489,6 +489,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [deeplearning-models](https://github.com/rasbt/deeplearning-models) - Deep learning models. ##### Variational Autoencoders (VAEs) +[Variational Autoencoder Explanation Video](https://www.youtube.com/watch?v=9zKuYvjFFS8) [disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE. ##### Generative Adversarial Networks (GANs) From 68caacd8fc4b0764643d99a8cff119f3b33c4e3b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 9 Nov 2021 14:36:31 +0100 Subject: [PATCH 227/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 54d8bfa..5cd1a56 100644 --- a/README.md +++ b/README.md @@ -178,7 +178,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. 
[BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. -[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Restoration of fluorescence microscopy images. +CSBDeep [Project page](https://csbdeep.bioimagecomputing.com/tools/), [Python package](https://github.com/CSBDeep/CSBDeep) - Image restoration and object detection of fluorescence microscopy images. #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. From efe71b53100814b48962d730d78ea07c36c8437d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 9 Nov 2021 14:56:50 +0100 Subject: [PATCH 228/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5cd1a56..c227d77 100644 --- a/README.md +++ b/README.md @@ -178,7 +178,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. [BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. -CSBDeep [Project page](https://csbdeep.bioimagecomputing.com/tools/), [Python package](https://github.com/CSBDeep/CSBDeep) - Image restoration and object detection of fluorescence microscopy images. +[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/). #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. 
From 1ba440c6cd70cc5029ef148047ae5d67d3116c36 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 9 Nov 2021 17:27:11 +0100 Subject: [PATCH 229/550] aydin --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c227d77..60098e7 100644 --- a/README.md +++ b/README.md @@ -178,7 +178,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. [BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. -[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/). +[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/). +[aydin](https://github.com/royerlab/aydin) - Image denoising. #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. From bda64791566274a1b12037965508be00bbafdbd0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Nov 2021 12:04:54 +0100 Subject: [PATCH 230/550] Update README.md --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 60098e7..861f600 100644 --- a/README.md +++ b/README.md @@ -179,6 +179,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. [BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. 
[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/). +[DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. [aydin](https://github.com/royerlab/aydin) - Image denoising. #### Feature Engineering Images @@ -492,6 +493,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po ##### Variational Autoencoders (VAEs) [Variational Autoencoder Explanation Video](https://www.youtube.com/watch?v=9zKuYvjFFS8) [disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE. +[ladder-vae-pytorch](https://github.com/addtt/ladder-vae-pytorch) - Ladder Variational Autoencoders (LVAE). ##### Generative Adversarial Networks (GANs) [Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) @@ -691,7 +693,8 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [survivalstan](https://github.com/hammerlab/survivalstan) - Survival analysis, [intro](http://www.hammerlab.org/2017/06/26/introducing-survivalstan/). [convoys](https://github.com/better/convoys) - Analyze time lagged conversions. RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). -[pysurvival](https://github.com/square/pysurvival) - Survival analysis . +[pysurvival](https://github.com/square/pysurvival) - Survival analysis. +[DeepSurvivalMachines](https://github.com/autonlab/DeepSurvivalMachines) - Fully Parametric Survival Regression. #### Outlier Detection & Anomaly Detection [sklearn](https://scikit-learn.org/stable/modules/outlier_detection.html) - Isolation Forest and others. 
From 0bdd19d72ceb43b1e9f907f879038bcc0d32473c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 Nov 2021 10:18:22 +0100 Subject: [PATCH 231/550] Awesome Learning with Label Noise --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 861f600..c7f2c4d 100644 --- a/README.md +++ b/README.md @@ -143,7 +143,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). -[cleanlab](https://github.com/cgnorthcutt/cleanlab) - Imageing data: Machine learning with noisy labels and finding mislabeled data. +[cleanlab](https://github.com/cgnorthcutt/cleanlab) - Imageing data: Machine learning with noisy labels and finding mislabeled data. Also see awesome list below. [pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames. [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. 
@@ -922,6 +922,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Graph Classification](https://github.com/benedekrozemberczki/awesome-graph-classification) [Awesome Industry Machine Learning](https://github.com/firmai/industry-machine-learning) [Awesome Gradient Boosting](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) +[Awesome Learning with Label Noise](https://github.com/subeeshvasu/Awesome-Learning-with-Label-Noise) [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) From 7c55a581c5652556eabc2f90161e3bdf3cd1b941 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 Nov 2021 14:45:20 +0100 Subject: [PATCH 232/550] contrastive --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c7f2c4d..0136ae2 100644 --- a/README.md +++ b/README.md @@ -249,6 +249,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [scanpy](https://github.com/theislab/scanpy) - [Force-directed graph drawing](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html#scanpy.tl.draw_graph), [Diffusion Maps](https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html). [direpack](https://github.com/SvenSerneels/direpack) - Projection pursuit, Sufficient dimension reduction, Robust M-estimators. [DBS](https://cran.r-project.org/web/packages/DatabionicSwarm/vignettes/DatabionicSwarm.html) - DatabionicSwarm (R package). +[contrastive](https://github.com/abidlabs/contrastive) - Contrastive PCA. #### Training-related [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. 
From c9a55c892e199133a7a04aea07735d303036cec8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 Nov 2021 19:28:48 +0100 Subject: [PATCH 233/550] Update README.md --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 0136ae2..4a459bf 100644 --- a/README.md +++ b/README.md @@ -500,9 +500,10 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) [The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks. [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. -[Tensorflow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections). -[Pytorch GAN implementations](https://github.com/znxlwm/pytorch-generative-model-collections). -[Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder). 
+[Tensorflow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections) +[Pytorch GAN implementations](https://github.com/znxlwm/pytorch-generative-model-collections) +[Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder) +[GAN implementations](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) ##### Graph-Based Neural Networks [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) From 92368b7ec84be0d62297faa8b3f3ade2ec043bad Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 Nov 2021 19:30:03 +0100 Subject: [PATCH 234/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4a459bf..effc445 100644 --- a/README.md +++ b/README.md @@ -503,7 +503,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [Tensorflow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections) [Pytorch GAN implementations](https://github.com/znxlwm/pytorch-generative-model-collections) [Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder) -[GAN implementations](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) +[StudioGAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) - Pytorch GAN implementations. 
##### Graph-Based Neural Networks [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) From 5e901435423ae3a0722142058c5d571dd5fffdc8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 15 Nov 2021 19:20:04 +0100 Subject: [PATCH 235/550] Update README.md --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index effc445..8157a91 100644 --- a/README.md +++ b/README.md @@ -401,7 +401,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Neural Networks ##### Tutorials & Viewer -[Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) fast.ai course - [Lessons 1-7](https://course.fast.ai/videos/?lesson=1), [Lessons 8-14](http://course18.fast.ai/lessons/lessons2.html) [Tensorflow without a PhD](https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd) - Neural Network course by Google. Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PPT](http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture12.pdf) @@ -944,6 +943,10 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python) [Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) +#### Lectures +[Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. +[NYU Deep Learning SP21](https://www.youtube.com/playlist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - Youtube Playlist. 
+ #### Things I google a lot [Color codes](https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#categorical-colors) [Frequency codes for time series](https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) From 386b490f9586bbf9a28dcbcf3d2f0c7df29de1eb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 16 Nov 2021 09:30:00 +0100 Subject: [PATCH 236/550] Awesome Visual Attentions --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 8157a91..cb26c6e 100644 --- a/README.md +++ b/README.md @@ -941,7 +941,8 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation) [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding) [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python) -[Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) +[Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection) +[Awesome Visual Attentions](https://github.com/MenghaoGuo/Awesome-Vision-Attentions) #### Lectures [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. From 2ac4033171f91e8f055fe12f2834a0f75769f1cb Mon Sep 17 00:00:00 2001 From: Harry Biddle Date: Tue, 16 Nov 2021 13:45:57 +0100 Subject: [PATCH 237/550] Add python record linkage toolkit --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index cb26c6e..6b50520 100644 --- a/README.md +++ b/README.md @@ -882,6 +882,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [unyt](https://github.com/yt-project/unyt) - Working with units. [scrapy](https://github.com/scrapy/scrapy) - Web scraping library. 
[VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) - ML Toolkit from Microsoft. +[Python Record Linkage Toolkit](https://github.com/J535D165/recordlinkage) - link records in or between data sources. #### General Python Programming [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. From 3def69440f55904cbf90a99e4616d402ac61b5fe Mon Sep 17 00:00:00 2001 From: charsmith Date: Wed, 17 Nov 2021 10:35:06 +0000 Subject: [PATCH 238/550] Add notebooker and dtale --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 6b50520..1cf9c89 100644 --- a/README.md +++ b/README.md @@ -29,6 +29,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter. [nbcommands](https://github.com/vinayak-mehta/nbcommands) - View and search notebooks from terminal. [handcalcs](https://github.com/connorferster/handcalcs) - More convenient way of writing mathematical equations in Jupyter. +[notebooker](https://github.com/man-group/notebooker) - Productionize and schedule your Jupyter Notebooks as easily as you wrote them #### Pandas Tricks, Alternatives and Additions [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) @@ -43,6 +44,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. [lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter. +[dtale](https://github.com/man-group/dtale) - An easy way to view and analyze Pandas data structures, integrating seamlessly with Jupyter. 
#### Scikit-Learn Alternatives [scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex) - Intel extension for scikit-learn for speed. From e75b10ffbd963da7e356ecb45177b36f39824b4e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 17 Nov 2021 15:30:26 +0100 Subject: [PATCH 239/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 1cf9c89..21acfc2 100644 --- a/README.md +++ b/README.md @@ -29,7 +29,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter. [nbcommands](https://github.com/vinayak-mehta/nbcommands) - View and search notebooks from terminal. [handcalcs](https://github.com/connorferster/handcalcs) - More convenient way of writing mathematical equations in Jupyter. -[notebooker](https://github.com/man-group/notebooker) - Productionize and schedule your Jupyter Notebooks as easily as you wrote them +[notebooker](https://github.com/man-group/notebooker) - Productionize and schedule Jupyter Notebooks. #### Pandas Tricks, Alternatives and Additions [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431) @@ -44,7 +44,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. [lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter. -[dtale](https://github.com/man-group/dtale) - An easy way to view and analyze Pandas data structures, integrating seamlessly with Jupyter. +[dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter. 
#### Scikit-Learn Alternatives [scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex) - Intel extension for scikit-learn for speed. From 1f8a57460f5301643f10c03f90e8cecef32240c1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 22 Nov 2021 14:48:00 +0100 Subject: [PATCH 240/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 21acfc2..b2dff1c 100644 --- a/README.md +++ b/README.md @@ -580,6 +580,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [distribution_clustering](https://github.com/EricElmoznino/distribution_clustering), [paper](https://arxiv.org/abs/1804.02624), [related paper](https://arxiv.org/abs/2003.07770), [alt](https://github.com/r0f1/distribution_clustering). [phenograph](https://github.com/dpeerlab/phenograph) - Clustering by community detection. [HypHC](https://github.com/HazyResearch/HypHC) - Hyperbolic Hierarchical Clustering. +[BanditPAM](https://github.com/ThrunGroup/BanditPAM) - Improved k-Medoids Clustering. ##### Clustering Evalutation [Wagner, Wagner - Comparing Clusterings - An Overview](https://publikationen.bibliothek.kit.edu/1000011477/812079) From d8f1b2540bff7780ba4bec28b671c7cc0e96bb49 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 25 Nov 2021 09:47:29 +0100 Subject: [PATCH 241/550] doubtlab --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index b2dff1c..e488f08 100644 --- a/README.md +++ b/README.md @@ -145,7 +145,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). -[cleanlab](https://github.com/cgnorthcutt/cleanlab) - Imageing data: Machine learning with noisy labels and finding mislabeled data. Also see awesome list below. [pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames. 
[janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names. [impyute](https://github.com/eltonlaw/impyute) - Imputations. @@ -157,6 +156,10 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance / Wasserstein distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html), [POT implementation](https://pythonot.github.io/auto_examples/plot_OT_2D_samples.html) [littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. +#### Noisy Labels +[cleanlab](https://github.com/cleanlab/cleanlab) - Machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Also see awesome list below. +[doubtlab](https://github.com/koaning/doubtlab) - Find bad or noisy labels. + #### Train / Test Split [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Stratification of multilabel data. From 3a287307b58246566343d5b838fb4e60b48fca9b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 29 Nov 2021 13:21:35 +0100 Subject: [PATCH 242/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e488f08..a1875fd 100644 --- a/README.md +++ b/README.md @@ -397,6 +397,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ). [cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets. [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. +[ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. 
#### Image Processing [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) From e4e2caff4f735cd4eb079a53893cdcd7cc048cf8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 30 Nov 2021 18:37:49 +0100 Subject: [PATCH 243/550] polars --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a1875fd..d43c537 100644 --- a/README.md +++ b/README.md @@ -45,6 +45,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. [lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter. [dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter. +[polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas. #### Scikit-Learn Alternatives [scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex) - Intel extension for scikit-learn for speed. From b1fb936ab3309abea357701d6d55fd132a1b4c1f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Dec 2021 16:44:47 +0100 Subject: [PATCH 244/550] Intro to semi-supervised learning --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index d43c537..12c790e 100644 --- a/README.md +++ b/README.md @@ -235,7 +235,7 @@ Random Projection - [link](https://scikit-learn.org/stable/modules/random_projec Ivis - [link](https://github.com/beringresearch/ivis) SimCLR - [link](https://github.com/lightly-ai/lightly) -##### Neural-network Based +##### Neural-network based [esvit](https://github.com/microsoft/esvit) - Vision Transformers for Representation Learning (Microsoft). [MCML](https://github.com/pachterlab/MCML) - Semi-supervised dimensionality reduction of Multi-Class, Multi-Label data (sequencing data) [paper](https://www.biorxiv.org/content/10.1101/2021.08.25.457696v1). 
@@ -406,6 +406,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. #### Neural Networks +[Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html) ##### Tutorials & Viewer fast.ai course - [Lessons 1-7](https://course.fast.ai/videos/?lesson=1), [Lessons 8-14](http://course18.fast.ai/lessons/lessons2.html) From a01506d5400d8eb1e16a0718627b3a07d51da80b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Dec 2021 18:33:48 +0100 Subject: [PATCH 245/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 12c790e..9b5e7c1 100644 --- a/README.md +++ b/README.md @@ -406,6 +406,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. #### Neural Networks +[Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9) [Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html) ##### Tutorials & Viewer From 60b16bed09f683930dbcf4f7aac0146a9bae7c24 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Dec 2021 18:41:20 +0100 Subject: [PATCH 246/550] Mode Median Mean and Linear Regression Articles --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 9b5e7c1..9c9c2e4 100644 --- a/README.md +++ b/README.md @@ -96,6 +96,8 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Classical Statistics ##### Statistical Tests and Packages +[Modes, Medians and Means: A Unifying Perspective](https://www.johnmyleswhite.com/notebook/2013/03/22/modes-medians-and-means-an-unifying-perspective/) +[Using Norms to Understand Linear 
Regression](https://www.johnmyleswhite.com/notebook/2013/03/22/using-norms-to-understand-linear-regression/) [Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb) [Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html) [statsmodels](https://www.statsmodels.org/stable/index.html) - Statistical tests. From f98d21a1ead0b7a2d47cd0e90dc2df55d4bee562 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Dec 2021 19:20:14 +0100 Subject: [PATCH 247/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9c9c2e4..24ec3fb 100644 --- a/README.md +++ b/README.md @@ -627,6 +627,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach #### Signal Processing and Filtering [Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf). [The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html). +[Kalman Filter article](https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures). [Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Baysian and various Kalman filters. [Interactive Tool](https://fiiir.com/) for FIR and IIR filters, [Examples](https://plot.ly/python/fft-filters/). [filterpy](https://github.com/rlabbe/filterpy) - Kalman filtering and optimal estimation library. 
From cfe947dbd4d4504f8815aaa59e9a9badfc43f86e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 9 Dec 2021 16:49:02 +0100 Subject: [PATCH 248/550] clearml --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 24ec3fb..e970aa3 100644 --- a/README.md +++ b/README.md @@ -883,6 +883,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [metaflow](https://github.com/Netflix/metaflow) - Lifecycle Management Tool by Netflix. [cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. [Neptune](https://neptune.ai) - Experiment tracking and model registry. +[clearml](https://github.com/allegroai/clearml) - Experiment Manager, MLOps and Data-Management. #### Math and Background [All kinds of math and statistics resources](https://realnotcomplex.com/) From 65dfebeb5676c5a9f99941e9dc4c878011090f3f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 9 Dec 2021 16:50:01 +0100 Subject: [PATCH 249/550] Shap article --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e970aa3..4b3bdd2 100644 --- a/README.md +++ b/README.md @@ -773,7 +773,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin #### Model Explanation, Interpretability, Feature Importance [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python) -[shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao). +[shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Good Shap intro](https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/). 
[treeinterpreter](https://github.com/andosa/treeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions. [lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Warning (Myth 7)](https://crazyoscarchang.github.io/2019/02/16/seven-myths-in-machine-learning-research/). [lime_xgboost](https://github.com/jphall663/lime_xgboost) - Create LIMEs for XGBoost. From e0f9a266e6cbddc5f2ba11503bc407f0e75593d6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 13 Dec 2021 20:56:06 +0100 Subject: [PATCH 250/550] etna --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 4b3bdd2..eb5a53e 100644 --- a/README.md +++ b/README.md @@ -672,6 +672,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [adtk](https://github.com/arundo/adtk) - Time Series Anomaly Detection. [rocket](https://github.com/angus924/rocket) - Time Series classification using random convolutional kernels. [luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. +[etna](https://github.com/tinkoff-ai/etna) - Time Series library. ##### Time Series Evaluation [TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) - Sklearn time series split. From 046ebcb225fb9e45094b943aad06226f0bb4eebf Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 16 Dec 2021 20:25:04 +0100 Subject: [PATCH 251/550] gradio --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index eb5a53e..6fa98b2 100644 --- a/README.md +++ b/README.md @@ -305,6 +305,9 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs). 
[voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. +#### UI +[gradio](https://github.com/gradio-app/gradio) - Create UIs for your machine learning model. + #### Survey Tools [samplics](https://github.com/samplics-org/samplics) - Sampling techniques for complex survey designs. From 41509225d5a170d51dfe2ca6636ab2f19883cea4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 17 Dec 2021 11:45:18 +0100 Subject: [PATCH 252/550] python_for_microscopists --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 6fa98b2..fc71c18 100644 --- a/README.md +++ b/README.md @@ -396,6 +396,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). ##### Image-related +[python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. [mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb). [imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis. [CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis. 
From 4a0361865e227be83592282af27290bfb3e684a9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 17 Dec 2021 14:01:28 +0100 Subject: [PATCH 253/550] rational_activations --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index fc71c18..c6b1ef7 100644 --- a/README.md +++ b/README.md @@ -435,6 +435,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Lossfunction Related [SegLoss](https://github.com/JunMa11/SegLoss) - List of loss functions for medical image segmentation. +##### Activation Functions +[rational_activations](https://github.com/ml-research/rational_activations) - Rational activation functions. + ##### Text Related [ktext](https://github.com/hamelsmu/ktext) - Utilities for pre-processing text for deep learning in Keras. [textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation. From a488237b3be795dc1a2b406fbe98e5109c8b2c05 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 17 Dec 2021 15:24:23 +0100 Subject: [PATCH 254/550] SegFormer --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index c6b1ef7..901271f 100644 --- a/README.md +++ b/README.md @@ -522,6 +522,10 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder) [StudioGAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) - Pytorch GAN implementations. +##### Transformers +[SegFormer](https://github.com/NVlabs/SegFormer) - Simple and Efficient Design for Semantic Segmentation with Transformers. +[esvit](https://github.com/microsoft/esvit) - Efficient self-supervised Vision Transformers. 
+
 ##### Graph-Based Neural Networks
 [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780)
 [Introduction To Graph Convolutional Networks](http://tkipf.github.io/graph-convolutional-networks/)

From 372d037d34cf191a9fdc0295cdb43010b721d090 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 17 Dec 2021 16:04:04 +0100
Subject: [PATCH 255/550] Awesome Visual Transformer

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 901271f..c26c38c 100644
--- a/README.md
+++ b/README.md
@@ -970,6 +970,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python)
 [Awesome Time Series Anomaly Detection](https://github.com/rob-med/awesome-TS-anomaly-detection)
 [Awesome Visual Attentions](https://github.com/MenghaoGuo/Awesome-Vision-Attentions)
+[Awesome Visual Transformer](https://github.com/dk-liang/Awesome-Visual-Transformer)

 #### Lectures
 [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class.

From ad2406117a8acde790aab2754976389510cd4bdf Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 20 Dec 2021 23:28:30 +0100
Subject: [PATCH 256/550] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c26c38c..1c51ede 100644
--- a/README.md
+++ b/README.md
@@ -397,13 +397,13 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www

 ##### Image-related
 [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks.
+[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation.
 [mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb).
 [imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis.
 [CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis.
 [imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf).
 [microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ).
 [cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets.
-[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation.
 [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy.

 #### Image Processing

From 3b5a5778909eae75a13f71c608efc44789e49e49 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 21 Dec 2021 18:43:20 +0100
Subject: [PATCH 257/550] Sparse contrastive PCA

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1c51ede..3c82442 100644
--- a/README.md
+++ b/README.md
@@ -258,6 +258,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 [direpack](https://github.com/SvenSerneels/direpack) - Projection pursuit, Sufficient dimension reduction, Robust M-estimators.
 [DBS](https://cran.r-project.org/web/packages/DatabionicSwarm/vignettes/DatabionicSwarm.html) - DatabionicSwarm (R package).
 [contrastive](https://github.com/abidlabs/contrastive) - Contrastive PCA.
+[scPCA](https://github.com/PhilBoileau/scPCA) - Sparse contrastive PCA (R package).

 #### Training-related
 [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data.

From 056c8469a7832a731b55d35c5e32560419fdf68c Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 1 Jan 2022 22:26:55 +0100
Subject: [PATCH 258/550] pyroc

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 3c82442..fc9be32 100644
--- a/README.md
+++ b/README.md
@@ -780,6 +780,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y
 [pandas_ml](https://github.com/pandas-ml/pandas-ml) - Confusion matrix.
 Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learning-curve/).
 [yellowbrick](http://www.scikit-yb.org/en/latest/api/model_selection/learning_curve.html) - Learning curve.
+[pyroc](https://github.com/noudald/pyroc) - Receiver Operating Characteristic (ROC) curves.

 #### Model Uncertainty
 [uncertainty-toolbox](https://github.com/uncertainty-toolbox/uncertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization.
From 38c606cc6f3a63794cf858a74066f53f53e8e725 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 2 Jan 2022 00:06:25 +0100
Subject: [PATCH 259/550] Second-generation p-values

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index fc9be32..8b44e39 100644
--- a/README.md
+++ b/README.md
@@ -132,6 +132,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal

 ##### Texts
 [Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/)
+[Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299)
 [Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/)
 [Chatruc - The Central Limit Theorem and its misuse](https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/)
 [Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf)

From eb004722141b5c708a9d6c65170be609f516285e Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 3 Jan 2022 22:46:38 +0100
Subject: [PATCH 260/550] Dependent Propabilities

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8b44e39..ffcce72 100644
--- a/README.md
+++ b/README.md
@@ -117,6 +117,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values.

 ##### Visualizations
+[Dependent Propabilities](https://static.laszlokorte.de/stochastic/)
 [Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https://rpsychologist.com/d3/NHST/)
 [Correlation](https://rpsychologist.com/d3/correlation/)
 [Cohen's d](https://rpsychologist.com/d3/cohend/)
@@ -124,7 +125,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [Equivalence, non-inferiority and superiority testing](https://rpsychologist.com/d3/equivalence/)
 [Bayesian two-sample t test](https://rpsychologist.com/d3/bayes/)
 [Distribution of p-values when comparing two groups](https://rpsychologist.com/d3/pdist/)
-[Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/)
+[Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/) 

 ##### Talks
 [Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs)

From ca3969a2598384b50770631a9a7912b6dff3c904 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 10 Jan 2022 00:11:22 +0100
Subject: [PATCH 261/550] tensorflow probability talk

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ffcce72..e44c422 100644
--- a/README.md
+++ b/README.md
@@ -759,7 +759,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y
 [dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects.
 [edward](https://github.com/blei-lab/edward) - Probabilistic modeling, inference, and criticism, [Mixture Density Networks (MNDs)](http://edwardlib.org/tutorials/mixture-density-network), [MDN Explanation](https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca).
 [Pyro](https://github.com/pyro-ppl/pyro) - Deep Universal Probabilistic Programming.
-[tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb).
+[tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk1](https://www.youtube.com/watch?v=KJxmC5GCWe4), [notebook talk1](https://github.com/AlxndrMlk/PyDataGlobal2021/blob/main/00_PyData_Global_2021_nb_full.ipynb), [talk2](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb).
 [bambi](https://github.com/bambinos/bambi) - High-level Bayesian model-building interface on top of PyMC3.
 [neural-tangents](https://github.com/google/neural-tangents) - Infinite Neural Networks.

From f16aadff5c9cf9bced3e0b8ae84948f105033fe5 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 10 Jan 2022 09:53:26 +0100
Subject: [PATCH 262/550] napari

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e44c422..149d79a 100644
--- a/README.md
+++ b/README.md
@@ -186,6 +186,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal

 #### Image Cleanup
 [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package.
+[napari](https://github.com/napari/napari) - Multi-dimensional image viewer.
 [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy.
 [BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images.
 [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/).

From 7b9ff1de596c0b369ab9b769972699e4e2ea2392 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 11 Jan 2022 15:10:29 +0100
Subject: [PATCH 263/550] unprocessing

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 149d79a..1cba697 100644
--- a/README.md
+++ b/README.md
@@ -192,6 +192,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/).
 [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method.
 [aydin](https://github.com/royerlab/aydin) - Image denoising.
+[unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline.

 #### Feature Engineering Images
 [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent.

From e51dace7ec7f0f97248fa81f10c11e6e68734e02 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 11 Jan 2022 15:16:00 +0100
Subject: [PATCH 264/550] feast

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1cba697..dd40fc5 100644
--- a/README.md
+++ b/README.md
@@ -888,6 +888,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
 [dvc](https://github.com/iterative/dvc) - Version control for large files.
 [hangar](https://github.com/tensorwerk/hangar-py) - Version control for tensor data.
 [kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines.
+[feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo).

 ##### Data Science Related
 [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages.

From 0405b10bf6c84894ce37ce9bfdcb0ab90cbf9643 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 12 Jan 2022 00:03:14 +0100
Subject: [PATCH 265/550] Awesome Quantitative Finance

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index dd40fc5..438707e 100644
--- a/README.md
+++ b/README.md
@@ -971,6 +971,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science)
 [Awesome Python Data Science](https://github.com/amitness/toolbox)
 [Awesome Pytorch](https://github.com/bharathgs/Awesome-pytorch-list)
+[Awesome Quantitative Finance](https://github.com/wilsonfreitas/awesome-quant)
 [Awesome Recommender Systems](https://github.com/grahamjenson/list_of_recommender_systems)
 [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation)
 [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding)

From eccaf24023f4e5aa0de908699a7edbbc5e9d6cb0 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 13 Jan 2022 10:28:30 +0100
Subject: [PATCH 266/550] DeepPurpose

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 438707e..e9172a0 100644
--- a/README.md
+++ b/README.md
@@ -411,6 +411,9 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
 [cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets.
 [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy.

+###### Drug discovery
+[DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modeling and Prediction Toolkit.
+
 #### Image Processing
 [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk)
 [cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html).

From dbf0d763104fd597ee4494f817e246aa90043764 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 13 Jan 2022 16:42:42 +0100
Subject: [PATCH 267/550] AlphaPy

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e9172a0..245f190 100644
--- a/README.md
+++ b/README.md
@@ -825,6 +825,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
 [automl-gs](https://github.com/minimaxir/automl-gs) - Automated machine learning.
 [mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning.
 [automl_zero](https://github.com/google-research/google-research/tree/master/automl_zero) - Automatically discover computer programs that can solve machine learning tasks from Google.
+[AlphaPy](https://github.com/ScottfreeLLC/AlphaPy) - Automated Machine Learning using scikit-learn xgboost, LightGBM and others.

 #### Graph Representation Learning
 [Karate Club](https://github.com/benedekrozemberczki/karateclub) - Unsupervised learning on graphs.

From de17d861769b2622cff6b0c84adbc939ac589124 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 13 Jan 2022 17:01:43 +0100
Subject: [PATCH 268/550] fiftyone

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 245f190..0c0dc9e 100644
--- a/README.md
+++ b/README.md
@@ -187,6 +187,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 #### Image Cleanup
 [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package.
 [napari](https://github.com/napari/napari) - Multi-dimensional image viewer.
+[fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models.
 [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy.
 [BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images.
 [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/).

From 1636dffededdbc0ad5ac1a61b35abdc104341ab6 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 14 Jan 2022 09:54:31 +0100
Subject: [PATCH 269/550] neural_prophet

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 0c0dc9e..6568158 100644
--- a/README.md
+++ b/README.md
@@ -659,6 +659,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach
 [statsmodels](https://www.statsmodels.org/dev/tsa.html) - Time series analysis, [seasonal decompose](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) [example](https://gist.github.com/balzer82/5cec6ad7adc1b550e7ee), [SARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.grangercausalitytests.html).
 [kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook.
 [prophet](https://github.com/facebook/prophet) - Time series prediction library by Facebook.
+[neural_prophet](https://github.com/ourownstory/neural_prophet) - Time series prediction built on Pytorch.
 [pyramid](https://github.com/tgsmith61591/pyramid), [pmdarima](https://github.com/tgsmith61591/pmdarima) - Wrapper for (Auto-) ARIMA.
 [modeltime](https://cran.r-project.org/web/packages/modeltime/index.html) - Time series forecasting framework (R package).
 [pyflux](https://github.com/RJT1990/pyflux) - Time series prediction algorithms (ARIMA, GARCH, GAS, Bayesian).

From 7f06d0ecf4f61cadf65d97451f8895932f4b8cb3 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 16 Jan 2022 21:13:07 +0100
Subject: [PATCH 270/550] Riskfolio-Lib, Detic, Example notebooks for interactive visualizations

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 6568158..1f248b2 100644
--- a/README.md
+++ b/README.md
@@ -271,6 +271,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M

 #### Visualization
 [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization).
+[Example notebooks for interactive visualizations](https://github.com/nicolaskruchten/pydata_global_2021/tree/main)(Plotly,Seaborn, Holoviz, Altair)
 [cufflinks](https://github.com/santosjorge/cufflinks) - Dynamic visualization library, wrapper for [plotly](https://plot.ly/), [medium](https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e), [example](https://github.com/WillKoehrsen/Data-Analysis/blob/master/plotly/Plotly%20Whirlwind%20Introduction.ipynb).
 [physt](https://github.com/janpipek/physt) - Better histograms, [talk](https://www.youtube.com/watch?v=ZG-wH3-Up9Y), [notebook](https://nbviewer.jupyter.org/github/janpipek/pydata2018-berlin/blob/master/notebooks/talk.ipynb).
 [fast-histogram](https://github.com/astrofrog/fast-histogram) - Fast histograms.
@@ -500,6 +501,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [CenterNet](https://github.com/xingyizhou/CenterNet) - Object detection.
 [FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection.
 [norfair](https://github.com/tryolabs/norfair) - Real-time 2D object tracking.
+[Detic](https://github.com/facebookresearch/Detic) - Detector with image classes that can use image-level labels (facebookresearch).

 ##### Image Annotation
 [cvat](https://github.com/openvinotoolkit/cvat) - Image annotation tool.
@@ -713,6 +715,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html
 [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies.
 [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google.
 [quantstats](https://github.com/ranaroussi/quantstats) - Portfolio management.
+[Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib) - Portfolio optimization and strategic asset allocation.

 ##### Quantopian Stack
 [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics.

From af36e344a2e87d9a4221dfc934b852892a69e47d Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 17 Jan 2022 00:23:03 +0100
Subject: [PATCH 271/550] bamboolib

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1f248b2..22e81d3 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [nbcommands](https://github.com/vinayak-mehta/nbcommands) - View and search notebooks from terminal.
 [handcalcs](https://github.com/connorferster/handcalcs) - More convenient way of writing mathematical equations in Jupyter.
 [notebooker](https://github.com/man-group/notebooker) - Productionize and schedule Jupyter Notebooks.
+[bamboolib](https://github.com/tkrabel/bamboolib) - Intuitive GUI for tables.

 #### Pandas Tricks, Alternatives and Additions
 [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431)

From 2568f3ebc97ce9de8fe5cc10e13302c3e54139d6 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 19 Jan 2022 10:51:06 +0100
Subject: [PATCH 272/550] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 22e81d3..d6c03e3 100644
--- a/README.md
+++ b/README.md
@@ -491,6 +491,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch.

 ##### Architecture Visualization
+[Awesome List](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network)
 [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks.

 ##### Object detection / Instance Segmentation
@@ -973,6 +974,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations)
 [Awesome Metric Learning](https://github.com/kdhht2334/Survey_of_Deep_Metric_Learning)
 [Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers)
+[Awesome Neural Network Visualization)(https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network)
 [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning)
 [Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline)
 [Awesome Public APIs](https://github.com/public-apis/public-apis)

From 47d036d22a9fe09a772256ab4313eed04fc81f9b Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 19 Jan 2022 11:27:31 +0100
Subject: [PATCH 273/550] ffcv

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index d6c03e3..c47bcf3 100644
--- a/README.md
+++ b/README.md
@@ -472,6 +472,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics.
 [imgclsmob](https://github.com/osmr/imgclsmob) - Pretrained models.
 [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models.
+[ffcv](https://github.com/libffcv/ffcv) - Fast dataloder.

 ##### Libs Pytorch
 [Good Pytorch Introduction](https://cs230.stanford.edu/blog/pytorch/)

From 03c245f02250977d0632c0a7011224a91e029f61 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 19 Jan 2022 23:06:58 +0100
Subject: [PATCH 274/550] microsoft/interpret -> interpretml/interpret

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index c47bcf3..ac3f081 100644
--- a/README.md
+++ b/README.md
@@ -642,10 +642,6 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach
 * Average largest within-cluster gap
 * Variation of clusterings on bootstrapped data

-#### Interpretable Classifiers and Regressors
-[skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules.
-[sklearn-expertsys](https://github.com/tmadl/sklearn-expertsys) - Interpretable classifiers, Bayesian Rule List classifier.
-
 #### Multi-label classification
 [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn) - Multi-label classification, [talk](https://www.youtube.com/watch?v=m-tAASQA7XQ&t=18m57s).

@@ -800,6 +796,10 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
 #### Model Uncertainty
 [uncertainty-toolbox](https://github.com/uncertainty-toolbox/uncertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization.

+#### Interpretable Classifiers and Regressors
+[skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules.
+[sklearn-expertsys](https://github.com/tmadl/sklearn-expertsys) - Interpretable classifiers, Bayesian Rule List classifier.
+
 #### Model Explanation, Interpretability, Feature Importance
 [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python)
 [shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Good Shap intro](https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/).
@@ -822,7 +822,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
 [xai](https://github.com/EthicalML/XAI) - An eXplainability toolbox for machine learning.
 [innvestigate](https://github.com/albermax/innvestigate) - A toolbox to investigate neural network predictions.
 [dalex](https://github.com/pbiecek/DALEX) - Explanations for ML models (R package).
-[interpret](https://github.com/microsoft/interpret) - Fit interpretable models, explain models (Microsoft).
+[interpretml](https://github.com/interpretml/interpret) - Fit interpretable models, explain models.

 #### Automated Machine Learning
 [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow.

From ac631fb8b414c5569fe5b38d225730bf9ca4c06e Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 24 Jan 2022 00:53:13 +0100
Subject: [PATCH 275/550] cell segmentation algorithms and TDC

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index ac3f081..b1592cb 100644
--- a/README.md
+++ b/README.md
@@ -404,6 +404,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
 [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial).

 ##### Image-related
+[Overview over cell segmentation algorithms](https://biomag-lab.github.io/microscopy-tree/)
 [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks.
 [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation.
 [mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb).
@@ -415,6 +416,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www
 [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy.

 ###### Drug discovery
+[TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development.
 [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modeling and Prediction Toolkit.

 #### Image Processing

From 3476dc9c83ffcfdd2426204cd262cd39e450c05f Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 24 Jan 2022 13:41:03 +0100
Subject: [PATCH 276/550] voila-gridstack

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index b1592cb..ebb892a 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,8 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [handcalcs](https://github.com/connorferster/handcalcs) - More convenient way of writing mathematical equations in Jupyter.
 [notebooker](https://github.com/man-group/notebooker) - Productionize and schedule Jupyter Notebooks.
 [bamboolib](https://github.com/tkrabel/bamboolib) - Intuitive GUI for tables.
+[voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications.
+[voila-gridstack](https://github.com/voila-dashboards/voila-gridstack) - Voila grid layout.

 #### Pandas Tricks, Alternatives and Additions
 [Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431)
@@ -312,6 +314,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 [panel](https://panel.pyviz.org/index.html) - Dashboarding solution.
 [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs).
 [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications.
+[voila-gridstack](https://github.com/voila-dashboards/voila-gridstack) - Voila grid layout.

 #### UI
 [gradio](https://github.com/gradio-app/gradio) - Create UIs for your machine learning model.

From fcc1cf768993a7eeb6b98210cd25438a7cfe722d Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 24 Jan 2022 19:50:05 +0100
Subject: [PATCH 277/550] Introduction to Probability for Data Science

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ebb892a..319d401 100644
--- a/README.md
+++ b/README.md
@@ -951,8 +951,9 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 ##### Guidelines
 [datasharing](https://github.com/jtleek/datasharing) - Guide to data sharing.

-##### List of Books
+##### Books
 [Mat Kelceys list of cool machine learning books](http://matpalm.com/blog/cool_machine_learning_books/)
+[Introduction to Probability for Data Science](https://probability4datascience.com/index.html)

 ##### Other Awesome Lists
 [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning)

From 850eccb8d63b38d75fbb7f62efe54e202a45893b Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 25 Jan 2022 10:56:09 +0100
Subject: [PATCH 278/550] linearmodels

---
 README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 319d401..fe1381b 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb)
 [Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html)
 [statsmodels](https://www.statsmodels.org/stable/index.html) - Statistical tests.
+[linearmodels](https://github.com/bashtage/linearmodels) - Instrumental variable and panel data models.
 [pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [Pairwise correlation between columns of pandas DataFrame](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html)
 [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests.
 [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons.
@@ -952,8 +953,8 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin
 [datasharing](https://github.com/jtleek/datasharing) - Guide to data sharing.
##### Books -[Mat Kelceys list of cool machine learning books](http://matpalm.com/blog/cool_machine_learning_books/) -[Introduction to Probability for Data Science](https://probability4datascience.com/index.html) +[Chan - Introduction to Probability for Data Science](https://probability4datascience.com/index.html) +[Colonescu - Principles of Econometrics with R](https://bookdown.org/ccolonescu/RPoE4/) ##### Other Awesome Lists [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) @@ -977,6 +978,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Gradient Boosting](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) [Awesome Learning with Label Noise](https://github.com/subeeshvasu/Awesome-Learning-with-Label-Noise) [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning#python) +[Awesome Machine Learning Books](http://matpalm.com/blog/cool_machine_learning_books/) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) [Awesome Metric Learning](https://github.com/kdhht2334/Survey_of_Deep_Metric_Learning) From bd971f804c25958b25a53af95b7d03efc4f973a2 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 25 Jan 2022 10:57:28 +0100 Subject: [PATCH 279/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index fe1381b..45a4381 100644 --- a/README.md +++ b/README.md @@ -983,7 +983,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) [Awesome Metric Learning](https://github.com/kdhht2334/Survey_of_Deep_Metric_Learning) [Awesome Monte Carlo Tree 
Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) -[Awesome Neural Network Visualization)(https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network) +[Awesome Neural Network Visualization](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline) [Awesome Public APIs](https://github.com/public-apis/public-apis) From c62c26e287f1ade4f8558e54165068c1a1d1ae8d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 26 Jan 2022 20:41:19 +0100 Subject: [PATCH 280/550] causality --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 45a4381..3412f09 100644 --- a/README.md +++ b/README.md @@ -763,6 +763,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [causalml](https://github.com/uber/causalml) - Causal inference by Uber. [upliftml](https://github.com/bookingcom/upliftml) - Causal inference by Booking.com. [EconML](https://github.com/microsoft/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft. +[causality](https://github.com/akelleh/causality) - Causal analysis using observational datasets. 
#### Probabilistic Modeling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) From 4bcca9600f86df83d8dcd9b862b4e5182cdd689a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 1 Feb 2022 11:14:32 +0100 Subject: [PATCH 281/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3412f09..026333e 100644 --- a/README.md +++ b/README.md @@ -502,6 +502,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks. ##### Object detection / Instance Segmentation +[Good Yolo Explanation](https://jonathan-hui.medium.com/real-time-object-detection-with-yolo-yolov2-28b1b93e2088) [segmentation_models](https://github.com/qubvel/segmentation_models) - Segmentation models with pretrained backbones: Unet, FPN, Linknet, PSPNet. [yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. [EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. From 195b6b477a9da29b1c9fff1d456fc79dd22dbb34 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 1 Feb 2022 15:10:19 +0100 Subject: [PATCH 282/550] Python Causality Handbook --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 026333e..49d1be8 100644 --- a/README.md +++ b/README.md @@ -759,6 +759,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. 
#### Causal Inference +[Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). [causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). [causalml](https://github.com/uber/causalml) - Causal inference by Uber. From 1b687548d78d43ec4c49da74f47d02a3ce9617cc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 2 Feb 2022 15:51:01 +0100 Subject: [PATCH 283/550] shapash --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 49d1be8..38ed8db 100644 --- a/README.md +++ b/README.md @@ -832,6 +832,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [innvestigate](https://github.com/albermax/innvestigate) - A toolbox to investigate neural network predictions. [dalex](https://github.com/pbiecek/DALEX) - Explanations for ML models (R package). [interpretml](https://github.com/interpretml/interpret) - Fit interpretable models, explain models. +[shapash](https://github.com/MAIF/shapash) - Model interpretability. #### Automated Machine Learning [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow. From e77185d25bc47a833e2eeb737a97efdb2c6a7dc9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 3 Feb 2022 12:52:19 +0100 Subject: [PATCH 284/550] imodels --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 38ed8db..329b4e6 100644 --- a/README.md +++ b/README.md @@ -833,6 +833,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [dalex](https://github.com/pbiecek/DALEX) - Explanations for ML models (R package). 
[interpretml](https://github.com/interpretml/interpret) - Fit interpretable models, explain models. [shapash](https://github.com/MAIF/shapash) - Model interpretability. +[imodels](https://github.com/csinva/imodels) - Interpretable ML package. #### Automated Machine Learning [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow. From 517b67d90d070b703144480a0604f34bb742a33f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 4 Feb 2022 14:58:52 +0100 Subject: [PATCH 285/550] Statistical Rethinking --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 329b4e6..f9d5fe7 100644 --- a/README.md +++ b/README.md @@ -759,6 +759,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. #### Causal Inference +[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models. [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). [causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). 
From 59b3b10a8f8ff44d77cbe713462ddfab147c06eb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 4 Feb 2022 15:04:27 +0100 Subject: [PATCH 286/550] numpyro --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f9d5fe7..aec3a70 100644 --- a/README.md +++ b/README.md @@ -770,7 +770,8 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Probabilistic Modeling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) -[PyMC3](https://docs.pymc.io/) - Baysian modelling, [intro](https://docs.pymc.io/notebooks/getting_started) +[PyMC3](https://docs.pymc.io/) - Bayesian modelling, [intro](https://docs.pymc.io/notebooks/getting_started) +[numpyro](https://github.com/pyro-ppl/numpyro) - Probabilistic programming with numpy, built on [pyro](https://github.com/pyro-ppl/pyro). [pomegranate](https://github.com/jmschrei/pomegranate) - Probabilistic modelling, [talk](https://www.youtube.com/watch?v=dE5j6NW-Kzg). [pmlearn](https://github.com/pymc-learn/pymc-learn) - Probabilistic machine learning. [arviz](https://github.com/arviz-devs/arviz) - Exploratory analysis of Bayesian models. From c728d2f77a77bf12ed5378d2c7627509692ba204 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 4 Feb 2022 15:06:02 +0100 Subject: [PATCH 287/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index aec3a70..8630798 100644 --- a/README.md +++ b/README.md @@ -759,7 +759,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. 
#### Causal Inference -[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models. +[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python notebooks](https://github.com/pymc-devs/resources/tree/master/Rethinking_2). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). [causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). From d3875f6feec65381a97786aef2d6757e158a992e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 11 Feb 2022 19:23:05 +0100 Subject: [PATCH 288/550] Dendrogram tutorial and dendextend --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 8630798..9e608f7 100644 --- a/README.md +++ b/README.md @@ -609,6 +609,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach #### Clustering [Overview of clustering algorithms applied image data (= Deep Clustering)](https://deepnotes.io/deep-clustering). [Clustering with Deep Learning: Taxonomy and New Methods](https://arxiv.org/pdf/1801.07648.pdf). +[Hierarchical Cluster Analysis (R Tutorial)](https://uc-r.github.io/hc_clustering) - Dendrogram, Tanglegram [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) - Clustering algorithm, [talk](https://www.youtube.com/watch?v=dGsxd67IFiU), [blog](https://towardsdatascience.com/understanding-hdbscan-and-density-based-clustering-121dbee1320e). [pyclustering](https://github.com/annoviko/pyclustering) - All sorts of clustering algorithms. 
[FCPS](https://github.com/Mthrun/FCPS) - Fundamental Clustering Problems Suite (R package). @@ -622,6 +623,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [phenograph](https://github.com/dpeerlab/phenograph) - Clustering by community detection. [HypHC](https://github.com/HazyResearch/HypHC) - Hyperbolic Hierarchical Clustering. [BanditPAM](https://github.com/ThrunGroup/BanditPAM) - Improved k-Medoids Clustering. +[dendextend](https://github.com/talgalili/dendextend) - Comparing dendrograms (R package). ##### Clustering Evalutation [Wagner, Wagner - Comparing Clusterings - An Overview](https://publikationen.bibliothek.kit.edu/1000011477/812079) From aede8b697fd1c1f1a72d22fc9022ef1e46f3ea05 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 12 Feb 2022 01:29:25 +0100 Subject: [PATCH 289/550] Delete INTERESTING.md --- INTERESTING.md | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 INTERESTING.md diff --git a/INTERESTING.md b/INTERESTING.md deleted file mode 100644 index 5b4e302..0000000 --- a/INTERESTING.md +++ /dev/null @@ -1,6 +0,0 @@ -#### Paradoxes -Inspection paradox | [link](https://allendowney.blogspot.com/2015/08/the-inspection-paradox-is-everywhere.html), [link](https://jakevdp.github.io/blog/2018/09/13/waiting-time-paradox/) -Simpsons paradox | [link](https://en.wikipedia.org/wiki/Simpson%27s_paradox) -Berksons paradox | [link](https://en.wikipedia.org/wiki/Berkson%27s_paradox) -Base rate fallacy | [link](https://en.wikipedia.org/wiki/Base_rate_fallacy) -Sampling bias | [link](https://en.wikipedia.org/wiki/Sampling_bias) From 9b623d2ee72e179f2398e14b17a6409695506183 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 12 Feb 2022 23:37:10 +0100 Subject: [PATCH 290/550] CORAL --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9e608f7..6b8488f 100644 --- a/README.md +++ b/README.md @@ -427,6 +427,7 @@ Embeddings - 
[GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) [cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html). [scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. +[CORAL](https://github.com/VisionLearningGroup/CORAL) - Correlation Alignment for Domain Adaptation. #### Neural Networks [Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9) From 8de296bff4e324d022dc09ff162ed8b758cc48ca Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 13 Feb 2022 14:02:50 +0100 Subject: [PATCH 291/550] ConvNet Shape Calculator --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6b8488f..d3c5806 100644 --- a/README.md +++ b/README.md @@ -430,6 +430,8 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [CORAL](https://github.com/VisionLearningGroup/CORAL) - Correlation Alignment for Domain Adaptation. #### Neural Networks +[Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. +[ConvNet Shape Calculator](https://madebyollin.github.io/convnet-calculator/) - Calculate output dimensions of Conv2D layer. 
[Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9) [Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html) @@ -1012,7 +1014,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Visual Transformer](https://github.com/dk-liang/Awesome-Visual-Transformer) #### Lectures -[Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. [NYU Deep Learning SP21](https://www.youtube.com/playlist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - Youtube Playlist. #### Things I google a lot From 6f924c6c5b797509a904701df4f8994e0a83c455 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 15 Feb 2022 19:31:42 +0100 Subject: [PATCH 292/550] ashlar --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d3c5806..2c3f5fb 100644 --- a/README.md +++ b/README.md @@ -194,6 +194,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. [BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. +[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/). [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. [aydin](https://github.com/royerlab/aydin) - Image denoising. 
From 2313fb83a663863774027acb39b0f7d77724e5f1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 15 Feb 2022 20:10:25 +0100 Subject: [PATCH 293/550] Update README.md --- README.md | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 2c3f5fb..98d51ed 100644 --- a/README.md +++ b/README.md @@ -192,14 +192,22 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. [napari](https://github.com/napari/napari) - Multi-dimensional image viewer. [fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. -[cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. -[BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. -[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. -[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection of fluorescence microscopy images, [Project page](https://csbdeep.bioimagecomputing.com/tools/). [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. [aydin](https://github.com/royerlab/aydin) - Image denoising. [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline. +#### Microscopy +[Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). +[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). 
+[MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. +[skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). +[cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. +[BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. +[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. +[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection, [Project page](https://csbdeep.bioimagecomputing.com/tools/). +[mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Paper](https://www.nature.com/articles/s41592-021-01308-y). +[UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. + #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. [mahotas](https://github.com/luispedro/mahotas) - Zernike, Haralick, LBP, and TAS features. @@ -403,17 +411,19 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Papers [Search Engine Correlation](https://arxiv.org/pdf/1107.2691.pdf) -#### Biology +#### Biology / Bioinformatics ##### Sequencing +[cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). ##### Image-related +See also Microscopy Section above. 
[Overview over cell segmentation algorithms](https://biomag-lab.github.io/microscopy-tree/) [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. -[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb). [imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis. +[scimap](https://github.com/labsyspharm/scimap) - Spatial Single-Cell Analysis Toolkit. [CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis. [imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf). [microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ). @@ -625,6 +635,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. [distribution_clustering](https://github.com/EricElmoznino/distribution_clustering), [paper](https://arxiv.org/abs/1804.02624), [related paper](https://arxiv.org/abs/2003.07770), [alt](https://github.com/r0f1/distribution_clustering). [phenograph](https://github.com/dpeerlab/phenograph) - Clustering by community detection. +[FastPG](https://github.com/sararselitsky/FastPG) - Clustering of single cell data (RNA). 
Improvement of phenograph, [Paper](https://www.researchgate.net/publication/342339899_FastPG_Fast_clustering_of_millions_of_single_cells). [HypHC](https://github.com/HazyResearch/HypHC) - Hyperbolic Hierarchical Clustering. [BanditPAM](https://github.com/ThrunGroup/BanditPAM) - Improved k-Medoids Clustering. [dendextend](https://github.com/talgalili/dendextend) - Comparing dendrograms (R package). From 65b8fb5f0c7715500915d63da052f6d7beddf214 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 16 Feb 2022 12:17:06 +0100 Subject: [PATCH 294/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 98d51ed..56ad683 100644 --- a/README.md +++ b/README.md @@ -230,6 +230,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. [SubTab](https://github.com/AstraZeneca/SubTab) - Subsetting Features of Tabular Data for Self-Supervised Representation Learning, AstraZeneca. +[mrmr](https://github.com/smazzanti/mrmr) - Maximum Relevance and Minimum Redundancy Feature Selection, [Website](http://home.penglab.com/proj/mRMR/). #### Subset Selection [apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly. 
From b2b519a8010ed47f1cb3219b0e2b05166cc4419e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 18 Feb 2022 12:26:50 +0100 Subject: [PATCH 295/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 56ad683..5a997a9 100644 --- a/README.md +++ b/README.md @@ -197,6 +197,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline. #### Microscopy +[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap for fluorescence microscopy dyes. [Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. From b559e3cd74a2b45fbb892dab0e2062939e024d56 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 18 Feb 2022 12:27:42 +0100 Subject: [PATCH 296/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5a997a9..4a8e254 100644 --- a/README.md +++ b/README.md @@ -197,7 +197,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline. #### Microscopy -[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap for fluorescence microscopy dyes. 
+[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. [Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. From 7af24660f7a8b5aec03e1381fc9049c081050b19 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 24 Feb 2022 14:59:01 +0100 Subject: [PATCH 297/550] Awesome Cytodata --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 4a8e254..f5ecd52 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,6 @@ # Awesome Data Science with Python > A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks. - #### Core [pandas](https://pandas.pydata.org/) - Data structures built on top of [numpy](https://www.numpy.org/). [scikit-learn](https://scikit-learn.org/stable/) - Core ML library. @@ -197,6 +196,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline. #### Microscopy +[Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. 
[Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). @@ -990,7 +990,8 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) [Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) [Awesome Community Detection](https://github.com/benedekrozemberczki/awesome-community-detection) -[Awesome CSV](https://github.com/secretGeek/AwesomeCSV) +[Awesome CSV](https://github.com/secretGeek/AwesomeCSV) +[Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) [Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) [Awesome Dash](https://github.com/ucg8j/awesome-dash) [Awesome Decision Trees](https://github.com/benedekrozemberczki/awesome-decision-tree-papers) From 612d764ffa8970069c5c1ff3924a8df4adcbf6f1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 28 Feb 2022 14:34:54 +0100 Subject: [PATCH 298/550] janggu --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index f5ecd52..45cda09 100644 --- a/README.md +++ b/README.md @@ -418,6 +418,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Sequencing [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). +[janggu](https://github.com/BIMSBbioinfo/janggu) - Deep Learning for Genomics. ##### Image-related See also Microscopy Section above. 
From 2873630e96a7c5e9c319f8203f8942828c827574 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 2 Mar 2022 16:19:18 +0100 Subject: [PATCH 299/550] tsfel --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 45cda09..2138246 100644 --- a/README.md +++ b/README.md @@ -700,6 +700,7 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor ), [link](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/timeseries), [link](https://github.com/hzy46/TensorFlow-Time-Series-Examples), [Explain LSTM](https://github.com/slundberg/shap/blob/master/notebooks/deep_explainer/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.ipynb), seq2seq: [1](https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/), [2](https://github.com/guillaume-chevalier/seq2seq-signal-prediction), [3](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Intro.ipynb), [4](https://github.com/LukeTonin/keras-seq-2-seq-signal-prediction) [tspreprocess](https://github.com/MaxBenChrist/tspreprocess) - Preprocessing: Denoising, Compression, Resampling. [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. +[tsfel](https://github.com/fraunhoferportugal/tsfel) - Time series feature extraction. [thunder](https://github.com/thunder-project/thunder) - Data structures and algorithms for loading, processing, and analyzing time series data. [gatspy](https://www.astroml.org/gatspy/) - General tools for Astronomical Time Series, [talk](https://www.youtube.com/watch?v=E4NMZyfao2c). [gendis](https://github.com/IBCNServices/GENDIS) - shapelets, [example](https://github.com/IBCNServices/GENDIS/blob/master/gendis/example.ipynb). 
From aaa030ee65bbe636669bafaf13dd57f3d8ec61e4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 3 Mar 2022 17:20:46 +0100 Subject: [PATCH 300/550] gdsctools --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2138246..e59784c 100644 --- a/README.md +++ b/README.md @@ -419,6 +419,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). [janggu](https://github.com/BIMSBbioinfo/janggu) - Deep Learning for Genomics. +[gdsctools](https://github.com/CancerRxGene/gdsctools) - Drug responses in the context of the Genomics of Drug Sensitivity in Cancer project, ANOVA, IC50, MoBEM, [doc](https://gdsctools.readthedocs.io/en/master/). ##### Image-related See also Microscopy Section above. From 70a4914c853c402e64bec14005682abea13c03eb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 4 Mar 2022 20:07:18 +0100 Subject: [PATCH 301/550] bnlearn --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e59784c..7f031d9 100644 --- a/README.md +++ b/README.md @@ -804,6 +804,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk1](https://www.youtube.com/watch?v=KJxmC5GCWe4), [notebook talk1](https://github.com/AlxndrMlk/PyDataGlobal2021/blob/main/00_PyData_Global_2021_nb_full.ipynb), [talk2](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb). 
[bambi](https://github.com/bambinos/bambi) - High-level Bayesian model-building interface on top of PyMC3. [neural-tangents](https://github.com/google/neural-tangents) - Infinite Neural Networks. +[bnlearn](https://github.com/erdogant/bnlearn) - Bayesian networks, parameter learning, inference and sampling methods. #### Gaussian Processes [Visualization](http://www.infinitecuriosity.org/vizgp/), [Article](https://distill.pub/2019/visual-exploration-gaussian-processes/) From 74d3b562a98dc4d69909461d489f0a87af28af3a Mon Sep 17 00:00:00 2001 From: Harshit Surana Date: Wed, 9 Mar 2022 16:04:15 +0530 Subject: [PATCH 302/550] Add Chaos Genius to Data Science Repo --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 7f031d9..293556f 100644 --- a/README.md +++ b/README.md @@ -724,6 +724,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [rocket](https://github.com/angus924/rocket) - Time Series classification using random convolutional kernels. [luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. [etna](https://github.com/tinkoff-ai/etna) - Time Series library. +[Chaos Genius](https://github.com/chaos-genius/chaos_genius) - ML powered analytics engine for outlier/anomaly detection and root cause analysis. ##### Time Series Evaluation [TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) - Sklearn time series split. 
From 28ec5701d51f9610d2c85e4d16cdec6f982c55c9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 9 Mar 2022 15:07:40 +0100 Subject: [PATCH 303/550] arfs --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 293556f..508c37d 100644 --- a/README.md +++ b/README.md @@ -232,6 +232,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [INVASE](https://github.com/jsyoon0823/INVASE) - Instance-wise Variable Selection using Neural Networks. [SubTab](https://github.com/AstraZeneca/SubTab) - Subsetting Features of Tabular Data for Self-Supervised Representation Learning, AstraZeneca. [mrmr](https://github.com/smazzanti/mrmr) - Maximum Relevance and Minimum Redundancy Feature Selection, [Website](http://home.penglab.com/proj/mRMR/). +[arfs](https://github.com/ThomasBury/arfs) - All Relevant Feature Selection. #### Subset Selection [apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly. From 3268f27100002759abb9aaeebdebb27dd5178c3c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 9 Mar 2022 15:20:12 +0100 Subject: [PATCH 304/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 508c37d..b28bc69 100644 --- a/README.md +++ b/README.md @@ -195,7 +195,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [aydin](https://github.com/royerlab/aydin) - Image denoising. [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline. -#### Microscopy +#### Microscopy / Segmentation [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. 
[Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). @@ -208,6 +208,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection, [Project page](https://csbdeep.bioimagecomputing.com/tools/). [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Paper](https://www.nature.com/articles/s41592-021-01308-y). [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. +[stardist](https://github.com/stardist/stardist) - Object Detection with Star-convex Shapes. #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. From 996c1a3e9d3355becb753699384e73e4913321af Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 9 Mar 2022 15:29:33 +0100 Subject: [PATCH 305/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b28bc69..420c37b 100644 --- a/README.md +++ b/README.md @@ -209,6 +209,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Paper](https://www.nature.com/articles/s41592-021-01308-y). [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. [stardist](https://github.com/stardist/stardist) - Object Detection with Star-convex Shapes. +[nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. 
From 3ed03b3bb1d06fb4454b8aaa3b1b6c79ae951d29 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 13 Mar 2022 13:43:06 +0100 Subject: [PATCH 306/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 420c37b..a4b0da4 100644 --- a/README.md +++ b/README.md @@ -135,6 +135,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Dealing with Selection Bias By Propensity Based Feature Selection](https://www.youtube.com/watch?reload=9&v=3ZWCKr0vDtc) ##### Texts +[Montgomery et al. - How conditioning on post-treatment variables can ruin your experiment and what to do about it](https://cpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/5/2293/files/2021/03/post-treatment-bias.pdf) [Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) [Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) @@ -787,6 +788,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Causal Inference [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python notebooks](https://github.com/pymc-devs/resources/tree/master/Rethinking_2). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) +[dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects. [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). 
[causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). [causalml](https://github.com/uber/causalml) - Causal inference by Uber. @@ -802,7 +804,6 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [pmlearn](https://github.com/pymc-learn/pymc-learn) - Probabilistic machine learning. [arviz](https://github.com/arviz-devs/arviz) - Exploratory analysis of Bayesian models. [zhusuan](https://github.com/thu-ml/zhusuan) - Bayesian deep learning, generative models. -[dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects. [edward](https://github.com/blei-lab/edward) - Probabilistic modeling, inference, and criticism, [Mixture Density Networks (MNDs)](http://edwardlib.org/tutorials/mixture-density-network), [MDN Explanation](https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca). [Pyro](https://github.com/pyro-ppl/pyro) - Deep Universal Probabilistic Programming. [tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk1](https://www.youtube.com/watch?v=KJxmC5GCWe4), [notebook talk1](https://github.com/AlxndrMlk/PyDataGlobal2021/blob/main/00_PyData_Global_2021_nb_full.ipynb), [talk2](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb). From 1765f8390c838401a9c41c9ee82179223e21ca42 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 14 Mar 2022 12:53:50 +0100 Subject: [PATCH 307/550] mit6874 --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index a4b0da4..025cb86 100644 --- a/README.md +++ b/README.md @@ -438,10 +438,13 @@ See also Microscopy Section above. 
[cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets. [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. -###### Drug discovery +##### Drug discovery [TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modeling and Prediction Toolkit. +##### Courses +[mit6874](https://mit6874.github.io/) - Computational Systems Biology: Deep Learning in the Life Sciences. + #### Image Processing [Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) [cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html). From d6bfa9b6badf87ba0a0145d5072dec44fe2c3541 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 16 Mar 2022 00:22:56 +0100 Subject: [PATCH 308/550] Effect Modification and Interaction --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 025cb86..10e1373 100644 --- a/README.md +++ b/README.md @@ -799,6 +799,9 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [EconML](https://github.com/microsoft/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft. [causality](https://github.com/akelleh/causality) - Causal analysis using observational datasets. 
+##### Papers +[Difference between Effect Modification and Interaction](https://www.sciencedirect.com/science/article/pii/S0895435621000330) + #### Probabilistic Modeling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) [PyMC3](https://docs.pymc.io/) - Bayesian modelling, [intro](https://docs.pymc.io/notebooks/getting_started) From 68a179cd0c28df785a53dd87540d70bb882a5273 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 16 Mar 2022 11:13:00 +0100 Subject: [PATCH 309/550] Confounding --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 10e1373..8f2ca0d 100644 --- a/README.md +++ b/README.md @@ -800,7 +800,8 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [causality](https://github.com/akelleh/causality) - Causal analysis using observational datasets. 
##### Papers -[Difference between Effect Modification and Interaction](https://www.sciencedirect.com/science/article/pii/S0895435621000330) +[Bours - Confounding](https://edisciplinas.usp.br/pluginfile.php/5625667/mod_resource/content/3/Nontechnicalexplanation-counterfactualdefinition-confounding.pdf) +[Bours - Effect Modification and Interaction](https://www.sciencedirect.com/science/article/pii/S0895435621000330) #### Probabilistic Modeling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) From bf81f7d642afcf3252d9acadd5856978eabdd4a5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 18 Mar 2022 20:13:56 +0100 Subject: [PATCH 310/550] tmap --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 8f2ca0d..2f05e7a 100644 --- a/README.md +++ b/README.md @@ -283,6 +283,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [DBS](https://cran.r-project.org/web/packages/DatabionicSwarm/vignettes/DatabionicSwarm.html) - DatabionicSwarm (R package). [contrastive](https://github.com/abidlabs/contrastive) - Contrastive PCA. [scPCA](https://github.com/PhilBoileau/scPCA) - Sparse contrastive PCA (R package). +[tmap](https://github.com/reymond-group/tmap) - Visualization library for large, high-dimensional data sets. #### Training-related [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. 
From eac5faa801411bec6e0f37653ae3c97e629e3e2c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 19 Mar 2022 19:39:06 +0100 Subject: [PATCH 311/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2f05e7a..bc33388 100644 --- a/README.md +++ b/README.md @@ -790,7 +790,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. #### Causal Inference -[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python notebooks](https://github.com/pymc-devs/resources/tree/master/Rethinking_2). +[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects. [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). 
From 6bd71c3216667d0fc171c3aff9f7a510656457b4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 19 Mar 2022 19:44:07 +0100 Subject: [PATCH 312/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bc33388..48e5f5b 100644 --- a/README.md +++ b/README.md @@ -790,7 +790,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. #### Causal Inference -[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). +[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects. [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). 
From 55d38f16ee81f57ee5a21cf588d519e4c1a6e735 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 19 Mar 2022 19:48:23 +0100 Subject: [PATCH 313/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 48e5f5b..96d4bce 100644 --- a/README.md +++ b/README.md @@ -790,7 +790,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. #### Causal Inference -[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). +[Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects. [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). 
From a2086554d0ca12f6b47e0393d4b156959dd8403f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 19 Mar 2022 23:20:08 +0100 Subject: [PATCH 314/550] atomai --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 96d4bce..fbb41f9 100644 --- a/README.md +++ b/README.md @@ -211,6 +211,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. [stardist](https://github.com/stardist/stardist) - Object Detection with Star-convex Shapes. [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. +[atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. + #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. From 6d84f76f376154adac4a620dd446304438561616 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 24 Mar 2022 11:17:20 +0100 Subject: [PATCH 315/550] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index fbb41f9..4867bc2 100644 --- a/README.md +++ b/README.md @@ -442,7 +442,7 @@ See also Microscopy Section above. [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. ##### Drug discovery -[TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. +[TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modeling and Prediction Toolkit. ##### Courses @@ -670,7 +670,8 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach * Minimum distance between any two clusters * Distance between centroids * p-separation index: Like minimum distance. 
Look at the average distance to nearest point in different cluster for p=10% "border" points in any cluster. Measuring density, measuring mountains vs valleys -* Estimate density by weighted count of close points Other measures +* Estimate density by weighted count of close points +Other measures: * Within-cluster average distance * Mean of within-cluster average distance over nearest-cluster average distance (silhouette score) * Within-cluster similarity measure to normal/uniform From a1ccb3720a66413c7d43957883fdeb9095f45127 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 24 Mar 2022 11:18:16 +0100 Subject: [PATCH 316/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 4867bc2..70a3e3f 100644 --- a/README.md +++ b/README.md @@ -671,6 +671,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach * Distance between centroids * p-separation index: Like minimum distance. Look at the average distance to nearest point in different cluster for p=10% "border" points in any cluster. Measuring density, measuring mountains vs valleys * Estimate density by weighted count of close points + Other measures: * Within-cluster average distance * Mean of within-cluster average distance over nearest-cluster average distance (silhouette score) From d3edca7b6d69d20a91d1238b62c691e769f83c1e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 25 Mar 2022 08:51:10 +0100 Subject: [PATCH 317/550] airflow, prefect, dagster, ploomber, kestra --- README.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/README.md b/README.md index 70a3e3f..d065954 100644 --- a/README.md +++ b/README.md @@ -940,6 +940,13 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe #### Deployment and Lifecycle Management +##### Workflow Scheduling and Orchestration +[airflow](https://github.com/apache/airflow) - Schedule and monitor workflows. 
+[prefect](https://github.com/PrefectHQ/prefect) - Python specific workflow scheduling. +[dagster](https://github.com/dagster-io/dagster) - Development, production and observation of data assets. +[ploomber](https://github.com/ploomber/ploomber) - Workflow orchestration. +[kestra](https://github.com/kestra-io/kestra) - Workflow orchestration. + ##### Docker [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) From 2f731b44ca18db2596d39d8414aa3eba251cfb04 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 25 Mar 2022 21:10:17 +0100 Subject: [PATCH 318/550] orthopy --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index d065954..56724de 100644 --- a/README.md +++ b/README.md @@ -608,6 +608,9 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [tweedie](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tweedie-regression-objective-reg-tweedie) - Specialized distribution for zero inflated targets, [Talk](https://www.youtube.com/watch?v=-o0lpHBq85I). [MAPIE](https://github.com/scikit-learn-contrib/MAPIE) - Estimating prediction intervals. +#### Polynomials +[orthopy](https://github.com/nschloe/orthopy) - Orthogonal polynomials in all shapes and sizes. 
+ #### Classification [Talk](https://www.youtube.com/watch?v=DkLPYccEJ8Y), [Notebook](https://github.com/ianozsvald/data_science_delivered/blob/master/ml_creating_correct_capable_classifiers.ipynb) [Blog post: Probability Scoring](https://machinelearningmastery.com/how-to-score-probability-predictions-in-python/) From 6ea2a9a9a0f4d0f219625dabc6c1822e4278c358 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 26 Mar 2022 09:56:14 +0100 Subject: [PATCH 319/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 56724de..396122f 100644 --- a/README.md +++ b/README.md @@ -952,6 +952,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe ##### Docker [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) +[Optimize Docker Image Size](https://www.augmentedmind.de/2022/02/06/optimize-docker-image-size/) ##### Dependency Management [dephell](https://github.com/dephell/dephell) - Dependency management. From b19a8ed55b8f55afc5661cb8db9b9bd8791a9c88 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 28 Mar 2022 18:34:58 +0200 Subject: [PATCH 320/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 396122f..be779d5 100644 --- a/README.md +++ b/README.md @@ -863,7 +863,6 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [eli5](https://github.com/TeamHG-Memex/eli5) - Inspecting machine learning classifiers and explaining their predictions. [lofo-importance](https://github.com/aerdem4/lofo-importance) - Leave One Feature Out Importance, [talk](https://www.youtube.com/watch?v=zqsQ2ojj7sE), examples: [1](https://www.kaggle.com/divrikwicky/pf-f-lofo-importance-on-adversarial-validation), [2](https://www.kaggle.com/divrikwicky/lofo-importance), [3](https://www.kaggle.com/divrikwicky/santanderctp-lofo-feature-importance). 
[pybreakdown](https://github.com/MI2DataLab/pyBreakDown) - Generate feature contribution plots. -[FairML](https://github.com/adebayoj/fairml) - Model explanation, feature importance. [pycebox](https://github.com/AustinRochford/PyCEbox) - Individual Conditional Expectation Plot Toolbox. [pdpbox](https://github.com/SauceCat/PDPbox) - Partial dependence plot toolbox, [example](https://www.kaggle.com/dansbecker/partial-plots). [partial_dependence](https://github.com/nyuvis/partial_dependence) - Visualize and cluster partial dependence. From aa2b3a5fcaf4b055f7aeeadb52d9ae1237b75880 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 30 Mar 2022 10:20:47 +0200 Subject: [PATCH 321/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index be779d5..0378b75 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ [scikit-learn](https://scikit-learn.org/stable/) - Core ML library. [matplotlib](https://matplotlib.org/) - Plotting library. [seaborn](https://seaborn.pydata.org/) - Data visualization library based on matplotlib. -[pandas_summary](https://github.com/mouradmourafiq/pandas-summary) - Basic statistics using `DataFrameSummary(df).summary()`. +[datatile](https://github.com/polyaxon/datatile) - Basic statistics using `DataFrameSummary(df).summary()`. [pandas_profiling](https://github.com/pandas-profiling/pandas-profiling) - Descriptive statistics using `ProfileReport`. [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Helpful `DataFrameMapper` class. [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. 
From f4efa19194419b76e9dac5f4923e948483d501dc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 30 Mar 2022 10:24:43 +0200 Subject: [PATCH 322/550] polyaxon --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0378b75..73bc637 100644 --- a/README.md +++ b/README.md @@ -978,6 +978,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [cortex](https://github.com/cortexlabs/cortex) - Deploy machine learning models. [Neptune](https://neptune.ai) - Experiment tracking and model registry. [clearml](https://github.com/allegroai/clearml) - Experiment Manager, MLOps and Data-Management. +[polyaxon](https://github.com/polyaxon/polyaxon) - MLOps. #### Math and Background [All kinds of math and statistics resources](https://realnotcomplex.com/) From 21bf9f46c160835cfbacac54b8694dccc3cf7948 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 1 Apr 2022 11:09:10 +0200 Subject: [PATCH 323/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 73bc637..92f40db 100644 --- a/README.md +++ b/README.md @@ -745,7 +745,7 @@ Turn time series into images and use Neural Nets: [example](https://gist.github. [TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) - Sklearn time series split. [tscv](https://github.com/WenjieZ/TSCV) - Evaluation with gap. -#### Financial Data +#### Financial Data and Trading Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/whatsnew.html) - Read stock data. [yfinance](https://github.com/ranaroussi/yfinance) - Read stock data from Yahoo Finance. 
@@ -760,6 +760,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. [quantstats](https://github.com/ranaroussi/quantstats) - Portfolio management. [Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib) - Portfolio optimization and strategic asset allocation. +[OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) - Terminal. ##### Quantopian Stack [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. From 94652e4593843194c5bafc49fec28490d14c58cb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 21 Apr 2022 14:31:31 +0200 Subject: [PATCH 324/550] cog --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 92f40db..36404ab 100644 --- a/README.md +++ b/README.md @@ -950,9 +950,10 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [ploomber](https://github.com/ploomber/ploomber) - Workflow orchestration. [kestra](https://github.com/kestra-io/kestra) - Workflow orchestration. -##### Docker +##### Containerization and Docker [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) [Optimize Docker Image Size](https://www.augmentedmind.de/2022/02/06/optimize-docker-image-size/) +[cog](https://github.com/replicate/cog) - Facilitates building Docker images. ##### Dependency Management [dephell](https://github.com/dephell/dephell) - Dependency management. 
From 1b95de59ba1514e05eb661c402527d457950c20e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 22 Apr 2022 09:34:40 +0200 Subject: [PATCH 325/550] DeepDPM --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 36404ab..5d25d71 100644 --- a/README.md +++ b/README.md @@ -656,6 +656,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [HypHC](https://github.com/HazyResearch/HypHC) - Hyperbolic Hierarchical Clustering. [BanditPAM](https://github.com/ThrunGroup/BanditPAM) - Improved k-Medoids Clustering. [dendextend](https://github.com/talgalili/dendextend) - Comparing dendrograms (R package). +[DeepDPM](https://github.com/BGU-CS-VIL/DeepDPM) - Deep Clustering With An Unknown Number of Clusters. ##### Clustering Evalutation [Wagner, Wagner - Comparing Clusterings - An Overview](https://publikationen.bibliothek.kit.edu/1000011477/812079) From 25249cecb7181f8990b0786d74fc9bbd40e94fcd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 27 Apr 2022 15:51:52 +0200 Subject: [PATCH 326/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 5d25d71..2f62b75 100644 --- a/README.md +++ b/README.md @@ -293,6 +293,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M #### Visualization [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization). +[Better heatmaps and correlation plots](https://towardsdatascience.com/better-heatmaps-and-correlation-matrix-plots-in-python-41445d0f2bec). 
[Example notebooks for interactive visualizations](https://github.com/nicolaskruchten/pydata_global_2021/tree/main)(Plotly,Seaborn, Holoviz, Altair) [cufflinks](https://github.com/santosjorge/cufflinks) - Dynamic visualization library, wrapper for [plotly](https://plot.ly/), [medium](https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e), [example](https://github.com/WillKoehrsen/Data-Analysis/blob/master/plotly/Plotly%20Whirlwind%20Introduction.ipynb). [physt](https://github.com/janpipek/physt) - Better histograms, [talk](https://www.youtube.com/watch?v=ZG-wH3-Up9Y), [notebook](https://nbviewer.jupyter.org/github/janpipek/pydata2018-berlin/blob/master/notebooks/talk.ipynb). From a1c4bea7580b0aadbd7269dc573c061ddc344640 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 13 May 2022 09:45:23 +0200 Subject: [PATCH 327/550] Boruta-Shap, VSURF --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 2f62b75..2e233d6 100644 --- a/README.md +++ b/README.md @@ -231,6 +231,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [scikit-rebate](https://github.com/EpistasisLab/scikit-rebate) - Relief-based feature selection algorithms. [scikit-genetic](https://github.com/manuel-calzolari/sklearn-genetic) - Genetic feature selection. [boruta_py](https://github.com/scikit-learn-contrib/boruta_py) - Feature selection, [explaination](https://stats.stackexchange.com/questions/264360/boruta-all-relevant-feature-selection-vs-random-forest-variables-of-importanc/264467), [example](https://www.kaggle.com/tilii7/boruta-feature-elimination). +[Boruta-Shap](https://github.com/Ekeany/Boruta-Shap) - Boruta feature selection algorithm + shapley values. [linselect](https://github.com/efavdb/linselect) - Feature selection package. [mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/) - Exhaustive feature selection. 
[BoostARoota](https://github.com/chasedehan/BoostARoota) - Xgboost feature selection algorithm. @@ -238,6 +239,8 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection [SubTab](https://github.com/AstraZeneca/SubTab) - Subsetting Features of Tabular Data for Self-Supervised Representation Learning, AstraZeneca. [mrmr](https://github.com/smazzanti/mrmr) - Maximum Relevance and Minimum Redundancy Feature Selection, [Website](http://home.penglab.com/proj/mRMR/). [arfs](https://github.com/ThomasBury/arfs) - All Relevant Feature Selection. +[VSURF](https://github.com/robingenuer/VSURF) - Variable Selection Using Random Forests (R package) [doc](https://www.rdocumentation.org/packages/VSURF/versions/1.1.0/topics/VSURF). + #### Subset Selection [apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly. From b571bb4576fcdb195bfd407dfdb512ef7963716c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 23 May 2022 10:12:45 +0200 Subject: [PATCH 328/550] auton-survival --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2e233d6..c7da142 100644 --- a/README.md +++ b/README.md @@ -784,6 +784,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). [pysurvival](https://github.com/square/pysurvival) - Survival analysis. [DeepSurvivalMachines](https://github.com/autonlab/DeepSurvivalMachines) - Fully Parametric Survival Regression. +[auton-survival](https://github.com/autonlab/auton-survival) - Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events. #### Outlier Detection & Anomaly Detection [sklearn](https://scikit-learn.org/stable/modules/outlier_detection.html) - Isolation Forest and others. 
From 574c9347db62182ddd708cdb8e8d287619bbbcc4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 25 May 2022 13:24:53 +0200 Subject: [PATCH 329/550] visualkeras --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c7da142..09c82d3 100644 --- a/README.md +++ b/README.md @@ -532,6 +532,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Architecture Visualization [Awesome List](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network) [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks. +[visualkeras](https://github.com/paulgavrikov/visualkeras) - Visualize Keras networks. ##### Object detection / Instance Segmentation [Good Yolo Explanation](https://jonathan-hui.medium.com/real-time-object-detection-with-yolo-yolov2-28b1b93e2088) From 575dac3daeb828c89fc5cca0c37d704e4b47d9da Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Jun 2022 11:13:12 +0200 Subject: [PATCH 330/550] mplfinance, mercury --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 09c82d3..8c32add 100644 --- a/README.md +++ b/README.md @@ -332,6 +332,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M #### Dashboards [superset](https://github.com/apache/superset) - Dashboarding solution by Apache. [streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). +[mercury](https://github.com/mljar/mercury) - Convert Python notebook to web app, [Example](https://github.com/pplonski/dashboard-python-jupyter-notebook). [dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. 
[Resources](https://github.com/ucg8j/awesome-dash). [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. [panel](https://panel.pyviz.org/index.html) - Dashboarding solution. @@ -767,6 +768,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [quantstats](https://github.com/ranaroussi/quantstats) - Portfolio management. [Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib) - Portfolio optimization and strategic asset allocation. [OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) - Terminal. +[mplfinance](https://github.com/matplotlib/mplfinance) - Financial markets data visualization. ##### Quantopian Stack [pyfolio](https://github.com/quantopian/pyfolio) - Portfolio and risk analytics. @@ -1082,5 +1084,4 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin Do you know a package that should be on this list? Did you spot a package that is no longer maintained and should be removed from this list? Then feel free to read the [contribution guidelines](CONTRIBUTING.md) and submit your pull request or create a new issue. ## License - [![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/) From 9a09dd1d07cb8ad390ac5473d4684f271690c260 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Jun 2022 11:45:30 +0200 Subject: [PATCH 331/550] pinecone --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 8c32add..6961e9f 100644 --- a/README.md +++ b/README.md @@ -970,11 +970,12 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [pyup](https://github.com/pyupio/pyup) - Dependency management. [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. 
-##### Data Versioning and Pipelines +##### Data Versioning, Databases and Pipelines [dvc](https://github.com/iterative/dvc) - Version control for large files. [hangar](https://github.com/tensorwerk/hangar-py) - Version control for tensor data. [kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines. [feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo). +[pinecone](https://www.pinecone.io/) - Database for vector search applications. ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. From 94a39f4a484dcc57d399f45dd58ed86b04c8b189 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 17 Jul 2022 07:51:53 +0200 Subject: [PATCH 332/550] linearsdr --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 6961e9f..51c102a 100644 --- a/README.md +++ b/README.md @@ -289,6 +289,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [contrastive](https://github.com/abidlabs/contrastive) - Contrastive PCA. [scPCA](https://github.com/PhilBoileau/scPCA) - Sparse contrastive PCA (R package). [tmap](https://github.com/reymond-group/tmap) - Visualization library for large, high-dimensional data sets. +[linearsdr](https://github.com/HarrisQ/linearsdr) - Linear Sufficient Dimension Reduction (R package). #### Training-related [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data. 
From b44793bd78b8df97a45b1773073fd7d74e053b44 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 17 Jul 2022 07:54:35 +0200 Subject: [PATCH 333/550] lollipop --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 51c102a..4dbb1a7 100644 --- a/README.md +++ b/README.md @@ -289,6 +289,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [contrastive](https://github.com/abidlabs/contrastive) - Contrastive PCA. [scPCA](https://github.com/PhilBoileau/scPCA) - Sparse contrastive PCA (R package). [tmap](https://github.com/reymond-group/tmap) - Visualization library for large, high-dimensional data sets. +[lollipop](https://github.com/neurodata/lollipop) - Linear Optimal Low Rank Projection (R package). [linearsdr](https://github.com/HarrisQ/linearsdr) - Linear Sufficient Dimension Reduction (R package). #### Training-related From 300c97b9bcc69bfa322d6426015c71ab88bf9bb6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 17 Jul 2022 07:55:01 +0200 Subject: [PATCH 334/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4dbb1a7..1218754 100644 --- a/README.md +++ b/README.md @@ -289,7 +289,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [contrastive](https://github.com/abidlabs/contrastive) - Contrastive PCA. [scPCA](https://github.com/PhilBoileau/scPCA) - Sparse contrastive PCA (R package). [tmap](https://github.com/reymond-group/tmap) - Visualization library for large, high-dimensional data sets. -[lollipop](https://github.com/neurodata/lollipop) - Linear Optimal Low Rank Projection (R package). +[lollipop](https://github.com/neurodata/lollipop) - Linear Optimal Low Rank Projection. [linearsdr](https://github.com/HarrisQ/linearsdr) - Linear Sufficient Dimension Reduction (R package). 
#### Training-related From 277c0c7a2694e9cb59a1bd6ba0fa035af97f0e1b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 17 Jul 2022 09:36:01 +0200 Subject: [PATCH 335/550] phik correlation --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 1218754..24daac3 100644 --- a/README.md +++ b/README.md @@ -97,6 +97,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 #### Classical Statistics +##### Correlation +[phik](https://github.com/kaveio/phik) - Correlation between categorical, ordinal and interval variables. + ##### Statistical Tests and Packages [Modes, Medians and Means: A Unifying Perspective](https://www.johnmyleswhite.com/notebook/2013/03/22/modes-medians-and-means-an-unifying-perspective/) [Using Norms to Understand Linear Regression](https://www.johnmyleswhite.com/notebook/2013/03/22/using-norms-to-understand-linear-regression/) From 2c3c0f2f9520ad399d654bf1d91b53c255369be3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Jul 2022 10:51:53 +0200 Subject: [PATCH 336/550] cross_decomposition --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 24daac3..14506c9 100644 --- a/README.md +++ b/README.md @@ -277,6 +277,7 @@ SimCLR - [link](https://github.com/lightly-ai/lightly) [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. [sklearn.random_projection](https://scikit-learn.org/stable/modules/random_projection.html) - Johnson-Lindenstrauss lemma, Gaussian random projection, Sparse random projection. 
+[sklearn.cross_decomposition](https://scikit-learn.org/stable/modules/cross_decomposition.html#cross-decomposition) - Partial least squares, supervised estimators for dimensionality reduction and regression. [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE) [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html). From 7c418e7e2c26996d62f1d4afb6ad23a0ffde4ef4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Jul 2022 11:30:39 +0200 Subject: [PATCH 337/550] harmonypy --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 14506c9..ff59fc2 100644 --- a/README.md +++ b/README.md @@ -216,6 +216,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. +#### Batch-Effect Correction +(A benchmark of batch-effect correction methods for single-cell RNA sequencing data)[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9] +[harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. 
From dac03b9523920ce2072e9190423a0b036e59a380 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Jul 2022 11:31:09 +0200 Subject: [PATCH 338/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ff59fc2..310f7bd 100644 --- a/README.md +++ b/README.md @@ -217,7 +217,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. #### Batch-Effect Correction -(A benchmark of batch-effect correction methods for single-cell RNA sequencing data)[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9] +[A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9) [harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. #### Feature Engineering Images From 3cd7bce376e833ca7a70fecf4ba3a622c75d4c64 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Jul 2022 11:52:30 +0200 Subject: [PATCH 339/550] R Tutorial on correcting batch effects --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 310f7bd..2a6d596 100644 --- a/README.md +++ b/README.md @@ -218,6 +218,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Batch-Effect Correction [A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9) +[R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html) [harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. 
#### Feature Engineering Images From 25ba20d5f985e84fc9353dbe34187d228408a0ca Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Jul 2022 15:08:36 +0200 Subject: [PATCH 340/550] besca, liger, nimfa, scgen --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 2a6d596..ad5ed7d 100644 --- a/README.md +++ b/README.md @@ -220,6 +220,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9) [R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html) [harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. +[pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [Example](https://github.com/welch-lab/pyliger/blob/master/pyliger/factorization/_iNMF_ANLS.py#L65), [R package](https://github.com/welch-lab/liger). +[nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. +[scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/). #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. @@ -440,6 +443,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Sequencing [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). +[besca](https://github.com/bedapub/besca) - Beyond single-cell analysis. [janggu](https://github.com/BIMSBbioinfo/janggu) - Deep Learning for Genomics. 
[gdsctools](https://github.com/CancerRxGene/gdsctools) - Drug responses in the context of the Genomics of Drug Sensitivity in Cancer project, ANOVA, IC50, MoBEM, [doc](https://gdsctools.readthedocs.io/en/master/). From 28f316faeaf6f637401b491eec3dd5f3d2cc71d8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Jul 2022 15:18:03 +0200 Subject: [PATCH 341/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ad5ed7d..e4035aa 100644 --- a/README.md +++ b/README.md @@ -217,8 +217,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. #### Batch-Effect Correction -[A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9) -[R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html) +[Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). +[R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html). [harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. [pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [Example](https://github.com/welch-lab/pyliger/blob/master/pyliger/factorization/_iNMF_ANLS.py#L65), [R package](https://github.com/welch-lab/liger). [nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. 
From e8bb36d1a48fd6e8ddf085b0ea2c269837d17e36 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 22 Jul 2022 09:34:35 +0200 Subject: [PATCH 342/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e4035aa..58525b3 100644 --- a/README.md +++ b/README.md @@ -823,7 +823,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Causal Inference [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) -[dowhy](https://github.com/Microsoft/dowhy) - Estimate causal effects. +[dowhy](https://github.com/py-why/dowhy) - Estimate causal effects. [CausalImpact](https://github.com/tcassou/causal_impact) - Causal Impact Analysis ([R package](https://google.github.io/CausalImpact/CausalImpact.html)). [causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). [causalml](https://github.com/uber/causalml) - Causal inference by Uber. 
From 991f7274b4561afc359418344f8b7d57b1e0c560 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 1 Aug 2022 18:11:58 +0200 Subject: [PATCH 343/550] truss --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 58525b3..20c2788 100644 --- a/README.md +++ b/README.md @@ -984,12 +984,13 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [pyup](https://github.com/pyupio/pyup) - Dependency management. [pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. -##### Data Versioning, Databases and Pipelines +##### Data Versioning, Databases, Pipelines and Model Serving [dvc](https://github.com/iterative/dvc) - Version control for large files. [hangar](https://github.com/tensorwerk/hangar-py) - Version control for tensor data. [kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines. [feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo). [pinecone](https://www.pinecone.io/) - Database for vector search applications. +[truss](https://github.com/basetenlabs/truss) - Serve ML models. ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. From 507da1aba8eb716a6ccba77a2934443b5c340bc4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 4 Aug 2022 15:14:23 +0200 Subject: [PATCH 344/550] horovod --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 20c2788..b195888 100644 --- a/README.md +++ b/README.md @@ -543,6 +543,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Distributed Libs [flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch. 
+[horovod](https://github.com/horovod/horovod) - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ##### Architecture Visualization [Awesome List](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network) From eb74e2e298c10ba0cf27faf2cba60ccd055ce3b7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 5 Aug 2022 08:57:00 +0200 Subject: [PATCH 345/550] proplot --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b195888..d0d83c8 100644 --- a/README.md +++ b/README.md @@ -337,6 +337,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [mpl-scatter-density](https://github.com/astrofrog/mpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms. [ComplexHeatmap](https://github.com/jokergoo/ComplexHeatmap) - Complex heatmaps for multidimensional genomic data (R package). [largeVis](https://github.com/elbamos/largeVis) - Visualize embeddings (t-SNE etc.) (R package). +[proplot](https://github.com/proplot-dev/proplot) - Matplotlib wrapper. #### Colors [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3). From 2459f3987a941b5b3d789673485f4d4aed1a2744 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 5 Aug 2022 08:57:42 +0200 Subject: [PATCH 346/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d0d83c8..c0f880a 100644 --- a/README.md +++ b/README.md @@ -320,7 +320,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [scikit-plot](https://github.com/reiinakano/scikit-plot) - ROC curves and other visualizations for ML models. [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visualizations for ML models (similar to scikit-plot). 
[bokeh](https://bokeh.pydata.org/en/latest/) - Interactive visualization library, [Examples](https://bokeh.pydata.org/en/latest/docs/user_guide/server.html), [Examples](https://github.com/WillKoehrsen/Bokeh-Python-Visualization). -[lets-plot](https://github.com/JetBrains/lets-plot/blob/master/README_PYTHON.md) - Plotting library. +[lets-plot](https://github.com/JetBrains/lets-plot) - Plotting library. [animatplot](https://github.com/t-makaro/animatplot) - Animate plots build on matplotlib. [plotnine](https://github.com/has2k1/plotnine) - ggplot for Python. [altair](https://altair-viz.github.io/) - Declarative statistical visualization library. From 24746cc21b723f5e88d146647534f5fdf54ca8fc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 7 Aug 2022 21:13:23 +0200 Subject: [PATCH 347/550] pca --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c0f880a..7af98e1 100644 --- a/README.md +++ b/README.md @@ -283,6 +283,7 @@ SimCLR - [link](https://github.com/lightly-ai/lightly) ##### Packages [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. +Additional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Correlation Circle Plot](http://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/), [Tweet](https://twitter.com/rasbt/status/1555999903398219777/photo/1) [sklearn.random_projection](https://scikit-learn.org/stable/modules/random_projection.html) - Johnson-Lindenstrauss lemma, Gaussian random projection, Sparse random projection. 
[sklearn.cross_decomposition](https://scikit-learn.org/stable/modules/cross_decomposition.html#cross-decomposition) - Partial least squares, supervised estimators for dimensionality reduction and regression. [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD). From 16eb0716c9226f2371cbdfbbace8175f38ea12b3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 9 Aug 2022 13:21:00 +0200 Subject: [PATCH 348/550] captum --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 7af98e1..3e05e11 100644 --- a/README.md +++ b/README.md @@ -905,6 +905,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [interpretml](https://github.com/interpretml/interpret) - Fit interpretable models, explain models. [shapash](https://github.com/MAIF/shapash) - Model interpretability. [imodels](https://github.com/csinva/imodels) - Interpretable ML package. +[captum](https://github.com/pytorch/captum) - Model interpretability and understanding for PyTorch. #### Automated Machine Learning [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow. From b82ff4e546b175fe5d09870c5e9d84fff01a5b08 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Aug 2022 18:49:44 +0200 Subject: [PATCH 349/550] sematic --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3e05e11..c120d98 100644 --- a/README.md +++ b/README.md @@ -1010,6 +1010,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [Neptune](https://neptune.ai) - Experiment tracking and model registry. [clearml](https://github.com/allegroai/clearml) - Experiment Manager, MLOps and Data-Management. [polyaxon](https://github.com/polyaxon/polyaxon) - MLOps. +[sematic](https://github.com/sematic-ai/sematic) - Deploy machine learning models. 
#### Math and Background [All kinds of math and statistics resources](https://realnotcomplex.com/) From 9f57a018703d1d66914887f5beef2dc1da6c7ef3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Aug 2022 21:21:17 +0200 Subject: [PATCH 350/550] milvus --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c120d98..f5d82ed 100644 --- a/README.md +++ b/README.md @@ -995,6 +995,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo). [pinecone](https://www.pinecone.io/) - Database for vector search applications. [truss](https://github.com/basetenlabs/truss) - Serve ML models. +[milvus](https://github.com/milvus-io/milvus) - Vector database for similarity search. ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. From cc8450daf682780d4023cf29b11510126c42ec66 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Aug 2022 23:25:48 +0200 Subject: [PATCH 351/550] Biostatistics --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f5d82ed..b371a81 100644 --- a/README.md +++ b/README.md @@ -207,7 +207,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. [skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. -[BaSiC](https://github.com/marrlab/BaSiC) - Background and Shading Correction of Optical Microscopy Images. 
+[BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). [ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection, [Project page](https://csbdeep.bioimagecomputing.com/tools/). [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Paper](https://www.nature.com/articles/s41592-021-01308-y). @@ -442,6 +442,11 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Biology / Bioinformatics +##### Biostatistics / Robust statistics +[MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, Robust Morphological Perturbation Value, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [Application1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [Application2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). +[winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. +[moderated z-score](https://clue.io/connectopedia/replicate_collapse) - Weighted average of z-scores based on Spearman correlation. + ##### Sequencing [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). 
From 968880100b791df29d5e8c61c4e8978528e24c79 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Aug 2022 23:30:27 +0200 Subject: [PATCH 352/550] nextflow --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b371a81..1b7d9d0 100644 --- a/README.md +++ b/README.md @@ -84,7 +84,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia. #### Distributed Systems -[nextflow](https://github.com/nextflow-io/nextflow) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch and others. +[nextflow](https://github.com/goodwright/nextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https://github.com/nextflow-io/nextflow). [dsub](https://github.com/DataBiosphere/dsub) - Run batch computing tasks in Docker image in the Google Cloud. #### Command line tools, CSV From 5b8c2c4ca73a3368845d614b1648b8ecd86d676f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Aug 2022 23:34:24 +0200 Subject: [PATCH 353/550] evotorch --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 1b7d9d0..c4e3c8d 100644 --- a/README.md +++ b/README.md @@ -941,6 +941,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [blackbox](https://github.com/paulknysh/blackbox) - Optimization of expensive black-box functions. Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645-7). [DeepSwarm](https://github.com/Pattio/DeepSwarm) - Neural architecture search. +[evotorch](https://github.com/nnaisense/evotorch) - Evolutionary computation library built on Pytorch. 
#### Hyperparameter Tuning [sklearn](https://scikit-learn.org/stable/index.html) - [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html), [RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html). From 003a031e55e494cb5d661d1f63920966b4990b2f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 Aug 2022 23:51:26 +0200 Subject: [PATCH 354/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c4e3c8d..a6a3ffd 100644 --- a/README.md +++ b/README.md @@ -443,7 +443,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Biology / Bioinformatics ##### Biostatistics / Robust statistics -[MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, Robust Morphological Perturbation Value, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [Application1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [Application2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). +[MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [App1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). [winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. 
[moderated z-score](https://clue.io/connectopedia/replicate_collapse) - Weighted average of z-scores based on Spearman correlation. From fb5ab981524bc9988c4ccbf9a48e25792c1a0cb0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 Aug 2022 19:07:03 +0200 Subject: [PATCH 355/550] CytoImageNet --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a6a3ffd..3d3fdd7 100644 --- a/README.md +++ b/README.md @@ -204,6 +204,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. [Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). +[CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images. [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. [skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. 
From a0891b38fd3cb304246cbc8d8d38c12f27b5f6c8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 Aug 2022 20:11:45 +0200 Subject: [PATCH 356/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3d3fdd7..5e5669c 100644 --- a/README.md +++ b/README.md @@ -224,6 +224,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [Example](https://github.com/welch-lab/pyliger/blob/master/pyliger/factorization/_iNMF_ANLS.py#L65), [R package](https://github.com/welch-lab/liger). [nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. [scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/). +[CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/). #### Feature Engineering Images [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. From 6c29a0c385b67e388cde2139414f8456f9f52bd5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 Aug 2022 20:12:43 +0200 Subject: [PATCH 357/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 5e5669c..aae7eb8 100644 --- a/README.md +++ b/README.md @@ -480,7 +480,6 @@ See also Microscopy Section above. 
[Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) [cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html). [scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. -[CORAL](https://github.com/VisionLearningGroup/CORAL) - Correlation Alignment for Domain Adaptation. #### Neural Networks [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. From 936692b2bff8f584d6b56a467511cf984d56851b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 22 Aug 2022 10:47:50 +0200 Subject: [PATCH 358/550] Single cell tutorial --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index aae7eb8..101b551 100644 --- a/README.md +++ b/README.md @@ -450,6 +450,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [moderated z-score](https://clue.io/connectopedia/replicate_collapse) - Weighted average of z-scores based on Spearman correlation. ##### Sequencing +[Single cell tutorial](https://github.com/theislab/single-cell-tutorial). [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). [besca](https://github.com/bedapub/besca) - Beyond single-cell analysis. 
From 090c81cc30049fa351a26b577e95816a3d54d617 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mert=20Bozk=C4=B1r?= Date: Mon, 29 Aug 2022 13:58:02 +0300 Subject: [PATCH 359/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 101b551..783de74 100644 --- a/README.md +++ b/README.md @@ -984,6 +984,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [dagster](https://github.com/dagster-io/dagster) - Development, production and observation of data assets. [ploomber](https://github.com/ploomber/ploomber) - Workflow orchestration. [kestra](https://github.com/kestra-io/kestra) - Workflow orchestration. +[cml](https://github.com/iterative/cml) - CI/CD for Machine Learning Projects ##### Containerization and Docker [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) @@ -1003,7 +1004,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo). [pinecone](https://www.pinecone.io/) - Database for vector search applications. [truss](https://github.com/basetenlabs/truss) - Serve ML models. -[milvus](https://github.com/milvus-io/milvus) - Vector database for similarity search. +[milvus](https://github.com/milvus-io/milvus) - Vector database for similarity search. ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. 
From defab90d046d116e7d16c6a36fef55e9fc1d8816 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mert=20Bozk=C4=B1r?= Date: Mon, 29 Aug 2022 13:59:27 +0300 Subject: [PATCH 360/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 101b551..50dc861 100644 --- a/README.md +++ b/README.md @@ -1004,6 +1004,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [pinecone](https://www.pinecone.io/) - Database for vector search applications. [truss](https://github.com/basetenlabs/truss) - Serve ML models. [milvus](https://github.com/milvus-io/milvus) - Vector database for similarity search. +[mlem](https://github.com/iterative/mlem) - Version and deploy your ML models following GitOps principles ##### Data Science Related [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. From 5a02137f1fe037fd4ee64ed09fc7f00fc02bec06 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 30 Aug 2022 19:29:39 +0200 Subject: [PATCH 361/550] Regressio --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 49c4642..4ee8631 100644 --- a/README.md +++ b/README.md @@ -637,6 +637,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [GLRM](https://github.com/madeleineudell/LowRankModels.jl) - Generalized Low Rank Models. [tweedie](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tweedie-regression-objective-reg-tweedie) - Specialized distribution for zero inflated targets, [Talk](https://www.youtube.com/watch?v=-o0lpHBq85I). [MAPIE](https://github.com/scikit-learn-contrib/MAPIE) - Estimating prediction intervals. +[Regressio](https://github.com/brendanartley/Regressio) - Regression and Spline models. #### Polynomials [orthopy](https://github.com/nschloe/orthopy) - Orthogonal polynomials in all shapes and sizes. 
From 102cb8763e305bf28cb4053047edaeef55bb4ad1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 31 Aug 2022 14:50:34 +0200 Subject: [PATCH 362/550] EasyCV --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 4ee8631..8b75e12 100644 --- a/README.md +++ b/README.md @@ -570,6 +570,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [FCOS](https://github.com/tianzhi0549/FCOS) - Fully Convolutional One-Stage Object Detection. [norfair](https://github.com/tryolabs/norfair) - Real-time 2D object tracking. [Detic](https://github.com/facebookresearch/Detic) - Detector with image classes that can use image-level labels (facebookresearch). +[EasyCV](https://github.com/alibaba/EasyCV) - Image segmentation, classification, metric-learning, object detection, pose estimation. ##### Image Annotation [cvat](https://github.com/openvinotoolkit/cvat) - Image annotation tool. From 6d75f7f284619d32cab588bba50c4c051c0066b9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 1 Sep 2022 19:34:27 +0200 Subject: [PATCH 363/550] DL for tabular data, concept drift section --- README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.md b/README.md index 8b75e12..9561393 100644 --- a/README.md +++ b/README.md @@ -605,6 +605,10 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po ##### Transformers [SegFormer](https://github.com/NVlabs/SegFormer) - Simple and Efficient Design for Semantic Segmentation with Transformers. [esvit](https://github.com/microsoft/esvit) - Efficient self-supervised Vision Transformers. +[nystromformer](https://github.com/Rishit-dagli/Nystromformer) - More efficient transformer because of approximate self-attention. 
+ +##### Deep learning on structured data +[Great overview for deep learning for tabular data](https://sebastianraschka.com/blog/2022/deep-learning-for-tabular-data.html) ##### Graph-Based Neural Networks [How to do Deep Learning on Graphs with Graph Convolutional Networks](https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780) @@ -826,6 +830,13 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [telemanom](https://github.com/khundman/telemanom) - Detect anomalies in multivariate time series data using LSTMs. [luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. +#### Concept Drift & Domain Shift +[TorchDrift](https://github.com/TorchDrift/TorchDrift) - Drift Detection for PyTorch Models. +[alibi-detect](https://github.com/SeldonIO/alibi-detect) - Algorithms for outlier, adversarial and drift detection. +[evidently](https://github.com/evidentlyai/evidently) - Evaluate and monitor ML models from validation to production. +[Lipton et al. - Detecting and Correcting for Label Shift with Black Box Predictors](https://arxiv.org/abs/1802.03916). +[Bu et al. - A pdf-Free Change Detection Test Based on Density Difference Estimation](https://ieeexplore.ieee.org/document/7745962). + #### Ranking [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear classification, regression and ranking. From d29894653eb24595b2567dbb49608c45e9a85776 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 2 Sep 2022 17:46:08 +0200 Subject: [PATCH 364/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9561393..a7ce0aa 100644 --- a/README.md +++ b/README.md @@ -844,6 +844,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. 
#### Causal Inference +[CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [dowhy](https://github.com/py-why/dowhy) - Estimate causal effects. From a1aa4189ce58161bea9fd57701e2676cb8fabaa5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 5 Sep 2022 14:41:52 +0200 Subject: [PATCH 365/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a7ce0aa..028b026 100644 --- a/README.md +++ b/README.md @@ -844,7 +844,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. 
#### Causal Inference -[CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) +[CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [dowhy](https://github.com/py-why/dowhy) - Estimate causal effects. From e425c06c80d8356c80d7df986e12c2c7cd7ce168 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 6 Sep 2022 17:05:07 +0200 Subject: [PATCH 366/550] G-Test --- README.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 028b026..d63b7f4 100644 --- a/README.md +++ b/README.md @@ -100,11 +100,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 ##### Correlation [phik](https://github.com/kaveio/phik) - Correlation between categorical, ordinal and interval variables. 
-##### Statistical Tests and Packages -[Modes, Medians and Means: A Unifying Perspective](https://www.johnmyleswhite.com/notebook/2013/03/22/modes-medians-and-means-an-unifying-perspective/) -[Using Norms to Understand Linear Regression](https://www.johnmyleswhite.com/notebook/2013/03/22/using-norms-to-understand-linear-regression/) -[Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb) -[Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html) +##### Packages [statsmodels](https://www.statsmodels.org/stable/index.html) - Statistical tests. [linearmodels](https://github.com/bashtage/linearmodels) - Instrumental variable and panel data models. [pingouin](https://github.com/raphaelvallat/pingouin) - Statistical tests. [Pairwise correlation between columns of pandas DataFrame](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html) @@ -113,6 +109,9 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). +##### Statistical Tests +[G-Test](https://en.wikipedia.org/wiki/G-test) - Alternative to chi-square test, [power_divergence](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.power_divergence.html). 
+ ##### Comparing Two Populations [torch-two-sample](https://github.com/josipd/torch-two-sample) - Friedman-Rafsky Test: Compare two population based on a multivariate generalization of the Runstest. [Explanation](https://www.real-statistics.com/multivariate-statistics/multivariate-normal-distribution/friedman-rafsky-test/), [Application](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5014134/) @@ -138,6 +137,10 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Dealing with Selection Bias By Propensity Based Feature Selection](https://www.youtube.com/watch?reload=9&v=3ZWCKr0vDtc) ##### Texts +[Modes, Medians and Means: A Unifying Perspective](https://www.johnmyleswhite.com/notebook/2013/03/22/modes-medians-and-means-an-unifying-perspective/) +[Using Norms to Understand Linear Regression](https://www.johnmyleswhite.com/notebook/2013/03/22/using-norms-to-understand-linear-regression/) +[Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb) +[Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html) [Montgomery et al. 
- How conditioning on post-treatment variables can ruin your experiment and what to do about it](https://cpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/5/2293/files/2021/03/post-treatment-bias.pdf) [Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) From 83ecd909416a03448dc082ebf4466c17339a7265 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 6 Sep 2022 22:29:40 +0200 Subject: [PATCH 367/550] lightning-hpo --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d63b7f4..e5fe4b6 100644 --- a/README.md +++ b/README.md @@ -975,6 +975,7 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645 [dragonfly](https://github.com/dragonfly/dragonfly) - Scalable Bayesian optimisation. [botorch](https://github.com/pytorch/botorch) - Bayesian optimization in PyTorch. [ax](https://github.com/facebook/Ax) - Adaptive Experimentation Platform by Facebook. +[lightning-hpo](https://github.com/Lightning-AI/lightning-hpo) - Hyperparameter optimization based on optuna. #### Incremental Learning, Online Learning sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html). 
From b7250305ced7358cd4cb052bcd8d130502e8c8f7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 7 Sep 2022 09:59:24 +0200 Subject: [PATCH 368/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e5fe4b6..75d47c0 100644 --- a/README.md +++ b/README.md @@ -110,6 +110,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). ##### Statistical Tests +[test_proportions_2indep](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.test_proportions_2indep.html) - Proportion test. [G-Test](https://en.wikipedia.org/wiki/G-test) - Alternative to chi-square test, [power_divergence](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.power_divergence.html). ##### Comparing Two Populations From 733154a7cabe8f604e2c0c409b674acd28875ba0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 7 Sep 2022 14:49:00 +0200 Subject: [PATCH 369/550] zenml --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 75d47c0..4d6b457 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,7 @@ # Awesome Data Science with Python > A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks. + #### Core [pandas](https://pandas.pydata.org/) - Data structures built on top of [numpy](https://www.numpy.org/). [scikit-learn](https://scikit-learn.org/stable/) - Core ML library. 
@@ -1003,7 +1004,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [dagster](https://github.com/dagster-io/dagster) - Development, production and observation of data assets. [ploomber](https://github.com/ploomber/ploomber) - Workflow orchestration. [kestra](https://github.com/kestra-io/kestra) - Workflow orchestration. -[cml](https://github.com/iterative/cml) - CI/CD for Machine Learning Projects +[cml](https://github.com/iterative/cml) - CI/CD for Machine Learning Projects. ##### Containerization and Docker [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) @@ -1041,6 +1042,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [clearml](https://github.com/allegroai/clearml) - Experiment Manager, MLOps and Data-Management. [polyaxon](https://github.com/polyaxon/polyaxon) - MLOps. [sematic](https://github.com/sematic-ai/sematic) - Deploy machine learning models. +[zenml](https://github.com/zenml-io/zenml) - MLOPs. #### Math and Background [All kinds of math and statistics resources](https://realnotcomplex.com/) From fdfb2514d8f6605f7e52b81992fb51c30b9bac4e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 7 Sep 2022 22:18:36 +0200 Subject: [PATCH 370/550] tipr --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 4d6b457..6a48946 100644 --- a/README.md +++ b/README.md @@ -159,6 +159,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [EpiEstim](https://github.com/mrc-ide/EpiEstim) - Estimate time varying instantaneous reproduction number R during epidemics (R package) [paper](https://academic.oup.com/aje/article/178/9/1505/89262). [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). 
 [zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists).
+[tipr](https://github.com/LucyMcGowan/tipr) - Sensitivity analyses for unmeasured confounders (R package).

 #### Exploration and Cleaning
 [Checklist](https://github.com/r0f1/ml_checklist).

From 30ee24df50c9ca95f9cb23390435cd4990b3e07b Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 8 Sep 2022 17:19:18 +0200
Subject: [PATCH 371/550] pandera

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 6a48946..55100e2 100644
--- a/README.md
+++ b/README.md
@@ -165,6 +165,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [Checklist](https://github.com/r0f1/ml_checklist).
 [pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames.
 [janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names.
+[pandera](https://github.com/unionai-oss/pandera) - Data / Schema validation.
 [impyute](https://github.com/eltonlaw/impyute) - Imputations.
 [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms.
 [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Resampling for imbalanced datasets.

From 83bc8dadc09cc48a206a5196b23761c6aad1a721 Mon Sep 17 00:00:00 2001
From: Theo Walker
Date: Thu, 8 Sep 2022 13:48:34 -0400
Subject: [PATCH 372/550] add ducks whitespace

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 55100e2..590789d 100644
--- a/README.md
+++ b/README.md
@@ -263,6 +263,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection

 #### Subset Selection
 [apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly.
+[ducks](https://github.com/manimino/ducks) - Index data for fast lookup by any combination of fields.

 #### Dimensionality Reduction / Representation Learning

From 481f48520cc3220cacf23a614b23c617e8d2cd47 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 9 Sep 2022 16:21:51 +0200
Subject: [PATCH 373/550] benchmark_VAE

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 590789d..43df51e 100644
--- a/README.md
+++ b/README.md
@@ -600,6 +600,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po
 [Variational Autoencoder Explanation Video](https://www.youtube.com/watch?v=9zKuYvjFFS8)
 [disentanglement_lib](https://github.com/google-research/disentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE.
 [ladder-vae-pytorch](https://github.com/addtt/ladder-vae-pytorch) - Ladder Variational Autoencoders (LVAE).
+[benchmark_VAE](https://github.com/clementchadebec/benchmark_VAE) - Unifying Generative Autoencoder implementations.

 ##### Generative Adversarial Networks (GANs)
 [Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications)

From 2a843b77e6dc5cf48d973cd11b94d32f970c8faf Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sat, 10 Sep 2022 00:29:42 +0200
Subject: [PATCH 374/550] Intro to Computer Vision

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 43df51e..e67b137 100644
--- a/README.md
+++ b/README.md
@@ -198,6 +198,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc.
 [NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia.

+#### Computer Vision
+[Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p)
+
 #### Image Cleanup
 [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package.
 [napari](https://github.com/napari/napari) - Multi-dimensional image viewer.

From 4adef6dc82f56a88e8f4230585b40991e2ddd2c2 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 15 Sep 2022 10:28:17 +0200
Subject: [PATCH 375/550] allencell

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e67b137..3011631 100644
--- a/README.md
+++ b/README.md
@@ -226,6 +226,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [stardist](https://github.com/stardist/stardist) - Object Detection with Star-convex Shapes.
 [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation.
 [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy.
+[allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures.

 #### Batch-Effect Correction
 [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking).

From e27dcb5a595dc2bb7b93f4eec8c89de92b8c4e22 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 20 Sep 2022 15:29:00 +0200
Subject: [PATCH 376/550] duckdb

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 3011631..ecb191f 100644
--- a/README.md
+++ b/README.md
@@ -49,6 +49,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter.
 [dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter.
 [polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas.
+[duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame.

 #### Scikit-Learn Alternatives
 [scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex) - Intel extension for scikit-learn for speed.

From 7f40a4f09c3e7521ce970cef65dc1b2fe6d70887 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 22 Sep 2022 13:44:14 +0200
Subject: [PATCH 377/550] FeatureSelectionGA, adapt

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ecb191f..ce2fcb4 100644
--- a/README.md
+++ b/README.md
@@ -229,7 +229,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy.
 [allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures.

-#### Batch-Effect Correction
+#### Domain Adaptation / Batch-Effect Correction
 [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking).
 [R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html).
 [harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments.
@@ -237,6 +237,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization.
 [scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/).
 [CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/).
+[adapt](https://github.com/adapt-python/adapt) - Aweseome Domain Adaptation Python Toolbox.

 #### Feature Engineering Images
 [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent.
@@ -264,7 +265,7 @@ Tutorials - [1](https://www.kaggle.com/residentmario/automated-feature-selection
 [mrmr](https://github.com/smazzanti/mrmr) - Maximum Relevance and Minimum Redundancy Feature Selection, [Website](http://home.penglab.com/proj/mRMR/).
 [arfs](https://github.com/ThomasBury/arfs) - All Relevant Feature Selection.
 [VSURF](https://github.com/robingenuer/VSURF) - Variable Selection Using Random Forests (R package) [doc](https://www.rdocumentation.org/packages/VSURF/versions/1.1.0/topics/VSURF).
-
+[FeatureSelectionGA](https://github.com/kaushalshetty/FeatureSelectionGA) - Feature Selection using Genetic Algorithm.

 #### Subset Selection
 [apricot](https://github.com/jmschrei/apricot) - Selecting subsets of data sets to train machine learning models quickly.

From ac9d9500237e1d50c42de5bf868aa5219d3b526d Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 22 Sep 2022 13:46:30 +0200
Subject: [PATCH 378/550] awesome-conformal-prediction

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index ce2fcb4..e85c829 100644
--- a/README.md
+++ b/README.md
@@ -911,6 +911,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
 [pyroc](https://github.com/noudald/pyroc) - Receiver Operating Characteristic (ROC) curves.

 #### Model Uncertainty
+[awesome-conformal-prediction](https://github.com/valeman/awesome-conformal-prediction) - Uncertainty quantification.
 [uncertainty-toolbox](https://github.com/uncertainty-toolbox/uncertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization.

 #### Interpretable Classifiers and Regressors

From 2ee6d3d1d96eeb0b6a865023bed7205e5a5d8d63 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 23 Sep 2022 10:40:11 +0200
Subject: [PATCH 379/550] tensorstore

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e85c829..b1cc7a6 100644
--- a/README.md
+++ b/README.md
@@ -84,6 +84,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1
 [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber.
 [zarr](https://github.com/zarr-developers/zarr-python) - Distributed numpy arrays.
 [NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia.
+[tensorstore](https://github.com/google/tensorstore) - Reading and writing large multi-dimensional arrays (Google).

 #### Distributed Systems
 [nextflow](https://github.com/goodwright/nextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https://github.com/nextflow-io/nextflow).

From 042bea3ab205ac65a7864cb0db970d748d2423e5 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 23 Sep 2022 15:07:52 +0200
Subject: [PATCH 380/550] Hierarchical UMAP.

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index b1cc7a6..24f85e9 100644
--- a/README.md
+++ b/README.md
@@ -306,6 +306,7 @@ Additional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Corr
 [prince](https://github.com/MaxHalford/prince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD).
 Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [MulticoreTSNE](https://github.com/DmitryUlyanov/Multicore-TSNE), [FIt-SNE](https://github.com/KlugerLab/FIt-SNE)
 [umap](https://github.com/lmcinnes/umap) - Uniform Manifold Approximation and Projection, [talk](https://www.youtube.com/watch?v=nq6iPZVUxZU), [explorer](https://github.com/GrantCuster/umap-explorer), [explanation](https://pair-code.github.io/understanding-umap/), [parallel version](https://docs.rapids.ai/api/cuml/stable/api.html).
+[humap](https://github.com/wilsonjr/humap) - Hierarchical UMAP.
 [sleepwalk](https://github.com/anders-biostat/sleepwalk/) - Explore embeddings, interactive visualization (R package).
 [somoclu](https://github.com/peterwittek/somoclu) - Self-organizing map.
 [scikit-tda](https://github.com/scikit-tda/scikit-tda) - Topological Data Analysis, [paper](https://www.nature.com/articles/srep01236), [talk](https://www.youtube.com/watch?v=F2t_ytTLrQ4), [talk](https://www.youtube.com/watch?v=AWoeBzJd7uQ), [paper](https://www.uncg.edu/mat/faculty/cdsmyth/topological-approaches-skin.pdf).

From d99626294f1b777f8af4f1e41c0f5891b22f9fc8 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 25 Sep 2022 12:20:35 +0200
Subject: [PATCH 381/550] rocketry

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 24f85e9..ffd0d24 100644
--- a/README.md
+++ b/README.md
@@ -1018,6 +1018,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe
 [ploomber](https://github.com/ploomber/ploomber) - Workflow orchestration.
 [kestra](https://github.com/kestra-io/kestra) - Workflow orchestration.
 [cml](https://github.com/iterative/cml) - CI/CD for Machine Learning Projects.
+[rocketry](https://github.com/Miksus/rocketry) - Task scheduling.

 ##### Containerization and Docker
 [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A)

From fc9a329e72ba1f42bddaee46ce547d3e745c5a72 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 26 Sep 2022 13:05:22 +0200
Subject: [PATCH 382/550] Reproducibility Crisis

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index ffd0d24..68e724e 100644
--- a/README.md
+++ b/README.md
@@ -921,6 +921,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin
 [sklearn-expertsys](https://github.com/tmadl/sklearn-expertsys) - Interpretable classifiers, Bayesian Rule List classifier.

 #### Model Explanation, Interpretability, Feature Importance
+[Princeton - Reproducibility Crisis in ML‑based Science](https://sites.google.com/princeton.edu/rep-workshop)
 [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python)
 [shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Good Shap intro](https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/).
 [treeinterpreter](https://github.com/andosa/treeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions.

From 76938b2eb83202a7b8beece92ade4d25aa5e66dc Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 26 Sep 2022 13:27:51 +0200
Subject: [PATCH 383/550] Double ML, Visualization overview

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 68e724e..794bcd5 100644
--- a/README.md
+++ b/README.md
@@ -126,6 +126,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values.

 ##### Visualizations
+[Great Overview over Visualizations](https://textvis.lnu.se/)
 [Dependent Propabilities](https://static.laszlokorte.de/stochastic/)
 [Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https://rpsychologist.com/d3/NHST/)
 [Correlation](https://rpsychologist.com/d3/correlation/)
@@ -871,6 +872,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y
 [upliftml](https://github.com/bookingcom/upliftml) - Causal inference by Booking.com.
 [EconML](https://github.com/microsoft/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft.
 [causality](https://github.com/akelleh/causality) - Causal analysis using observational datasets.
+[DoubleML](https://github.com/DoubleML/doubleml-for-py) - Machine Learning + Causal inference, [Tweet](https://twitter.com/ChristophMolnar/status/1574338002305880068), [Presentation](https://scholar.princeton.edu/sites/default/files/bstewart/files/felton.chern_.slides.20190318.pdf), [Paper](https://arxiv.org/abs/1608.00060v1).

 ##### Papers
 [Bours - Confounding](https://edisciplinas.usp.br/pluginfile.php/5625667/mod_resource/content/3/Nontechnicalexplanation-counterfactualdefinition-confounding.pdf)

From 9c209042f9cef462316ee78bec835a7a89e3382c Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 26 Sep 2022 17:20:57 +0200
Subject: [PATCH 384/550] PCA

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 794bcd5..360853a 100644
--- a/README.md
+++ b/README.md
@@ -299,6 +299,7 @@ SimCLR - [link](https://github.com/lightly-ai/lightly)
 [MCML](https://github.com/pachterlab/MCML) - Semi-supervised dimensionality reduction of Multi-Class, Multi-Label data (sequencing data) [paper](https://www.biorxiv.org/content/10.1101/2021.08.25.457696v1).

 ##### Packages
+[Dangers of PCA (paper)](https://www.nature.com/articles/s41598-022-14395-4).
 [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/).
 [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others.
 Additional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Correlation Circle Plot](http://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/), [Tweet](https://twitter.com/rasbt/status/1555999903398219777/photo/1)

From 5688aa20b7bc15a770a4794fd7278739ab3a8d37 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 26 Oct 2022 16:48:56 +0200
Subject: [PATCH 385/550] torchinfo

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 360853a..8e8691f 100644
--- a/README.md
+++ b/README.md
@@ -567,6 +567,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR.
 [MONAI](https://github.com/project-monai/monai) - Deep learning in healthcare imaging.
 [kornia](https://github.com/kornia/kornia) - Image transformations, epipolar geometry, depth estimation.
+[torchinfo](https://github.com/TylerYep/torchinfo) - Nice model summary.

 ##### Distributed Libs
 [flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch.

From a85e0869c8de3f435f8b2eb09b5daf99abc6a629 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 26 Oct 2022 23:20:30 +0200
Subject: [PATCH 386/550] pytorch-adapt

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 8e8691f..66e0738 100644
--- a/README.md
+++ b/README.md
@@ -240,6 +240,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/).
 [CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/).
 [adapt](https://github.com/adapt-python/adapt) - Aweseome Domain Adaptation Python Toolbox.
+[pytorch-adapt](https://github.com/KevinMusgrave/pytorch-adapt) - Various neural network models for domain adaptation.

 #### Feature Engineering Images
 [skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent.

From 46621da3dc4c74eeed44b1679e066a8a62986edb Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 16 Nov 2022 11:21:58 +0100
Subject: [PATCH 387/550] jump-cellpainting

---
 README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 66e0738..dfc9017 100644
--- a/README.md
+++ b/README.md
@@ -213,12 +213,18 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline.

 #### Microscopy / Segmentation
+
+##### Datasets
+[jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset.
+[MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.
+[CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images.
+[cellpose dataset](https://www.cellpose.org/dataset) - Cell images.
+
+#### Packages
 [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata)
 [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes.
 [Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518).
 [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset).
-[CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images.
-[MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.
 [skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE).
 [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy.
 [BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC).

From 2bc3fe8ffb651268b9eae41e0c40ac14d8789439 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 16 Nov 2022 11:41:10 +0100
Subject: [PATCH 388/550] Haghighi

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index dfc9017..e2046dd 100644
--- a/README.md
+++ b/README.md
@@ -219,6 +219,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.
 [CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images.
 [cellpose dataset](https://www.cellpose.org/dataset) - Cell images.
+[Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles.

 #### Packages
 [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata)

From 2dd51f27d6c265dd9df3cc9f768281ef8540b44f Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 16 Nov 2022 13:03:42 +0100
Subject: [PATCH 389/550] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e2046dd..8735636 100644
--- a/README.md
+++ b/README.md
@@ -220,6 +220,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images.
 [cellpose dataset](https://www.cellpose.org/dataset) - Cell images.
 [Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles.
+[broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay.

 #### Packages
 [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata)

From ee171e879f828d6a290368255262db1b257f87d4 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 16 Nov 2022 16:01:02 +0100
Subject: [PATCH 390/550] PHATE, morpheus

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 8735636..2c702f1 100644
--- a/README.md
+++ b/README.md
@@ -332,6 +332,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 [tmap](https://github.com/reymond-group/tmap) - Visualization library for large, high-dimensional data sets.
 [lollipop](https://github.com/neurodata/lollipop) - Linear Optimal Low Rank Projection.
 [linearsdr](https://github.com/HarrisQ/linearsdr) - Linear Sufficient Dimension Reduction (R package).
+[PHATE](https://github.com/KrishnaswamyLab/PHATE) - Tool for visualizing high dimensional data.

 #### Training-related
 [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data.
@@ -368,6 +369,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 [ComplexHeatmap](https://github.com/jokergoo/ComplexHeatmap) - Complex heatmaps for multidimensional genomic data (R package).
 [largeVis](https://github.com/elbamos/largeVis) - Visualize embeddings (t-SNE etc.) (R package).
 [proplot](https://github.com/proplot-dev/proplot) - Matplotlib wrapper.
+[morpheus](https://software.broadinstitute.org/morpheus/) - Broad Institute tool matrix visualization and analysis software.

 #### Colors
 [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3).

From f4e3d53da40d7aeb83b4859cd3d5c27789982528 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 17 Nov 2022 15:56:32 +0100
Subject: [PATCH 391/550] Visual fourier explanation

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 2c702f1..ab29169 100644
--- a/README.md
+++ b/README.md
@@ -758,6 +758,7 @@ Other measures:

 #### Signal Processing and Filtering
 [Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf).
+[Visual fourier explanation](https://dsego.github.io/demystifying-fourier/).
 [The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html).
 [Kalman Filter article](https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures).
 [Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Baysian and various Kalman filters.

From 96a5b8b6c635fee7453e63c157f1ce85afe797d5 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 17 Nov 2022 16:23:19 +0100
Subject: [PATCH 392/550] OpenMMLab

---
 README.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index ab29169..01b2395 100644
--- a/README.md
+++ b/README.md
@@ -514,8 +514,8 @@ See also Microscopy Section above.
 #### Neural Networks
 [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class.
 [ConvNet Shape Calculator](https://madebyollin.github.io/convnet-calculator/) - Calculate output dimensions of Conv2D layer.
-[Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9)
-[Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html)
+[Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9).
+[Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html).

 ##### Tutorials & Viewer
 fast.ai course - [Lessons 1-7](https://course.fast.ai/videos/?lesson=1), [Lessons 8-14](http://course18.fast.ai/lessons/lessons2.html)
@@ -545,6 +545,11 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [textgenrnn](https://github.com/minimaxir/textgenrnn) - Ready-to-use LSTM for text generation.
 [ctrl](https://github.com/salesforce/ctrl) - Text generation.

+##### Neural network and deep learning frameworks
+[OpenMMLab](https://github.com/open-mmlab) - Framework for segmentation, classification and lots of other computer vision tasks.
+[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo).
+[mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html).
+
 ##### Libs General
 [keras](https://keras.io/) - Neural Networks on top of [tensorflow](https://www.tensorflow.org/), [examples](https://gist.github.com/candlewill/552fa102352ccce42fd829ae26277d24).
 [keras-contrib](https://github.com/keras-team/keras-contrib) - Keras community contributions.
@@ -566,8 +571,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [ffcv](https://github.com/libffcv/ffcv) - Fast dataloder.

 ##### Libs Pytorch
-[Good Pytorch Introduction](https://cs230.stanford.edu/blog/pytorch/)
-
+[Good Pytorch Introduction](https://cs230.stanford.edu/blog/pytorch/)
 [skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk).
 [fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch.
 [timm](https://github.com/rwightman/pytorch-image-models) - Pytorch image models.
@@ -585,7 +589,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [horovod](https://github.com/horovod/horovod) - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

 ##### Architecture Visualization
-[Awesome List](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network)
+[Awesome List](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network).
 [netron](https://github.com/lutzroeder/netron) - Viewer for neural networks.
 [visualkeras](https://github.com/paulgavrikov/visualkeras) - Visualize Keras networks.

@@ -652,10 +656,6 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po
 [dgl](https://github.com/dmlc/dgl) - Deep Graph Library.
 [graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in Tensorflow, by deepmind.

-##### Other neural network and deep learning frameworks
-[caffe](https://github.com/BVLC/caffe) - Deep learning framework, [pretrained models](https://github.com/BVLC/caffe/wiki/Model-Zoo).
-[mxnet](https://github.com/apache/incubator-mxnet) - Deep learning framework, [book](https://d2l.ai/index.html).
-
 #### Model conversion
 [hummingbird](https://github.com/microsoft/hummingbird) - Compile trained ML models into tensor computations (by Microsoft).

From 8f4cfd7bce672e4cb83eb9af9f72811542799b3b Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 21 Nov 2022 16:44:56 +0100
Subject: [PATCH 393/550] lovely-tensors

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 01b2395..2be0c9d 100644
--- a/README.md
+++ b/README.md
@@ -583,6 +583,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/),
 [MONAI](https://github.com/project-monai/monai) - Deep learning in healthcare imaging.
 [kornia](https://github.com/kornia/kornia) - Image transformations, epipolar geometry, depth estimation.
 [torchinfo](https://github.com/TylerYep/torchinfo) - Nice model summary.
+[lovely-tensors](https://github.com/xl0/lovely-tensors/) - Inspect tensors, mean, std, inf values.

 ##### Distributed Libs
 [flexflow](https://github.com/flexflow/FlexFlow) - Distributed TensorFlow Keras and PyTorch.
From c9610bfc54850cdbede06aea8b33240b6d2fab00 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Tue, 22 Nov 2022 17:45:01 +0100
Subject: [PATCH 394/550] Named Colors Wheel

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 2be0c9d..c9651ad 100644
--- a/README.md
+++ b/README.md
@@ -374,6 +374,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 #### Colors
 [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3).
 [colorcet](https://github.com/holoviz/colorcet) - Collection of perceptually uniform colormaps.
+[Named Colors Wheel](https://arantius.github.io/web-color-wheel/) - Color wheel for all named HTML colors.

 #### Dashboards
 [superset](https://github.com/apache/superset) - Dashboarding solution by Apache.

From b1221edf85c7cac4a6d6b1e12c694d1fd75c843a Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 25 Nov 2022 10:19:29 +0100
Subject: [PATCH 395/550] morpheus update

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c9651ad..a4b9b20 100644
--- a/README.md
+++ b/README.md
@@ -369,7 +369,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 [ComplexHeatmap](https://github.com/jokergoo/ComplexHeatmap) - Complex heatmaps for multidimensional genomic data (R package).
 [largeVis](https://github.com/elbamos/largeVis) - Visualize embeddings (t-SNE etc.) (R package).
 [proplot](https://github.com/proplot-dev/proplot) - Matplotlib wrapper.
-[morpheus](https://software.broadinstitute.org/morpheus/) - Broad Institute tool matrix visualization and analysis software.
+[morpheus](https://software.broadinstitute.org/morpheus/) - Broad Institute tool matrix visualization and analysis software. [Source](https://github.com/cmap/morpheus.js), Tutorial: [1](https://www.youtube.com/watch?v=0nkYDeekhtQ), [2](https://www.youtube.com/watch?v=r9mN6MsxUb0), [Code](https://github.com/broadinstitute/BBBC021_Morpheus_Exercise).

 #### Colors
 [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3).

From 9a61ef2ddc81f71caeb7ef8a3b600b5022a2a56d Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 28 Nov 2022 17:54:00 +0100
Subject: [PATCH 396/550] pyscaffold

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a4b9b20..b3f4d8d 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@
 [General Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
 Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/)
 Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/)
-[cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) - Project template for data science projects.
+[pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator.
 [nteract](https://nteract.io/) - Open Jupyter Notebooks with doubleclick.
 [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html).
 [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/).
From 9cae94bf4a7a7996f15c6477ae2a990bc9b6fd8d Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 23 Dec 2022 10:47:01 +0100
Subject: [PATCH 397/550] Bio-image notebooks, py-clesperanto, ashlar

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index b3f4d8d..e1ffe62 100644
--- a/README.md
+++ b/README.md
@@ -214,6 +214,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal

 #### Microscopy / Segmentation

+##### Tutorials
+[Bio-image Analysis Notebooks] - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others.
+
 ##### Datasets
 [jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset.
 [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.
@@ -238,6 +241,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation.
 [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy.
 [allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures.
+[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microsopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb and lots of other tutorials, interacts with napari.
+[ashlar](https://github.com/labsyspharm/ashlar) - Image stitching and registration.

 #### Domain Adaptation / Batch-Effect Correction
 [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking).

From 505ece1e03823c606ada1190a969ecb56e6ffa80 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Fri, 23 Dec 2022 10:47:55 +0100
Subject: [PATCH 398/550] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e1ffe62..42546b3 100644
--- a/README.md
+++ b/README.md
@@ -215,7 +215,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 #### Microscopy / Segmentation

 ##### Tutorials
-[Bio-image Analysis Notebooks] - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others.
+[Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others.

 ##### Datasets
 [jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset.
@@ -241,7 +241,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation.
 [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy.
 [allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures.
-[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microsopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb and lots of other tutorials, interacts with napari.
+[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microsopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari.
 [ashlar](https://github.com/labsyspharm/ashlar) - Image stitching and registration.
#### Domain Adaptation / Batch-Effect Correction From 175c4b56f20d55cc4411e0f33904c720f01c61a5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 10 Jan 2023 18:08:27 +0100 Subject: [PATCH 399/550] Bleedthrough --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 42546b3..fb1f46c 100644 --- a/README.md +++ b/README.md @@ -243,6 +243,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures. [py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microsopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. [ashlar](https://github.com/labsyspharm/ashlar) - Image stitching and registration. +[cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. +Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). #### Domain Adaptation / Batch-Effect Correction [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). From 879783c4e55e63d3c1ffff584db330a559a4d8b7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 10 Jan 2023 18:40:25 +0100 Subject: [PATCH 400/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index fb1f46c..24fe58a 100644 --- a/README.md +++ b/README.md @@ -245,6 +245,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [ashlar](https://github.com/labsyspharm/ashlar) - Image stitching and registration. 
[cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). +Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). #### Domain Adaptation / Batch-Effect Correction [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). From 20cbffbee5e02cc8c81ce2a4f473dd7ab278345d Mon Sep 17 00:00:00 2001 From: Darigov Research <30328618+darigovresearch@users.noreply.github.com> Date: Sun, 15 Jan 2023 23:27:34 +0000 Subject: [PATCH 401/550] refactor: Small copyediting changes See diff for details --- README.md | 94 +++++++++++++++++++++++++++---------------------------- 1 file changed, 47 insertions(+), 47 deletions(-) diff --git a/README.md b/README.md index 24fe58a..ff5ea0f 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/). [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations. [qgrid](https://github.com/quantopian/qgrid) - Pandas `DataFrame` sorting. -[pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for jupyter notebooks. +[pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for Jupyter notebooks. [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter. [jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter. 
[debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter. @@ -42,11 +42,11 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames. [pandarallel](https://github.com/nalepae/pandarallel) - Parallelize pandas operations. [xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. -[swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas dataframe faster. +[swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas DataFrame faster. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. [pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. -[lux](https://github.com/lux-org/lux) - Dataframe visualization within Jupyter. +[lux](https://github.com/lux-org/lux) - DataFrame visualization within Jupyter. [dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter. [polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas. [duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame. @@ -82,8 +82,8 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [bolz](https://github.com/Blosc/bcolz) - A columnar data container that can be compressed. [cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA. [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber. -[zarr](https://github.com/zarr-developers/zarr-python) - Distributed numpy arrays. -[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia. 
+[zarr](https://github.com/zarr-developers/zarr-python) - Distributed NumPy arrays. +[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by Nvidia. [tensorstore](https://github.com/google/tensorstore) - Reading and writing large multi-dimensional arrays (Google). #### Distributed Systems @@ -95,7 +95,7 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [xsv](https://github.com/BurntSushi/xsv) - Command line tool for indexing, slicing, analyzing, splitting and joining CSV files. [csvkit](https://csvkit.readthedocs.io/en/1.0.3/) - Another command line tool for CSV files. [csvsort](https://pypi.org/project/csvsort/) - Sort large csv files. -[tsv-utils](https://github.com/eBay/tsv-utils) - Tools for working with CSV files by ebay. +[tsv-utils](https://github.com/eBay/tsv-utils) - Tools for working with CSV files by eBay. [cheat](https://github.com/cheat/cheat) - Make cheatsheets for command line commands. #### Classical Statistics @@ -120,7 +120,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [torch-two-sample](https://github.com/josipd/torch-two-sample) - Friedman-Rafsky Test: Compare two population based on a multivariate generalization of the Runstest. [Explanation](https://www.real-statistics.com/multivariate-statistics/multivariate-normal-distribution/friedman-rafsky-test/), [Application](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5014134/) ##### Interim Analyses / Sequential Analysis / Stopping -[Squential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia. + [Sequential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia. [Treatment Effects Monitoring](https://online.stat.psu.edu/stat509/node/75/) - Design and Analysis of Clinical Trials PennState. 
[sequential](https://cran.r-project.org/web/packages/Sequential/Sequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package). [confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values. @@ -179,7 +179,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs. #### Noisy Labels -[cleanlab](https://github.com/cleanlab/cleanlab) - Machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Also see awesome list below. +[cleanlab](https://github.com/cleanlab/cleanlab) - Machine learning with noisy labels, finding mislabelled data, and uncertainty quantification. Also see awesome list below. [doubtlab](https://github.com/koaning/doubtlab) - Find bad or noisy labels. #### Train / Test Split @@ -199,7 +199,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. -[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by nvidia. +[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by Nvidia. #### Computer Vision [Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p) @@ -241,7 +241,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. 
[allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures. -[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microsopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. +[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. [ashlar](https://github.com/labsyspharm/ashlar) - Image stitching and registration. [cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). @@ -255,7 +255,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. [scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/). [CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/). -[adapt](https://github.com/adapt-python/adapt) - Aweseome Domain Adaptation Python Toolbox. +[adapt](https://github.com/adapt-python/adapt) - Awesome Domain Adaptation Python Toolbox. [pytorch-adapt](https://github.com/KevinMusgrave/pytorch-adapt) - Various neural network models for domain adaptation. 
#### Feature Engineering Images @@ -389,7 +389,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). [mercury](https://github.com/mljar/mercury) - Convert Python notebook to web app, [Example](https://github.com/pplonski/dashboard-python-jupyter-notebook). [dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. [Resources](https://github.com/ucg8j/awesome-dash). -[visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by facebook. +[visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by Facebook. [panel](https://panel.pyviz.org/index.html) - Dashboarding solution. [altair example](https://github.com/xhochy/altair-vue-vega-example) - [Video](https://www.youtube.com/watch?v=4L568emKOvs). [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications. @@ -510,7 +510,7 @@ See also Microscopy Section above. ##### Drug discovery [TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. -[DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modeling and Prediction Toolkit. +[DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modelling and Prediction Toolkit. ##### Courses [mit6874](https://mit6874.github.io/) - Computational Systems Biology: Deep Learning in the Life Sciences. @@ -565,9 +565,9 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [keras-tuner](https://github.com/keras-team/keras-tuner) - Hyperparameter tuning for Keras. 
[hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt: Convenient hyperparameter optimization wrapper. [elephas](https://github.com/maxpumperla/elephas) - Distributed Deep learning with Keras & Spark. -[tflearn](https://github.com/tflearn/tflearn) - Neural Networks on top of tensorflow. -[tensorlayer](https://github.com/tensorlayer/tensorlayer) - Neural Networks on top of tensorflow, [tricks](https://github.com/wagamamaz/tensorlayer-tricks). -[tensorforce](https://github.com/reinforceio/tensorforce) - Tensorflow for applied reinforcement learning. +[tflearn](https://github.com/tflearn/tflearn) - Neural Networks on top of TensorFlow. +[tensorlayer](https://github.com/tensorlayer/tensorlayer) - Neural Networks on top of TensorFlow, [tricks](https://github.com/wagamamaz/tensorlayer-tricks). +[tensorforce](https://github.com/reinforceio/tensorforce) - TensorFlow for applied reinforcement learning. [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. [PlotNeuralNet](https://github.com/HarisIqbal88/PlotNeuralNet) - Plot neural networks. [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability, [Activation Maps](https://openai.com/blog/introducing-activation-atlases/). @@ -577,16 +577,16 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Training metrics. [imgclsmob](https://github.com/osmr/imgclsmob) - Pretrained models. [netron](https://github.com/lutzroeder/netron) - Visualizer for deep learning and machine learning models. -[ffcv](https://github.com/libffcv/ffcv) - Fast dataloder. - -##### Libs Pytorch -[Good Pytorch Introduction](https://cs230.stanford.edu/blog/pytorch/) -[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps pytorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). 
-[fastai](https://github.com/fastai/fastai) - Neural Networks in pytorch. -[timm](https://github.com/rwightman/pytorch-image-models) - Pytorch image models. -[ignite](https://github.com/pytorch/ignite) - Highlevel library for pytorch. +[ffcv](https://github.com/libffcv/ffcv) - Fast dataloader. + +##### Libs PyTorch +[Good PyTorch Introduction](https://cs230.stanford.edu/blog/pytorch/) +[skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps PyTorch, [talk](https://www.youtube.com/watch?v=0J7FaLk0bmQ), [slides](https://github.com/thomasjpfan/skorch_talk). +[fastai](https://github.com/fastai/fastai) - Neural Networks in PyTorch. +[timm](https://github.com/rwightman/pytorch-image-models) - PyTorch image models. +[ignite](https://github.com/pytorch/ignite) - Highlevel library for PyTorch. [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. -[pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for pytorch. +[pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for PyTorch. [pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. [lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [MONAI](https://github.com/project-monai/monai) - Deep learning in healthcare imaging. @@ -623,7 +623,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Image Classification [nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Neural network. -[pycls](https://github.com/facebookresearch/pycls) - Pytorch image classification networks: ResNet, ResNeXt, EfficientNet, and RegNet (by Facebook). 
+[pycls](https://github.com/facebookresearch/pycls) - PyTorch image classification networks: ResNet, ResNeXt, EfficientNet, and RegNet (by Facebook). ##### Applications and Snippets [SPADE](https://github.com/nvlabs/spade) - Semantic Image Synthesis. @@ -642,10 +642,10 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [Awesome GAN Applications](https://github.com/nashory/gans-awesome-applications) [The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo) - List of Generative Adversarial Networks. [CycleGAN and Pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks. -[Tensorflow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections) -[Pytorch GAN implementations](https://github.com/znxlwm/pytorch-generative-model-collections) -[Pytorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder) -[StudioGAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) - Pytorch GAN implementations. +[TensorFlow GAN implementations](https://github.com/hwalsuklee/tensorflow-generative-model-collections) +[PyTorch GAN implementations](https://github.com/znxlwm/pytorch-generative-model-collections) +[PyTorch GAN implementations](https://github.com/eriklindernoren/PyTorch-GAN#adversarial-autoencoder) +[StudioGAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) - PyTorch GAN implementations. ##### Transformers [SegFormer](https://github.com/NVlabs/SegFormer) - Simple and Efficient Design for Semantic Segmentation with Transformers. @@ -664,7 +664,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [cugraph](https://github.com/rapidsai/cugraph) - RAPIDS, Graph library on the GPU. [pytorch-geometric](https://github.com/rusty1s/pytorch_geometric) - Various methods for deep learning on graphs. [dgl](https://github.com/dmlc/dgl) - Deep Graph Library. 
-[graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in Tensorflow, by deepmind. +[graph_nets](https://github.com/deepmind/graph_nets) - Build graph networks in TensorFlow, by DeepMind. #### Model conversion [hummingbird](https://github.com/microsoft/hummingbird) - Compile trained ML models into tensor computations (by Microsoft). @@ -699,10 +699,10 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [Contrastive Representation Learning](https://lilianweng.github.io/lil-log/2021/05/31/contrastive-representation-learning.html) [metric-learn](https://github.com/scikit-learn-contrib/metric-learn) - Supervised and weakly-supervised metric learning algorithms. -[pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning) - Pytorch metric learning. +[pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning) - PyTorch metric learning. [deep_metric_learning](https://github.com/ronekko/deep_metric_learning) - Methods for deep metric learning. [ivis](https://bering-ivis.readthedocs.io/en/latest/supervised.html) - Metric learning using siamese neural networks. -[tensorflow similarity](https://github.com/tensorflow/similarity) - Metric learning. +[TensorFlow similarity](https://github.com/tensorflow/similarity) - Metric learning. #### Distance Functions [scipy.spatial](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) - All kinds of distance metrics. @@ -768,10 +768,10 @@ Other measures: #### Signal Processing and Filtering [Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf). -[Visual fourier explanation](https://dsego.github.io/demystifying-fourier/). +[Visual Fourier explanation](https://dsego.github.io/demystifying-fourier/). 
[The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html). [Kalman Filter article](https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures). -[Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Baysian and various Kalman filters. +[Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Bayesian and various Kalman filters. [Interactive Tool](https://fiiir.com/) for FIR and IIR filters, [Examples](https://plot.ly/python/fft-filters/). [filterpy](https://github.com/rlabbe/filterpy) - Kalman filtering and optimal estimation library. @@ -782,7 +782,7 @@ Other measures: [statsmodels](https://www.statsmodels.org/dev/tsa.html) - Time series analysis, [seasonal decompose](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) [example](https://gist.github.com/balzer82/5cec6ad7adc1b550e7ee), [SARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.grangercausalitytests.html). [kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook. [prophet](https://github.com/facebook/prophet) - Time series prediction library by Facebook. -[neural_prophet](https://github.com/ourownstory/neural_prophet) - Time series prediction built on Pytorch. +[neural_prophet](https://github.com/ourownstory/neural_prophet) - Time series prediction built on PyTorch. [pyramid](https://github.com/tgsmith61591/pyramid), [pmdarima](https://github.com/tgsmith61591/pmdarima) - Wrapper for (Auto-) ARIMA. [modeltime](https://cran.r-project.org/web/packages/modeltime/index.html) - Time series forecasting framework (R package). 
[pyflux](https://github.com/RJT1990/pyflux) - Time series prediction algorithms (ARIMA, GARCH, GAS, Bayesian). @@ -803,7 +803,7 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [pastas](https://pastas.readthedocs.io/en/latest/examples.html) - Simulation of time series. [fastdtw](https://github.com/slaypni/fastdtw) - Dynamic Time Warp Distance. [fable](https://www.rdocumentation.org/packages/fable/versions/0.0.0.9000) - Time Series Forecasting (R package). -[pydlm](https://github.com/wwrechard/pydlm) - Bayesian time series modeling ([R package](https://cran.r-project.org/web/packages/bsts/index.html), [Blog post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html)) +[pydlm](https://github.com/wwrechard/pydlm) - Bayesian time series modelling ([R package](https://cran.r-project.org/web/packages/bsts/index.html), [Blog post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html)) [PyAF](https://github.com/antoinecarme/pyaf) - Automatic Time Series Forecasting. [luminol](https://github.com/linkedin/luminol) - Anomaly Detection and Correlation library from Linkedin. [matrixprofile-ts](https://github.com/target/matrixprofile-ts) - Detecting patterns and anomalies, [website](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html), [ppt](https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1.pdf), [alternative](https://github.com/matrix-profile-foundation/mass-ts). @@ -836,7 +836,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [bt](https://github.com/pmorissette/bt) - Backtesting algorithms. [alpaca-trade-api-python](https://github.com/alpacahq/alpaca-trade-api-python) - Commission-free trading through API. [eiten](https://github.com/tradytics/eiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies. 
-[tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in tensorflow, by Google. +[tf-quant-finance](https://github.com/google/tf-quant-finance) - Quantitative finance tools in TensorFlow, by Google. [quantstats](https://github.com/ranaroussi/quantstats) - Portfolio management. [Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib) - Portfolio optimization and strategic asset allocation. [OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) - Terminal. @@ -902,7 +902,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [Bours - Confounding](https://edisciplinas.usp.br/pluginfile.php/5625667/mod_resource/content/3/Nontechnicalexplanation-counterfactualdefinition-confounding.pdf) [Bours - Effect Modification and Interaction](https://www.sciencedirect.com/science/article/pii/S0895435621000330) -#### Probabilistic Modeling and Bayes +#### Probabilistic Modelling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) [PyMC3](https://docs.pymc.io/) - Bayesian modelling, [intro](https://docs.pymc.io/notebooks/getting_started) [numpyro](https://github.com/pyro-ppl/numpyro) - Probabilistic programming with numpy, built on [pyro](https://github.com/pyro-ppl/pyro). @@ -910,9 +910,9 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [pmlearn](https://github.com/pymc-learn/pymc-learn) - Probabilistic machine learning. [arviz](https://github.com/arviz-devs/arviz) - Exploratory analysis of Bayesian models. [zhusuan](https://github.com/thu-ml/zhusuan) - Bayesian deep learning, generative models. 
-[edward](https://github.com/blei-lab/edward) - Probabilistic modeling, inference, and criticism, [Mixture Density Networks (MNDs)](http://edwardlib.org/tutorials/mixture-density-network), [MDN Explanation](https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca). +[edward](https://github.com/blei-lab/edward) - Probabilistic modelling, inference, and criticism, [Mixture Density Networks (MNDs)](http://edwardlib.org/tutorials/mixture-density-network), [MDN Explanation](https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca). [Pyro](https://github.com/pyro-ppl/pyro) - Deep Universal Probabilistic Programming. -[tensorflow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk1](https://www.youtube.com/watch?v=KJxmC5GCWe4), [notebook talk1](https://github.com/AlxndrMlk/PyDataGlobal2021/blob/main/00_PyData_Global_2021_nb_full.ipynb), [talk2](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb). +[TensorFlow probability](https://github.com/tensorflow/probability) - Deep learning and probabilistic modelling, [talk1](https://www.youtube.com/watch?v=KJxmC5GCWe4), [notebook talk1](https://github.com/AlxndrMlk/PyDataGlobal2021/blob/main/00_PyData_Global_2021_nb_full.ipynb), [talk2](https://www.youtube.com/watch?v=BrwKURU-wpk), [example](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_TFP.ipynb). [bambi](https://github.com/bambinos/bambi) - High-level Bayesian model-building interface on top of PyMC3. [neural-tangents](https://github.com/google/neural-tangents) - Infinite Neural Networks. 
[bnlearn](https://github.com/erdogant/bnlearn) - Bayesian networks, parameter learning, inference and sampling methods. @@ -920,8 +920,8 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Gaussian Processes [Visualization](http://www.infinitecuriosity.org/vizgp/), [Article](https://distill.pub/2019/visual-exploration-gaussian-processes/) [GPyOpt](https://github.com/SheffieldML/GPyOpt) - Gaussian process optimization. -[GPflow](https://github.com/GPflow/GPflow) - Gaussian processes (Tensorflow). -[gpytorch](https://gpytorch.ai/) - Gaussian processes (Pytorch). +[GPflow](https://github.com/GPflow/GPflow) - Gaussian processes (TensorFlow). +[gpytorch](https://gpytorch.ai/) - Gaussian processes (PyTorch). #### Stacking Models and Ensembles [Model Stacking Blog Post](http://blog.kaggle.com/2017/06/15/stacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova/) @@ -974,7 +974,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [captum](https://github.com/pytorch/captum) - Model interpretability and understanding for PyTorch. #### Automated Machine Learning -[AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on tensorflow. +[AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on TensorFlow. [tpot](https://github.com/EpistasisLab/tpot) - Automated machine learning tool, optimizes machine learning pipelines. [auto_ml](https://github.com/ClimbsRocks/auto_ml) - Automated machine learning for analytics & production. [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. @@ -986,11 +986,11 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin #### Graph Representation Learning [Karate Club](https://github.com/benedekrozemberczki/karateclub) - Unsupervised learning on graphs. 
-[Pytorch Geometric](https://github.com/rusty1s/pytorch_geometric) - Graph representation learning with PyTorch. +[PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) - Graph representation learning with PyTorch. [DLG](https://github.com/dmlc/dgl) - Graph representation learning with TensorFlow. #### Convex optimization -[cvxpy](https://github.com/cvxgrp/cvxpy) - Modeling language for convex optimization problems. Tutorial: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) +[cvxpy](https://github.com/cvxgrp/cvxpy) - Modelling language for convex optimization problems. Tutorial: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html), [2](https://calmcode.io/cvxpy-two/introduction.html) #### Evolutionary Algorithms & Optimization [deap](https://github.com/DEAP/deap) - Evolutionary computation framework (Genetic Algorithm, Evolution strategies). @@ -1166,7 +1166,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Visual Transformer](https://github.com/dk-liang/Awesome-Visual-Transformer) #### Lectures -[NYU Deep Learning SP21](https://www.youtube.com/playlist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - Youtube Playlist. +[NYU Deep Learning SP21](https://www.youtube.com/playlist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - YouTube Playlist. 
#### Things I google a lot [Color codes](https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#categorical-colors) From 2c6556567ee869ed5a9b8c1515bd7ff5d2c2b59a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 25 Jan 2023 20:17:21 +0100 Subject: [PATCH 402/550] py-shiny --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ff5ea0f..966ab56 100644 --- a/README.md +++ b/README.md @@ -385,6 +385,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [Named Colors Wheel](https://arantius.github.io/web-color-wheel/) - Color wheel for all named HTML colors. #### Dashboards +[py-shiny](https://github.com/rstudio/py-shiny) - Shiny for Python, [talk](https://www.youtube.com/watch?v=ijRBbtT2tgc). [superset](https://github.com/apache/superset) - Dashboarding solution by Apache. [streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). [mercury](https://github.com/mljar/mercury) - Convert Python notebook to web app, [Example](https://github.com/pplonski/dashboard-python-jupyter-notebook). From 436c2d4d75b4ecc5ec667ecb027b476f3f58bb5a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 30 Jan 2023 10:16:18 +0100 Subject: [PATCH 403/550] Foundations of Data Science Book --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 966ab56..fb0eeb2 100644 --- a/README.md +++ b/README.md @@ -1117,6 +1117,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [datasharing](https://github.com/jtleek/datasharing) - Guide to data sharing. 
##### Books +[Blum - Foundations of Data Science](https://www.cs.cornell.edu/jeh/book.pdf?file=book.pdf) [Chan - Introduction to Probability for Data Science](https://probability4datascience.com/index.html) [Colonescu - Principles of Econometrics with R](https://bookdown.org/ccolonescu/RPoE4/) From 0a32af5d47547132fc0f1e9b56a68196febe8ce1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 2 Feb 2023 16:00:23 +0100 Subject: [PATCH 404/550] Removed dead link. --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index fb0eeb2..99f14b8 100644 --- a/README.md +++ b/README.md @@ -1073,7 +1073,6 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [m2cgen](https://github.com/BayesWitnesses/m2cgen) - Transpile trained ML models into other languages. [sklearn-porter](https://github.com/nok/sklearn-porter) - Transpile trained scikit-learn estimators to C, Java, JavaScript and others. [mlflow](https://mlflow.org/) - Manage the machine learning lifecycle, including experimentation, reproducibility and deployment. -[modelchimp](https://github.com/ModelChimp/modelchimp) - Experiment Tracking. [skll](https://github.com/EducationalTestingService/skll) - Command-line utilities to make it easier to run machine learning experiments. [BentoML](https://github.com/bentoml/BentoML) - Package and deploy machine learning models for serving in production. [dagster](https://github.com/dagster-io/dagster) - Tool with focus on dependency graphs. 
From 8b907053927d13a5e91e9286ca0f424019e13d9d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 27 Feb 2023 17:08:20 +0100 Subject: [PATCH 405/550] Update README.md --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 99f14b8..8e401b7 100644 --- a/README.md +++ b/README.md @@ -225,6 +225,11 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles. [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay. +#### Labsyspharm +[mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). +[MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. +[cylinter](https://github.com/labsyspharm/cylinter) - Quality assurance for microscopy images, [Website](https://labsyspharm.github.io/cylinter/). + #### Packages [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. @@ -235,7 +240,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). [ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection, [Project page](https://csbdeep.bioimagecomputing.com/tools/). 
-[mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Paper](https://www.nature.com/articles/s41592-021-01308-y). [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. [stardist](https://github.com/stardist/stardist) - Object Detection with Star-convex Shapes. [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. From c701a4e5286bf6bd483124d07afcdc3ca83d35a7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 1 Mar 2023 13:29:31 +0100 Subject: [PATCH 406/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 8e401b7..8ed249c 100644 --- a/README.md +++ b/README.md @@ -1065,7 +1065,6 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe ##### Data Versioning, Databases, Pipelines and Model Serving [dvc](https://github.com/iterative/dvc) - Version control for large files. -[hangar](https://github.com/tensorwerk/hangar-py) - Version control for tensor data. [kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines. [feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo). [pinecone](https://www.pinecone.io/) - Database for vector search applications. From 1cdd2ffe968e455bad8d7c3d443d1b817057faf2 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 1 Mar 2023 13:38:55 +0100 Subject: [PATCH 407/550] Removed unmaintained packages. --- README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/README.md b/README.md index 8ed249c..360d48c 100644 --- a/README.md +++ b/README.md @@ -1058,10 +1058,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [cog](https://github.com/replicate/cog) - Facilitates building Docker images. ##### Dependency Management -[dephell](https://github.com/dephell/dephell) - Dependency management. 
[poetry](https://github.com/python-poetry/poetry) - Dependency management. -[pyup](https://github.com/pyupio/pyup) - Dependency management. -[pypi-timemachine](https://github.com/astrofrog/pypi-timemachine) - Install packages with pip as if you were in the past. ##### Data Versioning, Databases, Pipelines and Model Serving [dvc](https://github.com/iterative/dvc) - Version control for large files. From 955108cdb23156af081bda1c1e709f0b335c4c04 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 2 Mar 2023 16:25:59 +0100 Subject: [PATCH 408/550] DESeq2 --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 360d48c..fa2f708 100644 --- a/README.md +++ b/README.md @@ -494,6 +494,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Sequencing [Single cell tutorial](https://github.com/theislab/single-cell-tutorial). +[DESeq2](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) - Analyzing RNA-seq data (R package). [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). [besca](https://github.com/bedapub/besca) - Beyond single-cell analysis. 
From 11ae6e4ab5f5da5d4aae0551e92ebdca66d4b5b9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 6 Mar 2023 22:51:09 +0100 Subject: [PATCH 409/550] Awesome Single Cell --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index fa2f708..2583029 100644 --- a/README.md +++ b/README.md @@ -1160,6 +1160,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Pytorch](https://github.com/bharathgs/Awesome-pytorch-list) [Awesome Quantitative Finance](https://github.com/wilsonfreitas/awesome-quant) [Awesome Recommender Systems](https://github.com/grahamjenson/list_of_recommender_systems) +[Awesome Single Cell](https://github.com/seandavi/awesome-single-cell) [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation) [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding) [Awesome Time Series](https://github.com/MaxBenChrist/awesome_time_series_in_python) From 23c0168f2bc6121671828222f63052019587d8f0 Mon Sep 17 00:00:00 2001 From: Andy Kipp Date: Thu, 9 Mar 2023 19:41:38 +0600 Subject: [PATCH 410/550] Added xonsh shell --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2583029..b254f2b 100644 --- a/README.md +++ b/README.md @@ -12,6 +12,7 @@ [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Helpful `DataFrameMapper` class. [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. [rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - Plugin to display .csv files with nice colors. +[xonsh](https://xon.sh) - Python-powered shell as alternative to Bash for simplifying data science automations. 
#### Environment and Jupyter [General Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) From b77a955ffb92fe5d440ac2e07639338bd72e646c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 13 Mar 2023 11:03:01 +0100 Subject: [PATCH 411/550] seg-eval --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b254f2b..b03d940 100644 --- a/README.md +++ b/README.md @@ -236,6 +236,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. [Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). +[seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). [skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. [BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). 
From 4f146614c0998e13b4e0974247cba863ccebab38 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 7 Apr 2023 10:07:36 +0200 Subject: [PATCH 412/550] Cell-ACDC --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b03d940..e7bec30 100644 --- a/README.md +++ b/README.md @@ -252,6 +252,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). +[Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Cell segmentation and tracking. #### Domain Adaptation / Batch-Effect Correction [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). From 07e67ab9f6fd872949a6569aec0de6e1203dbae8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 29 Apr 2023 16:59:08 +0200 Subject: [PATCH 413/550] PlateEditor --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index e7bec30..64d9dc0 100644 --- a/README.md +++ b/README.md @@ -490,6 +490,9 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Biology / Bioinformatics +##### Assay +[PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). 
+ ##### Biostatistics / Robust statistics [MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [App1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). [winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. From bfd15d73967645d7926a1bb55d7112326d48de2d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 2 May 2023 19:02:15 +0200 Subject: [PATCH 414/550] DESeq2 -> PyDESeq2 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 64d9dc0..24df77d 100644 --- a/README.md +++ b/README.md @@ -500,7 +500,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Sequencing [Single cell tutorial](https://github.com/theislab/single-cell-tutorial). -[DESeq2](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) - Analyzing RNA-seq data (R package). +[PyDESeq2](https://github.com/owkin/PyDESeq2) - Analyzing RNA-seq data. [cellxgene](https://github.com/chanzuckerberg/cellxgene) - Interactive explorer for single-cell transcriptomics data. [scanpy](https://github.com/theislab/scanpy) - Analyze single-cell gene expression data, [tutorial](https://github.com/theislab/single-cell-tutorial). [besca](https://github.com/bedapub/besca) - Beyond single-cell analysis. 
From 4064e7362d6df35d8ad3afee9d647e5c9d5642cd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 5 May 2023 16:34:15 +0200 Subject: [PATCH 415/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 24df77d..399ce4c 100644 --- a/README.md +++ b/README.md @@ -46,7 +46,6 @@ Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/1 [swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas DataFrame faster. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. [pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas. -[pandapy](https://github.com/firmai/pandapy) - Additional features for pandas. [lux](https://github.com/lux-org/lux) - DataFrame visualization within Jupyter. [dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter. [polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas. From 177a4ad3f01e8ef8c048a9022a80796ffe24dc4f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 7 May 2023 20:04:18 +0200 Subject: [PATCH 416/550] CellSeg --- README.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 399ce4c..6d814e6 100644 --- a/README.md +++ b/README.md @@ -229,29 +229,31 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). [MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. [cylinter](https://github.com/labsyspharm/cylinter) - Quality assurance for microscopy images, [Website](https://labsyspharm.github.io/cylinter/). 
+[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. + +#### Segmentation +[Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). +[CellSeg](https://github.com/michaellee1/CellSeg) - Cell segmentation. [Paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04570-9) +[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). +[stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. +[UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. +[nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. +[allencell](https://www.allencell.org/segmenter.html) - Tools for 3D segmentation, classical and deep learning methods. +[Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. #### Packages [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. -[Tree of Microscopy](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). -[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). 
[skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. [BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). -[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection, [Project page](https://csbdeep.bioimagecomputing.com/tools/). -[UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. -[stardist](https://github.com/stardist/stardist) - Object Detection with Star-convex Shapes. -[nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. -[allencell](https://www.allencell.org/segmenter.html) - Tools for the 3D segmentation of intracellular structures. [py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. -[ashlar](https://github.com/labsyspharm/ashlar) - Image stitching and registration. [cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). -[Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Cell segmentation and tracking. 
#### Domain Adaptation / Batch-Effect Correction [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). From 00f9bdb5d529fbf2af090a90736b408976641f04 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 7 May 2023 20:20:10 +0200 Subject: [PATCH 417/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 6d814e6..b10648a 100644 --- a/README.md +++ b/README.md @@ -233,7 +233,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Segmentation [Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). -[CellSeg](https://github.com/michaellee1/CellSeg) - Cell segmentation. [Paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04570-9) [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. 
From 6a27653c7cdc9b2213ef85279e98686117ba7786 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 7 May 2023 21:08:10 +0200 Subject: [PATCH 418/550] zarr --- README.md | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index b10648a..dc43da4 100644 --- a/README.md +++ b/README.md @@ -204,10 +204,12 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Computer Vision [Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p) -#### Image Cleanup +#### Image Viewers [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. [napari](https://github.com/napari/napari) - Multi-dimensional image viewer. [fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. + +#### Image Cleanup [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. [aydin](https://github.com/royerlab/aydin) - Image denoising. [unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline. 
@@ -216,7 +218,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal ##### Tutorials [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. - +[python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. ##### Datasets [jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset. [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. @@ -225,13 +227,21 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles. [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay. -#### Labsyspharm +##### Data Formats and Converters +OME-Zarr - [Paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [Standard](https://ngff.openmicroscopy.org/latest/) +[bioformats2raw](https://github.com/glencoesoftware/bioformats2raw) - Various formats to zarr. 
+[raw2ometiff](https://github.com/glencoesoftware/raw2ometiff) - Zarr to tiff. +[BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c). +[napari](https://napari.org/stable) - Viewer for various image formats. +[vizarr](https://github.com/hms-dbmi/vizarr) - Viewer for zarr files. + +##### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). [MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. [cylinter](https://github.com/labsyspharm/cylinter) - Quality assurance for microscopy images, [Website](https://labsyspharm.github.io/cylinter/). [ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. -#### Segmentation +##### Segmentation [Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. @@ -239,8 +249,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. [allencell](https://www.allencell.org/segmenter.html) - Tools for 3D segmentation, classical and deep learning methods. [Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. +[ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. 
-#### Packages +##### Packages [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. [seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). @@ -509,8 +520,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Image-related See also Microscopy Section above. -[Overview over cell segmentation algorithms](https://biomag-lab.github.io/microscopy-tree/) -[python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. [mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb). [imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis. [scimap](https://github.com/labsyspharm/scimap) - Spatial Single-Cell Analysis Toolkit. @@ -518,7 +527,6 @@ See also Microscopy Section above. [imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf). [microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ). [cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets. -[ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. 
 ##### Drug discovery
 [TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development.

From 189c95527077eb7ebeeeb7fc9e08588a95c142a7 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 7 May 2023 21:12:21 +0200
Subject: [PATCH 419/550] REMBI

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index dc43da4..afd350a 100644
--- a/README.md
+++ b/README.md
@@ -234,6 +234,7 @@ OME-Zarr - [Paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.f
 [BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c).
 [napari](https://napari.org/stable) - Viewer for various image formats.
 [vizarr](https://github.com/hms-dbmi/vizarr) - Viewer for zarr files.
+REMBI model - Recommended Metadata for Biological Images, [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c)

 ##### Labsyspharm
 [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y).

From caedcf53922c80d94be2863b80e8344711cbc654 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 7 May 2023 21:15:30 +0200
Subject: [PATCH 420/550] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index afd350a..8f59418 100644
--- a/README.md
+++ b/README.md
@@ -228,13 +228,13 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay.
 ##### Data Formats and Converters
-OME-Zarr - [Paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [Standard](https://ngff.openmicroscopy.org/latest/)
+OME-Zarr - [paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [standard](https://ngff.openmicroscopy.org/latest/)
 [bioformats2raw](https://github.com/glencoesoftware/bioformats2raw) - Various formats to zarr.
 [raw2ometiff](https://github.com/glencoesoftware/raw2ometiff) - Zarr to tiff.
 [BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c).
 [napari](https://napari.org/stable) - Viewer for various image formats.
 [vizarr](https://github.com/hms-dbmi/vizarr) - Viewer for zarr files.
-REMBI model - Recommended Metadata for Biological Images, [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c)
+REMBI model - Recommended Metadata for Biological Images, [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheed with additional info](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919).

 ##### Labsyspharm
 [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y).
From 1cc9ed346f1365ab71e7e6c3dea6fbfd770f27db Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Sun, 7 May 2023 21:32:24 +0200
Subject: [PATCH 421/550] REMBI

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8f59418..40dbd55 100644
--- a/README.md
+++ b/README.md
@@ -234,7 +234,9 @@ OME-Zarr - [paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.f
 [BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c).
 [napari](https://napari.org/stable) - Viewer for various image formats.
 [vizarr](https://github.com/hms-dbmi/vizarr) - Viewer for zarr files.
-REMBI model - Recommended Metadata for Biological Images, [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheed with additional info](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919).
+REMBI model - Recommended Metadata for Biological Images
+ * BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/)
+ * [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919)

 ##### Labsyspharm
 [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y).
From 26913c121921775c3f6a7d8c5992b3ae37ec99a6 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 8 May 2023 14:50:18 +0200
Subject: [PATCH 422/550] Bioimaging and Bioimage Analysis Guide

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 40dbd55..a5bf465 100644
--- a/README.md
+++ b/README.md
@@ -219,6 +219,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 ##### Tutorials
 [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others.
 [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks.
+[Bioimaging and Bioimage Analysis Guide](https://www.bioimagingguide.org/welcome.html)
+
 ##### Datasets
 [jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset.
 [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.
From b16fc22bcae0e17edc2e2926a21bc0cc40d9f246 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Mon, 8 May 2023 15:10:38 +0200
Subject: [PATCH 423/550] OMERO

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a5bf465..4bb056b 100644
--- a/README.md
+++ b/README.md
@@ -205,9 +205,10 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p)

 #### Image Viewers
+[fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models.
 [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package.
 [napari](https://github.com/napari/napari) - Multi-dimensional image viewer.
-[fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models.
+[OMERO](https://www.openmicroscopy.org/omero/) - Feature rich image viewer for high-content screening. [IDR](https://idr.openmicroscopy.org/) uses OMERO. [Intro](https://www.youtube.com/watch?v=nSCrMO_c-5s)

 #### Image Cleanup
 [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method.
From 3c09717ffedf6840289662dab5baa69ca5fc84c7 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 10 May 2023 11:41:13 +0200
Subject: [PATCH 424/550] ipyflow

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 4bb056b..116fae6 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@
 [General Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
 Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/)
 Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/)
+[ipyflow](https://github.com/ipyflow/ipyflow) - IPython kernel for Jupyter with additional features.
 [pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator.
 [nteract](https://nteract.io/) - Open Jupyter Notebooks with doubleclick.
 [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html).

From 13f1d96db8c7873b4c249d4f217443c3f5012ea9 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 10 May 2023 14:53:43 +0200
Subject: [PATCH 425/550] MEDIAR and cell segmentation datasets

---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 116fae6..9222597 100644
--- a/README.md
+++ b/README.md
@@ -227,7 +227,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset.
 [MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.
 [CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images.
-[cellpose dataset](https://www.cellpose.org/dataset) - Cell images.
 [Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles.
 [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay.

@@ -250,6 +249,7 @@ REMBI model - Recommended Metadata for Biological Images

 ##### Segmentation
 [Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518).
+[MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation.
 [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset).
 [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes.
 [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue.
@@ -258,6 +258,12 @@ REMBI model - Recommended Metadata for Biological Images
 [Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking.
 [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy.

+##### Cell Segmentation Datasets
+[cellpose](https://www.cellpose.org/dataset) - Cell images.
+[omnipose](http://www.cellpose.org/dataset_omnipose) - Cell images.
+[LIVECell](https://github.com/sartorius-research/LIVECell) - Cell images.
+[Sartorius](https://www.kaggle.com/competitions/sartorius-cell-instance-segmentation/overview) - Neurons.
+
 ##### Packages
 [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata)
 [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes.
From 7a9f6182baed99415e9b34622247ca29f958a31c Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Wed, 10 May 2023 19:56:12 +0200
Subject: [PATCH 426/550] Spring Cleaning

---
 README.md | 155 +++++++++++++-----------------------------------------
 1 file changed, 37 insertions(+), 118 deletions(-)

diff --git a/README.md b/README.md
index 9222597..a9c8609 100644
--- a/README.md
+++ b/README.md
@@ -4,100 +4,73 @@

 #### Core
 [pandas](https://pandas.pydata.org/) - Data structures built on top of [numpy](https://www.numpy.org/).
-[scikit-learn](https://scikit-learn.org/stable/) - Core ML library.
+[scikit-learn](https://scikit-learn.org/stable/) - Core ML library, [intelex](https://github.com/intel/scikit-learn-intelex).
 [matplotlib](https://matplotlib.org/) - Plotting library.
 [seaborn](https://seaborn.pydata.org/) - Data visualization library based on matplotlib.
-[datatile](https://github.com/polyaxon/datatile) - Basic statistics using `DataFrameSummary(df).summary()`.
-[pandas_profiling](https://github.com/pandas-profiling/pandas-profiling) - Descriptive statistics using `ProfileReport`.
+[ydata-profiling](https://github.com/ydataai/ydata-profiling) - Descriptive statistics using `ProfileReport`.
 [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Helpful `DataFrameMapper` class.
 [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization.
-[rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - Plugin to display .csv files with nice colors.
-[xonsh](https://xon.sh) - Python-powered shell as alternative to Bash for simplifying data science automations.
+[rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - VSCode plugin to display .csv files with nice colors.
+
+#### General Python Programming
+[more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools.
+[tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480).
+[loguru](https://github.com/Delgan/loguru) - Python logging.
+[pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator.
+[poetry](https://github.com/python-poetry/poetry) - Dependency management.
+[dateparser](https://github.com/scrapinghub/dateparser) - A better date parser.
+
+#### Pandas Tricks, Alternatives and Additions
+[pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks.
+[polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas.
+[xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays.
+[pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`.
+[duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame.
+
+#### Pandas Parallelization
+[modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`.
+[vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames.
+[pandarallel](https://github.com/nalepae/pandarallel) - Parallelize pandas operations.
+[swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas DataFrame faster.
 #### Environment and Jupyter
-[General Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
-Fixing environment: [link](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/)
-Python debugger (pdb) - [blog post](https://www.blog.pythonlibrary.org/2018/10/17/jupyter-notebook-debugging/), [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), [cheatsheet](https://nblock.org/2011/11/15/pdb-cheatsheet/)
+[Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
 [ipyflow](https://github.com/ipyflow/ipyflow) - IPython kernel for Jupyter with additional features.
-[pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator.
 [nteract](https://nteract.io/) - Open Jupyter Notebooks with doubleclick.
 [papermill](https://github.com/nteract/papermill) - Parameterize and execute Jupyter notebooks, [tutorial](https://pbpython.com/papermil-rclone-report-1.html).
 [nbdime](https://github.com/jupyter/nbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https://www.reviewnb.com/).
 [RISE](https://github.com/damianavila/RISE) - Turn Jupyter notebooks into presentations.
 [qgrid](https://github.com/quantopian/qgrid) - Pandas `DataFrame` sorting.
-[pivottablejs](https://github.com/nicolaskruchten/jupyter_pivottablejs) - Drag n drop Pivot Tables and Charts for Jupyter notebooks.
+[lux](https://github.com/lux-org/lux) - DataFrame visualization within Jupyter.
+[pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames.
+[dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter.
 [itables](https://github.com/mwouts/itables) - Interactive tables in Jupyter.
-[jupyter-datatables](https://github.com/CermakM/jupyter-datatables) - Interactive tables in Jupyter.
-[debugger](https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559) - Visual debugger for Jupyter.
-[nbcommands](https://github.com/vinayak-mehta/nbcommands) - View and search notebooks from terminal.
 [handcalcs](https://github.com/connorferster/handcalcs) - More convenient way of writing mathematical equations in Jupyter.
 [notebooker](https://github.com/man-group/notebooker) - Productionize and schedule Jupyter Notebooks.
 [bamboolib](https://github.com/tkrabel/bamboolib) - Intuitive GUI for tables.
 [voila](https://github.com/QuantStack/voila) - Turn Jupyter notebooks into standalone web applications.
 [voila-gridstack](https://github.com/voila-dashboards/voila-gridstack) - Voila grid layout.

-#### Pandas Tricks, Alternatives and Additions
-[Pandas Tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431)
-[Using df.pipe() (video)](https://www.youtube.com/watch?v=yXGCKqo5cEY)
-[pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks.
-[modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`.
-[vaex](https://github.com/vaexio/vaex) - Out-of-Core DataFrames.
-[pandarallel](https://github.com/nalepae/pandarallel) - Parallelize pandas operations.
-[xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays.
-[swifter](https://github.com/jmcarpenter2/swifter) - Apply any function to a pandas DataFrame faster.
-[pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`.
-[pandas-log](https://github.com/eyaltrabelsi/pandas-log) - Find business logic issues and performance issues in pandas.
-[lux](https://github.com/lux-org/lux) - DataFrame visualization within Jupyter.
-[dtale](https://github.com/man-group/dtale) - View and analyze Pandas data structures, integrating with Jupyter.
-[polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas.
-[duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame.
-
-#### Scikit-Learn Alternatives
-[scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex) - Intel extension for scikit-learn for speed.
-
-#### Helpful
-[drawdata](https://github.com/koaning/drawdata) - Quickly draw some points and export them as csv, [website](https://drawdata.xyz/).
-[tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480).
-[icecream](https://github.com/gruns/icecream) - Simple debugging output.
-[loguru](https://github.com/Delgan/loguru) - Python logging.
-[pyprojroot](https://github.com/chendaniely/pyprojroot) - Helpful `here()` command from R.
-[intake](https://github.com/intake/intake) - Loading datasets made easier, [talk](https://www.youtube.com/watch?v=s7Ww5-vD2Os&t=33m40s).
-
 #### Extraction
 [textract](https://github.com/deanmalmgren/textract) - Extract text from any document.
-[camelot](https://github.com/socialcopsdev/camelot) - Extract text from PDF.

 #### Big Data
 [spark](https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html#work-with-dataframes) - `DataFrame` for big data, [cheatsheet](https://gist.github.com/crawles/b47e23da8218af0b9bd9d47f5242d189), [tutorial](https://github.com/ericxiao251/spark-syntax).
-[sparkit-learn](https://github.com/lensacom/sparkit-learn), [spark-deep-learning](https://github.com/databricks/spark-deep-learning) - ML frameworks for spark.
-[koalas](https://github.com/databricks/koalas) - Pandas API on Apache Spark.
 [dask](https://github.com/dask/dask), [dask-ml](http://ml.dask.org/) - Pandas `DataFrame` for big data and machine learning library, [resources](https://matthewrocklin.com/blog//work/2018/07/17/dask-dev), [talk1](https://www.youtube.com/watch?v=ccfsbuqsjgI), [talk2](https://www.youtube.com/watch?v=RA_2qdipVng), [notebooks](https://github.com/dask/dask-ec2/tree/master/notebooks), [videos](https://www.youtube.com/user/mdrocklin).
-[dask-gateway](https://github.com/jcrist/dask-gateway) - Managing dask clusters.
-[turicreate](https://github.com/apple/turicreate) - Helpful `SFrame` class for out-of-memory dataframes.
 [h2o](https://github.com/h2oai/h2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes.
 [datatable](https://github.com/h2oai/datatable) - Data Table for big data support.
 [cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library, [Intro](https://www.youtube.com/watch?v=6XzS5XcpicM&t=2m50s).
+[cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA.
 [ray](https://github.com/ray-project/ray/) - Flexible, high-performance distributed execution framework.
-[mars](https://github.com/mars-project/mars) - Tensor-based unified framework for large-scale data computation.
 [bottleneck](https://github.com/kwgoodman/bottleneck) - Fast NumPy array functions written in C.
-[bolz](https://github.com/Blosc/bcolz) - A columnar data container that can be compressed.
-[cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA.
 [petastorm](https://github.com/uber/petastorm) - Data access library for parquet files by Uber.
 [zarr](https://github.com/zarr-developers/zarr-python) - Distributed NumPy arrays.
 [NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by Nvidia.
 [tensorstore](https://github.com/google/tensorstore) - Reading and writing large multi-dimensional arrays (Google).
-#### Distributed Systems
-[nextflow](https://github.com/goodwright/nextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https://github.com/nextflow-io/nextflow).
-[dsub](https://github.com/DataBiosphere/dsub) - Run batch computing tasks in Docker image in the Google Cloud.
-
 #### Command line tools, CSV
-[ni](https://github.com/spencertipping/ni) - Command line tool for big data.
-[xsv](https://github.com/BurntSushi/xsv) - Command line tool for indexing, slicing, analyzing, splitting and joining CSV files.
-[csvkit](https://csvkit.readthedocs.io/en/1.0.3/) - Another command line tool for CSV files.
+[csvkit](https://github.com/wireservice/csvkit) - Command line tool for CSV files.
 [csvsort](https://pypi.org/project/csvsort/) - Sort large csv files.
-[tsv-utils](https://github.com/eBay/tsv-utils) - Tools for working with CSV files by eBay.
-[cheat](https://github.com/cheat/cheat) - Make cheatsheets for command line commands.

 #### Classical Statistics

@@ -121,7 +94,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [torch-two-sample](https://github.com/josipd/torch-two-sample) - Friedman-Rafsky Test: Compare two population based on a multivariate generalization of the Runstest. [Explanation](https://www.real-statistics.com/multivariate-statistics/multivariate-normal-distribution/friedman-rafsky-test/), [Application](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5014134/)

 ##### Interim Analyses / Sequential Analysis / Stopping
- [Sequential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia.
+[Sequential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia.
 [Treatment Effects Monitoring](https://online.stat.psu.edu/stat509/node/75/) - Design and Analysis of Clinical Trials PennState.
 [sequential](https://cran.r-project.org/web/packages/Sequential/Sequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package).
 [confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values.
@@ -167,17 +140,13 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal

 #### Exploration and Cleaning
 [Checklist](https://github.com/r0f1/ml_checklist).
-[pandasgui](https://github.com/adamerose/pandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames.
-[janitor](https://pyjanitor.readthedocs.io/) - Clean messy column names.
+[pyjanitor](https://github.com/pyjanitor-devs/pyjanitor) - Clean messy column names.
 [pandera](https://github.com/unionai-oss/pandera) - Data / Schema validation.
 [impyute](https://github.com/eltonlaw/impyute) - Imputations.
 [fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms.
 [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Resampling for imbalanced datasets.
 [tspreprocess](https://github.com/MaxBenChrist/tspreprocess) - Time series preprocessing: Denoising, Compression, Resampling.
 [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Utility functions (`OneHotEncoder(min_obs=100)`)
-[pyupset](https://github.com/ImSoErgodic/py-upset) - Visualizing intersecting sets.
-[pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance / Wasserstein distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html), [POT implementation](https://pythonot.github.io/auto_examples/plot_OT_2D_samples.html)
-[littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) - Sampling from graphs.

 #### Noisy Labels
 [cleanlab](https://github.com/cleanlab/cleanlab) - Machine learning with noisy labels, finding mislabelled data, and uncertainty quantification. Also see awesome list below.
@@ -187,11 +156,11 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [iterative-stratification](https://github.com/trent-b/iterative-stratification) - Stratification of multilabel data.

 #### Feature Engineering
-[Talk](https://www.youtube.com/watch?v=68ABAU_V8qI)
+[Vincent Warmerdam: Untitled12.ipynb](https://www.youtube.com/watch?v=yXGCKqo5cEY) - Using df.pipe()
+[Vincent Warmerdam: Winning with Simple, even Linear, Models](https://www.youtube.com/watch?v=68ABAU_V8qI)
 [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) - Pipeline, [examples](https://github.com/jem1031/pandas-pipelines-custom-transformers).
 [pdpipe](https://github.com/shaypal5/pdpipe) - Pipelines for DataFrames.
 [scikit-lego](https://github.com/koaning/scikit-lego) - Custom transformers for pipelines.
-[skoot](https://github.com/tgsmith61591/skoot) - Pipeline helper functions.
 [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - Categorical encoding of variables, [vtreat (R package)](https://cran.r-project.org/web/packages/vtreat/vignettes/vtreat.html).
 [dirty_cat](https://github.com/dirty-cat/dirty_cat) - Encoding dirty categorical variables.
 [patsy](https://github.com/pydata/patsy/) - R-like syntax for statistical models.
@@ -200,7 +169,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering.
 [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines.
 [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc.
-[NVTabular](https://github.com/NVIDIA/NVTabular) - Feature engineering and preprocessing library for tabular data by Nvidia.
 #### Computer Vision
 [Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p)
@@ -214,7 +182,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 #### Image Cleanup
 [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method.
 [aydin](https://github.com/royerlab/aydin) - Image denoising.
-[unprocessing](https://github.com/timothybrooks/unprocessing) - Image denoising by reverting the image processing pipeline.

 #### Microscopy / Segmentation

@@ -373,10 +340,6 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M
 [linearsdr](https://github.com/HarrisQ/linearsdr) - Linear Sufficient Dimension Reduction (R package).
 [PHATE](https://github.com/KrishnaswamyLab/PHATE) - Tool for visualizing high dimensional data.

-#### Training-related
-[iterative-stratification](https://github.com/trent-b/iterative-stratification) - Cross validators with stratification for multilabel data.
-[livelossplot](https://github.com/stared/livelossplot) - Live training loss plot in Jupyter Notebook.
-
 #### Visualization
 [All charts](https://datavizproject.com/), [Austrian monuments](https://github.com/njanakiev/austrian-monuments-visualization).
 [Better heatmaps and correlation plots](https://towardsdatascience.com/better-heatmaps-and-correlation-matrix-plots-in-python-41445d0f2bec).
@@ -455,12 +418,10 @@ Predict economic indicators from Open Street Map [ipynb](https://github.com/njan
 #### Recommender Systems
 Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/), [2](https://medium.com/@james_aka_yale/the-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223), [2-ipynb](https://github.com/khanhnamle1994/movielens/blob/master/Content_Based_and_Collaborative_Filtering_Models.ipynb), [3](https://www.kaggle.com/morrisb/how-to-recommend-anything-deep-recommender).
 [surprise](https://github.com/NicolasHug/Surprise) - Recommender, [talk](https://www.youtube.com/watch?v=d7iIb_XVkZs).
-[turicreate](https://github.com/apple/turicreate) - Recommender.
 [implicit](https://github.com/benfred/implicit) - Fast Collaborative Filtering for Implicit Feedback Datasets.
 [spotlight](https://github.com/maciejkula/spotlight) - Deep recommender models using PyTorch.
 [lightfm](https://github.com/lyst/lightfm) - Recommendation algorithms for both implicit and explicit feedback.
 [funk-svd](https://github.com/gbolmier/funk-svd) - Fast SVD.
-[pywFM](https://github.com/jfloff/pywFM) - Factorization.

 #### Decision Tree Models
 [Intro to Decision Trees and Random Forests](https://victorzhou.com/blog/intro-to-random-forests/), Intro to Gradient Boosting [1](https://explained.ai/gradient-boosting/), [2](https://www.gormanalysis.com/blog/gradient-boosting-explained/), [Decision Tree Visualization](https://explained.ai/decision-tree-viz/index.html)
@@ -468,22 +429,15 @@ Examples: [1](https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-
 [xgboost](https://github.com/dmlc/xgboost) - Gradient boosting (GBDT, GBRT or GBM) library, [doc](https://sites.google.com/view/lauraepp/parameters), Methods for CIs: [link1](https://stats.stackexchange.com/questions/255783/confidence-interval-for-xgb-forecast), [link2](https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b).
 [catboost](https://github.com/catboost/catboost) - Gradient boosting.
 [h2o](https://github.com/h2oai/h2o-3) - Gradient boosting and general machine learning framework.
-[snapml](https://www.zurich.ibm.com/snapml/) - Gradient boosting and general machine learning framework by IBM, for CPU and GPU. [PyPI](https://pypi.org/project/snapml/)
 [pycaret](https://github.com/pycaret/pycaret) - Wrapper for xgboost, lightgbm, catboost etc.
-[thundergbm](https://github.com/Xtra-Computing/thundergbm) - GBDTs and Random Forest.
-[h2o](https://github.com/h2oai/h2o-3) - Gradient boosting. [forestci](https://github.com/scikit-learn-contrib/forest-confidence-interval) - Confidence intervals for random forests. -[scikit-garden](https://github.com/scikit-garden/scikit-garden) - Quantile Regression. [grf](https://github.com/grf-labs/grf) - Generalized random forest. [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation. [Nuance](https://github.com/SauceCat/Nuance) - Decision tree visualization. [rfpimp](https://github.com/parrt/random-forest-importances) - Feature Importance for RandomForests using Permuation Importance. Why the default feature importance for random forests is wrong: [link](http://explained.ai/rf-importance/index.html) -[treeinterpreter](https://github.com/andosa/treeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions. [bartpy](https://github.com/JakeColtman/bartpy) - Bayesian Additive Regression Trees. -[infiniteboost](https://github.com/arogozhnikov/infiniteboost) - Combination of RFs and GBDTs. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) -[rrcf](https://github.com/kLabUM/rrcf) - Robust Random Cut Forest algorithm for anomaly detection on streams. [groot](https://github.com/tudelft-cda-lab/GROOT) - Robust decision trees. [linear-tree](https://github.com/cerlymarco/linear-tree) - Trees with linear models at the leaves. @@ -504,14 +458,10 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics, [example](https://github.com/mapequation/infomap/blob/master/examples/python/infomap-examples.ipynb). [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog). 
[flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. -[stanfordnlp](https://github.com/stanfordnlp/stanfordnlp) - NLP Library. +[stanza](https://github.com/stanfordnlp/stanza) - NLP Library. [Chatistics](https://github.com/MasterScrat/Chatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames. -[textvec](https://github.com/textvec/textvec) - Supervised text vectorization tool. [textdistance](https://github.com/life4/textdistance) - Collection for comparing distances between two or more sequences. -##### Papers -[Search Engine Correlation](https://arxiv.org/pdf/1107.2691.pdf) - #### Biology / Bioinformatics ##### Assay @@ -538,8 +488,6 @@ See also Microscopy Section above. [scimap](https://github.com/labsyspharm/scimap) - Spatial Single-Cell Analysis Toolkit. [CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis. [imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf). -[microscopium](https://github.com/microscopium/microscopium) - Unsupervised clustering of images + viewer, [talk](https://www.youtube.com/watch?v=ytEQl9xs8FQ). -[cytokit](https://github.com/hammerlab/cytokit) - Analyzing properties of cells in fluorescent microscopy datasets. ##### Drug discovery [TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. @@ -651,7 +599,6 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Image Annotation [cvat](https://github.com/openvinotoolkit/cvat) - Image annotation tool. -[pigeon](https://github.com/agermanidis/pigeon) - Create annotations from within a Jupyter notebook. ##### Image Classification [nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. 
@@ -904,6 +851,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [banpei](https://github.com/tsurubee/banpei) - Anomaly detection library based on singular spectrum transformation. [telemanom](https://github.com/khundman/telemanom) - Detect anomalies in multivariate time series data using LSTMs. [luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. +[rrcf](https://github.com/kLabUM/rrcf) - Robust Random Cut Forest algorithm for anomaly detection on streams. #### Concept Drift & Domain Shift [TorchDrift](https://github.com/TorchDrift/TorchDrift) - Drift Detection for PyTorch Models. @@ -915,9 +863,6 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Ranking [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear classification, regression and ranking. -#### Scoring -[SLIM](https://github.com/ustunb/slim-python) - Scoring systems for classification, Supersparse linear integer models. - #### Causal Inference [CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). @@ -975,10 +920,6 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [awesome-conformal-prediction](https://github.com/valeman/awesome-conformal-prediction) - Uncertainty quantification. 
[uncertainty-toolbox](https://github.com/uncertainty-toolbox/uncertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization. -#### Interpretable Classifiers and Regressors -[skope-rules](https://github.com/scikit-learn-contrib/skope-rules) - Interpretable classifier, IF-THEN rules. -[sklearn-expertsys](https://github.com/tmadl/sklearn-expertsys) - Interpretable classifiers, Bayesian Rule List classifier. - #### Model Explanation, Interpretability, Feature Importance [Princeton - Reproducibility Crisis in ML‑based Science](https://sites.google.com/princeton.edu/rep-workshop) [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python) @@ -992,9 +933,6 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [pycebox](https://github.com/AustinRochford/PyCEbox) - Individual Conditional Expectation Plot Toolbox. [pdpbox](https://github.com/SauceCat/PDPbox) - Partial dependence plot toolbox, [example](https://www.kaggle.com/dansbecker/partial-plots). [partial_dependence](https://github.com/nyuvis/partial_dependence) - Visualize and cluster partial dependence. -[skater](https://github.com/datascienceinc/Skater) - Unified framework to enable model interpretation. -[anchor](https://github.com/marcotcr/anchor) - High-Precision Model-Agnostic Explanations for classifiers. -[l2x](https://github.com/Jianbo-Lab/L2X) - Instancewise feature selection as methodology for model interpretation. [contrastive_explanation](https://github.com/MarcelRobeer/ContrastiveExplanation) - Contrastive explanations. [DrWhy](https://github.com/ModelOriented/DrWhy) - Collection of tools for explainable AI. [lucid](https://github.com/tensorflow/lucid) - Neural network interpretability. 
@@ -1009,10 +947,8 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin #### Automated Machine Learning [AdaNet](https://github.com/tensorflow/adanet) - Automated machine learning based on TensorFlow. [tpot](https://github.com/EpistasisLab/tpot) - Automated machine learning tool, optimizes machine learning pipelines. -[auto_ml](https://github.com/ClimbsRocks/auto_ml) - Automated machine learning for analytics & production. [autokeras](https://github.com/jhfjhfj1/autokeras) - AutoML for deep learning. [nni](https://github.com/Microsoft/nni) - Toolkit for neural architecture search and hyper-parameter tuning by Microsoft. -[automl-gs](https://github.com/minimaxir/automl-gs) - Automated machine learning. [mljar](https://github.com/mljar/mljar-supervised) - Automated machine learning. [automl_zero](https://github.com/google-research/google-research/tree/master/automl_zero) - Automatically discover computer programs that can solve machine learning tasks from Google. [AlphaPy](https://github.com/ScottfreeLLC/AlphaPy) - Automated Machine Learning using scikit-learn xgboost, LightGBM and others. @@ -1056,7 +992,6 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645 sklearn - [PassiveAggressiveClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html). [river](https://github.com/online-ml/river) - Online machine learning. [Kaggler](https://github.com/jeongyoonlee/Kaggler) - Online Learning algorithms. -[onelearn](https://github.com/onelearn/onelearn) - Online Random Forests. 
#### Active Learning [Talk](https://www.youtube.com/watch?v=0efyjq5rWS4) @@ -1072,6 +1007,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe #### Deployment and Lifecycle Management ##### Workflow Scheduling and Orchestration +[nextflow](https://github.com/goodwright/nextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https://github.com/nextflow-io/nextflow). [airflow](https://github.com/apache/airflow) - Schedule and monitor workflows. [prefect](https://github.com/PrefectHQ/prefect) - Python specific workflow scheduling. [dagster](https://github.com/dagster-io/dagster) - Development, production and observation of data assets. @@ -1085,9 +1021,6 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [Optimize Docker Image Size](https://www.augmentedmind.de/2022/02/06/optimize-docker-image-size/) [cog](https://github.com/replicate/cog) - Facilitates building Docker images. -##### Dependency Management -[poetry](https://github.com/python-poetry/poetry) - Dependency management. - ##### Data Versioning, Databases, Pipelines and Model Serving [dvc](https://github.com/iterative/dvc) - Version control for large files. [kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines. @@ -1119,20 +1052,6 @@ Gilbert Strang - [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06- Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning ](https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/) -#### Other -[daft](https://github.com/dfm/daft) - Render probabilistic graphical models using matplotlib. -[unyt](https://github.com/yt-project/unyt) - Working with units. -[scrapy](https://github.com/scrapy/scrapy) - Web scraping library. -[VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) - ML Toolkit from Microsoft. 
-[Python Record Linkage Toolkit](https://github.com/J535D165/recordlinkage) - link records in or between data sources. - -#### General Python Programming -[more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. -[funcy](https://github.com/Suor/funcy) - Fancy and practical functional tools. -[dateparser](https://dateparser.readthedocs.io/en/latest/) - A better date parser. -[jellyfish](https://github.com/jamesturk/jellyfish) - Approximate string matching. -[coloredlogs](https://github.com/xolox/python-coloredlogs) - Colored logging output. - #### Resources [Distill.pub](https://distill.pub/) - Blog. [Machine Learning Videos](https://github.com/dustinvtran/ml-videos) From e3b8592236cea423648e09a06d4cc21a37099352 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 10 May 2023 19:58:40 +0200 Subject: [PATCH 427/550] hydra --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index a9c8609..a481232 100644 --- a/README.md +++ b/README.md @@ -16,9 +16,10 @@ [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). [loguru](https://github.com/Delgan/loguru) - Python logging. +[dateparser](https://github.com/scrapinghub/dateparser) - A better date parser. +[hydra](https://github.com/facebookresearch/hydra) - Configuration management. [pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. [poetry](https://github.com/python-poetry/poetry) - Dependency management. -[dateparser](https://github.com/scrapinghub/dateparser) - A better date parser. #### Pandas Tricks, Alternatives and Additions [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. 
From 1e1d6408d9deccbea91047646a4a887928e51e07 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 May 2023 13:37:03 +0200 Subject: [PATCH 428/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a481232..21bbe41 100644 --- a/README.md +++ b/README.md @@ -217,7 +217,7 @@ REMBI model - Recommended Metadata for Biological Images ##### Segmentation [Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). -[MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. +[MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. 
From acea859443431edded51afd969396ec479c0bd08 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 May 2023 15:17:57 +0200 Subject: [PATCH 429/550] quartets --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 21bbe41..76b76a9 100644 --- a/README.md +++ b/README.md @@ -130,6 +130,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) +[Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https://www.youtube.com/watch?v=DbJyPELmhJc) #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). [Github](https://github.com/reconhub) @@ -138,6 +139,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [researchpy](https://github.com/researchpy/researchpy) - Helpful `summary_cont()` function for summary statistics (Table 1). [zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists). [tipr](https://github.com/LucyMcGowan/tipr) - Sensitivity analyses for unmeasured confounders (R package). +[quartets](https://github.com/r-causal/quartets) - Anscombe’s Quartet, Causal Quartet, [Datasaurus Dozen](https://github.com/jumpingrivers/datasauRus) and others (R package). 
#### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). From 5c58cd0b2132139d37feba7bcea1b3a8dc9363d0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 May 2023 08:07:22 +0200 Subject: [PATCH 430/550] fractal --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 76b76a9..91abcce 100644 --- a/README.md +++ b/README.md @@ -211,7 +211,10 @@ REMBI model - Recommended Metadata for Biological Images * BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/) * [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919) -##### Labsyspharm +##### Platforms and Pipelines +[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. + +###### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). [MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. [cylinter](https://github.com/labsyspharm/cylinter) - Quality assurance for microscopy images, [Website](https://labsyspharm.github.io/cylinter/). 
From 3148411885676f12bcf4fc85387720bf8e230218 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 May 2023 14:53:33 +0200 Subject: [PATCH 431/550] Update README.md --- README.md | 216 ++++++++++++++++++++++++++---------------------------- 1 file changed, 104 insertions(+), 112 deletions(-) diff --git a/README.md b/README.md index 91abcce..d1efea0 100644 --- a/README.md +++ b/README.md @@ -176,98 +176,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Computer Vision [Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p) -#### Image Viewers -[fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. -[Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing package. -[napari](https://github.com/napari/napari) - Multi-dimensional image viewer. -[OMERO](https://www.openmicroscopy.org/omero/) - Feature rich image viewer for high-content screening. [IDR](https://idr.openmicroscopy.org/) uses OMERO. [Intro](https://www.youtube.com/watch?v=nSCrMO_c-5s) - -#### Image Cleanup -[DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. -[aydin](https://github.com/royerlab/aydin) - Image denoising. 
- -#### Microscopy / Segmentation - -##### Tutorials -[Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. -[python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. -[Bioimaging and Bioimage Analysis Guide](https://www.bioimagingguide.org/welcome.html) - -##### Datasets -[jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset. -[MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. -[CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images. -[Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles. -[broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay. 
- -##### Data Formats and Converters -OME-Zarr - [paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [standard](https://ngff.openmicroscopy.org/latest/) -[bioformats2raw](https://github.com/glencoesoftware/bioformats2raw) - Various formats to zarr. -[raw2ometiff](https://github.com/glencoesoftware/raw2ometiff) - Zarr to tiff. -[BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c). -[napari](https://napari.org/stable) - Viewer for various image formats. -[vizarr](https://github.com/hms-dbmi/vizarr) - Viewer for zarr files. -REMBI model - Recommended Metadata for Biological Images - * BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/) - * [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919) - -##### Platforms and Pipelines -[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. - -###### Labsyspharm -[mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). -[MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. -[cylinter](https://github.com/labsyspharm/cylinter) - Quality assurance for microscopy images, [Website](https://labsyspharm.github.io/cylinter/). -[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. 
- -##### Segmentation -[Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). -[MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. -[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). -[stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. -[UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. -[nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. -[allencell](https://www.allencell.org/segmenter.html) - Tools for 3D segmentation, classical and deep learning methods. -[Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. -[ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. - -##### Cell Segmentation Datasets -[cellpose](https://www.cellpose.org/dataset) - Cell images. -[omnipose](http://www.cellpose.org/dataset_omnipose) - Cell images. -[LIVECell](https://github.com/sartorius-research/LIVECell) - Cell images. -[Sartorius](https://www.kaggle.com/competitions/sartorius-cell-instance-segmentation/overview) - Neurons. - -##### Packages -[Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) -[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. -[seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). -[skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). 
-[cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. -[BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). -[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Image denoising, restoration and object detection, [Project page](https://csbdeep.bioimagecomputing.com/tools/). -[atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. -[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. -[cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. -Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). -Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). - -#### Domain Adaptation / Batch-Effect Correction -[Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). -[R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html). -[harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. -[pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [Example](https://github.com/welch-lab/pyliger/blob/master/pyliger/factorization/_iNMF_ANLS.py#L65), [R package](https://github.com/welch-lab/liger). -[nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. 
-[scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/). -[CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/). -[adapt](https://github.com/adapt-python/adapt) - Awesome Domain Adaptation Python Toolbox. -[pytorch-adapt](https://github.com/KevinMusgrave/pytorch-adapt) - Various neural network models for domain adaptation. - -#### Feature Engineering Images -[skimage](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. -[mahotas](https://github.com/luispedro/mahotas) - Zernike, Haralick, LBP, and TAS features. -[pyradiomics](https://github.com/AIM-Harvard/pyradiomics) - Radiomics features from medical imaging. -[pyefd](https://github.com/hbldh/pyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series. 
- #### Feature Selection [Overview Paper](https://www.sciencedirect.com/science/article/pii/S016794731930194X), [Talk](https://www.youtube.com/watch?v=JsArBz46_3s), [Repo](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection) Blog post series - [1](http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/), [2](http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/), [3](http://blog.datadive.net/selecting-good-features-part-iii-random-forests/), [4](http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/) @@ -468,15 +376,114 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [Chatistics](https://github.com/MasterScrat/Chatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames. [textdistance](https://github.com/life4/textdistance) - Collection for comparing distances between two or more sequences. -#### Biology / Bioinformatics +#### Bio Image Analysis +[Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) + +##### Tutorials +[Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. 
+[python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. +[Bioimaging and Bioimage Analysis Guide](https://www.bioimagingguide.org/welcome.html) + +##### Datasets +[jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset. +[MedMNIST](https://github.com/MedMNIST/MedMNIST) - Datasets for 2D and 3D Biomedical Image Classification. +[CytoImageNet](https://github.com/stan-hua/CytoImageNet) - Huge diverse dataset like ImageNet but for cell images. +[Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles. +[broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay. -##### Assay +#### Assay +[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. [PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). -##### Biostatistics / Robust statistics +#### Biostatistics / Robust statistics +[Z-factor](https://en.wikipedia.org/wiki/Z-factor) - Measure of statistical effect size. [MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [App1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). 
-[winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. [moderated z-score](https://clue.io/connectopedia/replicate_collapse) - Weighted average of z-scores based on Spearman correlation. +[winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. + +##### Data Formats and Converters +OME-Zarr - [paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [standard](https://ngff.openmicroscopy.org/latest/) +[bioformats2raw](https://github.com/glencoesoftware/bioformats2raw) - Various formats to zarr. +[raw2ometiff](https://github.com/glencoesoftware/raw2ometiff) - Zarr to tiff. +[BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c). +REMBI model - Recommended Metadata for Biological Images + * BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/) + * [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919) + +#### Image Viewers +[vizarr](https://github.com/hms-dbmi/vizarr) - Browser-based image viewer for zarr format. +[avivator](https://github.com/hms-dbmi/viv) - Browser-based image viewer for tiff files. +[napari](https://github.com/napari/napari) - Image viewer and image processing tool. +[Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing tool. +[OMERO](https://www.openmicroscopy.org/omero/) - Image viewer for high-content screening. 
[IDR](https://idr.openmicroscopy.org/) uses OMERO. [Intro](https://www.youtube.com/watch?v=nSCrMO_c-5s) +[fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. + +##### Image Restoration and Denoising +[aydin](https://github.com/royerlab/aydin) - Image denoising. +[DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. +[CSBDeep](https://github.com/CSBDeep/CSBDeep) - Content-aware image restoration, [Project page](https://csbdeep.bioimagecomputing.com/tools/). + +##### Illumination correction + Bleed through correction +[skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). +[cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. +[BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). +[cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. +Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). +Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). + +##### Platforms and Pipelines +[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. +[atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. +[py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. 
+ +##### Labsyspharm +[mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). +[MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. +[cylinter](https://github.com/labsyspharm/cylinter) - Quality assurance for microscopy images, [Website](https://labsyspharm.github.io/cylinter/). +[ashlar](https://github.com/labsyspharm/ashlar) - Whole-slide microscopy image stitching and registration. +[scimap](https://github.com/labsyspharm/scimap) - Spatial Single-Cell Analysis Toolkit. + +##### Cell Segmentation +[Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). +[MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. +[cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). +[stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. +[UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. +[nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. +[allencell](https://www.allencell.org/segmenter.html) - Tools for 3D segmentation, classical and deep learning methods. +[Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. +[ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. +[EmbedSeg](https://github.com/juglab/EmbedSeg) - Embedding-based Instance Segmentation. + +##### Cell Segmentation Datasets +[cellpose](https://www.cellpose.org/dataset) - Cell images. +[omnipose](http://www.cellpose.org/dataset_omnipose) - Cell images. 
+[LIVECell](https://github.com/sartorius-research/LIVECell) - Cell images. +[Sartorius](https://www.kaggle.com/competitions/sartorius-cell-instance-segmentation/overview) - Neurons. +[EmbedSeg](https://github.com/juglab/EmbedSeg/releases/tag/v0.1.0) - 2D + 3D images. + +##### Evaluation +[seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). + +##### Feature Engineering Images +[Computer vision challenges in drug discovery - Maciej Hermanowicz](https://www.youtube.com/watch?v=Y5GJmnIhvFk) +[CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis. +[scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. +[scikit-image regionprops](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent. +[mahotas](https://github.com/luispedro/mahotas) - Zernike, Haralick, LBP, and TAS features, [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb). +[pyradiomics](https://github.com/AIM-Harvard/pyradiomics) - Radiomics features from medical imaging. +[pyefd](https://github.com/hbldh/pyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series. + +#### Domain Adaptation / Batch-Effect Correction +[Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). +[R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html). +[harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. 
+[pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [Example](https://github.com/welch-lab/pyliger/blob/master/pyliger/factorization/_iNMF_ANLS.py#L65), [R package](https://github.com/welch-lab/liger). +[nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. +[scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/). +[CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/). +[adapt](https://github.com/adapt-python/adapt) - Awesome Domain Adaptation Python Toolbox. +[pytorch-adapt](https://github.com/KevinMusgrave/pytorch-adapt) - Various neural network models for domain adaptation. ##### Sequencing [Single cell tutorial](https://github.com/theislab/single-cell-tutorial). @@ -487,28 +494,13 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [janggu](https://github.com/BIMSBbioinfo/janggu) - Deep Learning for Genomics. [gdsctools](https://github.com/CancerRxGene/gdsctools) - Drug responses in the context of the Genomics of Drug Sensitivity in Cancer project, ANOVA, IC50, MoBEM, [doc](https://gdsctools.readthedocs.io/en/master/). -##### Image-related -See also Microscopy Section above. -[mahotas](http://luispedro.org/software/mahotas/) - Image processing (Bioinformatics), [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb). -[imagepy](https://github.com/Image-Py/imagepy) - Software package for bioimage analysis. -[scimap](https://github.com/labsyspharm/scimap) - Spatial Single-Cell Analysis Toolkit. 
-[CellProfiler](https://github.com/CellProfiler/CellProfiler) - Biological image analysis. -[imglyb](https://github.com/imglib/imglyb) - Viewer for large images, [talk](https://www.youtube.com/watch?v=Ddo5z5qGMb8), [slides](https://github.com/hanslovsky/scipy-2019/blob/master/scipy-2019-imglyb.pdf). - ##### Drug discovery [TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modelling and Prediction Toolkit. -##### Courses -[mit6874](https://mit6874.github.io/) - Computational Systems Biology: Deep Learning in the Life Sciences. - -#### Image Processing -[Talk](https://www.youtube.com/watch?v=Y5GJmnIhvFk) -[cv2](https://github.com/skvark/opencv-python) - OpenCV, classical algorithms: [Gaussian Filter](https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html), [Morphological Transformations](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html). -[scikit-image](https://github.com/scikit-image/scikit-image) - Image processing. - #### Neural Networks [Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. +[mit6874](https://mit6874.github.io/) - Computational Systems Biology: Deep Learning in the Life Sciences. [ConvNet Shape Calculator](https://madebyollin.github.io/convnet-calculator/) - Calculate output dimensions of Conv2D layer. [Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9). [Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html). 
From 111f878572847925851aadc3770fe274ce77281e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 May 2023 14:55:07 +0200 Subject: [PATCH 432/550] Update README.md --- README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/README.md b/README.md index d1efea0..55fcb7e 100644 --- a/README.md +++ b/README.md @@ -595,9 +595,6 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [Detic](https://github.com/facebookresearch/Detic) - Detector with image classes that can use image-level labels (facebookresearch). [EasyCV](https://github.com/alibaba/EasyCV) - Image segmentation, classification, metric-learning, object detection, pose estimation. -##### Image Annotation -[cvat](https://github.com/openvinotoolkit/cvat) - Image annotation tool. - ##### Image Classification [nfnets](https://github.com/ypeleg/nfnets-keras) - Neural network. [efficientnet](https://github.com/lukemelas/EfficientNet-PyTorch) - Neural network. From 9b20a278619f27aa8518f2090f3fd3a00bbcb036 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 14 May 2023 14:56:38 +0200 Subject: [PATCH 433/550] Update README.md --- README.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 55fcb7e..6d3ca6e 100644 --- a/README.md +++ b/README.md @@ -405,10 +405,8 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www OME-Zarr - [paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [standard](https://ngff.openmicroscopy.org/latest/) [bioformats2raw](https://github.com/glencoesoftware/bioformats2raw) - Various formats to zarr. [raw2ometiff](https://github.com/glencoesoftware/raw2ometiff) - Zarr to tiff. -[BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c). 
-REMBI model - Recommended Metadata for Biological Images - * BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/) - * [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919) +[BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c). +REMBI model - Recommended Metadata for Biological Images, BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/), [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919) #### Image Viewers [vizarr](https://github.com/hms-dbmi/vizarr) - Browser-based image viewer for zarr format. From 63747612731d929bb5c58e3cce5c31f642bc55ab Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 15 May 2023 20:18:29 +0200 Subject: [PATCH 434/550] BioImage Model Zoo --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6d3ca6e..7166e5f 100644 --- a/README.md +++ b/README.md @@ -442,7 +442,8 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [scimap](https://github.com/labsyspharm/scimap) - Spatial Single-Cell Analysis Toolkit. ##### Cell Segmentation -[Overview](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). 
+[microscopy-tree](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). +[BioImage.IO](https://bioimage.io/#/) - BioImage Model Zoo. [MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. From 67f846fb3031d192e0700104d5729092c234015b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 17 May 2023 17:00:52 +0200 Subject: [PATCH 435/550] hatch and Microscopy Resolution Calculator --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 7166e5f..62820c0 100644 --- a/README.md +++ b/README.md @@ -20,6 +20,7 @@ [hydra](https://github.com/facebookresearch/hydra) - Configuration management. [pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. [poetry](https://github.com/python-poetry/poetry) - Dependency management. +[hatch](https://github.com/pypa/hatch) - Python project management. #### Pandas Tricks, Alternatives and Additions [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. @@ -391,8 +392,9 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles. [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay. -#### Assay +#### Microscopy + Assay [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. 
+[Microscopy Resolution Calculator](https://www.microscope.healthcare.nikon.com/microtools/resolution-calculator) - Calculate resolution of images (Nikon). [PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). #### Biostatistics / Robust statistics From e68f7cda82823492aa2f82bab37208eb07b48b7d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 18 May 2023 10:05:11 +0200 Subject: [PATCH 436/550] matrix data formats --- README.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 62820c0..45a6d90 100644 --- a/README.md +++ b/README.md @@ -392,24 +392,30 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [Haghighi](https://github.com/carpenterlab/2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles. [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. L1000 assay. -#### Microscopy + Assay -[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. -[Microscopy Resolution Calculator](https://www.microscope.healthcare.nikon.com/microtools/resolution-calculator) - Calculate resolution of images (Nikon). -[PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). - #### Biostatistics / Robust statistics [Z-factor](https://en.wikipedia.org/wiki/Z-factor) - Measure of statistical effect size. 
[MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [App1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). [moderated z-score](https://clue.io/connectopedia/replicate_collapse) - Weighted average of z-scores based on Spearman correlation. [winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. -##### Data Formats and Converters +#### Microscopy + Assay +[BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. +[Microscopy Resolution Calculator](https://www.microscope.healthcare.nikon.com/microtools/resolution-calculator) - Calculate resolution of images (Nikon). +[PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). + +##### Image Formats and Converters OME-Zarr - [paper](https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1.full), [standard](https://ngff.openmicroscopy.org/latest/) [bioformats2raw](https://github.com/glencoesoftware/bioformats2raw) - Various formats to zarr. [raw2ometiff](https://github.com/glencoesoftware/raw2ometiff) - Zarr to tiff. [BatchConvert](https://github.com/Euro-BioImaging/BatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https://www.youtube.com/watch?v=DeCWV274l0c). 
REMBI model - Recommended Metadata for Biological Images, BioImage Archive: [Study Component Guidance](https://www.ebi.ac.uk/bioimage-archive/rembi-help-examples/), [File List Guide](https://www.ebi.ac.uk/bioimage-archive/help-file-list/), [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606015/), [video](https://www.youtube.com/watch?v=GVmfOpuP2_c), [spreadsheet](https://docs.google.com/spreadsheets/d/1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo/edit#gid=1023506919) +##### Matrix Formats +[anndata](https://github.com/scverse/anndata) - annotated data matrices in memory and on disk, [Docs](https://anndata.readthedocs.io/en/latest/index.html). +[muon](https://github.com/scverse/muon) - Multimodal omics framework. +[mudata](https://github.com/scverse/mudata) - Multimodal Data (.h5mu) implementation. +[bdz](https://github.com/openssbd/bdz) - Zarr-based format for storing quantitative biological dynamics data. + #### Image Viewers [vizarr](https://github.com/hms-dbmi/vizarr) - Browser-based image viewer for zarr format. [avivator](https://github.com/hms-dbmi/viv) - Browser-based image viewer for tiff files. From f5bde1fdebeaefe48f4b8284a6459eb06fe7a736 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 May 2023 23:16:06 +0200 Subject: [PATCH 437/550] Thermofisher Spectrum Viewer --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 45a6d90..69cfc5a 100644 --- a/README.md +++ b/README.md @@ -400,6 +400,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Microscopy + Assay [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. +[Thermofisher Spectrum Viewer](https://www.thermofisher.com/order/stain-it) - Thermofisher Spectrum Viewer. 
[Microscopy Resolution Calculator](https://www.microscope.healthcare.nikon.com/microtools/resolution-calculator) - Calculate resolution of images (Nikon). [PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). From 5125f64bfa92203ffe1c05f0a1ef224b55fa9272 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 20 May 2023 17:41:26 +0200 Subject: [PATCH 438/550] Cleanup Dead Links --- README.md | 40 +++++++++++++++++----------------------- 1 file changed, 17 insertions(+), 23 deletions(-) diff --git a/README.md b/README.md index 69cfc5a..6f1e24a 100644 --- a/README.md +++ b/README.md @@ -86,7 +86,7 @@ [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) - Statistical tests. [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. -[ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html), Tutorials: [One-way](https://pythonfordatascience.org/anova-python/), [Two-way](https://pythonfordatascience.org/anova-2-way-n-way/), [Type 1,2,3 explained](https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). +[ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html) ##### Statistical Tests [test_proportions_2indep](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.test_proportions_2indep.html) - Proportion test. 
@@ -97,7 +97,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal ##### Interim Analyses / Sequential Analysis / Stopping [Sequential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia. -[Treatment Effects Monitoring](https://online.stat.psu.edu/stat509/node/75/) - Design and Analysis of Clinical Trials PennState. [sequential](https://cran.r-project.org/web/packages/Sequential/Sequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package). [confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values. @@ -126,9 +125,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) [Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) -[Chatruc - The Central Limit Theorem and its misuse](https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) +[Chatruc - The Central Limit Theorem and its misuse](https://web.archive.org/web/20191229234155/https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) [Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) -[Wainer - The Most Dangerous Equation](http://www-stat.wharton.upenn.edu/~hwainer/Readings/Most%20Dangerous%20eqn.pdf) +[Wainer - The Most Dangerous Equation](http://nsmn1.uh.edu/dgraur/niv/themostdangerousequation.pdf) [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) [Cook - Estimating the chances of something 
that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) [Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https://www.youtube.com/watch?v=DbJyPELmhJc) @@ -277,7 +276,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation. [chartify](https://github.com/spotify/chartify/) - Generate charts. [VivaGraphJS](https://github.com/anvaka/VivaGraphJS) - Graph visualization (JS package). -[pm](https://github.com/anvaka/pm) - Navigatable 3D graph visualization (JS package), [example](https://w2v-vis-dot-hcg-team-di.appspot.com/#/galaxy/word2vec?cx=5698&cy=-5135&cz=5923&lx=0.1127&ly=0.3238&lz=-0.1680&lw=0.9242&ml=150&s=1.75&l=1&v=hc). +[pm](https://github.com/anvaka/pm) - Navigatable 3D graph visualization (JS package). [python-ternary](https://github.com/marcharper/python-ternary) - Triangle plots. [falcon](https://github.com/uwdata/falcon) - Interactive visualizations for big data. [hiplot](https://github.com/facebookresearch/hiplot) - High dimensional Interactive Plotting. @@ -296,7 +295,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M #### Dashboards [py-shiny](https://github.com/rstudio/py-shiny) - Shiny for Python, [talk](https://www.youtube.com/watch?v=ijRBbtT2tgc). [superset](https://github.com/apache/superset) - Dashboarding solution by Apache. -[streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. 
[Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](https://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). +[streamlit](https://github.com/streamlit/streamlit) - Dashboarding solution. [Resources](https://github.com/marcskovmadsen/awesome-streamlit), [Gallery](http://awesome-streamlit.org/) [Components](https://www.streamlit.io/components), [bokeh-events](https://github.com/ash2shukla/streamlit-bokeh-events). [mercury](https://github.com/mljar/mercury) - Convert Python notebook to web app, [Example](https://github.com/pplonski/dashboard-python-jupyter-notebook). [dash](https://dash.plot.ly/gallery) - Dashboarding solution by plot.ly. [Resources](https://github.com/ucg8j/awesome-dash). [visdom](https://github.com/facebookresearch/visdom) - Dashboarding library by Facebook. @@ -316,7 +315,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [gmaps](https://github.com/pbugnion/gmaps) - Google Maps for Jupyter notebooks. [stadiamaps](https://stadiamaps.com/) - Plot geographical maps. [datashader](https://github.com/bokeh/datashader) - Draw millions of points on a map. -[sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html) - BallTree, [Example](https://tech.minodes.com/experiments-with-in-memory-spatial-radius-queries-in-python-e40c9e66cf63). +[sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html) - BallTree. [pynndescent](https://github.com/lmcinnes/pynndescent) - Nearest neighbor descent for approximate nearest neighbors. [geocoder](https://github.com/DenisCarriere/geocoder) - Geocoding of addresses, IP addresses. 
Conversion of different geo formats: [talk](https://www.youtube.com/watch?v=eHRggqAvczE), [repo](https://github.com/dillongardner/PyDataSpatialAnalysis) @@ -325,7 +324,7 @@ Low Level Geospatial Tools (GEOS, GDAL/OGR, PROJ.4) Vector Data (Shapely, Fiona, Pyproj) Raster Data (Rasterio) Plotting (Descartes, Catropy) -Predict economic indicators from Open Street Map [ipynb](https://github.com/njanakiev/osm-predict-economic-measurements/blob/master/osm-predict-economic-indicators.ipynb). +[Predict economic indicators from Open Street Map](https://janakiev.com/blog/osm-predict-economic-indicators/). [PySal](https://github.com/pysal/pysal) - Python Spatial Analysis Library. [geography](https://github.com/ushahidi/geograpy) - Extract countries, regions and cities from a URL or text. [cartogram](https://go-cart.io/cartogram) - Distorted maps based on population. @@ -370,7 +369,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [annoy](https://github.com/spotify/annoy) - Approximate nearest neighbor search. [faiss](https://github.com/facebookresearch/faiss) - Approximate nearest neighbor search. [pysparnn](https://github.com/facebookresearch/pysparnn) - Approximate nearest neighbor search. -[infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics, [example](https://github.com/mapequation/infomap/blob/master/examples/python/infomap-examples.ipynb). +[infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics. [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog). [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. [stanza](https://github.com/stanfordnlp/stanza) - NLP Library. 
@@ -486,7 +485,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). [R Tutorial on correcting batch effects](https://broadinstitute.github.io/2019_scWorkshop/correcting-batch-effects.html). [harmonypy](https://github.com/slowkow/harmonypy) - Fuzzy k-means and locally linear adjustments. -[pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [Example](https://github.com/welch-lab/pyliger/blob/master/pyliger/factorization/_iNMF_ANLS.py#L65), [R package](https://github.com/welch-lab/liger). +[pyliger](https://github.com/welch-lab/pyliger) - Batch-effect correction, [R package](https://github.com/welch-lab/liger). [nimfa](https://github.com/mims-harvard/nimfa) - Nonnegative matrix factorization. [scgen](https://github.com/theislab/scgen) - Batch removal. [Doc](https://scgen.readthedocs.io/en/stable/). [CORAL](https://github.com/google-research/google-research/tree/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https://github.com/google-research/google-research/blob/30e54523f08d963ced3fbb37c00e9225579d2e1d/correct_batch_effects_wdn/transform.py#L152), [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7050548/). @@ -514,11 +513,11 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html). ##### Tutorials & Viewer -fast.ai course - [Lessons 1-7](https://course.fast.ai/videos/?lesson=1), [Lessons 8-14](http://course18.fast.ai/lessons/lessons2.html) +[fast.ai course](https://course.fast.ai/) - Practical Deep Learning for Coders. 
[Tensorflow without a PhD](https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd) - Neural Network course by Google. Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PPT](http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture12.pdf) [Tensorflow Playground](https://playground.tensorflow.org/) -[Visualization of optimization algorithms](https://vis.ensmallen.org/), [Another visualization](https://github.com/jettify/pytorch-optimizer) +[Visualization of optimization algorithms](http://vis.ensmallen.org/), [Another visualization](https://github.com/jettify/pytorch-optimizer) [cutouts-explorer](https://github.com/mgckind/cutouts-explorer) - Image Viewer. ##### Image Related @@ -706,7 +705,6 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [FCPS](https://github.com/Mthrun/FCPS) - Fundamental Clustering Problems Suite (R package). [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https://www.youtube.com/watch?v=aICqoAG5BXQ). [nmslib](https://github.com/nmslib/nmslib) - Similarity search library and toolkit for evaluation of k-NN methods. -[buckshotpp](https://github.com/zjohn77/buckshotpp) - Outlier-resistant and scalable clustering algorithm. [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [tree-SNE](https://github.com/isaacrob/treesne) - Hierarchical clustering algorithm based on t-SNE. [MiniSom](https://github.com/JustGlowing/minisom) - Pure Python implementation of the Self Organizing Maps. @@ -775,7 +773,7 @@ Other measures: [nupic](https://github.com/numenta/nupic) - Hierarchical Temporal Memory (HTM) for Time Series Prediction and Anomaly Detection. 
[tensorflow](https://github.com/tensorflow/tensorflow/) - LSTM and others, examples: [link]( https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/ -), [link](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/timeseries), [link](https://github.com/hzy46/TensorFlow-Time-Series-Examples), [Explain LSTM](https://github.com/slundberg/shap/blob/master/notebooks/deep_explainer/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.ipynb), seq2seq: [1](https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/), [2](https://github.com/guillaume-chevalier/seq2seq-signal-prediction), [3](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Intro.ipynb), [4](https://github.com/LukeTonin/keras-seq-2-seq-signal-prediction) +), [link](https://github.com/hzy46/TensorFlow-Time-Series-Examples), seq2seq: [1](https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/), [2](https://github.com/guillaume-chevalier/seq2seq-signal-prediction), [3](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Intro.ipynb), [4](https://github.com/LukeTonin/keras-seq-2-seq-signal-prediction) [tspreprocess](https://github.com/MaxBenChrist/tspreprocess) - Preprocessing: Denoising, Compression, Resampling. [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. [tsfel](https://github.com/fraunhoferportugal/tsfel) - Time series feature extraction. @@ -783,7 +781,7 @@ https://machinelearningmastery.com/time-series-forecasting-long-short-term-memor [gatspy](https://www.astroml.org/gatspy/) - General tools for Astronomical Time Series, [talk](https://www.youtube.com/watch?v=E4NMZyfao2c). 
[gendis](https://github.com/IBCNServices/GENDIS) - shapelets, [example](https://github.com/IBCNServices/GENDIS/blob/master/gendis/example.ipynb). [tslearn](https://github.com/rtavenar/tslearn) - Time series clustering and classification, `TimeSeriesKMeans`, `TimeSeriesKMeans`. -[pastas](https://pastas.readthedocs.io/en/latest/examples.html) - Simulation of time series. +[pastas](https://github.com/pastas/pastas) - Analysis of Groundwater Time Series. [fastdtw](https://github.com/slaypni/fastdtw) - Dynamic Time Warp Distance. [fable](https://www.rdocumentation.org/packages/fable/versions/0.0.0.9000) - Time Series Forecasting (R package). [pydlm](https://github.com/wwrechard/pydlm) - Bayesian time series modelling ([R package](https://cran.r-project.org/web/packages/bsts/index.html), [Blog post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html)) @@ -850,7 +848,7 @@ RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). [eif](https://github.com/sahandha/eif) - Extended Isolation Forest. [AnomalyDetection](https://github.com/twitter/AnomalyDetection) - Anomaly detection (R package). [luminol](https://github.com/linkedin/luminol) - Anomaly Detection and Correlation library from Linkedin. -Distances for comparing histograms and detecting outliers - [Talk](https://www.youtube.com/watch?v=U7xdiGc7IRU): [Kolmogorov-Smirnov](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ks_2samp.html), [Wasserstein](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html), [Energy Distance (Cramer)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.energy_distance.html), [Kullback-Leibler divergence](https://scipy.github.io/devdocs/generated/scipy.stats.entropy.html). 
+Distances for comparing histograms and detecting outliers - [Talk](https://www.youtube.com/watch?v=U7xdiGc7IRU): [Kolmogorov-Smirnov](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ks_2samp.html), [Wasserstein](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html), [Energy Distance (Cramer)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.energy_distance.html), [Kullback-Leibler divergence](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kl_div.html). [banpei](https://github.com/tsurubee/banpei) - Anomaly detection library based on singular spectrum transformation. [telemanom](https://github.com/khundman/telemanom) - Detect anomalies in multivariate time series data using LSTMs. [luminaire](https://github.com/zillow/luminaire) - Anomaly Detection for time series. @@ -885,7 +883,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Probabilistic Modelling and Bayes [Intro](https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html), [Guide](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) -[PyMC3](https://docs.pymc.io/) - Bayesian modelling, [intro](https://docs.pymc.io/notebooks/getting_started) +[PyMC3](https://www.pymc.io/projects/docs/en/stable/learn.html) - Bayesian modelling. [numpyro](https://github.com/pyro-ppl/numpyro) - Probabilistic programming with numpy, built on [pyro](https://github.com/pyro-ppl/pyro). [pomegranate](https://github.com/jmschrei/pomegranate) - Probabilistic modelling, [talk](https://www.youtube.com/watch?v=dE5j6NW-Kzg). [pmlearn](https://github.com/pymc-learn/pymc-learn) - Probabilistic machine learning. 
@@ -931,7 +929,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Warning (Myth 7)](https://crazyoscarchang.github.io/2019/02/16/seven-myths-in-machine-learning-research/). [lime_xgboost](https://github.com/jphall663/lime_xgboost) - Create LIMEs for XGBoost. [eli5](https://github.com/TeamHG-Memex/eli5) - Inspecting machine learning classifiers and explaining their predictions. -[lofo-importance](https://github.com/aerdem4/lofo-importance) - Leave One Feature Out Importance, [talk](https://www.youtube.com/watch?v=zqsQ2ojj7sE), examples: [1](https://www.kaggle.com/divrikwicky/pf-f-lofo-importance-on-adversarial-validation), [2](https://www.kaggle.com/divrikwicky/lofo-importance), [3](https://www.kaggle.com/divrikwicky/santanderctp-lofo-feature-importance). +[lofo-importance](https://github.com/aerdem4/lofo-importance) - Leave One Feature Out Importance, [talk](https://www.youtube.com/watch?v=zqsQ2ojj7sE). [pybreakdown](https://github.com/MI2DataLab/pyBreakDown) - Generate feature contribution plots. [pycebox](https://github.com/AustinRochford/PyCEbox) - Individual Conditional Expectation Plot Toolbox. [pdpbox](https://github.com/SauceCat/PDPbox) - Partial dependence plot toolbox, [example](https://www.kaggle.com/dansbecker/partial-plots). @@ -984,7 +982,6 @@ Optometrist algorithm - [paper](https://www.nature.com/articles/s41598-017-06645 [optuna](https://github.com/pfnet/optuna) - Hyperparamter optimization, [Talk](https://www.youtube.com/watch?v=tcrcLRopTX0). [skopt](https://scikit-optimize.github.io/) - `BayesSearchCV` for Hyperparameter search. [tune](https://ray.readthedocs.io/en/latest/tune.html) - Hyperparameter search with a focus on deep learning and deep reinforcement learning. 
-[hypergraph](https://github.com/aljabr0/hypergraph) - Global optimization methods and hyperparameter optimization. [bbopt](https://github.com/evhub/bbopt) - Black box hyperparameter optimization. [dragonfly](https://github.com/dragonfly/dragonfly) - Scalable Bayesian optimisation. [botorch](https://github.com/pytorch/botorch) - Bayesian optimization in PyTorch. @@ -1096,7 +1093,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning Books](http://matpalm.com/blog/cool_machine_learning_books/) [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) -[Awesome Metric Learning](https://github.com/kdhht2334/Survey_of_Deep_Metric_Learning) [Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [Awesome Neural Network Visualization](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) @@ -1105,7 +1101,6 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Python](https://github.com/vinta/awesome-python) [Awesome Python Data Science](https://github.com/krzjoa/awesome-python-datascience) [Awesome Python Data Science](https://github.com/thomasjpfan/awesome-python-data-science) -[Awesome Python Data Science](https://github.com/amitness/toolbox) [Awesome Pytorch](https://github.com/bharathgs/Awesome-pytorch-list) [Awesome Quantitative Finance](https://github.com/wilsonfreitas/awesome-quant) [Awesome Recommender Systems](https://github.com/grahamjenson/list_of_recommender_systems) @@ -1121,10 +1116,9 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [NYU Deep Learning 
SP21](https://www.youtube.com/playlist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - YouTube Playlist. #### Things I google a lot -[Color codes](https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#categorical-colors) +[Color Codes](https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#categorical-colors) [Frequency codes for time series](https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) [Date parsing codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) -[Feature Calculators tsfresh](https://github.com/blue-yonder/tsfresh/blob/master/tsfresh/feature_extraction/feature_calculators.py) ## Contributing Do you know a package that should be on this list? Did you spot a package that is no longer maintained and should be removed from this list? Then feel free to read the [contribution guidelines](CONTRIBUTING.md) and submit your pull request or create a new issue. From 116d32de00ba0a62ad37f7cd06072668902790f8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 Jul 2023 01:13:48 +0200 Subject: [PATCH 439/550] jupyter-scatter --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 6f1e24a..ba3c7e5 100644 --- a/README.md +++ b/README.md @@ -286,6 +286,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [largeVis](https://github.com/elbamos/largeVis) - Visualize embeddings (t-SNE etc.) (R package). [proplot](https://github.com/proplot-dev/proplot) - Matplotlib wrapper. [morpheus](https://software.broadinstitute.org/morpheus/) - Broad Institute tool matrix visualization and analysis software. [Source](https://github.com/cmap/morpheus.js), Tutorial: [1](https://www.youtube.com/watch?v=0nkYDeekhtQ), [2](https://www.youtube.com/watch?v=r9mN6MsxUb0), [Code](https://github.com/broadinstitute/BBBC021_Morpheus_Exercise). 
+[jupyter-scatter](https://github.com/flekschas/jupyter-scatter) - Interactive 2D scatter plot widget for Jupyter. #### Colors [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3). From 1aeddeccc27025ec78c0f23b2447a8acf3b8546e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 Jul 2023 14:10:56 +0200 Subject: [PATCH 440/550] bioimaging --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ba3c7e5..aef94cd 100644 --- a/README.md +++ b/README.md @@ -381,6 +381,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) ##### Tutorials +[bioimaging.org](https://www.bioimagingguide.org/welcome.html) - A biologist's guide to planning and performing quantitative bioimaging experiments. [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. 
[Bioimaging and Bioimage Analysis Guide](https://www.bioimagingguide.org/welcome.html) From 049b51962acee19385830c60fa56847c9883a8ea Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 14 Jul 2023 14:11:29 +0200 Subject: [PATCH 441/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index aef94cd..3ec81d8 100644 --- a/README.md +++ b/README.md @@ -384,7 +384,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [bioimaging.org](https://www.bioimagingguide.org/welcome.html) - A biologist's guide to planning and performing quantitative bioimaging experiments. [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. -[Bioimaging and Bioimage Analysis Guide](https://www.bioimagingguide.org/welcome.html) ##### Datasets [jump-cellpainting](https://github.com/jump-cellpainting/datasets) - Cellpainting dataset. 
From 610500c9d3b41bc03bcb8ae5f12fc7a6693368e8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 19 Jul 2023 14:03:18 +0200 Subject: [PATCH 442/550] SpectraViewer --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3ec81d8..dc47b4c 100644 --- a/README.md +++ b/README.md @@ -400,6 +400,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www #### Microscopy + Assay [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. +[SpectraViewer](https://www.perkinelmer.com/lab-products-and-services/spectraviewer) - Visualize the spectral compatibility of fluorophores (PerkinElmer). [Thermofisher Spectrum Viewer](https://www.thermofisher.com/order/stain-it) - Thermofisher Spectrum Viewer. [Microscopy Resolution Calculator](https://www.microscope.healthcare.nikon.com/microtools/resolution-calculator) - Calculate resolution of images (Nikon). [PlateEditor](https://github.com/vindelorme/PlateEditor) - Drug Layout for plates, [app](https://plateeditor.sourceforge.io/), [zip](https://sourceforge.net/projects/plateeditor/), [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252488). From 1247afcc79bd3b78fac3a864effd93e336470f31 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 27 Jul 2023 20:36:59 +0200 Subject: [PATCH 443/550] micro-sam --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index dc47b4c..3da6c4e 100644 --- a/README.md +++ b/README.md @@ -463,6 +463,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. [EmbedSeg](https://github.com/juglab/EmbedSeg) - Embedding-based Instance Segmentation. 
+[micro-sam](https://github.com/computational-cell-analytics/micro-sam) - SegmentAnything for Microscopy. ##### Cell Segmentation Datasets [cellpose](https://www.cellpose.org/dataset) - Cell images. From fb28d59776bec62589ef006ac27d3381348f5651 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 5 Aug 2023 16:50:17 +0200 Subject: [PATCH 444/550] Metrics reloaded --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3da6c4e..98bd3be 100644 --- a/README.md +++ b/README.md @@ -593,8 +593,8 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [visualkeras](https://github.com/paulgavrikov/visualkeras) - Visualize Keras networks. ##### Object detection / Instance Segmentation +[Metrics reloaded: Recommendations for image analysis validation](https://arxiv.org/abs/2206.01653) - Guide for choosing correct image analysis metrics, [Code](https://github.com/Project-MONAI/MetricsReloaded), [Twitter Thread](https://twitter.com/lena_maierhein/status/1625450342006521857) [Good Yolo Explanation](https://jonathan-hui.medium.com/real-time-object-detection-with-yolo-yolov2-28b1b93e2088) -[segmentation_models](https://github.com/qubvel/segmentation_models) - Segmentation models with pretrained backbones: Unet, FPN, Linknet, PSPNet. [yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. [EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. [detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. 
From 04836ed61593252fda2fb36200a91786f5ea7b24 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 7 Aug 2023 10:27:06 +0200 Subject: [PATCH 445/550] SCIP --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 98bd3be..f838f6c 100644 --- a/README.md +++ b/README.md @@ -444,6 +444,9 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. [py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. +##### Microscopy Pipelines +[SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. + ##### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). [MCQuant](https://github.com/labsyspharm/quantification) - Quantification of cell features. From 411e6593afda69e44fa6f0b693820203d4d20cea Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 8 Aug 2023 23:51:30 +0200 Subject: [PATCH 446/550] Image Viewers, DeepCell --- README.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f838f6c..2e85a81 100644 --- a/README.md +++ b/README.md @@ -425,7 +425,11 @@ REMBI model - Recommended Metadata for Biological Images, BioImage Archive: [Stu [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing tool. [OMERO](https://www.openmicroscopy.org/omero/) - Image viewer for high-content screening. [IDR](https://idr.openmicroscopy.org/) uses OMERO. 
[Intro](https://www.youtube.com/watch?v=nSCrMO_c-5s) [fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. - +Image Data Explorer - Microscopy Image Viewer, [Shiny App](https://shiny-portal.embl.de/shinyapps/app/01_image-data-explorer), [Video](https://www.youtube.com/watch?v=H8zIZvOt1MA). +[ImSwitch](https://github.com/ImSwitch/ImSwitch) - Microscopy Image Viewer, [Doc](https://imswitch.readthedocs.io/en/stable/gui.html), [Video](https://www.youtube.com/watch?v=XsbnMkGSPQQ). +[piximi](https://github.com/piximi/piximi) - Web-based image annotation and classification tool, [App](https://www.piximi.app/). +[DeepCell Label](https://label.deepcell.org/) - Data labeling tool to segment images, [Video](https://www.youtube.com/watch?v=zfsvUBkEeow). + ##### Image Restoration and Denoising [aydin](https://github.com/royerlab/aydin) - Image denoising. [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. @@ -446,6 +450,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins ##### Microscopy Pipelines [SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. +[DeepCell Kiosk](https://github.com/vanvalenlab/kiosk-console/tree/master) - Image analysis platform. ##### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). @@ -467,6 +472,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. [EmbedSeg](https://github.com/juglab/EmbedSeg) - Embedding-based Instance Segmentation. [micro-sam](https://github.com/computational-cell-analytics/micro-sam) - SegmentAnything for Microscopy. 
+[deepcell-tf](https://github.com/vanvalenlab/deepcell-tf/tree/master) - Cell segmentation, [DeepCell](https://deepcell.org/). ##### Cell Segmentation Datasets [cellpose](https://www.cellpose.org/dataset) - Cell images. From 50b7750592e4b07fef5f12c969e90de6e0eea91c Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 10 Aug 2023 10:51:39 +0200 Subject: [PATCH 447/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2e85a81..1be411f 100644 --- a/README.md +++ b/README.md @@ -444,7 +444,7 @@ Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.yout Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). ##### Platforms and Pipelines -[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. +[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. [py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. From d4dd18a4db89df9ede98b831197a7ec0277a271a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 11 Aug 2023 00:55:58 +0200 Subject: [PATCH 448/550] huey --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 1be411f..3b5c59f 100644 --- a/README.md +++ b/README.md @@ -1027,6 +1027,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [kestra](https://github.com/kestra-io/kestra) - Workflow orchestration. [cml](https://github.com/iterative/cml) - CI/CD for Machine Learning Projects. [rocketry](https://github.com/Miksus/rocketry) - Task scheduling. 
+[huey](https://github.com/coleifer/huey) - Task queue. ##### Containerization and Docker [Reduce size of docker images (video)](https://www.youtube.com/watch?v=Z1Al4I4Os_A) From 5fb3e5d41953fd4b690b617221283a61b3bcfa11 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 11 Aug 2023 00:56:18 +0200 Subject: [PATCH 449/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3b5c59f..c0fccf7 100644 --- a/README.md +++ b/README.md @@ -1019,7 +1019,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe #### Deployment and Lifecycle Management ##### Workflow Scheduling and Orchestration -[nextflow](https://github.com/goodwright/nextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https://github.com/nextflow-io/nextflow). +[nextflow](https://github.com/goodwright/nextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https://github.com/nextflow-io/nextflow). [airflow](https://github.com/apache/airflow) - Schedule and monitor workflows. [prefect](https://github.com/PrefectHQ/prefect) - Python specific workflow scheduling. [dagster](https://github.com/dagster-io/dagster) - Development, production and observation of data assets. From 263569ad02d564da78136aaf1094fa36ff6cc75b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 12 Aug 2023 17:17:41 +0200 Subject: [PATCH 450/550] pyenv --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index c0fccf7..846d4b1 100644 --- a/README.md +++ b/README.md @@ -13,14 +13,16 @@ [rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - VSCode plugin to display .csv files with nice colors. 
#### General Python Programming +[Python Best Practices Guide](https://github.com/qiwihui/pocket_readings/issues/1148#issuecomment-874448132) +[pyenv](https://github.com/pyenv/pyenv) - Manage multiple Python versions on your system. +[poetry](https://github.com/python-poetry/poetry) - Dependency management. +[pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. +[hydra](https://github.com/facebookresearch/hydra) - Configuration management. +[hatch](https://github.com/pypa/hatch) - Python project management. [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). [loguru](https://github.com/Delgan/loguru) - Python logging. -[dateparser](https://github.com/scrapinghub/dateparser) - A better date parser. -[hydra](https://github.com/facebookresearch/hydra) - Configuration management. -[pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. -[poetry](https://github.com/python-poetry/poetry) - Dependency management. -[hatch](https://github.com/pypa/hatch) - Python project management. + #### Pandas Tricks, Alternatives and Additions [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. From 4a30e4de6bde7e456582840745ba1f84d11db26a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 5 Sep 2023 13:54:01 +0200 Subject: [PATCH 451/550] High-Content Screening Assay Design --- README.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 846d4b1..adc985b 100644 --- a/README.md +++ b/README.md @@ -395,11 +395,19 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [broadinstitute/lincs-profiling-complementarity](https://github.com/broadinstitute/lincs-profiling-complementarity) - Cellpainting vs. 
L1000 assay. #### Biostatistics / Robust statistics -[Z-factor](https://en.wikipedia.org/wiki/Z-factor) - Measure of statistical effect size. [MinCovDet](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wics.1421), [App1](https://journals.sagepub.com/doi/10.1177/1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https://www.cell.com/cell-reports/pdf/S2211-1247(21)00694-X.pdf). [moderated z-score](https://clue.io/connectopedia/replicate_collapse) - Weighted average of z-scores based on Spearman correlation. [winsorize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers. +#### High-Content Screening Assay Design +[Zhang XHD (2008) - Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens](https://slas-discovery.org/article/S2472-5552(22)08204-1/pdf) +[Iversen - A Comparison of Assay Performance Measures in Screening Assays, Signal Window, Z′ Factor, and Assay Variability Ratio](https://www.slas-discovery.org/article/S2472-5552(22)08460-X/pdf) +[Z-factor](https://en.wikipedia.org/wiki/Z-factor) - Measure of statistical effect size. +[Z'-factor](https://link.springer.com/referenceworkentry/10.1007/978-3-540-47648-1_6298) - Measure of statistical effect size. +[CV](https://en.wikipedia.org/wiki/Coefficient_of_variation) - Coefficient of variation. +[SSMD](https://en.wikipedia.org/wiki/Strictly_standardized_mean_difference) - Strictly standardized mean difference. +[Signal Window](https://www.intechopen.com/chapters/48130) - Assay quality measurement. + #### Microscopy + Assay [BD Spectrum Viewer](https://www.bdbiosciences.com/en-us/resources/bd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes. 
[SpectraViewer](https://www.perkinelmer.com/lab-products-and-services/spectraviewer) - Visualize the spectral compatibility of fluorophores (PerkinElmer). From 2a219e593baa42704999a5aa47500280b8b81698 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 12 Sep 2023 16:29:11 +0200 Subject: [PATCH 452/550] How large is that number in the Law of Large Numbers? --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index adc985b..59074d6 100644 --- a/README.md +++ b/README.md @@ -132,7 +132,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Wainer - The Most Dangerous Equation](http://nsmn1.uh.edu/dgraur/niv/themostdangerousequation.pdf) [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) -[Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https://www.youtube.com/watch?v=DbJyPELmhJc) +[Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https://www.youtube.com/watch?v=DbJyPELmhJc) +[How large is that number in the Law of Large Numbers?](https://thepalindrome.org/p/how-large-that-number-in-the-law) #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). 
[Github](https://github.com/reconhub) From fc2a64aa0eef5917e3313e240706210ebb6a609f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 26 Sep 2023 13:54:56 +0200 Subject: [PATCH 453/550] monkeybread --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 59074d6..0f0a034 100644 --- a/README.md +++ b/README.md @@ -523,6 +523,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [besca](https://github.com/bedapub/besca) - Beyond single-cell analysis. [janggu](https://github.com/BIMSBbioinfo/janggu) - Deep Learning for Genomics. [gdsctools](https://github.com/CancerRxGene/gdsctools) - Drug responses in the context of the Genomics of Drug Sensitivity in Cancer project, ANOVA, IC50, MoBEM, [doc](https://gdsctools.readthedocs.io/en/master/). +[monkeybread](https://github.com/immunitastx/monkeybread) - Analysis of single-cell spatial transcriptomics data. ##### Drug discovery [TDC](https://github.com/mims-harvard/TDC/tree/main) - Drug Discovery and Development. From bb229efc74a54708ba32df133335d7204448a8bb Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 29 Sep 2023 00:22:42 +0200 Subject: [PATCH 454/550] temporian --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0f0a034..11ffa79 100644 --- a/README.md +++ b/README.md @@ -173,6 +173,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [mlxtend](https://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/) - LDA. [featuretools](https://github.com/Featuretools/featuretools) - Automated feature engineering, [example](https://github.com/WillKoehrsen/automated-feature-engineering/blob/master/walk_through/Automated_Feature_Engineering.ipynb). [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. +[temporian](https://github.com/google/temporian) - Time series feature engineering by Google. 
[pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. [feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. From f234ffb9b01f5c5d2028fa1105ed1f27d12b3857 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 13 Oct 2023 11:23:50 +0200 Subject: [PATCH 455/550] qupath --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 11ffa79..8cd585c 100644 --- a/README.md +++ b/README.md @@ -456,9 +456,11 @@ Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.yout Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). ##### Platforms and Pipelines +[CellProfiler](https://github.com/CellProfiler/CellProfiler), [CellProfilerAnalyst](https://github.com/CellProfiler/CellProfiler-Analyst) - Create image analysis pipelines. [fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. [py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. +[qupath](https://github.com/qupath/qupath) - Image analysis. ##### Microscopy Pipelines [SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. 
From cc79a3f43f9c1700e88c75902189cf306bfc9556 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 13 Oct 2023 17:17:49 +0200 Subject: [PATCH 456/550] The Prosecutor's Fallacy --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 8cd585c..adb902f 100644 --- a/README.md +++ b/README.md @@ -133,7 +133,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Gigerenzer - The Bias Bias in Behavioral Economics](https://www.nowpublishers.com/article/Details/RBE-0092) [Cook - Estimating the chances of something that hasn’t happened yet](https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/) [Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https://www.youtube.com/watch?v=DbJyPELmhJc) -[How large is that number in the Law of Large Numbers?](https://thepalindrome.org/p/how-large-that-number-in-the-law) +[How large is that number in the Law of Large Numbers?](https://thepalindrome.org/p/how-large-that-number-in-the-law) +[The Prosecutor's Fallacy](https://www.cebm.ox.ac.uk/news/views/the-prosecutors-fallacy) #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). 
[Github](https://github.com/reconhub) From 883da97c42303941b2c3ef8d59b7299899185555 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 20 Oct 2023 11:25:23 +0200 Subject: [PATCH 457/550] IMCWorkflow --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index adb902f..d385516 100644 --- a/README.md +++ b/README.md @@ -466,6 +466,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins ##### Microscopy Pipelines [SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. [DeepCell Kiosk](https://github.com/vanvalenlab/kiosk-console/tree/master) - Image analysis platform. +[IMCWorkflow](https://github.com/BodenmillerGroup/IMCWorkflow/) - Image analysis pipeline using [steinbock](https://github.com/BodenmillerGroup/steinbock), [Twitter](https://twitter.com/NilsEling/status/1715020265963258087), [Paper](https://www.nature.com/articles/s41596-023-00881-0), [workflow](https://bodenmillergroup.github.io/IMCDataAnalysis/). ##### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). From 4cfdafac78d7b9a96c5cbcb3c1a3754c9ff38472 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 21 Oct 2023 13:34:59 +0200 Subject: [PATCH 458/550] evaluate --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index d385516..bf254bf 100644 --- a/README.md +++ b/README.md @@ -940,6 +940,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [combo](https://github.com/yzhao062/combo) - Combining ML models (stacking, ensembling). #### Model Evaluation +[evaluate](https://github.com/huggingface/evaluate) - Evaluate machine learning models (huggingface). [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix. 
[pandas_ml](https://github.com/pandas-ml/pandas-ml) - Confusion matrix. Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learning-curve/). From d84a3d82efe8c4e3769b72c2b404407d99eddfff Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 21 Oct 2023 16:07:38 +0200 Subject: [PATCH 459/550] feature-engine --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bf254bf..964136f 100644 --- a/README.md +++ b/README.md @@ -176,7 +176,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [tsfresh](https://github.com/blue-yonder/tsfresh) - Time series feature engineering. [temporian](https://github.com/google/temporian) - Time series feature engineering by Google. [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. -[feature_engine](https://github.com/solegalli/feature_engine) - Encoders, transformers, etc. +[feature-engine](https://github.com/feature-engine/feature_engine) - Encoders, transformers, etc. 
#### Computer Vision [Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p) From c2907bf8209134f1df9f9ec36b8e6301bf2a3e1d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 23 Oct 2023 21:12:08 +0200 Subject: [PATCH 460/550] Permutation Importance --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 964136f..59eb120 100644 --- a/README.md +++ b/README.md @@ -954,6 +954,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin #### Model Explanation, Interpretability, Feature Importance [Princeton - Reproducibility Crisis in ML‑based Science](https://sites.google.com/princeton.edu/rep-workshop) [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python) +scikit-learn - [Permutation Importance](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html) (can be used on any trained classifier) and [Partial Dependence](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.partial_dependence.html) [shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Good Shap intro](https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/). [treeinterpreter](https://github.com/andosa/treeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions. [lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Warning (Myth 7)](https://crazyoscarchang.github.io/2019/02/16/seven-myths-in-machine-learning-research/). 
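The scikit-learn entry added in the patch above notes that permutation importance can be used on any trained classifier. A minimal sketch of what that looks like, assuming a synthetic dataset and a random forest chosen purely for illustration (neither comes from the README): each feature column is shuffled in turn on held-out data, and the resulting drop in score is reported as that feature's importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data only: 5 features, of which the first 2 are informative.
# shuffle=False keeps the informative columns first and the noise columns last.
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Any fitted estimator with a score method works; the forest is an arbitrary choice.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature n_repeats times on the held-out set and record the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Print features from most to least important (mean score drop and its spread).
for i in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[i]
    std = result.importances_std[i]
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
```

Scoring on held-out data rather than the training set is a deliberate choice here: it measures how much the model relies on a feature for generalization rather than for memorizing the training set.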
From aa0ffe32d7860a7b16ac0e042d9ff88a93d07f81 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 27 Oct 2023 14:59:13 +0200 Subject: [PATCH 461/550] pyvips --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 59eb120..9392e5f 100644 --- a/README.md +++ b/README.md @@ -508,6 +508,7 @@ Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins [mahotas](https://github.com/luispedro/mahotas) - Zernike, Haralick, LBP, and TAS features, [example](https://github.com/luispedro/python-image-tutorial/blob/master/Segmenting%20cell%20images%20(fluorescent%20microscopy).ipynb). [pyradiomics](https://github.com/AIM-Harvard/pyradiomics) - Radiomics features from medical imaging. [pyefd](https://github.com/hbldh/pyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series. +[pyvips](https://github.com/libvips/pyvips/tree/master) - Faster image processing operations. #### Domain Adaptation / Batch-Effect Correction [Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9), [Code](https://github.com/JinmiaoChenLab/Batch-effect-removal-benchmarking). @@ -557,6 +558,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [augmix](https://github.com/google-research/augmix) - Image augmentation from Google. [kornia](https://github.com/kornia/kornia) - Image augmentation, feature extraction and loss functions. [augly](https://github.com/facebookresearch/AugLy) - Image, audio, text, video augmentation from Facebook. +[pyvips](https://github.com/libvips/pyvips/tree/master) - Faster image processing operations. ##### Lossfunction Related [SegLoss](https://github.com/JunMa11/SegLoss) - List of loss functions for medical image segmentation. 
From e8b3a1db13e649a6b288df0823fa1e85a6c2ff0e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 27 Oct 2023 15:29:11 +0200 Subject: [PATCH 462/550] Filters --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 9392e5f..4bb018d 100644 --- a/README.md +++ b/README.md @@ -781,12 +781,16 @@ Other measures: #### Signal Processing and Filtering [Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf). [Visual Fourier explanation](https://dsego.github.io/demystifying-fourier/). -[The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html). +[The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https://www.analog.com/en/education/education-library/scientist_engineers_guide.html) - Chapter 3 has good introduction to Bessel, Butterworth and Chebyshev filters. [Kalman Filter article](https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures). [Kalman Filter book](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Bayesian and various Kalman filters. [Interactive Tool](https://fiiir.com/) for FIR and IIR filters, [Examples](https://plot.ly/python/fft-filters/). [filterpy](https://github.com/rlabbe/filterpy) - Kalman filtering and optimal estimation library. 
+#### Filtering in Python +[scipy.signal](https://docs.scipy.org/doc/scipy/reference/signal.html) - [Butterworth low-pass filter example](https://github.com/guillaume-chevalier/filtering-stft-and-laplace-transform), [Savitzky–Golay filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html'), [W](https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter) +[pandas.Series.rolling](https://pandas.pydata.org/docs/reference/api/pandas.Series.rolling.html) - Choose appropriate `win_type`. + #### Geometry [geomstats](https://github.com/geomstats/geomstats) - Computations and statistics on manifolds with geometric structures. From 2b119dc2013b5e41c1f59c9ba0ce18618de455fd Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 27 Oct 2023 15:40:30 +0200 Subject: [PATCH 463/550] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4bb018d..3f072b1 100644 --- a/README.md +++ b/README.md @@ -788,7 +788,9 @@ Other measures: [filterpy](https://github.com/rlabbe/filterpy) - Kalman filtering and optimal estimation library. 
#### Filtering in Python -[scipy.signal](https://docs.scipy.org/doc/scipy/reference/signal.html) - [Butterworth low-pass filter example](https://github.com/guillaume-chevalier/filtering-stft-and-laplace-transform), [Savitzky–Golay filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html'), [W](https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter) +[scipy.signal](https://docs.scipy.org/doc/scipy/reference/signal.html) +* [Butterworth low-pass filter example](https://github.com/guillaume-chevalier/filtering-stft-and-laplace-transform) +* [Savitzky–Golay filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html), [W](https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter) [pandas.Series.rolling](https://pandas.pydata.org/docs/reference/api/pandas.Series.rolling.html) - Choose appropriate `win_type`. #### Geometry From 9ca943e8a0ae9026078bd3822e404adab43d8df0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 31 Oct 2023 21:24:08 +0100 Subject: [PATCH 464/550] PICASSO --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 3f072b1..2332bda 100644 --- a/README.md +++ b/README.md @@ -448,10 +448,13 @@ Image Data Explorer - Microscopy Image Viewer, [Shiny App](https://shiny-portal. [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Content-aware image restoration, [Project page](https://csbdeep.bioimagecomputing.com/tools/). -##### Illumination correction + Bleed through correction +##### Illumination correction [skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). [cidre](https://github.com/smithk/cidre) - Illumination correction method for optical microscopy. 
[BaSiCPy](https://github.com/peng-lab/BaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https://github.com/marrlab/BaSiC). + +##### Bleedthrough correction / Spectral Unmixing +[PICASSO](https://github.com/nygctech/PICASSO) - Blind unmixing without reference spectra measurement, [Paper](https://www.biorxiv.org/content/10.1101/2021.01.27.428247v1.full) [cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). From d345848d25e3e6bd24295794e30847e4867bb5d5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 2 Nov 2023 10:27:34 +0100 Subject: [PATCH 465/550] AutoUnmix --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2332bda..f45a9aa 100644 --- a/README.md +++ b/README.md @@ -458,6 +458,7 @@ Image Data Explorer - Microscopy Image Viewer, [Shiny App](https://shiny-portal. [cytoflow](https://github.com/cytoflow/cytoflow) - Flow cytometry. Includes Bleedthrough correction methods. Linear unmixing in Fiji for Bleedthrough Correction - [Youtube](https://www.youtube.com/watch?v=W90qs0J29v8). Bleedthrough Correction using Lumos and Fiji - [Link](https://imagej.net/plugins/lumos-spectral-unmixing). +AutoUnmix - [Link](https://www.biorxiv.org/content/10.1101/2023.05.30.542836v1.full). ##### Platforms and Pipelines [CellProfiler](https://github.com/CellProfiler/CellProfiler), [CellProfilerAnalyst](https://github.com/CellProfiler/CellProfiler-Analyst) - Create image analysis pipelines. 
From 364baeeada1403e7a8845c89789506d65726faa8 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 2 Nov 2023 16:04:27 +0100 Subject: [PATCH 466/550] Segment Anything and Segment Everything Everywhere All at Once --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f45a9aa..300cac6 100644 --- a/README.md +++ b/README.md @@ -491,9 +491,13 @@ AutoUnmix - [Link](https://www.biorxiv.org/content/10.1101/2023.05.30.542836v1.f [Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. [EmbedSeg](https://github.com/juglab/EmbedSeg) - Embedding-based Instance Segmentation. -[micro-sam](https://github.com/computational-cell-analytics/micro-sam) - SegmentAnything for Microscopy. +[segment-anything](https://github.com/facebookresearch/segment-anything) - Segment Anything (SAM) from Facebook. +[micro-sam](https://github.com/computational-cell-analytics/micro-sam) - Segment Anything for Microscopy. +[Segment-Everything-Everywhere-All-At-Once](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once) - Segment Everything Everywhere All at Once from Microsoft. [deepcell-tf](https://github.com/vanvalenlab/deepcell-tf/tree/master) - Cell segmentation, [DeepCell](https://deepcell.org/). +https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once + ##### Cell Segmentation Datasets [cellpose](https://www.cellpose.org/dataset) - Cell images. [omnipose](http://www.cellpose.org/dataset_omnipose) - Cell images. 
From c98c9d94b2c277eea3eac8c5382280e5f2f735e0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 4 Nov 2023 23:21:21 +0100 Subject: [PATCH 467/550] BiaPy --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 300cac6..900707e 100644 --- a/README.md +++ b/README.md @@ -468,6 +468,8 @@ AutoUnmix - [Link](https://www.biorxiv.org/content/10.1101/2023.05.30.542836v1.f [qupath](https://github.com/qupath/qupath) - Image analysis. ##### Microscopy Pipelines +Labsyspharm Stack see below. +[BiaPy](https://github.com/danifranco/BiaPy) - Bioimage analysis pipelines. [SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. [DeepCell Kiosk](https://github.com/vanvalenlab/kiosk-console/tree/master) - Image analysis platform. [IMCWorkflow](https://github.com/BodenmillerGroup/IMCWorkflow/) - Image analysis pipeline using [steinbock](https://github.com/BodenmillerGroup/steinbock), [Twitter](https://twitter.com/NilsEling/status/1715020265963258087), [Paper](https://www.nature.com/articles/s41596-023-00881-0), [workflow](https://bodenmillergroup.github.io/IMCDataAnalysis/). From 6b89c8ab34e6f9704465302830c6d0feb21b6598 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 5 Nov 2023 00:15:57 +0100 Subject: [PATCH 468/550] DL4MicEverywhere --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 900707e..3e0c679 100644 --- a/README.md +++ b/README.md @@ -492,6 +492,7 @@ Labsyspharm Stack see below. [allencell](https://www.allencell.org/segmenter.html) - Tools for 3D segmentation, classical and deep learning methods. [Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. [ZeroCostDL4Mic](https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki) - Deep-Learning in Microscopy. 
+[DL4MicEverywhere](https://github.com/HenriquesLab/DL4MicEverywhere) - Bringing the ZeroCostDL4Mic experience using Docker. [EmbedSeg](https://github.com/juglab/EmbedSeg) - Embedding-based Instance Segmentation. [segment-anything](https://github.com/facebookresearch/segment-anything) - Segment Anything (SAM) from Facebook. [micro-sam](https://github.com/computational-cell-analytics/micro-sam) - Segment Anything for Microscopy. From 1a136db5c9c6a9dfc63117dad868466a368da1a5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Nov 2023 16:08:54 +0100 Subject: [PATCH 469/550] ilastik --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3e0c679..ad911f3 100644 --- a/README.md +++ b/README.md @@ -488,6 +488,7 @@ Labsyspharm Stack see below. [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. +[ilastik](https://github.com/ilastik/ilastik) - Segment, classify, track and count cells. [ImageJ Plugin](https://github.com/ilastik/ilastik4ij). [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. [allencell](https://www.allencell.org/segmenter.html) - Tools for 3D segmentation, classical and deep learning methods. [Cell-ACDC](https://github.com/SchmollerLab/Cell_ACDC) - Python GUI for cell segmentation and tracking. From c5922a213ae260b2256370a78844e89ad0109254 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Nov 2023 16:33:00 +0100 Subject: [PATCH 470/550] labkit --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index ad911f3..dda4bc3 100644 --- a/README.md +++ b/README.md @@ -499,8 +499,7 @@ Labsyspharm Stack see below. 
[micro-sam](https://github.com/computational-cell-analytics/micro-sam) - Segment Anything for Microscopy. [Segment-Everything-Everywhere-All-At-Once](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once) - Segment Everything Everywhere All at Once from Microsoft. [deepcell-tf](https://github.com/vanvalenlab/deepcell-tf/tree/master) - Cell segmentation, [DeepCell](https://deepcell.org/). - -https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once +[labkit](https://github.com/juglab/labkit-ui) - Fiji plugin for image segmentation. ##### Cell Segmentation Datasets [cellpose](https://www.cellpose.org/dataset) - Cell images. From e456dcbec9516494b4f4a2bddcb8c0e94e6fcf56 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 8 Nov 2023 10:18:33 +0100 Subject: [PATCH 471/550] Napari Plugins --- README.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index dda4bc3..dfc137b 100644 --- a/README.md +++ b/README.md @@ -432,17 +432,21 @@ REMBI model - Recommended Metadata for Biological Images, BioImage Archive: [Stu [bdz](https://github.com/openssbd/bdz) - Zarr-based format for storing quantitative biological dynamics data. #### Image Viewers -[vizarr](https://github.com/hms-dbmi/vizarr) - Browser-based image viewer for zarr format. -[avivator](https://github.com/hms-dbmi/viv) - Browser-based image viewer for tiff files. [napari](https://github.com/napari/napari) - Image viewer and image processing tool. [Fiji](https://fiji.sc/) - General purpose tool. Image viewer and image processing tool. +[vizarr](https://github.com/hms-dbmi/vizarr) - Browser-based image viewer for zarr format. +[avivator](https://github.com/hms-dbmi/viv) - Browser-based image viewer for tiff files. [OMERO](https://www.openmicroscopy.org/omero/) - Image viewer for high-content screening. [IDR](https://idr.openmicroscopy.org/) uses OMERO. 
[Intro](https://www.youtube.com/watch?v=nSCrMO_c-5s) [fiftyone](https://github.com/voxel51/fiftyone) - Viewer and tool for building high-quality datasets and computer vision models. Image Data Explorer - Microscopy Image Viewer, [Shiny App](https://shiny-portal.embl.de/shinyapps/app/01_image-data-explorer), [Video](https://www.youtube.com/watch?v=H8zIZvOt1MA). [ImSwitch](https://github.com/ImSwitch/ImSwitch) - Microscopy Image Viewer, [Doc](https://imswitch.readthedocs.io/en/stable/gui.html), [Video](https://www.youtube.com/watch?v=XsbnMkGSPQQ). [pixmi](https://github.com/piximi/piximi) - Web-based image annotation and classification tool, [App](https://www.piximi.app/). [DeepCell Label](https://label.deepcell.org/) - Data labeling tool to segment images, [Video](https://www.youtube.com/watch?v=zfsvUBkEeow). - + +#### Napari Plugins +[napari-sam](https://github.com/MIC-DKFZ/napari-sam) - Segment Anything Plugin. +[napari-chatgpt](https://github.com/royerlab/napari-chatgpt) - ChatGPT Plugin. + ##### Image Restoration and Denoising [aydin](https://github.com/royerlab/aydin) - Image denoising. [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. From edf75297f78ee5209a44ce5b0afc40b3efc7b0f2 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 8 Nov 2023 11:00:55 +0100 Subject: [PATCH 472/550] connectomics --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index dfc137b..fe0b355 100644 --- a/README.md +++ b/README.md @@ -511,6 +511,7 @@ Labsyspharm Stack see below. [LIVECell](https://github.com/sartorius-research/LIVECell) - Cell images. [Sartorius](https://www.kaggle.com/competitions/sartorius-cell-instance-segmentation/overview) - Neurons. [EmbedSeg](https://github.com/juglab/EmbedSeg/releases/tag/v0.1.0) - 2D + 3D images. +[connectomics](https://sites.google.com/view/connectomics/) - Annotation of the EPFL Hippocampus dataset. 
##### Evaluation [seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). From df191d0415a7009efec2f43de57d0e13d3f0a798 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 8 Nov 2023 22:09:17 +0100 Subject: [PATCH 473/550] Fractal --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index fe0b355..79e8482 100644 --- a/README.md +++ b/README.md @@ -477,6 +477,7 @@ Labsyspharm Stack see below. [SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. [DeepCell Kiosk](https://github.com/vanvalenlab/kiosk-console/tree/master) - Image analysis platform. [IMCWorkflow](https://github.com/BodenmillerGroup/IMCWorkflow/) - Image analysis pipeline using [steinbock](https://github.com/BodenmillerGroup/steinbock), [Twitter](https://twitter.com/NilsEling/status/1715020265963258087), [Paper](https://www.nature.com/articles/s41596-023-00881-0), [workflow](https://bodenmillergroup.github.io/IMCDataAnalysis/). +[Fractal](https://fractal-analytics-platform.github.io/) - Image analytics pipeline, [Github](https://github.com/fractal-analytics-platform). ##### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). 
From 9065bfa331d916be51fb061830b87ab7216e5db1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 8 Nov 2023 22:10:20 +0100 Subject: [PATCH 474/550] Update README.md --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 79e8482..784a56a 100644 --- a/README.md +++ b/README.md @@ -466,7 +466,7 @@ AutoUnmix - [Link](https://www.biorxiv.org/content/10.1101/2023.05.30.542836v1.f ##### Platforms and Pipelines [CellProfiler](https://github.com/CellProfiler/CellProfiler), [CellProfilerAnalyst](https://github.com/CellProfiler/CellProfiler-Analyst) - Create image analysis pipelines. -[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data. +[fractal](https://fractal-analytics-platform.github.io/) - Framework to process high-content imaging data from UZH, [Github](https://github.com/fractal-analytics-platform). [atomai](https://github.com/pycroscopy/atomai) - Deep and Machine Learning for Microscopy. [py-clesperanto](https://github.com/clesperanto/pyclesperanto_prototype/) - Tools for 3D microscopy analysis, [deskewing](https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/demo/transforms/deskew.ipynb) and lots of other tutorials, interacts with napari. [qupath](https://github.com/qupath/qupath) - Image analysis. @@ -477,7 +477,6 @@ Labsyspharm Stack see below. [SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. [DeepCell Kiosk](https://github.com/vanvalenlab/kiosk-console/tree/master) - Image analysis platform. [IMCWorkflow](https://github.com/BodenmillerGroup/IMCWorkflow/) - Image analysis pipeline using [steinbock](https://github.com/BodenmillerGroup/steinbock), [Twitter](https://twitter.com/NilsEling/status/1715020265963258087), [Paper](https://www.nature.com/articles/s41596-023-00881-0), [workflow](https://bodenmillergroup.github.io/IMCDataAnalysis/). 
-[Fractal](https://fractal-analytics-platform.github.io/) - Image analytics pipeline, [Github](https://github.com/fractal-analytics-platform). ##### Labsyspharm [mcmicro](https://github.com/labsyspharm/mcmicro) - Multiple-choice microscopy pipeline, [Website](https://mcmicro.org/overview/), [Paper](https://www.nature.com/articles/s41592-021-01308-y). From 9abb6e524f259102de65bb0c7f1270ef05098ec5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 9 Nov 2023 11:17:03 +0100 Subject: [PATCH 475/550] Review of organoid pipelines --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 784a56a..6f1e8b4 100644 --- a/README.md +++ b/README.md @@ -487,6 +487,7 @@ Labsyspharm Stack see below. ##### Cell Segmentation [microscopy-tree](https://biomag-lab.github.io/microscopy-tree/) - Review of cell segmentation algorithms, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0962892421002518). +Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2301.02341.pdf). [BioImage.IO](https://bioimage.io/#/) - BioImage Model Zoo. [MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). 
From 7a4d4fa2c9f5be6332f921ca1c1f939608d76456 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 13 Nov 2023 15:25:40 +0100 Subject: [PATCH 476/550] Satellite Image Lists --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 6f1e8b4..04a1e83 100644 --- a/README.md +++ b/README.md @@ -1161,6 +1161,8 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Pytorch](https://github.com/bharathgs/Awesome-pytorch-list) [Awesome Quantitative Finance](https://github.com/wilsonfreitas/awesome-quant) [Awesome Recommender Systems](https://github.com/grahamjenson/list_of_recommender_systems) +[Awesome Satellite Benchmark Datasets](https://github.com/Seyed-Ali-Ahmadi/Awesome_Satellite_Benchmark_Datasets) +[Awesome Satellite Image for Deep Learning](https://github.com/satellite-image-deep-learning/techniques) [Awesome Single Cell](https://github.com/seandavi/awesome-single-cell) [Awesome Semantic Segmentation](https://github.com/mrgloom/awesome-semantic-segmentation) [Awesome Sentence Embedding](https://github.com/Separius/awesome-sentence-embedding) From 227d21fabb472fa2a79632f8e527905f8f0bd380 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Nov 2023 10:04:50 +0100 Subject: [PATCH 477/550] Awesome Biological Image Analysis --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 04a1e83..525e76a 100644 --- a/README.md +++ b/README.md @@ -1129,6 +1129,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome AI Booksmarks](https://github.com/goodrahstar/my-awesome-AI-bookmarks) [Awesome AI on Kubernetes](https://github.com/CognonicLabs/awesome-AI-kubernetes) [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) +[Awesome Biological Image Analysis](https://github.com/hallvaaw/awesome-biological-image-analysis) [Awesome Business Machine Learning](https://github.com/firmai/business-machine-learning) 
[Awesome Causality](https://github.com/rguo12/awesome-causality-algorithms) [Awesome Community Detection](https://github.com/benedekrozemberczki/awesome-community-detection) From ff944cbb0b8a05ae8976c0a47e355354d1cef819 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 15 Nov 2023 16:32:35 +0100 Subject: [PATCH 478/550] ZeroCostDL4Mic training dataset --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 525e76a..9cca59c 100644 --- a/README.md +++ b/README.md @@ -513,6 +513,7 @@ Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2 [Sartorius](https://www.kaggle.com/competitions/sartorius-cell-instance-segmentation/overview) - Neurons. [EmbedSeg](https://github.com/juglab/EmbedSeg/releases/tag/v0.1.0) - 2D + 3D images. [connectomics](https://sites.google.com/view/connectomics/) - Annotation of the EPFL Hippocampus dataset. +[ZeroCostDL4Mic](https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD895) - Stardist example training and test dataset. ##### Evaluation [seg-eval](https://github.com/lstrgar/seg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https://www.biorxiv.org/content/10.1101/2023.02.23.529809v1.full.pdf). From fdcff75c0ba93e5c01704c2f44c3c3fd6c2d11c6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 21 Nov 2023 18:34:31 +0100 Subject: [PATCH 479/550] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 9cca59c..716edf3 100644 --- a/README.md +++ b/README.md @@ -928,9 +928,10 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [causallib](https://github.com/IBM/causallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https://github.com/IBM/causallib/tree/master/examples). [causalml](https://github.com/uber/causalml) - Causal inference by Uber. 
[upliftml](https://github.com/bookingcom/upliftml) - Causal inference by Booking.com. -[EconML](https://github.com/microsoft/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft. [causality](https://github.com/akelleh/causality) - Causal analysis using observational datasets. [DoubleML](https://github.com/DoubleML/doubleml-for-py) - Machine Learning + Causal inference, [Tweet](https://twitter.com/ChristophMolnar/status/1574338002305880068), [Presentation](https://scholar.princeton.edu/sites/default/files/bstewart/files/felton.chern_.slides.20190318.pdf), [Paper](https://arxiv.org/abs/1608.00060v1). +[EconML](https://github.com/py-why/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft. + ##### Papers [Bours - Confounding](https://edisciplinas.usp.br/pluginfile.php/5625667/mod_resource/content/3/Nontechnicalexplanation-counterfactualdefinition-confounding.pdf) From eca7dcf871a8f8eec6c23ef7f91470e892063a51 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 27 Nov 2023 20:55:01 +0100 Subject: [PATCH 480/550] The Dunning-Kruger Effect is Autocorrelation --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 716edf3..99069be 100644 --- a/README.md +++ b/README.md @@ -135,6 +135,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https://www.youtube.com/watch?v=DbJyPELmhJc) [How large is that number in the Law of Large Numbers?](https://thepalindrome.org/p/how-large-that-number-in-the-law) [The Prosecutor's Fallacy](https://www.cebm.ox.ac.uk/news/views/the-prosecutors-fallacy) +[The Dunning-Kruger Effect is 
Autocorrelation](https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/) #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). [Github](https://github.com/reconhub) From 2718fa5e8f0494c1abb3db6d57633375e6186fb0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 29 Nov 2023 16:13:35 +0100 Subject: [PATCH 481/550] Friends don't let friends make certain types of data visualization --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 99069be..5ef07e1 100644 --- a/README.md +++ b/README.md @@ -103,6 +103,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [confseq](https://github.com/gostevehoward/confseq) - Uniform boundaries, confidence sequences, and always-valid p-values. ##### Visualizations +[Friends don't let friends make certain types of data visualization](https://github.com/cxli233/FriendsDontLetFriends) [Great Overview over Visualizations](https://textvis.lnu.se/) [Dependent Propabilities](https://static.laszlokorte.de/stochastic/) [Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https://rpsychologist.com/d3/NHST/) From 4ec8206a856005fc54da2e60c23d84580d6d7801 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 30 Nov 2023 16:56:14 +0100 Subject: [PATCH 482/550] Staining and imaging videos. --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 5ef07e1..a6c1887 100644 --- a/README.md +++ b/README.md @@ -388,6 +388,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) ##### Tutorials +[MIT 7.016 Introductory Biology, Fall 2018](https://www.youtube.com/playlist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63) - Videos 27, 28, and 29 talk about staining and imaging. 
[bioimaging.org](https://www.bioimagingguide.org/welcome.html) - A biologists guide to planning and performing quantitative bioimaging experiments. [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. From 4e6e34f9d2ad581d31306a43ded34a2d58577754 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Dec 2023 09:31:38 +0100 Subject: [PATCH 483/550] skimpy --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a6c1887..a269085 100644 --- a/README.md +++ b/README.md @@ -150,6 +150,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). [pyjanitor](https://github.com/pyjanitor-devs/pyjanitor) - Clean messy column names. +[skimpy](https://github.com/aeturrell/skimpy) - Create summary statistics of dataframes. Helpful `clean_columns()` function. [pandera](https://github.com/unionai-oss/pandera) - Data / Schema validation. [impyute](https://github.com/eltonlaw/impyute) - Imputations. 
[fancyimpute](https://github.com/iskandr/fancyimpute) - Matrix completion and imputation algorithms. From f646546367ce89b141f4de8955411ed3e036425b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 21 Dec 2023 14:09:21 +0100 Subject: [PATCH 484/550] PCA papers --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index a269085..2b3777e 100644 --- a/README.md +++ b/README.md @@ -237,6 +237,8 @@ SimCLR - [link](https://github.com/lightly-ai/lightly) ##### Packages [Dangers of PCA (paper)](https://www.nature.com/articles/s41598-022-14395-4). +[Phantom oscillations in PCA](https://www.biorxiv.org/content/10.1101/2023.06.20.545619v1.full). +[What to use instead of PCA](https://www.pnas.org/doi/10.1073/pnas.2319169120). [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. Additional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Correlation Circle Plot](http://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/), [Tweet](https://twitter.com/rasbt/status/1555999903398219777/photo/1) From a1506eed0f78f50df7f1d22360c3195b0a97acff Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 21 Dec 2023 14:09:44 +0100 Subject: [PATCH 485/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2b3777e..8190e8c 100644 --- a/README.md +++ b/README.md @@ -238,7 +238,7 @@ SimCLR - [link](https://github.com/lightly-ai/lightly) ##### Packages [Dangers of PCA (paper)](https://www.nature.com/articles/s41598-022-14395-4). [Phantom oscillations in PCA](https://www.biorxiv.org/content/10.1101/2023.06.20.545619v1.full). 
-[What to use instead of PCA](https://www.pnas.org/doi/10.1073/pnas.2319169120). +[What to use instead of PCA](https://www.pnas.org/doi/10.1073/pnas.2319169120). [Talk](https://www.youtube.com/watch?v=9iol3Lk6kyU), [tsne intro](https://distill.pub/2016/misread-tsne/). [sklearn.manifold](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) and [sklearn.decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others. Additional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Correlation Circle Plot](http://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/), [Tweet](https://twitter.com/rasbt/status/1555999903398219777/photo/1) From bce5e92a9c2f27781e2edbf9108c85c8ff8ce1bf Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 4 Jan 2024 17:15:24 +0100 Subject: [PATCH 486/550] Estimating Effect Sizes --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 8190e8c..d0546cc 100644 --- a/README.md +++ b/README.md @@ -90,6 +90,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html) +##### Effect Size +[Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https://journals.sagepub.com/doi/epdf/10.1177/1094428106291059) - Scott B. Morris + ##### Statistical Tests [test_proportions_2indep](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.test_proportions_2indep.html) - Proportion test. [G-Test](https://en.wikipedia.org/wiki/G-test) - Alternative to chi-square test, [power_divergence](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.power_divergence.html). 
From 83f45eb723cdfd7138222e162ba485bdabe68ff8 Mon Sep 17 00:00:00 2001
From: Florian Rohrer
Date: Thu, 4 Jan 2024 17:16:10 +0100
Subject: [PATCH 487/550] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d0546cc..fe94d25 100644
--- a/README.md
+++ b/README.md
@@ -91,7 +91,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal
 [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html)
 
 ##### Effect Size
-[Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https://journals.sagepub.com/doi/epdf/10.1177/1094428106291059) - Scott B. Morris
+[Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https://journals.sagepub.com/doi/epdf/10.1177/1094428106291059) - Scott B. Morris, [Twitter](https://twitter.com/MatthewBJane/status/1742588609025200557)
 
 ##### Statistical Tests
 [test_proportions_2indep](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.test_proportions_2indep.html) - Proportion test.
From 62f62205acd922f3580cd22c0c560a5716835275 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 6 Jan 2024 11:29:26 +0100 Subject: [PATCH 488/550] Rafi, Greenland article --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index fe94d25..54a314d 100644 --- a/README.md +++ b/README.md @@ -140,6 +140,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [How large is that number in the Law of Large Numbers?](https://thepalindrome.org/p/how-large-that-number-in-the-law) [The Prosecutor's Fallacy](https://www.cebm.ox.ac.uk/news/views/the-prosecutors-fallacy) [The Dunning-Kruger Effect is Autocorrelation](https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/) +[Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9) #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). [Github](https://github.com/reconhub) From 306118f6ee338781581a435301fcab9f6725277d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 7 Jan 2024 10:26:05 +0100 Subject: [PATCH 489/550] mlx --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 54a314d..a8a09fd 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,7 @@ [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. [polars](https://github.com/pola-rs/polars) - Multi-threaded alternative to pandas. [xarray](https://github.com/pydata/xarray/) - Extends pandas to n-dimensional arrays. +[mlx](https://github.com/ml-explore/mlx) - An array framework for Apple silicon. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. 
[duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame. From 915a1266e10f154de53eb2c391fe28e8592943f7 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 8 Jan 2024 13:14:54 +0100 Subject: [PATCH 490/550] Evaluation --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index a8a09fd..abc109c 100644 --- a/README.md +++ b/README.md @@ -143,6 +143,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [The Dunning-Kruger Effect is Autocorrelation](https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/) [Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9) +#### Evaluation +[Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https://www.bmj.com/content/384/bmj-2023-074819.full) + #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). [Github](https://github.com/reconhub) [incidence2](https://github.com/reconhub/incidence2) - Computation, handling, visualisation and simple modelling of incidence (R package). 
From e4b8afa9d5b874bcbb5ee7fbbe76477917bbf082 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 8 Jan 2024 13:15:59 +0100 Subject: [PATCH 491/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index abc109c..2dacc3f 100644 --- a/README.md +++ b/README.md @@ -144,7 +144,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9) #### Evaluation -[Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https://www.bmj.com/content/384/bmj-2023-074819.full) +[Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https://www.bmj.com/content/384/bmj-2023-074819.full) - [Twitter](https://twitter.com/GSCollins/status/1744309712995098624) #### Epidemiology [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). 
[Github](https://github.com/reconhub) From b02087d40409f8acbdc5c0c7afb95304eb3ec0ec Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 19 Jan 2024 21:59:18 +0100 Subject: [PATCH 492/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2dacc3f..7bb2958 100644 --- a/README.md +++ b/README.md @@ -118,6 +118,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Bayesian two-sample t test](https://rpsychologist.com/d3/bayes/) [Distribution of p-values when comparing two groups](https://rpsychologist.com/d3/pdist/) [Understanding the t-distribution and its normal approximation](https://rpsychologist.com/d3/tdist/) +[Statistical Power and Sample Size Calculation Tools](https://pwrss.shinyapps.io/index/) ##### Talks [Inverse Propensity Weighting](https://www.youtube.com/watch?v=SUq0shKLPPs) From b5030f064bc013278c1d4091a47052355da18644 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 11 Feb 2024 21:43:18 +0100 Subject: [PATCH 493/550] ASA Statement on p-Values --- README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 7bb2958..b24919c 100644 --- a/README.md +++ b/README.md @@ -79,6 +79,11 @@ #### Classical Statistics +##### p-values +[The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) +[Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) +[Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) + ##### Correlation [phik](https://github.com/kaveio/phik) - Correlation between categorical, ordinal and interval variables. 
@@ -130,8 +135,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Verifying the Assumptions of Linear Models](https://github.com/erykml/medium_articles/blob/master/Statistics/linear_regression_assumptions.ipynb) [Mediation and Moderation Intro](https://ademos.people.uic.edu/Chapter14.html) [Montgomery et al. - How conditioning on post-treatment variables can ruin your experiment and what to do about it](https://cpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/5/2293/files/2021/03/post-treatment-bias.pdf) -[Greenland - Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) -[Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) [Lindeløv - Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) [Chatruc - The Central Limit Theorem and its misuse](https://web.archive.org/web/20191229234155/https://lambdaclass.com/data_etudes/central_limit_theorem_misuse/) [Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http://www.stat.tugraz.at/AJS/ausg093/093Al-Saleh.pdf) From a2498ee142377d075d0033c78b78704777a5c90a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 11 Feb 2024 21:43:55 +0100 Subject: [PATCH 494/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b24919c..e244dfa 100644 --- a/README.md +++ b/README.md @@ -80,7 +80,7 @@ #### Classical Statistics ##### p-values -[The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) +[The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) 
[Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) From eadd434a5cec4c4dfde2472999be67c6864aad7f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 22 Feb 2024 09:27:14 +0100 Subject: [PATCH 495/550] hoeffd --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e244dfa..c70b7c0 100644 --- a/README.md +++ b/README.md @@ -86,6 +86,7 @@ ##### Correlation [phik](https://github.com/kaveio/phik) - Correlation between categorical, ordinal and interval variables. +[hoeffd](https://search.r-project.org/CRAN/refmans/Hmisc/html/hoeffd.html) - Hoeffding's D Statistics, measure of dependence (R package). ##### Packages [statsmodels](https://www.statsmodels.org/stable/index.html) - Statistical tests. From 5e620ececd19760566f83ccbd1dbb8c3823256e0 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 26 Feb 2024 22:17:05 +0100 Subject: [PATCH 496/550] pwrss --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index c70b7c0..0d2b3bc 100644 --- a/README.md +++ b/README.md @@ -107,6 +107,9 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal ##### Comparing Two Populations [torch-two-sample](https://github.com/josipd/torch-two-sample) - Friedman-Rafsky Test: Compare two population based on a multivariate generalization of the Runstest. 
[Explanation](https://www.real-statistics.com/multivariate-statistics/multivariate-normal-distribution/friedman-rafsky-test/), [Application](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5014134/) +##### Power and Sample Size Calculations +[pwrss](https://cran.r-project.org/web/packages/pwrss/index.html) - Statistical Power and Sample Size Calculation Tools (R package), [Tutorial with t-test](https://rpubs.com/metinbulus/welch) + ##### Interim Analyses / Sequential Analysis / Stopping [Sequential Analysis](https://en.wikipedia.org/wiki/Sequential_analysis) - Wikipedia. [sequential](https://cran.r-project.org/web/packages/Sequential/Sequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package). From f821174cddc25750b5df297c1ab03767b4541d14 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 1 Mar 2024 14:04:32 +0100 Subject: [PATCH 497/550] daft --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0d2b3bc..6a62ee1 100644 --- a/README.md +++ b/README.md @@ -31,6 +31,7 @@ [mlx](https://github.com/ml-explore/mlx) - An array framework for Apple silicon. [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. [duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame. +[daft](https://github.com/Eventual-Inc/Daft) - Distributed DataFrame. #### Pandas Parallelization [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. 
From 23fa0708b29f7848d10edc684bf3c36e20147c74 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 2 Mar 2024 14:11:28 +0100 Subject: [PATCH 498/550] Data Science Books with R --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 6a62ee1..61971d9 100644 --- a/README.md +++ b/README.md @@ -1148,6 +1148,8 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Blum - Foundations of Data Science](https://www.cs.cornell.edu/jeh/book.pdf?file=book.pdf) [Chan - Introduction to Probability for Data Science](https://probability4datascience.com/index.html) [Colonescu - Principles of Econometrics with R](https://bookdown.org/ccolonescu/RPoE4/) +[Rafael Irizarry - Introduction to Data Science](https://rafalab.dfci.harvard.edu/dsbook-part-1/) (R Language) +[Rafael Irizarry - Advanced Data Science](https://rafalab.dfci.harvard.edu/dsbook-part-2/) (R Language) ##### Other Awesome Lists [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) From c04e2be0397d11b606d2ccf4143806fef3ae98ad Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 2 Mar 2024 14:11:53 +0100 Subject: [PATCH 499/550] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 61971d9..845faf6 100644 --- a/README.md +++ b/README.md @@ -1148,8 +1148,8 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Blum - Foundations of Data Science](https://www.cs.cornell.edu/jeh/book.pdf?file=book.pdf) [Chan - Introduction to Probability for Data Science](https://probability4datascience.com/index.html) [Colonescu - Principles of Econometrics with R](https://bookdown.org/ccolonescu/RPoE4/) -[Rafael Irizarry - Introduction to Data Science](https://rafalab.dfci.harvard.edu/dsbook-part-1/) (R Language) -[Rafael Irizarry - Advanced Data Science](https://rafalab.dfci.harvard.edu/dsbook-part-2/) (R 
Language) +[Rafael Irizarry - Introduction to Data Science](https://rafalab.dfci.harvard.edu/dsbook-part-1/) (R Language) +[Rafael Irizarry - Advanced Data Science](https://rafalab.dfci.harvard.edu/dsbook-part-2/) (R Language) ##### Other Awesome Lists [Awesome Adversarial Machine Learning](https://github.com/yenchenlin/awesome-adversarial-machine-learning) From 89f3ef1e9087c136c508fea00689f289004ef464 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 4 Mar 2024 16:43:31 +0100 Subject: [PATCH 500/550] Guess the Correlation --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 845faf6..544abf7 100644 --- a/README.md +++ b/README.md @@ -86,6 +86,7 @@ [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) ##### Correlation +[Guess the Correlation](https://www.guessthecorrelation.com/) - Correlation guessing game. [phik](https://github.com/kaveio/phik) - Correlation between categorical, ordinal and interval variables. [hoeffd](https://search.r-project.org/CRAN/refmans/Hmisc/html/hoeffd.html) - Hoeffding's D Statistics, measure of dependence (R package). From eb0addb9aa6bb56a57232981ec0532abdabf8842 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 11 Mar 2024 00:08:11 +0100 Subject: [PATCH 501/550] Introduction to Bioimage Analysis --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 544abf7..eea4e4e 100644 --- a/README.md +++ b/README.md @@ -411,6 +411,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www ##### Tutorials [MIT 7.016 Introductory Biology, Fall 2018](https://www.youtube.com/playlist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63) - Videos 27, 28, and 29 talk about staining and imaging. 
[bioimaging.org](https://www.bioimagingguide.org/welcome.html) - A biologists guide to planning and performing quantitative bioimaging experiments. +[Introduction to Bioimage Analysis](https://bioimagebook.github.io/index.html) - Book. [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. 
From 6bb03c01d399dae26f6381016b49c63f14dd4136 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 7 Apr 2024 22:30:33 +0200 Subject: [PATCH 502/550] Rubin - Inconsistent multiple testing corrections --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index eea4e4e..746bca0 100644 --- a/README.md +++ b/README.md @@ -84,6 +84,7 @@ [The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) [Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) +[Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https://www.sciencedirect.com/science/article/pii/S2590260124000067?via%3Dihub) ##### Correlation [Guess the Correlation](https://www.guessthecorrelation.com/) - Correlation guessing game. 
From 378c88e4eca367f1c613aecdd11371897dec070f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 Apr 2024 17:27:42 +0200 Subject: [PATCH 503/550] On the uses and abuses of regression models --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 746bca0..349c5fa 100644 --- a/README.md +++ b/README.md @@ -152,7 +152,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [How large is that number in the Law of Large Numbers?](https://thepalindrome.org/p/how-large-that-number-in-the-law) [The Prosecutor's Fallacy](https://www.cebm.ox.ac.uk/news/views/the-prosecutors-fallacy) [The Dunning-Kruger Effect is Autocorrelation](https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/) -[Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9) +[Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9) +[Carlin et al. - On the uses and abuses of regression models: a call for reform of statistical practice and teaching](https://arxiv.org/abs/2309.06668) #### Evaluation [Collins et al. 
- Evaluation of clinical prediction models (part 1): from development to external validation](https://www.bmj.com/content/384/bmj-2023-074819.full) - [Twitter](https://twitter.com/GSCollins/status/1744309712995098624) From 228931142d43473553703296c23a5b0a4e3eca60 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 12 Apr 2024 18:36:33 +0200 Subject: [PATCH 504/550] Gigerenzer - Mindless Statistics --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 349c5fa..31fc7cb 100644 --- a/README.md +++ b/README.md @@ -85,6 +85,7 @@ [Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) [Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https://www.sciencedirect.com/science/article/pii/S2590260124000067?via%3Dihub) +[Gigerenzer - Mindless Statistics](https://library.mpib-berlin.mpg.de/ft/gg/GG_Mindless_2004.pdf) ##### Correlation [Guess the Correlation](https://www.guessthecorrelation.com/) - Correlation guessing game. 
From 3186182b26c36e7967807cda409c5e7f6d86664a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 12 Apr 2024 19:22:34 +0200 Subject: [PATCH 505/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 31fc7cb..23cfdb7 100644 --- a/README.md +++ b/README.md @@ -86,6 +86,7 @@ [Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) [Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https://www.sciencedirect.com/science/article/pii/S2590260124000067?via%3Dihub) [Gigerenzer - Mindless Statistics](https://library.mpib-berlin.mpg.de/ft/gg/GG_Mindless_2004.pdf) +[Rubin - That's not a two-sided test! It's two one-sided tests!](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1740-9713.01405) ##### Correlation [Guess the Correlation](https://www.guessthecorrelation.com/) - Correlation guessing game. From da2da2f3b6ba6ef562f990c971b9864ed1a20798 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 10 May 2024 23:20:12 +0200 Subject: [PATCH 506/550] Rdatasets --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 23cfdb7..7b017d8 100644 --- a/README.md +++ b/README.md @@ -80,6 +80,9 @@ #### Classical Statistics +##### Datasets +[Rdatasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html) - Collection of more than 2000 datasets, stored as csv files. 
+ ##### p-values [The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) [Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) From 2660d85eb5f1abf5b448f4420f4d280931a41fe6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 21 May 2024 23:45:29 +0200 Subject: [PATCH 507/550] vegdist --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 7b017d8..4618274 100644 --- a/README.md +++ b/README.md @@ -769,6 +769,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach #### Distance Functions [scipy.spatial](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) - All kinds of distance metrics. +[vegdist](https://rdrr.io/cran/vegan/man/vegdist.html) - Distance metrics (R package). [pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance / Wasserstein distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html), [POT implementation](https://pythonot.github.io/auto_examples/plot_OT_2D_samples.html) [dcor](https://github.com/vnmabus/dcor) - Distance correlation and related Energy statistics. [GeomLoss](https://www.kernel-operations.io/geomloss/) - Kernel norms, Hausdorff divergences, Debiased Sinkhorn divergences (=approximation of Wasserstein distance). 
From 5bf5466bf48ad5d193aea4a8c5db679922c3e357 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 21 May 2024 23:47:09 +0200 Subject: [PATCH 508/550] Cosine-Similarity paper --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 4618274..95345b8 100644 --- a/README.md +++ b/README.md @@ -768,6 +768,7 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [TensorFlow similarity](https://github.com/tensorflow/similarity) - Metric learning. #### Distance Functions +[Steck et al. - Is Cosine-Similarity of Embeddings Really About Similarity?](https://arxiv.org/abs/2403.05440) [scipy.spatial](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) - All kinds of distance metrics. [vegdist](https://rdrr.io/cran/vegan/man/vegdist.html) - Distance metrics (R package). [pyemd](https://github.com/wmayner/pyemd) - Earth Mover's Distance / Wasserstein distance, similarity between histograms. [OpenCV implementation](https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html), [POT implementation](https://pythonot.github.io/auto_examples/plot_OT_2D_samples.html) From 6ac94707ef321ca4b882d80ccd43e463636e5b53 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 27 May 2024 23:10:10 +0200 Subject: [PATCH 509/550] episensr + Lesko paper --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 95345b8..9929148 100644 --- a/README.md +++ b/README.md @@ -164,6 +164,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https://www.bmj.com/content/384/bmj-2023-074819.full) - [Twitter](https://twitter.com/GSCollins/status/1744309712995098624) #### Epidemiology +[Lesko et al. 
- A Framework for Descriptive Epidemiology](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10144679/) [R Epidemics Consortium](https://www.repidemicsconsortium.org/projects/) - Large tool suite for working with epidemiological data (R packages). [Github](https://github.com/reconhub) [incidence2](https://github.com/reconhub/incidence2) - Computation, handling, visualisation and simple modelling of incidence (R package). [EpiEstim](https://github.com/mrc-ide/EpiEstim) - Estimate time varying instantaneous reproduction number R during epidemics (R package) [paper](https://academic.oup.com/aje/article/178/9/1505/89262). @@ -171,6 +172,8 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [zEpid](https://github.com/pzivich/zEpid) - Epidemiology analysis package, [Tutorial](https://github.com/pzivich/Python-for-Epidemiologists). [tipr](https://github.com/LucyMcGowan/tipr) - Sensitivity analyses for unmeasured confounders (R package). [quartets](https://github.com/r-causal/quartets) - Anscombe’s Quartet, Causal Quartet, [Datasaurus Dozen](https://github.com/jumpingrivers/datasauRus) and others (R package). +[episensr](https://cran.r-project.org/web/packages/episensr/vignettes/episensr.html) - Quantitative Bias Analysis for Epidemiologic Data (=simulation of possible effects of different sources of bias) (R package). + #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). 
From 9239e403c9bc2b54bed90018e5b35e88990541e9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 1 Jun 2024 13:38:53 +0200 Subject: [PATCH 510/550] Marginal Effects Tutorial --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9929148..e0473a7 100644 --- a/README.md +++ b/README.md @@ -958,6 +958,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y #### Causal Inference [CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) +[Marginal Effects Tutorial](https://marginaleffects.com/vignettes/gcomputation.html) - Marginal Effects, g-computation and more. [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). [Python Causality Handbook](https://github.com/matheusfacure/python-causality-handbook) [dowhy](https://github.com/py-why/dowhy) - Estimate causal effects. @@ -969,7 +970,6 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [DoubleML](https://github.com/DoubleML/doubleml-for-py) - Machine Learning + Causal inference, [Tweet](https://twitter.com/ChristophMolnar/status/1574338002305880068), [Presentation](https://scholar.princeton.edu/sites/default/files/bstewart/files/felton.chern_.slides.20190318.pdf), [Paper](https://arxiv.org/abs/1608.00060v1). [EconML](https://github.com/py-why/EconML) - Heterogeneous Treatment Effects Estimation by Microsoft. 
- ##### Papers [Bours - Confounding](https://edisciplinas.usp.br/pluginfile.php/5625667/mod_resource/content/3/Nontechnicalexplanation-counterfactualdefinition-confounding.pdf) [Bours - Effect Modification and Interaction](https://www.sciencedirect.com/science/article/pii/S0895435621000330) From 29641ca60469f3cccb0f9302efd0b2bc808471e4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 1 Jun 2024 17:49:59 +0200 Subject: [PATCH 511/550] Awesome MLOps + Awesome Data Science --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index e0473a7..25ce3ab 100644 --- a/README.md +++ b/README.md @@ -1176,6 +1176,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Community Detection](https://github.com/benedekrozemberczki/awesome-community-detection) [Awesome CSV](https://github.com/secretGeek/AwesomeCSV) [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) +[Awesome Data Science](https://github.com/academic/awesome-datascience) [Awesome Data Science with Ruby](https://github.com/arbox/data-science-with-ruby) [Awesome Dash](https://github.com/ucg8j/awesome-dash) [Awesome Decision Trees](https://github.com/benedekrozemberczki/awesome-decision-tree-papers) @@ -1193,6 +1194,7 @@ Gilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machin [Awesome Machine Learning Interpretability](https://github.com/jphall663/awesome-machine-learning-interpretability) [Awesome Machine Learning Operations](https://github.com/EthicalML/awesome-machine-learning-operations) [Awesome Monte Carlo Tree Search](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) +[Awesome MLOps](https://github.com/kelvins/awesome-mlops) [Awesome Neural Network Visualization](https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network) [Awesome Online Machine Learning](https://github.com/MaxHalford/awesome-online-machine-learning) [Awesome 
Pipeline](https://github.com/pditommaso/awesome-pipeline) From da232b54058baff7daaca5aeb1648feb20c34ad9 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 1 Jun 2024 20:48:01 +0200 Subject: [PATCH 512/550] An introduction to g methods --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 25ce3ab..330a3bc 100644 --- a/README.md +++ b/README.md @@ -957,6 +957,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear classification, regression and ranking. #### Causal Inference +[Naimi et al. - An introduction to g methods](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074945/) [CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) [Marginal Effects Tutorial](https://marginaleffects.com/vignettes/gcomputation.html) - Marginal Effects, g-computation and more. [Statistical Rethinking](https://github.com/rmcelreath/stat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https://bookdown.org/content/4857/), [python](https://github.com/pymc-devs/resources/tree/master/Rethinking_2), [numpyro1](https://github.com/asuagar/statrethink-course-numpyro-2019), [numpyro2](https://fehiepsi.github.io/rethinking-numpyro/), [tensorflow-probability](https://github.com/ksachdeva/rethinking-tensorflow-probability). From c99db3cfc0242122e029d748c69b2336b47c62ea Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 2 Jun 2024 23:04:55 +0200 Subject: [PATCH 513/550] Logs with zeros? 
Some problems and solutions --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 330a3bc..6a8469c 100644 --- a/README.md +++ b/README.md @@ -159,6 +159,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [The Dunning-Kruger Effect is Autocorrelation](https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/) [Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9) [Carlin et al. - On the uses and abuses of regression models: a call for reform of statistical practice and teaching](https://arxiv.org/abs/2309.06668) +[Chen, Roth - Logs with zeros? Some problems and solutions](https://arxiv.org/abs/2212.06080) #### Evaluation [Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https://www.bmj.com/content/384/bmj-2023-074819.full) - [Twitter](https://twitter.com/GSCollins/status/1744309712995098624) From 085248f83f8de0fe8616dee473e9531c70cb6b71 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 4 Jun 2024 13:56:53 +0200 Subject: [PATCH 514/550] A beginner's guide to rigor and reproducibility in fluorescence imaging experiments --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 6a8469c..da0a9d8 100644 --- a/README.md +++ b/README.md @@ -417,12 +417,11 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [textdistance](https://github.com/life4/textdistance) - Collection for comparing distances between two or more sequences. #### Bio Image Analysis +[Lee et al. 
- A beginner's guide to rigor and reproducibility in fluorescence imaging experiments](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080651/) [Awesome Cytodata](https://github.com/cytodata/awesome-cytodata) ##### Tutorials [MIT 7.016 Introductory Biology, Fall 2018](https://www.youtube.com/playlist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63) - Videos 27, 28, and 29 talk about staining and imaging. -[bioimaging.org](https://www.bioimagingguide.org/welcome.html) - A biologists guide to planning and performing quantitative bioimaging experiments. -[Introduction to Bioimage Analysis](https://bioimagebook.github.io/index.html) - Book. [Bio-image Analysis Notebooks](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/intro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/extract_psf.html) and [deconvolution](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/18a_deconvolution/introduction_deconvolution.html), [3D cell segmentation](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/20_image_segmentation/Segmentation_3D.html), [feature extraction](https://haesleinhuepf.github.io/BioImageAnalysisNotebooks/22_feature_extraction/statistics_with_pyclesperanto.html) using [pyclesperanto](https://github.com/clEsperanto/pyclesperanto_prototype) and others. [python_for_microscopists](https://github.com/bnsreenu/python_for_microscopists) - Notebooks and associated [youtube channel](https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w/videos) for a variety of image processing tasks. 
From 5ee499611ccaa7c268e88dc3070548a65a6acf21 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 5 Jun 2024 18:53:46 +0200 Subject: [PATCH 515/550] Update README.md StatCheck --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index da0a9d8..f9866b6 100644 --- a/README.md +++ b/README.md @@ -104,6 +104,7 @@ [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons. Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html) +[StatCheck](https://statcheck.steveharoz.com/) - Extract statistics from articles and recompute p-values (R package). ##### Effect Size [Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https://journals.sagepub.com/doi/epdf/10.1177/1094428106291059) - Scott B. Morris, [Twitter](https://twitter.com/MatthewBJane/status/1742588609025200557) From 8e3eb241ce0c21366fa3a00d006165f79018d8b4 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 28 Jun 2024 13:52:48 +0200 Subject: [PATCH 516/550] The Causal Cookbook --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index f9866b6..bb5aa99 100644 --- a/README.md +++ b/README.md @@ -958,6 +958,7 @@ Distances for comparing histograms and detecting outliers - [Talk](https://www.y [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear classification, regression and ranking. #### Causal Inference +[Chatton et al. - The Causal Cookbook: Recipes for Propensity Scores, G-Computation, and Doubly Robust Standardization](https://journals.sagepub.com/doi/10.1177/25152459241236149) [Naimi et al. 
- An introduction to g methods](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074945/) [CS 594 Causal Inference and Learning](https://www.cs.uic.edu/~elena/courses/fall19/cs594cil.html) [Marginal Effects Tutorial](https://marginaleffects.com/vignettes/gcomputation.html) - Marginal Effects, g-computation and more. From c91856bb3b428013cd9826cada26f3cf26070c73 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 29 Jun 2024 10:20:34 +0200 Subject: [PATCH 517/550] gibbs-diffusion --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index bb5aa99..9de5a6c 100644 --- a/README.md +++ b/README.md @@ -487,6 +487,7 @@ Image Data Explorer - Microscopy Image Viewer, [Shiny App](https://shiny-portal. [aydin](https://github.com/royerlab/aydin) - Image denoising. [DivNoising](https://github.com/juglab/DivNoising) - Unsupervised denoising method. [CSBDeep](https://github.com/CSBDeep/CSBDeep) - Content-aware image restoration, [Project page](https://csbdeep.bioimagecomputing.com/tools/). +[gibbs-diffusion](https://github.com/rubenohana/gibbs-diffusion) - Image denoising. ##### Illumination correction [skimage](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE). From dd5d3205e57a8720b774d01f34a9c8414de99bfc Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 29 Jun 2024 13:41:16 +0200 Subject: [PATCH 518/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9de5a6c..bd0e7c4 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ [rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - VSCode plugin to display .csv files with nice colors. 
#### General Python Programming -[Python Best Practices Guide](https://github.com/qiwihui/pocket_readings/issues/1148#issuecomment-874448132) +[Python Best Practices Guide](https://medium.com/@mronakjain94/comprehensive-guide-to-installing-poetry-on-ubuntu-and-managing-python-projects-949b49ef4f76) [pyenv](https://github.com/pyenv/pyenv) - Manage multiple Python versions on your system. [poetry](https://github.com/python-poetry/poetry) - Dependency management. [pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. From 7ebf024a6854a82c1ddaca32f94fdb57c53de53d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 2 Jul 2024 22:38:13 +0200 Subject: [PATCH 519/550] TOSTER --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index bd0e7c4..b99782e 100644 --- a/README.md +++ b/README.md @@ -86,10 +86,10 @@ ##### p-values [The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) [Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/) -[Blume - Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299) [Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https://www.sciencedirect.com/science/article/pii/S2590260124000067?via%3Dihub) [Gigerenzer - Mindless Statistics](https://library.mpib-berlin.mpg.de/ft/gg/GG_Mindless_2004.pdf) -[Rubin - That's not a two-sided test! It's two one-sided tests!](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1740-9713.01405) +[Rubin - That's not a two-sided test! It's two one-sided tests! 
(TOST)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1740-9713.01405) +[Lakens - How were we supposed to move beyond p < .05, and why didn’t we?](https://errorstatistics.com/2024/07/01/guest-post-daniel-lakens-how-were-we-supposed-to-move-beyond-p-05-and-why-didnt-we-thoughts-on-abandon-statistical-significance-5-years-on/) ##### Correlation [Guess the Correlation](https://www.guessthecorrelation.com/) - Correlation guessing game. @@ -105,6 +105,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandaltman.html), [2](http://www.statsmodels.org/dev/generated/statsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement. [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html) [StatCheck](https://statcheck.steveharoz.com/) - Extract statistics from articles and recompute p-values (R package). +[TOSTER](https://github.com/Lakens/TOSTER) - TOST equivalence test and power functions (R package). ##### Effect Size [Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https://journals.sagepub.com/doi/epdf/10.1177/1094428106291059) - Scott B. Morris, [Twitter](https://twitter.com/MatthewBJane/status/1742588609025200557) From 3fb2dd03499a9de944e741c78aa2f5b8117d2178 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 3 Jul 2024 17:43:48 +0200 Subject: [PATCH 520/550] Abandon Statistical Significance --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index b99782e..4cbe145 100644 --- a/README.md +++ b/README.md @@ -89,7 +89,8 @@ [Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https://www.sciencedirect.com/science/article/pii/S2590260124000067?via%3Dihub) [Gigerenzer - Mindless Statistics](https://library.mpib-berlin.mpg.de/ft/gg/GG_Mindless_2004.pdf) [Rubin - That's not a two-sided test! 
It's two one-sided tests! (TOST)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1740-9713.01405) -[Lakens - How were we supposed to move beyond p < .05, and why didn’t we?](https://errorstatistics.com/2024/07/01/guest-post-daniel-lakens-how-were-we-supposed-to-move-beyond-p-05-and-why-didnt-we-thoughts-on-abandon-statistical-significance-5-years-on/) +[Lakens - How were we supposed to move beyond p < .05, and why didn’t we?](https://errorstatistics.com/2024/07/01/guest-post-daniel-lakens-how-were-we-supposed-to-move-beyond-p-05-and-why-didnt-we-thoughts-on-abandon-statistical-significance-5-years-on/) +[McShane et al. - Abandon Statistical Significance](https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1527253) ##### Correlation [Guess the Correlation](https://www.guessthecorrelation.com/) - Correlation guessing game. From e243bd083839efed1c0425531ec962a202f8d5db Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 11 Jul 2024 15:13:53 +0200 Subject: [PATCH 521/550] rye --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4cbe145..35f807f 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ #### General Python Programming [Python Best Practices Guide](https://medium.com/@mronakjain94/comprehensive-guide-to-installing-poetry-on-ubuntu-and-managing-python-projects-949b49ef4f76) +[rye](https://github.com/astral-sh/rye) - Dependency management. [pyenv](https://github.com/pyenv/pyenv) - Manage multiple Python versions on your system. [poetry](https://github.com/python-poetry/poetry) - Dependency management. [pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. @@ -665,7 +666,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [MONAI](https://github.com/project-monai/monai) - Deep learning in healthcare imaging. 
[kornia](https://github.com/kornia/kornia) - Image transformations, epipolar geometry, depth estimation. -[torchinfo](https://github.com/TylerYep/torchinfo) - Nice model summary. +[torchinfo](https://github.com/Tylep/torchinfo) - Nice model summary. [lovely-tensors](https://github.com/xl0/lovely-tensors/) - Inspect tensors, mean, std, inf values. ##### Distributed Libs From e56f5d9abaf4d7dd2ff9407002b1e1b4afa1cf33 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 16 Jul 2024 18:09:09 +0200 Subject: [PATCH 522/550] Links to transformers --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 35f807f..ad2f404 100644 --- a/README.md +++ b/README.md @@ -719,6 +719,9 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po [StudioGAN](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) - PyTorch GAN implementations. ##### Transformers +[The Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/) - Intro to transformers. +[Transformers from Scratch](https://e2eml.school/transformers.html) - Intro. +[Neural Networks: Zero to Hero](https://karpathy.ai/zero-to-hero.html) - Video series on building neural networks. [SegFormer](https://github.com/NVlabs/SegFormer) - Simple and Efficient Design for Semantic Segmentation with Transformers. [esvit](https://github.com/microsoft/esvit) - Efficient self-supervised Vision Transformers. [nystromformer](https://github.com/Rishit-dagli/Nystromformer) - More efficient transformer because of approximate self-attention. From bd988b51b660723090456be24538e2ad4885c677 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 27 Jul 2024 01:29:47 +0200 Subject: [PATCH 523/550] quak --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index ad2f404..9aa9505 100644 --- a/README.md +++ b/README.md @@ -33,6 +33,7 @@ [pandas_flavor](https://github.com/Zsailer/pandas_flavor) - Write custom accessors like `.str` and `.dt`. 
[duckdb](https://github.com/duckdb/duckdb) - Efficiently run SQL queries on pandas DataFrame. [daft](https://github.com/Eventual-Inc/Daft) - Distributed DataFrame. +[quak](https://github.com/manzt/quak) - Scalable, interactive data table, [twitter](https://x.com/trevmanz/status/1816760923949809982). #### Pandas Parallelization [modin](https://github.com/modin-project/modin) - Parallelization library for faster pandas `DataFrame`. From 3d020a3bec46832c89b712684c47d7f074982ef1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 12 Sep 2024 22:53:32 +0200 Subject: [PATCH 524/550] instanseg --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9aa9505..653b09a 100644 --- a/README.md +++ b/README.md @@ -533,6 +533,7 @@ Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2 [MEDIAR](https://github.com/Lee-Gihun/MEDIAR) - Cell segmentation. [cellpose](https://github.com/mouseland/cellpose) - Cell segmentation. [Paper](https://www.biorxiv.org/content/10.1101/2020.02.02.931238v1), [Dataset](https://www.cellpose.org/dataset). [stardist](https://github.com/stardist/stardist) - Cell segmentation with Star-convex Shapes. +[instanseg](https://github.com/instanseg/instanseg) - Cell segmentation. [UnMicst](https://github.com/HMS-IDAC/UnMicst) - Identifying Cells and Segmenting Tissue. [ilastik](https://github.com/ilastik/ilastik) - Segment, classify, track and count cells. [ImageJ Plugin](https://github.com/ilastik/ilastik4ij). [nnUnet](https://github.com/MIC-DKFZ/nnUNet) - 3D biomedical image segmentation. 
From 1092edffad433a8e1955bb9d7f47c701f004b19d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 19 Sep 2024 22:21:59 +0200 Subject: [PATCH 525/550] litserve --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 653b09a..5096dbf 100644 --- a/README.md +++ b/README.md @@ -665,6 +665,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [torchcv](https://github.com/donnyyou/torchcv) - Deep Learning in Computer Vision. [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) - Collection of optimizers for PyTorch. [pytorch-lightning](https://github.com/PyTorchLightning/PyTorch-lightning) - Wrapper around PyTorch. +[litserve](https://github.com/Lightning-AI/LitServe) - Serve models. [lightly](https://github.com/lightly-ai/lightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR. [MONAI](https://github.com/project-monai/monai) - Deep learning in healthcare imaging. [kornia](https://github.com/kornia/kornia) - Image transformations, epipolar geometry, depth estimation. From 1a1cc17bafecd65e63c3705838f2521935311ea6 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 26 Sep 2024 08:55:19 +0200 Subject: [PATCH 526/550] uv, python-dotenv --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5096dbf..23ecadd 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ #### General Python Programming [Python Best Practices Guide](https://medium.com/@mronakjain94/comprehensive-guide-to-installing-poetry-on-ubuntu-and-managing-python-projects-949b49ef4f76) -[rye](https://github.com/astral-sh/rye) - Dependency management. +[uv](https://github.com/astral-sh/uv) - Dependency management. [pyenv](https://github.com/pyenv/pyenv) - Manage multiple Python versions on your system. [poetry](https://github.com/python-poetry/poetry) - Dependency management. [pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. 
@@ -23,7 +23,7 @@ [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). [loguru](https://github.com/Delgan/loguru) - Python logging. - +[python-dotenv](https://github.com/theskumar/python-dotenv) - Manage environment variables. #### Pandas Tricks, Alternatives and Additions [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. From 5d71f2b8268af4db84d5e6d3c47dc3f4783dd697 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 27 Sep 2024 11:51:09 +0200 Subject: [PATCH 527/550] shapiq --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 23ecadd..2e13764 100644 --- a/README.md +++ b/README.md @@ -1033,6 +1033,7 @@ Plotting learning curve: [link](http://www.ritchieng.com/machinelearning-learnin [Book](https://christophm.github.io/interpretable-ml-book/agnostic.html), [Examples](https://github.com/jphall663/interpretable_machine_learning_with_python) scikit-learn - [Permutation Importance](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html) (can be used on any trained classifier) and [Partial Dependence](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.partial_dependence.html) [shap](https://github.com/slundberg/shap) - Explain predictions of machine learning models, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Good Shap intro](https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/). +[shapiq](https://github.com/mmschlk/shapiq) - Shapley interaction quantification. [treeinterpreter](https://github.com/andosa/treeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions. 
[lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier, [talk](https://www.youtube.com/watch?v=C80SQe16Rao), [Warning (Myth 7)](https://crazyoscarchang.github.io/2019/02/16/seven-myths-in-machine-learning-research/). [lime_xgboost](https://github.com/jphall663/lime_xgboost) - Create LIMEs for XGBoost. From 1f51fd93aad2e4735e551fd2113c32445dc6e69f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 4 Oct 2024 10:28:56 +0200 Subject: [PATCH 528/550] ultralytics --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2e13764..4d7095a 100644 --- a/README.md +++ b/README.md @@ -684,6 +684,7 @@ Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), ##### Object detection / Instance Segmentation [Metrics reloaded: Recommendations for image analysis validation](https://arxiv.org/abs/2206.01653) - Guide for choosing correct image analysis metrics, [Code](https://github.com/Project-MONAI/MetricsReloaded), [Twitter Thread](https://twitter.com/lena_maierhein/status/1625450342006521857) [Good Yolo Explanation](https://jonathan-hui.medium.com/real-time-object-detection-with-yolo-yolov2-28b1b93e2088) +[ultralytics](https://github.com/ultralytics/ultralytics) - Easily accessible Yolo and SAM models. [yolact](https://github.com/dbolya/yolact) - Fully convolutional model for real-time instance segmentation. [EfficientDet Pytorch](https://github.com/toandaominh1997/EfficientDet.Pytorch), [EfficientDet Keras](https://github.com/xuannianz/EfficientDet) - Scalable and Efficient Object Detection. [detectron2](https://github.com/facebookresearch/detectron2) - Object Detection (Mask R-CNN) by Facebook. 
From 62da70e60e2b6c9af760db53de38a6a06572fb0d Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 9 Oct 2024 20:06:43 +0200 Subject: [PATCH 529/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 4d7095a..153c186 100644 --- a/README.md +++ b/README.md @@ -413,7 +413,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [fastText](https://github.com/facebookresearch/fastText) - Efficient text classification and representation learning. [annoy](https://github.com/spotify/annoy) - Approximate nearest neighbor search. [faiss](https://github.com/facebookresearch/faiss) - Approximate nearest neighbor search. -[pysparnn](https://github.com/facebookresearch/pysparnn) - Approximate nearest neighbor search. [infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics. [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog). [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. From f2f0d214e5c8ddc9851cd7df6bb46c6d5ec3df86 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 10 Oct 2024 00:18:27 +0200 Subject: [PATCH 530/550] LSHForest --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 153c186..8abb9a2 100644 --- a/README.md +++ b/README.md @@ -413,6 +413,7 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [fastText](https://github.com/facebookresearch/fastText) - Efficient text classification and representation learning. [annoy](https://github.com/spotify/annoy) - Approximate nearest neighbor search. [faiss](https://github.com/facebookresearch/faiss) - Approximate nearest neighbor search. +[LSHForest](https://scikit-learn.org/0.16/modules/generated/sklearn.neighbors.LSHForest.html#sklearn.neighbors.LSHForest) - Locality-sensitive hashing (LSH) forest. 
[infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics. [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog). [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. From 03007b4972f2b49a87614ebbf61bcf367a94c4d3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Thu, 10 Oct 2024 11:11:06 +0200 Subject: [PATCH 531/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 8abb9a2..153c186 100644 --- a/README.md +++ b/README.md @@ -413,7 +413,6 @@ Embeddings - [GloVe](https://nlp.stanford.edu/projects/glove/) ([[1](https://www [fastText](https://github.com/facebookresearch/fastText) - Efficient text classification and representation learning. [annoy](https://github.com/spotify/annoy) - Approximate nearest neighbor search. [faiss](https://github.com/facebookresearch/faiss) - Approximate nearest neighbor search. -[LSHForest](https://scikit-learn.org/0.16/modules/generated/sklearn.neighbors.LSHForest.html#sklearn.neighbors.LSHForest) - Locality-sensitive hashing (LSH) forest. [infomap](https://github.com/mapequation/infomap) - Cluster (word-)vectors to find topics. [datasketch](https://github.com/ekzhu/datasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog). [flair](https://github.com/zalandoresearch/flair) - NLP Framework by Zalando. From beafe8684d6bb2035a82e0d7e2c5bb6d15e22c36 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 22 Oct 2024 14:47:57 +0200 Subject: [PATCH 532/550] Update README.md --- README.md | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 153c186..42f236b 100644 --- a/README.md +++ b/README.md @@ -13,17 +13,12 @@ [rainbow-csv](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv) - VSCode plugin to display .csv files with nice colors. 
#### General Python Programming -[Python Best Practices Guide](https://medium.com/@mronakjain94/comprehensive-guide-to-installing-poetry-on-ubuntu-and-managing-python-projects-949b49ef4f76) [uv](https://github.com/astral-sh/uv) - Dependency management. -[pyenv](https://github.com/pyenv/pyenv) - Manage multiple Python versions on your system. -[poetry](https://github.com/python-poetry/poetry) - Dependency management. -[pyscaffold](https://github.com/pyscaffold/pyscaffold) - Python project template generator. -[hydra](https://github.com/facebookresearch/hydra) - Configuration management. -[hatch](https://github.com/pypa/hatch) - Python project management. +[python-dotenv](https://github.com/theskumar/python-dotenv) - Manage environment variables. +[structlog](https://github.com/hynek/structlog) - Python logging. [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. [tqdm](https://github.com/tqdm/tqdm) - Progress bars for for-loops. Also supports [pandas apply()](https://stackoverflow.com/a/34365537/1820480). -[loguru](https://github.com/Delgan/loguru) - Python logging. -[python-dotenv](https://github.com/theskumar/python-dotenv) - Manage environment variables. +[hydra](https://github.com/facebookresearch/hydra) - Configuration management. #### Pandas Tricks, Alternatives and Additions [pandasvault](https://github.com/firmai/pandasvault) - Large collection of pandas tricks. From 79b606628530699c7e6fb7aa23f2697865632912 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 22 Oct 2024 14:49:22 +0200 Subject: [PATCH 533/550] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 42f236b..000fda0 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ #### General Python Programming [uv](https://github.com/astral-sh/uv) - Dependency management. +[just](https://github.com/casey/just) - Command runner. Replacement for make. 
[python-dotenv](https://github.com/theskumar/python-dotenv) - Manage environment variables. [structlog](https://github.com/hynek/structlog) - Python logging. [more_itertools](https://more-itertools.readthedocs.io/en/latest/) - Extension of itertools. From 53d0447543be267221d0d36951eb237c66700c24 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 27 Oct 2024 15:55:35 +0100 Subject: [PATCH 534/550] Added 3 R dataset collections --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 000fda0..bc4fcb1 100644 --- a/README.md +++ b/README.md @@ -79,7 +79,10 @@ #### Classical Statistics ##### Datasets -[Rdatasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html) - Collection of more than 2000 datasets, stored as csv files. +[Rdatasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html) - Collection of more than 2000 datasets, stored as csv files (R package). +[MedDataSets](https://lightbluetitan.github.io/meddatasets/index.html) - Datasets related to medicine, diseases, treatments, drugs, and public health (R package). +[usdatasets](https://lightbluetitan.github.io/usdatasets/) - US-exclusive datasets (crime, economics, education, finance, energy, healthcare) (R package). +[timeseriesdatasets_R](https://lightbluetitan.github.io/timeseriesdatasets_R/) - Time series datasets (R package). 
##### p-values [The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) From c7c939d2dc7cd963554c5f63fc8bd57d1052924a Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sun, 10 Nov 2024 10:50:45 +0100 Subject: [PATCH 535/550] typo fixed --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bc4fcb1..a1116b1 100644 --- a/README.md +++ b/README.md @@ -722,7 +722,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po ##### Transformers [The Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/) - Intro to transformers. -[Transformers from Scratch](https://e2eml.school/transformers.html] - Intro. +[Transformers from Scratch](https://e2eml.school/transformers.html) - Intro. [Neural Networks: Zero to Hero](https://karpathy.ai/zero-to-hero.html) - Video series on building neural networks. [SegFormer](https://github.com/NVlabs/SegFormer) - Simple and Efficient Design for Semantic Segmentation with Transformers. [esvit](https://github.com/microsoft/esvit) - Efficient self-supervised Vision Transformers. From 42f7554f8a04e9b913cd85872ead18a418131080 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 16 Nov 2024 15:27:16 +0100 Subject: [PATCH 536/550] BiaPy Paper --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a1116b1..e216b9d 100644 --- a/README.md +++ b/README.md @@ -512,7 +512,7 @@ AutoUnmix - [Link](https://www.biorxiv.org/content/10.1101/2023.05.30.542836v1.f ##### Microscopy Pipelines Labsyspharm Stack see below. -[BiaPy](https://github.com/danifranco/BiaPy) - Bioimage analysis pipelines. +[BiaPy](https://github.com/danifranco/BiaPy) - Bioimage analysis pipelines, [paper](https://www.biorxiv.org/content/10.1101/2024.02.03.576026v2.full). 
[SCIP](https://scalable-cytometry-image-processing.readthedocs.io/en/latest/usage.html) - Image processing pipeline on top of Dask. [DeepCell Kiosk](https://github.com/vanvalenlab/kiosk-console/tree/master) - Image analysis platform. [IMCWorkflow](https://github.com/BodenmillerGroup/IMCWorkflow/) - Image analysis pipeline using [steinbock](https://github.com/BodenmillerGroup/steinbock), [Twitter](https://twitter.com/NilsEling/status/1715020265963258087), [Paper](https://www.nature.com/articles/s41596-023-00881-0), [workflow](https://bodenmillergroup.github.io/IMCDataAnalysis/). From 38f6ed80af3e79206e809821c36398a8681e4e6e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Nov 2024 10:01:10 +0100 Subject: [PATCH 537/550] MedImageInsight --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index e216b9d..afc3b50 100644 --- a/README.md +++ b/README.md @@ -545,6 +545,7 @@ Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2 [Segment-Everything-Everywhere-All-At-Once](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once) - Segment Everything Everywhere All at Once from Microsoft. [deepcell-tf](https://github.com/vanvalenlab/deepcell-tf/tree/master) - Cell segmentation, [DeepCell](https://deepcell.org/). [labkit](https://github.com/juglab/labkit-ui) - Fiji plugin for image segmentation. +[MedImageInsight](https://arxiv.org/abs/2410.06542) - Open-Source Embedding Model for General Domain Medical Imaging. ##### Cell Segmentation Datasets [cellpose](https://www.cellpose.org/dataset) - Cell images. 
From 444f6713bf258c366b0c4a3d16e1195b028ed079 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 19 Nov 2024 12:41:30 +0100 Subject: [PATCH 538/550] CHIEF --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index afc3b50..817aa32 100644 --- a/README.md +++ b/README.md @@ -545,7 +545,8 @@ Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2 [Segment-Everything-Everywhere-All-At-Once](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once) - Segment Everything Everywhere All at Once from Microsoft. [deepcell-tf](https://github.com/vanvalenlab/deepcell-tf/tree/master) - Cell segmentation, [DeepCell](https://deepcell.org/). [labkit](https://github.com/juglab/labkit-ui) - Fiji plugin for image segmentation. -[MedImageInsight](https://arxiv.org/abs/2410.06542) - Open-Source Embedding Model for General Domain Medical Imaging. +[MedImageInsight](https://arxiv.org/abs/2410.06542) - Embedding Model for General Domain Medical Imaging. +[CHIEF](https://github.com/hms-dbmi/CHIEF) - Clinical Histopathology Imaging Evaluation Foundation Model. ##### Cell Segmentation Datasets [cellpose](https://www.cellpose.org/dataset) - Cell images. From aea211d3432f69e26eb2a1ea796f3cde75a69af3 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 30 Nov 2024 08:52:23 +0100 Subject: [PATCH 539/550] supertree --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 817aa32..8e5f597 100644 --- a/README.md +++ b/README.md @@ -398,6 +398,7 @@ Why the default feature importance for random forests is wrong: [link](http://ex [merf](https://github.com/manifoldai/merf) - Mixed Effects Random Forest for Clustering, [video](https://www.youtube.com/watch?v=gWj4ZwB7f3o) [groot](https://github.com/tudelft-cda-lab/GROOT) - Robust decision trees. [linear-tree](https://github.com/cerlymarco/linear-tree) - Trees with linear models at the leaves. 
+[supertree](https://github.com/mljar/supertree) - Decision tree visualization. #### Natural Language Processing (NLP) / Text Processing [talk](https://www.youtube.com/watch?v=6zm9NC9uRkk)-[nb](https://nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb), [nb2](https://ahmedbesbes.com/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html), [talk](https://www.youtube.com/watch?time_continue=2&v=sI7VpFNiy_I). From 285a95fb15d4061141d49f9e10691f22838652a1 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 3 Dec 2024 11:54:59 +0100 Subject: [PATCH 540/550] Update README.md 1 dataset 100 viz + The Return of Pseudosciences paper --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 8e5f597..95ecbd7 100644 --- a/README.md +++ b/README.md @@ -130,6 +130,7 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal ##### Visualizations [Friends don't let friends make certain types of data visualization](https://github.com/cxli233/FriendsDontLetFriends) [Great Overview over Visualizations](https://textvis.lnu.se/) +[1 dataset, 100 visualizations](https://100.datavizproject.com/) [Dependent Propabilities](https://static.laszlokorte.de/stochastic/) [Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https://rpsychologist.com/d3/NHST/) [Correlation](https://rpsychologist.com/d3/correlation/) @@ -846,6 +847,9 @@ Other measures: #### Multi-label classification [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn) - Multi-label classification, [talk](https://www.youtube.com/watch?v=m-tAASQA7XQ&t=18m57s). 
+#### Critical AI Texts +[Sublime - The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?](https://arxiv.org/abs/2411.18656) + #### Signal Processing and Filtering [Stanford Lecture Series on Fourier Transformation](https://see.stanford.edu/Course/EE261), [Youtube](https://www.youtube.com/watch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf). [Visual Fourier explanation](https://dsego.github.io/demystifying-fourier/). From 1baa3f71f8860eb93b3323713e049f1e4b090ff2 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 6 Jan 2025 15:52:20 +0100 Subject: [PATCH 541/550] pgvector --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 95ecbd7..3a5db68 100644 --- a/README.md +++ b/README.md @@ -1141,6 +1141,7 @@ AlphaZero methodology - [1](https://github.com/AppliedDataSciencePartners/DeepRe [dvc](https://github.com/iterative/dvc) - Version control for large files. [kedro](https://github.com/quantumblacklabs/kedro) - Build data pipelines. [feast](https://github.com/feast-dev/feast) - Feature store. [Video](https://www.youtube.com/watch?v=_omcXenypmo). +[pgvector](https://github.com/pgvector/pgvector) - Vector similarity search for Postgres. [pinecone](https://www.pinecone.io/) - Database for vector search applications. [truss](https://github.com/basetenlabs/truss) - Serve ML models. [milvus](https://github.com/milvus-io/milvus) - Vector database for similarity search. 
From f4badbe276bccc2c62be195d7664a7590a4ba05b Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 6 Jan 2025 21:13:42 +0100 Subject: [PATCH 542/550] Time Series Anomaly Detection Review Paper --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 3a5db68..13ec486 100644 --- a/README.md +++ b/README.md @@ -869,6 +869,7 @@ Other measures: [geomstats](https://github.com/geomstats/geomstats) - Computations and statistics on manifolds with geometric structures. #### Time Series +[Time Series Anomaly Detection Review Paper](https://arxiv.org/abs/2412.20512) [statsmodels](https://www.statsmodels.org/dev/tsa.html) - Time series analysis, [seasonal decompose](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) [example](https://gist.github.com/balzer82/5cec6ad7adc1b550e7ee), [SARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.grangercausalitytests.html). [kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook. [prophet](https://github.com/facebook/prophet) - Time series prediction library by Facebook. From a59241fa3830f5cd2a65050f3671399ae0d2fd7e Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 7 Jan 2025 23:49:20 +0100 Subject: [PATCH 543/550] Google Tuning Playbook --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 13ec486..cda3458 100644 --- a/README.md +++ b/README.md @@ -605,6 +605,7 @@ Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2 [Intro to semi-supervised learning](https://lilianweng.github.io/lil-log/2021/12/05/semi-supervised-learning.html). 
##### Tutorials & Viewer +[Google Tuning Playbook](https://github.com/google-research/tuning_playbook) - A playbook for systematically maximizing the performance of deep learning models by Google. [fast.ai course](https://course.fast.ai/) - Practical Deep Learning for Coders. [Tensorflow without a PhD](https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd) - Neural Network course by Google. Feature Visualization: [Blog](https://distill.pub/2017/feature-visualization/), [PPT](http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture12.pdf) From 3815bf9d1674191f83d4b2c8abcc76362c93e74f Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Wed, 8 Jan 2025 21:27:39 +0100 Subject: [PATCH 544/550] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index cda3458..c6c5627 100644 --- a/README.md +++ b/README.md @@ -62,7 +62,6 @@ [spark](https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html#work-with-dataframes) - `DataFrame` for big data, [cheatsheet](https://gist.github.com/crawles/b47e23da8218af0b9bd9d47f5242d189), [tutorial](https://github.com/ericxiao251/spark-syntax). [dask](https://github.com/dask/dask), [dask-ml](http://ml.dask.org/) - Pandas `DataFrame` for big data and machine learning library, [resources](https://matthewrocklin.com/blog//work/2018/07/17/dask-dev), [talk1](https://www.youtube.com/watch?v=ccfsbuqsjgI), [talk2](https://www.youtube.com/watch?v=RA_2qdipVng), [notebooks](https://github.com/dask/dask-ec2/tree/master/notebooks), [videos](https://www.youtube.com/user/mdrocklin). [h2o](https://github.com/h2oai/h2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes. -[datatable](https://github.com/h2oai/datatable) - Data Table for big data support. [cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library, [Intro](https://www.youtube.com/watch?v=6XzS5XcpicM&t=2m50s). 
[cupy](https://github.com/cupy/cupy) - NumPy-like API accelerated with CUDA. [ray](https://github.com/ray-project/ray/) - Flexible, high-performance distributed execution framework. From 0aa801c6ac8080a38b1e22510906a48c56c772d5 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Jan 2025 08:56:15 +0100 Subject: [PATCH 545/550] Added R datasets --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c6c5627..bfc1d78 100644 --- a/README.md +++ b/README.md @@ -79,9 +79,13 @@ ##### Datasets [Rdatasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html) - Collection of more than 2000 datasets, stored as csv files (R package). +[crimedatasets](https://lightbluetitan.github.io/crimedatasets/) - Datasets focused on crimes, criminal activities (R package). +[educationr](https://lightbluetitan.github.io/educationr/) - Datasets related to education (performance, learning methods, test scores, absenteeism) (R package). [MedDataSets](https://lightbluetitan.github.io/meddatasets/index.html) - Datasets related to medicine, diseases, treatments, drugs, and public health (R package). -[usdatasets](https://lightbluetitan.github.io/usdatasets/) - US-exclusive datasets (crime, economics, education, finance, energy, healthcare) (R package). +[oncodatasets](https://lightbluetitan.github.io/oncodatasets/) - Datasets focused on cancer research, survival rates, genetic studies, biomarkers, epidemiology (R package). [timeseriesdatasets_R](https://lightbluetitan.github.io/timeseriesdatasets_R/) - Time series datasets (R package). +[usdatasets](https://lightbluetitan.github.io/usdatasets/) - US-exclusive datasets (crime, economics, education, finance, energy, healthcare) (R package). 
+ ##### p-values [The ASA Statement on p-Values: Context, Process, and Purpose](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN) From 4e1271bee8d2edf72314c26753e69fa84184cf83 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Tue, 14 Jan 2025 17:46:33 +0100 Subject: [PATCH 546/550] darts --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index bfc1d78..9e42355 100644 --- a/README.md +++ b/README.md @@ -875,6 +875,7 @@ Other measures: #### Time Series [Time Series Anomaly Detection Review Paper](https://arxiv.org/abs/2412.20512) [statsmodels](https://www.statsmodels.org/dev/tsa.html) - Time series analysis, [seasonal decompose](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) [example](https://gist.github.com/balzer82/5cec6ad7adc1b550e7ee), [SARIMA](https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.grangercausalitytests.html). +[darts](https://github.com/unit8co/darts) - Time Series library (LightGBM, Neural Networks). [kats](https://github.com/facebookresearch/kats) - Time series prediction library by Facebook. [prophet](https://github.com/facebook/prophet) - Time series prediction library by Facebook. [neural_prophet](https://github.com/ourownstory/neural_prophet) - Time series prediction built on PyTorch. 
From 7687443b8bdf101ab79cc05fd07c7284d49e85ba Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Mon, 3 Mar 2025 18:57:07 +0100 Subject: [PATCH 547/550] Statistical Inference and Regression book --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9e42355..6d18279 100644 --- a/README.md +++ b/README.md @@ -760,6 +760,7 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po Legate Numpy - Distributed Numpy array multiple using GPUs by Nvidia (not released yet) [video](https://www.youtube.com/watch?v=Jxxs_moibog). #### Regression +Good introduction: [A User’s Guide to Statistical Inference and Regression](https://mattblackwell.github.io/gov2002-book/) Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf), [forum](https://www.quora.com/How-does-support-vector-regression-work), [paper](http://alex.smola.org/papers/2003/SmoSch03b.pdf) [pyearth](https://github.com/scikit-learn-contrib/py-earth) - Multivariate Adaptive Regression Splines (MARS), [tutorial](https://uc-r.github.io/mars). From ef03448c2969561c518ef32679ea6a9b12935f93 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 7 Mar 2025 23:48:00 +0100 Subject: [PATCH 548/550] Applied Machine Learning in Python, Ridgeplot --- README.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 6d18279..2a1ebc5 100644 --- a/README.md +++ b/README.md @@ -183,6 +183,10 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [quartets](https://github.com/r-causal/quartets) - Anscombe’s Quartet, Causal Quartet, [Datasaurus Dozen](https://github.com/jumpingrivers/datasauRus) and others (R package). [episensr](https://cran.r-project.org/web/packages/episensr/vignettes/episensr.html) - Quantitative Bias Analysis for Epidemiologic Data (=simulation of possible effects of different sources of bias) (R package). 
+#### Machine Learning Tutorials +[Statistical Inference and Regression](https://mattblackwell.github.io/gov2002-book/) +[Applied Machine Learning in Python](https://geostatsguy.github.io/MachineLearningDemos_Book/intro.html) +[Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. #### Exploration and Cleaning [Checklist](https://github.com/r0f1/ml_checklist). @@ -218,9 +222,6 @@ Bland-Altman Plot [1](https://pingouin-stats.org/generated/pingouin.plot_blandal [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines. [feature-engine](https://github.com/feature-engine/feature_engine) - Encoders, transformers, etc. -#### Computer Vision -[Intro to Computer Vision](https://www.youtube.com/playlist?list=PLjMXczUzEYcHvw5YYSU92WrY8IwhTuq7p) - #### Feature Selection [Overview Paper](https://www.sciencedirect.com/science/article/pii/S016794731930194X), [Talk](https://www.youtube.com/watch?v=JsArBz46_3s), [Repo](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection) Blog post series - [1](http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/), [2](http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/), [3](http://blog.datadive.net/selecting-good-features-part-iii-random-forests/), [4](http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/) @@ -309,6 +310,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [physt](https://github.com/janpipek/physt) - Better histograms, [talk](https://www.youtube.com/watch?v=ZG-wH3-Up9Y), [notebook](https://nbviewer.jupyter.org/github/janpipek/pydata2018-berlin/blob/master/notebooks/talk.ipynb). [fast-histogram](https://github.com/astrofrog/fast-histogram) - Fast histograms. [matplotlib_venn](https://github.com/konstantint/matplotlib-venn) - Venn diagrams, [alternative](https://github.com/penrose/penrose). 
+[ridgeplot](https://github.com/tpvasconcelos/ridgeplot) - Ridge plots. [joypy](https://github.com/sbebo/joypy) - Draw stacked density plots (=ridge plots), [Ridge plots in seaborn](https://seaborn.pydata.org/examples/kde_ridgeplot.html). [mosaic plots](https://www.statsmodels.org/dev/generated/statsmodels.graphics.mosaicplot.mosaic.html) - Categorical variable visualization, [example](https://sukhbinder.wordpress.com/2018/09/18/mosaic-plot-in-python/). [scikit-plot](https://github.com/reiinakano/scikit-plot) - ROC curves and other visualizations for ML models. @@ -601,7 +603,6 @@ Review of organoid pipelines - [Paper](https://arxiv.org/ftp/arxiv/papers/2301/2 [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) - Deep Learning Based Molecular Modelling and Prediction Toolkit. #### Neural Networks -[Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) - Stanford CS class. [mit6874](https://mit6874.github.io/) - Computational Systems Biology: Deep Learning in the Life Sciences. [ConvNet Shape Calculator](https://madebyollin.github.io/convnet-calculator/) - Calculate output dimensions of Conv2D layer. [Great Gradient Descent Article](https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9). @@ -760,7 +761,6 @@ Cell Segmentation - [Talk](https://www.youtube.com/watch?v=dVFZpodqJiI), Blog Po Legate Numpy - Distributed Numpy array multiple using GPUs by Nvidia (not released yet) [video](https://www.youtube.com/watch?v=Jxxs_moibog). 
#### Regression -Good introduction: [A User’s Guide to Statistical Inference and Regression](https://mattblackwell.github.io/gov2002-book/) Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf), [forum](https://www.quora.com/How-does-support-vector-regression-work), [paper](http://alex.smola.org/papers/2003/SmoSch03b.pdf) [pyearth](https://github.com/scikit-learn-contrib/py-earth) - Multivariate Adaptive Regression Splines (MARS), [tutorial](https://uc-r.github.io/mars). @@ -768,7 +768,6 @@ Understanding SVM Regression: [slides](https://cs.adelaide.edu.au/~chhshen/teach [GLRM](https://github.com/madeleineudell/LowRankModels.jl) - Generalized Low Rank Models. [tweedie](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tweedie-regression-objective-reg-tweedie) - Specialized distribution for zero inflated targets, [Talk](https://www.youtube.com/watch?v=-o0lpHBq85I). [MAPIE](https://github.com/scikit-learn-contrib/MAPIE) - Estimating prediction intervals. -[Regressio](https://github.com/brendanartley/Regressio) - Regression and Spline models. #### Polynomials [orthopy](https://github.com/nschloe/orthopy) - Orthogonal polynomials in all shapes and sizes. From 7e7c213a75fb7023bbe503a8c7b88fa391197228 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Fri, 7 Mar 2025 23:53:29 +0100 Subject: [PATCH 549/550] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2a1ebc5..0575777 100644 --- a/README.md +++ b/README.md @@ -949,7 +949,7 @@ Tutorial on using cvxpy: [1](https://calmcode.io/cvxpy-one/the-stigler-diet.html [Time-dependent Cox Model in R](https://stats.stackexchange.com/questions/101353/cox-regression-with-time-varying-covariates). [lifelines](https://lifelines.readthedocs.io/en/latest/) - Survival analysis, Cox PH Regression, [talk](https://www.youtube.com/watch?v=aKZQUaNHYb0), [talk2](https://www.youtube.com/watch?v=fli-yE5grtY). 
[scikit-survival](https://github.com/sebp/scikit-survival) - Survival analysis. -[xgboost](https://github.com/dmlc/xgboost) - `"objective": "survival:cox"` [NHANES example](https://slundberg.github.io/shap/notebooks/NHANES%20I%20Survival%20Model.html) +[xgboost](https://github.com/dmlc/xgboost) - `"objective": "survival:cox"` [NHANES example](https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/NHANES%20I%20Survival%20Model.html) [survivalstan](https://github.com/hammerlab/survivalstan) - Survival analysis, [intro](http://www.hammerlab.org/2017/06/26/introducing-survivalstan/). [convoys](https://github.com/better/convoys) - Analyze time lagged conversions. RandomSurvivalForests (R packages: randomForestSRC, ggRandomForests). From bec82aed08e30e6b4ab61fcc15a80f5244eb6e36 Mon Sep 17 00:00:00 2001 From: Florian Rohrer Date: Sat, 15 Mar 2025 19:30:52 +0100 Subject: [PATCH 550/550] fastplotlib --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 0575777..2898169 100644 --- a/README.md +++ b/README.md @@ -336,6 +336,7 @@ Faster t-SNE implementations: [lvdmaaten](https://lvdmaaten.github.io/tsne/), [M [proplot](https://github.com/proplot-dev/proplot) - Matplotlib wrapper. [morpheus](https://software.broadinstitute.org/morpheus/) - Broad Institute tool matrix visualization and analysis software. [Source](https://github.com/cmap/morpheus.js), Tutorial: [1](https://www.youtube.com/watch?v=0nkYDeekhtQ), [2](https://www.youtube.com/watch?v=r9mN6MsxUb0), [Code](https://github.com/broadinstitute/BBBC021_Morpheus_Exercise). [jupyter-scatter](https://github.com/flekschas/jupyter-scatter) - Interactive 2D scatter plot widget for Jupyter. +[fastplotlib](https://github.com/fastplotlib/fastplotlib) - Fast plotting library using pygfx. #### Colors [palettable](https://github.com/jiffyclub/palettable) - Color palettes from [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3).