Commit 40ce9f6

Add webscraping tools and organize visualization category (krzjoa#20)
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Set theme jekyll-theme-minimal
* Set theme jekyll-theme-cayman
1 parent 117236c commit 40ce9f6

2 files changed: +26 -7 lines changed

README.md

+25 -7
@@ -19,6 +19,7 @@
 ## Contents
 * [Machine Learning](#machine-learning)
 * [Deep Learning](#deep-learning)
+* [Web Scraping](#web-scraping)
 * [Data Manipulation](#data-manipulation)
 * [Feature Engineering](#feature-engineering)
 * [Visualization](#visualization)
@@ -187,10 +188,18 @@
 * [Caffe2](https://github.com/pytorch/pytorch/tree/master/caffe2) - A lightweight, modular, and scalable deep learning framework (now a part of PyTorch).
 * [hipCaffe](https://github.com/ROCmSoftwarePlatform/hipCaffe) - The HIP port of Caffe. <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">

+## Web Scraping
+* [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/): The easiest library to scrape static websites for beginners
+* [Scrapy](https://scrapy.org/): Fast and extensible scraping library. Lets you write rules and build customized scrapers without touching the core
+* [Selenium](https://selenium-python.readthedocs.io/installation.html#introduction): Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user.
+* [Pattern](https://github.com/clips/pattern): High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization
+* [twitterscraper](https://github.com/taspinar/twitterscraper): Efficient library to scrape Twitter
+
 ## Data Manipulation

 ### Data Containers
 * [pandas](https://pandas.pydata.org/pandas-docs/stable/) - Powerful Python data analysis toolkit.
+* [pandas_profiling](https://github.com/pandas-profiling/pandas-profiling) - Create HTML profiling reports from pandas DataFrame objects
 * [cuDF](https://github.com/rapidsai/cudf) - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
 * [blaze](https://github.com/blaze/blaze) - NumPy and pandas interface to Big Data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
 * [pandasql](https://github.com/yhat/pandasql) - Allows you to query pandas DataFrames using SQL syntax. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
@@ -217,6 +226,7 @@
 * [meza](https://github.com/reubano/meza) - A Python toolkit for processing tabular data.
 * [Prodmodel](https://github.com/prodmodel/prodmodel) - Build system for data science pipelines.
 * [dovpanda](https://github.com/dovpanda-dev/dovpanda) - Hints and tips for using pandas in an analysis environment. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
+* [CircleCI](https://circleci.com/): Automates your software builds, tests, and deployments.

 ## Feature Engineering

@@ -235,32 +245,39 @@
 * [scikit-rebate](https://github.com/EpistasisLab/scikit-rebate) - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">

 ## Visualization
+### General Purposes
 * [Matplotlib](https://github.com/matplotlib/matplotlib) - Plotting with Python.
 * [seaborn](https://github.com/mwaskom/seaborn) - Statistical data visualization using matplotlib.
-* [Bokeh](https://github.com/bokeh/bokeh) - Interactive Web Plotting for Python.
-* [HoloViews](https://github.com/ioam/holoviews) - Stop plotting your data - annotate your data and let it visualize itself.
 * [prettyplotlib](https://github.com/olgabot/prettyplotlib) - Painlessly create beautiful matplotlib plots.
 * [python-ternary](https://github.com/marcharper/python-ternary) - Ternary plotting library for python with matplotlib.
 * [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization module for Python.
 * [chartify](https://github.com/spotify/chartify/) - Python library that makes it easy for data scientists to create charts.
 * [physt](https://github.com/janpipek/physt) - Improved histograms.
+### Interactive plots
 * [animatplot](https://github.com/t-makaro/animatplot) - A python package for animating plots built on matplotlib.
 * [plotly](https://plot.ly/python/) - A Python library that makes interactive and publication-quality graphs.
+* [Bokeh](https://github.com/bokeh/bokeh) - Interactive Web Plotting for Python.
+* [Altair](https://altair-viz.github.io/) - Declarative statistical visualization library for Python. Supports many data transformations directly in the code that creates a graph
+* [bqplot](https://github.com/bqplot/bqplot) - Plotting library for IPython/Jupyter notebooks
+### Map
 * [folium](https://python-visualization.github.io/folium/quickstart.html#Getting-Started) - Makes it easy to visualize data on an interactive open street map
 * [geemap](https://github.com/giswqs/geemap) - Python package for interactive mapping with Google Earth Engine (GEE)
+### Automatic Plotting
+* [HoloViews](https://github.com/ioam/holoviews) - Stop plotting your data - annotate your data and let it visualize itself.
+* [AutoViz](https://github.com/AutoViML/AutoViz): Visualize data automatically with 1 line of code (ideal for machine learning)
+* [SweetViz](https://github.com/fbdesignpro/sweetviz): Visualize and compare datasets, target values and associations, with one line of code.

-
-## Deployment
-* [datapane](https://datapane.com/) - A collection of APIs to turn scripts and notebooks into interactive reports.
-* [fastapi](https://fastapi.tiangolo.com/) - Modern, fast (high-performance), web framework for building APIs with Python
-* [streamlit](https://www.streamlit.io/) - Makes it easy to deploy machine learning models
+### NLP
+* [pyLDAvis](https://github.com/bmabey/pyLDAvis): Interactive topic model visualization


 ## Deployment
 * [datapane](https://datapane.com/) - A collection of APIs to turn scripts and notebooks into interactive reports.
+* [binder](https://mybinder.org/) - Enables sharing and executing Jupyter notebooks
 * [fastapi](https://fastapi.tiangolo.com/) - Modern, fast (high-performance), web framework for building APIs with Python
 * [streamlit](https://www.streamlit.io/) - Makes it easy to deploy machine learning models

+
 ## Model Explanation
 * [Alibi](https://github.com/SeldonIO/alibi) - Algorithms for monitoring and explaining machine learning models.
 * [anchor](https://github.com/marcotcr/anchor) - Code for "High-Precision Model-Agnostic Explanations" paper.
@@ -380,6 +397,7 @@
 * [flair](https://github.com/zalandoresearch/flair) - Very simple framework for state-of-the-art NLP.
 * [spaCy](https://spacy.io/) - Industrial-Strength Natural Language Processing.

+
 ## Computer Audition
 * [librosa](https://github.com/librosa/librosa) - Python library for audio and music analysis.
 * [Yaafe](https://github.com/Yaafe/Yaafe) - Audio features extraction.

_config.yml

+1
@@ -0,0 +1 @@
+theme: jekyll-theme-cayman
