Hey there! 👋 This is a personal learning project where I'm exploring data mining using both R and Python — mostly focused on scraping, retrieving, and preparing data that I plan to use later for training machine learning models (also part of my learning journey!).
I'm still in the beginner phase, so this repo is more of a hands-on playground for experimenting, making mistakes, and figuring things out as I go. If you're learning too, feel free to follow along!
- 🧠 What This Repo Is About
- 📁 Folder & Script Info
- 🛠 Requirements
- 🚀 Getting Started
- 🌱 Learning Goals
- 💬 Notes
- ⭐️ Support the Journey
- 🔖 Hashtags
- Collecting data from various sources using R and Python
- Prepping and cleaning the data to make it usable
- Structuring the datasets for future ML experiments
- Keeping things as simple and understandable as possible
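
To make the "prep, clean, and structure" steps above a bit more concrete, here is a minimal sketch using only the Python standard library. It is not taken from the scripts in this repo: the field names (`name`, `price`), the sample rows, and the helper functions `clean_records` and `train_test_split` are all made up for illustration.

```python
import random


def clean_records(raw_records):
    """Strip whitespace, drop incomplete rows, and convert prices to floats."""
    cleaned = []
    for row in raw_records:
        name = (row.get("name") or "").strip()
        price = (row.get("price") or "").strip()
        if not name or not price:
            continue  # skip rows with a missing value
        try:
            cleaned.append({"name": name, "price": float(price)})
        except ValueError:
            continue  # skip rows whose price is not numeric
    return cleaned


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and split it into train and test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical raw rows, as they might look right after scraping
raw = [
    {"name": " Widget ", "price": "9.99"},
    {"name": "Gadget", "price": ""},
    {"name": "Gizmo", "price": "n/a"},
    {"name": "Doohickey", "price": "4.50"},
    {"name": "Thingamajig", "price": " 12 "},
    {"name": "Whatsit", "price": "7"},
]

cleaned = clean_records(raw)
train_set, test_set = train_test_split(cleaned)
print(len(cleaned), len(train_set), len(test_set))
```

Keeping the cleaning and the splitting in separate small functions makes each step easy to test on its own, which fits the "simple and understandable" goal.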
You'll find two types of scripts here:
- R scripts:
  - Scrape or retrieve data using libraries like `rvest` or `httr`
  - Convert and clean raw data
  - Store it for later ML use
- Python scripts:
  - Scrape websites using `requests`, `BeautifulSoup`, or `Selenium`
  - Convert scraped content to structured formats (CSV, JSON)
  - Prep the data for ML tasks using libraries like `pandas`
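
As a rough illustration of the Python workflow above (parse HTML, structure it, export it), here is a small sketch. It is an assumption about how such a script could look, not a copy of any script in this repo. The HTML snippet, the `books` table id, and the helper names `parse_rows` and `to_dataframe` are hypothetical; in a real script the HTML would typically come from something like `requests.get(url).text`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (no network needed)
SAMPLE_HTML = """
<table id="books">
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td> Data Mining 101 </td><td>$19.99</td></tr>
  <tr><td>Scraping Basics</td><td>$5.00</td></tr>
  <tr><td>Scraping Basics</td><td>$5.00</td></tr>
</table>
"""


def parse_rows(html):
    """Return one dict per data row of the table; the header row is skipped."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="books")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # header rows use <th>, so they yield no <td> cells
            rows.append({"title": cells[0], "price": cells[1]})
    return rows


def to_dataframe(rows):
    """Load rows into pandas, convert prices to floats, and drop duplicates."""
    df = pd.DataFrame(rows, columns=["title", "price"])
    df["price"] = df["price"].str.lstrip("$").astype(float)
    return df.drop_duplicates().reset_index(drop=True)


rows = parse_rows(SAMPLE_HTML)
df = to_dataframe(rows)

# Export to the structured formats mentioned above (returned as strings here;
# pass a file path to write to disk instead)
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")
print(df)
```

Parsing a hard-coded string first is a handy way to get the extraction logic right before pointing the script at a live website.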
The code might be a little messy or commented for my own understanding — it's all part of the learning process.
- R (version 3.6 or higher recommended)
- RStudio (optional but helpful)
- These packages installed: `install.packages(c("rvest", "httr", "jsonlite", "dplyr"))`
- Python 3.7+
- Recommended to use a virtual environment
- Required packages: `pip install requests beautifulsoup4 pandas selenium`
- Clone the repo: `git clone https://github.com/amehnd/Data_mining_R_n_Python.git`
- Choose a script (`.R` or `.py`) and run it in your environment
- Experiment, modify, break things, and learn from it
- Document what you learn (like I do here)
- Get better at scraping and handling data using both R and Python
- Learn how to clean, transform, and prepare datasets for ML
- Build confidence in data engineering skills before diving deeper into ML
- This project is a learning log, not a polished product
- I'm learning as I go — so if you see something that can be improved, feel free to share!
- Suggestions, tips, or PRs are welcome (as long as they’re kind ✌️)
If this repo helped you, or if you're also learning, give it a star ⭐️ and follow along as I keep building and learning!
#rstats
#Python
#WebScraping
#DataMining
#MachineLearning
#LearningInPublic
#DataScience