
A learning project dedicated to data mining using R and Python. This repository contains scripts for web scraping, data retrieval, and data preparation, with the goal of creating datasets for future machine learning model training. The project is designed to help develop skills in data handling, processing, and structuring.


amehnd/Data_mining_R_n_Python


📊 Data_mining_R_and_Python

Hey there! 👋 This is a personal learning project where I'm exploring data mining using both R and Python — mostly focused on scraping, retrieving, and preparing data that I plan to use later for training machine learning models (also part of my learning journey!).

I'm still in the beginner phase, so this repo is more of a hands-on playground for experimenting, making mistakes, and figuring things out as I go. If you're learning too, feel free to follow along!


🧠 What This Repo Is About

  • Collecting data from various sources using R and Python
  • Prepping and cleaning the data to make it usable
  • Structuring the datasets for future ML experiments
  • Keeping things as simple and understandable as possible
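To make the "prep and clean" step a bit more concrete, here is a minimal sketch. It assumes scraped records arrive as a list of dicts of strings; the field names and sample values are hypothetical, and it sticks to the Python standard library so it runs without installs. pandas, which the Python scripts use, has built-in equivalents such as `drop_duplicates` and `dropna`.

```python
import csv
import io

# Hypothetical raw records, as they might come out of a scraper:
# stray whitespace, inconsistent missing values, numbers stored as text,
# and one exact duplicate.
RAW_RECORDS = [
    {"name": "  Alice ", "age": "34", "city": "Paris"},
    {"name": "Bob", "age": "", "city": "Berlin"},
    {"name": "  Alice ", "age": "34", "city": "Paris"},
    {"name": "Chloe", "age": "n/a", "city": " Lyon"},
    {"name": "Dan", "age": "27", "city": "Madrid "},
]

# Strings treated as "no value" (compared case-insensitively).
MISSING = {"", "n/a", "na", "none", "null"}


def clean_record(record):
    """Strip whitespace, map missing markers to None, and convert age to int."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip()
        cleaned[key] = None if value.lower() in MISSING else value
    if cleaned["age"] is not None:
        cleaned["age"] = int(cleaned["age"])  # raises ValueError on non-numeric text
    return cleaned


def prepare(records):
    """Clean every record, then drop duplicates (compared after cleaning)
    and rows with any missing field."""
    seen = set()
    result = []
    for record in map(clean_record, records):
        key = tuple(sorted(record.items()))
        if key in seen or None in record.values():
            continue
        seen.add(key)
        result.append(record)
    return result


def to_csv(records):
    """Serialize the structured records to CSV text (ready to save to a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = prepare(RAW_RECORDS)
print(to_csv(rows))
```

Dropping incomplete rows is just the simplest policy; filling in missing values instead can make more sense depending on the dataset.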

📁 Folder & Script Info

You'll find two types of scripts here:

🔵 R Scripts

  • Scrape or retrieve data using libraries like rvest or httr
  • Convert and clean raw data
  • Store it for later ML use

🟠 Python Scripts

  • Scrape websites using requests, BeautifulSoup, or Selenium
  • Convert scraped content to structured formats (CSV, JSON)
  • Prep the data for ML tasks using libraries like pandas
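Here is a rough sketch of the scrape-then-structure flow, assuming the target page holds a simple HTML table. To stay self-contained, it parses an inline, hypothetical HTML string with the standard library's `html.parser` instead of fetching a page with requests and parsing it with BeautifulSoup. With those libraries, the input would typically be `requests.get(url, timeout=10).text` and the rows could be found with `soup.find_all("tr")`.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical page content. In a real script this string would come from
# something like requests.get(url, timeout=10).text.
SAMPLE_HTML = """
<table id="books">
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Data Mining Basics</td><td>12.50</td></tr>
  <tr><td>Intro to Scraping</td><td>9.99</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Turn an HTML table into a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)

# Structured output: JSON text and CSV text (write either to a file to save it).
as_json = json.dumps(records, indent=2)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
as_csv = buffer.getvalue()
print(as_csv)
```

The values stay as text here (`"12.50"`); converting them to numbers belongs to the cleaning step that comes afterwards.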

The code might be a little messy, and the comments are written mostly for my own understanding; it's all part of the learning process.


🛠 Requirements

For R

  • R (version 3.6 or higher recommended)
  • RStudio (optional but helpful)
  • These packages installed:
install.packages(c("rvest", "httr", "jsonlite", "dplyr"))

For Python

  • Python 3.7+
  • A virtual environment is recommended
  • Required packages:
pip install requests beautifulsoup4 pandas selenium
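A quick way to confirm the Python packages are installed is to check whether the import system can find each one. This is a small sketch using `importlib.util.find_spec`; the mapping from pip names to import names is written by hand here (note that `beautifulsoup4` is imported as `bs4`).

```python
import importlib.util

# pip package name -> module name you actually import in a script.
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
    "selenium": "selenium",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

Run it inside the virtual environment you plan to use, since each environment has its own set of installed packages.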

🚀 Getting Started

  1. Clone the repo:
git clone https://github.com/amehnd/Data_mining_R_n_Python.git
  2. Choose a script (.R or .py) and run it in your environment
  3. Experiment, modify, break things, and learn from it
  4. Document what you learn (like I do here)
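If you'd like one entry point for both kinds of scripts, a tiny helper can pick the interpreter from the file extension. This is a hypothetical sketch (`build_command` and `run_script` are not part of the repo), and it assumes R is installed with `Rscript` available on your PATH.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script):
    """Pick an interpreter from the file extension (.R -> Rscript, .py -> Python)."""
    suffix = Path(script).suffix.lower()
    if suffix == ".r":
        # Assumes R is installed and Rscript is on your PATH.
        return ["Rscript", str(script)]
    if suffix == ".py":
        # Reuse the interpreter running this helper (e.g. your virtual environment).
        return [sys.executable, str(script)]
    raise ValueError(f"Unsupported script type: {script}")


def run_script(script):
    """Run one script; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(script), check=True)
```

For example, `run_script("scrape.py")` (a made-up file name) would run that file with the current Python interpreter, so packages from the active virtual environment are available to it.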

🌱 Learning Goals

  • Get better at scraping and handling data using both R and Python
  • Learn how to clean, transform, and prepare datasets for ML
  • Build confidence in data engineering skills before diving deeper into ML

💬 Notes

  • This project is a learning log, not a polished product
  • I'm learning as I go — so if you see something that can be improved, feel free to share!
  • Suggestions, tips, or PRs are welcome (as long as they’re kind ✌️)

⭐️ Support the Journey

If this repo helped you, or if you're also learning, give it a star ⭐️ and follow along as I keep building and learning!


🔖 Hashtags

#rstats #Python #WebScraping #DataMining #MachineLearning #LearningInPublic #DataScience
