-
dbrx Public
Forked from databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
Python Other Updated Jun 24, 2025 -
awesome-databricks Public
Forked from reisdebora/awesome-databricks
A curated list of awesome Databricks resources, including Spark
Creative Commons Zero v1.0 Universal Updated Jun 18, 2025 -
awesome-azure-databricks Public
Forked from tfayyaz/awesome-azure-databricks
Awesome content all about Azure Databricks
Creative Commons Zero v1.0 Universal Updated Jun 18, 2025 -
Spark-with-Python Public
Forked from tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Jupyter Notebook MIT License Updated May 29, 2025 -
Spark-Programming-In-Python Public
Forked from LearningJournal/Spark-Programming-In-Python
Apache Spark 3 - Spark Programming in Python for Beginners
-
Cookbook Public
Forked from andkret/Cookbook
The Data Engineering Cookbook
Python Apache License 2.0 Updated May 9, 2025 -
spark-testing-base Public
Forked from MrPowers/spark-testing-base
Base classes to use when writing tests with Spark
Scala Apache License 2.0 Updated May 9, 2025 -
devrel Public
Forked from databricks/devrel
This repository contains the notebooks and presentations we use for our Databricks Tech Talks
HTML Updated May 9, 2025 -
delta Public
Forked from delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Scala Apache License 2.0 Updated May 9, 2025 -
koalas Public
Forked from databricks/koalas
Koalas: pandas API on Apache Spark
Python Apache License 2.0 Updated May 9, 2025 -
sparkling-water Public
Forked from h2oai/sparkling-water
Sparkling Water provides H2O functionality inside Spark cluster
Scala Apache License 2.0 Updated May 9, 2025 -
spark-nlp Public
Forked from JohnSnowLabs/spark-nlp
State of the Art Natural Language Processing
Scala Apache License 2.0 Updated May 9, 2025 -
spark-gdelt Public
Forked from aamend/spark-gdelt
Binding the GDELT universe in a Spark environment
Scala Apache License 2.0 Updated May 9, 2025 -
LearningSparkV2 Public
Forked from databricks/LearningSparkV2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Scala Apache License 2.0 Updated May 9, 2025 -
pyspark-examples Public
Forked from spark-examples/pyspark-examples
Pyspark RDD, DataFrame and Dataset Examples in Python language
Python Updated May 9, 2025 -
pyspark-cheatsheet Public
Forked from kevinschaich/pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
MIT License Updated May 9, 2025 -
awesome-spark Public
Forked from awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Shell Creative Commons Zero v1.0 Universal Updated May 9, 2025 -
awesome-compliance Public
Forked from getprobo/awesome-compliance
A curated list of tools, frameworks, and resources for IT compliance, security standards, and regulatory requirements
Creative Commons Zero v1.0 Universal Updated May 1, 2025 -
GRC Public
Forked from K-Kilpatrick/GRC
Module 2: Introduction to Security Within the Organization, Risk Management and Threat Modeling, Governance Frameworks, Compliance, and BCP/DR
Updated May 1, 2025 -
awesome-security-GRC Public
Forked from Arudjreis/awesome-security-GRC
Curated list of resources for security Governance, Risk Management, Compliance and Audit professionals and enthusiasts (if they exist).
Updated May 1, 2025 -
GRC-Cybersecurity Public
Forked from MenakaGodakanda/GRC-Cybersecurity
This project demonstrates a comprehensive understanding of Governance, Risk, and Compliance (GRC) in the field of cybersecurity. It includes scripts and tools to automate risk assessment and compli…
Python MIT License Updated May 1, 2025 -
unicis-platform-ce Public
Forked from UnicisTech/unicis-platform-ce
A modern, all-in-one Governance, Risk & Compliance (GRC) solution designed for privacy, security, and compliance teams. As an open-source alternative to Vanta and Drata, this platform empowers team…
TypeScript Apache License 2.0 Updated May 1, 2025 -
data-engineering-devops Public
Forked from ssp-data/data-engineering-devops
Full stack data engineering tools and infrastructure set-up
Python Updated Jan 3, 2025 -
practical-data-engineering Public
Forked from ssp-data/practical-data-engineering
Practical Data Engineering: A Hands-On Real-Estate Project Guide
-
youtube_data_engineering_project Public
Forked from Proggleb/youtube_data_engineering_project
Data Engineering Project: Extracting music video metrics of Twice using YouTube API, AWS, and Tableau
-
data-engineer-handbook Public
Forked from DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Jupyter Notebook Updated Dec 3, 2024 -
llm-driven-data-engineering Public
Forked from DataExpert-io/llm-driven-data-engineering
This is a public repository to go over all the LLM-driven data engineering concepts.
Python Updated Oct 26, 2024 -
data-cleaning-with-pyspark-live-training Public
Forked from datacamp/data-cleaning-with-pyspark-live-training
Live Training Session: Cleaning Data with Pyspark
-
PySpark-and-MLlib Public
Forked from susanli2016/PySpark-and-MLlib
Getting start with PySpark and MLlib
Jupyter Notebook Updated Sep 24, 2024 -
spark-py-notebooks Public
Forked from jadianes/spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks