josephmachado/advanced_spark_sql_for_data_engineers

Feedback form

Live virtual workshop

The workshop will be streamed live on YouTube: Spark SQL Workshop: Advanced join & group by techniques. After the stream, the recording will be available to watch and follow at your own pace.

Setup

Codespaces

Note: Please remember to switch off your codespace when you are done.

  1. Start a codespace machine.
  2. Wait for the terminal to start, then run the command docker compose up -d && sleep 30 in the terminal to start the containers.
  3. Click on the Ports tab, then click the globe icon in the address for port 8888 to open Jupyter Notebook.
  4. Click on the notebooks folder and open adv_joins_group_by.ipynb.
  5. Open it in Jupyter Lab for a better experience.

Follow along with the workshop!

Note: Remember to switch off your codespace once you have finished.

Local with Docker

Prerequisites:

  1. docker & docker compose

Clone the repo and start the containers (note that you will have to stop any other containers that you may have running on ports 8888 and 8080):

git clone https://github.com/josephmachado/advanced_spark_sql_for_data_engineers.git
cd advanced_spark_sql_for_data_engineers
docker compose up -d
sleep 30

Open Jupyter lab at http://localhost:8888/lab/tree/notebooks.

Spark UI is available at http://localhost:8080.
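The fixed sleep 30 above is a simple way to give the containers time to start. As an alternative, here is a minimal sketch of a polling helper that retries a command until it succeeds. The retry function is hypothetical and not part of this repo, and the commented usage line assumes curl is installed and that Jupyter answers on the URL given above.

```shell
# Hypothetical helper (not part of this repo): run a command until it succeeds.
# usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    # Run the remaining arguments as the probe command.
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  # Every attempt failed.
  return 1
}

# Assumed usage after `docker compose up -d` (requires curl):
# retry 30 2 curl -fsS -o /dev/null http://localhost:8888/lab/tree/notebooks
```

With these example values the helper gives up after roughly a minute, and returns as soon as the probe succeeds. Adjust the attempt count and delay to suit your machine.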

Stop the containers with:

docker compose down

About

Advanced Spark SQL for Data Engineers
