Before we can run the hands-on workshop, a working infrastructure in Confluent Cloud must exist:
- an environment with the Schema Registry enabled
- a Confluent cluster in one of the regions supported by Flink
- 3 topics
- events generated by our Sample Data Datagen Source connector
And of course, we need a working Confluent Cloud account to do all of this. Signing up with Confluent Cloud is easy, and you will get a $400 budget for this hands-on workshop. If you don't have a working Confluent Cloud account yet, please sign up to Confluent Cloud.
You now have two options for creating the Confluent Cloud resources for the hands-on workshop:
- Let Terraform create them: if you are comfortable running Terraform, follow this guide.
- Create all resources manually.
We expect the Confluent CLI to be installed on your desktop; install it now if you have not already. You need the CLI to run the Flink SQL shell, which gives you a better experience in the workshop. Please bring the client to the latest version (v3.53.0 at the time of writing):
confluent update
If you installed the CLI via `brew install confluentinc/tap/cli`, rerun that command to bring the CLI to the latest version.
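To confirm you are on a recent enough release, and to authenticate for any CLI commands you run later, a quick check:

```shell
# Show the installed Confluent CLI version (should be v3.53.0 or newer)
confluent version

# Log in to Confluent Cloud; the CLI will prompt for your credentials
confluent login
```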
IMPORTANT TO KNOW FOR THE WORKSHOP: Flink on Confluent Cloud is available on AWS, Azure, and GCP. Currently, 21 regions are supported. Please be aware that the cluster and the Flink compute pool must be in the same cloud provider and region.
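If you are already logged in with the CLI, recent CLI versions can list the regions where Flink is available; treat the exact command and flags as an assumption that may differ between CLI versions:

```shell
# List regions where Flink on Confluent Cloud is available (example: AWS only)
confluent flink region list --cloud aws
```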
You can create each Confluent Cloud resource with the Confluent CLI, the Confluent Cloud Control Plane GUI, or the Confluent Terraform Provider. All three use the Confluent Cloud API in the background. If you want to use the CLI, you must install it on your desktop. This workshop guide covers the GUI only.
Log in to Confluent Cloud and create an environment with Schema Registry:
- Click the `Add cloud environment` button.
- Enter a new environment name, e.g. `handson-flink`, and push the `Create` button.
- Choose the Essentials Stream Governance package.
The environment is now ready to use; a Schema Registry will be created in the region of the first cluster.
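For reference, the environment can also be created from the CLI; this is a minimal sketch, assuming your CLI version supports the `--governance-package` flag:

```shell
# Create the workshop environment with the Essentials Stream Governance package
confluent environment create handson-flink --governance-package essentials

# Make it the active environment for subsequent CLI commands
confluent environment use <environment-id>
```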
Next, create a Basic cluster in your chosen environment, following the region rule above.
- Click the `Create cluster` button, choose `Basic`, and click the `Begin configuration` button to start the cluster creation config.
- Choose your preferred region with a single zone and click `Continue`.
- Give the cluster a name, e.g. `cc_handson_cluster`, check the rate card overview and configs, then press `Launch cluster`.
The cluster will be up and running in seconds.
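CLI users can achieve the same with a single command; the region below is only an example, so pick any single region supported by Flink:

```shell
# Create a Basic cluster (example: AWS, eu-central-1, single zone)
confluent kafka cluster create cc_handson_cluster \
  --cloud aws \
  --region eu-central-1 \
  --type basic

# Select it as the active cluster for subsequent commands
confluent kafka cluster use <cluster-id>
```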
Now, we need three topics to store our events.
- shoe_products
- shoe_customers
- shoe_orders
Via the GUI, topic creation is straightforward. Create a topic by clicking `Topics` in the left-hand menu and then clicking the `Create topic` button.
- Topic name: `shoe_products`, Partitions: `1`, then click the `Create with defaults` button.
- Skip adding a data contract.
- Repeat the same steps for `shoe_customers` and `shoe_orders`.
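If the cluster above is selected as the active cluster, the same three topics can also be created from the CLI:

```shell
# Create the three workshop topics with one partition each
confluent kafka topic create shoe_products --partitions 1
confluent kafka topic create shoe_customers --partitions 1
confluent kafka topic create shoe_orders --partitions 1
```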
Confluent provides the Datagen connector, a test data generator. In Confluent Cloud, a range of Quickstarts (predefined data sets) is available to generate data in a given format. NOTE: We use Datagen with the following templates:
- Shoe Products https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/resources/shoes.avro
- Shoe Customers https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/resources/shoe_customers.avro
- Shoe Orders https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/resources/shoe_orders.avro
Choose the `Connectors` menu entry (left side) and search for `Sample Data`. Click on the Sample Data icon.
- Press `Additional configuration`.
- Choose the topic `shoe_products` and click `Continue`.
- Keep `My account` (already selected by default) and download the API key. Typically, you would configure the connector with restrictive access to your resources (which is what the Terraform setup does); for the hands-on, a global key is sufficient. Click `Generate API Key & Download`, enter the description `Datagen Connector Products`, and click `Continue`.
- Select the format `AVRO`, because Flink requires AVRO for now, and the template (under `Show more options`) `Shoes`, then click `Continue`.
- Check the summary; we will go with one task (slider) and click `Continue`.
- Enter the name `DSoC_products` and finally click `Continue`.
Now, events will be produced into the topic `shoe_products`, generated by the datagen connector `DSoC_products`.
Click `Stream Lineage` (left side) to see your current data pipeline. Click on the topic `shoe_products` and enter the description `Shoe products`. This is how you place metadata on your data product.
Go back to your cluster `cc_handson_cluster` and create two more datagen connectors to fill the topics `shoe_customers` and `shoe_orders`: go to `Connectors` and click `Add Connector`. Pay attention when you select the template for the datagen connector and ensure that it corresponds to the selected topic, as shown in the following. Deviations in this step will result in invalid queries later in the workshop.
- Connector plug-in `Sample Data`, topic `shoe_customers`, Global access and Download API Key with description `Datagen Connector Customers`, format `AVRO`, template `Shoe customers`, 1 task, connector name `DSoC_customers`
- Connector plug-in `Sample Data`, topic `shoe_orders`, Global access and Download API Key with description `Datagen Connector Orders`, format `AVRO`, template `Shoe orders`, 1 task, connector name `DSoC_orders`
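A quick way to verify the connectors from the CLI (output columns vary by CLI version):

```shell
# List the connectors in the active cluster and check that all three are running
confluent connect cluster list
```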
Three connectors are now up and running and generating data for us.
All three connectors generate events in AVRO format and automatically register a schema for each topic.
You can have a look at the schema in the Schema Registry.
Or use the topic viewer, where you can
- view the events flying in,
- see all metadata information,
- check configs,
- and inspect the schemas as well.
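If you prefer the CLI for this check, a sketch (the consume command may ask for a Schema Registry API key the first time):

```shell
# Consume a few Avro-encoded events from the products topic
confluent kafka topic consume shoe_products --value-format avro --from-beginning

# Inspect the value schema that the connector registered automatically
confluent schema-registry schema describe --subject shoe_products-value --version latest
```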
The preparation is finished. Well done!
The infrastructure for the hands-on workshop is up and running, and we can now start developing our loyalty program use case in Flink SQL.
End of prerequisites; continue with LAB 1.