Tutorial: Hello World
Your very first Datumize tutorial! Inspired by the best programmers, we couldn’t miss the opportunity to show Datumize in action through the famous Hello World! message.
Overview
In this tutorial, you will learn the basics of creating a simple pipeline (as described in the Developer Guide) and deploying it to your local machine. Along the way you will cover the steps required to deliver any real Datumize project:
- Creating and testing a pipeline.
- Configuring some infrastructure to run the pipelines.
- Deploying the pipeline to that infrastructure.
Objectives
In this tutorial you will:
- Learn how to use the Datumize Sandbox
- Use Datumize Zentral for the first time
- Create your first DDC Pipeline
- Create your first Deployment Plan
- Say Hello World! using Datumize Data Collector
Requirements
Make sure you meet the following requirements:
- An active Zentral user account
- A machine on which the Datumize Data Collector (DDC) can be installed
Build the Hello World Pipeline
If you are new to Datumize Zentral, refer to the resources in our Getting Started section, including manuals and videos. Before you begin, we strongly recommend that you become familiar with the main areas of Zentral this tutorial relies on: Creating a Project, Working with Pipelines and Testing your pipeline.
Let's create our first pipeline.
- In Zentral, create your first project (see Creating your first Project).
- Create a DDC Pipeline that looks like the diagram below. A pipeline is generally composed of DDC Source Components, DDC Processor Components and DDC Sink Components, which are joined with Streams. At a minimum, a source and a sink are required.
- The Hello World pipeline is very simple and needs only two components:
Component Type | Component | Description |
---|---|---|
Source | Hello Source | Produces a text string (HelloWorld) upon every iteration. |
Sink | Logger Sink | Logs the record into the info log, using the toString() method. |
- Drag the required components from the palette to the canvas, and join them with a stream.
- Click on the Hello Source component to open the property inspector. As the Id is auto-filled and we will not use a Topic in this example, you only need to complete the Message property. Try setting a new message in your own language.
- Click on the Logger Sink component to open the property inspector. As the Id is auto-filled and we will not use a Topic in this example, you don't need to set any property.
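To make the source, stream and sink relationship concrete, here is a toy model in Python. It is not the Datumize API: the classes `Record`, `HelloSource`, `Stream` and `LoggerSink` and the `run` helper are hypothetical, and they only mimic the behaviour described in the component table above (the source produces its message on every iteration; the sink logs each record using its string form).

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Record:
    # Loosely mirrors the fields LoggerSink prints in the expected output.
    data: str
    topic: Optional[str] = None  # no Topic is used in this tutorial
    retries: int = 0
    error_flag: bool = False


class HelloSource:
    """Hypothetical stand-in for Hello Source: one record per iteration."""

    def __init__(self, message: str) -> None:
        self.message = message

    def read(self) -> Record:
        return Record(data=self.message)


class Stream:
    """Hypothetical stream: a FIFO buffer joining a source to a sink."""

    def __init__(self) -> None:
        self._queue: deque[Record] = deque()

    def put(self, record: Record) -> None:
        self._queue.append(record)

    def drain(self) -> Iterator[Record]:
        while self._queue:
            yield self._queue.popleft()


class LoggerSink:
    """Hypothetical stand-in for Logger Sink: logs each record via str()."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, record: Record) -> None:
        line = f"Record: {record}"
        print(line)
        self.lines.append(line)


def run(source: HelloSource, sink: LoggerSink, iterations: int) -> None:
    """Drive the toy pipeline: source -> stream -> sink."""
    stream = Stream()
    for _ in range(iterations):
        stream.put(source.read())        # source emits one record
        for record in stream.drain():    # sink consumes what reached the stream
            sink.write(record)


# Usage: three iterations with a message in another language.
sink = LoggerSink()
run(HelloSource("Hola mundo!"), sink, iterations=3)
```

The stream is modelled as a plain queue only to show that components never call each other directly; the real DDC runtime handles threading, batching and policies, which this sketch ignores.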
Test and Publish the Hello World Pipeline
Debugging is a powerful way to validate a pipeline. See the Testing a Pipeline guide for a more in-depth overview of the steps needed to test your pipeline.
Before considering the pipeline ready for use, let's test it.
- Save your pipeline.
- Click on the Discover tab at the bottom of the page.
- For this tutorial the debugging will be very simple. Just click on Debugger Start.
- You will see that the Input panel shows some content. This corresponds to records flowing through the pipeline.
- Play around with different debugging options, and click on Console.
Finally, make the pipeline available by clicking Publish, then go back to the Resources page.
Create the Infrastructure to run the Pipeline
You only need one machine (your local machine or a target test environment), one DDC instance, and the Hello World pipeline we just configured. For more advanced documentation on the Infrastructure management tool, see the Zentral guide to Infrastructure Deployment.
First, let's provision a valid machine for running Datumize products.
- Under Resources, create a new resource and choose Deployment Plan.
- Select Infrastructure and Add a machine. Enter a name and description, and select Manual installation type.
- Click on Create.
- A bootstrapping script is downloaded; use it to manually install the Datumize agent in the sandbox.
- If you need additional information, please check the Bootstrap Datumize software installation guide.
- Once you have installed the machine, you should see something similar to this. Notice the green bullet next to the machine, which shows that your agent is up and running.
Second, let's create a DDC instance in the machine we just provisioned.
- Click on Add Instance.
- Select Datumize Data Collector (DDC) and set an instance name.
- Leave all other settings (JVM, Version, etc.) unchanged and click Create.
- Your new DDC instance appears under the newly provisioned machine. Note the grey bullet: DDC is not running yet.
Deploy the Pipeline to DDC Instance
Once we have the infrastructure (machine and instance) provisioned, it's time to deploy the HelloWorld pipeline and run it.
Before running the pipeline, you need to set its policies. Let's adjust the runtime policy so that the example works as expected.
- Under Policies, click Add a pipeline.
- Select the previously published HelloWorld pipeline. To configure its runtime policy, select the default policy or add a new DDC pipeline policy.
- We want to control how often the Hello Source runs, with a duration that is not too short and batches small enough to follow visually.
- Click on Hello Source and set Duration to 1s (1 second) and Batch to 2, if these are not already the defaults.
- Click on Save & Exit to save the policy and finish editing it.
The default policy values for threads, duration and batch can be inefficient for some sources, so adjust them to suit each source.
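A quick back-of-the-envelope check helps when choosing Duration and Batch. Assuming the source emits Batch records once per Duration period (our reading of the policy settings, not a documented Datumize formula), the expected rate is Batch / Duration. With the values above, that is 2 / 1 s = 2 records per second. The helper `expected_records` below is hypothetical and only encodes that assumption.

```python
def expected_records(duration_s: float, batch: int, elapsed_s: float) -> int:
    """Records expected after elapsed_s seconds.

    Assumed model (not a documented Datumize formula): the source emits
    `batch` records once per completed `duration_s` period.
    """
    if duration_s <= 0 or batch < 0 or elapsed_s < 0:
        raise ValueError("duration must be positive; batch and elapsed non-negative")
    completed_periods = int(elapsed_s // duration_s)
    return completed_periods * batch


# Tutorial policy values: Duration = 1 s, Batch = 2.
rate_per_s = 2 / 1.0
print(rate_per_s)                       # 2.0 records per second
print(expected_records(1.0, 2, 60.0))   # 120 records in one minute
```

The sample log under Check Expected Output shows records roughly 0.5 s apart, about two per second, which is consistent with this estimate.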
Now it's time to start the real deployment of the pipeline and run it.
- Choose an instance and apply the changes.
- Datumize software is then provisioned automatically.
Check Expected Output
Let's check that the HelloWorld pipeline is running as expected. First, wait until the DDC instance bullet turns green.
- For this tutorial, you can simply go into the DDC installation directory and check the logs.
- For additional information, see the Debugging deployments guide.
The log should contain lines similar to these:
2020-02-11 09:31:14,320 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:14,825 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:15,326 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:15,827 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:16,328 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:16,830 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:17,331 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
2020-02-11 09:31:17,833 INFO com.datumize.ddc.component.LoggerSink [Sandbox-Helloworld | LoggerSink_dxz84865z_ | 0] Record: InputRecord [topic=null, data=Hello world!, retries=0, errorFlag=false]
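If you prefer not to read the log by eye, a short Python script can confirm the output. This is a sketch: the tutorial does not name the log file, so the path you pass in is up to you (look in the DDC installation directory), and the regular expression is written only against the sample LoggerSink lines above. The function names `parse_logger_sink` and `summarize` are hypothetical.

```python
import re
import sys
from datetime import datetime

# Pattern based only on the sample LoggerSink lines shown in this tutorial.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"com\.datumize\.ddc\.component\.LoggerSink .*?data=(?P<data>.*?), retries="
)


def parse_logger_sink(lines):
    """Yield (timestamp, data) for each LoggerSink record line; skip others."""
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
            yield ts, match["data"]


def summarize(lines, expected_message):
    """Count records, check their message, and average the gap between them."""
    records = list(parse_logger_sink(lines))
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(records, records[1:])]
    return {
        "count": len(records),
        "all_match": all(data == expected_message for _, data in records),
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else None,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_hello.py <path-to-ddc-log> "Hello world!"
    path, message = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8") as log:
        print(summarize(log, message))
```

Pass the message you set in Hello Source as the second argument; `all_match` is then True only if every logged record carries that message.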
Success! The message you set in Hello Source is printed, and records are logged in batches once per configured duration. You have completed your first tutorial.