This section explains how to use stubs to inject sample data into your pipeline, and how to debug the pipeline using breakpoints, stubs, and the Discover and Console tools. Please keep in mind that console debugging is currently supported only for the helloworld, filesource, filechunksource, and pcap sources, in combination with any processor and either a filesink or loggersink. Additional resources will be added in future releases.

Objectives

In this section you will learn:

  • How to create a Stub to be used as a data sample.
  • How to use Breakpoints to debug the pipeline.
  • How to add a Stub to improve testing and pipeline reliability.
  • How to run data through your pipeline and analyze it using the Discover and Console tools.


Overview

Data pipelines can become very complex, so always test your pipelines before deployment.

Once a pipeline has been crafted, and before it is deployed to any environment, it is always a good idea to make sure the pipeline behaves as intended. This can be achieved in several ways using functionality provided by Datumize Zentral. Please make sure you understand these concepts:

  • Stub: a data excerpt, a sample of data used in the pipeline for testing purposes, that is meaningful to at least one component.
  • Breakpoint: a common concept in computing that defines a temporary stop in program execution. For a pipeline, it sets a control point that allows the pipeline to be executed in a controlled way.
  • Debugging: in computing, the process of carefully analyzing the behavior of your code before it is released. For a pipeline, it means understanding how the data is captured, transformed, and eventually copied, by analyzing the transformations performed at each component of the pipeline.
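
These concepts are not specific to Datumize Zentral. As a minimal illustration of the stub idea (all names here are invented for the sketch, not part of the product's API), a stub stands in for a live source so the rest of the pipeline can be tested in isolation:

```python
# Conceptual sketch only: a stub replaces a live source with a fixed,
# meaningful excerpt of data so downstream stages can be tested safely.

def live_source():
    # In production this might read from a network tap or a device.
    raise RuntimeError("not available in a test environment")

def stub_source():
    # The stub: a small sample that is meaningful to the processor below.
    return [{"user": "alice", "bytes": 512}, {"user": "bob", "bytes": 1024}]

def processor(records):
    # Example transformation: keep only large transfers.
    return [r for r in records if r["bytes"] >= 1024]

def run_pipeline(source):
    # The pipeline does not care whether the source is live or stubbed.
    return processor(source())

print(run_pipeline(stub_source))  # [{'user': 'bob', 'bytes': 1024}]
```

Because the pipeline only depends on the source's output shape, swapping the stub back out for the live source requires no other changes.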


Creating or loading a Stub from the Resources Management page

In the Resources management page you can load a Stub from a previously exported Stub resource, or create a new Stub resource. The panel on the left of the Resources landing page lists all of the resources available for your project; any Stub you have created will appear in this list.


If, however, you want to create a new Stub with the Resources manager, click New Resource > Stub to open the New Stub creation tool.

The New Stub creation tool allows a sample, or a real data extract, to be loaded into the application for testing, designing, and debugging.


You can either drag a file of a supported data type onto the editor or browse for a data file on your local machine. Once the file is loaded and selected, name the Stub resource so it is easy to retrieve in the next step.
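
A stub file is simply a small excerpt of the data the source would normally produce. The exact format depends on the source component; purely as an illustration, a stub for a file-based source reading delimited records might look like:

```csv
timestamp,user,bytes
2023-01-01T10:00:00Z,alice,512
2023-01-01T10:00:05Z,bob,1024
```

A few representative rows are usually enough; the goal is data that is meaningful to at least one component, not volume.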


Inserting a Stub into a pipeline

After you save and load the data file, it becomes another resource that can be referenced from the pipeline editor.


Clicking on the Source, for example, opens a new window for the component editor. There you will find the Stubs manager, where you can Add a Stub as well as Activate a stub. Once the Stub from the previous step is selected, activate it to flag it for use in your current pipeline. Note: deactivate the Stub resource before deploying to a production scenario, unless you want to use the stub as an additional data source (uncommon).


Using the Stub in the Debugger

Once you are ready to debug your project, navigate to the Discover and Console tabs at the lower portion of the screen. From the Console, select the Run from: Beginning function to cycle through your pipeline. If your pipeline is complex, you can run from various points in the pipeline by using breakpoints, which produces more manageable error logs. See the section Using the debugger Discover and Console tools for more information about the debugger's functions.


Inserting Breakpoints

Breakpoints are a useful testing tool: they let you set points in the pipeline flow where the console debugger will stop, at either the input or the output of a component. When the Console's Run from: Beginning button is used, the debugger runs through each of the components and surfaces any errors it triggers. It is often useful to debug each component separately to identify the root of a misconfiguration, so set a Breakpoint at either end of a component, before or after the stream.

Simply click on the component and toggle the necessary breakpoint on or off. Depending on the component, a breakpoint symbol will appear to indicate that a breakpoint is set. Once the breakpoints are set, it is time to use the Discover and Console tools.
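
Conceptually, a breakpoint at a component's input or output pauses execution so the records at that boundary can be inspected. This hypothetical Python sketch (all names invented, not the product's API) shows the idea of toggling breakpoints on either side of a component:

```python
# Hypothetical sketch of input/output breakpoints on a pipeline component.
# This illustrates the concept only; it is not Datumize Zentral code.

def run_component(name, func, records,
                  break_on_input=False, break_on_output=False):
    if break_on_input:
        # Pause point BEFORE the component transforms the data.
        print(f"[breakpoint] {name} input: {records}")
    result = func(records)
    if break_on_output:
        # Pause point AFTER the component transforms the data.
        print(f"[breakpoint] {name} output: {result}")
    return result

def uppercase(records):
    return [r.upper() for r in records]

out = run_component("uppercase", uppercase, ["hello", "world"],
                    break_on_input=True, break_on_output=True)
print(out)  # ['HELLO', 'WORLD']
```

Inspecting both sides of a single component this way makes it easy to tell whether bad output is caused by that component or by what it was fed.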


Using the debugger Discover and Console tools

The pipeline debugger has two distinct tabs: a Discover tab, with an Input and an Output screen, and a Console tab with multiple tools that can help debug each stage of the pipeline. The Console gives you a visual status of all of the debugging properties your pipeline flow will run through.

Once a pipeline has been created, you can use the Discover and Console tools: selecting Run from: Beginning will initiate a debug session. To get a better sense of how this operates, create a simple Hello World source and a Logger Sink joined by a single-partition stream, and you will be able to see the Discover console working. By editing the message in the Hello World source to "Hello World test", you can validate that the debugger is operating and the source is functioning as expected. Alternatively, any supported source component can use an appropriate Stub data source. The Discover console will run and display the message in the Input; the Output will be blank in this example.
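
The Hello World walkthrough above can be modeled outside the tool to make the data flow concrete. This hypothetical Python sketch (invented names, not the product's API) mimics a source feeding a logger sink through a single-partition stream:

```python
# Hypothetical model of the Hello World pipeline: a source emits one message,
# a single-partition stream carries it, and a logger sink records it.

from collections import deque

def hello_world_source(message="Hello World test"):
    # The source emits a single configurable message.
    yield message

def logger_sink(stream, log):
    # The sink drains the stream and records each message it receives.
    while stream:
        log.append(stream.popleft())

stream = deque()                      # the single-partition stream
for msg in hello_world_source():      # source -> stream (the "Input" side)
    stream.append(msg)

log = []
logger_sink(stream, log)              # stream -> sink

print(log)  # ['Hello World test']
```

Editing the source's message and re-running corresponds to the validation step described above: the changed message should appear at the sink.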

In the Console tab you will now be able to view all of the Events, Trace, Debug, Info, Warning, Error, and Fatal messages, as well as the connection status of the debugger. You can view each tab independently or all at once.

Each subtab in the Console plays an important role in debugging your pipeline and flow, and will help you better understand the limitations, misconfigurations, or data issues in your pipeline. Additionally, once a pipeline is configured correctly, this serves as the last check before the pipeline is ready to be published for inclusion in a deployment. Should you have any questions regarding the debugging tool, please contact Datumize.