A step-by-step tutorial for capturing live web service traffic using Datumize Data Collector (DDC) and Datumize Data Dumper (DDD), processing each web service call to extract metrics, and producing a file with persisted data to be visualized. This tutorial introduces the basic concepts needed to understand how to use Datumize products to exploit network traffic through network sniffing and deep packet inspection.

This tutorial is currently supported only on Java 11.

The Sniffing Web Services Sandbox contains just one source: the files generated from the traffic capture. This traffic capture contains web service traffic generated by a client-server API. Keep in mind that this is a traffic generator and not real web services. From these traffic files, the DDC will then be able to parse the traffic into something understandable for human beings.


In this tutorial you will learn how to:

  • Use Datumize Data Dumper (DDD) to capture network traffic.
  • Design a pipeline that processes network traffic captured in PCAP format.
  • Parse data from web services.
  • Sink the data to a file in CSV format.


Make sure to comply with the following requirements.

List of requirements

Understand Datumize Sandbox

If you are not familiar with it, please check the Datumize Sandbox section.

Download and Start Datumize Sandbox

Download the Datumize Traffic Generator traffic-generator.zip

  1. Unzip traffic-generator.zip.
  2. Change into the generator folder and run the start script:

    bash start-traffic-generator.sh
  3. Now the traffic generator will be up and running.

Check Web Service traffic generation

Check the traffic on this machine. In this scenario, real web service traffic is simulated from one port on localhost to another port on localhost. To check the current traffic, run tcpdump:

sudo tcpdump -i any -A -vvv | grep POST

The output will look something like this:

.	...	..POST /api/call HTTP/1.1
.	.v.	.vPOST /api/call HTTP/1.1
.	...	..POST /api/call HTTP/1.1
.	.v.	.vPOST /api/call HTTP/1.1
.	...	..POST /api/call HTTP/1.1

This shows that there is traffic on the machine, which you can capture with the DDD and process with the DDC.
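As a quick sanity check, the same idea as the `grep POST` filter can be sketched in Python. This is only an illustration, assuming the `tcpdump -A` output has been saved as text; the function name `count_post_requests` is hypothetical and is not part of any Datumize product.

```python
from collections import Counter

def count_post_requests(dump_text):
    """Count HTTP POST request lines per resource in `tcpdump -A` text output."""
    counts = Counter()
    for line in dump_text.splitlines():
        idx = line.find("POST ")
        if idx == -1:
            continue
        parts = line[idx:].split()
        # A request line looks like: POST <resource> HTTP/<version>
        if len(parts) >= 3 and parts[2].startswith("HTTP/"):
            counts[parts[1]] += 1
    return dict(counts)

# Sample lines shaped like the tcpdump output shown above
sample = "\n".join([
    ".\t...\t..POST /api/call HTTP/1.1",
    ".\t.v.\t.vPOST /api/call HTTP/1.1",
    "Host: localhost:8090",
])
print(count_post_requests(sample))  # {'/api/call': 2}
```

A non-empty result for /api/call suggests the traffic generator is producing the requests this tutorial relies on.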

Configure and Deploy DDD

In order to configure a DDD you must first create a machine.

First of all, create a new Deployment plan, then go to the Infrastructure tab and click on Add machine. Once your machine is created and bootstrapped, you can proceed with configuring the DDD.

Bootstrap the machine as shown in the Bootstrap Datumize software installation guide.

Click the Add instance button:

Select DDD and give your DDD instance a name.

Other fields can be left with the default values.

The Deploy machine button should now be activated.

Once deployed and after Apply Changes your DDD instance should look like this:

Check output of DDD

The Datumize main folder is always /opt/datumize/, so everything can be found under it. Check the generated pcaps:

ls -alt /opt/datumize/pcap

-rw-r--r-- 1 datumize datumize  778070 May 22 15:18 2020-05-22_15-18-08.pcap
-rw-r--r-- 1 datumize datumize  811149 May 22 15:18 2020-05-22_15-17-48.pcap
-rw-r--r-- 1 datumize datumize  766058 May 22 15:17 2020-05-22_15-17-28.pcap
-rw-r--r-- 1 datumize datumize  812000 May 22 15:17 2020-05-22_15-17-08.pcap
-rw-r--r-- 1 datumize datumize  774565 May 22 15:17 2020-05-22_15-16-48.pcap
-rw-r--r-- 1 datumize datumize 1042549 May 22 15:16 2020-05-22_15-16-28.pcap

If your DDC pipeline is not deployed and running, the DDD will keep filling the output folder with pcaps. You can stop the DDD instance and restart it later.
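The capture file names in the listing above look like timestamps. A minimal sketch, assuming the name encodes the moment each capture file was started (an assumption based only on the listing), shows how to measure the gap between consecutive files. The helper names are hypothetical.

```python
from datetime import datetime

def capture_start(filename):
    """Parse the timestamp embedded in a name such as 2020-05-22_15-18-08.pcap."""
    stem = filename.rsplit(".", 1)[0]
    return datetime.strptime(stem, "%Y-%m-%d_%H-%M-%S")

def rotation_gaps(filenames):
    """Return the gaps, in seconds, between consecutive capture files."""
    starts = sorted(capture_start(name) for name in filenames)
    return [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]

# File names taken from the listing above
names = [
    "2020-05-22_15-18-08.pcap",
    "2020-05-22_15-17-48.pcap",
    "2020-05-22_15-17-28.pcap",
]
print(rotation_gaps(names))  # [20, 20]
```

A steady gap indicates that the DDD is rotating files regularly; a growing number of files means nothing is consuming them yet.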

If you want to fully understand what happens behind the scenes, follow the optional steps below.

  1. This step is optional and is normally done by Chef through the Zentral deployment step, so it is recommended to skip to Configure and Deploy DDD. If you prefer, you can start the DDD manually by executing this:

    # /opt/datumize/dtzdump/dtzdump.sh start sandbox
    DTZ_DUMP_HOME defined: /opt/datumize/dtzdump
    NIC: any
    User: root:root
    Rotate: 20
    Size: 5120
    Buffer size: 8192
    RAM file: /opt/datumize/pcap
    Sleep: 5
    Log: /opt/datumize/dtzdump/log/dtzdump.log
    Output: /opt/datumize/pcap
    Backup: /opt/datumize/dtzdump/backup
    Percent Limit: 50
    Capture started [116]
    Processing started [123]
  2. After that, check that the pcaps are being captured: 

    # ls -alt /opt/datumize/pcap/in
    -rw-r--r-- 1 root root 498309 Feb  6 11:26 2020-02-06_11-26-34.pcap
    drwxr-xr-x 2 root root   4096 Feb  6 11:26 .
    -rw-r--r-- 1 root root 782681 Feb  6 11:26 2020-02-06_11-26-14.pcap
    -rw-r--r-- 1 root root 774934 Feb  6 11:26 2020-02-06_11-25-54.pcap
    drwxr-xr-x 1 root root   4096 Feb  6 11:25 ..
  3. If you open one of the capture files with less, you will be able to see the traffic inside. Execute this command:

    less 2020-02-06_11-26-34.pcap
  4. You will see something similar to the following example, where you can identify the headers of the request and response of the web service traffic. The content of the request is not readable because it is gzipped, which shows why the DDC is needed to process it:

    POST /api/call HTTP/1.1
    id: d789550d4c37-2-1580988734948-ovwc
    Content-Length: 1176
    Content-Encoding: gzip
    Host: localhost:8090
    Connection: Keep-Alive
    User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_242)
    Accept-Encoding: gzip,deflate
    ^_<8B>^H^@^@^@^@^@^@^@<AD>Z]n<9B>G^L<BC><8A><A1><F7>&˿<E5>^R<90><9D><B7><9E> =<80>^Z<A9><85>^AG*,<DB>ho<9F>5
    '^AJ<BA><FC><88>}<93>^M<D9><C4>r<C9><E1><CC>p<F7><9F><FE><FE><FA>p<F3>rz<BC><DE>_η;<F8><D0>v7<A7><F3><97><CB><F1><FE><FC><E7><ED><EE><B7>Ͽ<FE>2v7ק<C3><F9>xx<B8><9C>O<B7><BB>^?N<D7>ݧ<BB><FD><E5><AF><D3><E3><E1>i<FE><CD><F5><A7><CF>7^?<<9F><BF><DC><EE><8E><F7>/<BB><B7><DF>^^^<E1>n<FF>rxx>݁<89><8C>^Om<FF><F1><DF>^_<F7>^_^?<FA><C6><DB>G|<FB>2     ^M<F4><BE><8C><DF>?π<FF><8D>}8^^<BD><D8>,<DD><FD>o^h^X<82><AD>^P<FA><EB><F3><C3>ӽ^W|^L<E5>l<F0><A1><C3>^V^^ESC^Yܣ<B8>)g4)ľ><FF><EE>^7^K
    <B1><D1>6<C4>^F^W^?<FF>'t^D<E7>̚<BE><AD><82>-a<A1>3J<FE><C2>aaƩ[K^G^^̕^F^K2<CE>&<96>F5^\^B<EE><F5>T3^N<DC><F2>cL^L<DC>$կ<9B>Z<BA>΁+e^^!^Kvʏ1U^_~k<B5>^F^Dy4^W<ED>K<EF><9B>x<96>z<BA><BF>[<EB>^K<B1>^E<9B>居FǕIs^B,ux^D<E7><D4>4^M.(8*#4B^Wᑏ^MmT<EE>;<E6><A9>><DB>wk^M^A+<E3>;<AA><B5><81><98>GU<98>,gi<97>u^^<F9>J^W<B1><85>7<8E<8D>9}r@Ņ<B1><A7><U+0600>t<97><A9><94>^P<FD><9D>QF<9A><E6><E8><E8>+<C7>*<A2>c<BE>ɸ^E<A4><B6>8ɔ0]j*<B8>2t׼(<C2>h<F0><94><C9><C3><D8><E0>;<A0><D0>BMD^L<90>^F6<9A>ڠB<92><83>Z<DB>0A<A5><97><D0><<CE>8^OKs^G<9A><92>u<AD><DF>^B<F9>^A<FE>J<B1>*<EA> <C6>^V˻^^<Z<A9><C1><DF>^A6<B1><BC><FC>7+Iѐ?4JǶ^
    ^@^A:ESC^@^A:^ZHTTP/1.1 200 
    vary: accept-encoding
    Content-Encoding: gzip
    Content-Type: text/plain;charset=UTF-8
    Transfer-Encoding: chunked
    Date: Thu, 06 Feb 2020 11:32:14 GMT
  5. Once you have checked this, you know that the capture is working.
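The request body in the capture above is unreadable because of the `Content-Encoding: gzip` header. The following sketch illustrates, in plain Python, the kind of decoding that has to happen before the content can be parsed. It is not how the DDC is implemented; the helper names and the `<operations/>` payload are made up for the example, and chunked transfer encoding is not handled.

```python
import gzip

def split_http_message(raw):
    """Split raw HTTP bytes into (start_line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

def decoded_body(headers, body):
    """Decompress the body when the message declares gzip content encoding."""
    if headers.get("content-encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body

# Build a small gzipped request shaped like the one in the capture (payload is hypothetical)
payload = gzip.compress(b"<operations/>")
raw = (b"POST /api/call HTTP/1.1\r\n"
       b"Content-Encoding: gzip\r\n"
       b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
       b"\r\n" + payload)

start_line, headers, body = split_http_message(raw)
print(start_line)
print(decoded_body(headers, body))
```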

Build the Web Services Pipeline

If you feel lost about using Datumize Zentral, please refer to the available resources in our Getting Started section, including manuals and videos.

The table below summarizes the components used in the pipeline.

| Type | Component | Description |
|---|---|---|
| Source | FilePcapSource | Reads the PCAP files generated from the traffic capture. |
| Processor | HTTPDialogGroupProcessor | Groups the HTTP dialogs. |
| Processor | HTTPAssemblerProcessor | Assembles all the packets after grouping. |
| Processor | WSParserProcessor | Parses the web service dialog into objects that are easy to manipulate. |
| Processor | CookProcessor | Executes an operation on the input data to obtain output data (called a dish). |
| Processor | SerializerProcessor | Translates data structures into a format that can be stored. There are multiple formats available. |
| Processor | ComposerProcessor | Prepares the data before it is finally sunk into some file format. |
| Sink | FileSink | Stores records into files. |

Drag the required components from the Palette to the Workbench and join them with a Single Memory Stream.

The tables below summarize the properties to configure for each component.

FilePcapSource component:

| Field Name | Value | Required |
|---|---|---|
| Directory Base | | |
| Filter | not set | * |
| File Pattern | *.pcap | * |
| File Suffix On Success | not set | * |
| File Suffix On Error | not set | * |
| File Sort | NAME | * |
| File Age | 0s | * |
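To make the File Pattern and File Sort settings more concrete, here is a small sketch of what "match *.pcap and sort by NAME" amounts to. It only illustrates the idea and is an assumption about the semantics, not the FilePcapSource implementation; `select_capture_files` is a hypothetical name.

```python
from fnmatch import fnmatchcase

def select_capture_files(names, pattern="*.pcap"):
    """Keep the names matching the glob pattern, sorted by name."""
    return sorted(name for name in names if fnmatchcase(name, pattern))

listing = ["2020-05-22_15-18-08.pcap", "dtzdump.log", "2020-05-22_15-17-48.pcap"]
print(select_capture_files(listing))
# ['2020-05-22_15-17-48.pcap', '2020-05-22_15-18-08.pcap']
```

Because the capture files are named after timestamps, sorting by name also yields chronological order.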

HTTPDialogGroupProcessor component:

| Field Name | Value | Required |
|---|---|---|
| Server Pattern | | |
| Client Pattern | not set | * |

HTTPAssemblerProcessor component:

| Field Name | Value | Required |
|---|---|---|
| | http.rq.resource contains '/api/call' | |
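The expression above can be read as "keep only the requests whose resource contains /api/call". A minimal sketch of that predicate, assuming `http.rq.resource` refers to the resource part of the HTTP request line (an assumption, since the expression language is not described here):

```python
def matches_resource(request_line, fragment="/api/call"):
    """Return True when the resource of an HTTP request line contains the fragment."""
    parts = request_line.split()
    return len(parts) >= 2 and fragment in parts[1]

print(matches_resource("POST /api/call HTTP/1.1"))   # True
print(matches_resource("GET /health HTTP/1.1"))      # False
```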


WSParserProcessor component:

| Field Name | Selected | Content type | Required |
|---|---|---|---|
| Parsers (add element) | Xml map incomplete deserializer | xml | * |
| Parsers (add element) | Xml map incomplete deserializer | plain | * |

CookProcessor component:

Please note that the code in the cook processor is essential for this tutorial to work correctly.

| Field Name | Value | Required |
|---|---|---|
| Operation Parameters | not set | |


if (input.containsKey('response')) {
    // Request side: count the operations and keep the function name of the operation at index 1
    rq = input.get('request')
    ops = rq.get('operations').asList()
    output.count = ops.size()

    // Guard on the list size so that get(1) cannot go out of bounds
    if (ops.size() > 1 && ops.get(1) != null && ops.get(1).get('operation') != null) {
        output.firstop = ops.get(1).get('operation').get('func').value()
    }

    // Response side: keep the result value of the operation at the same index
    rs = input.get('response')
    rsops = rs.get('operations').asList()
    if (rsops.size() > 1 && rsops.get(1) != null && rsops.get(1).get('operation') != null && rsops.get(1).get('operation').get('result') != null && rsops.get(1).get('operation').get('result').get('value') != null) {
        output.firstresult = rsops.get(1).get('operation').get('result').get('value').value()
    }
}
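To reason about what the cook script produces, it can help to replay the same logic outside the DDC. The sketch below is a rough Python equivalent working on plain dictionaries and lists. The record shape is an assumption made for illustration (the real parsed objects expose methods such as asList() and value()), and the index parameter simply mirrors the index used in the script.

```python
def cook(record, index=1):
    """Replay the cook script logic on a plain dict (hypothetical record shape)."""
    output = {}
    if "response" not in record:
        return output

    # Request side: number of operations and the function name at the given index
    ops = record["request"].get("operations", [])
    output["count"] = len(ops)
    if len(ops) > index:
        operation = (ops[index] or {}).get("operation") or {}
        if "func" in operation:
            output["firstop"] = operation["func"]

    # Response side: the result value of the operation at the same index
    rsops = record["response"].get("operations", [])
    if len(rsops) > index:
        operation = (rsops[index] or {}).get("operation") or {}
        value = (operation.get("result") or {}).get("value")
        if value is not None:
            output["firstresult"] = value
    return output

# Made-up dialog with two operations in the request and in the response
dialog = {
    "request": {"operations": [{"operation": {"func": "login"}},
                               {"operation": {"func": "sum"}}]},
    "response": {"operations": [{"operation": {"result": {"value": "ok"}}},
                                {"operation": {"result": {"value": "42"}}}]},
}
print(cook(dialog))  # {'count': 2, 'firstop': 'sum', 'firstresult': '42'}
```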


SerializerProcessor component:

| Field Name | Value | Required |
|---|---|---|
| Serializer | Map json serializer | |
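Conceptually, a map-to-JSON serializer turns each record produced by the cook step into a JSON string. A minimal sketch with the Python standard library follows; the exact layout produced by the DDC serializer (key order, whitespace) is not documented here, so the formatting choices below are assumptions.

```python
import json

def serialize(record):
    """Serialize a flat record (a map) into a compact JSON string."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))

print(serialize({"count": 2, "firstop": "sum"}))  # {"count":2,"firstop":"sum"}
```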


ComposerProcessor component:

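| Field Name | Selected | Value | Required |
|---|---|---|---|
| Rules | Rule records | Max records: 20 | * |
| Key Extractor | Fixed extractor | Key: k | * |
| Window | - | not set | |
| Time Rate | - | not set | |
| Header | - | not set | |
| Footer | - | not set | |
| Combiner | - | not set | |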
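The "Max records: 20" rule can be pictured as grouping the serialized records into batches of at most 20 before they are written out. The sketch below shows that grouping only. Whether the DDC flushes a final, partially filled batch is an assumption here, and `batch_records` is a hypothetical helper.

```python
def batch_records(records, max_records=20):
    """Yield lists of at most max_records items, preserving order."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == max_records:
            yield batch
            batch = []
    if batch:
        # Assumption: the remaining records are emitted as a smaller batch
        yield batch

sizes = [len(batch) for batch in batch_records(range(45))]
print(sizes)  # [20, 20, 5]
```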

FileSink component:

| Field Name | Value | Required |
|---|---|---|
| Directory Base | | |
| File Pattern | %{uuid} | * |
| Directory Pattern | %{Year}%{Month}%{Day}/%{Hour}%{Minute} | * |
| Closed File Suffix | not set | * |
| Serializer | No serializer | * |
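The Directory Pattern and File Pattern placeholders determine where each output file ends up. As a rough illustration of how such patterns could expand (the substitution rules below are assumptions inferred from the placeholder names, not the FileSink implementation):

```python
import re
import uuid
from datetime import datetime

def expand(pattern, now, file_id):
    """Replace %{Name} placeholders with date parts or the given file identifier."""
    values = {
        "Year": f"{now:%Y}", "Month": f"{now:%m}", "Day": f"{now:%d}",
        "Hour": f"{now:%H}", "Minute": f"{now:%M}", "uuid": file_id,
    }
    return re.sub(r"%\{(\w+)\}", lambda m: values[m.group(1)], pattern)

moment = datetime(2020, 2, 6, 11, 27)
directory = expand("%{Year}%{Month}%{Day}/%{Hour}%{Minute}", moment, "")
filename = expand("%{uuid}", moment, str(uuid.uuid4()))
print(directory)  # 20200206/1127
print(filename)   # a random UUID, different on every run
```

With these settings, the sink output is grouped into one folder per minute, with one uniquely named file per batch.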

Test the Pipeline

To test the pipeline you have created, see the Testing a Pipeline guide for a more in-depth overview of the necessary steps.

For this tutorial, in the pipeline editor page you can select Run from: Beginning. If the run is successful, Input and Output should show your input and output results.

Deploy the Pipeline to DDC Instance

In Zentral, you only need one machine, one instance and one pipeline. For more on using the infrastructure management tool, please see the Zentral guide to Infrastructure Deployment.

This pipeline will be deployed with the default DDC Runtime Policy.

Check the Expected Output

  1. Because the output is configured as CSV, check the output folder, which contains the resulting CSV files:

    ls -alt /opt/datumize/resources/
    # ls -alt /opt/datumize/resources/csv/20200206/1127/
    total 28
    drwxr-xr-x 4 root root  4096 Feb  6 11:29 ..
    drwxr-xr-x 2 root root  4096 Feb  6 11:28 .
    -rw-r--r-- 1 root root   234 Feb  6 11:28 73d96893-7915-49c9-836d-2cf1198d0c30
    -rw-r--r-- 1 root root 15426 Feb  6 11:27 26d430d9-be5a-4b3d-8f95-602fe9f9bfa0
  2. Check the contents of the CSV file using less:

    # less /opt/datumize/resources/csv/20200206/1127/73d96893-7915-49c9-836d-2cf1198d0c3

    The file contains the result of the web service dialogs captured by the DDD and processed by the DDC.


Stop All


  1. By using Zentral you can decide to stop the process at any time just by 


  1. Now, to finish, you can stop all the services that have run on the machine. First stop the DDD and then the DDC:

    /opt/datumize/dtzdump/dtzdump.sh stop sandbox
    /opt/datumize/ddc/bin/ddc stop ws
  2. To stop the traffic generator, just execute this in its folder:

    bash stop-traffic-generator.sh