Tutorial: Web Services & Network Sniffing
A step-by-step tutorial on capturing live web service traffic with Datumize Data Dumper (DDD) and Datumize Data Collector (DDC), processing each web service call to extract metrics, and producing a file with persisted data ready to be visualized. This tutorial provides the basic concepts needed to understand how Datumize products exploit network traffic through network sniffing and deep packet inspection.
Overview
This tutorial is currently supported on Java 11 only.
The Sniffing Web Services Sandbox contains just one source: the files generated from the traffic capture. This capture contains Web Service traffic generated by a client-server API. Take into account that this is a traffic generator, not real web services. From this traffic file, the DDC will then be able to parse the traffic into something understandable for human beings.
Objectives
In this tutorial you will learn:
- How to use Datumize Data Dumper (DDD) to capture network traffic.
- How to design a pipeline that captures network traffic in PCAP format.
- How to parse data from Web Services.
- How to sink the data to a file in CSV format.
Requirements
Make sure to comply with the following requirements.
Understand Datumize Sandbox
If you are not familiar with it, please check the Datumize Sandbox section.
Download and Start Datumize Sandbox
- Download the Datumize Traffic Generator: traffic-generator.zip
- Unzip traffic-generator.zip
- Change into the generator's folder and execute the start script:
bash start-traffic-generator.sh
Now the traffic generator will be up and running.
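Before sniffing, you can optionally confirm that the generator's server side is listening. A minimal check, assuming it listens on port 8090 (the web service port used throughout this tutorial):
ss -tlnp | grep 8090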
Check Web Service traffic generation
Check the traffic present on this machine. In this scenario we simulate real Web Service traffic from one localhost port to another. To check the current traffic, execute tcpdump:
sudo tcpdump -i any -A -vvv | grep POST
The output of this will be something like this:
. ... ..POST /api/call HTTP/1.1
. .v. .vPOST /api/call HTTP/1.1
. ... ..POST /api/call HTTP/1.1
. .v. .vPOST /api/call HTTP/1.1
. ... ..POST /api/call HTTP/1.1
This shows that there is traffic on the machine; you will be able to capture it using the DDD and process it using the DDC.
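Optionally, you can take a short manual capture to preview the kind of file the DDD will produce. A minimal sketch, assuming the web service traffic uses port 8090 and writing to a hypothetical path under /tmp:
# capture 30 seconds of web service traffic into a PCAP file
sudo timeout 30 tcpdump -i any -w /tmp/manual-capture.pcap port 8090
# decode it back into a readable form
sudo tcpdump -r /tmp/manual-capture.pcap -A | head -n 40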
Configure and Deploy DDD
In order to configure a DDD you must first create a machine.
First of all, create a new Deployment plan, then go to the Infrastructure tab and click on Add machine. Once your machine is created and bootstrapped, you can proceed with configuring the DDD.
Bootstrap the machine as shown in the Bootstrap Datumize software installation.
Click the Add instance button.
Select DDD and give your DDD instance a name.
Other fields can be left with the default values.
The Deploy machine button should now be activated.
Once deployed, and after Apply Changes, your DDD instance will be up and running.
Check output of DDD
The Datumize main folder is always /opt/datumize/, so everything can be found under it. Check the generated pcaps:
ls -alt /opt/datumize/pcap
-rw-r--r-- 1 datumize datumize 778070 May 22 15:18 2020-05-22_15-18-08.pcap
-rw-r--r-- 1 datumize datumize 811149 May 22 15:18 2020-05-22_15-17-48.pcap
-rw-r--r-- 1 datumize datumize 766058 May 22 15:17 2020-05-22_15-17-28.pcap
-rw-r--r-- 1 datumize datumize 812000 May 22 15:17 2020-05-22_15-17-08.pcap
-rw-r--r-- 1 datumize datumize 774565 May 22 15:17 2020-05-22_15-16-48.pcap
-rw-r--r-- 1 datumize datumize 1042549 May 22 15:16 2020-05-22_15-16-28.pcap
If your DDC pipeline is not deployed and running, your DDD will fill the output folder with pcaps; you can stop the DDD instance and restart it later.
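Because the capture files rotate every few seconds, the folder can grow quickly while nothing consumes them. A minimal housekeeping sketch (the 60-minute retention window is an assumption; adjust it to your disk space):
# show how much space the rotated captures use
du -sh /opt/datumize/pcap
# remove captures older than 60 minutes
sudo find /opt/datumize/pcap -name '*.pcap' -mmin +60 -delete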
If you want to fully understand how the traffic capture works, the following optional steps show how to run it manually.
This step is performed by Chef through the Zentral deployment and is optional; it is recommended to skip ahead to Configure and Deploy DDD. As an optional step, you can start the DDD product manually by executing this:
# /opt/datumize/dtzdump/dtzdump.sh start sandbox
DTZ_DUMP_HOME defined: /opt/datumize/dtzdump
VARIABLES
Instance:
NIC: any
User: root:root
Filter:
Rotate: 20
Size: 5120
Extra:
Buffer size: 8192
RAM file: /opt/datumize/pcap
Sleep: 5
Log: /opt/datumize/dtzdump/log/dtzdump.log
Output: /opt/datumize/pcap
Backup: /opt/datumize/dtzdump/backup
Percent Limit: 50
Capture started [116]
Processing started [123]
After that, check that the pcaps are being captured:
# ls -alt /opt/datumize/pcap/in
-rw-r--r-- 1 root root 498309 Feb 6 11:26 2020-02-06_11-26-34.pcap
drwxr-xr-x 2 root root 4096 Feb 6 11:26 .
-rw-r--r-- 1 root root 782681 Feb 6 11:26 2020-02-06_11-26-14.pcap
-rw-r--r-- 1 root root 774934 Feb 6 11:26 2020-02-06_11-25-54.pcap
drwxr-xr-x 1 root root 4096 Feb 6 11:25 ..
If you open one of the capture files with less, you will be able to see the traffic inside. Execute this command:
less 2020-02-06_11-26-34.pcap
You will see something similar to the following example, where you can identify the headers of the request and response of the web service traffic. The content of the request is not readable because it is gzipped, yet the DDC can still process it, which demonstrates its power:
POST /api/call HTTP/1.1
id: d789550d4c37-2-1580988734948-ovwc
Content-Length: 1176
Content-Encoding: gzip
Host: localhost:8090
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_242)
Accept-Encoding: gzip,deflate

[... gzipped (binary) request body ...]

HTTP/1.1 200
vary: accept-encoding
Content-Encoding: gzip
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked
Date: Thu, 06 Feb 2020 11:32:14 GMT

[... gzipped (binary) response body ...]
Once you have checked this, you know that the capture is working.
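As an alternative to paging through raw bytes with less, you can let tcpdump decode a capture file and count the web service requests inside it. A sketch using one of the file names listed above:
# replay the capture and count the POST requests to the web service
sudo tcpdump -r /opt/datumize/pcap/in/2020-02-06_11-26-34.pcap -A 2>/dev/null | grep -c 'POST /api/call'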
Build the Web Services Pipeline
If you feel lost about using Datumize Zentral, please refer to the available resources in our Getting Started section, including manuals and videos.
The table below summarizes the components used in the pipeline.
Component Type | Name | Description |
---|---|---|
Source | FilePcapSource | Reads previously captured network traffic from PCAP files in a directory. |
Processor | HTTPDialogGroupProcessor | Groups the HTTP dialogs. |
Processor | HTTPAssemblerProcessor | Assembles all the packets after grouping. |
Processor | WSParserProcessor | Parses the Web Service dialog into objects that are easy to manipulate. |
Processor | CookProcessor | Executes an operation on the input data to obtain output data (called a dish). |
Processor | SerializerProcessor | Translates data structures into a format that can be stored. There are multiple formats available. |
Processor | ComposerProcessor | Prepares the serialized records before they are finally sunk into a file format. |
Sink | FileSink | Stores records into files. |
Drag the required components from the Palette to the Workbench and join them with a Single Memory Stream.
The table below summarizes the properties to configure the components.
FilePcapSource component:
Field Name | Value | Required | |
---|---|---|---|
Modify | |||
Directory Base | /opt/datumize/pcap | * | |
default | |||
Filter | not set | * | |
File Pattern | *.pcap | * | |
File Suffix On Success | not set | * | |
File Suffix On Error | not set | * | |
File Sort | NAME | * | |
File Age | 0s | * |
HTTPDialogGroupProcessor component:
Field Name | Value | Required | |
---|---|---|---|
Modify | |||
Server Pattern | 8090 | * | |
default | |||
Timeout | 2s | * | |
Precision | 0ms | * | |
Partitions | 1 | * | |
Client Pattern | not set | * |
HTTPAssemblerProcessor component:
Field Name | Value | Required | |
---|---|---|---|
Modify | |||
Filter | http.rq.resource contains '/api/call' | * |
WSParserProcessor component:
Field Name | Selected | Content type | Required | |
---|---|---|---|---|
Modify | ||||
Parsers (add element) | Xml map incomplete deserializer | xml | * | |
Parsers (add element) | Xml map incomplete deserializer | plain | * |
CookProcessor component:
Please note that the code in the CookProcessor is essential for this tutorial to work correctly.
Field Name | Value | Required | |
---|---|---|---|
Modify | |||
Operation Parameters | not set | * | |
Operation | Groovy code (see note above) | * |
default | |||
Language | groovy | * |
SerializerProcessor component:
Field Name | Value | Required | |
---|---|---|---|
Modify | |||
Serializer | Map json serializer | * |
ComposerProcessor component:
Field Name | Selected | Value | Required | |
---|---|---|---|---|
Modify | ||||
Rules | Rule records | Max records: 20 | * | |
default | ||||
Key Extractor | Fixed extractor | Key: k | * | |
Window | - | not set | ||
Time Rate | - | not set | ||
Header | - | not set | ||
Footer | - | not set | ||
Combiner | - | not set |
FileSink component:
Field Name | Value | Required | |
---|---|---|---|
Modify | |||
Directory Base | /opt/datumize/output | * | |
default | |||
File Pattern | %{uuid} | * | |
Directory Pattern | %{Year}%{Month}%{Day}/%{Hour}%{Minute} | * | |
Closed File Suffix | not set | * | |
Serializer | No serializer | * |
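For reference, the Directory Pattern resolves to a date/time subfolder such as 20200206/1127, as you will see when checking the output. For orientation only, the equivalent timestamp in shell syntax (not part of the pipeline configuration):
date +%Y%m%d/%H%M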
Test the Pipeline
To test the pipeline you have created, see the Testing a Pipeline guide for an in-depth overview of the necessary steps.
For this tutorial, select Run from: Beginning in the pipeline editor page. If the run is successful, the Input and Output panels will show the corresponding results.
Deploy the Pipeline to DDC Instance
In Zentral you will only need one machine, one instance, and one pipeline. For more on using the Infrastructure management tool, please see the Zentral guide to Infrastructure Deployment.
This pipeline will be deployed with the default DDC Runtime Policy.
Check the Expected Output
Since the output is configured as CSV, check the output folder that contains the resulting CSV files:
ls -alt /opt/datumize/resources/
# ls -alt /opt/datumize/resources/csv/20200206/1127/
total 28
drwxr-xr-x 4 root root 4096 Feb 6 11:29 ..
drwxr-xr-x 2 root root 4096 Feb 6 11:28 .
-rw-r--r-- 1 root root 234 Feb 6 11:28 73d96893-7915-49c9-836d-2cf1198d0c30
-rw-r--r-- 1 root root 15426 Feb 6 11:27 26d430d9-be5a-4b3d-8f95-602fe9f9bfa0
Check inside the CSV by using less:
# less /opt/datumize/resources/csv/20200206/1127/73d96893-7915-49c9-836d-2cf1198d0c30
The file contains the following, which is the result of the web service dialogs captured by the DDD and processed by the DDC:
100,sub,21985.0
100,div,0.9670221843003413
100,sub,10669.0
100,div,0.7495874063729846
100,add,54708.0
100,div,4.801460823373174
100,sub,34699.0
100,div,0.7584804068202213
100,div,0.39869337979094077
100,add,58887.0
100,multi,234696.0
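Each record carries an operation name and its numeric result in the second and third fields, so you can already compute quick aggregates from the shell before visualizing the data. A minimal sketch, assuming the comma-separated layout shown above:
# calls per operation, plus the average result of the 'div' operations
awk -F, '{count[$2]++} $2=="div" {sum+=$3; n++} END {for (op in count) print op, count[op]; if (n) print "avg div:", sum/n}' /opt/datumize/resources/csv/20200206/1127/73d96893-7915-49c9-836d-2cf1198d0c30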
Stop All
Automatically
- By using Zentral, you can stop the process at any time.
Manually
Now, to finish, you can stop all the services that have run on the machine. First stop the DDD and then the DDC:
/opt/datumize/dtzdump/dtzdump.sh stop sandbox
/opt/datumize/ddc/bin/ddc stop ws
To stop the traffic generator, just execute this in its folder:
bash stop-traffic-generator.sh
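To double-check that everything is down, you can look for leftover processes; the patterns below match the script and product names used in this tutorial:
# no output means the dumper, the collector and the generator are all stopped
ps aux | grep -E 'dtzdump|ddc|traffic-generator' | grep -v grep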