A step-by-step tutorial to capture technical metrics from different Wi-Fi Access Points, simulated as HTTP API endpoints. This tutorial is also useful for understanding how to capture data from the Internet of Things (IoT) world, as many devices can be integrated using similar pipelines. DDC is used to collect technical data and transform it into motion metrics (position, movement) that have a business purpose. Metrics will be written to a file.

Objectives

This tutorial contains different resources that you can reuse and extend for additional learning. Feel free to experiment!

The Wi-Fi and IoT Sandbox contains an HTTP API endpoint (representing any IoT device) from which we will extract mac, ap_mac and time_stamp and write them to a file sink. This tutorial is the base for our more advanced tutorials, where we will show you how to enrich the data with processors. In this tutorial you will learn:

  • How to collect data from an HTTP API source.
  • How to write data to a file sink.



Requirements

To complete this tutorial you will need access to our sandbox sources; please contact us if you would like access.

**username** and **password** values need to be replaced with your sandbox details.

Make sure to comply with the following requirements.

Check the API image

 API Source Data

Endpoints for the API image that produces Wi-Fi Device and Access Point data.

Login and password information.

Please replace **username** and **password** with details received in Sandbox welcome email.

username | **username**
password | **password**

You can test the endpoints with curl or Postman.

Login Endpoint

curl --location --request POST 'https://sandbox.datumize.net/api/users/login' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'username=**username**' \
--data-urlencode 'password=**password**' \
--cookie-jar dtzcookie
CODE

Devices Endpoint

curl --location --request GET 'https://sandbox.datumize.net/api' \
--cookie dtzcookie
CODE


A query to the Devices API returns device MAC addresses along with the access point MAC address and the current timestamp:

{"data":[ 
		  {"mac":"3c:2e:ff:b0:0a:44","ap_mac":"14:57:9f:d7:e5:20","time_stamp":1574715521000},
		  {"mac":"38:53:9c:22:03:09","ap_mac":"50:5d:ac:b8:f8:20","time_stamp":1574715521000}
]}
CODE
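For reference, those fields can be pulled out of the sample payload with a few lines of Python (a standalone sketch for exploring the data, not part of the DDC pipeline):

```python
import json

# Sample payload returned by the Devices endpoint.
payload = '''{"data":[
  {"mac":"3c:2e:ff:b0:0a:44","ap_mac":"14:57:9f:d7:e5:20","time_stamp":1574715521000},
  {"mac":"38:53:9c:22:03:09","ap_mac":"50:5d:ac:b8:f8:20","time_stamp":1574715521000}
]}'''

# The device records live under the top-level "data" key.
records = json.loads(payload)["data"]
for r in records:
    print(r["mac"], r["ap_mac"], r["time_stamp"])
```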

Set Log Level to Debug

We recommend setting your instance Log policy to Debug.

Please check the Setting DDC Log Policy guide.




Build the Wi-Fi Pipeline

If you feel lost about using Datumize Zentral, please refer to the available resources in our Getting Started section, including manuals and videos.

The table below summarizes the components used in the pipeline.

Component Type | Name | Description
Source | HTTPApiPollingSource | Source that actively requests an API at intervals of time (polling).
Processor | ExtractorProcessor | Extracts a field from the API response.
Processor | FlattenerAdapter | Flattens nested records.
Processor | ComposerProcessor | Composer with configurable partitioner.
Sink | FileSink | Writes data to file.

Drag the required components from the Palette to the Workbench and join them with a Single Memory Stream.

The table below summarizes the properties to configure the HTTPApiPolling Source component.


Field Name | Value | Required
Auth Method | CookieAuthMethod | *
user | **username** | *
password | **password** | *
login-endpoint | https://sandbox.datumize.net/api/users/login | *
login-body | {"username" : "{{username}}", "password" : "{{password}}"} | *
login-content-type | application/json | *
cookie-domain | null | *
cookie-path | / | *
login-TTL | 5m | *
Request Builder | Request builder simple get | *
endpoint | https://sandbox.datumize.net/api | *
Deserializer | Json map deserializer | *

The table below summarizes the properties to configure the Extractor Processor component.


Field Name | Value | Required
Extractor | Field to Map Extractor | *
Path | body/data | *
Default | null |
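Conceptually, the Extractor walks the configured slash-separated path through the response map and falls back to the default when the path is missing. A minimal Python sketch of that behavior (the function name and semantics here are illustrative, not the DDC implementation):

```python
def extract_path(record, path, default=None):
    """Walk a slash-separated path (e.g. 'body/data') through nested dicts."""
    current = record
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return default  # path missing: return the configured Default
        current = current[key]
    return current

# A message wrapping the sandbox payload under "body".
message = {"body": {"data": [{"mac": "3c:2e:ff:b0:0a:44"}]}}
print(extract_path(message, "body/data"))     # → [{'mac': '3c:2e:ff:b0:0a:44'}]
print(extract_path(message, "body/missing"))  # → None (the default)
```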

The table below summarizes the properties to configure the Composer Processor component.


Field Name | Value | Required
Rules | Records Rule | *
Max records | 300 | *
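The Records Rule effectively groups incoming records into fixed-size batches (300 here) before they reach the sink. A rough Python sketch of that grouping (illustrative only, not the DDC composer):

```python
def compose(records, max_records=300):
    """Yield successive batches of at most max_records records each."""
    for start in range(0, len(records), max_records):
        yield records[start:start + max_records]

# 650 fake device records partition into batches of 300, 300 and 50.
sample = [{"mac": f"device-{i}"} for i in range(650)]
batches = list(compose(sample))
print([len(b) for b in batches])  # → [300, 300, 50]
```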

And finally, the table below summarizes the properties to configure the File Sink component.


Field Name | Value | Required
Directory Base | /opt/datumize/basic | *
Serializer | Map to CSV Serializer |
Fields | Enter fields in the order you would like them written to CSV. |
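The Map to CSV Serializer writes each record's values in the configured field order. A small Python sketch of that serialization (field names taken from the sandbox payload; this is an illustration, not the DDC serializer):

```python
import csv
import io

fields = ["mac", "ap_mac", "time_stamp"]  # the order configured in the sink
records = [
    {"mac": "3c:2e:ff:b0:0a:44", "ap_mac": "14:57:9f:d7:e5:20", "time_stamp": 1574715521000},
    {"mac": "38:53:9c:22:03:09", "ap_mac": "50:5d:ac:b8:f8:20", "time_stamp": 1574715521000},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
for record in records:
    writer.writerow(record)  # one CSV line per record: mac,ap_mac,time_stamp
print(buffer.getvalue())
```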






Deploy the Pipeline to DDC Instance

In Zentral, you only need one machine, one instance and one pipeline.

The table below summarizes the properties to be defined in the DDC Runtime Policy for the HTTPApiPollingSource component.

Field Name | Value
Execution Policy Type
  Type | Always
  Flush After Data | False
Scheduling Policy Type
  Type | SLEEP
  Duration | 30s
  Cron | ******
  Batch | 1
  Threads | 1
Error Policy Type
  Type | DISCARD
  Max-Retries | 0

We will set the HTTPApiPollingSource policy with a duration of 30s. We also need to set Batch and Threads to 1.
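The SLEEP scheduling policy amounts to: execute the source, wait for the configured duration, repeat. A toy Python loop showing the idea (the 30s duration is shortened so the sketch runs instantly, and the loop is bounded for demonstration; this is not how DDC schedules internally):

```python
import time

def poll_forever(fetch, duration_seconds, max_iterations=3):
    """SLEEP-style polling loop: fetch one batch, sleep, repeat.

    max_iterations bounds the loop for demonstration; the Always
    execution policy would keep polling indefinitely.
    """
    results = []
    for _ in range(max_iterations):
        results.append(fetch())       # one batch per request (Batch = 1)
        time.sleep(duration_seconds)  # Duration (30s in this tutorial)
    return results

# Stub standing in for a request to the Devices endpoint.
batches = poll_forever(lambda: [{"mac": "3c:2e:ff:b0:0a:44"}], duration_seconds=0.01)
print(len(batches))  # → 3
```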




Check Expected Output

If you browse the Directory Base you will notice CSV files containing mac, ap_mac and time_stamp; each file contains 300 records. Folders are organized by write time.
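As a sanity check on the output, note that time_stamp is epoch time in milliseconds; it converts to a readable UTC date with a couple of lines of Python:

```python
from datetime import datetime, timezone

time_stamp = 1574715521000  # milliseconds, as in the sandbox payload
dt = datetime.fromtimestamp(time_stamp / 1000, tz=timezone.utc)
print(dt.isoformat())  # → 2019-11-25T20:58:41+00:00
```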