Tutorial: Wi-Fi and IoT Data
A step-by-step tutorial to capture technical metrics from different Wi-Fi Access Points, simulated as HTTP API endpoints. This tutorial is also useful for understanding how to capture data from the Internet of Things (IoT) world, as many devices can be integrated using similar pipelines. DDC is used to collect technical data and transform it into motion metrics (position, movement) that serve a business purpose. Metrics will be written to a file.
Objectives
This tutorial contains different resources that you can reuse and extend for additional learning. Feel free to experiment!
The Wi-Fi and IoT Sandbox exposes an HTTP API endpoint (standing in for any IoT device) from which we will extract the mac, ap_mac and time_stamp fields and write them to a file sink. This tutorial is the base for our more advanced tutorials, where we will show you how to enrich the data with processors. In this tutorial you will learn:
- How to collect data from an HTTP API source.
- How to write data to a file sink.
Requirements
To complete this tutorial you will need access to our sandbox sources; please contact us if you would like access.
The **username** and **password** values need to be replaced with your sandbox details.
Make sure to comply with the following requirements.
Check the API image
Endpoints for the API image that produces Wi-Fi Device and Access Point data.
Login and password information.
Please replace **username** and **password** with the details received in the Sandbox welcome email.

Field | Value |
---|---|
username | **username** |
password | **password** |
You can test the endpoints with curl or Postman.
Login Endpoint
curl --location --request POST 'https://sandbox.datumize.net/api/users/login' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'username=**username**' \
--data-urlencode 'password=**password**' \
--cookie-jar dtzcookie
Devices Endpoint
curl --location --request GET 'https://sandbox.datumize.net/api' \
--cookie dtzcookie
A query to the Devices endpoint returns device MAC addresses together with the access point MAC address and the current timestamp:
{"data":[
{"mac":"3c:2e:ff:b0:0a:44","ap_mac":"14:57:9f:d7:e5:20","time_stamp":1574715521000},
{"mac":"38:53:9c:22:03:09","ap_mac":"50:5d:ac:b8:f8:20","time_stamp":1574715521000}
]}
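If you prefer a script to curl or Postman, the sketch below reproduces both calls. It assumes Python with the `requests` library; any HTTP client that persists cookies works the same way.

```python
# Minimal sketch of the login + devices flow, assuming Python `requests`.
# Replace **username**/**password** with your Sandbox credentials.
import requests

session = requests.Session()  # persists the session cookie (dtzcookie)

# Login endpoint: form-encoded POST, the server sets an auth cookie
session.post(
    "https://sandbox.datumize.net/api/users/login",
    data={"username": "**username**", "password": "**password**"},
)

# Devices endpoint: authenticated GET reusing the stored cookie
payload = session.get("https://sandbox.datumize.net/api").json()
for record in payload["data"]:
    print(record["mac"], record["ap_mac"], record["time_stamp"])
```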
Set Log Level to Debug
We recommend setting your instance's Log policy to Debug.
Please check the Setting DDC Log Policy guide.
Build the Wi-Fi Pipeline
If you feel lost about using Datumize Zentral, please refer to the available resources in our Getting Started section, including manuals and videos.
The table below summarizes the components used in the pipeline.
Component Type | Name | Description |
---|---|---|
Source | HTTPApiPollingSource | Source that actively requests an API at regular intervals of time (polling). |
Processor | ExtractorProcessor | Processor component aimed at applying an extractor to input content. |
Processor | FlattenerAdapter | Processor component aimed at flattening a collection, returning each element of the collection as a distinct record. |
Processor | ComposerProcessor | Composer with configurable partitioner. |
Sink | FileSink | Writes data to file. |
Drag the required components from the Palette to the Workbench and join them with a Single Memory Stream.
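Conceptually, each record flows through these components as follows. This is an illustrative Python sketch of the data path, not DDC code, using the sample response from the Devices endpoint above.

```python
# Illustrative data path only, not DDC internals.
response = {"body": {"data": [
    {"mac": "3c:2e:ff:b0:0a:44", "ap_mac": "14:57:9f:d7:e5:20", "time_stamp": 1574715521000},
    {"mac": "38:53:9c:22:03:09", "ap_mac": "50:5d:ac:b8:f8:20", "time_stamp": 1574715521000},
]}}

collection = response["body"]["data"]  # ExtractorProcessor: Path = body/data
records = [r for r in collection]      # FlattenerAdapter: one record per element
# ComposerProcessor then groups records into batches of 300, and
# FileSink serializes each batch as CSV rows (configured below).
```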
The table below summarizes the properties to configure the HTTPApiPolling Source component.
Field Name | Value | Required |
---|---|---|
Auth Method | CookieAuthMethod | * |
user | **username** | * |
password | **password** | * |
login-endpoint | https://sandbox.datumize.net/api/users/login | * |
login-body | {"username" : "{{username}}", "password" : "{{password}}"} | * |
login-content-type | application/json | * |
cookie-domain | null | * |
cookie-path | / | * |
login-TTL | 5m | * |
Request Builder | Request builder simple get | * |
endpoint | https://sandbox.datumize.net/api | * |
Deserializer | Json map deserializer | * |
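The settings above imply a cookie-based login that is refreshed when it expires. The sketch below illustrates that behaviour in Python; the class and method names are hypothetical, not DDC internals.

```python
# Hypothetical illustration of CookieAuthMethod behaviour, not DDC code.
import time
import requests

LOGIN_TTL_SECONDS = 5 * 60  # login-TTL = 5m

class CookieAuth:
    """Logs in with a JSON body and re-logs in once the TTL has elapsed."""

    def __init__(self, login_endpoint: str, username: str, password: str):
        self.login_endpoint = login_endpoint
        self.body = {"username": username, "password": password}  # login-body template
        self.session = requests.Session()
        self.logged_in_at = 0.0

    def ensure_logged_in(self) -> requests.Session:
        if time.time() - self.logged_in_at > LOGIN_TTL_SECONDS:
            # login-content-type = application/json, so send the body as JSON
            self.session.post(self.login_endpoint, json=self.body)
            self.logged_in_at = time.time()
        return self.session
```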
The table below summarizes the properties to configure the Extractor Processor component.
Field Name | Value | Required |
---|---|---|
Extractor | Field to Map Extractor | * |
Path | body/data | * |
Default | null | |
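As an illustration of what a field-to-map extraction with Path body/data does, here is a small Python sketch; the function is hypothetical, not the DDC implementation.

```python
# Hypothetical sketch of slash-separated path extraction with a default.
def extract_path(record: dict, path: str = "body/data", default=None):
    current = record
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return default  # Default = null when the path cannot be resolved
        current = current[key]
    return current

# extract_path({"body": {"data": [...]}}) returns the list under body/data
```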
The table below summarizes the properties to configure the Composer Processor component.
Field Name | Value | Required |
---|---|---|
Rules | Records Rule | * |
Max records | 300 | * |
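The Records Rule with Max records = 300 means the composer emits a batch once 300 records have accumulated. A minimal sketch of that behaviour (hypothetical, not DDC code):

```python
# Hypothetical sketch of a records-count composition rule.
from typing import Iterable, Iterator

def compose(records: Iterable[dict], max_records: int = 300) -> Iterator[list]:
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= max_records:  # Records Rule: flush at Max records
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch
```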
And finally, the table below summarizes the properties to configure the File Sink component.
Field Name | Value | Required |
---|---|---|
Directory Base | /opt/datumize/basic | * |
Serializer | Map to CSV Serializer | |
Fields | Enter the fields in the order you would like them written to CSV. | |
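To make the Fields ordering concrete, the sketch below writes one batch of records to CSV in a chosen field order using Python's standard csv module; the function and file path are illustrative, not the FileSink's actual naming scheme.

```python
# Illustrative CSV serialization of one batch; not the FileSink itself.
import csv

FIELDS = ["mac", "ap_mac", "time_stamp"]  # order the columns are written in

def write_batch(batch: list, path: str) -> None:
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        for record in batch:
            writer.writerow({field: record[field] for field in FIELDS})
```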
Deploy the Pipeline to DDC Instance
In Zentral, you will only need to have one machine, one instance and one pipeline.
- For more on using the Infrastructure management tool, please see the Zentral guide to Deployment Infrastructure.
- Please check the Pipeline Deployment guide.
The table below summarizes the properties to be defined in the DDC Runtime Policy for the HTTPApiPolling Source component.
Field Name | Value |
---|---|
Execution Policy Type | |
Type | Always |
Flush After Data | False |
Scheduling Policy Type | |
Type | SLEEP |
Duration | 30s |
Cron | ****** |
Batch | 1 |
Threads | 1 |
Error Policy Type | |
Type | DISCARD |
Max-Retries | 0 |
We set the HTTPApiPollingSource scheduling policy to SLEEP with a Duration of 30s, so the source polls the API every 30 seconds. We also set Batch and Threads to 1.
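Put together, the runtime policy amounts to a single-threaded loop that polls every 30 seconds and discards failed requests. A hedged sketch of that behaviour, not DDC internals:

```python
# Hypothetical sketch of the runtime policy: SLEEP 30s, 1 thread,
# batch of 1, DISCARD on error with 0 retries.
import time

def run(poll_once, sink, sleep_seconds: int = 30) -> None:
    while True:  # Execution Policy Type = Always
        try:
            sink(poll_once())  # Batch = 1: one request per cycle
        except Exception:
            pass  # Error Policy = DISCARD, Max-Retries = 0
        time.sleep(sleep_seconds)  # Scheduling Policy = SLEEP, Duration = 30s
```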
Check Expected Output
If you browse the Directory Base you will notice CSV files containing mac, ap_mac and time_stamp; each file contains 300 records. Folders are organized by write time.
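For example, with Fields set to mac, ap_mac, time_stamp, rows in an output file would look like this (values taken from the sample response earlier; your timestamps will differ):

```
3c:2e:ff:b0:0a:44,14:57:9f:d7:e5:20,1574715521000
38:53:9c:22:03:09,50:5d:ac:b8:f8:20,1574715521000
```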