IoT Data Ingestion
= Azure =

In this section, we introduce the steps to ingest data from an Azure IoT Hub into different Azure data stores. We assume that device messages are already being received by an Azure IoT Hub. If not, you can refer to the [http://ess-wiki.advantech.com.tw/view/Protocol_Connector#Azure Protocol Converter with Node-RED section], which is designed to ingest IoT data from edge devices into the Azure IoT Hub.

When the Azure IoT Hub receives messages, we use an Azure Stream Analytics job to dispatch the received data to other Azure services. In this case, we will ingest data into Azure Cosmos DB and Azure Blob Storage.

The IoT Hub used to receive messages here is named '''"azurestart-hub"'''. All Azure resources used in this section belong to a single Azure resource group named '''"azurestart-rg"'''.

[[File:IoTDataIngestion-az.png|center|750px|iot data ingest flow]]
+ | |||
+ | == Create storage to save ingested IoT data == | ||
+ | |||
+ | Step 1: Login to [https://portal.azure.com Azure portal] and create an Azure CosmosDB | ||
+ | |||
+ | *Search "azure cosmos db" and create new one | ||
+ | |||
+ | [[File:Azure-cosmos-new.png|center|1000px|create cosmos DB]] | ||
+ | |||
+ | *ID: '''azurestart-cosmos''' | ||
+ | *API: select '''"SQL"''' | ||
+ | *Resource Group: click '''"Use existing"''' and select '''"azurestart-rg"''' | ||
+ | |||
+ | [[File:Azure-cosmos-detail.png|center|450px|cosmos DB settings]] | ||
+ | |||
+ | Step 2: Create an Azure Blob Storage | ||
+ | |||
+ | *Search “storage account” and create new one | ||
+ | |||
+ | [[File:Azure-blob-new.png|center|1000px|create blob storage]] | ||
+ | |||
+ | *Name: '''azurestartblob''' | ||
+ | *Secure transfer required: '''Disabled''' | ||
+ | *Resource Group: click '''"Use existing"''' and select '''"azurestart-rg"''' | ||
+ | |||
+ | [[File:Azure-blob-detail.png|center|450px|blob storage settings]] | ||
+ | |||
+ | == Connect Azure IoT hub to Azure storage == | ||
+ | |||
+ | Step 1: Login to Azure portal and create a Stream Analytics job | ||
+ | |||
+ | *Search "stream analytics" and create new one | ||
+ | |||
+ | [[File:Azure-sa-new.png|center|1000px|create sa]] | ||
+ | |||
+ | *Job name: '''azurestart-sa''' (for example) | ||
+ | *Resource Group: click '''"Use existing"''' and select '''"azurestart-rg"''' | ||
+ | *Hosting environment: '''Cloud''' | ||
+ | |||
+ | [[File:Azure-sa-detail.png|center|450px|sa settings]] | ||
+ | |||
+ | Step 2: Set input of stream analytics job | ||
+ | |||
+ | *Open "azurestart-sa" created at last step: '''Resource groups''' → '''azurestart-rg''' → '''azurestart-sa''' | ||
+ | |||
+ | [[File:Azure-sa.png|center|1000px|open sa]] | ||
+ | |||
+ | Step 2.1: Add IoT hub input | ||
+ | |||
+ | *Click '''"Inputs"''' | ||
+ | |||
+ | [[File:Azure-sa-input.png|center|1000px|open sa input]] | ||
+ | |||
+ | *'''"Add stream input"''' → '''"IoT Hub"''' | ||
+ | |||
+ | [[File:Azure-sa-input-add-iothub.png|center|1000px|sa add input: iot hub]] | ||
+ | |||
+ | *Input alias: '''iothub''' | ||
+ | *Choose '''"Select IoT Hub from your subscriptions"''' | ||
+ | *IoT hub: '''azurestart-hub''' | ||
+ | *Endpoint: '''Messaging''' | ||
+ | *Shared access policy name: '''iothubowner''' | ||
+ | *Encoding: '''UTF-8''' | ||
+ | |||
+ | [[File:Azure-sa-input-add-iothub-detail.png|center|450px|sa input detail: iot hub]] | ||
+ | |||
+ | Step 3: Set outputs of stream analytics job | ||
+ | |||
+ | *Open "azurestart-sa" created at last step: '''Resource groups''' → '''azurestart-rg''' → '''azurestart-sa''' | ||
+ | |||
+ | [[File:Azure-sa.png|center|1000px|open sa]] | ||
+ | |||
+ | Step 3.1: Add blob storage output | ||
+ | |||
+ | *Click '''"Outputs"''' | ||
+ | |||
+ | [[File:Azure-sa-output.png|center|1000px|open sa ouput]] | ||
+ | |||
+ | *'''"Add"''' → '''"Blob storage"''' | ||
+ | |||
+ | [[File:Azure-sa-output-add-blob.png|center|1000px|sa add output: blob]] | ||
+ | |||
+ | *Output alias: '''blob''' | ||
+ | *Choose '''"Select Blob storage from your subscriptions"''' | ||
+ | *Storage account: '''azurestartblob''' | ||
+ | *Container: choose '''"Create new"''' and input '''"iot-data-ingestion"''' | ||
+ | *Path pattern: '''{date}/{time}''' | ||
+ | *Date format: '''YYYY-MM-DD''' | ||
+ | |||
+ | [[File:Azure-sa-output-add-blob-detail.png|center|450px|sa output detail: blob]] | ||
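With these settings, Stream Analytics substitutes the event's arrival date and hour into the '''{date}/{time}''' path pattern. A minimal local sketch of that substitution (a Python stand-in, not the actual Stream Analytics implementation):

```python
from datetime import datetime, timezone

def blob_path(event_time, path_pattern="{date}/{time}"):
    """Resolve the blob path pattern from the settings above:
    {date} uses the YYYY-MM-DD date format, {time} is the two-digit hour."""
    return (path_pattern
            .replace("{date}", event_time.strftime("%Y-%m-%d"))
            .replace("{time}", event_time.strftime("%H")))

# A message arriving at 2018-03-14 09:05 UTC lands under this prefix:
print(blob_path(datetime(2018, 3, 14, 9, 5, tzinfo=timezone.utc)))  # 2018-03-14/09
```

This is why the verification step later finds the ingested files grouped into one folder per day and one subfolder per hour.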
+ | |||
+ | Step 3.2: Add cosmos DB output | ||
+ | |||
+ | *'''"Add"''' → '''"Cosmos DB"''' | ||
+ | |||
+ | [[File:Azure-sa-output-add-cosmosdb.png|center|1000px|sa add output: cosmos DB]] | ||
+ | |||
+ | *Output alias: '''cosmosdb''' | ||
+ | *Choose '''"Select Cosmos DB from your subscriptions"''' | ||
+ | *Account id: '''azurestart-cosmos''' | ||
+ | *Database: choose '''"Create new"''' and input '''"iot-data-ingestion"''' | ||
+ | *Collection name pattern: '''historic-data''' | ||
+ | |||
+ | [[File:Azure-sa-output-add-cosmosdb-detail.png|center|450px|sa output detail: cosmos DB]] | ||
+ | |||
+ | Step 4: Set query of stream analytics job and start it | ||
+ | |||
+ | *Open "azurestart-sa" created at last step: '''Resource groups''' → '''azurestart-rg''' → '''azurestart-sa''' | ||
+ | |||
+ | [[File:Azure-sa.png|center|1000px|open sa]] | ||
+ | |||
+ | *Click '''"Edit query"''' | ||
+ | |||
+ | [[File:Azure-sa-query.png|center|1000px|open sa query]] | ||
+ | |||
+ | *Copy following stream analytics query statements, paste to the text area and click '''"Save"''' | ||
+ | <pre> SELECT * INTO [blob] FROM [iothub] | ||
+ | SELECT * INTO [cosmosdb] FROM [iothub] | ||
+ | </pre> | ||
+ | |||
+ | [[File:Azure-sa-query-detail.png|center|1000px|sa query detail]] | ||
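The two SELECT statements are independent pipelines over the same input stream, so every message arriving from '''iothub''' is written to both outputs. A minimal local sketch of that fan-out behavior (plain Python, with sink names mirroring the output aliases above):

```python
def run_job(messages, sinks):
    """Fan out every input message to every sink, mimicking the two
    'SELECT * INTO ... FROM [iothub]' statements in the query above."""
    for msg in messages:             # FROM [iothub]
        for sink in sinks.values():  # INTO [blob] and INTO [cosmosdb]
            sink.append(msg)
    return sinks

# Two illustrative device messages pass through the job:
sinks = run_job([{"temp": 21.5}, {"temp": 22.0}],
                {"blob": [], "cosmosdb": []})
print(len(sinks["blob"]), len(sinks["cosmosdb"]))  # 2 2
```

Because each statement runs over the full stream, adding a third output only requires one more SELECT ... INTO line, not any change to the existing ones.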
+ | |||
+ | *Back to '''"azurestart-sa"''' page, click '''"Start'''", then select '''"Now"''' for job output start time. | ||
+ | *Click '''"start"''' to run stream analytics job, it may take a few seconds. | ||
+ | |||
+ | [[File:Azure-sa-start.png|center|1000px|sa start]] | ||
+ | |||
+ | == Verify the saved data in Azure storage == | ||
+ | |||
+ | Make sure the Azure IoT hub is still received data from IoT Devices. | ||
+ | |||
+ | Step 1: Confirm data in Azure blob storage | ||
+ | |||
+ | *Go to "azurestart-blob": '''Resource groups''' → '''azurestart-rg''' → '''azurestart-blob''' | ||
+ | |||
+ | [[File:Azure-blob-1.png|center|1000px|blob storage list-1]] | ||
+ | |||
+ | *'''"Containers"''' → '''"iot-data-ingestion"''', we can see a folder in container is named as pattern "YYYY-MM-DD". The ingested data will be saved in files under folder {YYYY-MM-DD}/{HH} | ||
+ | |||
+ | [[File:Azure-blob-2.png|center|1000px|blob storage list-2]] | ||
+ | |||
+ | *Click the context menu icon '''"…"''' → '''"Edit"''' | ||
+ | |||
+ | [[File:Azure-blob-display-1.png|center|1000px|blob storage display-1]] | ||
+ | |||
+ | *We can see the received messages are saved line by line in the selected .json file | ||
+ | |||
+ | [[File:Azure-blob-display-2.png|center|1000px|blob storage display-2]] | ||
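Since the blob file stores one JSON message object per line, a downloaded copy can be parsed locally line by line. A small sketch (the field names in the sample text are illustrative, not taken from a real device):

```python
import json

# Illustrative contents of a downloaded blob file: one JSON object per
# line, matching the line-by-line layout shown in the blob editor above.
blob_text = """\
{"deviceId": "device-1", "temp": 21.5}
{"deviceId": "device-1", "temp": 22.0}
"""

# Parse each non-empty line as a separate message.
messages = [json.loads(line) for line in blob_text.splitlines() if line.strip()]
print(len(messages))            # 2
print(messages[0]["deviceId"])  # device-1
```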
+ | |||
+ | Step 2: Confirm data in Azure cosmos DB | ||
+ | |||
+ | *Go to "azurestart-cosmos": '''Resource groups''' → '''azurestart-rg''' → '''azurestart-cosmos''' | ||
+ | |||
+ | [[File:Azure-cosmos.png|center|1000px|azure cosmos]] | ||
+ | |||
+ | *'''Data Explorer''' → '''historic-data''' → '''Documents''': Each document in Cosmos DB represents one message received by Azure IoT hub. Click “id” to show the message content. | ||
+ | |||
+ | [[File:Azure-cosmos-display.png|center|1000px|azure cosmos display]] | ||
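Each stored document is essentially the message body keyed by a document '''"id"''', which is the value you click in Data Explorer. A rough sketch of that shape and lookup (the payload fields here are made up for illustration; only "id" is a real Cosmos DB document key):

```python
# Illustrative shape of one ingested document; the payload fields are
# hypothetical, "id" is the document key shown in Data Explorer.
document = {
    "id": "a1b2c3d4-0000-1111-2222-333344445555",
    "deviceId": "device-1",
    "temp": 21.5,
}

# Looking a document up by its "id", as clicking in Data Explorer does:
collection = {doc["id"]: doc for doc in [document]}
print(collection["a1b2c3d4-0000-1111-2222-333344445555"]["temp"])  # 21.5
```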
+ | |||
+ | = AWS = | ||
+ | |||
+ | In this section, you can get experience using the AWS IoT Rule Engine to insert data to different AWS storage. You can refer to refer to [http://ess-wiki.advantech.com.tw/view/Protocol_Connector#AWS Protocol Converter with Node-RED section] which is designed to ingest IoT data to AWS IoT. | ||
+ | |||
+ | |||
+ | |||
+ | When you receive IoT data from AWS IoT. You can use Rule Engine to connect to another AWS service. In this case we will send IoT data to AWS S3 and DynamoDB | ||
+ | |||
+ | [[File:IoTDataIngestion.png|center|IoTDataIngestion.png]] | ||
+ | |||
+ | Step 1. Go to the AWS IoT console and click '''"Act"''' and click '''"Create a rule"'''. | ||
+ | |||
+ | [[File:2018-03-08 153047.png|center|1000px|2018-03-08_153047.png]] | ||
+ | |||
+ | Step 2. Enter {your rule name} and {description} | ||
+ | |||
+ | [[File:2018-03-09 134506.png|center|1000px|2018-03-09_134506.png]] | ||
+ | |||
+ | Step 3. Configure the rule as follows: Attribute : * Topic Filter: {your AWS IoT publish Topic}. The topic which used in Protocol Converter is '''“protocol-conn/{Device Name}/{Handler Name}”'''. In this case, we use wildcard # to get all message More Topic information can be found at | ||
+ | |||
+ | [[File:2018-03-09 150248.png|center|1000px|2018-03-09_150248.png]] | ||
+ | |||
+ | Step 4. Click '''"Add action" '''to store message in S3 bucket. Select '''“Store messages in an Amazon S3 bucket”'''→click'''“Configure action”''' more rule engine information can be found at [[File:2018-03-09 150946.png|center|1000px|2018-03-09_150946.png]] | ||
+ | |||
+ | Step 5. Under "configure action", you need to choose a S3 bucket. If you don’t have any one, you can click '''“Create a new resource” '''to create one. In this case, we store data to json format and assort by data and hour. You can use SQL wildcard '''parse_time ()''' and '''timestamp() '''to assort store folder and using '''newuuid() '''as filename. | ||
+ | |||
+ | ${parse_time("yyyy-MM-dd", timestamp())}/${parse_time("HH", timestamp())}/${newuuid()}.json | ||
+ | |||
+ | More AWS IoT SQL Reference information can be found at | ||
+ | |||
You need to choose a role which has permission to access AWS S3. After finishing the settings, click '''"Update"'''.

[[File:2018-03-09 151301.png|center|1000px|2018-03-09_151301.png]]

Step 6. Click '''"Add action"''' to connect DynamoDB.

[[File:2018-03-09 155336.png|center|1002px|2018-03-09_155336.png]]

Step 7. Select '''"Insert a message into a DynamoDB Table"''' → click '''"Configure action"'''.

[[File:2018-03-09 160050.png|center|1003px|2018-03-09_160050.png]]

Step 8. Choose a DynamoDB table. If you don't have one, you can click '''"Create a new resource"''' to create a table. In this case, we create a DynamoDB table called '''"IoT_Data_Ingestion"''' with a primary key '''"uuid"'''.

Step 9. Choose '''"IoT_Data_Ingestion"''' and fill in the action settings:

*Hash key value: '''${newuuid()}'''
*Write message data to this column: '''payload'''

You need to choose a role which has permission to access AWS DynamoDB. After finishing the settings, click '''"Update"'''.

[[File:2018-03-09 160213.png|center|1004px|2018-03-09_160213.png]]

Step 10. After finishing the settings, click '''"Create rule"'''.

[[File:2018-03-09 160808.png|center|1005px|2018-03-09_160808.png]]

Step 11. Now you can publish your messages and check S3 and DynamoDB.

[[File:2018-03-09 161209.png|center|1006px|2018-03-09_161209.png]]

[[File:2018-03-09 161526.png|center|1007px|2018-03-09_161526.png]]
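With the settings from Steps 8 and 9, each rule invocation writes one DynamoDB item: '''uuid''' holds the generated hash key and '''payload''' holds the message data. A rough local sketch of that item shape (the message content is illustrative):

```python
import json
import uuid

# Illustrative device message; real payloads come from your devices.
message = {"deviceId": "device-1", "temp": 21.5}

# Item the rule action writes: "uuid" is the hash key filled in by
# newuuid(), and the message body goes into the "payload" column.
item = {
    "uuid": str(uuid.uuid4()),
    "payload": json.dumps(message),
}
print(sorted(item))  # ['payload', 'uuid']
```

Because the hash key is a fresh UUID per message, every received message becomes its own item rather than overwriting an existing one.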
Latest revision as of 17:43, 14 March 2018