A Data Analytics Pipeline with Azure and akenza
The first step in the IoT journey is gathering data and gaining basic insights from visualizations. As the project grows, the requirements change and the need arises for more advanced analytics that uncover the potential of the data. This typically includes running complex analytical workloads, joining different data sources, and applying machine learning (ML) techniques.
Akenza serves as the data ingestion layer. It allows users to easily connect a variety of IoT devices to the cloud using the most common IoT technologies, manage them, and forward their data to the application layer. It thereby abstracts many of the challenging technicalities of the Internet of Things. With the akenza rule engine and query APIs, basic data analytics such as simple stream processing or time series aggregations can be implemented. For more complex scenarios, we recommend a tool specifically designed for large-scale data analytics, such as Azure Stream Analytics or Azure Databricks.
Requirements for data analytics typically include:
- Scalability: The ability to cope with high data velocity and high data volume
- Cost-efficiency: Low-cost and on-demand analytics
- Complex processing: Advanced data processing functionality such as aggregations, time-based window functions, and backfilling
- Different use cases: Data is typically used by different personas for a variety of use cases
Data processing tiers
The data journey differs based on the use case and data consumer applications. A common distinction is the following three data processing tiers:
- Hot path 🔥
Process, analyze, and display data in real time, usually under strict latency requirements, e.g. alerting, stream processing
- Warm path 🏖
Process, analyze, and display near-real-time data, e.g. stream processing, time series analysis (hourly or daily aggregation)
- Cold path ❄️
Long-term storage of data, often combined with time- and computation-intensive analyses and batch processing, e.g. historical analyses
Each tier has different requirements, tools, and users of the data. Normally, a mix of tools and technologies is required to satisfy the requirements of an IoT solution.
Akenza recommends the following integration options with Azure:
- Hot path 🔥
Typically, we recommend directly using akenza components for the hot path: akenza WebSocket API, akenza data push with data flow (e.g. Kafka, Azure Event Hubs, Webhook), akenza notifications with rule engine (SMS, MS Teams, Mail)
Azure components: Azure Functions, Azure Stream Analytics, Azure HDInsight (Spark, Storm)
- Warm path 🏖
For the warm path, a combination of akenza and Azure components should be used, depending on the project requirements: akenza historical REST API (raw or aggregated), akenza Grafana connector, akenza dashboard builder, akenza data push with data flow (e.g. Kafka, Azure Event Hubs, Webhook)
Azure components: Azure Stream Analytics, Azure Time Series Insights
- Cold path ❄️
For the cold path an IoT data analytics pipeline with Azure Databricks or Azure Synapse is recommended.
A reference architecture using Azure components is depicted in the following figure:
Data journey - hot path 🔥
The hot path is usually the easiest to implement in an IoT data analytics pipeline, often with the use of no- or low-code components. Our recommendation is to use the akenza Rule Engine or REST API for implementation. Azure services such as Azure Functions and Azure Stream Analytics can also be utilized in the hot path for data enrichment and real-time alerting and monitoring.
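A threshold rule of the kind used for real-time alerting can be sketched as follows. This is a minimal illustration in plain Python, not the akenza Rule Engine or an Azure API: the `Reading` type, the `check_threshold` function, and the 30 °C limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Hypothetical shape of one incoming sensor reading."""
    device_id: str
    temperature_c: float


def check_threshold(reading: Reading, max_temperature_c: float) -> Optional[str]:
    """Return an alert message if the reading exceeds the limit, else None."""
    if reading.temperature_c > max_temperature_c:
        return (
            f"ALERT {reading.device_id}: {reading.temperature_c:.1f} °C "
            f"exceeds {max_temperature_c:.1f} °C"
        )
    return None


if __name__ == "__main__":
    # Simulated stream; in practice readings would arrive one at a time.
    stream = [Reading("sensor-1", 21.5), Reading("sensor-1", 31.2)]
    for reading in stream:
        alert = check_threshold(reading, max_temperature_c=30.0)
        if alert:
            print(alert)
```

Because each reading is evaluated on its own, without any stored state, a rule like this keeps latency low, which is the defining requirement of the hot path.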
Data journey - warm path 🏖️
For the warm path, the implementation can vary based on the use case and may involve downsampling of the data based on temporal or spatial attributes. Azure components such as Azure Stream Analytics and Azure Time Series Insights can be used for processing data.
This type of data analytics pipeline is suitable for analyzing near-real-time data.
As an example, see below an hourly aggregation of temperature data:
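In code, such an hourly aggregation amounts to truncating each timestamp to the full hour and averaging the values in each bucket. The sketch below assumes readings are available as `(timestamp, temperature)` pairs; the `hourly_mean` function and the sample values are illustrative and not taken from akenza or Azure.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_mean(readings):
    """Group (timestamp, temperature) pairs by hour and average each group."""
    buckets = defaultdict(list)
    for timestamp, temperature in readings:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(temperature)
    # Return the hourly averages in chronological order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2023, 1, 1, 10, 5), 20.0),
        (datetime(2023, 1, 1, 10, 35), 22.0),
        (datetime(2023, 1, 1, 11, 15), 19.0),
    ]
    for hour, average in hourly_mean(sample).items():
        print(hour.isoformat(), average)
```

The same pattern extends to daily aggregation by also resetting the hour, or to other statistics by swapping `mean` for `min` or `max`.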
Data journey - cold path ❄️
The cold path can be challenging to implement and requires careful planning with regard to standards, tools, and automation. Ideally, ownership of the tooling should reside with the data science team. Often, linking the cold and warm/hot paths is a business requirement. An API abstraction layer can be used to make the outputs available to other applications. Alternatively, an external visualization tool can display the resulting data directly.
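A historical analysis typically runs over a complete batch of stored records rather than a live stream. The following is a minimal sketch of that idea, assuming records are available as `(device_id, temperature)` pairs; the `summarize_by_device` function is hypothetical. On Azure Databricks or Azure Synapse the same logic would usually be written as a distributed job, and plain Python is used here only to keep the example self-contained.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_device(records):
    """Compute min, max, and mean temperature per device over a full batch."""
    by_device = defaultdict(list)
    for device_id, temperature in records:
        by_device[device_id].append(temperature)
    return {
        device_id: {"min": min(values), "max": max(values), "mean": mean(values)}
        for device_id, values in by_device.items()
    }


if __name__ == "__main__":
    # Stand-in for records loaded from long-term storage.
    history = [("sensor-1", 10.0), ("sensor-1", 20.0), ("sensor-2", 5.0)]
    for device_id, stats in summarize_by_device(history).items():
        print(device_id, stats)
```

A summary like this is one example of an output that an API abstraction layer or an external visualization tool could then serve to consumers of the warm or hot path.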
Building an efficient IoT data analytics pipeline with Azure and akenza
The different data processing tiers become relevant as the IoT project grows. In most cases, different tools are needed to cover the different requirements:
- Real-time monitoring and alerting on the Hot Path 🔥
- Downsampling/aggregations on the Warm Path 🏖️
- Analyses on large data sets on the Cold Path ❄️
The Azure ecosystem provides an easy entry point to build scalable solutions quickly (e.g. Azure Stream Analytics, Azure Databricks). Akenza keeps the IoT part as simple as possible with advanced connectivity, device management, and data processing, and it integrates well with Azure.