WHITE PAPER
MQTT Platform for AI: Empowering AI with Real-Time Data →

From Data to Decisions: Building Effective IoT Data Stacks

EMQX Team
Oct 22, 2024
From Data to Decisions: Building Effective IoT Data Stacks

The Internet of Things (IoT): A Gateway to a Fully Connected World

With the vision of creating a fully interconnected world, IoT technology has gained increasing attention, especially after the global rollout of 5G in 2018.

IoT applications differ fundamentally from human-centered applications. While we typically interact with apps on devices like personal computers and smartphones, even cutting-edge technologies like AR/XR depend on mobile devices for computing power. IoT, however, operates in a broader ecosystem—its applications span a variety of devices and components depending on the use case.

As IoT continues to grow, so does the complexity of managing the data it generates. To effectively handle the massive streams of data from millions of connected devices, it's essential to understand the underlying data stack. In this blog, we’ll break down the data stack for IoT applications and explore how it works across different real-world scenarios.

  • Smart environments (factories, homes, buildings): These involve various types of devices in one location. The challenge is to merge data from different sources to create valuable insights.
  • Distributed networks (electrical grids, solar farms, wind farms): Devices of the same type are spread over large areas, all monitoring specific conditions remotely.
  • Autonomous devices (connected cars, drones, robots): These devices are designed to operate independently but still require oversight and sometimes remote control.

The Must-Have Qualities of an Ideal IoT Data Stack

An optimal IoT data stack should fulfill several core expectations from business users:

  • Adaptability: The system must be flexible enough to support various devices and continuously evolve to meet new business demands. This is crucial as industrial device standards are still in flux.
  • Scalability: Most IoT projects begin as small pilot programs, gradually expanding in scope. Since testing environments rarely match production workloads, the system should scale easily to manage increasing demands with minimal disruption.
  • Completeness: The stack should encompass the entire journey of data, from the edge devices to the cloud. This includes processes like data ingestion, transmission, storage, and analysis.

Now, let’s dive into the different layers that make up an IoT system.

Data Connectivity: The Foundation of IoT Systems

Data connectivity is the essential first step for IoT scenarios such as smart factories, homes, buildings, and distributed device networks. Before establishing this connection, it’s crucial to consider several key factors that will shape the technology and products you choose.

Network Capability The first priority is ensuring that your endpoint devices and backend message servers can successfully establish communication. To do this, you’ll need to verify compatibility between the network protocols and ensure that encryption capabilities (like SSL support) are aligned between your devices and servers.

Messaging Protocol How will messages be sent and received between endpoints and servers? While you could develop custom messaging behaviors, most businesses rely on established industrial standards. The most widely adopted messaging protocol in IoT is MQTT (Message Queuing Telemetry Transport), which operates using a publish/subscribe model. Its robust software ecosystem makes it a preferred choice, so selecting devices that support MQTT is highly recommended.

Explore the world of MQTT through this comprehensive guide: MQTT Guide 2024: Beginner to Advanced

Data Protocol In MQTT messages, you can transmit data in various formats. Although custom data protocols may be developed to minimize network traffic, they are typically harder to maintain and evolve over time. Using a widely accepted format like JSON ensures better flexibility and easier programming.

If you’re working on an industrial IoT (IIoT) project, you might encounter OPC UA standards, which are gaining popularity among major vendors. However, you’ll still need to manage non-OPC UA devices, legacy equipment from different vendors, and even multimedia devices like cameras. In such cases, using an edge gateway becomes critical for integrating these diverse systems.

The Edge Gateway: A Key Component for IoT Integration

An edge gateway acts as a bridge between various devices and the cloud, enabling seamless communication and data flow. By managing different protocols and reducing the amount of data sent to the cloud, edge gateways enhance overall system efficiency.

EMQ provides an effective edge gateway solution: NeuronEX. Its key features include:

  • Network Proxy: NeuronEX can ingest data from various devices connected via field buses and transmit this data to the cloud using TCP or WebSocket connections.
  • Data Protocol Translation: NeuronEX supports hundreds of field buses and can seamlessly translate data protocols, eliminating the need for developers to manage multiple SDKs for data ingestion.
  • Rule-Based Edge Computing: By performing data pre-processing at the edge, NeuronEX reduces network traffic and accelerates the overall data processing workflow. It features a user-friendly rule-based interface for setting up processing flows and can integrate with ML inference runtimes on the edge server.

Learn more about NeuronEX gateway: NeuronEX: Industrial Edge Data Hub

The Message Broker: The Backbone of IoT Communication

A message broker that supports the publish/subscribe (pub/sub) model serves as a central data hub in many IoT scenarios. This model is especially beneficial in IoT applications, as it simplifies connection management by allowing devices to communicate without needing to know about each other directly. Instead of establishing one-to-one connections, devices can publish messages to specific topics, while others subscribe to those topics to receive updates. This decoupling reduces the complexity of connections and enhances scalability.

Moreover, message brokers provide Quality of Service (QoS) features, which ensure that messages are delivered reliably, even in scenarios with intermittent connectivity or high message traffic. By managing message delivery and storage, they help to minimize data loss and maintain communication integrity across diverse devices.

EMQX Platform, the flagship product from EMQ, is a cloud-native distributed MQTT data platform that supports both MQTT 3.1 and 5.0 protocols. Beyond functioning as an MQTT message broker, EMQX Platform incorporates essential enterprise features such as data integration, rule engine, and data encryption. Additionally, it allows users to deploy the EMQX Platform via cloud services or through on-premise installations.

More information about the EMQX Platform: Fully Managed MQTT Broker on a Dedicated Infrastructure

Data Storage: Managing the Volume of IoT Data Efficiently

IoT applications can generate vast amounts of data, especially with numerous devices requiring high-frequency data ingestion. Even with preprocessing using tools like NeuronEX or EMQX, the volume sent to the cloud can be overwhelming, necessitating a hierarchical data storage approach.

The hierarchical approach below to storage is generally sufficient for managing the diverse data needs of IoT applications:

First Layer: Operational Data Store Near-term operational data is typically stored in a time series database. These databases offer excellent data compression and support window-based analysis, making them ideal for setting alerts on critical metrics. For anomaly detection using machine learning, incorporating a feature store is beneficial; some time series databases, like Apache IoTDB, now include this functionality. Additionally, when selecting an operational data store, consider its data visualization capabilities. Intuitive visualizations can provide users with valuable insights, so ensure compatibility with tools like Tableau or Apache Superset.

Second Layer: Data Lake or Lake House Archiving data periodically from the operational data store to a data lake or lake house is a best practice. Modern lake houses often support time series tables, allowing you to leverage existing systems within your organization effectively.

Third Layer: Object Storage For scenarios involving cameras or the generation of diagnostic reports, managing file storage becomes crucial. EMQX Platform can facilitate file transmission from endpoints to the cloud and integrate this data with S3 object storage.

As IoT scenarios continue to grow, the future of IoT data stacks will prioritize several key areas:

  • AI and Machine Learning Integration for predictive analytics and automation.
  • Greater Focus on Interoperability across different IoT platforms and devices.
  • Edge Computing Capabilities for connected cars, drones, and robots.

It's crucial to stay informed about these future trends. Be sure to review the roadmaps of your selected products to determine whether vendors are committed to investing in these features.

A well-architected IoT data stack is critical for turning IoT-generated data into valuable insights. By carefully considering each layer of the stack—from data ingestion to analytics—businesses can ensure they are prepared to scale their IoT initiatives and drive innovation in their industries.

Talk to an Expert
Contact Us →