Connecting Millions of Cars using EMQX MQTT and Upstash Kafka
Table of Contents
Introduction
Vehicle connectivity is set to shape the future of automotive transportation and smart mobility. In the upcoming decade, we anticipate a diverse array of transportation modes and interconnected infrastructure, forming a cohesive, integrated system. This evolution demands vehicles equipped with fast and reliable messaging capabilities to facilitate seamless communication within this interconnected ecosystem. Here are some of the connected car features:
- Real-Time Traffic Updates: Empower drivers with real-time traffic information, including congestion, accidents, and road closures, enabling informed route decisions.
- Electric Mobility Services: Implement e-mobility features such as battery charging, preconditioning, and efficient driving gamification.
- Advanced Driver Assistance Systems (ADAS): Enhance road safety with ADAS, featuring systems like lane departure warnings, forward collision warnings, and automatic emergency braking to help drivers avoid accidents.
- Driver Behavior Monitoring: Employ connected car data to monitor and analyze driver behavior. Provide feedback to drivers on safe driving practices, encouraging responsible behavior on the road.
- Voice-Activated Controls: Integrate voice-activated controls for various in-car functions, allowing drivers to perform tasks hands-free. This feature enhances safety and provides a more convenient driving experience.
- Emergency Assistance and Crash Response: Implement automatic crash detection and emergency response systems. Connected cars can transmit critical information to emergency services, enabling faster response times and potentially saving lives.
- Environmental Impact Monitoring: Monitor and report the environmental impact of vehicles, including emissions data. Provide drivers with insights into their carbon footprint and encourage eco-friendly driving habits.
With robust connectivity and data infrastructure, organizations can implement these connected car features, delivering enhanced value to drivers and passengers, elevating the overall driving experience, and positively impacting brand perception. MQTT and Kafka, stand ready to assist in building these features for connected car services and other IoT initiatives.
Architectural Challenges Building a Connected Car Platform
Developing a connected car platform involves creating the software infrastructure essential for new connected car services. This infrastructure encompasses the technology needed to establish connections between vehicles and the cloud, facilitate the transmission of data and events between the vehicle and the cloud.
The construction of a connected car platform introduces distinct technical challenges. The mobility of vehicles and the proliferation of connected devices simultaneously give rise to unique architectural considerations, including:
- Data Volume and Variety: Managing large volumes of diverse data generated by connected vehicles, including sensor data, multimedia streams, and diagnostic information.
- Move millions of MQTT messages
- End-to-end Multi QoS Guarantee
- Standardization and Interoperability: Adherence to industry standards for communication and data exchange to promote interoperability among diverse vehicles and systems.
- Network Connectivity Challenges: Vehicles on cellular networks may traverse blind spots, leading to interruptions in the connection between a vehicle and the cloud. Reconnecting may result in slow response times and message loss.
- Scalability Challenges: The cloud platform supporting the system must handle millions of simultaneous connections reliably to ensure a consistent customer experience, even during peak usage periods.
- Network Latency Issues: Similar to blind spots, network speed and latency can create variations in data flow between the vehicle and the cloud. A responsive user experience should mitigate the impact of network latency.
- Bidirectional Data Movement: Connected cars must efficiently transmit data between the vehicle and the cloud in both directions. Traditional client request/response architectures are inadequate for platforms communicating with millions of connected cars.
- Integration Complexity: Effortlessly incorporate IoT data into cloud services and enterprise systems, such as Kafka, SQL, NoSQL, and time series databases
- Integration with various databases, streaming platforms or cloud services
- Transform and process the data at scale
- Cost Management: Effectively managing the costs associated with the infrastructure, data storage, and communication networks, especially as the scale of connected devices increases.
Design Decisions
EMQX MQTT Platform
EMQX is a large-scale distributed MQTT messaging platform that offers "unlimited connections, seamless integration, and anywhere deployment." It provides the following capabilities:
- Enable Connectivity Across Any Scale of Devices
- Distributed MQTT Broker: Construct a global MQTT access network spanning multiple clouds, facilitating communication among devices, systems, and apps from diverse network endpoints.
- Effortless Scale: Connect hundreds of millions of IoT devices to the cloud seamlessly and reliably using the world's most scalable distributed MQTT broker.
- Versatile Protocol Support: Provide all-encompassing access to IoT devices through open standards like MQTT, OPC UA, CoAP/LwM2M, WebSocket, or industry-specific proprietary protocols.
- Prioritize Security and Privacy: Establish secure communication channels using MQTT over TLS/SSL or QUIC, implementing diverse authentication mechanisms such as LDAP, JWT, PSK, X.509 certificates, and more.
- Reliably Transfer IoT Data Anywhere
- Topic-based Pub/Sub Messaging: Facilitate seamless bidirectional data exchange among devices, systems, and applications through topic-based publish/subscribe messaging.
- Handle Millions of MQTT Messages: Efficiently manage a high volume of MQTT messages per second by leveraging a broker cluster equipped with a high-performance, real-time processing engine.
- Achieve Low Latency: Ensure message routing and delivery with soft real-time runtime, guaranteeing latency as low as 1ms.
- End-to-end Multi QoS Guarantee: Provide a diverse set of end-to-end Quality of Service (QoS) options, including at most once, at least once, and exactly once, to ensure reliable message delivery.
- Real-time Processing of MQTT Data
- SQL-based Real-time Data Processing: Employ SQL for real-time processing of data streams, enabling dynamic filtering, transformation, and analysis for actionable insights.
- Event Triggers for MQTT Client: Utilize the rule engine to respond to events such as client connections/disconnections, topic subscriptions/unsubscriptions, and message publications/deliveries.
- Rich Built-in Functions: Leverage a variety of predefined operations within the Rule Engine to enhance the processing and manipulation of data streams effectively and efficiently.
- Complex JSON Processing: Seamlessly integrate the JSON-specific processing tool jq to effortlessly convert and process intricate JSON data.
- Integration of IoT Data
- Integration with Various Databases: EMQX integrates seamlessly with databases like MySQL, PostgreSQL, Redis, MongoDB, InfluxDB, and ClickHouse, simplifying data sinking with minimal effort.
- Integration with Streaming Platforms: Connect to popular streaming platforms such as Kafka/Confluent, Pulsar, and RabbitMQ, facilitating message routing between the MQTT broker and enterprise systems.
- Integration with Cloud Services: Seamlessly integrate with cloud services like AWS Kinesis, Azure EventHub, and GCP Pub/Sub, enhancing data processing and analytics capabilities using major cloud platforms.
- Batch Data Processing: Utilize the built-in buffer layer to achieve efficient batch data processing, reducing the load on the target system and ensuring reliable and high-throughput data integration.
Upstash Kafka
- Rest API
- Upstash provides a REST API in addition to TCP-based Kafka clients, allowing access to Kafka topics through HTTP. The REST API proves especially beneficial in constrained environments, like mobile or edge devices, serving as a lightweight substitute for native Kafka clients.
- By leveraging the REST API, you can avoid the manual handling of Kafka clients and connections. This approach offers a user-friendly and straightforward means of interacting with Kafka topics, streamlining the process without the intricacies associated with native client implementations.
- Webhook API for Kafka
- A webhook is a custom HTTP callback, which can be triggered by some event from another service.
- Upstash Kafka Webhook API allows publishing of these events directly to a user-defined topic without using a third-party infrastructure or service.
- Schema Registry
- Schema Registry serves as a central hub to handle and validate schemas for message data related to Kafka topics. It also manages serialization and deserialization of data over the network. This aids producers and consumers in maintaining data consistency and compatibility as schemas change.
- Connectors
- Kafka Connect is a tool for streaming data between Apache Kafka and other systems without writing a single line of code. Via Kafka Sink Connectors, you can export your data into any other storage. Via Kafka Source Connectors, you can pull data to your Kafka topics from other systems.
- Kafka Connectors can be self-hosted but it requires you to set up and maintain extra processes/machines. Upstash provides hosted versions of connectors for your Kafka cluster. This will get the burden of maintaining an extra system from you and also it will be more performant since it will be close to your cluster.
- Examples of connectors available are MongoDB Source/Sink Connector, Debezium MongoDB Source Connector, Google BigQuery Sink Connector, and Snowflake Sink Connector
- Monitoring
- AKHQ is a GUI for monitoring & managing Apache Kafka topics, topics data, consumer groups, etc. You can connect and monitor your Upstash Kafka cluster using AKHQ.
- Kafka UI for Apache Kafka is a simple tool that makes your data flows observable, helps find and troubleshoot issues faster, and delivers optimal performance. Its lightweight dashboard makes it easy to track key metrics of your Kafka clusters - Brokers, Topics, Partitions, Production, and Consumption.
- Conduktor is a quite powerful application to monitor and manage Apache Kafka clusters. You can connect and monitor your Upstash Kafka cluster using Conduktor. Conduktor is free for development and testing.
Architecture and Data Flow
The data flow for connected cars involves the seamless transfer of information between the cars and the backend systems. Here's a high-level overview of the data flow, incorporating EMQX, Upstash, Kafka, and other relevant IT systems:
Data Generation at Connected Cars
- Sensor Data: Connected cars generate a variety of data from onboard sensors, including GPS, accelerometers, cameras, and other relevant sensors.
- Telemetry Data: The cars transmit telemetry data such as speed, fuel consumption, engine health, and other performance metrics.
Data Ingestion through EMQX (MQTT)
EMQX acts as the MQTT broker, facilitating communication between connected cars and backend systems.
- MQTT Protocol: Connected cars use the MQTT protocol to publish data to an EMQX MQTT broker.
- Topics: Different topics may be used for different types of data (e.g., GPS data, telemetry data) to organize and route information efficiently.
- Quality of Service (QoS): Depending on the reliability requirements, the MQTT QoS level can be adjusted to ensure the successful delivery of messages.
Upstash Kafka for Event Streaming
- Event Streaming: Use Kafka for high-throughput event streaming. This is particularly useful for scenarios where data needs to be reliably persisted, processed asynchronously, or shared among multiple microservices.
- Topics and Partitions: Kafka topics can be used to categorize different types of events, and partitions can help parallelize the processing of events.
Data Processing and Transformation
- Real-time Processing (Optional): If real-time processing is required, systems like Apache Kafka Streams or Apache Flink can process and analyze incoming data streams in real-time.
- Data Transformation: Transform and enrich raw data as needed, converting it into a format suitable for storage, analytics, or further processing.
Data Storage
Store processed and transformed data in appropriate databases based on the data type and requirements. For example:
- Use relational databases (e.g., PostgreSQL) for structured data.
- Use NoSQL databases (e.g., MongoDB) for unstructured or semi-structured data.
Real-time Analytics and Reporting
- Analytics Engine: Implement a real-time analytics engine to derive insights from the aggregated and processed data.
- Reporting Services: Develop reporting services or dashboards for users or administrators to visualize the analytics results.
This data flow provides a high-level overview, and the specifics may vary based on the requirements of your connected car system. It's important to continuously assess and update the architecture to accommodate evolving needs and technologies.
Closing Notes
In our design, IoT enabled connected cars seamlessly communicate through EMQX, ensuring efficient data exchange using MQTT. Upsatsh Kafka provides a robust foundation for reliable event streaming. Security is paramount, with end-to-end encryption and access controls safeguarding sensitive information. The system is designed to scale horizontally, ensuring flexibility and fault tolerance for future growth. Real-time processing extracts valuable insights from incoming data, contributing to dynamic analytics. Continuous monitoring, logging, and API-driven integrations enhance the system's reliability and adaptability, ensuring it stays at the forefront of technological advancements.