A single IoT device, by itself, does not generate much data. As your application scales to millions of devices, the volume of machine-generated data becomes enormous. As the volume of data increases, so does the complexity of the data storage architecture required to support it.
The data storage layer is often one of the primary causes of performance problems as an application scales. A properly designed data storage architecture is one of the most critical components of an IoT platform.
This article describes a data storage architecture that is proven to support IoT applications with millions of devices. This is the architecture used at Losant, and our IoT platform provides the foundation for some of the largest organizations in the world.
At this scale, no single data storage technology fits every need. We have to segment the data and use a storage technology optimized for each segment. At Losant, we segment data into three buckets:
- Hot Storage: optimized for IoT data with real-time query support.
- Warm Storage: optimized for large data volumes with generic query support.
- Cold Storage: optimized for cost.
A hot storage database backs the user-facing side of your IoT application. Hot storage databases are optimized for performance, so data can be instantly queried and displayed on dashboards or custom user interfaces.
Most of the data generated by IoT devices is time-series data. This means a well-designed hot storage database is also optimized for time-series aggregations like min, max, mean, standard deviation, and others. For example, Losant’s Dashboards have several blocks that display aggregated data, like the Gauge and Time Series Graph. The aggregations required to populate these dashboard blocks are performed directly in Losant’s time-series database.
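To make the idea of a time-series aggregation concrete, here is a minimal sketch in plain Python. It is not how Losant's time-series database is implemented. The `aggregate` function, the `(timestamp, value)` tuple format, and the 60-second bucket width are all assumptions chosen for illustration. A real hot storage database performs this grouping inside the database engine rather than in application code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(points, bucket_seconds):
    """Group (timestamp, value) points into fixed-width time buckets
    and compute summary statistics for each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "stddev": pstdev(values),  # population standard deviation
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical temperature readings: (seconds since start, degrees).
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 25.0)]
summary = aggregate(readings, bucket_seconds=60)
```

Each key in `summary` is the start of a one-minute bucket. A dashboard block such as a gauge or graph would plot one of the per-bucket statistics rather than every raw point.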
Due to the performance requirements of a hot storage database, storing this type of data can be costly. To control this cost, most hot storage architectures implement data retention. This means data will only be available in the hot storage database for a certain amount of time. At Losant, the most common data retention limits are either six or twelve months, depending on the specific requirements of the customer.
Before data reaches the retention limit and is deleted, it can be copied to warm or cold storage. Losant provides Application Archiving that automatically copies data from hot storage to a customer-owned cold storage option. If a customer requires warm storage, data can be copied in real-time using Losant Workflows.
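The copy-then-delete ordering described above can be sketched as follows. This is an illustration only, not Losant's Application Archiving implementation. The `apply_retention` function, the in-memory lists standing in for the hot store and the archive, and the 180-day default (roughly six months) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(hot_store, archive, now, retention_days=180):
    """Copy points older than the retention window into `archive`,
    then remove them from `hot_store`. Returns the number moved."""
    cutoff = now - timedelta(days=retention_days)
    expired = [p for p in hot_store if p["time"] < cutoff]
    # Copy first, delete second, so a failed copy never loses data.
    archive.extend(expired)
    hot_store[:] = [p for p in hot_store if p["time"] >= cutoff]
    return len(expired)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
hot = [
    {"time": datetime(2023, 1, 1, tzinfo=timezone.utc), "temp": 20.5},
    {"time": datetime(2023, 12, 1, tzinfo=timezone.utc), "temp": 21.0},
]
cold = []
moved = apply_retention(hot, cold, now)
```

After the call, the year-old reading lives only in `cold`, while the one-month-old reading is still available for real-time queries in `hot`.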
Warm storage databases are optimized for scale, meaning they can potentially store an indefinite amount of data. The main difference between warm storage and cold storage is the ability to easily query the data. Warm storage databases, often called data lakes or data warehouses, typically provide some kind of generic query support to explore the data. These databases, however, are not optimized for IoT data and usually don’t offer the powerful aggregations required for time-series queries.
One of the primary use cases for warm storage is offline analytics and AI/ML. These kinds of analytics can be performed on hot storage data as long as the time span required doesn’t exceed the data retention limit. For example, Losant Notebooks support batch analytics on any amount of data stored in hot storage. That said, training an AI/ML model can require data spanning several years. In these cases, you’ll need to train your models using data from warm or cold storage.
Storing an indefinite amount of data is a technical challenge for any database. Fortunately, this problem has been solved by every major cloud vendor. At Losant, we recommend customers adopt the data warehouse offering from their preferred cloud vendor and stream data to it in real-time using a Workflow. As an added benefit, most cloud vendors have direct integrations between their data warehouse offerings and their AI/ML offerings.
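As a rough sketch of what streaming device data to a warehouse can look like on the sending side, the example below buffers rows and hands them to a writer in small batches. Everything here is hypothetical: `BatchingSink`, `write_batch`, and the batch size are illustrative names and values, not part of Losant Workflows or any vendor's API. Micro-batching is an assumption added for the example. In practice, `write_batch` would wrap whichever ingestion call your chosen warehouse provides.

```python
class BatchingSink:
    """Buffer rows and pass them to `write_batch` in groups.

    `write_batch` is a hypothetical callable that accepts a list of
    rows; it stands in for a warehouse-specific ingestion call.
    """

    def __init__(self, write_batch, max_batch=500):
        self.write_batch = write_batch
        self.max_batch = max_batch
        self._buffer = []

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        # Send whatever is buffered, then start a fresh buffer.
        if self._buffer:
            self.write_batch(self._buffer)
            self._buffer = []


# Stand-in writer that records each batch it receives.
sent_batches = []
sink = BatchingSink(sent_batches.append, max_batch=2)
for reading in ({"temp": 20.5}, {"temp": 21.0}, {"temp": 21.5}):
    sink.add(reading)
sink.flush()  # push the final partial batch
```

A smaller `max_batch` keeps the warehouse closer to real time at the cost of more write calls. The right trade-off depends on the warehouse and the data rate.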
Below are the data warehouses provided by each major cloud vendor:
Cold storage is optimized for cost. Whereas hot storage and warm storage are typically provided through databases, cold storage is usually implemented as cloud buckets or file storage. Removing the database engine drastically reduces the cost of storage, but sacrifices the ability to quickly and easily query and explore the data.
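The trade-off can be illustrated with a small sketch that writes readings to compressed files and then "queries" them. It uses a local temporary directory in place of a cloud bucket, and the file layout (one gzip-compressed JSON Lines file per device per day), the `archive_day` and `scan` helpers, and the sample readings are all assumptions made for the example.

```python
import gzip
import json
import tempfile
from pathlib import Path


def archive_day(root, device_id, day, points):
    """Write one device's readings for one day as gzip-compressed JSON Lines."""
    path = Path(root) / device_id / f"{day}.jsonl.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for point in points:
            f.write(json.dumps(point) + "\n")
    return path


def scan(root, predicate):
    """With no database engine or index, a query has to open and
    read every archived file to find matching points."""
    for path in sorted(Path(root).rglob("*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                point = json.loads(line)
                if predicate(point):
                    yield point


root = tempfile.mkdtemp()  # stands in for a cloud storage bucket
archive_day(root, "device-1", "2024-01-01", [{"temp": 20.5}, {"temp": 22.0}])
archive_day(root, "device-2", "2024-01-01", [{"temp": 23.5}])
matches = list(scan(root, lambda p: p["temp"] > 21))
```

Writing the files is cheap and simple, but even this trivial filter has to decompress and parse every file, which is why cold storage suits archives and backups rather than interactive queries.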
The primary purpose of cold storage is archiving and backups. In many cases, compliance requirements make cold storage necessary because data must be retained for a certain amount of time, sometimes decades depending on the industry. Cold storage provides a cost-effective way to meet these compliance requirements.
At Losant, we provide Application Archiving, which automatically copies data from hot storage to a customer-owned cold storage bucket from one of the major cloud vendors:
Understanding Your IoT Platform’s Data Storage Architecture
Building an application on an IoT platform greatly reduces time to market, but you must ensure the platform you choose is architected to meet the scalability needs of your application. At Losant, we’ve been refining our hot storage database over several years and billions of data points. Because we segment data into hot, warm, and cold storage, we understand where one storage technique should end and another should begin.
If you’d like to discuss your application and see how we’d recommend architecting your use case on top of our platform, please contact us.