| en:iot-reloaded:iot_data_analysis [2024/07/19 19:22] – agrisnik | en:iot-reloaded:iot_data_analysis [2025/05/17 11:56] (current) – agrisnik | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== IoT Data Analysis ====== | ||
| + | IoT systems are built to provide better insights into different processes and systems to make better decisions. The insights are provided by measuring the statuses of the systems or process elements represented by data. Unfortunately, | ||
| + | Today, IoT systems produce a vast amount of data, which is very hard to use manually. Thanks to modern hardware and software developments, | ||
| + | As various sources have stated, IoT data in most cases exhibits the so-called 5Vs of Big Data, and matching just one of them is enough to face a Big Data problem. As has been explained by Jain et al. ((Jain, A., Mittal, S., Bhagat, A., Sharma, D.K. (2023). Big Data Analytics and Security Over the Cloud: Characteristics, | ||
| + | |||
| + | === Volume === | ||
| + | |||
| + | This characteristic is the most obvious and refers to the size of the data. In most practical IoT applications, large volumes are reached through intensive production and collection of sensor data. This data usually fills existing operational systems quickly, so dedicated IoT data collection systems have to be upgraded or developed from scratch (the latter is more advisable). | ||
| + | |||
| + | === Variety === | ||
| + | |||
| + | Jain et al. explain that Big Data is highly heterogeneous regarding source, kind, and nature. With different systems, processes, sensors, and other data sources involved, variety is usually a distinctive feature of practical IoT systems. For instance, a system of intelligent office buildings would need data from a building management system, appliances and independent sensors, and external sources like weather stations or forecasts from appropriate external weather forecast APIs (Application Programming Interfaces). Additionally, | ||
| + | |||
| + | === Veracity === | ||
| + | |||
| + | Unfortunately, | ||
| + | |||
| + | === Velocity === | ||
| + | |||
| + | Data velocity characterises how tightly data is bound to time and how important it is during a specific period or at a particular time instant. A good example is any real-time system, such as an industrial process control system, where reactions or decisions must be made within a fixed period and therefore require data at particular time instants. In this case, the data has the nature of a flow with a specific density. | ||
| + | |||
| + | === Value === | ||
| + | |||
| + | Since IoT systems and their data analysis subsystems are built to add value for their owners, the development and ownership costs should not exceed the returned value. If this condition does not hold, the system is of low or no value. | ||
| + | |||
| + | ====== ====== | ||
| + | Dealing with Big Data requires specific hardware and software infrastructure. There are a number of typical solutions and many more customised ones; some of the most popular are explained here: | ||
| + | |||
| + | === Relational DB-based systems === | ||
| + | |||
| + | These systems are based on the well-known relational data model and appropriate database management systems like MS SQL Server, Oracle Server, MySQL, etc. They have some advantageous features, for instance: | ||
| + | * SQL (Structured Query Language) enables easy data manipulation while keeping the data model reasonably expressive. | ||
| + | * A well-designed set of software tools and interfaces enables integration with many different systems. | ||
| + | * Many built-in data processing routines (stored procedures) improve development productivity. | ||
| + | * Triggers enable asynchronous reactions to events by raising internal events. | ||
| + | * Reads can be scaled out across multiple instances, while writes can be scaled up using more powerful servers. | ||
| + | Unfortunately, | ||
| + | |||
| + | <figure RelationalDBMS> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | === Complex Event Processing (CEP) systems === | ||
| + | |||
| + | CEP systems are very application-tailored, | ||
| + | Some of the most common drawbacks to be considered are: | ||
| + | * They can be scaled up only by introducing more powerful hardware, and even that is limited by the application-specific design. To some extent, the design might be more flexible if microservices and containerisation are applied. | ||
| + | * Due to the factors mentioned above and the complexity, the maintenance costs are usually higher than those of a universal design (figure 2). | ||
| + | |||
| + | <figure CEP_systems> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | === NoSQL systems === | ||
| + | |||
| + | As the name suggests, the main characteristic is higher flexibility in data models, which overcomes the limitations of highly structured relational data models (figure {{ref> | ||
| + | They also provide means for scaling both out and up, which makes them future-proof and resilient. A typical design is key-value or key-document, where a unique key indexes incoming data blocks or documents (JSON, for instance). | ||
| + | Other designs extend the relational data model with other models, such as object models, graph models, or the mentioned key-value models, providing highly purpose-driven and, therefore, productive designs. However, the complexity of the design raises data integrity problems and makes maintenance more complex (figure 3). | ||
| + | |||
| + | <figure NoSQL_systems> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | === In-memory data grids === | ||
| + | |||
| + | This is probably the most productive type of system, providing high flexibility, | ||
| + | * Hazelcast ((http:// | ||
| + | * JBOSS Infinispan ((https:// | ||
| + | * IBM eXtreme Scale ((https:// | ||
| + | * Gigaspace XAP Elastic caching edition ((www.gigaspaces.com/ | ||
| + | * Oracle Coherence | ||
| + | * Terracotta enterprise suite ((www.terracotta.org/ | ||
| + | * Pivotal Gemfire | ||
| + | |||
| + | |||
| + | <WRAP excludefrompdf> | ||
| + | This chapter is devoted to the main groups of algorithms for numerical data analysis and interpretation, | ||
| + | |||
| + | * [[en: | ||
| + | * [[en: | ||
| + | * [[en: | ||
| + | * [[en: | ||
| + | * [[en: | ||
| + | * [[en: | ||
| + | * [[en: | ||
| + | </ | ||
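
As a rough illustration of the relational DB-based approach described above (SQL for data manipulation and triggers for reacting to events), the following minimal sketch uses Python's built-in `sqlite3` module. The table names `readings` and `alerts`, the column names, and the 30 °C threshold are assumptions made only for this example and do not come from the original text. Note also that SQLite executes triggers synchronously as part of the inserting statement; server-class database systems may offer other event mechanisms.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical sensor schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE readings (
            sensor_id TEXT    NOT NULL,
            ts        INTEGER NOT NULL,  -- Unix timestamp (assumed unit)
            temp_c    REAL    NOT NULL
        );
        CREATE TABLE alerts (
            sensor_id TEXT    NOT NULL,
            ts        INTEGER NOT NULL,
            temp_c    REAL    NOT NULL
        );
        -- React to an event: copy every reading above the assumed
        -- threshold of 30.0 degrees into the alerts table.
        CREATE TRIGGER high_temp AFTER INSERT ON readings
        WHEN NEW.temp_c > 30.0
        BEGIN
            INSERT INTO alerts VALUES (NEW.sensor_id, NEW.ts, NEW.temp_c);
        END;
        """
    )
    return conn


def insert_readings(conn, rows):
    """Insert (sensor_id, ts, temp_c) tuples; the trigger fires per row."""
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def mean_temp_per_sensor(conn):
    """Aggregate with plain SQL: average temperature per sensor."""
    cur = conn.execute(
        "SELECT sensor_id, AVG(temp_c) FROM readings "
        "GROUP BY sensor_id ORDER BY sensor_id"
    )
    return dict(cur.fetchall())


def fetch_alerts(conn):
    """Return all alert rows produced by the trigger, ordered by time."""
    return conn.execute(
        "SELECT sensor_id, ts, temp_c FROM alerts ORDER BY ts"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    insert_readings(db, [("a", 1, 20.0), ("a", 2, 32.0), ("b", 3, 25.0)])
    print(mean_temp_per_sensor(db))
    print(fetch_alerts(db))
```

The sketch keeps the schema deliberately small; in a real deployment the threshold would more likely live in a configuration table than in the trigger body.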