VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 2 (2015) 44–55
A Big Data Analytics Framework for IoT Applications in the Cloud
Linh Manh Pham
University of Grenoble Alpes, Grenoble, France
Abstract
The Internet of Things (IoT) is an evolution of connected networks that include millions of chatty embedded devices. The huge amount of data generated every day by these things must be aggregated and analyzed with "Big Data Analytics" technologies, which requires coordinating complex components deployed both on premises and on Cloud platforms. This article proposes BDAaaS, a flexibly adaptive cloud-based framework for real-time Big Data analytics. The framework collects and analyzes data for IoT applications by reusing existing components, such as IoT gateways, message brokers and Big Data Analytics platforms, which are deployed automatically. We demonstrate and evaluate BDAaaS by implementing a smart-grid use case with a dataset originating from a real-world source. The results show that our approach can generate power consumption predictions that fit the real consumption curve well, which demonstrates its soundness.
2015 Published by VNU Journal of Sciences.
Manuscript communication: received 28 April 2015, revised 20 June 2015, accepted 25 June 2015
Correspondence: Linh Manh Pham, Linh.M.Pham@ieee.org
Keywords:
Big Data Analytics, Cloud Computing, Event Stream Processing, Internet of Things.
1. Introduction
Millions of chatty embedded devices, such as wireless sensors, RFID tags and mobile sensors, have been operating in the connected networks of the Internet of Things (IoT). According to Forbes, the number of connected things will reach approximately 41 billion by the end of 2020 [1]. IoT fully benefits from the economic models offered by up-to-date Cloud computing technologies (i.e. the pay-as-you-go style), which improves the quality of service delivered to customers and helps them satisfy their legal and contractual duties. However, the associated IoT services require collecting huge amounts of data produced by swarms of sensors through dedicated gateways, and analyzing these data with "Big Data Analytics" platforms (BDA for short). BDA offers models (e.g. Map-Reduce, ESP, CEP [2]) and technologies (e.g. Hadoop, Storm [3]) deployed on the Cloud [4]. Moreover, deploying a BDA infrastructure often requires engineers with skills in various technologies of Cloud computing, Big Data and IoT, as well as knowledge of diverse business domains such as smart grid, healthcare, supply chain, etc. Gartner forecasts that Big Data projects will globally create 4.4 million IT jobs by 2015 [5]. However, this number underestimates the demand created by the billions of "chatty" connected things that will be integrated continuously over the next few years. This trend will create a new kind of job, requiring people who are both business and domain experts to design and deploy complex analytic workflows and to interpret big data results. These experts need efficient tools to coordinate all the phases automatically.
BDAaaS is a framework that provides specifications for generating specific cloud-based PaaS of real-time BDA applications.
BDAaaS targets people who are not experts in Cloud deployment and configuration (i.e. the D&C process). It enables them to build complex workflows of standard analytics, sensor data sources and data visualization components, and to deploy these workflows both on the IoT gateways and on virtual servers hosted by one or several Cloud platforms. Moreover, the framework aims to design and rapidly deploy SaaS for BDA. The main motivation of BDAaaS is not only to ease the provisioning of such real-time BDA applications but also to collect and analyze data all the way from the IoT gateways to the Cloud hosting. The objectives of the framework are to:
• ease the D&C of the components involved in the collecting and filtering workflows, from the IoT gateways to the streaming processors deployed on a Cloud platform;
• provide statistical/numerical libraries for topologies of streaming processors that fit domain-specific concerns (e.g. smart-grid consumption, prediction, etc.).
Moreover, the core of the framework aims to be agnostic to the protocols (MQTT, MQTT-SN, M3DA, STOMP, AMQP, CoAP, WebSockets, WebRTC, XMPP IoT, future HTTP 2.0), the data formats (BSON, JSON, XML), the data models (oBIX, OpenADR, ETSI M2M, IPSO SmartObject), the gateways (ESH, OM2M, Kura, IoTSys), the brokers (Mosquitto, RabbitMQ, Kafka), the sensor data stores (MongoDB, Cassandra, HDFS, TempoDB, InfluxDB, SciDB, Graphite), the offline/batch Big Data analytics platforms (Hadoop, Spark), the real-time Big Data analysis platforms (Storm, S4, Samza, Spark Streaming, MUPD8), and the data visualization and dashboard frameworks (Grafana, Graphite, OpenEnergyMonitor).
In summary, we make the following contributions in this article:
• We propose the novel architecture of BDAaaS, a generic, improvable cloud-based PaaS framework for real-time BDA.
• We perform experiments which validate the soundness and efficiency of BDAaaS using datasets from practical sources. These experiments are deployed on a hybrid Cloud environment comprising a private OpenStack [6] hosting center and the public Microsoft Azure [7] and Amazon EC2 [8] Clouds.
The rest of the article is organized as follows. Section 2 presents the overall architecture and components of BDAaaS. Section 3 demonstrates a proof-of-concept implementation of BDAaaS in a smart-grid use case using a dataset from practical sources. Section 4 discusses the validating experiment and its results on this use case. After highlighting related works in Section 5, we conclude and present future works in Section 6.
2. The BDAaaS Framework
The BDAaaS framework is designed with a flexible architecture described as a set of abstract components. When a BDAaaS instance is deployed on the Cloud, each of these abstract components is specialized into a concrete component. Therefore, the components of the framework can easily be replaced to use new gateway protocols, data stores and so on. The overall architecture of the BDAaaS framework is depicted in Figure 1. It is inspired by the lambda architecture [9], a data-processing architecture designed to deal with massive data from multiple sources. A lambda system typically contains three layers: a batch processing layer aiming at perfect accuracy, a speed (or real-time stream processing) layer minimizing latency, and a serving layer responding to queries. The architecture of BDAaaS is composed of the following abstract components:
• IoT Gateway: this component collects data from various sensor networks (e.g. enOcean, Zigbee, DASH7, 6LowPAN, KNX) and publishes them to the brokers. Gateways temporarily store the data when networks such as public Wi-Fi hotspots, ADSL, 1G/2G/3G/4G, SMS, satellite or ham radio are not available. Gateways can be either mobile (from people and cars to sounding balloons) or static (vending machines, interactive digital signage).
• Message Broker or Message-as-a-Service: this component implements distributed publish-subscribe patterns with various QoS properties, including fault tolerance, high availability, elasticity, causality, deterministic delivery time, etc.
• NoSQL Store: stores the temporal data of sensors as well as computed aggregated data and prediction models.
• Batch Processor: operating at the batch layer, this component retrieves offline data from the storage and performs batch processing.
• Machine Learning Processor: computes offline unsupervised models from offline big data to parameterize the event stream processing.
• Real-time Event Stream Processor: this component analyzes in real time the stream of sensor data arriving from the IoT gateways through the brokers, in order to predict and classify the data.
• D&C Manager: deploys, configures and adapts the components in a hybrid cloud infrastructure and on physical machines (gateways).
• DSL-based Plug-in: a plug-in for Eclipse and Visual Studio to easily design the BDA workflows, such as the components to deploy, the sensors to use, the topologies to reuse, etc.
Fig. 1: The BDAaaS novel framework implementing lambda architecture.
To make our framework more flexible, a D&C manager is used to orchestrate the system autonomously. We developed such an orchestrator, which is described in [10]. It is a middleware for configuring, installing and managing complex legacy application stacks, deployed on Cloud virtual machines and on physical machines, that evolve dynamically over time.
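The specialization of each abstract component into a concrete one, performed when a BDAaaS instance is deployed, can be illustrated with a minimal sketch assuming a Python plug-in registry. The names `MessageBroker`, `register`, `specialize` and the registry itself are hypothetical and do not reflect the orchestrator's actual plug-in API; the sketch only shows the idea that a concrete plug-in must conform to the interface of its abstract component type before it can be selected.

```python
from abc import ABC, abstractmethod


class MessageBroker(ABC):
    """Hypothetical abstract component type (one per box of the architecture)."""

    @abstractmethod
    def endpoint(self) -> str:
        """Address that gateways and stream processors connect to."""


# (abstract type, concrete name) -> concrete plug-in class
_REGISTRY: dict[tuple[type, str], type] = {}


def register(abstract: type, name: str):
    """Class decorator recording a concrete plug-in that conforms to `abstract`."""
    def wrap(cls: type) -> type:
        if not issubclass(cls, abstract):
            raise TypeError(f"{cls.__name__} does not implement {abstract.__name__}")
        _REGISTRY[(abstract, name)] = cls
        return cls
    return wrap


class _MqttBroker(MessageBroker):
    def __init__(self, host: str, port: int = 1883):  # 1883: default MQTT port
        self.host, self.port = host, port

    def endpoint(self) -> str:
        return f"tcp://{self.host}:{self.port}"


@register(MessageBroker, "Mosquitto")
class Mosquitto(_MqttBroker):
    pass


@register(MessageBroker, "RabbitMQ")
class RabbitMQ(_MqttBroker):  # assumes RabbitMQ's MQTT plug-in is enabled
    pass


def specialize(abstract: type, name: str, **params) -> object:
    """Instantiate the concrete component chosen for an abstract one at deployment time."""
    try:
        cls = _REGISTRY[(abstract, name)]
    except KeyError:
        raise LookupError(f"no {abstract.__name__} plug-in named {name!r}") from None
    return cls(**params)


if __name__ == "__main__":
    broker = specialize(MessageBroker, "Mosquitto", host="10.0.0.5")
    print(broker.endpoint())  # tcp://10.0.0.5:1883
```

Swapping Mosquitto for RabbitMQ then only changes the name passed to `specialize`, which mirrors the claim that components can be replaced without touching the rest of the workflow.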
Designed based on the SOA principle, the orchestrator consists of a simple DSL (i.e. Domain Specific Language) for describing the configuration properties and inter-dependencies of service components; a distributed configuration protocol ensuring the dynamic resolution of these inter-dependencies; and a runtime system that guarantees the correct deployment and management of distributed applications across multiple Clouds, physical devices or virtual machines. The components can be changed or replaced flexibly using the orchestrator's plug-in mechanism: a component can be developed as a plug-in at design time and plugged into the orchestrator's core at runtime. The plug-in needs to conform to the corresponding interface for its specific type of component. The orchestrator supports popular IaaS Cloud platforms such as Amazon EC2, Microsoft Azure, OpenStack and VMware vSphere [11]. The BDAaaS framework is described as a deployment plan of the orchestrator. The components of the framework are then deployed automatically at runtime in Cloud virtual machine instances or in embedded Linux boards that can be installed in real houses, stores, offices and buildings. The next section discusses a smart-grid use case of BDAaaS in detail.
3. The Smart-grid Use Case
A smart grid is a nation-wide electrical grid that uses IoT technologies to monitor in real time the behaviors of millions of electric consumers in order to adjust the electric production of power suppliers, including coal plants, nuclear plants, solar panel installations and wind turbines. The goal is to optimize the efficiency, reliability, economics and sustainability of the production and distribution of electricity for both suppliers and consumers. For instance, the grid helps consumers avoid charging electric vehicle batteries at peak load, in return for a lower energy price.
To validate our BDA PaaS, we have developed an infrastructure for a simple smart-grid application in which the electricity supplier can forecast the electric load in the next minutes by analyzing in real time the electric consumption collected in houses. This PaaS is composed of well-known open-source components, such as OpenHAB [12], Mosquitto [13], RabbitMQ [14], Storm and Cassandra [15], and implements a part of the BDAaaS framework. We consider BDAaaS as the premise of a real-time BDA PaaS cloud framework. In the short term, the orchestrator will adjust the infrastructure dynamically according to the throughput of sensor messages. For instance, additional cloud instances can be added to the speed layer of the architecture when the response time exceeds a threshold. The detailed architecture of the BDAaaS infrastructure for the use case is shown in Figure 2, and its components are described as follows.
3.1. IoT Gateways
For sensor data collection, we have chosen the OpenHAB framework, which provides an integration platform for the sensors and actuators of home automation. The OpenHAB platform is based on the Eclipse Equinox OSGi platform [16]. The communication paradigm among the inner components of OpenHAB is Publish-Subscribe [17]. OpenHAB allows users to specify DSL-based rules, which are parsed by its rule engine to update the actuators' commands upon sensor state changes using the OSGi Event Admin internal broker [18]. OpenHAB uses the Event-Condition-Action (ECA) paradigm to execute home automation actions. The OpenHAB rule engine evaluates and executes ECA rules written in a DSL based on Eclipse Xtext and Xtend. ECA rules are triggered on sensor value changes, command emission and timer expiration. Events (e.g. state changes and commands) can be "imported" or "exported" using bindings for MQTT, XMPP, Twitter, etc. OpenHAB can be installed and run on embedded boards such as the Raspberry Pi, BeagleBone Black and Intel Galileo.
For the smart-grid use case, we have developed a new OpenHAB plug-in (called a binding) to replay the sensor log files containing the smart-plug measurements (e.g. timestamped load and work) of each house. OpenHAB-BDAaaS is the packaging of OpenHAB for BDA applications, including the plug-in and the data files. This package can be deployed on both embedded boards and virtual machines, with one instance per house.
Fig. 2: The BDAaaS's components for the smart-grid use case.
3.2. MQTT Brokers
MQ Telemetry Transport (MQTT) [19] is a data transport protocol for M2M networks. It is designed to support low-bandwidth and unreliable networks, such as satellite links or sensor networks. MQTT follows the publish-subscribe pattern between the sensors and one or more sinks, such as M2M gateways. MQTT is now an OASIS standard. The main robust, open-source implementations of MQTT brokers are Mosquitto and RabbitMQ.
3.3. Speed Layer for Real-time Analytics
For the speed layer of the lambda architecture, we have chosen the Apache Storm platform. Storm is a real-time event-stream processing system designed to deploy a processing chain on a distributed infrastructure such as a Cloud platform (IaaS). Storm can be applied successfully to the analysis of real-time data and events in sensor networks (real-time resource forecasting, consumption prediction), log file systems (monitoring and DDoS attack detection), finance (risk management), and marketing and social networks (trends, advertising campaigns). Storm was initially open-sourced by Twitter; its challengers are Apache S4 (Yahoo!), Spark Streaming, MillWheel (Google) and Apache Samza (LinkedIn). For the use case, we have developed a new Storm input component (called a spout) that generates sensor tuples from the MQTT brokers by subscribing to the MQTT topics, with one spout per house.
3.4. Historical Data Storage
In the speed layer, the Storm topology needs to maintain some ongoing execution state, for instance the sliding-window average of sensor values. To do this, we use Storm with Cassandra for our real-time power consumption prediction.
# An Azure VM
VM_AZURE {
    alias: VM Azure;
    installer: iaas;
    children: Storm_Cluster, Cassandra;
}
# A BeagleBone Black
BOARD_BEAGLEBONE {
    alias: BeagleBone Black;
    installer: embedded;
    children: OpenHAB;
}
# Storm Cluster for ESP
Storm_Cluster {
    alias: Storm Cluster;
    installer: bash;
    imports: Nimbus.port, Nimbus.ip;
    children: Nimbus, Storm_Supervisor;
}
# OpenHAB: A Home Automation Bus
OpenHAB {
    alias: OpenHAB;
    installer: puppet;
    exports: ip, brokerChoice = Mosquitto;
    imports: Mosquitto.ip, Mosquitto.port;
}
...
Fig. 3: Components of the smart-grid use case under the orchestrator's DSL.
Cassandra is an open-source distributed database management system (a NoSQL solution). It is designed to handle large amounts of data spread across many nodes while providing a highly available service with no single point of failure. Cassandra's data model allows incremental modifications of rows.
3.5. Visualization Dashboard
For forecast visualization, we have developed a simple dashboard displaying charts of current and predicted consumption for suppliers and consumers. The dashboard is a simple HTML5 web app using the Grafana, Bootstrap and AngularJS libraries. The web app gets data from the historical storage and subscribes to real-time updates through a WebSocket.
3.6. D&C Manager
As mentioned, the DSL of our chosen Cloud orchestrator is a hierarchical language that allows sysadmins to naturally describe complex multi-tier applications such as Java EE, OSGi, IoT, etc. The smart grid is a multi-tier BDA application implementing two layers of the lambda architecture (the speed and serving layers) and other layers, including sensor data collection, encapsulated message dispersion, etc. Thus the orchestrator is a just-right solution for our application. The configuration of the components of the smart-grid use case in the orchestrator's DSL is excerpted in Figure 3. Some components have children: OpenHAB in the case of BOARD_BEAGLEBONE, and Storm_Cluster and Cassandra in the case of VM_AZURE. The components at a sublayer may, in turn, contain subcomponents, which may or may not be stubs. Puppet and Bash are two of the many possible configuration tools. The components can exchange configuration information using export/import variables. This mechanism helps resolve inter-dependencies among components dynamically at runtime. For instance, as shown in Figures 3 and 4, an instance of Nimbus exports its port and IP, which are imported by a specific instance of Storm_Cluster.
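The export/import exchange can be illustrated with a minimal, single-process sketch. It is not the orchestrator's distributed configuration protocol described in [10]; it only assumes that each instance declares the variables it exports and the `Component.variable` names it imports, and that an instance can start once all of its imports are resolved. The instance name `storm-cluster-1` and all IP addresses are hypothetical; the ports 6627 and 1883 come from Figure 4.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    component: str                                         # component type, e.g. "Nimbus"
    exports: dict[str, str] = field(default_factory=dict)  # variable -> value
    imports: list[str] = field(default_factory=list)       # "Component.variable" names


def resolve(instances: list[Instance]) -> tuple[list[str], dict[str, dict[str, str]]]:
    """Return a start order and the imported values of each instance.

    Simplifications: an instance publishes its exports as soon as it starts,
    and any instance of the named component type can satisfy an import.
    """
    published: dict[str, str] = {}            # "Component.variable" -> value
    order: list[str] = []
    resolved: dict[str, dict[str, str]] = {}
    pending = list(instances)
    while pending:
        ready = [i for i in pending if all(v in published for v in i.imports)]
        if not ready:
            missing = {i.name: [v for v in i.imports if v not in published] for i in pending}
            raise RuntimeError(f"unresolved imports: {missing}")
        for inst in ready:
            resolved[inst.name] = {v: published[v] for v in inst.imports}
            for var, value in inst.exports.items():
                published[f"{inst.component}.{var}"] = value
            order.append(inst.name)
            pending.remove(inst)
    return order, resolved


# Hypothetical plan mirroring Figures 3 and 4 (IP addresses are made up).
EXAMPLE_PLAN = [
    Instance("storm-cluster-1", "Storm_Cluster", imports=["Nimbus.port", "Nimbus.ip"]),
    Instance("nimbus-storm-1", "Nimbus", exports={"port": "6627", "ip": "10.0.0.10"}),
    Instance("openhab-1", "OpenHAB",
             exports={"ip": "10.0.0.20", "brokerChoice": "Mosquitto"},
             imports=["Mosquitto.ip", "Mosquitto.port"]),
    Instance("mosquitto-1", "Mosquitto", exports={"ip": "10.0.0.30", "port": "1883"}),
]

if __name__ == "__main__":
    order, resolved = resolve(EXAMPLE_PLAN)
    print(order)
    print(resolved["storm-cluster-1"])
```

OpenHAB appears both as an exporter and an importer here, so it can only start after the Mosquitto instance has published its IP and port.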
It is worth noting that an instance can be both an exporter and an importer, like the OpenHAB instance in Figure 3. This exchange mechanism is also detailed in [10]. Overall, the hierarchy makes it easy to describe and replace components, which conforms to the design principle of BDAaaS itself.
# A VM Azure with Nimbus
instance of VM_AZURE {
    name: vm-azure-nimbus-1;
    instance of Nimbus {
        name: nimbus-storm-1;
        port: 6627;
    }
}
# A BeagleBone Board for OpenHAB
instance of BOARD_BEAGLEBONE {
    name: board-bb-openhab-1;
    instance of OpenHAB {
        name: openhab-1;
    }
}
# A VM EC2 for Message Broker
instance of VM_EC2 {
    name: vm-ec2-mosquitto-1;
    instance of Mosquitto {
        name: mosquitto-1;
        port: 1883;
    }
}
# A VM OpenStack with Cassandra
instance of VM_OpenStack {
    name: vm-openstack-cassandra-1;
    instance of Cassandra {
        name: cassandra-1;
    }
}
...
Fig. 4: Instances of components of the smart-grid use case under the orchestrator's DSL.
4. Validating Experiments
In order to evaluate our work, we implement the smart-grid use case on a hybrid Cloud environment. Each house is represented by an OpenHAB process that publishes the dataset related to that house (approximately 2.5 GB per house). The 2.5 GB dataset files are preloaded on microSD cards. These gateways are deployed on BeagleBone Black [20] embedded boards. The publication is done with OpenHAB's MQTT binding. The MQTT topic hierarchy contains a distinct topic for each house. The Storm topology's spout (see Section 4.2) subscribes to the MQTT topics and then forwards the data as a stream of Storm tuples over the analysis chain. The aggregated data and results are finally stored in an Apache Cassandra database located in our OpenStack private Cloud for safety. The cluster of MQTT brokers, containing the topics, and the publishers are hosted on EC2 public Cloud virtual machines. The Storm cluster has 3 worker nodes, each corresponding to a virtual machine instance on the Azure Cloud, to take advantage of the computing strength of our granted Microsoft infrastructure. Instead of being deployed on BeagleBone Black boards, the OpenHAB processes publishing the dataset can also be deployed and run on any virtual machines or containers on the Cloud. In either case, the D&C process is performed automatically by the chosen orchestrator. Figure 4 shows a short extract of the instances of the experiment in the orchestrator's DSL, illustrating how this hierarchical language describes distributed, complex multi-tier applications.
Fig. 5: The Storm topology used in the smart-grid use case.
4.1. The Dataset
The smart-grid application uses a dataset based on real records collected from smart plugs deployed in private households of 40 houses located in Germany. These data are collected roughly every second for each sensor in each smart plug. It is worth noting that the dataset was gathered in a real-world, uncontrolled environment, which implies the possibility of imperfect data and measurements. A measurement scenario is described as follows. The topmost entity is a house, identified by a unique house ID. Every house combines one or more households, each identified by a unique household ID within a house. One or more smart plugs are installed in every household, each identified by a unique plug ID within a household. Every smart plug contains one or more types of sensors (e.g. load or work). The collected data are stored preliminarily in a comma-separated file consisting of 4055 million measurements for 2125 plugs distributed across 40 houses.
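Section 4 states that the MQTT topic hierarchy contains a distinct topic for each house and that the spout subscribes to these topics. The article does not give the actual topic names, so the sketch below assumes a hypothetical layout `smartgrid/house/<houseID>` and implements the standard MQTT 3.1.1 topic-filter rules a broker applies to such subscriptions: `+` matches exactly one level, `#` matches all remaining levels (including the parent level), and wildcards at the first level do not match topics starting with `$`.

```python
def house_topic(house_id: int) -> str:
    """Hypothetical per-house topic name; the article does not give the real layout."""
    return f"smartgrid/house/{house_id}"


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter (MQTT 3.1.1 rules)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards at the first level never match topics starting with '$' (e.g. $SYS).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level; it also matches the parent level.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


if __name__ == "__main__":
    # One spout per house subscribes to its own topic; a dashboard could use '+'.
    for topic in (house_topic(1), house_topic(2)):
        print(topic, topic_matches(house_topic(1), topic),
              topic_matches("smartgrid/house/+", topic))
```

With one topic per house, a spout subscribing to its exact house topic receives only that house's stream, which matches the one-spout-per-house design of Section 3.3.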
A one-month period of power consumption is recorded, with the first timestamp equal to 1377986401 (01/09/2013, 00:00:01) and the last timestamp equal to 1380578399 (30/09/2013, 23:59:59), both in German local time (UTC+2). All events in the data file are sorted by timestamp, with events sharing the same timestamp ordered randomly. The fields included in this file are:
• id: a unique identifier of a gauge
• timestamp: the timestamp of a gauge
• value: the value of the measurement
• houseID: a unique identifier of a house
Our task is to replay these data and emit them through an OpenHAB plug-in (see Section 3.1). The goal of the analysis is the real-time prediction of power consumption in several thousand houses. For the sake of the demonstration, the data from the smart plugs are stored in individual files (one per house). These files are replayed by OpenHAB threads, each one emulating an IoT gateway. Each OpenHAB thread publishes the data recorded in its file using the MQTT protocol. The data are then sent through the MQTT brokers (Mosquitto and RabbitMQ) to the machines running the real-time analysis platform. This platform is a Storm topology that analyzes the power consumption measurements and computes the overall load in the next half-hour.
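Section 3.4 mentions that the Storm topology keeps a sliding-window average of sensor values. The single-process sketch below shows that state for one house. It assumes each CSV line holds exactly the four fields listed above, in that order (`id,timestamp,value,houseID`); the class and function names, the window length and the sample records are hypothetical, and the sketch does not reproduce the article's half-hour load prediction.

```python
from collections import deque
from typing import NamedTuple


class Measurement(NamedTuple):
    id: int
    timestamp: int   # Unix time in seconds
    value: float
    house_id: int


def parse_line(line: str) -> Measurement:
    """Parse one 'id,timestamp,value,houseID' record (assumed column order)."""
    rid, ts, value, house = line.strip().split(",")
    return Measurement(int(rid), int(ts), float(value), int(house))


class SlidingWindowAverage:
    """Average of the values seen in the last `window_s` seconds.

    Events must arrive in timestamp order, as they do in the replayed files.
    """

    def __init__(self, window_s: int):
        self.window_s = window_s
        self._items: deque[tuple[int, float]] = deque()
        self._sum = 0.0

    def add(self, timestamp: int, value: float) -> float:
        self._items.append((timestamp, value))
        self._sum += value
        # Evict values that fell out of the window (timestamp <= now - window_s).
        while self._items and self._items[0][0] <= timestamp - self.window_s:
            _, old = self._items.popleft()
            self._sum -= old
        return self._sum / len(self._items)


if __name__ == "__main__":
    # Illustrative records (made-up values, not taken from the dataset).
    lines = ["1,1377986401,10.0,0", "2,1377986431,20.0,0", "3,1377986461,30.0,0"]
    window = SlidingWindowAverage(window_s=60)
    for line in lines:
        m = parse_line(line)
        print(m.timestamp, window.add(m.timestamp, m.value))
```

In a topology with one spout per house, one such window would typically be kept per house, for example in a dictionary keyed by `house_id`; keeping a running sum makes each update O(1) amortized instead of re-averaging the whole window.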