VNU Journal of Science: Comp. Science & Com. Eng., Vol. 32, No. 2 (2016) 10-22
An Efficient Implementation of Advanced Encryption
Standard on the Coarse-grained Reconfigurable Architecture
Hung K. Nguyen*, Xuan-Tu Tran
SIS Laboratory, VNU University of Engineering and Technology,
144 Xuan Thuy road, Cau Giay district, Hanoi, Vietnam
Abstract
The Advanced Encryption Standard (AES) is currently considered one of the best symmetric-key block ciphers. Hardware implementations of the AES for hand-held mobile devices or wireless sensor network nodes must meet strict constraints in terms of performance, power, and cost. Coarse-grained reconfigurable architectures have recently been proposed as a solution that provides high flexibility, high performance, and low power consumption for next-generation embedded systems. This paper presents a flexible, high-performance implementation of the AES algorithm on a coarse-grained reconfigurable architecture called MUSRA (Multimedia Specific Reconfigurable Architecture). First, we propose a hardware-software partitioning method for mapping the AES algorithm onto the MUSRA. Second, parallelism and pipelining techniques are carefully applied to increase the total computing throughput by efficiently utilizing the computing resources of the MUSRA. Optimizations at both the loop-transformation level and the scheduling level are performed to better exploit instruction-, loop-, and task-level parallelism. The proposed implementation has been evaluated with the cycle-accurate simulator of the MUSRA. Experimental results show that the MUSRA can be reconfigured to support both encryption and decryption with all key lengths specified in the AES standard. The performance of the AES algorithm on the MUSRA is better than that on the ADRES reconfigurable processor, the Xilinx Virtex-II, and the TI C64+ DSP.
Received 24 November 2015, revised 06 January 2015, accepted 13 January 2016
Keywords: Coarse-grained Reconfigurable Architecture (CGRA), Advanced Encryption Standard (AES), Reconfigurable
Computing, Parallel Processing.
1. Introduction
The fast development of communication technology enables information to be easily shared globally via the Internet, especially with the Internet of Things (IoT). However, it also raises requirements for information security, especially for sensitive data such as passwords, bank accounts, personal information, etc. One method to protect sensitive data is to encrypt it with a symmetric-key block cipher before sending it over the network and to decrypt it after receiving it. The Advanced Encryption Standard (AES), which has been standardized by the National Institute of Standards and Technology (NIST) [1], is currently considered one of the best symmetric-key block ciphers. With a block size of 128 bits and a variable key length of 128, 192, or 256 bits, the AES has proven to be a robust cryptographic algorithm against illegal access.
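To make the key-length options concrete, the sketch below derives the standard AES parameters from the key length as defined in FIPS-197: the block is always Nb = 4 32-bit words, the key is Nk words, the cipher runs Nr = Nk + 6 rounds, and key expansion produces Nb·(Nr + 1) round-key words. The function name `aes_parameters` is hypothetical and is not part of the implementation described in this paper.

```python
def aes_parameters(key_bits: int) -> dict:
    """Return the FIPS-197 parameters for a given AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES supports 128-, 192- or 256-bit keys only")
    nb = 4                 # block size in 32-bit words (always 128 bits)
    nk = key_bits // 32    # key length in 32-bit words: 4, 6 or 8
    nr = nk + 6            # number of rounds: 10, 12 or 14
    return {
        "Nb": nb,
        "Nk": nk,
        "Nr": nr,
        # one Nb-word round key for the initial AddRoundKey plus one per round
        "round_key_words": nb * (nr + 1),
    }


for bits in (128, 192, 256):
    print(bits, aes_parameters(bits))
```

The round count is what changes the workload between key lengths: a 256-bit key needs 14 rounds instead of 10, so a mapping that supports all three key lengths must handle a variable number of loop iterations.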
The hardware implementation of the AES for modern embedded systems such as hand-held mobile devices or wireless sensor network (WSN) nodes poses several challenges for designers, such as reducing chip area and power consumption, increasing application performance, shortening time-to-market, and simplifying the update process. Besides, these systems are often designed not only for a specific application but also for multiple applications. Such sharing of resources among several applications makes the system cheaper and more versatile. Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), and Application-Specific Instruction Set Processors (ASIPs) have been used to implement mobile multimedia systems. However, none of them meets all of the above challenges [2]. Software implementations of the AES algorithm on processors (e.g. [3]) are usually very flexible and typically target applications in which flexibility has a higher priority than implementation efficiency in terms of power consumption, area, and performance. In contrast, ASIC implementations of the AES algorithm (e.g. [4]) usually offer optimized performance and power consumption. However, the drawback of ASICs is their lower flexibility. Moreover, the high cost of designing and manufacturing the chip masks is increasingly becoming an important factor that limits the application scope of ASICs. Recently, a very promising solution has been reconfigurable computing systems (e.g. Zynq-7000 [5], ADRES [6], etc.) that integrate many heterogeneous processing resources such as software-programmable microprocessors (μP), hardwired IP (Intellectual Property) cores, reconfigurable hardware architectures, etc., as shown in Figure 1. To program such a system, a target application is first represented as a set of interdependent tasks in an intermediate Control and Data Flow Graph (CDFG) [7], and then partitioned and mapped onto the heterogeneous computational and routing resources of the system. In particular, the computation-intensive kernel functions of the application are mapped onto the reconfigurable hardware so that they can achieve performance approximately equivalent to that of an ASIC while maintaining a degree of flexibility close to that of DSP processors. By dynamically reconfiguring hardware, reconfigurable computing systems allow many hardware tasks to be mapped onto the same hardware platform, thus reducing the area and power consumption of the design [8].
Figure 1. System-level application model of CGRA (blocks: μP, Instruction Memory, Data Memory, IP cores, DPLL, and the CGRA attached through an AHB/CGRA Interface to the AMBA AHB bus).
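The CDFG-based flow described above (represent the application as interdependent tasks, then partition and map them) can be sketched as follows. This is a minimal illustration, not the partitioning method proposed in this paper: the task graph, the task names, and the rule "tasks flagged as computation-intensive kernels go to the CGRA, everything else runs on the μP" are all hypothetical. It uses Python's standard `graphlib` module (Python 3.9+) to obtain an execution order that respects the dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical CDFG: each task maps to the set of tasks it depends on.
CDFG = {
    "load_block": set(),
    "key_expansion": set(),
    "round_loop": {"load_block", "key_expansion"},
    "store_block": {"round_loop"},
}


def partition(cdfg: dict, kernels: set) -> list:
    """Order tasks so every dependency comes first, then assign a target.

    Hypothetical rule: computation-intensive kernels go to the CGRA,
    all remaining (control-oriented) tasks run on the microprocessor.
    """
    order = TopologicalSorter(cdfg).static_order()
    return [(task, "CGRA" if task in kernels else "uP") for task in order]


schedule = partition(CDFG, kernels={"round_loop"})
for task, target in schedule:
    print(f"{task:>14} -> {target}")
```

A real HW/SW partitioner would weigh profiling data, data-transfer cost, and reconfiguration time rather than a fixed kernel list; the sketch only shows where such a decision sits in the flow.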
Reconfigurable hardware is generally classified into Field Programmable Gate Arrays (FPGAs) and coarse-grained dynamically reconfigurable architectures (CGRAs). A typical example of an FPGA-based reconfigurable SoC is the Xilinx Zynq-7000 device family [5]. Generally, FPGAs provide a fine-grained reconfigurable fabric that operates and is configured at the bit level. FPGAs are extremely flexible due to their high reconfiguration capability. However, FPGAs consume more power and incur more delay and area overhead because of the greater amount of routing required per configuration [9]. This limits the applicability of FPGAs to mobile devices. To overcome the limitations of FPGA-like fine-grained reconfigurable devices, we developed and modeled a coarse-grained dynamically reconfigurable architecture called MUSRA (Multimedia Specific Reconfigurable Architecture) [10]. The MUSRA is a high-performance, flexible platform for a domain of multimedia processing applications. In contrast with FPGAs, the MUSRA is reconfigured and processes data at the word level. The MUSRA was proposed to exploit the high data-level parallelism (DLP), instruction-level parallelism (ILP), and task-level parallelism (TLP) of the computation-intensive loops of an application. The MUSRA also supports dynamic
reconfiguration by enabling the hardware fabrics to be reconfigured into different functions even while the system is running.
In this paper, we propose a solution for implementing the AES algorithm on the MUSRA-based system platform. The AES algorithm is first analyzed and optimized, then partitioned into hardware and software (HW/SW) parts and scheduled for execution on the MUSRA-based system. The experimental results show that our proposal achieves an average throughput of 29.71 instructions per cycle. Our implementation has been compared with similar works on the ADRES reconfigurable processor [6], Xilinx Virtex-II [11], and TI C64+ DSP [3]. Our implementation is about 6.9 times, 2.2 times, and 1.6 times better than those on the TI C64+ DSP, Xilinx Virtex-II, and ADRES, respectively.
The rest of the paper is organized as follows. The MUSRA architecture and the AES algorithm are presented in Section 2 and Section 3, respectively. Section 4 presents the mapping of the AES algorithm onto the MUSRA-based system. In Section 5, simulation results and the evaluation of the AES algorithm on the MUSRA-based system in terms of flexibility and performance are reported and discussed. Finally, conclusions are given in Section 6.
2. MUSRA Architecture
2.1. Architecture Overview
The MUSRA is composed of a Reconfigurable Computing Array (RCA), Input/Output FIFOs, a Global Register File (GRF), Data/Context memory subsystems, DMA (Direct Memory Access) controllers, etc. (Figure 2). The Data/Context memory subsystems consist of storage blocks and DMA controllers (i.e. CDMAC and DDMAC). The RCA is an array of 8×8 RCs (Reconfigurable Cells) that can be partially configured to implement computation-intensive tasks. The input and output FIFOs are the I/O buffers between the data memory and the RCA. Each RC can get its input data from the input FIFO and/or the GRF, and store its results back to the output FIFO. These FIFOs are all 512 bits wide and 8 rows deep, and can load/store sixty-four bytes or thirty-two 16-bit words per cycle. In particular, the input FIFO can broadcast data to every RC that has been configured to receive data from the input FIFO. This mechanism aims at exploiting data reuse across several iterations. The interconnection between two neighboring rows of RCs is implemented by a crossbar switch. Through the crossbar switch, an RC can get results from an arbitrary RC in the row above it. The Parser decodes the configuration information read from the Context Memory and then generates the control signals that ensure the accurate and automatic execution of the RCA.
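The dataflow described above (input-FIFO broadcast into the first RC row, then crossbar routing from each row to the next) can be modeled behaviorally as below. This is a simplified sketch under stated assumptions, not a model of the real MUSRA: the context format `(operation, source_a, source_b)`, the function `run_rca`, and the single-cycle-per-row view are hypothetical, and timing, the GRF, the LOR registers, and the output FIFO are ignored. Only the 8×8 array size and the 32-word (512-bit) FIFO row come from the text.

```python
from operator import xor
from typing import Callable, Tuple

FIFO_WORDS = 32   # one 512-bit FIFO row = thirty-two 16-bit words
ROWS, COLS = 8, 8  # the RCA is an 8x8 array of RCs
MASK16 = 0xFFFF   # RCs operate on 8/16-bit fixed-point words

# Hypothetical per-RC context: an operation plus the indices of its two
# sources. Row 0 indexes the broadcast FIFO row; later rows index the
# outputs of the row above, as routed by the crossbar switch.
RCContext = Tuple[Callable[[int, int], int], int, int]


def run_rca(fifo_row: list, contexts: list) -> list:
    """Propagate one FIFO row through the configured RC rows, top to bottom."""
    if len(fifo_row) != FIFO_WORDS:
        raise ValueError("a FIFO row holds exactly 32 16-bit words")
    if len(contexts) > ROWS:
        raise ValueError("the RCA has only 8 rows")
    prev = fifo_row  # broadcast: any row-0 RC may read any FIFO word
    for row in contexts:
        if len(row) > COLS:
            raise ValueError("each RCA row has only 8 RCs")
        # crossbar: each RC picks arbitrary outputs of the previous row
        prev = [op(prev[a], prev[b]) & MASK16 for (op, a, b) in row]
    return prev


words = list(range(FIFO_WORDS))
row0 = [(xor, i, i + 8) for i in range(COLS)]       # word i XOR word i+8
row1 = [(xor, i, (i + 1) % COLS) for i in range(COLS)]  # neighbor XOR
print(run_rca(words, [row0]))
print(run_rca(words, [row0, row1]))
```

XOR is used only because it is a simple word-level operation; how the AES round is actually distributed over the RCs is the subject of the mapping described later in the paper.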
RC (Figure 3) is the basic processing unit of the RCA. Each RC includes a data-path that can execute signed/unsigned fixed-point 8/16-bit operations with two or three source operands, such as arithmetic and logical operations, multiplication, and multimedia application-specific operations (e.g. barrel shift, shift and round, absolute differences, etc.). Each RC also includes a local register called LOR. This register can be used either to adjust the operating cycles of the pipeline or to store coefficients when a loop is mapped onto the RCA. A set of configuration registers that stores the configuration information for an RC is called a layer. Each RC contains two layers that can operate in a ping-pong fashion to reduce the configuration time.
Figure 2. MUSRA architecture (blocks: AHB/CGRA Interface; Input DMA and Output DMA with DDMAC; Data Memory; Context Memory with CDMAC and Context Parser; input/output FIFOs and GRF; the RCA of RC00 to RC77 with a crossbar switch between neighboring rows).
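Why two ping-pong layers reduce the configuration time can be seen with a simple timing model: with one layer, each context must be loaded before it runs; with two, the next context is loaded into the idle layer while the current one executes. The functions and cycle counts below are hypothetical and idealized: they assume switching the active layer costs no extra cycles and that loading can fully overlap execution, which is not a claim about the MUSRA's measured timing.

```python
def serial_cycles(loads: list, execs: list) -> int:
    """One configuration layer: every context is loaded, then executed."""
    return sum(loads) + sum(execs)


def ping_pong_cycles(loads: list, execs: list) -> int:
    """Two layers: context i+1 loads into the idle layer while context i runs.

    Assumes a free layer swap and full overlap of loading with execution.
    """
    if len(loads) != len(execs) or not loads:
        raise ValueError("need one load time per executed context")
    total = loads[0]  # the first load cannot be hidden
    for i, run in enumerate(execs):
        nxt = loads[i + 1] if i + 1 < len(loads) else 0
        total += max(run, nxt)  # the next load overlaps the current run
    return total


loads, execs = [10, 10, 10], [20, 20, 20]  # hypothetical cycle counts
print(serial_cycles(loads, execs), ping_pong_cycles(loads, execs))
```

When execution is at least as long as the next load, every load after the first is hidden; when loads dominate, the saving shrinks to the execution time that can overlap them.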