Optimization of Area and Power Consumption in Carry Select Adder by Using BEC

N. Mageshwari  
As-Salam College of Engineering and Technology  
Aduthurai, Tamilnadu, India  
magenatarajan@gmail.com

N. Sasipriya  
As-Salam College of Engineering and Technology  
Aduthurai, Tamilnadu, India  
sasipriyanagappan@gmail.com

R. Ramya  
As-Salam College of Engineering and Technology  
Aduthurai, Tamilnadu, India  
chanmeya.17@gmail.com

Abstract: In this paper, a high-performance adder is designed for low-power applications. The Carry Select Adder (CSLA) is known to be one of the fastest adders among the conventional adder structures. It is used in many data processing units to realize faster arithmetic operations. From the structure of the CSLA, it is clear that there is scope for reducing its power consumption and delay. This work uses a simple and efficient gate-level modification to reduce the power and delay of the CSLA. Based on this modification, a 16-bit CSLA architecture has been developed and compared with the regular CSLA architecture. The analysis of the results shows that the proposed CSLA structure has lower delay and power than the regular CSLA.

Keywords: Application-Specific Integrated Circuit (ASIC), Conventional Adder, CSLA, Low power, FPGA.

I. INTRODUCTION

Power dissipation is one of the most important design objectives in integrated circuits. Designers of next-generation systems want to integrate more features and get higher performance within the same or smaller area and power budget. Furthermore, low power dissipation during test application is becoming increasingly important in today's VLSI system design and is a major goal for future VLSI design [1]–[3]. The design of high-speed, low-power VLSI architectures needs efficient arithmetic processing units that are optimized for the performance parameters of delay and power consumption. To this end, high-speed, low-power, and area-efficient addition and multiplication have always been fundamental requirements of high-performance processors and systems, because adders are among the most widely used components in VLSI circuits. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry propagated into the next position. The major speed limitation of adders therefore arises from the large carry propagation delay of conventional adder circuits. The CSLA is used in many computational systems to alleviate the carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum [4].

In a CSLA, multiple pairs of Ripple Carry Adders (RCAs) generate partial sums and carries for carry inputs Cin = 0 and Cin = 1; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA, and to replace the variable-sized blocks with uniformly sized blocks, in order to achieve high speed and low power consumption [5]–[8]. The main advantage of the BEC logic is that it needs fewer logic gates than the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed in Section IV.
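The select mechanism of a regular CSLA block can be sketched at bit level as follows. This is only an illustrative model, not the authors' implementation: the function names `full_adder`, `ripple_carry`, and `csla_block` are assumptions, and bit lists are taken to be least significant bit first.

```python
def full_adder(a, b, cin):
    """One-bit full adder; returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def ripple_carry(a_bits, b_bits, cin):
    """Ripple carry adder over LSB-first bit lists; returns (sum_bits, carry_out)."""
    carry, sum_bits = cin, []
    for a, b in zip(a_bits, b_bits):
        # Each stage needs the carry produced by the previous stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def csla_block(a_bits, b_bits, carry_in):
    """Regular CSLA block: two RCAs evaluated up front, then a mux."""
    result_if_0 = ripple_carry(a_bits, b_bits, 0)  # RCA with Cin = 0
    result_if_1 = ripple_carry(a_bits, b_bits, 1)  # RCA with Cin = 1
    # The multiplexer only has to wait for carry_in, not for a new addition.
    return result_if_1 if carry_in else result_if_0
```

Both candidate results are available before the incoming carry arrives, so the block's contribution to the critical path is essentially the multiplexer delay, at the cost of a duplicated RCA.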

This brief is structured as follows. Section II reviews the literature. Section III presents the delay and area evaluation methodology of the CSLA blocks. Section IV presents the proposed work. The Carry Select Adder (CSLA) has been chosen for comparison with the proposed design because it has a more balanced delay and requires lower power [8], [10]. The ASIC implementation details and results are analyzed in Section V. Finally, the work is concluded in Section VI.

II. LITERATURE REVIEW

The adder system described in [4] increases the speed of the addition process by reducing the carry-propagation time to the minimum commensurate with economical circuit design. The problem of carry propagation delay is overcome by independently generating multiple-radix carries and using these carries to select between simultaneously generated sums. In this adder system, the addend and augend are divided into sub-addend and sub-augend sections that are added twice to produce two sub-sums [4]. One addition is done with a carry digit forced into each section, and the other addition combines the operands without the forced carry digit. That work describes a fast digital adder that derives its speed from a complex logical structure without requiring an excessive amount of additional hardware. Addend, augend, and true sum digits are designated by A, B, and S respectively, followed by a subscript to indicate the digit position.

![Fig. 1 16-bit SQRT Carry Select Adder](image)

Carries are indicated by C and a subscript that gives the digit position from which the carry is generated. A carry save adder is used to compute the sum of three or more bits in binary format. It is widely used in the final stages of fast multipliers to sum the partial products and give the final value [5], [11].

The advantage of the carry save adder is that the sum is computed faster than with the conventional RCA. The carry save adder is better than the conventional carry select adder in terms of area, but it is slower. Fig. 1 shows the internal logic schematic of a carry select adder constructed from conventional variable-sized ripple carry adder (RCA) blocks. The adder is divided into blocks of 2, 2, 3, 4, and 5 bits, and each RCA uses multiple full adders to perform the addition. The design in [9] is slow and also consumes more power.

III. DELAY AND AREA EVALUATION METHODOLOGY OF CSLA BLOCKS

The delay and area evaluation methodology treats all gates as made up of AND, OR, and Inverter (AOI) gates, each having a delay of 1 unit and an area of 1 unit. We then add up the number of gates in the longest path of a logic block, which gives its maximum delay. The area is evaluated by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of the 2:1 mux, Half Adder (HA), and FA are evaluated and listed in [9].

Group 2 (Fig. 2a) has one 4-bit RCA, made of 3 FAs and 1 HA, for Cin = 0. Instead of a second 4-bit RCA with Cin = 1, a 5-bit BEC is used, which adds one to the output of the 4-bit RCA. Delay values are obtained through the AOI implementation [9]. The arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of the mux selection input and the mux delay.

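The area count of group 2 is determined as follows:

\[
\begin{aligned}
\text{Gate count} &= 86 \quad (\text{FA} + \text{HA} + \text{MUX} + \text{BEC}) \\
\text{FA} &= 39 \quad (3 \times 13) \\
\text{HA} &= 6 \quad (1 \times 6) \\
\text{AND} &= 1 \\
\text{NOT} &= 1 \\
\text{XOR} &= 20 \quad (4 \times 5) \\
\text{MUX} &= 20 \quad (5 \times 4)
\end{aligned}
\]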
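The counting rule of this methodology can be sketched as a small tally. The per-unit costs below (13 gates per FA, 6 per HA, 4 per 2:1 mux, 5 per XOR) are read off the multipliers in the breakdown above; the dictionary and function names are assumptions made for illustration only.

```python
# Area of each block in AOI gate units.
# FA, HA, MUX and XOR values are inferred from the group 2 breakdown
# (3 x 13, 1 x 6, 5 x 4, 4 x 5); AND and NOT count as one gate each.
AOI_AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}


def area(block_counts):
    """Total AOI gate count for a mapping of block name -> number of instances."""
    return sum(AOI_AREA[name] * count for name, count in block_counts.items())


rca_4bit = {"FA": 3, "HA": 1}  # 4-bit RCA for Cin = 0
mux_5bit = {"MUX": 5}          # five 2:1 muxes for four sum bits and one carry

print(area(rca_4bit))  # 45
print(area(mux_5bit))  # 20
```

The same helper can be applied to any other block once its composition in FA, HA, mux, and basic gates is known.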

![Fig. 2a Delay and Area Evaluation of Group2-MCSLA](image)

Similarly, the estimated maximum delay and area of the other groups of the modified CSLA are evaluated and listed in Table I.

<table>
<thead>
<tr>
<th>Group</th>
<th>Delay (gate units)</th>
<th>Area (gate count)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Group 1</td>
<td>11</td>
<td>97</td>
</tr>
<tr>
<td>Group 2</td>
<td>19</td>
<td>86</td>
</tr>
<tr>
<td>Group 3</td>
<td>22</td>
<td>86</td>
</tr>
<tr>
<td>Group 4</td>
<td>77</td>
<td>86</td>
</tr>
</tbody>
</table>

From the above table, it is clear that the modified CSLA uses up to 102 fewer gates than the regular CSLA, together with a reduction in gate delays. To evaluate the performance further, we resorted to ASIC implementation and simulation.

IV. OUR CONTRIBUTIONS

In this paper, our contribution is to replace the RCA with constant carry Cin = 1 by a BEC and to change the variable block size to a fixed block size. The proposed CSLA is then compared with the regular SQRT carry select adder. The results show that our adder has an advantage over the conventional adder in terms of speed and power consumption. Hence, it is a good choice to replace the carry select adder structure in the final stages of fast multipliers, which improves the speed of operation of high-performance VLSI circuits.

A. Binary Excess Conversion

As stated above, the main idea of this work is to use a BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular SQRT CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The importance of the BEC logic stems from the large silicon area reduction it gives when a CSLA with a large number of bits is designed. The Boolean expressions of the 4-bit BEC are listed below (functional symbols: ~ NOT, & AND, ⊕ XOR). As shown in Fig. 3, the carry out (C0) of the block is calculated in parallel with X2 by a parallel chain of AND gates, whereas the RCA structure propagates the carry serially; this reduces the delay of incrementing in the CSLA compared with the conventional RCA.

\[
\begin{aligned}
X_0 &= \sim B_0 \\
X_1 &= B_0 \oplus B_1 \\
X_2 &= B_2 \oplus (B_0 \,\&\, B_1) \\
X_3 &= B_3 \oplus (B_0 \,\&\, B_1 \,\&\, B_2) \\
C_1 &= B_3 \,\&\, B_2 \\
C_2 &= B_0 \,\&\, B_1 \\
C_0 &= C_1 \,\&\, C_2
\end{aligned}
\]

Fig. 3 4-BIT Block of BEC
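As a quick check of the Boolean expressions, the 4-bit BEC can be modelled in a few lines and compared against an ordinary increment. This is a behavioral sketch only; the function name `bec4` and the integer encoding of the inputs and outputs are assumptions, not part of the design described here.

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter built from the listed expressions.

    b is an integer in 0..15. Returns (x, c0), where x is the 4-bit output
    and c0 is the carry out.
    """
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    x0 = b0 ^ 1               # X0 = ~B0
    x1 = b0 ^ b1              # X1 = B0 xor B1
    x2 = b2 ^ (b0 & b1)       # X2 = B2 xor (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)  # X3 = B3 xor (B0 & B1 & B2)
    c1 = b3 & b2              # C1 = B3 & B2
    c2 = b0 & b1              # C2 = B0 & B1
    c0 = c1 & c2              # C0 = C1 & C2
    x = x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)
    return x, c0


# Exhaustive comparison with B + 1 over all 16 input values.
for b in range(16):
    x, c0 = bec4(b)
    assert x == (b + 1) % 16
    assert c0 == (b + 1) // 16
```

The AND terms depend only on the input bits B0 to B3, so none of the outputs has to wait for a rippled carry.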

B. Uniform Sized Block

A second modification made in this paper is that the variable-sized blocks are replaced by uniformly sized blocks. Fig. 4 shows the proposed 16-bit carry select adder, which divides the word into equal blocks of 4 bits each. The least significant 4 bits are added using a conventional RCA, while the other blocks are added in parallel, each together with its BEC. Once all the interim sums and carries are calculated, the final sums are selected by multiplexers with minimal delay. Each multiplexer block receives two sets of 5-bit inputs (four sum bits and one carry bit each) and selects the final sum based on the select input from the previous stage. Use of this basic unit with the 10-to-5 multiplexer thus achieves fast incrementing action with a reduced device count. The proposed CSLA therefore outperforms the regular CSLA circuit in speed by reducing the carry propagation latency.
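The data flow of the uniform-block adder can be illustrated with a behavioral model. It is a sketch under stated assumptions, not the synthesized Verilog design: the names `rca4`, `bec5`, and `csla16_bec` are hypothetical, the RCA and BEC are modelled arithmetically rather than at gate level, and each block result is a 5-bit value with the carry in bit 4.

```python
BLOCK_BITS = 4
MASK = (1 << BLOCK_BITS) - 1


def rca4(a, b, cin=0):
    """Behavioral 4-bit RCA: returns a 5-bit value (four sum bits, carry in bit 4)."""
    return (a & MASK) + (b & MASK) + cin


def bec5(value):
    """Behavioral 5-bit BEC: adds one to the RCA result and keeps 5 bits."""
    return (value + 1) & 0x1F


def csla16_bec(a, b, cin=0):
    """16-bit carry select adder built from four uniform 4-bit blocks."""
    # Block 0: conventional RCA on the least significant 4 bits.
    first = rca4(a, b, cin)
    result = first & MASK
    carry = first >> BLOCK_BITS
    for block in range(1, 4):
        shift = block * BLOCK_BITS
        with_cin0 = rca4(a >> shift, b >> shift)  # RCA with Cin = 0
        with_cin1 = bec5(with_cin0)               # BEC replaces the RCA with Cin = 1
        # 10-to-5 mux: the carry of the previous block selects one 5-bit input.
        selected = with_cin1 if carry else with_cin0
        result |= (selected & MASK) << shift
        carry = selected >> BLOCK_BITS
    return result, carry
```

For example, `csla16_bec(0xFFFF, 1)` returns `(0, 1)`: every block selects its BEC output because the carry ripples only through the multiplexer select lines.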

V. ASIC IMPLEMENTATION RESULTS

The design proposed in this paper has been developed using Verilog HDL and synthesized in Quartus II 8.1 Web Edition [12]. Table II lists the simulation results of both the conventional and the proposed adder structures in terms of delay, area, and power. The area is the total cell area of the design, and the total power is the sum of dynamic power, internal power, net power, and leakage power. The delay is the critical path delay of the adder circuit. Fig. 5 shows that the proposed CSLA has lower power and lower delay than the regular SQRT CSLA.

VI. CONCLUSION

A 16-bit CSLA architecture is designed to reduce power and improve speed by means of a gate-level modification and a uniform block size. The proposed structure proves to be a simple way to improve the speed of the carry select adder, and it is also found to consume less power. The proposed carry select adder can be used to speed up the final addition in parallel multiplier circuits and in other architectures that use adder circuits. The structure has been synthesized with Quartus II 8.1 Web Edition. This shows that the design can readily be incorporated into complex VLSI designs and real-time processors to increase the operating speed of the circuits.

TABLE II
COMPARISON OF REGULAR SQRT CSLA AND MODIFIED CSLA.

<table>
<thead>
<tr>
<th>Word size</th>
<th>Adder</th>
<th>Area (µm²)</th>
<th>Delay (ns)</th>
<th>Power (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>16-bit</td>
<td>Regular SQRT CSLA</td>
<td>32</td>
<td>23.026</td>
<td>6.24</td>
</tr>
<tr>
<td>16-bit</td>
<td>Modified CSLA</td>
<td>50</td>
<td>16.06</td>
<td>6.18</td>
</tr>
</tbody>
</table>

Fig. 5 Comparison of Performance factors of CSLA

REFERENCES