# Design of energy-efficient multiplier based on 3:2 compressor

# Inamul Hussain, Saurabh Chaudhury

Department of Electrical Engineering, National Institute of Technology Silchar, Assam, India

#### **Article Info**

**Article history:** Received Jun 29, 2020. Revised Jul 20, 2020. Accepted Aug 4, 2020.

**Keywords:** CSA, Multiplier, Partial Products, PDP, PPP, Wallace

**ABSTRACT**

A multiplier circuit is one of the most important functional blocks in many nano-electronic, control and automation applications. In this work, an energy-efficient multiplier based on a 3:2 compressor is reported. The multiplier is designed in three parts. In the first part, a partial product (PP) generator built from AND gates produces the PPs. In the second part, termed partial product processing (PPP), the PPs are reduced. In the third part, the final addition is performed. The PPP is designed in two phases. In the first phase, the Wallace tree algorithm is used to reduce the PPs, whereas in the second phase the PPs are reduced by using an energy-efficient half adder and a 3:2 compressor. Finally, in the third step, the final addition is computed by a carry-save adder. The performance of the designed multiplier is evaluated and compared with other multiplier circuits. The multiplier shows performance improvements of 20.55%-46% as the power supply is varied from 1.2 V to 0.6 V. All simulations and analyses have been carried out using the Synopsys EDA tool.

This is an open access article under the <u>CC BY-SA</u> license.



# Corresponding Author:

Inamul Hussain, Department of Electrical Engineering, National Institute of Technology Silchar, Silchar, Assam, India, 788010. Email: ihinamul07@gmail.com

# 1. INTRODUCTION

A multiplier is an essential block of many nano-electronic, control and automation applications [1]. Wherever multiplication is required, a multiplier module (circuit) is needed [2]. Thus, it has applications in computer vision, computer-aided design, image processing, DSP processors, MAC, communication systems, filters, IoT applications, and VLSI circuits and systems [3]. The speed of operation, power consumption and complexity of these applications therefore depend, to some extent, on the core multiplier modules [4, 5]. In nano-electronic research, designing low-power and high-speed multiplier circuits in nanotechnology for VLSI applications is one of the trending objectives [6]. Low-power and high-speed multipliers have been designed by adopting different approaches and techniques [7], based either on an algorithm or on a new architecture, and each design has its own advantages and disadvantages. The architectures and techniques found in the literature are reported in [8-16]. A study of these designs shows that each has drawbacks relative to the others. The three major issues observed are high power consumption, low speed of operation and complexity of architecture. Thus, in this work, an energy-efficient multiplier has been designed by using an algorithm and adopting a new structural module.

The multiplier reported here is a 4x4 multiplier. It is designed in three steps. In the first step, the partial products of two 4-bit numbers are generated; in the second step, the partial products are reduced; and in the final step, a fast addition yields the final product of the 4x4 multiplier. In the 1st step, AND gates are used to obtain the partial products, which are generated by multiplying each bit of one number by each bit of the other number. The PPs are reduced by using the Wallace tree algorithm, because it is one of the oldest and most effective techniques for reducing multiplication complexity [8]. Another reason for its widespread popularity is its speed of operation [9]. The final step is the fast addition, which is done by a carry-save adder (CSA). The main reason for using a carry-save adder is that it is one of the fast adders. The performance of the designed multiplier is compared with two existing Wallace multipliers reported in [5] and [11]. Since the Wallace tree algorithm is used for partial product processing (PPP), the designed multiplier is also known as a Wallace multiplier. In the rest of the paper, the words multiplier and Wallace multiplier have the same meaning unless stated otherwise. The rest of this manuscript is organised as follows: Section 2 explains the details of the designed multiplier, Section 3 presents the performance analysis and discussion, and Section 4 concludes the article, followed by the references.

# 2. METHODOLOGY

In any MxN multiplication, the first step is the generation of the partial products, which are the products of the bits of the M-bit and N-bit operands. The partial products are then added up to obtain the final result. In a Wallace multiplier, the process is the same but is done in three steps [8, 11, 12]. Initially, the partial products (PP) are produced by multiplying the two inputs, followed by partial product processing (PPP). At last, the final addition (FA) is performed by a fast adder [13]. In the PPP, the PPs are organised into stages by the Wallace tree algorithm. Each stage contains a number of rows given by (1) [5], where i is the stage index and S<sub>i</sub> is the number of rows in stage i.

$$S_{i+1} = 2\left\lfloor \frac{S_i}{3} \right\rfloor + (S_i \bmod 3)$$
 (1)
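As a quick check of (1), the row count of each stage can be iterated until only two rows remain. The sketch below is illustrative only: the function name `wallace_rows` is hypothetical, and it assumes the division in (1) is integer (floor) division, which is the reading that gives the 4, 3, 2 row sequence described for the 4x4 case.

```python
def wallace_rows(initial_rows):
    """Return the number of rows in each Wallace stage until two rows remain."""
    stages = [initial_rows]
    while stages[-1] > 2:
        s = stages[-1]
        # Every full group of three rows becomes two rows;
        # the leftover rows (s mod 3) pass through unchanged.
        stages.append(2 * (s // 3) + s % 3)
    return stages


print(wallace_rows(4))  # row counts per stage for a 4x4 multiplier
```

For four initial rows the sequence ends after three stages, which agrees with the stage structure used later for the 4x4 design.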

In this work, the multiplier has been designed in the three steps mentioned above: partial products (PP) are generated from the inputs, then PP processing (PPP) is carried out, and finally the final addition (FA) is performed by a fast adder. For PP generation, AND gates are used. The PPP is designed in two phases. In the first phase, the Wallace tree algorithm is used to reduce the PPs, whereas in the second phase the PPs are added by using an energy-efficient half adder and a 3:2 compressor. Since the proposed multiplier is 4x4 bits, the two 4-bit inputs give 16 partial products, which are reduced by the PPP. The final addition is done once the reduction of the partial products is complete. The three design steps are described in detail below.

#### 2.1. 1st step:

The 1st step is partial product generation. The partial products are generated by using AND gates. The AND gates used here are of CMOS type because they outperform other types of AND gates [17]. Since the proposed multiplier is of 4x4 bits, there are 16 partial products and thus 16 CMOS AND gates are required to produce them. Let the two numbers be input1 = a3a2a1a0 and input2 = b3b2b1b0. The partial products are then p00=a0b0, p01=b0a1, p02=b0a2, p03=b0a3, p10=b1a0, p11=b1a1, p12=b1a2, p13=b1a3, p20=b2a0, p21=b2a1, p22=b2a2, p23=b2a3, p30=b3a0, p31=b3a1, p32=b3a2 and p33=b3a3. The same is shown in Figure 1. Similarly, an NxN multiplier has N<sup>2</sup> partial products and thus requires the same number of AND gates.
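The partial product generation can be modelled behaviourally in a few lines. This is a minimal sketch, not the CMOS gate-level design: the function name `partial_products` is hypothetical, and the indexing follows the naming above, where p_ij is the AND of bit b_i and bit a_j and carries the binary weight 2^(i+j).

```python
def partial_products(a, b, n=4):
    """Return {(i, j): p_ij} with p_ij = b_i AND a_j for two n-bit inputs."""
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    return {(i, j): b_bits[i] & a_bits[j] for i in range(n) for j in range(n)}


pp = partial_products(0b1011, 0b0110)  # 11 x 6
# Summing every partial product at its weight 2**(i + j) recovers the product.
total = sum(bit << (i + j) for (i, j), bit in pp.items())
print(len(pp), total)
```

The dictionary holds one entry per AND gate, so its size equals N<sup>2</sup> for an NxN multiplier.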



Figure 1. All the partial products of two numbers input1 and input2

#### 2.2. 2nd Step:

The 2nd step is partial product processing (PPP), in which the partial products are reduced. The reduction uses the Wallace algorithm, where the PPs are rearranged in a tree-like parallel structure. This is achieved by categorising the above rows into stages by the Wallace tree algorithm. The algorithm is repeated until the last stage contains two rows of partial products. Equation (1) is used for this purpose. A flow chart of the procedure is shown in Figure 2.



Figure 2. Flowchart for Wallace multiplier

Figure 3 illustrates in more detail how the PPP (partial product processing) is done. The 1st stage contains 4 rows, the 2nd stage contains 3 rows and the 3rd stage contains only two rows of PPs. Thus, the algorithm stops at the 3rd stage, i.e., it is the final stage. The partial products of each stage except the final stage are computed by using half adders and 3:2 compressors. Compressors are used because they are energy-efficient for multiplication [17-22]. Where two rows of PPs are to be added, a half adder (HA) is used, whereas for three rows of PPs a 3:2 compressor is used. The compressor reduces three rows of partial products to two. The HA used here is reported in [18], whereas the 3:2 compressor is designed by using the architecture mentioned in [17] and [23].
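The logical behaviour of the two reduction cells can be written down directly. The sketch below models only their Boolean function, using the standard XOR/majority form; it says nothing about the transistor-level architectures of [17], [18] and [23], and the function names are hypothetical.

```python
def half_adder(x, y):
    """Two bits of equal weight in; (sum, carry) out."""
    return x ^ y, x & y


def compressor_3_2(x, y, z):
    """Three bits of equal weight in; (sum, carry) out.

    The sum keeps the input weight and the carry has twice that weight,
    so three rows are reduced to two.
    """
    s = x ^ y ^ z
    c = (x & y) | (y & z) | (x & z)
    return s, c


# Print the truth table of the 3:2 compressor.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            print(x, y, z, compressor_3_2(x, y, z))
```

In both cells the outputs satisfy sum + 2 x carry = number of ones at the inputs, which is what allows them to replace rows without changing the value being accumulated.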



Figure 3. PPP by Wallace tree algorithm (black dots indicate PP)

# 2.3. 3<sup>rd</sup> step:

The third step is the fast addition. In the PPP, when any stage contains only two rows, the Wallace algorithm stops there, i.e., there are no further stages. As shown in Figure 3, stage 3 contains only two rows of partial products, so it is the final stage. An addition is therefore required to process these two rows of partial products. Since this is the last stage, a fast addition is required to keep the delay low and so improve the performance of the multiplier. Thus, a fast adder is used in this step. The fast adder used here is a 4-bit carry-save adder (CSA), as the CSA is one of the fastest adders [9].
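The three steps can be put together in a small behavioural model to confirm that the flow produces correct products. This is a sketch under stated assumptions rather than the authors' circuit: rows are grouped in threes in the order they are generated, all function names are hypothetical, and the final two-row addition is done with plain integer addition as a stand-in for the carry-save adder.

```python
def half_adder(x, y):
    return x ^ y, x & y


def compressor_3_2(x, y, z):
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def pp_rows(a, b, n=4):
    """Step 1: row i maps bit weight (i + j) to p_ij = b_i AND a_j."""
    return [{i + j: ((b >> i) & 1) & ((a >> j) & 1) for j in range(n)}
            for i in range(n)]


def reduce_stage(rows):
    """Step 2: turn each group of three rows into a sum row and a carry row."""
    out = []
    full_groups = len(rows) // 3
    for g in range(full_groups):
        group = rows[3 * g: 3 * g + 3]
        sum_row, carry_row = {}, {}
        for w in sorted(set().union(*group)):
            bits = [r[w] for r in group if w in r]
            if len(bits) == 3:
                sum_row[w], carry_row[w + 1] = compressor_3_2(*bits)
            elif len(bits) == 2:
                sum_row[w], carry_row[w + 1] = half_adder(*bits)
            else:
                sum_row[w] = bits[0]  # a single bit passes through
        out += [sum_row, carry_row]
    return out + rows[3 * full_groups:]  # leftover rows are untouched


def row_value(row):
    return sum(bit << w for w, bit in row.items())


def wallace_multiply(a, b, n=4):
    rows = pp_rows(a, b, n)
    stage_sizes = [len(rows)]
    while len(rows) > 2:
        rows = reduce_stage(rows)
        stage_sizes.append(len(rows))
    # Step 3: integer addition stands in for the final fast adder.
    return row_value(rows[0]) + row_value(rows[1]), stage_sizes


print(wallace_multiply(13, 11))
```

Running the model over all 256 input pairs is a cheap way to check the reduction logic before moving to a transistor-level netlist.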

#### 3. RESULTS AND DISCUSSION

The designed Wallace multiplier is the combination of the three steps (step 1, step 2 and step 3) discussed above. Its performance is evaluated by simulating the multiplier with the Synopsys EDA tool at room temperature. The technology node used is 90 nm CMOS PDK technology. The results are shown in Table 1. The parameters power, delay, power-delay product (PDP) and energy-delay product (EDP) have been calculated. The performance is then compared with the conventional [5] and Hussain [11] multipliers. It is observed that although the conventional Wallace multiplier has the lowest power consumption, the proposed design has the best delay, PDP and EDP. The delay has been minimised by using the 3:2 compressor and the CSA. Since the proposed design has the lowest delay and a moderate power consumption compared with the other Wallace multipliers considered, it has the best PDP and EDP. Thus, the designed multiplier is the most energy-efficient of the three multipliers compared. The effects of the power supply on delay, power, PDP and EDP are also observed.

Table 1. Simulation results with 90nm CMOS technology

| Parameters | Conventional [5] | Hussain [11] | Proposed |
|------------|------------------|--------------|----------|
| Power (µW) | 6.75             | 7.12         | 7.49     |
| Delay (ns) | 25               | 20.7         | 17.9     |
| PDP (fJ)   | 168.75           | 147.38       | 134.07   |
| EDP (zJs)  | 4.22             | 3.05         | 2.4      |
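The PDP and EDP rows of Table 1 follow from the power and delay rows, which the short sketch below illustrates. The helper name `pdp_edp` is hypothetical, and the unit handling assumes power in µW and delay in ns, as in the table.

```python
def pdp_edp(power_uw, delay_ns):
    """Return (PDP in fJ, EDP in zJs) from power in µW and delay in ns."""
    pdp_fj = power_uw * delay_ns        # 1e-6 W x 1e-9 s = 1e-15 J = 1 fJ
    edp_zjs = pdp_fj * delay_ns * 1e-3  # 1e-15 J x 1e-9 s = 1e-24 Js = 1e-3 zJs
    return pdp_fj, edp_zjs


table1 = {
    "Conventional": (6.75, 25.0),
    "Hussain": (7.12, 20.7),
    "Proposed": (7.49, 17.9),
}
for name, (power, delay) in table1.items():
    pdp, edp = pdp_edp(power, delay)
    print(f"{name}: PDP = {pdp:.2f} fJ, EDP = {edp:.2f} zJs")
```

Because EDP weights the delay twice, the design with the shortest delay gains more in EDP than in PDP, even when its power is slightly higher.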

To further assess the performance of the multiplier, it is also simulated in 32 nm CMOS technology. The results are shown in Table 2. The designed multiplier performs better than the other two multipliers: although the conventional Wallace multiplier consumes the least power, the proposed design has the best delay, PDP and EDP as compared to [5] and [11]. From the results, it is clear that the designed multiplier has the best EDP, i.e., it is energy-efficient. This is achieved by using the algorithm together with structural optimisation of the design, which is obtained by choosing proper circuit modules. The low-power compressor and HA used in the PPP, together with the CSA used in the final addition, speed up the operation. To establish the validity of the findings and the significance of the results, power, delay, PDP and EDP analyses have been carried out against the power supply, as discussed below.

Table 2. Simulation results with 32nm CMOS technology

| Parameters | Conventional [5] | Hussain [11] | Proposed |
|------------|------------------|--------------|----------|
| Power (nW) | 435.05           | 499.25       | 510.11   |
| Delay (ns) | 21               | 18.47        | 16.97    |
| PDP (fJ)   | 9.136            | 9.221        | 8.657    |
| EDP (zJs)  | 0.192            | 0.170        | 0.146    |

## **3.1.** Power analysis against power supply

Power is one of the most important parameters of any digital circuit or system. Hence, the total power has been calculated and its variation observed as the supply voltage is varied from 0.6 V to 1.2 V. The lowest supply voltage considered is 0.6 V because of the minimum threshold voltage required for 90 nm CMOS technology, whereas 1.2 V is the highest voltage level considered, as per the ITRS roadmap. The total power consumption reported here is the sum of the static power (when the circuit is in steady state) and the dynamic power (when the circuit is in a transition state). The results are shown in Table 3. A comparison graph against the power supply is shown in Figure 4.

Table 3. Power (µW) analysis against power supply (V)

| Power supply (V) | Conventional [5] | Hussain [11] | Proposed |
|------------------|------------------|--------------|----------|
| 1.2              | 6.75             | 7.12         | 7.49     |
| 1                | 5.69             | 5.94         | 6.05     |
| 0.8              | 5.03             | 4.74         | 4.65     |
| 0.6              | 4.684            | 4.1          | 3.99     |



Figure 4. Power (µW) Vs variation of power supply (V)

# 3.2. Delay analysis against power supply:

Speed is another important performance parameter of modern nano-electronic applications. So, the delay has been calculated and its variation with the power supply observed. The delay is measured from the instant the input reaches one-half of the supply voltage (50% Vdd) to the instant the latest output signal reaches the same voltage level. Thus, the worst-case delay is recorded. The delay is calculated at 0.6 V, 0.8 V, 1 V and 1.2 V and listed in Table 4. The variation of delay with the power supply is shown in Figure 5. The designed multiplier has the best delay because of its architecture. The compressor used in the partial product processing unit shortens the data flow path and thus speeds up the operation. Another reason is the use of a carry-save adder at the final step, where the operation starts without waiting for the carry bit (an inherent property of the CSA), which reduces the delay.
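Since PDP is the product of power and delay, the delay at each supply voltage can be cross-checked from the power and PDP values reported in this section. The sketch below performs that back-calculation; the numbers it prints are derived estimates subject to rounding in the published tables, not the measured delays, and the variable and function names are hypothetical.

```python
# Columns in each tuple: (Conventional [5], Hussain [11], Proposed).
power_uw = {
    1.2: (6.75, 7.12, 7.49),
    1.0: (5.69, 5.94, 6.05),
    0.8: (5.03, 4.74, 4.65),
    0.6: (4.684, 4.1, 3.99),
}
pdp_fj = {
    1.2: (168.75, 147.38, 134.07),
    1.0: (157.61, 125.93, 121.61),
    0.8: (164.98, 119.92, 98.58),
    0.6: (179.91, 117.75, 96.96),
}


def estimated_delay_ns(pdp, power):
    """Delay in ns implied by PDP (fJ) and power (µW): fJ / µW = ns."""
    return pdp / power


for vdd in power_uw:
    delays = [estimated_delay_ns(e, p) for e, p in zip(pdp_fj[vdd], power_uw[vdd])]
    print(vdd, [round(d, 2) for d in delays])
```

At 1.2 V the implied delays can be compared directly with the delay row of Table 1, which gives a simple consistency check on the reported figures.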



Figure 5. Delay (ns) Vs variation of power supply (V)

#### 3.3. PDP and EDP analysis against power supply:

In digital circuits and systems for nano-electronic applications, delay or power alone cannot define the performance of a circuit. Thus, the power-delay product (PDP) is calculated; it is a figure of merit that indicates the energy efficiency of a circuit or system. On the other hand, low-PDP circuits may still operate slowly, so the energy-delay product (EDP) is another metric used to evaluate the performance. Thus, both PDP and EDP are calculated, and they are also evaluated as the supply voltage is varied from 0.6 V to 1.2 V. The PDP and EDP analyses against the power supply are shown in Tables 5 and 6 respectively. Comparison graphs for PDP and EDP against the power supply are shown in Figures 6 and 7 respectively. The proposed multiplier performs best in both cases. It can therefore be concluded that the designed multiplier is energy-efficient for VLSI circuits and systems.

Table 5. PDP (fJ) analysis against power supply

| Power supply (V) | Conventional [5] | Hussain [11] | Proposed |
|------------------|------------------|--------------|----------|
| 1.2              | 168.75           | 147.38       | 134.07   |
| 1                | 157.61           | 125.93       | 121.61   |
| 0.8              | 164.98           | 119.92       | 98.58    |
| 0.6              | 179.91           | 117.75       | 96.96    |

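Table 6. EDP (zJs) analysis against power supply

| Power supply (V) | Conventional [5] | Hussain [11] | Proposed |
|------------------|------------------|--------------|----------|
| 1.2              | 4.22             | 3.05         | 2.4      |
| 1                | 4.37             | 2.67         | 2.44     |
| 0.8              | 5.41             | 3.03         | 2.09     |
| 0.6              | 6.91             | 3.38         | 2.36     |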
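The 20.55%-46% improvement range quoted in the abstract appears to correspond to the PDP reduction of the proposed design relative to the conventional multiplier at 1.2 V and 0.6 V. That reading is an interpretation rather than something the text states, and the sketch below simply checks it against the PDP values; the helper name `improvement` is hypothetical.

```python
pdp_fj = {  # supply voltage (V): (Conventional [5], Hussain [11], Proposed)
    1.2: (168.75, 147.38, 134.07),
    1.0: (157.61, 125.93, 121.61),
    0.8: (164.98, 119.92, 98.58),
    0.6: (179.91, 117.75, 96.96),
}


def improvement(reference, proposed):
    """Percentage reduction of the proposed value relative to the reference."""
    return 100.0 * (reference - proposed) / reference


for vdd, (conv, hussain, prop) in pdp_fj.items():
    print(f"{vdd} V: {improvement(conv, prop):.2f}% vs conventional, "
          f"{improvement(hussain, prop):.2f}% vs Hussain")
```

The same helper can be applied to the EDP values to express the EDP gains as percentages.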



Figure 6. PDP (fJ) Vs variation of power supply (V)

Figure 7. EDP (zJs) Vs variation of power supply (V)

# 4. CONCLUSION

In this article, a 4x4 multiplier designed in three steps is reported for nano-electronic, control and automation applications. The 1st step is partial product (PP) generation by using AND gates. The 2nd step is PP processing (PPP), which is designed in two phases. In the first phase, the Wallace tree algorithm is used to reduce the PPs, whereas in the second phase the partial products are computed by using an energy-efficient half adder and a 3:2 compressor: a half adder is used where there are two rows of PPs, and a 3:2 compressor where there are three. In the 3rd step, the final addition is done by a carry-save adder (CSA), which speeds up the overall operation. The multiplier has been simulated by using the Synopsys tool with 90 nm CMOS PDK technology. The performance metrics power, delay, PDP and EDP of the multiplier are computed and compared with those of two other multipliers. The effects of the power supply on power, delay, PDP and EDP are also observed. The proposed Wallace multiplier outperforms the other multipliers considered in terms of delay, PDP and EDP. Thus, the multiplier is energy-efficient and could be a possible alternative for future nano-electronic, control and automation applications.

#### REFERENCES

- [1] R. Shanmuganathan, K. Brindhadevi, "Comparative analysis of various types of multipliers for effective low power," *Microelectronic Engineering*, vol. 214, pp. 28-37, 2019.
- [2] Archana Rani, Naresh Grover, "An Enhanced FPGA based asynchronous microprocessor design using VIVADO and ISIM," *Bulletin of Electrical Engineering and Informatics*, vol. 7, no. 2, pp. 199-208, 2018.
- [3] I. Hussain, S. Chaudhury, "Performance comparison of 1-bit conventional and hybrid full adder circuits," *Advances in Communication, Devices and Networking*, vol. 462, pp. 43-50, 2018.
- [4] Sheikh Tanzim Meraj, Nor Zaihar Yahaya, Kamrul Hasan, Ammar Masaoud, "Single-phase 21-level hybrid multilevel inverter with reduced power components employing low frequency modulation technique," *International Journal of Power Electronics and Drive System (IJPEDS)*, vol. 11, no. 2, pp. 810-822, 2020.
- [5] C. S. Wallace, "A Suggestion for a Fast Multiplier," *IEEE Transactions on Electronic Computers*, vol. EC-13, no. 1, pp. 14-17, Feb 1964, doi: 10.1109/PGEC.1964.263830.
- [6] Ansiya Eshack, S. Krishnakuma, "Pipelined vedic multiplier with manifold adder complexity levels," *International Journal of Electrical and Computer Engineering (IJECE)*, vol. 10, no. 3, pp. 2951-2958, 2020.
- [7] I. Hussain, R. K. Sah, M. Kumar, "Performance Comparison of Wallace Multiplier Architectures," *International Journal of Innovative Research in Science, Engineering and Technology*, vol. 4, no. 1, pp. 18729-18734, 2015.
- [8] S. Kakde, S. Khan, P. Dakhole, S. Badwaik, "Design of area and power aware reduced Complexity Wallace Tree multiplier," 2015 International Conference on Pervasive Computing ICPC, Pune, India, pp. 1-6, 2015, doi: 10.1109/PERVASIVE.2015.7087207.
- [9] I. Hussain, M. Kumar, "A fast and reduced complexity Wallace tree multiplier," *Journal of Active and Passive Electronic Devices*, vol. 12, pp. 63-71, 2017.
- [10] G. C. Ram, D. S. Rani, R. Balasaikesava, K. B. Sindhuri, "Design of delay efficient modified 16 bit Wallace multiplier," 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology RTEICT, 2016, pp. 1887-1891, doi: 10.1109/RTEICT.2016.7808163.
- [11] I. Hussain, C. K. Pandey, S. Chaudhury, "Design and Analysis of High Performance Multiplier Circuit," *Devices for Integrated Circuit DevICE*, pp. 245-247, 2019, doi: 10.1109/DEVIC.2019.8783322.
- [12] A. A. AlJuffri et al., "ASIC realization and performance evaluation of scalable microprogrammed FIR filters using Wallace tree and Vedic multipliers," 2015 IEEE 15th International Conference on Environment and Electrical Engineering EEEIC, pp. 1995-1998, 2015, doi: 10.1109/EEEIC.2015.7165480.
- [13] S. Asif, Y. Kong, "Analysis of different architectures of counter based Wallace multipliers," *Tenth International Conference on Computer Engineering & Systems ICCES*, pp. 139-144, 2015, doi: 10.1109/ICCES.2015.7393034.
- [14] S. Khan, S. Kakde, Y. Suryawanshi, "VLSI implementation of reduced complexity wallace multiplier using energy efficient CMOS full adder," 2013 IEEE International Conference on Computational Intelligence and Computing Research, pp. 1-4, 2013, doi: 10.1109/ICCIC.2013.6724141.
- [15] S. Asif, Yinan Kong, "Performance analysis of Wallace and radix-4 Booth-Wallace multipliers," 2015 Electronic System Level Synthesis Conference ESLsyn, pp. 17-22, 2015.
- [16] I. Hussain, A. Singh, S. Chaudhury, "A review on the effects of technology on CMOS and CPL logic style on performance, speed and power dissipation," 2018 IEEE Electron Devices Kolkata Conference EDKCON, pp. 332-336, 2018, doi: 10.1109/EDKCON.2018.8770506.
- [17] Chip-Hong Chang, Jiangmin Gu, Mingyan Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 51, no. 10, pp. 1985-1997, Oct 2004, doi: 10.1109/TCSI.2004.835683.
- [18] I. Hussain, M. Kumar, "Design and performance analysis of a 3-2 compressor by using improved architecture," *Journal of Active and Passive Electronic Devices*, vol. 12, pp. 173-181, 2017.
- [19] R. Nirlakalla, T. S. Rao, T. J. Prasad, "Performance evaluation of high speed compressors for high speed multipliers," *Serbian Journal of Electrical Engineering*, vol. 8, no. 3, pp. 293-306, 2011.
- [20] I. Hussain, S. Chaudhury, "A new 4-2 compressor for VLSI Circuits and Systems," International Conference on Frontiers in Smart System Technologies FSST, pp. 409-414, 2019.
- [21] M. Dorojevets, A. K. Kasperek, N. Yoshikawa, A. Fujimaki, "20-GHz 8 x 8-bit parallel carry-save pipelined RSFQ multiplier," *IEEE Transactions on Applied Superconductivity*, vol. 23, no. 3, pp. 1300104-1300104, June 2013, doi: 10.1109/TASC.2012.2227648.
- [22] K. Prasad, K. K. Parhi, "Low-power 4-2 and 5-2 compressors," Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256), pp. 129-133 vol. 1, 2001, doi: 10.1109/ACSSC.2001.986892.
- [23] I. Hussain, S. Chaudhury, "CNFET based low power full adder circuit for VLSI applications," Nanoscience & Nanotechnology-Asia, vol. 10, no. 3, pp. 286-291, 2020.

# **BIOGRAPHIES OF AUTHORS**



**Inamul Hussain** completed his Bachelor in Technology in the Stream Electronics and Communication Engineering. He has completed his Master in Technology in VLSI Design. He is currently working towards his Ph.D. in the Department of Electrical Engineering, National Institute of Technology Silchar (an institute of national importance), India. His research area includes low power VLSI, CMOS and CNTFET-based circuits and systems design, communication systems. He has publications in reputed journals and conferences. He is also acting as a reviewer for many reputed journals such as IET Micro-Nano Letter, IET Circuits, Devices & Systems, Bentham sciences journals, BEEI, reputed IEEE conferences. He also served on the technical program committee in different journals and international conferences. Mr. Hussain is the youngest son of Siddique Hussain (a retired teacher) and Rukiya Khanam.



**Saurabh Chaudhury** is a Professor in the Department of Electrical Engineering, National Institute of Technology Silchar (an institute of national importance), India. He obtained his M. Tech and Ph.D. from IIT Kharagpur in the field of Microelectronics and VLSI Design, in 2001 and 2009 respectively, through the QIP Program, Government of India. He has always strived for knowledge and research and dreams big. He works in the domain of low power VLSI design, synthesis, leakage minimisation, CNTs and image processing. He has completed one AICTE (RPS) Project worth Rs.8 lacs as the Principal Investigator and has reviewed many papers for technical journals, including IEEE and IET journals, and for international conferences. He is a senior member of the IEEE and has supervised Ph.D. and M. Tech (PG) scholars.