Reducing the Computational Complexity of a 2D Gaussian Filter for Image Processing (An Overview)

Deepak Raj  
Department of Electronics and Communication  
DCRUST Murthal  
Insa.deepakraj@gmail.com

Dr. Poonam Singal  
Department of Electronics and Communication  
DCRUST Murthal  
singal.poonam@rediffmail.com

Abstract - The 2D Gaussian filter is one of the most useful techniques in image processing, particularly for image smoothing. However, implementing a 2D Gaussian filter demands heavy computational resources, so efficient implementation is vital when the technique is used in real-time applications. One obstacle is floating-point arithmetic, which requires a large amount of computational power to achieve real-time image processing. A fixed-point approach is more suitable: fixed-point arithmetic increases both speed and efficiency, and it also reduces hardware area by reducing the number of LUTs.

Keywords - computational cost, Verilog hardware, LUT, Gaussian smoothing filter, FPGA

I. INTRODUCTION

Fields such as medicine, astronomy and geography require results in real time, so efficiency in the implementation of digital image processing is paramount. As part of the literature review, an example of convolution on an FPGA was developed in [1] using a [3*3] Sobel filter on [15*15] images. A new series of filters has been developed at the hardware level for image smoothing and processing (edge detection, sharpening, intensity enhancement and brightness adjustment) in order to improve the sharpness and intensity of images and to help medical specialists in diagnosis. Using an HDL for image processing is a fairly new approach that extends digital design on reconfigurable circuits to digital image processing using very large scale integration (VLSI) technologies [2].

The image is acquired pixel by pixel, and two dual-port FIFO RAM memories are used to store the lines needed for a window spanning 3 lines of the image. The Gaussian filter was modified to use a limited number of bits for the kernel representation. We also compress the data, i.e. reduce the number of matrix elements, which plays a vital role today, because data compression [3] allows a large amount of data to be stored in a limited storage device. Data compression, or reduction of elements, can be used in mass communication, online data storage and real-time data storage [3].
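The line-buffer scheme can be modelled in software to show how three image lines are made available with only two stored lines. The following is a minimal Python sketch, assuming a row-major stream of 8-bit pixels and a [3*3] window; the function name `stream_windows` and the use of `collections.deque` as a FIFO are illustrative assumptions, not the authors' VHDL design. The two FIFOs hold the two previous lines, and the third line is the pixel stream itself.

```python
from collections import deque


def stream_windows(pixels, width):
    """Yield (row, col, window) for every complete 3x3 window in a row-major
    pixel stream; (row, col) is the image position of the window's centre.

    Hypothetical software model: fifo1 and fifo2 stand in for the two
    dual-port RAM line buffers, each holding one image line of `width` pixels.
    """
    fifo1 = deque()  # line buffer 1: the previous image line
    fifo2 = deque()  # line buffer 2: the line before that
    win = [[0] * 3 for _ in range(3)]  # 3x3 window of shift registers
    for n, p in enumerate(pixels):
        row, col = divmod(n, width)
        if len(fifo1) == width:
            mid = fifo1.popleft()  # pixel at (row - 1, col)
            top = fifo2.popleft() if len(fifo2) == width else 0  # (row - 2, col)
            fifo2.append(mid)
        else:
            mid = top = 0  # first line: nothing buffered yet
        fifo1.append(p)
        # Shift the window one column left and load the new column on the right.
        for r, v in enumerate((top, mid, p)):
            win[r] = win[r][1:] + [v]
        # The window is complete once two lines are buffered and three columns
        # of the current line have arrived (skips windows that wrap across lines).
        if row >= 2 and col >= 2:
            yield row - 1, col - 1, [line[:] for line in win]
```

Each FIFO only ever holds `width` pixels, so the on-chip storage grows with the image width rather than with the image size.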

II. METHODOLOGY

A. Gauss Filter

The Gaussian filter is a 2D convolution operation used to smooth images and remove noise. The convolution of a kernel w(x,y) of size m*n with an image f(x,y), denoted by w(x,y)*f(x,y), is given by (1) [4].

\[ w(x,y) * f(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x-s,y-t) \]  

(1)

where a and b are given by \( a = \lfloor m/2 \rfloor \) and \( b = \lfloor n/2 \rfloor \), i.e. \( a = (m-1)/2 \) and \( b = (n-1)/2 \) for odd m and n. A graphical representation of (1) is shown in Fig. 1. The Gaussian filter has two parameters: the size of the window and the standard deviation \( \sigma \). The larger \( \sigma \), the stronger the smoothing effect.
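The kernel itself is obtained by sampling the 2D Gaussian \( G(x,y) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/(2\sigma^2)} \) at integer offsets from the window centre. The sketch below shows how the two parameters, window size and \( \sigma \), determine the kernel; normalising the entries to sum to 1 is an assumption (a common choice that preserves mean brightness), and `gaussian_kernel` is a hypothetical helper name.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel sampled at integer offsets from
    the centre and normalised so that its entries sum to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre pixel")
    half = size // 2  # a = b = floor(size / 2), as in (1)
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    # Dividing by the total replaces the 1 / (2*pi*sigma^2) factor, which
    # would not make a truncated, sampled kernel sum exactly to 1.
    return [[v / total for v in row] for row in k]
```

A larger \( \sigma \) lowers the centre weight and spreads weight to the neighbours, which is why it produces stronger smoothing.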

One application of the Gaussian filter is to reduce “salt and pepper” noise. Gaussian smoothing filters are very effective low-pass filters from the viewpoint of both the spatial and the frequency domain. The standard deviation of the Gaussian filter plays an important role in its behavior. The Gaussian function is used in many research areas, for example as a smoothing operator and in image matrix mathematics. It is always symmetric and never reaches zero.

Fig 1. A window filter. The shaded pixels represent the input window that produces the filtered value in the output image. Each possible window position generates a corresponding pixel value in the output image [5].
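The window operation of Fig. 1 is simply equation (1) evaluated at every output position. The following Python sketch evaluates (1) term by term; treating pixels outside the image as 0 (zero padding) is an assumption, since the text does not specify border handling, and `convolve_eq1` is a hypothetical name.

```python
def convolve_eq1(f, w):
    """Evaluate equation (1) directly: g(x, y) = sum_s sum_t w(s, t) f(x - s, y - t).

    f: image as a list of rows (x = row index, y = column index).
    w: m x n kernel with odd m and n; w[s + a][t + b] holds w(s, t).
    Pixels outside the image are taken as 0 (zero padding, an assumption).
    """
    m, n = len(w), len(w[0])
    a, b = m // 2, n // 2
    rows, cols = len(f), len(f[0])
    g = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            acc = 0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    xs, yt = x - s, y - t
                    if 0 <= xs < rows and 0 <= yt < cols:
                        acc += w[s + a][t + b] * f[xs][yt]
            g[x][y] = acc
    return g
```

Each output pixel costs up to m*n multiplications and additions, so a [3*3] kernel needs up to 9 of each per pixel, or roughly 2.4 million multiplications for a [512*512] image. This is the cost the hardware implementation aims to reduce.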

B. Field programmable gate array (FPGA)

Field programmable gate arrays (FPGAs) are semiconductor devices built from configurable logic blocks (CLBs) connected through programmable interconnects. FPGAs can be reprogrammed to meet the desired application or functionality requirements [6]. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for application-specific integrated circuits (ASICs), which are custom manufactured for a specific design task. Although one-time programmable (OTP) FPGAs are available, the dominant types are SRAM-based FPGAs, which can be reprogrammed as the design evolves.

![FPGA internal architecture](image)

Due to their programmable nature, FPGAs are an ideal fit for many different markets. As an industry leader, Xilinx provides comprehensive solutions made up of FPGA devices, advanced software, and configurable, ready-to-use IP cores for markets and applications such as aerospace and defense, metal detection, medical science, radio astronomy, ASIC prototyping, software-defined radio and cryptography.

C. Image smoothing

Image smoothing is controlled by a tuning parameter that sets the extent of smoothing. The aim of smoothing is to remove noise or other rapid variations. In smoothing, the data points of the image are partially modified. Smoothing can support data analysis in two important ways: first, by making it possible to extract more information from the signal, and second, by providing analyses that are both flexible and robust. Many different algorithms can be used for smoothing. The Gaussian smoothing operator is used to blur the image and remove unwanted detail and noise. It is widely used in graphics software.

D. Kernel Quantization

The Gaussian filter was implemented on the FPGA using the VHDL description language. The Gaussian filter consists of multiplications and additions between the image and the kernel, where the image is represented by a matrix with values from 0 to 255 (8 bits). The kernel is a small square matrix with values between 0 and 1, so it must also be represented with a number of bits that can be implemented in the FPGA. To validate the results obtained from the FPGA processing, a MATLAB simulation was developed that uses a quantizer function to represent double-precision numbers in floating-point or fixed-point binary form.
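The effect of representing the kernel with a fixed number of bits can be checked numerically. The sketch below is a Python stand-in for the MATLAB quantizer step, not the authors' MATLAB code: it assumes unsigned fixed-point coefficients with `frac_bits` fractional bits and round-to-nearest, and `quantize_kernel` and `max_quantization_error` are hypothetical names. The example kernel (3*3, \( \sigma = 1 \), normalised) is also an assumption.

```python
import math


def quantize_kernel(kernel, frac_bits):
    """Round real coefficients to unsigned fixed-point integers with `frac_bits`
    fractional bits, so a coefficient v is stored as round(v * 2**frac_bits)."""
    scale = 1 << frac_bits
    return [[round(v * scale) for v in row] for row in kernel]


def max_quantization_error(kernel, frac_bits):
    """Largest absolute difference between a real coefficient and its fixed-point value."""
    q = quantize_kernel(kernel, frac_bits)
    scale = 1 << frac_bits
    return max(abs(qv / scale - v)
               for qrow, row in zip(q, kernel)
               for qv, v in zip(qrow, row))


# Example input (assumption): a normalised 3x3 Gaussian kernel with sigma = 1.
sigma = 1.0
raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) for x in (-1, 0, 1)]
       for y in (-1, 0, 1)]
total = sum(map(sum, raw))
kernel = [[v / total for v in row] for row in raw]

q8 = quantize_kernel(kernel, 8)  # 8 fractional bits: each coefficient fits in 0..256
```

With 8 fractional bits this kernel becomes [[19, 32, 19], [32, 52, 32], [19, 32, 19]], whose entries sum to 256, and the worst-case rounding error per coefficient is \( 2^{-9} \). Each extra fractional bit halves that bound but widens every multiplier in the FPGA, which is the trade-off such a simulation helps explore.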

The algorithm developed in [7] is applied to detect and remove Gaussian noise. Standard images such as coins, trees and Lenna, corrupted with noise of means 0, 0.01 and 0.03 and variances from 0.01 to 0.05 for each mean, are used for performance evaluation [7].

III. HARDWARE IMPLEMENTATION

A. Image filtering process

A grayscale image is represented by a matrix of pixels with values ranging from 0 to 255. Sending a [512*512] image to the FPGA requires converting that image into a vector of 262 144 elements, as shown in figure 4 [1], where DATA is the pixel value and ADDR is the memory address of each pixel. To code this in VHDL, the pixel values are represented with 8 bits, black being 0x00 and white 0xFF. The convolution of an image with a [3*3] kernel requires extracting each [3*3] sub-image from the original image, from left to right and top to bottom.

![Fig.4 conversion of an image matrix to an image vector](image)
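The conversion of Fig. 4 and the scan order of the sub-images can be sketched as follows. This is a minimal Python illustration, assuming row-major addressing (ADDR = row * width + col) and extracting only windows that lie fully inside the image; the function names are hypothetical.

```python
def image_to_vector(img):
    """Flatten an image (list of rows of 8-bit values) into (ADDR, DATA) pairs,
    row by row, so that ADDR = row * width + col (row-major order, an assumption)."""
    width = len(img[0])
    return [(r * width + c, p) for r, row in enumerate(img) for c, p in enumerate(row)]


def windows_3x3(img):
    """Yield every 3x3 sub-image from left to right and top to bottom
    (only positions where the window fits inside the image)."""
    for r in range(len(img) - 2):
        for c in range(len(img[0]) - 2):
            yield [row[c:c + 3] for row in img[r:r + 3]]
```

For a [512*512] image the vector has 512 * 512 = 262 144 entries, and there are 510 * 510 complete [3*3] windows.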

The input and output buffers are implemented on the FPGA. The matrix-vector multiplication involves multiply-and-accumulate (MAC) operations. For efficient implementation and maximum speed-up, an integer arithmetic unit is used [8].
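The integer MAC step for one window can be sketched as below. This is a Python model under stated assumptions, not the design in [8]: pixels are 8-bit, coefficients are fixed-point integers with `frac_bits` fractional bits, and the result is rounded, shifted back to 8 bits and saturated. `mac_window` and `accumulator_bits` are hypothetical names.

```python
def mac_window(window, qkernel, frac_bits):
    """Filter one 3x3 window using integer arithmetic only.

    window:  3x3 list of 8-bit pixels (0..255).
    qkernel: 3x3 fixed-point coefficients; real value = q / 2**frac_bits.
    """
    acc = 0
    for prow, krow in zip(window, qkernel):
        for p, k in zip(prow, krow):
            acc += p * k  # one multiply-accumulate (MAC) step
    # Add half an LSB, then shift right: rounds to nearest instead of truncating.
    out = (acc + ((1 << frac_bits) >> 1)) >> frac_bits
    return min(out, 255)  # saturate to the 8-bit output range


def accumulator_bits(qkernel, pixel_max=255):
    """Smallest unsigned accumulator width that cannot overflow."""
    worst_case = pixel_max * sum(map(sum, qkernel))
    return worst_case.bit_length()
```

For example, with Q = [[19, 32, 19], [32, 52, 32], [19, 32, 19]] (a 3*3 Gaussian with \( \sigma = 1 \) quantized to 8 fractional bits), the worst-case sum is 255 * 256 = 65 280, so a 16-bit accumulator suffices, and a flat window of value 100 is returned unchanged as 100. Only integer multipliers, adders and a shift are needed, with no floating-point unit.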

B. Convolution operation

Convolution is widely used in computer science and image processing, including object recognition and image matching. A hardware implementation can be faster than a software implementation, because on an FPGA the filtering can be performed while the image is still being read. For this reason, two-dimensional convolution was implemented as a state machine on a Xilinx Virtex-5 FPGA platform, and Gaussian filters with different sigma values were implemented [9]. Smoothing can be performed by convolving the original image I(x,y) with a Gaussian mask G(i,j) of size h x w, as shown in equation (2): each output value is the sum of products between the input image and the small Gaussian matrix (here of size 3*3). A 2D convolution of a 3*3 input with a 3*3 mask is shown in figure 5 [9].

\[
f(x,y) = \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} G(i,j) I(x-i, y-j)
\]

(2)

![Fig 5. Convolution operation](image)

Convolution is the same as cross-correlation, except that the kernel is “flipped” horizontally and vertically. Convolution is commutative and associative [10]. Sobel edge detectors, such as the [3*3] Sobel filter used in [1], also work by convolution.
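The relation between convolution and cross-correlation can be demonstrated directly. The following Python sketch computes both over valid positions only (an assumption that sidesteps border handling); the function names are hypothetical.

```python
def correlate2d(img, k):
    """Cross-correlation over valid positions: out[r][c] = sum k[i][j] * img[r+i][c+j]."""
    m, n = len(k), len(k[0])
    return [[sum(k[i][j] * img[r + i][c + j] for i in range(m) for j in range(n))
             for c in range(len(img[0]) - n + 1)]
            for r in range(len(img) - m + 1)]


def flip(k):
    """Flip a kernel horizontally and vertically (a 180-degree rotation)."""
    return [row[::-1] for row in k[::-1]]


def convolve_via_flip(img, k):
    """Convolution computed as cross-correlation with the flipped kernel."""
    return correlate2d(img, flip(k))
```

With an asymmetric kernel the two operations differ, but a Gaussian kernel is symmetric, so flipping leaves it unchanged and, for the Gaussian filter, convolution and cross-correlation give identical results.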

The main advantages of using MEX-files over plain M-files are that loops and other operations that MATLAB handles slowly run much faster when written in a compiled language than in native MATLAB, and that existing programs can easily be ported to run from within MATLAB [11].

**FUTURE SCOPE**

The main objective of this work is to reduce the computational complexity of a 2D Gaussian filter. Reducing the number of elements, i.e. look-up tables (LUTs), directly reduces the hardware area, which can increase the speed of image processing and reduce the cost of large hardware. For this, we need to understand LUT compression techniques [12]. As a case study, a LUT-based interpolator was used as the core of a programmable noise generator able to output signals with different probability density functions (PDFs) [13].

Reducing the number of LUTs in this way extends the future scope of image processing with the Gaussian filter. This computation technique is used in industry, aerospace, medicine and many other fields.

**CONCLUSION**

Since the computational complexity of floating-point arithmetic is higher than that of fixed-point arithmetic, this work uses fixed-point arithmetic, with MATLAB and the Xilinx Vivado tool. The Gaussian filter runs much faster on an FPGA than on a CPU or GPU. The main reason is that the FPGA can perform the filtering while the image is still being read.

This paper presents the design and implementation of Gaussian filters with different kernel sizes. It was observed that floating-point arithmetic is not recommended for the FPGA implementation, as it consumes many FPGA resources. All of the modules were designed and implemented on a Xilinx Spartan-6 FPGA.

As the kernel size increases, the use of LUTs and slice registers increases because of the internal implementation of the FIFO memories.
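A first-order estimate shows why resource use grows with kernel size. The Python sketch below assumes that a k x k window needs k - 1 line-buffer FIFOs (generalising the two FIFOs used for a [3*3] window), 8-bit pixels and one MAC per kernel coefficient; it ignores how synthesis maps FIFOs to LUT RAM or block RAM, so it indicates trends only. The function names are hypothetical.

```python
def line_buffer_bits(k, width, pixel_bits=8):
    """Bits of FIFO storage for the k - 1 buffered lines feeding a k x k window."""
    return (k - 1) * width * pixel_bits


def window_register_bits(k, pixel_bits=8):
    """Flip-flops holding the k x k pixel window itself."""
    return k * k * pixel_bits


def macs_per_pixel(k):
    """Multiply-accumulate operations per output pixel for a k x k kernel."""
    return k * k


for k in (3, 5, 7):
    print(k, line_buffer_bits(k, 512), window_register_bits(k), macs_per_pixel(k))
```

For a 512-pixel-wide image, going from a 3*3 to a 7*7 kernel triples the line-buffer storage (8192 to 24 576 bits) and raises the window registers from 72 to 392 bits and the MACs per pixel from 9 to 49.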

**REFERENCES**