Fast Fourier transform (FFT) is an essential component in many digital signal processing and communications systems. The performance of the FFT component is a key factor in evaluating the overall system performance, and it is common to use it as a benchmark for the whole system. Many attempts have been made to enhance the FFT performance, both on algorithm and implementation levels. Software and hardware designs exist to implement this component. In this paper, an optimized hardware implementation of FFT processor on FPGA is presented, where the steps of radix-2 FFT algorithm are well analyzed and an optimized design is developed as a result, with full exploitation of the hardware platform capabilities to achieve optimum performance. The performance results of the proposed design are demonstrated, and compared to other related works and reference designs.