Research on a High-Performance General-Purpose Signal Processing Platform

0 Preface

Array signal processing is developing rapidly, and new algorithms and processing techniques continue to emerge. Signal processing systems must therefore adapt quickly to new algorithms and techniques, and systems developed with traditional hardware-centric design methods cannot meet this requirement. Developing a general-purpose computing platform that implements signal processing functions in software as far as possible has become a new trend in signal processing; concepts such as "software radar" and "software radio" are based on this idea.

Flexible software programming accommodates changes in algorithms, and simple hardware expansion accommodates changes in scale. This greatly improves system flexibility and substantially reduces development time and cost. To support this design approach, a signal processing module must be developed as the building block of the general-purpose computing platform. The module must meet the system's real-time processing requirements while remaining versatile and scalable.

1 System structure

With the rapid advance of microelectronics, processor speed continues to increase, but the computing power demanded by practical applications far exceeds what a single processor can provide. Using parallel processing to build a multiprocessor system is an effective way to meet this demand.

The purpose of parallel processing is to speed up the whole computation by having multiple processing units work on a task simultaneously, thereby reducing task execution time. The task is broken down into subtasks that are assigned to the processing units of the parallel processing system. In general, these subtasks cannot run completely independently: a calculation in one subtask may need data from another, so the processing units must exchange data. The time a processing unit spends waiting for exchanged data is the synchronization overhead between processing units. Parallel processing therefore adds the overhead of data communication and synchronization waiting.

Increasing the number of processing units is the primary means of reducing task execution time, and tasks should be divided more finely to increase their parallelism. However, adding processing units and dividing tasks more finely also increases the total communication traffic. Together with the idle time caused by synchronization and by uneven task allocation, this means that simply adding processing units to raise system processing power may cost more than it gains. Two aspects are therefore important when designing a parallel processing system: improving the performance of the processing units and improving the communication between them.
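The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not a model taken from the design: the function names are hypothetical, compute time is assumed to divide evenly among the units, and communication cost is assumed to grow linearly with the number of units.

```python
def execution_time(work, units, comm_per_unit, sync_wait=0.0):
    """Toy model (assumption): compute time shrinks as work/units, while
    communication cost grows linearly with the number of extra units."""
    compute = work / units
    communication = comm_per_unit * (units - 1)
    return compute + communication + sync_wait


def speedup(work, units, comm_per_unit, sync_wait=0.0):
    """Ratio of single-unit time to the modelled time on `units` units."""
    single = execution_time(work, 1, comm_per_unit)
    return single / execution_time(work, units, comm_per_unit, sync_wait)


def best_unit_count(work, comm_per_unit, max_units=64):
    """Unit count in 1..max_units that minimizes the modelled time."""
    return min(range(1, max_units + 1),
               key=lambda p: execution_time(work, p, comm_per_unit))


if __name__ == "__main__":
    for p in (1, 4, 10, 32, 64):
        print(p, round(speedup(100.0, p, 1.0), 2))
```

In this model the speedup rises at first, peaks, and then falls as communication dominates, which is why both faster processing units and cheaper communication matter.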

1.1 Processing unit selection

In communications, speech, and image processing, the dynamic range of the signal is limited, and fixed-point arithmetic generally meets the requirements. Radar and sonar signals, however, require a large dynamic range and high data precision. With fixed-point processing, data overflow or underflow can occur, and in severe cases processing becomes impossible. Shift scaling, or emulating floating-point operations on a fixed-point processor, greatly reduces program execution speed. To broaden the applicability of the computing platform, the general-purpose signal processing platform uses a floating-point processor.
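The overflow and underflow problems mentioned above can be reproduced in a few lines. The sketch below assumes a 16-bit Q15 fixed-point format without saturation; the format and the helper names are illustrative assumptions, not details given for the platform.

```python
def to_int16_wrap(x):
    """Wrap an integer into the 16-bit two's-complement range, as a
    fixed-point ALU without saturation logic would (assumed behaviour)."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def q15_add(a, b):
    """Add two Q15 values (ints in -32768..32767) with wrap-around."""
    return to_int16_wrap(a + b)


def q15_mul(a, b):
    """Multiply two Q15 values and truncate the result back to Q15."""
    return to_int16_wrap((a * b) >> 15)


if __name__ == "__main__":
    # Overflow: two large positive values wrap to a negative result.
    print(q15_add(30000, 10000))
    # Underflow: the product of two small values truncates to zero ...
    print(q15_mul(100, 100))
    # ... while the same product in floating point is still nonzero.
    print((100 / 32768) * (100 / 32768))
```

Avoiding these effects in fixed point requires extra scaling shifts around each operation, which is the speed penalty the text refers to.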

For the same workload, a "small" system built from high-performance processing units is more efficient than a "large" system built from lower-performance units, so the performance of each processing unit is important. Performance includes not only computation speed but also memory bandwidth and data communication speed. TI's TMS320C6000 series is the highest-performance general-purpose programmable DSP family in the industry, and the TMS320C6701 is a high-performance floating-point processor in that series. This DSP fully meets the processing unit performance requirements of the general-purpose computing platform, so the TMS320C6701 is selected as the processing unit of the signal processing module.

1.2 Design of communication network

Array signal processing requires multiple signal processing units working in parallel, with subtasks allocated to each processing unit of the parallel processing system. The data communication speed and synchronization time between subtasks depend not only on the communication speed of the processing units themselves, but also on the communication interconnect network that connects them. A complex network with rich communication links usually provides high data communication speed but is much harder to design and maintain. Using different types of communication network for different practical applications can reduce the complexity of the network.

In the interconnection structure design, the interconnection structure of the entire parallel signal processing system is divided into two levels: a system level interconnection structure and a module level interconnection structure.

The system-level interconnect structure is mainly used for communication between modules. In this design, the system-level control network and signal processing network are implemented by RaceWay and VME respectively. The module-level interconnect structure refers to the network structure within the signal processing module. The structure of the signal processing module is shown in Figure 1.

The signal processing module contains four DSPs that together provide 4 GFLOPS of peak processing capability. A shared bus interconnect structure is used within the module. In general, program code and working data should be stored in the on-chip RAM or local memory of each DSP, which reduces the number of shared memory accesses, reduces bus contention, and shortens memory access delay. Shared memory is typically used to exchange data between the four DSPs within a module and between modules.

To reduce the delay caused by bus contention among the DSPs in the module and to improve communication between DSPs, adjacent DSPs are also connected through bidirectional FIFOs to form a FIFO ring. This structure is well suited to pipeline processing: it minimizes the overhead of data movement and improves the communication speed between processors.
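The ring topology can be modelled in software to show why it suits pipelines. The following is a minimal sketch, not the hardware interface: the class and function names are hypothetical, FIFO depth is assumed unbounded, and each pipeline stage is assumed to run on one DSP and forward its result to the next DSP in the ring.

```python
from collections import deque


class FifoRing:
    """Software model of n nodes connected to their two ring neighbours
    by bidirectional FIFOs (one queue per direction of each link)."""

    def __init__(self, n=4):
        self.n = n
        self.links = {}
        for i in range(n):
            for j in ((i + 1) % n, (i - 1) % n):
                self.links[(i, j)] = deque()

    def send(self, src, dst, item):
        # Only adjacent nodes share a FIFO; other pairs need the shared bus.
        if (src, dst) not in self.links:
            raise ValueError("nodes %d and %d are not adjacent" % (src, dst))
        self.links[(src, dst)].append(item)

    def receive(self, src, dst):
        """Pop the oldest item that src sent to dst."""
        return self.links[(src, dst)].popleft()


def run_pipeline(ring, stages, samples):
    """Stage k runs on node k and passes its result to node k+1."""
    results = []
    for sample in samples:
        value = stages[0](sample)
        for k in range(1, len(stages)):
            ring.send(k - 1, k, value)
            value = stages[k](ring.receive(k - 1, k))
        results.append(value)
    return results


if __name__ == "__main__":
    ring = FifoRing(4)
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
    print(run_pipeline(ring, stages, [1, 2]))
```

Because each transfer uses a dedicated neighbour link, stage-to-stage traffic in this model never competes for the shared bus.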

Pipeline processing is widely used because it is simple and efficient. However, because it exploits only temporal parallelism and ignores spatial parallelism, its degree of parallelism is low and its speedup is limited. When the load of one pipeline stage is larger than that of the others, that stage becomes a processing bottleneck and system efficiency drops. Pipelines are therefore often combined with concurrent operation: on top of pipeline processing, spatial parallelism is applied to part of the pipeline, which is called a locally parallel, globally serial network. Its counterpart is the globally parallel, locally serial network, which applies spatial parallelism first and temporal parallelism second by running multiple pipelines in parallel.

The interconnection structure of the signal processing module, a shared bus combined with a FIFO ring, adapts well to these variants of pipeline processing.
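The bottleneck effect and the two hybrid arrangements can be compared with a simple steady-state throughput estimate. This is a sketch under idealized assumptions (no communication cost, perfectly divisible load); the function names and stage times are hypothetical and chosen only for illustration.

```python
def pipeline_throughput(stage_times, replicas=None):
    """Steady-state items per unit time for one pipeline. The slowest
    stage sets the rate; a stage served by r parallel units is assumed
    to behave like a stage r times faster (local parallelism)."""
    if replicas is None:
        replicas = [1] * len(stage_times)
    effective = [t / r for t, r in zip(stage_times, replicas)]
    return 1.0 / max(effective)


def parallel_pipelines_throughput(stage_times, n_pipelines):
    """Several identical pipelines working side by side on separate
    data streams (globally parallel, locally serial)."""
    return n_pipelines * pipeline_throughput(stage_times)


if __name__ == "__main__":
    # Three stages on three units: the 2.0 stage is the bottleneck.
    print(pipeline_throughput([1.0, 2.0, 1.0]))
    # Fourth unit added to the slow stage (locally parallel, globally serial).
    print(pipeline_throughput([1.0, 2.0, 1.0], replicas=[1, 2, 1]))
    # Four units arranged as two two-stage pipelines.
    print(parallel_pipelines_throughput([2.0, 2.0], 2))
```

In this model, doubling the units on the slow stage doubles the throughput of the whole pipeline, whereas adding a unit to a fast stage would change nothing.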
