Cost/Performance Issues Of Digital Signal Processors

Xo Vue

University of Minnesota, Morris

600 East 4th Street UMM# 142

Morris, MN 56267

(320) 589-0439

vuexo@cda.morris.umn.edu

 


 

 

ABSTRACT

Manufacturers of digital signal processors (DSPs) are producing chips with better performance, supply voltages below 2 V, and feature sizes at or below 0.35 µm [1].  One benefit of these design changes is a lower operating voltage and a lower cost for a better circuit design.  In this paper, some cost/performance issues of general-purpose processors and digital signal processors are compared by examining the use of hardware accelerators and architectural techniques such as dynamic concurrency scaling and dynamic superscalar architecture. Some possible solutions are then presented for the cost/performance issues of a multiprocessor digital signal processing (DSP) system for high-speed, real-time applications.

 

Keywords      

Digital signal processing (DSP), digital signal processors (DSPs), general-purpose microprocessor

 

1.  INTRODUCTION

Today, a wide range of products incorporate DSPs.  One example is a music synthesizer.  DSPs are also used in automotive, consumer electronics, graphics, instrumentation, medical, military, speech, and telecommunications applications. Examples include image transmission and compression in graphics, radar processing in the military, robotics in industry, and hearing aids and patient monitoring in medicine.  Most of these DSP applications require high levels of performance, concurrency, and the use of hardware acceleration [1].

 

What is a digital signal processor (DSP)? Basically, a DSP is a high-speed single-chip microprocessor or microcomputer designed to perform compute-intensive digital signal processing tasks [9].  An example is a programmable DSP chip that processes digital audio data streams or performs noise filtering in audio amplifiers. By contrast, a typical computer such as an IBM PC, designed for business and other general applications, is not optimized for digital signal processing algorithms such as digital filtering and Fourier analysis. With the aid of advanced architectures, parallel processing, and dedicated DSP instruction sets, digital signal processors can execute millions of instructions per second (MIPS) [6].

 

Permission is granted to make copies of this document for personal or

classroom use.  Copies are not to be made or distributed for profit or

commercial purposes.  To copy otherwise, or in any way publish this

material, requires written permission.

 

 

Current-generation DSP chips typically feature 16-bit to 24-bit designs and can deliver from 40 to over 100 MIPS [3].  This level of capability allows complicated algorithms to be executed in a small amount of time.

 

How does a digital signal processor work? DSP systems are designed to sample incoming analog signals at fixed time intervals. The sampling must be fast enough to describe the signal accurately, with enough resolution to keep the noise level low. In addition, the system must convert the signal into a long list of numbers that represent the amplitude (e.g., voltage) of the signal at these points. The accuracy of this approximation determines the system's performance: the sampling rate limits the highest signal frequency that can be represented, and the resolution determines the dynamic range that can be handled by the micro-controller [8].
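The sampling step described above can be illustrated with a minimal sketch. The function name `sample_and_quantize`, the 1 kHz test tone, and the 8 kHz sampling rate are assumptions chosen for illustration only; they do not come from the cited sources.

```python
import math

def sample_and_quantize(signal, sample_rate_hz, duration_s, bits, full_scale=1.0):
    """Sample signal(t) at fixed intervals and quantize to signed integer codes."""
    levels = 2 ** (bits - 1)                 # 32768 for a 16-bit converter
    n_samples = int(round(sample_rate_hz * duration_s))
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz               # fixed sampling interval
        x = signal(t) / full_scale           # normalise amplitude to about [-1, 1]
        code = int(round(x * levels))        # resolution is set by the bit count
        code = max(-levels, min(levels - 1, code))  # clamp to the converter range
        codes.append(code)
    return codes

def tone(t):
    """Hypothetical input: a 1 kHz sine wave at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000.0 * t)

# One millisecond of the tone, sampled at 8 kHz with 16-bit resolution.
samples = sample_and_quantize(tone, sample_rate_hz=8000, duration_s=0.001, bits=16)
```

The result is the "long list of numbers" that the rest of the system operates on; raising `bits` shrinks the rounding error, and raising `sample_rate_hz` adds more points per second.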

 

A micro-controller reacts to and controls events. A typical micro-controller application is the monitoring of a house: as the temperature rises, the controller causes the windows to open; if the temperature goes above a certain level, the air conditioner is activated; and if the system detects a burglar, the doors are locked and the windows barred. To interface with analog signals, a micro-controller requires additional components, namely data converters such as analog-to-digital (A/D) and digital-to-analog (D/A) converters.  In audio applications, A/D and D/A converters are electronic circuits that convert analog audio signals to digital audio signals and back.  Representative sampling frequencies range from 5.5125 kHz to 48 kHz, with a resolution of 16 bits or higher [5].

 

As digital signal processing becomes ubiquitous in both personal computers and embedded applications, designers must decide how best to implement signal-processing functions in their systems. The possibilities are limited: in most cases, designers choose between implementing DSP on dedicated DSP chips and implementing it on general-purpose microprocessors [2].

                 

2. DSPs vs. GENERAL-PURPOSE MICROPROCESSORS

The issues to consider when choosing a processor include the target applications, new and improved architectures, the cost of development, processor power usage, the execution of complex algorithms, enhanced software features, development tools, and performance.  Weighing these, designers must decide whether their applications should use digital signal processors or general-purpose microprocessors.

 

Before choosing a processor for their applications and DSP implementations, designers must recognize that designing computer systems involves difficult cost-performance trade-offs.  A system that offers high performance and sophisticated new features requires more advanced hardware. Likewise, a new architecture means a new instruction set, which in turn requires new development tools (assemblers, compilers, debuggers, etc.) [7].  This can mean more testing and debugging time and, hence, higher cost.

       

To avoid increasing the design cost, designers can choose to implement DSP on a general-purpose processor, such as Intel's Pentium processor. Although a general-purpose processor lacks dedicated DSP capabilities, it can execute most DSP tasks well enough for the results to be reliable.  How demanding the DSP tasks are depends greatly on the application, because not every application requires computationally intensive operations to complete its task.  Therefore, when a general-purpose processor is already present, using it can be a much simpler and cheaper way to obtain adequate performance than replacing the existing processor with a digital signal processor.

 

An example of a system in which it can be beneficial to use an existing general-purpose processor to implement DSP is a desktop PC.  Implementing DSP applications such as audio processing or modem signal processing on the general-purpose processor adds digital signal processing capability at little or no additional cost [2]. Other examples are cellular phones and personal digital assistants (PDAs).  In addition to keeping cost down, using a general-purpose processor for DSP functions reduces product size and lowers power consumption.

 

2.1 The better microprocessor

To understand whether a general-purpose processor is really well suited to DSP tasks, consider a common DSP filter algorithm, the finite impulse response (FIR) filter shown in Figure 1.  Samples are presented to the FIR filter sequentially, and the most recent samples are kept in a row of registers. The value in each register is multiplied by the corresponding filter coefficient a_j, and the products are summed to form the filter output [8].

 

 

 

Figure 1. Finite Impulse Response Filter
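The filter in Figure 1 can be written as y[n] = a_0 x[n] + a_1 x[n-1] + ... + a_(N-1) x[n-N+1]. The following is a minimal sketch of that structure, assuming floating-point samples and a delay line that starts out filled with zeros; the function name `fir_filter` is chosen here for illustration.

```python
from collections import deque

def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over j of a_j * x[n - j]."""
    # The delay line plays the role of the row of registers in Figure 1.
    delay_line = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    outputs = []
    for x in samples:
        delay_line.appendleft(x)      # newest sample first; the oldest drops off
        acc = 0.0
        for a_j, x_j in zip(coeffs, delay_line):
            acc += a_j * x_j          # one multiply-accumulate per filter tap
        outputs.append(acc)
    return outputs

# Feeding in a unit impulse returns the coefficients themselves.
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.25])
```

The inner loop performs one multiplication and one addition per tap, which is the operation the processors discussed in this section are compared on.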

 

The FIR filter will be implemented on two different processors, a DSP and a general-purpose microprocessor, to compare the number of instructions needed to complete the operation. The instructions for the FIR filter are as follows [2]:

 

DSP instructions:

 

move       #addr, r0                  ; load data address into r0

move       #Haddr, r4               ; load coefficient address into r4

rep           #Ntaps                    ; repeat the following instruction Ntaps times

mac          x0, y0, a                  x: (r0) +, x0     y: (r4) +, y0

 

 

GENERAL-PURPOSE PROCESSOR instructions:

 

loop:      mov         *r0, r3     ; load data into r3

                mov         *r1, r4     ; load coefficient into r4

                mpy         r3, r4, r5  ; multiply into r5

                add          r5, r6       ; add r5 into accumulator r6

                inc           r0             ; advance pointer into delay line

                inc           r1             ; advance pointer into coefficients

                dec          ctr            ; decrement loop counter

                jnz          loop         ; jump to top if more taps remain

 

By comparison, a general-purpose processor requires more instructions than a DSP to implement the same filter algorithm: eight instructions per filter tap, against a single repeated mac instruction on the DSP. Thus, although a general-purpose processor can execute most DSP tasks, it cannot outperform a digital signal processor. Executing DSP tasks on a general-purpose processor is slower because it requires more instruction cycles to implement a signal-processing algorithm.
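As a rough illustration of the gap, the instructions executed by the two listings can be counted as a function of the number of filter taps. This sketch assumes that the two address loads and the rep instruction execute once, that the mac instruction executes once per tap, and that the general-purpose loop body executes once per tap. It counts instructions, not clock cycles, since cycle counts depend on the particular processors.

```python
def dsp_instruction_count(n_taps):
    """Two address loads and one rep, then one mac per tap."""
    return 3 + n_taps

def gpp_instruction_count(n_taps):
    """Eight loop-body instructions are executed for every tap."""
    return 8 * n_taps

# Example: a 64-tap filter.
dsp_count = dsp_instruction_count(64)   # 67 instructions
gpp_count = gpp_instruction_count(64)   # 512 instructions
```

For long filters the ratio approaches eight to one under these assumptions.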

 

This results from the lack of many key architectural features of digital signal processors, such as a single-cycle multiply-accumulate (MAC) unit, hardware looping, multiple on-chip memory buses, and dedicated address generators that support modulo arithmetic [2]. These features are important because DSP algorithms require sampling of input sequences and repetitive execution, and are numerically intensive. The FIR filter algorithm just described demonstrates the computational burden placed on a DSP chip.

 

2.1.1  Some Important DSP features

The role of a DSP chip's address generator is to control the addresses sent to the program and data memories, specifying where information is to be read from or written to. It must support modulo arithmetic because DSP systems sample incoming signals at fixed intervals and typically keep the most recent samples in a fixed-size buffer. Modulo arithmetic reduces every address offset to a fixed set of values, such as 0 to 11 for a buffer of 12 entries, so that addressing wraps around to the start of the buffer instead of running past its end.  It does this by adding or subtracting N until the result is within the range 0 to N-1 [8].
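Modulo addressing is easiest to see in a circular buffer. The sketch below is a software imitation of what a DSP address generator does in hardware; the class name `CircularBuffer` and its methods are hypothetical and chosen only for this example.

```python
class CircularBuffer:
    """Fixed-size sample buffer whose index wraps around with modulo N."""

    def __init__(self, size):
        self.data = [0] * size
        self.size = size
        self.index = 0                      # next position to write

    def push(self, value):
        self.data[self.index] = value
        # Modulo keeps the index within 0 .. size-1 instead of running past the end.
        self.index = (self.index + 1) % self.size

    def recent(self, k):
        """Return the sample written k pushes ago (k = 0 is the newest)."""
        return self.data[(self.index - 1 - k) % self.size]

buf = CircularBuffer(4)
for sample in [1, 2, 3, 4, 5, 6]:
    buf.push(sample)                        # 5 and 6 overwrite the two oldest samples
```

Because the wrap-around is part of the address calculation, no samples need to be copied when a new one arrives, which is why the feature matters for delay lines such as the one in an FIR filter.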

 

The MAC is also an important feature of a digital signal processor.  A typical multiplier-accumulator (MAC) performs a 16-by-16-bit multiplication and then adds the 32-bit product to a 32-bit accumulation register in a single instruction cycle, which is called a MAC cycle [6]. Basically, the MAC is used to perform compute-intensive operations on incoming signals.  Because the multiplication step and the addition step are completed in the same cycle, the instruction-execution rate is higher.
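The arithmetic of a single MAC step can be modelled in a few lines. This is a sketch of the data widths described above, assuming signed 16-bit operands and a 32-bit accumulator that wraps on overflow; real devices may add guard bits or saturation, which are not modelled here. The names `wrap32`, `mac`, and `dot` are chosen for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap32(value):
    """Wrap an integer to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def mac(acc, x, y):
    """One multiply-accumulate step: acc + x * y with 16-bit operands."""
    if not (INT16_MIN <= x <= INT16_MAX and INT16_MIN <= y <= INT16_MAX):
        raise ValueError("operands must fit in 16 bits")
    return wrap32(acc + x * y)

def dot(xs, ys):
    """Sum of products, the core of an FIR filter, as repeated MAC steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

result = dot([1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6 = 32
```

On a DSP each call to `mac` corresponds to one instruction cycle, whereas the general-purpose listing in Section 2.1 needs separate multiply and add instructions.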

 

Although all microprocessors can perform data manipulation and mathematical calculation, it is difficult and expensive to make a device that is optimized for both tasks.  Therefore, a dedicated processor made for digital signal processing is often the better choice, since it offers several advantages over a general-purpose processor: a strong price/performance ratio for DSP applications, low power consumption when processing DSP tasks, an architecture that simplifies digital signal processing programming, and, often, the support of DSP-oriented application and development tools [2].

 

 

3. REAL-TIME DSP TASKS

Ideally, DSP tasks should be executed in real time, but in some applications, such as image processing, the DSP algorithms cannot be implemented in real time using available DSP devices owing to the high sampling rate required [6].  Hardware and software issues and the availability of suitable development tools impose these limitations.  DSP devices also fall far short of the requirements in several areas of real-time applications, such as speech recognition, image processing, and audio manipulation.

 

3.1.1 Errors in real-time applications

Many different kinds of problems can arise in a typical implementation of these real-time DSP applications.  Some of these problems are caused by the algorithm and some by the characteristics of a given processor [5].  Some causes of incorrect DSP implementation are:

 

·          Amplitude range of input signal.  If the samples of the input signal are out of range, their values will not be represented correctly. Samples that are out of range will be clipped to the maximum or minimum value, and the signal will be distorted.  This problem can be eliminated by proper scaling of the input signal.

·          Overflow problems.  An overflow may occur during calculations. In audio processing, for example, an overflow results when the signal rises above 0 dB, the maximum level an audio signal can reach without distortion.

·          Memory problems.  Each DSP processor has a finite amount of memory.  Therefore, the maximum amount of data needed for an application should be calculated in advance in order to avoid overwriting data.

·          Timing problems. A sequence of instructions is performed on each signal sample, and the sampling frequency dictates a time limit for that sequence.  The ability of the processor to meet that limit depends on the instruction cycle time. Therefore, programs should be designed to be as efficient as possible [5].
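Two of the problems in the list above, range or overflow errors and timing limits, can be made concrete with a short sketch. It assumes 16-bit signed samples; the function names are hypothetical, and the 40 MIPS and 48 kHz figures are simply values quoted earlier in this paper, combined here for illustration.

```python
def saturate16(value):
    """Clip a result to the 16-bit range; the signal is distorted but keeps its sign."""
    return max(-32768, min(32767, value))

def wrap16(value):
    """Let a result overflow as two's complement arithmetic would."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def instructions_per_sample(instruction_rate_hz, sample_rate_hz):
    """Whole instructions available between two consecutive samples."""
    return instruction_rate_hz // sample_rate_hz

clipped = saturate16(40000)    # stays at the positive maximum, 32767
wrapped = wrap16(40000)        # flips to a large negative value, -25536

# A 40 MIPS processor handling 48 kHz audio has this many instructions per sample.
budget = instructions_per_sample(40_000_000, 48_000)   # 833
```

Everything the program does for one sample, including filtering and I/O, has to fit inside that budget, which is why the instruction counts in Section 2.1 matter.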

 

In spite of these errors and disadvantages, DSP devices offer cost-effective solutions for the implementations of many DSP algorithms and applications [6].

 

3.1.2 Application performance using hardware

Most DSP applications require high levels of performance, concurrency, and the use of hardware acceleration [1]. Consider, for example, the manipulation of PC audio, which generates digital signals and consumes a significant amount of computational resources and memory.  When manipulating digital audio, each real-time audio task steals CPU cycles from other processing tasks, which decreases the execution speed of the applications.  This results from the numerical signal processing tasks all running at the same time.  When too many real-time algorithms run at once, the result can be an unstable system with a high technical cost.  To achieve a higher level of performance and concurrency, a DSP system has to trade hardware cost for performance.

 

The problem is that signal-processing functions impose an excessive burden on processors.  To handle this load, hardware accelerators that can handle more computations must be present, but hardware accelerators also mean a higher cost. Even so, there are three reasons why hardware acceleration is needed for digital signal processing.

 

1.        Signal processing capabilities. 

An example of the capabilities of DSP is a wavetable synthesizer that originally has 24 voices; adding a 48-voice hardware accelerator could increase the number of synthesized voices to 64 [1], leaving the user with more voices and power to work with.

 

2.        Signal processing performance. 

Hardware acceleration improves signal-processing performance for DSP algorithms that work better when more computational resources are available.  For example, making a synthesized voice sound more realistic requires more memory (RAM) so that many layers of sound can be combined into one to improve the overall quality of a single voice.

 

3.        Signal processing concurrency. 

Running applications concurrently requires PCs to be equipped with hardware accelerators. An example is replacing a half-duplex sound card, which cannot play and record audio simultaneously, with a full-duplex sound card that can.  This is beneficial because more work is done in a smaller amount of time.

 

3.1.3 Application performance without hardware

Next, consider other ways of achieving reasonable DSP performance without relying on hardware accelerators, which also lowers the technical cost. The main objective is to increase the instruction-execution rate.

 

3.1.3.1 Dynamic superscalar architecture

One architectural feature used to speed up the instruction-execution rate is dynamic superscalar architecture, which automatically executes nearby instructions in parallel whenever possible.  Although data dependencies within programs and restrictions on which types of instructions can execute in parallel often prevent programs from reaching the full potential instruction throughput, parallel execution significantly increases the average rate of instruction execution [2].  Combined with high clock speeds, dynamic superscalar architecture can yield high instruction-execution rates that compensate for a general-purpose processor's poor instruction-set efficiency in DSP applications.

 

Although dynamic superscalar architecture helps improve signal-processing power, it poses a problem.  Because instruction scheduling is dynamic, program-execution time is difficult to predict, which makes things hard for DSP programmers [2]. Poor execution-time prediction can lead to serious programming problems, because most DSP applications are implemented in real time or have real-time constraints.  This is why we sometimes still rely on hardware accelerators to achieve the desired performance.

 

4. PERFORMANCE OPTIMIZATION

In addition to hardware accelerators and boosting of a processor's instruction-execution rate, we can also strengthen DSP performance by increasing the amount of DSP work the processor accomplishes per instruction.

 

We can add Single Instruction Multiple Data (SIMD) instruction-set extensions to processors. SIMD instructions partition registers and ALUs so that multiple items of data are held in one register or memory location and one instruction can process them in parallel [2]. For example, a 64-bit register can be partitioned into eight 8-bit, four 16-bit, or two 32-bit data elements, or left as one 64-bit data element.  A SIMD instruction performs an operation such as addition or subtraction on multiple pairs of data elements within the ALU using only one instruction [4].
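The register partitioning can be imitated in software to show what a single packed instruction computes. The sketch below treats a Python integer as a 64-bit register holding four unsigned 16-bit lanes and assumes wrap-around (non-saturating) addition; the helper names are hypothetical and do not correspond to any particular instruction set.

```python
LANES = 4          # four 16-bit data elements in one 64-bit register
LANE_BITS = 16
LANE_MASK = 0xFFFF

def pack16(values):
    """Pack four unsigned 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (LANE_BITS * i)
    return word

def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]

def packed_add16(a, b):
    """Add the four lanes independently; a carry never crosses a lane boundary."""
    result = 0
    for i in range(LANES):
        shift = LANE_BITS * i
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift
    return result

a = pack16([1, 2, 3, 0xFFFF])
b = pack16([10, 20, 30, 1])
sums = unpack16(packed_add16(a, b))   # the last lane wraps around to 0
```

The loop here runs four times, but on a SIMD processor the four lane additions are carried out by one instruction, which is where the gain in work per instruction comes from.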

 

SIMD instructions are effective in many DSP applications.  The Intel Pentium processor uses SIMD instructions in its multimedia extensions (MMX), which considerably improve the Pentium's performance on such workloads.  Although SIMD instructions may improve the processor, these extensions also introduce complications such as new bugs.

 

To implement these extensions and still maintain operating-system compatibility, Intel designed the MMX instructions to share registers with the processor's floating-point unit [2].  As a result, DSP programs may incur a penalty of many cycles when switching between floating-point arithmetic and MMX modes, and a program that switches back and forth repeatedly puts a burden on the processor. However, since few DSP applications require both fixed-point and floating-point arithmetic, switching between MMX and floating-point modes is rarely a concern.

 

4.1 Balanced architecture

Another technique used to optimize the performance of a DSP processor is the balanced architecture [1].  This architecture makes it possible to approach the performance of expensive high-end systems.  The balanced architecture reconciles the conflicting needs of signal processing functions, such as concurrency, performance, and capabilities, with the assistance of an ideal hardware accelerator that incorporates a programmable core. The programmable core preserves the advantages of programmable systems, which are [1,7]:

 

·          Software solutions allow the same hardware to be reconfigured to serve multiple functions. For example, a Dolby Digital decoder in a DVD playback scenario.

·          Software solutions allow field driver upgrades to support evolving software standards. For example, alternative multi-channel audio decoders such as MPEG-2.

·          Software solutions accelerate development. Bugs in hardware require redesign and modifications to masks, while bugs in software can be fixed by changing the code.

·          Software solutions make it easy to customize algorithms and feature sets.

 

The key advantage of the balanced architecture is that future enhancements, such as new hardware or new software, are not necessary to improve the overall performance or quality of DSP processing. In the future, the balanced architecture may make it possible for moderately priced systems to support the capabilities of high-end systems with only a small cost-performance penalty.

 

4.1.1 Dynamic Concurrency Scaling

Another advantage of the balanced architecture is that it supports dynamic concurrency scaling [1].  Dynamic concurrency scaling ensures that DSP resources are always utilized to the fullest extent possible, given the concurrency requirements at the moment of execution.  It goes into effect when a DSP task needs additional resources while many other tasks are running at the same time: those tasks receive a message requesting that they free up resources for DSP processing.

 

Consider the example of an Internet game.  While the game is running, it requires the modem to send and receive signals to and from the Internet.  In this case, the DSP processor must free up resources, for example by reducing the quality of the game's sound or graphics.  Alternatively, the DSP processor can transfer some of the signals that do not necessarily require DSP processing over to the host processor, or it can refine its algorithm. To achieve the highest level of performance, DSP tasks usually consume all of the digital signal-processing resources available.
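The idea of asking running tasks to give up resources can be sketched as a simple scheduler policy. Everything in this example is hypothetical: the `Task` class, the priorities, the load figures, and the rule of degrading the lowest-priority task first are assumptions made for illustration and are not taken from [1].

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int        # higher means more important
    costs: list          # DSP load per quality level, best quality first
    level: int = 0       # index of the quality level currently in use

    @property
    def cost(self):
        return self.costs[self.level]

def scale_to_fit(tasks, budget):
    """Lower the quality of low-priority tasks one step at a time until the load fits."""
    while sum(t.cost for t in tasks) > budget:
        candidates = [t for t in tasks if t.level < len(t.costs) - 1]
        if not candidates:
            return False                 # nothing left to degrade
        victim = min(candidates, key=lambda t: t.priority)
        victim.level += 1                # ask this task to free up resources
    return True

# Internet game: the modem cannot be degraded, but sound and graphics can.
modem = Task("modem", priority=3, costs=[20])
audio = Task("audio", priority=2, costs=[30, 15])
graphics = Task("graphics", priority=1, costs=[40, 25, 10])
fits = scale_to_fit([modem, audio, graphics], budget=70)
```

With these made-up numbers the graphics task is stepped down twice, which brings the total load from 90 to 60 and leaves the modem and audio tasks untouched.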

 

4.2  Other solutions

If additional processor power is needed to handle DSP programs and applications, adopting a reduced instruction set computer (RISC) architecture can improve performance, lower cost, and lower power consumption [2,4,6].  Basically, RISC reduces the instruction set to a small number of simple instructions. This approach minimizes the amount of hardware circuitry (gates and transistors) needed to build a processor.  It also leaves extra space on the chip, so designers can add extra devices to make the remaining instructions execute more quickly. The power of such a processor comes not from the operations in its instruction set but from the fact that it can execute each instruction very quickly, typically in a few billionths of a second.

 

5. CONCLUSION

To reduce cost and improve performance, designers may in some cases integrate a general-purpose processor and a DSP into one processor, because most applications still require a general-purpose processor to handle processing tasks other than signal processing. Integrating the two can lead to a more powerful computing system that can execute both signal processing tasks and other computing tasks.

 

However, having two processors also contradicts several common design objectives: lowering the system part count, reducing power consumption, minimizing the size of the processor, and lowering its cost [2]. Therefore, to reduce the difficulty of having two processors, designers would have to integrate the system's functionality into one processor.  Reducing the processor count from two to one also leads to a smaller instruction set and a smaller tool suite [7].

 

To avoid poor performance, all characteristics of the processor should be examined, and the algorithms should be designed according to those characteristics.  The ideal is to choose the functions that best suit your DSP applications; when these functions are linked together, the result is a lower-cost, lower-power, and smaller processor.  This would optimize central-processing tasks and maximize DSP tasks.

 

6. REFERENCES

 

[1]   Barish, J., Croteau, J. “Use programmable DSPs for cost-effective PCI digital audio design.” Electronic Design, v46 n9, pp.44-48 (April 20, 1998).

 

[2]   Blalock, Garrick. “General-purpose microprocessors for DSP applications: Consider the trade-offs.” EDN, v42 n22, pp.165-170 (October 23, 1997).

 

[3]   Bursky, Dave. “Higher-Throughput DSP Chips Take On Complex Applications.” Electronic Design, v46 n12, pp.80(1) (May 25, 1998).

 

[4]   Caulk, Bob. “Optimize DSP Design With An Extensible Core.” Electronic Design, v44 n2, pp.81-85 (January 22, 1996).

 

[5]   Certic, J., Dobrosavljevic, Z., Milic, L., “Implementations of Basic DSP Algorithms on ADSP-2181.” 4th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services. TELSIKS'99 . IEEE. Part vol.2, pp.498-501 (1999)

 

[6]   Deka, Rabin. “A comprehensive study of digital signal processing devices.” Microprocessors and Microsystems, v19 n4, pp.209-221, (May 1995).

 

[7]   Gold, Irving. “DSP: There's a Simpler Way.” Electronic News (1991), v46 i31, p.24 (July 31, 2000).

 

[8]   Griffin, Grant. Iowegian's DSP Guru. “DSP Basics and Facts.” http://www.dspguru.com/info/faqs/index2.html

 

[9]   Herrin, E. Golden. “DSPs and CNCs.” Modern Machine Shop, v68 n8, pp.144-146, (January 1996).