Cost/Performance Issues of Digital Signal Processors
Xo Vue
University of Minnesota, Morris
600 East 4th Street UMM# 142
Morris, MN 56267
(320) 589-0439
vuexo@cda.morris.umn.edu
Manufacturers of digital signal processors (DSPs) are producing DSP chips with
better performance, supply voltages below 2 V, and feature sizes at or below
0.35 µm [1]. One benefit of changing these design characteristics of a DSP chip
is that it reduces the operating voltage and lowers the cost of a better
circuit design. This paper compares some cost/performance issues of
general-purpose processors and digital signal processors by examining the
basic use of hardware accelerators and architectural techniques such as
dynamic concurrency scaling and dynamic superscalar architecture. It then
presents some possible solutions to these cost/performance issues for
multiprocessor digital signal processing (DSP) systems in high-speed,
real-time applications.
Digital signal processing
(DSP), digital signal processors (DSPs), general-purpose microprocessor
1. INTRODUCTION
Today, a wide range of products incorporate DSPs. One example is the music
synthesizer. DSPs are also used in automotive, consumer electronics, graphics,
instrumentation, medical, military, speech, and telecommunications
applications. Typical uses within these areas include image transmission and
compression in graphics, radar processing in the military, robotics in
industry, and hearing aids and patient monitoring in medicine. Most of these
DSP applications require high levels of performance, concurrency, and the use
of hardware acceleration [1].
What is a digital signal processor (DSP)? Basically, a DSP is a high-speed,
single-chip microprocessor or microcomputer designed to perform
computation-intensive digital signal processing tasks [9]. An example is a
programmable DSP chip that processes digital audio data streams or filters
noise in audio amplifiers. By contrast, a typical computer such as an IBM PC,
designed for business and other general applications, is not optimized for
digital signal processing algorithms such as digital filtering and Fourier
analysis. With the aid of advanced architectures, parallel processing, and
dedicated DSP instruction sets, digital signal processors can execute millions
of instructions per second (MIPS) [6].
Permission is granted to make copies of this document for personal or
classroom use. Copies are not to be made or distributed for profit or
commercial purposes. To copy otherwise, or in any way publish this material,
requires written permission.
Current-generation DSP chips are mostly 16-bit to 24-bit designs and can
deliver from 40 to over 100 MIPS [3]. This level of capability allows
complicated algorithms to be executed in a small amount of time.
How does a digital signal processor work? DSP systems sample incoming analog
signals at fixed time intervals. The sampling must be fast enough to describe
the signal accurately, with enough resolution to keep the noise level low. The
system then converts the signal into a long list of numbers that represent the
amplitude (e.g., voltage) of the signal at these points. The accuracy of this
approximation determines the system's performance: the sampling rate
determines the highest signal frequency that can be represented, and the
resolution determines the dynamic range that can be handled by the
micro-controller [8].
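The sampling step described above can be illustrated with a minimal Python sketch. It is not taken from the cited sources; the function name `sample_and_quantize`, the 1 kHz test tone, and the 8 kHz sampling rate are assumptions chosen only for illustration.

```python
import math

def sample_and_quantize(signal, sample_rate_hz, duration_s, bits, full_scale=1.0):
    """Sample signal(t) at fixed intervals and convert each sample to a signed integer code.

    signal         -- function of time in seconds returning an amplitude
    sample_rate_hz -- number of samples taken per second
    bits           -- resolution of each sample (e.g. 16)
    full_scale     -- amplitude that maps to the largest positive code
    """
    levels = 2 ** (bits - 1)                      # 32768 for 16 bits
    n_samples = int(round(sample_rate_hz * duration_s))
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # fixed sampling interval
        code = int(round(signal(t) / full_scale * (levels - 1)))
        code = max(-levels, min(levels - 1, code))  # keep the code within range
        codes.append(code)
    return codes

def tone(t):
    """Hypothetical test input: a 1 kHz sine wave with amplitude 1.0."""
    return math.sin(2 * math.pi * 1000.0 * t)

# One millisecond of the tone sampled at 8 kHz with 16-bit resolution.
codes = sample_and_quantize(tone, 8000, 0.001, 16)
```

The result is the "long list of numbers" the text refers to: eight integer codes, one per sampling instant.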
A micro-controller reacts to and controls events. A typical micro-controller
application is the monitoring of a house. As the temperature rises, the
controller causes the windows to open. If the temperature goes above a certain
level, the air conditioner is activated. In addition, if the system detects a
burglar, the doors are locked and the windows barred. To interface to analog
signals, a micro-controller requires additional components: data converters
such as analog-to-digital (A/D) and digital-to-analog (D/A) converters. In
audio applications, A/D and D/A converters are electronic circuits that
convert analog audio signals to digital audio signals and back. Representative
sampling frequencies range from 5.5125 kHz to 48 kHz, and a resolution of 16
bits or higher is used [5].
As digital signal processing becomes ubiquitous in both personal computers and
embedded applications, designers must decide how best to implement
signal-processing functions in their systems. The options are limited: in most
cases, designers choose between implementing DSP on dedicated DSP chips and
implementing it on general-purpose microprocessors [2].
2. DSPs vs. GENERAL-PURPOSE MICROPROCESSORS
Some issues that must be considered relating to the
choice of processor are the applications, new and improved architectures, the
cost of development, processor power usage, processor execution of complex
algorithms, new enhanced features within software, development tools, and
performance. Therefore, designers must
also decide whether their applications should use digital signal processors or
general-purpose microprocessors.
Before designers settle on a processor for their applications and DSP
implementations, they must recognize that designing computer systems involves
difficult cost/performance trade-offs. For example, a system that offers high
performance and sophisticated new features requires more advanced hardware. A
new architecture means a new instruction set, which in turn requires new
development tools (assemblers, compilers, debuggers, etc.) [7]. This can mean
more testing and debugging time and, hence, an increase in cost.
To avoid this increase in design cost, designers can choose to implement DSP
on a general-purpose processor, such as Intel's Pentium processor. Although a
general-purpose processor lacks dedicated DSP capabilities, it can execute
most DSP tasks with reliable results. How demanding the DSP tasks are depends
greatly on the application, because not all applications require
computation-intensive operations. In such cases, using a general-purpose
processor is a simpler way to achieve the required performance and lower the
design cost than replacing the existing processor with a digital signal
processor.
An example of a system in which it can be beneficial to use an existing
general-purpose processor to implement DSP is a desktop PC. Implementing DSP
applications such as audio processing or modem signal processing on the
general-purpose processor adds digital signal processing capability at little
or no additional cost [2]. Other examples are cellular phones and personal
digital assistants (PDAs). In addition to keeping cost down, using a
general-purpose processor for DSP functions reduces product size and lowers
power consumption.
2.1 The better microprocessor
To understand whether a general-purpose processor is really well suited for
DSP tasks, consider a common DSP filter algorithm, the finite impulse response
(FIR) filter (see Figure 1). Samples are presented to the FIR filter
sequentially, and the most recent samples are kept in a row of registers. The
value in each register is multiplied by the corresponding filter coefficient
a_j, and the products are summed to form the filter output [8].

Figure 1.
Finite Impulse Response Filter
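As a minimal sketch of the filter described above, the following Python function keeps the most recent samples in a delay line, multiplies each by its coefficient a_j, and sums the products. The function name `fir_filter` and the example coefficients are illustrative assumptions, not part of the cited material.

```python
def fir_filter(samples, coeffs):
    """Compute y[n] = sum over j of a_j * x[n - j].

    Samples before the start of the input are treated as zero.
    """
    delay_line = [0.0] * len(coeffs)      # row of registers holding recent samples
    output = []
    for x in samples:
        # Shift the newest sample in; the oldest sample falls off the end.
        delay_line = [x] + delay_line[:-1]
        acc = 0.0
        for a, d in zip(coeffs, delay_line):
            acc += a * d                  # one multiply-accumulate per tap
        output.append(acc)
    return output

# A unit impulse reproduces the coefficients at the output.
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.25])
```

The inner loop performs one multiplication and one addition per tap, which is the operation the instruction listings that follow implement on each processor.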
The FIR filter is implemented below on two different processors, a DSP and a
general-purpose microprocessor, to compare the number of instructions needed
to complete the operation. The instructions are as follows [2]:
DSP instructions:
        move  #addr,r0      ; load data address into r0
        move  #Haddr,r4     ; load coefficient address into r4
        rep   #Ntaps        ; repeat the following instruction
        mac   x0,y0,a  x:(r0)+,x0  y:(r4)+,y0
GENERAL-PURPOSE PROCESSOR instructions:
loop:   mov   *r0,r3        ; load data into r3
        mov   *r1,r4        ; load coefficient into r4
        mpy   r3,r4,r5      ; multiply into r5
        add   r5,r6         ; add r5 into accumulator r6
        inc   r0            ; increment r0 to read next delay-line sample
        inc   r1            ; increment r1 to read next coefficient
        dec   ctr           ; decrement loop counter
        jnz   loop          ; jump to top if more taps remain
In comparison, a general-purpose processor requires more instructions than a
DSP to implement the same filter algorithm. Despite being able to execute most
DSP tasks, it cannot outperform a digital signal processor. The drawback of
using a general-purpose processor for DSP tasks is slower execution, because
it requires more instruction cycles to implement a signal-processing
algorithm.
This results from the lack of many key architectural features of digital
signal processors, such as a single-cycle multiply-accumulate (MAC) unit,
hardware looping, multiple on-chip memory buses, and dedicated address
generators that support modulo arithmetic [2]. These features are important
because DSP algorithms sample input sequences, execute repetitively, and are
numerically intensive. The FIR filter algorithm just described demonstrates
the computational burden on a DSP chip.
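The difference between the two listings in Section 2.1 can be tallied with a short Python sketch. It counts executed instructions only, read directly from the listings: the DSP version executes two moves, one rep, and one mac per tap, while the general-purpose loop executes eight instructions per tap. The function names are hypothetical, and the count says nothing about cycles per instruction or clock speed.

```python
def dsp_instruction_count(n_taps):
    """Instructions executed by the DSP listing: 2 moves + 1 rep + one mac per tap."""
    return 3 + n_taps

def gpp_instruction_count(n_taps):
    """Instructions executed by the general-purpose listing: 8 per loop iteration."""
    return 8 * n_taps

def instruction_ratio(n_taps):
    """How many times more instructions the general-purpose listing executes."""
    return gpp_instruction_count(n_taps) / dsp_instruction_count(n_taps)

# For a hypothetical 64-tap filter.
dsp_64 = dsp_instruction_count(64)
gpp_64 = gpp_instruction_count(64)
```

As the number of taps grows, the ratio approaches eight, the size of the general-purpose loop body.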
2.1.1 Some Important DSP features
The role of a DSP chip's address generator is to control the addresses sent
to the program and data memories, specifying where information is to be read
from or written to. It must support modulo arithmetic because DSP systems
sample incoming signals at fixed intervals and keep only the most recent
samples, so addresses must wrap around at the end of the sample buffer. Modulo
arithmetic reduces every number to a fixed set, such as the interval 0 to 11
(rather than 1 to 12) for N = 12. It does this by repeatedly adding or
subtracting N until the result is within the range 0 to N-1 [8].
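The wrap-around behavior can be sketched in Python as follows. The `wrap` function mirrors the add-or-subtract-N description above, and the `CircularBuffer` class is a hypothetical illustration of how modulo addressing keeps a fixed-size window of recent samples; a real address generator does this in hardware.

```python
def wrap(address, n):
    """Reduce address to the range 0..n-1 by repeatedly adding or subtracting n."""
    while address >= n:
        address -= n
    while address < 0:
        address += n
    return address

class CircularBuffer:
    """Fixed-size sample buffer addressed with modulo arithmetic."""

    def __init__(self, size):
        self.size = size
        self.data = [0.0] * size
        self.index = 0                    # next position to write

    def push(self, sample):
        """Store a new sample, overwriting the oldest one."""
        self.data[self.index] = sample
        self.index = wrap(self.index + 1, self.size)

    def recent(self, k):
        """Return the sample pushed k steps ago (k = 0 is the newest)."""
        return self.data[wrap(self.index - 1 - k, self.size)]

buf = CircularBuffer(3)
for s in [1.0, 2.0, 3.0, 4.0]:
    buf.push(s)
```

After four pushes into a three-entry buffer, the oldest sample has been overwritten and the three most recent remain addressable without moving any data.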
The MAC is also an important feature of a digital signal processor. A typical
multiplier-accumulator (MAC) performs a 16-by-16-bit multiplication and adds
the 32-bit product to a 32-bit accumulation register in a single instruction
cycle, called a MAC cycle [6]. The MAC is used to perform
computation-intensive operations on incoming signals. Because the
multiplication step and the addition step are completed in the same cycle, the
result is a faster instruction-execution rate.
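The arithmetic of a MAC cycle can be modeled in a few lines of Python. This is a sketch under stated assumptions: the accumulator is taken to be exactly 32 bits wide with two's-complement wrap-around, whereas many real DSPs add guard bits or saturation logic. The names `mac` and `dot_product` are illustrative.

```python
def mac(acc, x, y):
    """Model one MAC cycle: add the product of two 16-bit values to a 32-bit accumulator.

    The result wraps around as a 32-bit two's-complement number (an assumption;
    actual devices may provide guard bits or saturate instead).
    """
    total = (acc + x * y) & 0xFFFFFFFF
    if total & 0x80000000:                # reinterpret the sign bit
        total -= 0x100000000
    return total

def dot_product(xs, ys):
    """Accumulate a sum of products, one MAC per pair of inputs."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

result = dot_product([1, 2, 3], [4, 5, 6])
```

On a DSP each call to `mac` corresponds to a single instruction cycle, while a processor without a MAC unit needs a separate multiply and add.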
Although all microprocessors can perform data manipulation and mathematical
calculation, it is difficult and expensive to make a device that is optimized
for both tasks. Therefore, a dedicated DSP is often the better choice for
digital signal processing, since it offers several advantages over a
general-purpose processor: a strong price/performance ratio for DSP
applications, low power consumption when processing DSP tasks, an architecture
that simplifies DSP programming, and, often, the support of DSP-oriented
applications and development tools [2].
3. REAL-TIME DSP TASK
Ideally, DSP tasks should be executed in real time, but in some applications,
such as image processing, the DSP algorithms cannot be implemented in real
time on available DSP devices owing to the high sampling rate required [6].
Hardware and software issues and the availability of suitable development
tools impose these limitations. DSP devices also fall far short of the
requirements in several areas of real-time applications, such as speech
recognition, image processing, and audio manipulation.
3.1.1 Errors in real-time application
Many different kinds of problems can arise in a typical implementation of
these real-time DSP applications. Some are caused by the algorithm, and some
by the characteristics of a given processor [5]. Some causes of incorrect DSP
implementation are:
· Amplitude range of input signal. If samples of the input signal are out of
range, their values will not be represented correctly. Out-of-range samples
are clipped to the maximum or minimum value, and the signal is distorted. This
problem can be eliminated by proper scaling of the input signal.
· Overflow problems. An overflow may occur during calculations. In audio
processing, for example, an overflow results when the signal rises above
0 dB, the maximum level an audio signal can reach without distortion.
· Memory problems. Each DSP processor has a finite amount of memory.
Therefore, the maximum amount of data needed by an application should be
calculated in advance to avoid overwriting data.
· Timing problems. A sequence of instructions is performed on each signal
sample, and the sampling frequency dictates a time limit for that sequence.
The ability of the processor to meet that limit depends on its instruction
cycle time. Therefore, programs should be designed to be as efficient as
possible [5].
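Two of the items above lend themselves to a short Python sketch: clipping versus scaling of out-of-range samples, and the per-sample instruction budget imposed by the sampling frequency. The function names, the 16-bit range, and the example figures (40 MIPS, 48 kHz) are assumptions for illustration; the MIPS and sampling-rate values are drawn from the ranges quoted earlier in this paper.

```python
def saturate16(value):
    """Clip a value to the signed 16-bit range, as happens to out-of-range samples."""
    return max(-32768, min(32767, value))

def scale_to_range(samples, limit=32767):
    """Scale all samples by a common factor so the peak fits within +/- limit."""
    peak = max(abs(s) for s in samples)
    if peak <= limit:
        return list(samples)              # already in range, nothing to do
    return [int(s * limit / peak) for s in samples]

def instructions_per_sample(mips, sample_rate_hz):
    """Upper bound on instructions available between two consecutive samples."""
    return int((mips * 1_000_000) // sample_rate_hz)

clipped = [saturate16(s) for s in [65534, -32767, 0]]
scaled = scale_to_range([65534, -32767, 0])
budget = instructions_per_sample(40, 48000)
```

Clipping alters the relationship between the samples, while scaling preserves it at the cost of amplitude. The budget shows how many instructions the per-sample processing may use before the next sample arrives.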
In spite of these errors and disadvantages, DSP
devices offer cost-effective solutions for the implementations of many DSP
algorithms and applications [6].
3.1.2 Application performance using hardware
Most DSP applications require high levels of performance, concurrency, and
the use of hardware acceleration [1]. Consider the manipulation of PC audio,
which generates digital signals and consumes a significant amount of
computational resources and memory. When digital audio is manipulated, each
real-time audio task steals CPU cycles from other processing tasks, which
decreases the execution speed of the applications. This results from the
numerical signal processing tasks all running at the same time. When too many
real-time algorithms run at the same time, the result can be an unstable
system with a high technical cost. To achieve a higher level of performance
and concurrency, a DSP system has to trade hardware cost for performance.
The problem is that signal-processing functions
impose an excessive burden on processors.
To handle these excessive tasks, hardware accelerators that can handle
more computations must be present, but having hardware accelerators also means
a higher cost. Even though hardware acceleration imposes a cost, here are three
reasons why hardware acceleration is needed for digital signal processing.
1. Signal processing capabilities. An example of the capabilities of DSP is a
wavetable synthesizer that originally has 24 voices; when a hardware
accelerator of 48 voices is added, the number of synthesized voices could
increase to 64 [1]. This leaves the user with more voices and power to work
with.
2. Signal processing performance. Hardware acceleration improves the
performance of DSP algorithms that work better when more computational
resources are available. An example is making a synthesized voice sound more
realistic: more memory (RAM) is required to combine many layers of sound into
one and improve the overall quality of a single voice.
3. Signal processing concurrency. Running applications concurrently requires
PCs to be equipped with hardware accelerators. An example is replacing a
half-duplex sound card, which cannot play and record audio simultaneously,
with a full-duplex sound card, which can. This is beneficial because more work
is done in a smaller amount of time.
3.1.3 Application performance without hardware
Next, consider other options for achieving reasonable DSP performance without
relying on hardware accelerators, which also lowers the technical cost. The
main objective is to increase the instruction-execution rate.
3.1.3.1 Dynamic superscalar architecture
An architectural feature used to increase the instruction-execution rate is
dynamic superscalar architecture. A dynamic superscalar processor
automatically executes nearby instructions in parallel whenever possible.
Although data dependencies within programs and restrictions on which types of
instructions can execute in parallel often prevent programs from reaching the
full potential instruction throughput, parallel execution significantly
increases the average rate of instruction execution [2]. Combined with high
clock speeds, dynamic superscalar architecture can yield high
instruction-execution rates that compensate for a general-purpose processor's
poor instruction-set efficiency in DSP applications.
Although dynamic superscalar architecture is effective at improving
signal-processing power, it poses a problem. Because instruction scheduling is
dynamic, program-execution time is difficult to predict, which makes
programming hard for DSP programmers [2]. Poor execution-time prediction can
lead to serious programming problems, because most DSP applications are
implemented in real time or have real-time constraints. This is why designers
sometimes still rely on hardware accelerators to achieve the desired
performance.
4. PERFORMANCE OPTIMIZATION
In addition to using hardware accelerators and boosting a processor's
instruction-execution rate, we can also strengthen DSP performance by
increasing the amount of DSP work the processor accomplishes per instruction.
We can add single-instruction, multiple-data (SIMD) instruction-set extensions
to processors. SIMD instructions partition registers and ALUs so that multiple
items of data are held in one register or memory location and one instruction
can process the data in parallel [2]. For example, consider a SIMD processor
with a 64-bit register. The register can be partitioned into eight 8-bit, four
16-bit, or two 32-bit data elements, or used as a single 64-bit data element.
A SIMD instruction performs an operation such as addition or subtraction on
multiple pairs of data elements within the ALU using only one instruction
[4].
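The register partitioning described above can be modeled in Python by treating an integer as a 64-bit register split into lanes. This is only a software sketch of the idea: the helper names `pack`, `unpack`, and `packed_add` are hypothetical, the loop visits lanes one at a time where hardware would process them together, and lane overflow is assumed to wrap around rather than saturate.

```python
def pack(lanes, lane_bits):
    """Pack a list of unsigned lane values into one integer, lane 0 in the low bits."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & mask) << (i * lane_bits)
    return word

def unpack(word, lane_bits, register_bits=64):
    """Split a register-sized integer back into its lane values."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask
            for i in range(register_bits // lane_bits)]

def packed_add(a, b, lane_bits, register_bits=64):
    """Add two packed registers lane by lane; carries never cross lane boundaries."""
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(register_bits // lane_bits):
        shift = i * lane_bits
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift   # discard carry out of the lane
    return result

# Four 16-bit additions expressed as one packed operation.
a = pack([1, 2, 3, 4], 16)
b = pack([10, 20, 30, 40], 16)
sums = unpack(packed_add(a, b, 16), 16)
```

The same `packed_add` call with `lane_bits=8` or `lane_bits=32` models the eight-lane and two-lane partitions mentioned in the text.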
In many DSP applications, SIMD instructions are effective. The Intel Pentium
processor uses SIMD instructions in its multimedia extensions (MMX), which
drastically improve the Pentium processor's performance. Although SIMD
instructions may improve the processor, these extensions also introduce
complications such as new bugs.
To implement these extensions and still maintain operating-system
compatibility, Intel designed the MMX instructions to share registers with the
processor's floating-point unit [2]. DSP programs may incur a penalty of many
cycles when switching between floating-point and MMX modes, and a program that
switches back and forth until an operation is complete puts a burden on the
processor. However, since few DSP applications require both fixed-point and
floating-point arithmetic, switching between MMX and floating-point modes is
rarely a concern.
4.1 Balanced architecture
Another technique used to optimize the performance of a DSP processor is the
balanced architecture [1]. This architecture makes it possible to approach the
performance of expensive high-end systems. The balanced architecture
reconciles the conflicting needs of signal processing functions, such as
concurrency, performance, and capabilities, with the assistance of an ideal
hardware accelerator that incorporates a programmable core. The programmable
core preserves the advantages of programmable systems, which are [1,7]:
· Software solutions allow the same hardware to be reconfigured to serve
multiple functions. For example, a Dolby Digital decoder in a DVD playback
scenario.
· Software solutions allow field driver upgrades to support evolving software
standards. For example, alternative multi-channel audio decoders such as
MPEG-2.
· Software solutions accelerate development. Bugs in hardware require redesign
and modifications to masks, while bugs in software can be fixed by changing
the code.
· Software solutions make it easy to customize algorithms and feature sets.
The key advantage of the balanced architecture is that future enhancements,
such as new hardware or new software, are not necessary to improve the overall
performance or quality of DSP processing. In the future, the balanced
architecture may make it possible for moderately priced systems to support the
capabilities of high-end systems with only a small cost/performance penalty.
4.1.1 Dynamic Concurrency Scaling
Another advantage of the balanced architecture is that it supports dynamic
concurrency scaling [1]. Dynamic concurrency scaling assures that DSP
resources are always utilized to the fullest extent possible, given the
concurrency requirements at the moment of execution. It goes into effect when
a DSP task requires an additional task to complete its work while many other
tasks are running at the same time: those tasks receive a message requesting
that they free up resources for DSP processing.
Consider the example of an Internet game. While the game is running, it
requires the modem to send and receive signals to and from the Internet. In
this case, the DSP processor must free up resources, for example by reducing
the quality of the game's sound or graphics. The DSP processor can also free
up resources by transferring work that does not necessarily require DSP
processing to the host processor, or by refining its algorithm. To achieve the
highest level of performance, DSP tasks usually consume all of the digital
signal-processing resources available.
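The resource-freeing behavior in the Internet game example can be sketched in Python. This is a hypothetical model, not the mechanism of any particular product: each running task is assumed to report its current resource cost and the minimum cost at which it can still run at reduced quality, and the function `free_resources` and the numeric units are invented for illustration.

```python
def free_resources(tasks, needed, capacity):
    """Ask running tasks to reduce their cost until a new task fits.

    tasks    -- list of dicts with "name", "cost", and "min_cost", ordered so
                that the tasks most tolerant of reduced quality come first
    needed   -- resources the new task requires
    capacity -- total DSP resources available
    Returns True if the new task fits after scaling, False otherwise.
    """
    used = sum(t["cost"] for t in tasks)
    for task in tasks:
        shortfall = used + needed - capacity
        if shortfall <= 0:
            break                          # enough resources have been freed
        reduction = min(shortfall, task["cost"] - task["min_cost"])
        task["cost"] -= reduction          # e.g. lower sound or graphics quality
        used -= reduction
    return used + needed <= capacity

# Hypothetical game tasks; the modem task needs 35 of 100 resource units.
game_tasks = [
    {"name": "game audio", "cost": 40, "min_cost": 20},
    {"name": "game graphics", "cost": 40, "min_cost": 30},
]
modem_fits = free_resources(game_tasks, 35, 100)
```

In this run only the audio task has to give up resources. When the reductions available are not sufficient, the function returns False, which corresponds to the case where work must be moved to the host processor instead.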
4.2 Other solutions
If additional processing power is needed to handle DSP programs and
applications, adopting a reduced instruction set computer (RISC) architecture
can improve performance, lower cost, and lower power consumption [2,4,6].
Basically, RISC reduces the instruction set to a small number of simple
instructions. This approach minimizes the amount of hardware circuitry (gates
and transistors) needed to build a processor. It also leaves extra space on
the chip, so designers can add circuitry that makes the remaining instructions
execute more quickly. The power of such a processor comes not from the
operations in its instruction set but from the fact that it can execute each
instruction very quickly, typically in a few billionths of a second.
5. CONCLUSION
To reduce cost and improve performance, designers may in some cases combine a
general-purpose processor and a DSP in one system, because most applications
still require a general-purpose processor to handle tasks other than signal
processing. Combining the two processors can lead to a more powerful computing
system that can execute both signal processing tasks and any other computing
task.
However, having two processors also conflicts with several common design
objectives: lowering the system part count, reducing power consumption,
minimizing size, and lowering cost [2]. Therefore, to reduce the difficulty of
having two processors, designers would have to integrate the system's
functionality into one processor. Reducing the number of processors from two
to one also leads to a smaller instruction set and a smaller tool suite [7].
To avoid poor performance, all characteristics of the processor should be
examined, and the algorithms should be designed according to those
characteristics. The ideal is to choose the functions that best suit your DSP
applications; when these functions are combined, they yield the benefits of
low cost, low power, and a smaller processor. This would optimize
central-processing tasks and maximize the performance of DSP tasks.
6. REFERENCES
[1] Barish, J., Croteau, J. “Use programmable DSPs for cost-effective PCI digital audio design.” Electronic Design, v46 n9, pp.44-48 (April 20, 1998).
[2] Blalock, Garrick. “General-purpose microprocessors for DSP applications: Consider the trade-offs.” EDN, v42 n22, pp.165-170 (October 23, 1997).
[3] Bursky, Dave. “Higher-Throughput DSP Chips Take On Complex Applications.” Electronic Design, v46 n12, pp.80(1) (May 25, 1998).
[4] Caulk, Bob. “Optimize DSP Design With An Extensible Core.” Electronic Design, v44 n2, pp.81-85 (January 22, 1996).
[5] Certic, J., Dobrosavljevic, Z., Milic, L. “Implementations of Basic DSP Algorithms on ADSP-2181.” 4th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services. TELSIKS'99. IEEE. Part vol.2, pp.498-501 (1999).
[6] Deka, Rabin. “A comprehensive study of digital signal processing devices.” Microprocessors and Microsystems, v19 n4, pp.209-221 (May 1995).
[7] Gold, Irving. “DSP: There's a Simpler Way.” Electronic News (1991), v46 i31, p.24 (July 31, 2000).
[8] Griffin, Grant. Iowegian's DSP Guru. “DSP Basics and Facts.” http://www.dspguru.com/info/faqs/index2.html
[9] Herrin, E. Golden. “DSPs and CNCs.” Modern Machine Shop, v68 n8, pp.144-146 (January 1996).