The main problem between FX cards and DX9 (or rather PS2.0) is
the difference in concept, but let's start at the beginning:
ATI were simply quicker with the development of their next-gen
graphics card, so MS took what ATI had available, and DX9 was
designed quite close to the Radeon architecture.
Short shader routines, 24-bit fp precision as standard (although lower
is not a problem) and some other things are almost tailored
towards the ATI architecture.
Unfortunately, NV took a completely different approach with their next-gen architecture.
NV aimed from the beginning at 32-bit fp precision, long shader routines
and so on...
When DX9 was released, it became obvious that all the specs
that made it run exceptionally well on the ATI architecture would
actually hit all the weak spots of the NV architecture. That's not really
anyone's fault, it just happened due to the very different designs
of the two pieces of hardware. But it was also much too late for NV to quickly
change their architecture, as developing a modern graphics chip
is an enormous amount of work and takes months or years ...
One of the problems, for example, is that most DX9 games use 24-bit fp precision.
Unfortunately, NV cards have no internal 24-bit fp mode, and have
to run in 32-bit fp mode.
Of course that mode is slower than a native 24-bit fp mode would be.
So to keep up with ATI in DX9 PS2.0 performance, NV would have
to run 32-bit fp mode as fast as ATI runs its 24-bit fp mode, OR
find a way to run it in 16-bit fp mode without loss of image quality,
which can be achieved, but only with lots of effort ... (see DooM 3).
And as PS routines become more and more complex, we will get to a
point where 16-bit fp mode will not suffice anymore. But for the time
being, no game really needs 24-bit fp mode.
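
To get a feeling for what the three precisions mean in practice, here is
a rough Python sketch (not anything a driver or GPU actually runs). It
only models the mantissa width: 10 bits for fp16 and 23 bits for fp32 as
in IEEE 754, and 16 bits for fp24, assuming the 1-7-16 layout commonly
cited for ATI's hardware. Exponent range and hardware rounding modes are
ignored, and the names quantize and texel_step are made up for this
example.

```python
import math

# Explicit mantissa bits per format. fp16 and fp32 follow IEEE 754; the fp24
# entry ASSUMES the 1-7-16 (sign/exponent/mantissa) layout commonly cited
# for ATI's R300-class hardware.
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}


def quantize(x, bits):
    """Round x to the nearest value with `bits` explicit mantissa bits.

    Exponent range, denormals and hardware rounding modes are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)       # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def texel_step(bits, texture_size=2048):
    """Smallest coordinate step in [0.5, 1), measured in texels."""
    return 2.0 ** -(bits + 1) * texture_size


if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        err = abs(quantize(0.7, bits) - 0.7)
        print(f"{name}: error storing 0.7 = {err:.2e}, "
              f"step on a 2048 texture = {texel_step(bits)} texels")
```

In this model a 16-bit coordinate in the upper half of a 2048-texel
texture can only move in whole-texel steps, while 24-bit gets down to
1/64 of a texel. Simple colour math is fine in 16-bit; addressing and
long dependent calculations are where it starts to hurt.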
NV is now trying to optimize their drivers to re-compile DX9 instructions
into code that is closer to the NV hardware. They replace 24-bit fp with
a mixture of 16/32-bit fp while keeping as close to the reference
image quality as possible, and they re-order instructions to better fill
the NV pipeline and make more efficient use of its bandwidth.
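
The idea of mixing 16-bit and 32-bit precision can be illustrated with a
toy Python sketch. This is purely hypothetical and NOT how NV's driver
works internally (that is not public): it just tries an operation in a
simulated 16-bit mode, compares the result with a reference, and falls
back to 32-bit when the error would be visible. The reference here is
plain Python double precision, the tolerance is assumed to be half an
8-bit colour step, and pick_precision, modulate and subtexel_weight are
invented names.

```python
import math


def quantize(x, bits):
    """Round x to `bits` explicit mantissa bits (exponent range ignored)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)       # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def pick_precision(op, samples, tolerance=0.5 / 255):
    """Return "fp16" if op stays within tolerance of the reference, else "fp32".

    ASSUMPTIONS: the reference is Python's double precision, and the default
    tolerance is half of one 8-bit colour step.
    """
    def exact(v):
        return v

    def fp16(v):
        return quantize(v, 10)      # 10 mantissa bits, as in IEEE 754 half

    worst = max(abs(op(x, fp16) - op(x, exact)) for x in samples)
    return "fp16" if worst <= tolerance else "fp32"


# Two hypothetical shader operations; r() rounds every intermediate result.
def modulate(x, r):
    """Scale a colour value by a constant."""
    return r(r(x) * r(0.8))


def subtexel_weight(x, r):
    """Fractional position inside a texel of a 2048-wide texture."""
    return math.fmod(r(r(x) * 2048.0), 1.0)


if __name__ == "__main__":
    colours = [i / 64 for i in range(65)]
    coords = [0.5 + (i + 0.3) / 2048 for i in range(8)]
    print("modulate        ->", pick_precision(modulate, colours))
    print("subtexel_weight ->", pick_precision(subtexel_weight, coords))
```

With these samples the colour multiply passes in 16-bit, while the
sub-texel calculation loses its whole fractional part and gets pushed to
32-bit. A real driver has to make this kind of decision per instruction
without being able to test every possible input, which is part of why it
takes so much effort.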
So, the problem lies between the hardware and the API, and
the drivers are actually doing their best to bridge this gap.