
XMVECTOR and XMMATRIX passed as parameters

7 comments, last by ryt 6 years, 5 months ago

I wanted to learn about DirectXMath, so I read this tutorial by digitalerr0r. I think it's a great tutorial, but I didn't understand a few things.
He mentions that, as a best practice, XMVECTOR/XMMATRIX should only be used for operations on the stack, and that if we want a class member it's best to use the XMFLOAT* counterparts and load them into XMVECTOR/XMMATRIX for the actual math, because DirectXMath is highly optimized to use SIMD.
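The member-versus-stack advice above could be sketched roughly as follows. This is only an illustration: `Float4`, `Vec4`, `LoadFloat4`, `StoreFloat4`, `Add` and `Particle` are hypothetical stand-ins written in plain C++ so the snippet compiles anywhere. In real code the corresponding DirectXMath names would be `XMFLOAT4`, `XMVECTOR`, `XMLoadFloat4` and `XMStoreFloat4`.

```cpp
// Hypothetical stand-in for XMFLOAT4: plain storage, safe as a class member.
struct Float4 { float x, y, z, w; };

// Hypothetical stand-in for XMVECTOR. The real type is a 16-byte aligned
// SIMD register type (__m128 on x86/x64); a plain array is used here so the
// sketch does not depend on any platform header.
struct Vec4 { float v[4]; };

// Analogue of XMLoadFloat4: copy storage into the working type.
inline Vec4 LoadFloat4(const Float4* src) {
    return Vec4{{src->x, src->y, src->z, src->w}};
}

// Analogue of XMStoreFloat4: copy the working value back into storage.
inline void StoreFloat4(Float4* dst, Vec4 a) {
    dst->x = a.v[0];
    dst->y = a.v[1];
    dst->z = a.v[2];
    dst->w = a.v[3];
}

// Component-wise add, standing in for a SIMD operation.
inline Vec4 Add(Vec4 a, Vec4 b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

class Particle {
public:
    Particle(Float4 position, Float4 velocity)
        : position_(position), velocity_(velocity) {}

    // Load members into local working values, compute, store the result.
    void Step() {
        Vec4 p = LoadFloat4(&position_);
        Vec4 v = LoadFloat4(&velocity_);
        StoreFloat4(&position_, Add(p, v));
    }

    Float4 position() const { return position_; }

private:
    Float4 position_;  // storage types only; no SIMD type as a member
    Float4 velocity_;
};
```

The members stay in the storage type, and the SIMD-style values only ever exist as locals inside `Step()`.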

At one point he creates PrintVector() and PrintMatrix() functions. For the vector one he takes an XMVECTOR, a regular variable passed by value, whereas for the matrix one he takes an XMMATRIX*, a pointer. Is there a reason for this? Is it better to use pointer types or regular variables when passing these as params, or does it make no difference?

[Edit] Sorry for the inconvenience, I posted too early. I hope it's clear now.


And what exactly is it you didn't understand?

If you try passing a bunch of SSE types as parameters to a function, you will soon find you need __vectorcall on MSVC. Your tutorial author may have just been avoiding this by using a pointer instead.

Whether a function is better off taking SSE types by value or by pointer depends on the context, and there is no straightforward answer. The exception is when you are passing just one or two XMVECTORs: on 64-bit builds, passing them by value is a good choice, since floating point operations are going to be done in SIMD registers anyway (so loads/stores are super cheap or sometimes non-existent).
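To make the two styles concrete, here is a minimal sketch of a by-value vector parameter next to a by-pointer matrix parameter, mirroring the PrintVector()/PrintMatrix() split from the question. `Vec4`, `Mat4`, `SumVector`, `Trace` and `Identity` are hypothetical stand-ins for illustration, not DirectXMath functions, and the sketch says nothing about which style is faster on a given build.

```cpp
// Hypothetical stand-ins: Vec4 for XMVECTOR, Mat4 for XMMATRIX.
struct Vec4 { float v[4]; };
struct Mat4 { Vec4 r[4]; };

// The sizes are the point: a vector is one 16-byte value,
// a matrix is four of them.
static_assert(sizeof(Vec4) == 16, "one 16-byte vector");
static_assert(sizeof(Mat4) == 64, "four 16-byte rows");

// By value: a single 16-byte argument is cheap to copy.
inline float SumVector(Vec4 a) {
    return a.v[0] + a.v[1] + a.v[2] + a.v[3];
}

// By const pointer: the callee reads the caller's 64-byte matrix in place.
inline float Trace(const Mat4* m) {
    return m->r[0].v[0] + m->r[1].v[1] + m->r[2].v[2] + m->r[3].v[3];
}

// Helper that builds an identity matrix for trying the functions out.
inline Mat4 Identity() {
    Mat4 m{};  // value-initialised: all elements start at zero
    for (int i = 0; i < 4; ++i) m.r[i].v[i] = 1.0f;
    return m;
}
```

Both styles produce the same results; the difference is only in what gets copied at the call site, which is what the compiler and calling convention then decide how to handle.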

Ok, so I should generally be good by passing pointers or "values" (and returning them) of XMVECTOR/XMMATRIX instead of using XMFLOAT* counterparts for arguments/returns?

I dunno. Test it out and see which one is better.

I found this text. If I understood correctly, it explains that 16bit or bigger values are always passed through registers, whether the value is a float (XMM) or a SIMD type (XMVECTOR), so it actually doesn't make any difference. This only holds for 4 or fewer params. If there are more than 4 params, or if the params don't fit into registers, then they will be passed on the stack.
The same holds for return values.

It can be rather complicated as there are many options.

The behavior is different based on the system.  Different hardware, different compilers, and different compiler settings all make a difference.

The actual DirectX libraries have their own conventions that the compiler supports precisely, and they are different in 32-bit and 64-bit versions.  Calls to your own functions can possibly support those same calling conventions, but they can also end up getting compiled with lower performance options or with higher performance options. 

 

The linked story talks of both SSE and AVX values. There are differences between the systems both in terms of operations and in terms of how they are passed.

Slightly older compilers would pass them on the stack.  Newer compilers (MSVC2015 for example) can potentially pass them through the XMM registers, with 64-bit code providing better support than 32-bit code because there are more CPU registers guaranteed on the 64-bit chipsets. 

Additionally, compilers that can target older chips often used the FPU for floating point values, but newer versions use SIMD operations in the MMX or XMM registers for floating point values. 64-bit versions guarantee SSE and SSE2 features, which the compilers can use, where the 32-bit versions do not.

The concern about those switches is that you need to guarantee the features are available. By default 64-bit code assumes a processor built after about 2001, 17 years ago. So if you wanted to take advantage of AVX2 instructions and the improved instructions and registers, you would be limited to Haswell-like (and later) processors. Your program would crash on older CPUs. If you use compiler options like /arch:AVX or /arch:AVX2 that guarantee the presence of additional features, like YMM and ZMM registers, the compiler can make different choices. On the other side, it is possible in 32-bit code to disable XMM, disable MMX, or to require the x87 FPU, effectively generating code that could run on CPUs from the 1990s. Those are entirely up to you as compiler options, useful if you can guarantee things about the target computer.
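One way to see which instruction set a given build was told to assume is to look at the compiler's predefined macros. The sketch below is a rough illustration: `AssumedSimdLevel` is a hypothetical helper name, and it only reports what the compiler was allowed to use at build time, not what the CPU running the program actually supports (that would need a separate runtime check).

```cpp
#include <string>

// Reports the SIMD level the compiler was permitted to assume for this
// translation unit, based on predefined macros:
//   __AVX2__ / __AVX__ : set by MSVC with /arch:AVX2 or /arch:AVX, and by
//                        GCC/Clang with -mavx2 or -mavx.
//   __SSE2__           : set by GCC/Clang when SSE2 is enabled.
//   _M_X64             : set by MSVC for x64 targets, where SSE2 is baseline.
//   _M_IX86_FP >= 2    : set by MSVC for 32-bit targets built with SSE2 or higher.
inline std::string AssumedSimdLevel() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "baseline";
#endif
}
```

Printing the result once at startup is a quick way to confirm that a switch such as /arch:AVX2 really reached the file you care about.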

Certain calling conventions, such as __vectorcall (rather than __fastcall) can also make a difference, as can the ordering of parameters. 32-bit can potentially support up to 6 values in those registers, but again it all comes down to details like those listed above. You would need to specify those in your code.
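Specifying the convention in code usually means a small macro so the same source still builds on compilers without __vectorcall. The following is a minimal sketch under that assumption: `VEC_CALLCONV`, `Vec4` and `Scale` are hypothetical names for illustration (DirectXMath has its own macro, XM_CALLCONV, that serves a similar purpose).

```cpp
// Use __vectorcall where the compiler is MSVC-compatible and the target is
// x86 or x64; expand to nothing everywhere else.
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#define VEC_CALLCONV __vectorcall
#else
#define VEC_CALLCONV
#endif

// Hypothetical stand-in for XMVECTOR.
struct Vec4 { float v[4]; };

// The calling convention goes between the return type and the function name.
inline Vec4 VEC_CALLCONV Scale(Vec4 a, float s) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * s;
    return r;
}
```

The function body is identical either way; only how the arguments and return value travel between caller and callee changes, so any benefit has to be measured on the actual build.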

 

I agree. I was mainly interested in x64 as this is my demo platform.

