
Is pow(val, 40.0) more expensive than pow(val, 4.0)?

Started by
19 comments, last by ProfL 4 years, 10 months ago

You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

 

3 minutes ago, alvaro said:

You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

 

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

2 minutes ago, CarlML said:

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

The reason you get different results is that your vall overflows (becomes inf), which is a special value that the CPU handles in a slower fallback mode. Instead of initializing vals to random values, initialize them to 1.f, or to 1.00001f in case you worry that exactly 1.f lets pow take a shortcut. After that change, and accumulating into vall4 and vall400, I get the same timing results.
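A minimal sketch of that diagnosis, assuming inputs larger than 1 (the first-page code is not shown in this excerpt). PowSum and SumPow are hypothetical names; the loop mirrors the thread's accumulation and uses std::isinf to flag when the running total has overflowed to infinity.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type: the accumulated total plus a flag recording
// whether the total ever overflowed to infinity.
struct PowSum {
    float total;
    bool hit_inf;
};

// Sums pow(v, exponent) over vals, like the vall accumulation in the thread.
PowSum SumPow(const std::vector<float>& vals, float exponent) {
    PowSum result{0.0f, false};
    for (float v : vals) {
        result.total += std::pow(v, exponent);
        if (std::isinf(result.total)) {
            // Adding further non-negative terms keeps the total at inf.
            result.hit_inf = true;
        }
    }
    return result;
}
```

With every input equal to 2.0f, for example, pow(2.0f, 400.0f) already exceeds FLT_MAX (about 3.4e38) and returns inf, while the exponent-4 sum of ten such values stays at 160.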

3 minutes ago, alvaro said:

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

Well, my simple test case was enough to show that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like; I posted the code on the first page.

1 minute ago, CarlML said:

Well, my simple test case was enough to show that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like; I posted the code on the first page.

Yes, but you've only proven that a higher exponent overflows sooner, and that the overflow causes the slowdown.


#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>

#define NCOUNT 10000000
float vals[ NCOUNT ];

int PowTest( ) {
    using namespace std;
    // Inputs in [1, 1.00001]: x^400 stays finite, so neither loop hits inf.
    for ( int i = 0; i < NCOUNT; i++ ) {
        vals[ i ] = 1.f + rand( ) * 0.00001f / RAND_MAX;
    }
    // Time pow( x, 4 ).
    float vall0 = 0.0f;
    clock_t begin0 = clock( );
    for ( int i = 0; i < NCOUNT; i++ ) {
        vall0 += pow( vals[ i ], 4.0f );
    }
    clock_t end0 = clock( );
    // Time pow( x, 400 ).
    clock_t begin1 = clock( );
    float vall1 = 0.0f;
    for ( int i = 0; i < NCOUNT; i++ ) {
        vall1 += pow( vals[ i ], 400.0f );
    }
    clock_t end1 = clock( );
    double elapsed_secs0 = double( end0 - begin0 ) / CLOCKS_PER_SEC;
    double elapsed_secs1 = double( end1 - begin1 ) / CLOCKS_PER_SEC;
    // Printing the sums keeps both loops observable, so the compiler cannot discard them.
    printf( "%f %f %f %f\n", elapsed_secs0, elapsed_secs1, vall0, vall1 );
    return 0;
}

int main( ) {
    return PowTest( );
}

Sorry, this is my lazy modification to get it to compile.
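To put a number on "a higher exponent overflows sooner": for x > 0 and e > 0, x^e stays finite in float only while x <= exp(ln(FLT_MAX) / e). A small sketch of that bound, where MaxFiniteBase is a hypothetical helper computed in double so the bound itself is not rounded to float:

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Largest base x for which x^exponent still fits in a float:
// x^e <= FLT_MAX  <=>  x <= exp(ln(FLT_MAX) / e), for x > 0 and e > 0.
double MaxFiniteBase(double exponent) {
    assert(exponent > 0.0);
    return std::exp(std::log(static_cast<double>(FLT_MAX)) / exponent);
}
```

For e = 4 the bound is about 4.3e9 (roughly 2^32), but for e = 400 it is only about 1.25 (roughly 2^0.32), so inputs just a quarter above 1 already overflow with the larger exponent.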

26 minutes ago, ProfL said:

The reason you get different results is that your vall overflows (becomes inf), which is a special value that the CPU handles in a slower fallback mode. Instead of initializing vals to random values, initialize them to 1.f, or to 1.00001f in case you worry that exactly 1.f lets pow take a shortcut. After that change, and accumulating into vall4 and vall400, I get the same timing results.

The numbers have to be random so that the compiler won't optimize things out.

If I use random values between 0.0f and 1.0f, the timings are similar to those in my test, and the result doesn't go to infinity. So overflow can't be the only reason.

Edit: but the results probably became infinitesimally small.
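One way to check what "infinitesimal" means here is to classify each pow result with std::fpclassify. The sketch below (ClassCounts and ClassifyPowResults are hypothetical names) counts normal, subnormal, zero, infinite, and NaN results. Subnormal (denormal) arithmetic is handled on a slow path on many x86 CPUs, which makes it a plausible explanation for the timing difference, not a measured one.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical tally of IEEE-754 categories for a batch of pow results.
struct ClassCounts {
    int normal = 0;
    int subnormal = 0;
    int zero = 0;
    int infinite = 0;
    int nan = 0;
};

// Computes pow(v, exponent) in float for each input and counts its category.
ClassCounts ClassifyPowResults(const std::vector<float>& vals, float exponent) {
    ClassCounts c;
    for (float v : vals) {
        switch (std::fpclassify(std::pow(v, exponent))) {
            case FP_NORMAL:    ++c.normal;    break;
            case FP_SUBNORMAL: ++c.subnormal; break;
            case FP_ZERO:      ++c.zero;      break;
            case FP_INFINITE:  ++c.infinite;  break;
            default:           ++c.nan;       break;  // FP_NAN
        }
    }
    return c;
}
```

For example, 0.8^400 is roughly 1.7e-39, below FLT_MIN (about 1.18e-38) and therefore subnormal; 0.5^400 = 2^-400 underflows to zero; and 0.9^400 (about 5e-19) is still normal. Builds that enable flush-to-zero (for example GCC's -ffast-math on x86) will report zeros instead of subnormals.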

In my case, I get the same timings from both code paths after my change.

 

No, you don't need random numbers; the array already hides the values from the compiler, at least with default settings. My modification still uses rand() to add some noise, and the results are still the same for 4 and 400.
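What keeps the loops alive is that their results are used, as the printf of vall0 and vall1 does: a loop whose result is never used may be removed by the optimizer. A sketch of the same idea as a reusable helper, with std::chrono::steady_clock swapped in for clock(); TimePowLoop and PowTiming are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical result: elapsed wall time plus the accumulated sum, which the
// caller should print or check so the loop's work stays observable.
struct PowTiming {
    double seconds;
    float sum;
};

// Times one pass of pow(v, exponent) over vals, accumulating like the thread's test.
PowTiming TimePowLoop(const std::vector<float>& vals, float exponent) {
    const auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (float v : vals) {
        sum += std::pow(v, exponent);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return PowTiming{elapsed.count(), sum};
}
```

Call it once per exponent on the same vector and print both sums alongside the timings, as the thread's PowTest does.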

If I use random values between 1.0 and 1.2, there is no overflow either way and the timings are nearly the same, so the culprit was most likely infinite or infinitesimal numbers.
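Before timing, you could check that a chosen input range keeps every pow result a normal float. For 0 < lo <= hi and a positive exponent, pow(x, exponent) increases with x, so checking the two endpoints is enough, up to rounding at the boundaries. RangeStaysNormal is a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>

// Returns true if pow(x, exponent) is a normal float (no inf, no subnormal,
// no zero) at both ends of [lo, hi]; monotonicity covers the interior.
bool RangeStaysNormal(float lo, float hi, float exponent) {
    assert(0.0f < lo && lo <= hi && exponent > 0.0f);
    return std::fpclassify(std::pow(lo, exponent)) == FP_NORMAL &&
           std::fpclassify(std::pow(hi, exponent)) == FP_NORMAL;
}
```

[1.0, 1.2] passes for exponent 400 because 1.2^400 is about 4.7e31, well below FLT_MAX; [1.0, 2.0] fails on overflow and [0.5, 1.0] fails on underflow. This checks individual terms only; a long running sum can still overflow on its own.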

Thanks for the input.

Thanks for your feedback. I think it's good that you tried it yourself rather than blindly believing the people on the internet :)

But as someone said, your GPU results might differ. I suggest you try that too; the simplest way would be https://www.shadertoy.com/

Just click "New", modify the color output to run a few more pow calls, and increase your loop count until the FPS drops (try full screen for the biggest slowdown).

You might also be surprised how many times you can run pow without having to worry about that instruction.

This topic is closed to new replies.
