🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!


Is pow(val, 40.0) more expensive than pow(val, 4.0) ?

CarlML · 2019-09-05T16:20:44

I'm curious about how the pow function works. Does its cost go up linearly with the exponent? In other words, does pow(val, 40.0) multiply val by itself 40 times?
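
As a rough illustration of why the cost need not grow with the exponent: for a positive base, a power with a floating-point exponent can be written as exp(y * log(x)), which does the same amount of work for y = 4 and y = 40, and even an integer power needs only about log2(n) multiplications when computed by repeated squaring. The sketch below is not a description of how any particular library implements pow(); both function names are made up for illustration.

```cpp
#include <cmath>

// Hypothetical helper: integer power by repeated squaring.
// Needs about log2(exponent) loop iterations instead of exponent - 1 multiplies.
double pow_by_squaring(double base, unsigned exponent) {
    double result = 1.0;
    while (exponent > 0) {
        if (exponent & 1u) {
            result *= base;  // this bit of the exponent is set
        }
        base *= base;        // square the base for the next bit
        exponent >>= 1u;
    }
    return result;
}

// Hypothetical helper: power through the identity x^y = exp(y * log(x)),
// valid for x > 0. The amount of work does not depend on the size of y.
double pow_via_exp_log(double base, double exponent) {
    return std::exp(exponent * std::log(base));
}
```

With these, pow_by_squaring(val, 40) runs six loop iterations, and pow_via_exp_log(val, 40.0) costs the same as pow_via_exp_log(val, 4.0).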

Math and Physics Programming

Started by CarlML September 03, 2019 11:26 AM

19 comments, last by ProfL 4 years, 10 months ago

alvaro

21,604

September 05, 2019 03:40 PM

You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

CarlML

217

Author

September 05, 2019 03:43 PM

3 minutes ago, alvaro said:

You are not getting the correct lesson from this. You really should test the code in conditions that are as realistic as possible. If you are computing this in a shader but your test is on the CPU, the numbers you speak of are irrelevant.

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

alvaro

21,604

September 05, 2019 03:45 PM

2 minutes ago, CarlML said:

I realize that work on the GPU might be different, but this thread is about the pow() function in general, not the specific use case I brought up.

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

ProfL

717

September 05, 2019 03:53 PM

The reason you get different results is that your vall overflows (becomes inf), which is a special value that the CPU handles in a slower fallback mode. Instead of initializing vals to random values, initialize it to 1.f or 1.00001f (in case you worry that this affects the pow function). After that change, and accumulating into vall4 and vall400, I get the same timings.
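
A quick way to check the overflow explanation is to look at whether the accumulated value is still finite. The sketch below uses made-up input values (the original test data is not shown here) and hypothetical helper names; it only shows that a float sum of pow results can reach inf with a large exponent while staying finite with a small one.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: sums pow(v, exponent) in a float accumulator,
// mirroring the accumulation pattern discussed in this thread.
float accumulate_pow(const std::vector<float>& values, float exponent) {
    float sum = 0.0f;
    for (float v : values) {
        sum += std::pow(v, exponent);  // float overload of std::pow
    }
    return sum;
}

// Illustrative inputs only (an assumption, not the original test data).
inline std::vector<float> sample_values() {
    return {1.5f, 2.0f, 3.0f, 10.0f};
}
```

Here std::isinf(accumulate_pow(sample_values(), 400.0f)) is true, because 10^400 is far beyond the largest float (about 3.4e38), while the same call with an exponent of 4.0f stays finite.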

CarlML

217

Author

September 05, 2019 03:53 PM

3 minutes ago, alvaro said:

There is nearly nothing that can be said about the pow() function in general. The only context in which the speed of pow() matters is in optimizing code, and that can only be done in the context of specific compilers and specific hardware.

Well, my simple test case was enough to determine that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like. I posted the code on the first page.

ProfL

717

September 05, 2019 03:55 PM

1 minute ago, CarlML said:

Well, my simple test case was enough to determine that the exponent matters in some cases. I'm not sure why, but that's something. You can test it yourself if you like. I posted the code on the first page.

Yes, but you have only proven that a higher exponent overflows sooner, and that the overflow causes a slowdown.


#include <cmath>    // pow
#include <cstdio>   // printf
#include <cstdlib>  // rand, RAND_MAX
#include <ctime>    // clock, CLOCKS_PER_SEC
#define NCOUNT 10000000
float vals[ NCOUNT ];
int PowTest( ) {
    using namespace std;
    // Keep the inputs within a hair of 1.0 so neither exponent can overflow.
    for ( int i = 0; i < NCOUNT; i++ ) {
        vals[ i ] = 1.f + rand( ) * 0.00001f / RAND_MAX;
    }
    // Time the small exponent.
    float vall0 = 0.0f;
    clock_t begin0 = clock( );
    for ( int i = 0; i < NCOUNT; i++ ) {
        vall0 += pow( vals[ i ], 4.0f );
    }
    clock_t end0 = clock( );
    // Time the large exponent over the same inputs.
    float vall1 = 0.0f;
    clock_t begin1 = clock( );
    for ( int i = 0; i < NCOUNT; i++ ) {
        vall1 += pow( vals[ i ], 400.0f );
    }
    clock_t end1 = clock( );
    double elapsed_secs0 = double( end0 - begin0 ) / CLOCKS_PER_SEC;
    double elapsed_secs1 = double( end1 - begin1 ) / CLOCKS_PER_SEC;
    // Printing the sums keeps both loops from being optimized away.
    printf( "%f %f %f %f\n", elapsed_secs0, elapsed_secs1, vall0, vall1 );
    return 0;
}

Sorry, this is just my lazy mod to get it to compile.

CarlML

217

Author

September 05, 2019 03:58 PM

26 minutes ago, ProfL said:

The reason you get different results is that your vall overflows (becomes inf), which is a special value that the CPU handles in a slower fallback mode. Instead of initializing vals to random values, initialize it to 1.f or 1.00001f (in case you worry that this affects the pow function). After that change, and accumulating into vall4 and vall400, I get the same timings.

The numbers have to be random so that the compiler won't optimize things out.

If I use random values between 0.0f and 1.0f, the times are similar to those in my test and the result doesn't go infinite. So overflow can't be the only reason.

Edit: but the values probably went infinitesimal (underflowed toward zero).
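
To see what "infinitesimal" means for a float here, std::fpclassify can report whether a result is still a normal number, has become subnormal (denormal), or has underflowed all the way to zero. This is a small sketch with a hypothetical helper name and illustrative inputs; the category reported near the boundaries can depend on compiler flags such as flush-to-zero modes.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: names the floating-point category of pow(base, exponent).
std::string classify_pow(float base, float exponent) {
    const float result = std::pow(base, exponent);  // float overload
    switch (std::fpclassify(result)) {
        case FP_INFINITE:  return "inf";
        case FP_NAN:       return "nan";
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";
        case FP_NORMAL:    return "normal";
        default:           return "unknown";
    }
}
```

For example, classify_pow(0.99f, 400.0f) is still "normal" (roughly 0.018), classify_pow(0.5f, 400.0f) is "zero" because 2^-400 is far below the smallest float, and a base around 0.8f lands in the subnormal range on a default build.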

ProfL

717

September 05, 2019 04:00 PM

In my case I get the same timings from both code paths after my change.

No, you don't need to add random numbers. The array already obfuscates the access for the compiler, at least with the default settings, but my mod uses some rand to add noise. The results are still the same for 4 and 400.

CarlML

217

Author

September 05, 2019 04:16 PM

If I use random values between 1.0 and 1.2, there is no overflow either way and the timing results are nearly the same, so the culprit was most likely infinite or infinitesimal numbers.
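
A timing harness along the lines described here might look like the following. It is only a sketch: the struct, the function names, the fixed seed, and the use of std::chrono instead of clock() are my own choices, not the code from this thread. Inputs are drawn from the range 1.0 to 1.2 so that neither exponent overflows, and the sum is kept in a double and returned so the compiler cannot discard the loop.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct PowTiming {
    double seconds;  // wall-clock time spent in the loop
    double sum;      // accumulated result, returned so the work stays observable
};

// Hypothetical helper: fills a vector with floats between lo and hi, fixed seed.
std::vector<float> make_inputs(std::size_t count, float lo, float hi) {
    std::mt19937 rng(12345u);
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float> values(count);
    for (float& v : values) {
        v = dist(rng);
    }
    return values;
}

// Hypothetical helper: times one pass of pow(v, exponent) over the inputs.
PowTiming time_pow(const std::vector<float>& values, float exponent) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (float v : values) {
        sum += std::pow(v, exponent);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), sum};
}
```

Calling time_pow(inputs, 4.0f) and time_pow(inputs, 400.0f) on the same inputs and comparing the seconds fields reproduces the comparison, and std::isfinite on each sum confirms that nothing overflowed.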

Thanks for the input.

ProfL

717

September 05, 2019 04:20 PM

Thanks for your feedback. I think it's good that you tried it yourself rather than blindly believing the guys on the internet.

But like someone said, your GPU results might differ. I suggest you try that as well; the simplest way would be https://www.shadertoy.com/

Just click on "new" and modify the color output to run a few more pow calls, then increase your loop count until the fps drops (try full screen for the best slowdown).

You might also be surprised how many times you can run pow without having to worry about that instruction.


This topic is closed to new replies.
