
Is pow(val, 40.0) more expensive than pow(val, 4.0)?

Started by
19 comments, last by ProfL 4 years, 10 months ago

I'm curious about how the pow function works. Does the cost of the function go up linearly with the exponent? Does it multiply val by itself 40 times in pow(val, 40.0)?


That depends a lot on the hardware and the compiler. In general, both should cost about the same if there is hardware support. If the compiler can see that the exponent is a small constant and replace the call with x *= x; x *= x;, it will most likely be quicker.

The way pow works is usually:

exp(exponent * log(base));
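
For illustration, here is a minimal C++ sketch of the two strategies described above (an assumption about typical behaviour, not any particular library's code): the generic exp/log form, and the repeated-multiply form a compiler might emit when it can see a small constant exponent.

#include <cmath>
#include <cstdio>

// Generic path: works for any real exponent.
float pow_generic(float base, float exponent)
{
    return std::exp(exponent * std::log(base)); // only valid for base > 0
}

// The kind of special case a compiler might emit for pow(x, 4.0f):
float pow4(float x)
{
    x *= x; // x^2
    x *= x; // x^4
    return x;
}

int main()
{
    float v = 1.37f;
    std::printf("%f %f %f\n", std::pow(v, 4.0f), pow_generic(v, 4.0f), pow4(v));
}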

Thanks for the reply. Sounds good that it would generally be the same.


So I guess calculating a very tight specular highlight using pow() is not more expensive than calculating a wide highlight then.

When it comes to performance, you can only tell by measuring. Even if at some point you think you understand how something like this works, a few years later compilers and hardware might have changed and your knowledge might become obsolete. This has happened to me several times.

But my educated guess is the same as ProfL's: It will probably be implemented using exp(exponent * log(base)) and it won't matter what the exponent is.

If you use pow() to calculate a highlight in a shader, you will most likely pass the exponent in as a constant, so the shader compiler can't see its value and will fall back to the generic exp+log path. In that case it will run at the same speed no matter what value that light-power constant has.
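
A rough C++ analogy of that point (the shader case is similar in spirit; the function and parameter names here are made up for the example): when the exponent arrives as runtime data the compiler has to keep the generic pow call, while a compile-time exponent can be lowered to plain multiplies.

#include <cmath>

// Exponent supplied at run time (like a shader constant): the compiler
// cannot see its value, so this stays a generic pow call.
float highlight_runtime(float n_dot_h, float shininess)
{
    return std::pow(n_dot_h, shininess);
}

// Exponent known at compile time: the compiler is free to unroll this
// into plain multiplies (or squarings).
template <int N>
float highlight_fixed(float n_dot_h)
{
    float r = 1.0f;
    for (int i = 0; i < N; ++i)
        r *= n_dot_h;
    return r;
}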

I just did some time measurements and it seems the exponent does matter. Between 4.0 and 40.0 there was no noticeable difference, but as the exponent got higher there was, so it seems there is a point where the algorithm changes based on the exponent.

Some results in milliseconds for doing pow() 100,000 times in C++ (including accessing an array to get a random value):

4.0: 2.977
40.0: 2.966
400.0: 4.192
4 000: 4.803
40 000: 3.872
400 000: 3.742

I wouldn't be surprised if there are fast paths in place for small and/or common exponents. That could be part of the library implementation, or applied by the compiler or the hardware.
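
One plausible shape for such a fast path (purely illustrative, not taken from any specific library): check whether the exponent is a small integer and, if so, use exponentiation by squaring instead of the generic exp/log route.

#include <cmath>

float pow_with_fast_path(float base, float exponent)
{
    // Only consider the fast path for small, exactly-integer exponents.
    if (exponent >= 0.0f && exponent <= 64.0f &&
        exponent == static_cast<float>(static_cast<int>(exponent)))
    {
        int n = static_cast<int>(exponent);
        float result = 1.0f;
        float b = base;
        while (n > 0)               // exponentiation by squaring: O(log n) multiplies
        {
            if (n & 1)
                result *= b;
            b *= b;
            n >>= 1;
        }
        return result;
    }
    return std::exp(exponent * std::log(base)); // generic path, requires base > 0
}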

On 9/3/2019 at 11:50 AM, CarlML said:

I just did some time measurements and it seems the exponent does matter. Between 4.0 and 40.0 there was no noticeable difference, but as the exponent got higher there was, so it seems there is a point where the algorithm changes based on the exponent.

Some results in milliseconds for doing pow() 100,000 times in C++ (including accessing an array to get a random value):

4.0: 2.977
40.0: 2.966
400.0: 4.192
4 000: 4.803
40 000: 3.872
400 000: 3.742

First of all, are you really calculating specular highlights on the CPU?

Either way, this type of synthetic test is probably not very relevant. For instance, depending on the range of numbers you are plugging in, you might be getting degradation for high exponents because handling infinities or denormalized (very small) numbers might be slower than operating on regular numbers. More generally, in your real program the CPU might be able to parallelize the pow() with some other operations, while in your test it might not (or the other way around). The cache usage might be very different. Etc.

The way to test performance is to introduce timings in your program and run it in realistic conditions.

Out of curiosity, try with an exponent of 4 (instead of 4.0). In some cases this might be much faster.
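
In C++ terms, the experiment alvaro suggests looks like the sketch below (the function names are just for the example; whether the integer form is actually faster depends on the compiler, the standard library and the math flags):

#include <cmath>

float spec_float_exponent(float x)
{
    // Floating-point exponent: typically goes through the generic pow path.
    return std::pow(x, 4.0f);
}

float spec_int_exponent(float x)
{
    // Integer exponent: some compilers/libraries can expand this into a
    // handful of multiplications, which can be considerably faster.
    return std::pow(x, 4);
}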

99% of the time, an artificial benchmark like this measures the capability of its creator, not of whatever it is supposed to test ;)

Could you share your test code, your compiler name and your compile settings?

2 hours ago, alvaro said:

First of all, are you really calculating specular highlights on the CPU?

Either way, this type of synthetic test is probably not very relevant. For instance, depending on the range of numbers you are plugging in, you might be getting degradation for high exponents because handling infinities or denormalized (very small) numbers might be slower than operating on regular numbers. More generally, in your real program the CPU might be able to parallelize the pow() with some other operations, while in your test it might not (or the other way around). The cache usage might be very different. Etc.

The way to test performance is to introduce timings in your program and run it in realistic conditions.

Out of curiosity, try with an exponent of 4 (instead of 4.0). In some cases this might be much faster.

No, the specular calculation happens in a shader. Doing it on the CPU would be crazy.

As you point out, I suspect the timing difference has something to do with numbers going infinite or infinitesimally small. In a regular use case, where I would use exponents between 20.0 and 100.0 for calculating specular, I suspect there generally would not be a big difference.

To say that the exponent does not matter is wrong though, because those numbers in my test don't lie.
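
A quick way to see what those large exponents do to float values drawn from a [0, 1000) range (just illustrating the overflow/underflow point, with made-up sample values):

#include <cmath>
#include <cstdio>

int main()
{
    // float tops out near 3.4e38, so large exponents overflow very quickly...
    std::printf("%g\n", std::pow(999.0f, 40.0f)); // inf (the true value is ~1e120)
    std::printf("%g\n", std::pow(2.0f, 400.0f));  // inf (~2.6e120)
    // ...and bases below 1 underflow through denormals to zero.
    std::printf("%g\n", std::pow(0.5f, 400.0f));  // 0 (the true value is ~3.9e-121)
}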

1 hour ago, ProfL said:

99% of the time, an artificial benchmark like this measures the capability of its creator, not of whatever it is supposed to test ;)

Could you share your test code, your compiler name and your compile settings?

Don't be butthurt that the numbers didn't go your way.

In any case I appreciate the input.


This was my test code, using Visual Studio 2017:

// (random, timer, Print and Vec3 are the poster's own utilities; pow comes from <cmath>.)

float vals[100000];                           // 100,000 floats (~400 KB)
for (int i = 0; i < 100000; i++)
{
    vals[i] = random.getf(0.0f, 1000.0f);     // random bases in [0, 1000)
}

float vall = 0.0f;

// 100,000 calls with exponent 4.0
timer.Start();
for (int i = 0; i < 100000; i++)
{
    vall += pow(vals[i], 4.0f);
}
float tim1 = timer.End();

// 100,000 calls with exponent 400.0
timer.Start();
for (int i = 0; i < 100000; i++)
{
    vall += pow(vals[i], 400.0f);
}
float tim2 = timer.End();

// Print both timings plus the accumulated sum (so the loops aren't optimized away)
Print(Vec3(tim1, tim2, vall));
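
For anyone wanting to re-run this with alvaro's objections in mind, a variant might look something like the sketch below (an illustrative rewrite, not a drop-in for the code above): it uses std::chrono for timing, keeps the bases close to 1.0 so the results stay in the normal float range for the exponents tested, and prints the accumulated sum so the compiler can't drop the loops.

#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Times bases.size() calls of std::pow(base, exponent) and returns milliseconds.
static double time_pow(const std::vector<float>& bases, float exponent, float& sink)
{
    const auto t0 = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (float b : bases)
        sum += std::pow(b, exponent);
    const auto t1 = std::chrono::steady_clock::now();
    sink += sum; // keep the result live so the loop isn't optimized away
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    // Bases near 1.0 keep base^exponent finite and normal for these exponents.
    std::mt19937 rng(12345);
    std::uniform_real_distribution<float> dist(0.99f, 1.01f);
    std::vector<float> bases(100000);
    for (float& b : bases)
        b = dist(rng);

    float sink = 0.0f;
    for (float e : {4.0f, 40.0f, 400.0f, 4000.0f})
        std::printf("exponent %8g: %.3f ms\n", e, time_pow(bases, e, sink));
    std::printf("(checksum: %g)\n", sink);
}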


This topic is closed to new replies.
