# Float vs Double

Recently I took a CUDA course, and one of the things they told us to keep an eye out for was the use of double precision: double-precision operations are slower, and the added precision isn't worth it, or so they say. This made me wonder whether the same is true for regular programming in languages such as C++ and Java, especially because both of them use double as the default floating-point type. So how bad is it to use doubles instead of floats, and what about the precision? I investigated this in Java.

For this test I worked out a few scenarios and executed each of them with both floats and doubles. Each scenario performs one operation 100 million times, and each scenario is executed 100 times to get a stable average duration. The tested scenarios are:

1. Add 0.001 on each iteration.
2. Subtract 0.001 on each iteration (starting at 200,000).
3. Add a value that is halved on each iteration (starting at 1000).
4. Add a varying value between 0.001 and 0.099.
5. Alternate between adding and subtracting an increasing value (iteration * 0.1).
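
The original benchmark code is not part of this post, so the following is a hypothetical reconstruction of the float versions of the five scenarios (the double versions swap every `float` for `double`), plus a naive timing helper. The class and method names are made up. Scenario 4 is assumed to add `(i % 100) * 0.001f`, which ranges from 0.000 rather than 0.001 but matches the expected total of 4,950,000 given for that scenario. Scenario 5 is assumed to add on even iterations and subtract on odd ones.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical reconstruction of the benchmark scenarios (float versions).
// The double versions are identical with every float replaced by double.
final class FloatScenarios {

    // Scenario 1: add 0.001 on each iteration.
    static float addConstant(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += 0.001f;
        }
        return sum;
    }

    // Double counterpart of scenario 1, for comparison.
    static double addConstantDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.001;
        }
        return sum;
    }

    // Scenario 2: subtract 0.001 on each iteration, starting at 200,000.
    static float subtractConstant(int n) {
        float value = 200_000f;
        for (int i = 0; i < n; i++) {
            value -= 0.001f;
        }
        return value;
    }

    // Scenario 3: add a value that is halved on every iteration, starting at 1000.
    static float addHalving(int n) {
        float sum = 0f;
        float step = 1000f;
        for (int i = 0; i < n; i++) {
            sum += step;
            step /= 2f;
        }
        return sum;
    }

    // Scenario 4 (assumed form): add (i % 100) * 0.001, an int multiplied by a float.
    static float addVarying(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += (i % 100) * 0.001f;
        }
        return sum;
    }

    // Scenario 5 (assumed form): add i * 0.1 on even iterations, subtract it on odd ones.
    static float alternate(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            float delta = i * 0.1f;
            sum += (i % 2 == 0) ? delta : -delta;
        }
        return sum;
    }

    // Naive average wall-clock time in milliseconds: no warm-up control and no dead-code protection.
    static double averageMillis(IntToDoubleFunction scenario, int n, int runs) {
        long total = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            scenario.applyAsDouble(n);
            total += System.nanoTime() - start;
        }
        return total / 1e6 / runs;
    }

    public static void main(String[] args) {
        int n = 100_000_000; // iterations per run, as in the post
        int runs = 5;        // the post used 100 runs; fewer here to keep it quick
        System.out.println("scenario 1 float:  " + addConstant(n)
                + " in " + averageMillis(FloatScenarios::addConstant, n, runs) + " ms");
        System.out.println("scenario 1 double: " + addConstantDouble(n)
                + " in " + averageMillis(FloatScenarios::addConstantDouble, n, runs) + " ms");
    }
}
```

Under these assumptions the float versions of scenarios 1 and 4 end at exactly 32768 and 2097152, the same wrong results reported below, so the reconstruction is at least plausible. A bare `System.nanoTime` loop is exposed to JIT effects such as dead-code elimination; a harness like JMH (the OpenJDK Java Microbenchmark Harness) is designed to guard against that.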

When comparing float and double, the most important criterion is that the result is correct; only then should performance be considered. Since we are dealing with floating-point numbers, some error is to be expected, but the goal should be to keep it as small as possible. An overview of the runtime performance is shown in the following figure, which immediately shows that floats often bring no big improvement.

For the first scenario, we immediately see that float is not sufficient, due to the large difference in magnitude between the value being added and the running sum. As the sum grows, a float simply doesn't have enough precision left to represent the sum plus a value several orders of magnitude smaller, so the addition is rounded away and the sum stops growing. In the end the float reports a result of 32,768 instead of the correct 100,000. What is even worse is that the float version takes longer than the double version (201 ms compared to 146 ms).
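
Why the float sum gets stuck can be checked directly with `Math.ulp`, which returns the distance between a float and the next larger float. With Java's round-to-nearest arithmetic, an addend smaller than half that distance is rounded away. The sketch below (class and method names are hypothetical) tests whether adding `0.001f` still changes a sum of a given size; the switch happens between 16,384 and 32,768.

```java
final class AbsorptionCheck {

    // True if sum + addend rounds back to sum in float arithmetic.
    static boolean absorbs(float sum, float addend) {
        return sum + addend == sum;
    }

    public static void main(String[] args) {
        float[] sums = {1f, 1_024f, 16_384f, 32_768f, 100_000f};
        for (float sum : sums) {
            // Math.ulp gives the spacing between sum and the next larger float.
            System.out.println("sum = " + sum
                    + ", ulp = " + Math.ulp(sum)
                    + ", 0.001f absorbed: " + absorbs(sum, 0.001f));
        }
        // In double the spacing near 100,000 is about 1.5e-11, so 0.001 still registers.
        System.out.println("double, 100000 + 0.001 == 100000: " + (100_000.0 + 0.001 == 100_000.0));
    }
}
```

At 16,384 the float spacing is 2^-9 (about 0.00195), so 0.001 is just over half of it and still rounds up; at 32,768 the spacing doubles to 2^-8 and 0.001 is lost.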

The second scenario suffers from the same effect, but shows different behaviour. The float result is wrong: it still reports the original 200,000 instead of the correct 100,000. The runtime, however, shows that the JVM apparently optimised the calculation away entirely, since it completes in 0 ms.
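
The same reasoning explains scenario 2: around 200,000, adjacent floats are 2^-6 = 0.015625 apart, so subtracting 0.001 is rounded back to the starting value every single time. The hypothetical helpers below repeat the subtraction in both types; the loop length of one million iterations is an arbitrary choice to keep the example quick.

```java
final class SubtractionCheck {

    // Repeatedly subtract delta from start in float arithmetic.
    static float subtractFloat(float start, float delta, int n) {
        float value = start;
        for (int i = 0; i < n; i++) {
            value -= delta;
        }
        return value;
    }

    // Repeatedly subtract delta from start in double arithmetic.
    static double subtractDouble(double start, double delta, int n) {
        double value = start;
        for (int i = 0; i < n; i++) {
            value -= delta;
        }
        return value;
    }

    public static void main(String[] args) {
        // Spacing between adjacent floats near 200,000 (2^-6); half of it is larger than 0.001.
        System.out.println("ulp(200000f) = " + Math.ulp(200_000f));
        System.out.println("float:  " + subtractFloat(200_000f, 0.001f, 1_000_000));  // stays at 200000.0
        System.out.println("double: " + subtractDouble(200_000.0, 0.001, 1_000_000)); // close to 199000
    }
}
```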

Scenarios 3 and 5 give the correct result, although the float result of scenario 5 is less accurate than the double one. This is because the numbers involved do not differ much in order of magnitude, or because the smaller numbers no longer matter to the result. The runtime of scenario 3 does not show a big improvement (243 ms vs 252 ms), whereas in scenario 5 the float version runs in about half the time of the double version (176 ms vs 343 ms).
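
The point that the smaller numbers stop mattering can be made concrete for scenario 3. The sketch below (hypothetical names) counts the iterations until the halved value no longer changes the sum. For float this happens after 24 iterations and for double after 53, because the step then drops below half the spacing between values near 2000. Every remaining iteration of the 100 million adds a value that is simply rounded away.

```java
final class HalvingConvergence {

    // Index of the first iteration at which adding the halved value leaves the float sum unchanged.
    static int firstIneffectiveIterationFloat() {
        float sum = 0f;
        float step = 1000f;
        int i = 0;
        while (sum + step != sum) {
            sum += step;
            step /= 2f;
            i++;
        }
        return i;
    }

    // Same count, computed in double arithmetic.
    static int firstIneffectiveIterationDouble() {
        double sum = 0.0;
        double step = 1000.0;
        int i = 0;
        while (sum + step != sum) {
            sum += step;
            step /= 2.0;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println("float sum stops changing at iteration " + firstIneffectiveIterationFloat());   // 24
        System.out.println("double sum stops changing at iteration " + firstIneffectiveIterationDouble()); // 53
    }
}
```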

Scenario 4 was an attempt to see how floating-point multiplication with an integer is handled. But just as in scenarios 1 and 2, the difference in magnitude completely dominates any other error, and the float reports a wrong result (2,097,152 instead of 4,950,000). Moreover, just as in scenario 1, the float operations are slower than the double ones.
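
Two details behind scenario 4 can be checked in a few lines. In an expression such as `k * 0.001f` with an `int k`, Java's binary numeric promotion converts `k` to `float`, so the product is an ordinary float multiplication. And even the largest addend, 0.099, stops registering once the sum reaches 2^21 = 2,097,152, exactly the value the float version reported. The helper name below is hypothetical.

```java
final class VaryingAddendCheck {

    // Smallest power of two p for which p + addend rounds back to p in float arithmetic.
    static float firstAbsorbingPowerOfTwo(float addend) {
        float p = 1f;
        while (p + addend != p) {
            p *= 2f;
        }
        return p;
    }

    public static void main(String[] args) {
        int k = 99;
        // int * float: k is promoted to float, so the product is a float (boxed here as Float).
        Object product = k * 0.001f;
        System.out.println("k * 0.001f is a Float: " + (product instanceof Float)); // true
        // From this sum onward, even the largest addend in scenario 4 is rounded away.
        System.out.println("0.099f absorbed from " + firstAbsorbingPowerOfTwo(0.099f)); // 2097152.0
    }
}
```

Below 2^21 the float spacing is 0.125, so 0.099 is more than half of it and still rounds up; at 2^21 the spacing becomes 0.25 and every addend up to 0.099 is lost.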

In general, it is clear that float has a lot of limitations when a calculation combines numbers of very different magnitudes, as any developer should know. What is interesting, however, is that in most cases floats were not considerably faster than doubles. This means that, in general, there is no reason to use floats over doubles.

Of course, there can still be good reasons to use floats, such as minimising memory usage. If, however, you use floats to increase the performance of your application, you might be out of luck: the application needs to perform millions of floating-point operations before any gain in throughput becomes noticeable. In both cases, it is advisable to verify that the results are still correct and to measure whether performance has actually improved.
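
One way to follow that advice is to compare the float result against a double-precision reference and accept it only within a chosen relative tolerance. The helper below is a hypothetical sketch: the tolerance of 1e-6 is an arbitrary example, and a double reference is itself only an approximation, although a far closer one in the scenarios above.

```java
final class ResultCheck {

    // Relative error of a float result with respect to a double-precision reference (reference must be non-zero).
    static double relativeError(float result, double reference) {
        return Math.abs(result - reference) / Math.abs(reference);
    }

    // Accept the float result only if it is within the given relative tolerance.
    static boolean acceptable(float result, double reference, double tolerance) {
        return relativeError(result, reference) <= tolerance;
    }

    public static void main(String[] args) {
        double reference = 100_000.0; // correct total for scenario 1
        float floatResult = 32_768f;  // what the float version reported
        System.out.println("relative error: " + relativeError(floatResult, reference));
        System.out.println("acceptable at 1e-6: " + acceptable(floatResult, reference, 1e-6)); // false
    }
}
```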
