
Tricky Floating Point Numbers

September 27th, 2008

Filed under: C++ — Kai @ 6:34 pm

Everyone knows that floating point numbers have finite precision, but this limitation can show up in unexpected ways. For instance, you may find the output of the following lines of code surprising.

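float f = 16777216;  // 2^24
float g = f + 1;     // store the sum in a float so it is rounded to float precision
cout << setprecision(8) << f << " " << g << endl;  // setprecision needs <iomanip>

Against expectations, this code prints the value 16777216 twice. (The precision is raised to 8 digits because the default of 6 would print both values as 1.67772e+07.)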

What happened? According to the IEEE 754 specification for floating point arithmetic, a float is 32 bits wide: 1 sign bit, 8 exponent bits, and 23 stored significand bits. Together with an implicit leading bit, that gives the significand (formerly called the mantissa, or sometimes the coefficient) 24 bits of precision. The number 16777216 is 2^24, and 2^24 + 1 would need 25 significant bits, so the float variable f has no precision left to represent f+1; the sum rounds back to 2^24.
A similar phenomenon occurs at 2^53 if f is of type double, because a 64-bit double has a 53-bit significand (52 stored bits plus the implicit leading bit).

The following code prints 0 rather than 1.

double x = 9007199254740992.0; // 2^53
cout << ((x+1) - x) << endl;

We can also run out of precision when adding small numbers to moderate-sized numbers. For example, the following code prints “Sorry!” because DBL_EPSILON (defined in float.h) is the difference between 1 and the next larger double, so adding only half of it to 1 gives a result that rounds back to 1.

double x = 1.0;
double y = x + 0.5 * DBL_EPSILON;  // halfway to the next double, rounds back to 1.0
if (x == y)
    cout << "Sorry!" << endl;

Similarly, the constant FLT_EPSILON is the difference between 1 and the next larger float, so adding 0.5 * FLT_EPSILON to a float holding 1 leaves it unchanged.

Powered by WordPress, Content and Design by Kai Bellmann