31 August 2015

(1) gives a good overview of IEEE 754.

In Java, Float.floatToIntBits(0.1f) returns 0x3dcccccd, the IEEE 754 bit pattern of 0.1f. The same bits can be read from C:

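#include <inttypes.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
  float f = 0.1f;
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);     /* copy the bit pattern (well-defined, unlike casting &f to int*) */
  printf("0x%08" PRIx32 "\n", bits);  /* the whole 32-bit pattern */
  /* reading through unsigned char is allowed: this is the byte at the lowest address of f */
  printf("0x%x\n", (unsigned) *((unsigned char *) &f));
  return 0;
}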

On a little-endian machine such as x86, the above code prints:

0x3dcccccd
0xcd

The memory layout of 0.1f in little-endian byte order (least significant byte at the lowest address) is:

Address
 lower     cd
   |       cc
   |       cc
   V       3d
 higher

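The Javadoc of Float.toString explains how the string representation of a float is chosen: at least one digit for the fractional part, and beyond that only as many digits as are needed to uniquely distinguish the value from adjacent float values. Double.toString applies the same rule to doubles.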

The following Java code:

    public class PrintPointOne {
        public static void main(String[] args) {
            System.out.println(0.1);
            System.out.printf("%f\n", 0.1);
            System.out.printf("%.14f\n", 0.1);
            float f = 0.1f;
            System.out.printf("%f\n", f);
            System.out.printf("%.14f\n", f);
        }
    }

gives:

0.1
0.100000
0.10000000000000
0.100000
0.10000000149012

The following C code:

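  /* f is the float 0.1 from the program above */
  printf("%f\n", 0.1);
  printf("%.14f\n", 0.1);
  printf("%f\n", f);      /* a float argument to printf is promoted to double */
  printf("%.14f\n", f);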

gives:

0.100000
0.10000000000000
0.100000
0.10000000149012

12.12.5 Floating-Point Conversions says that the default precision for %f is 6, so 0.1 printed with %f produces 0.100000. Java's Formatter uses the same default precision of 6 for %f.

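Decimal 0.1 cannot be represented exactly in binary IEEE 754 formats, because its binary expansion never terminates. To find the binary digits of a fraction, repeatedly multiply the fractional part by 2; the integer part of each product is the next digit: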

0.1 x 2 = 0.2      0
0.2 x 2 = 0.4      0
0.4 x 2 = 0.8      0
0.8 x 2 = 1.6      1
0.6 x 2 = 1.2      1
0.2 x 2 = 0.4      0

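The last line repeats an earlier state (0.2), so the digits cycle: (0.1)10 = (0.000110011001100...)2, where after the first 0 the block 0011 repeats forever.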

Denormal Numbers

(1) says:

Many older floating point standards disallow such denormal numbers, leading to a gap between zero and the smallest representable positive number that is larger than the gap between the two smallest representable positive numbers.

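If denormal numbers are disallowed, a = 1.00000000000000000000000 * 2^min is the smallest representable positive number, and b = 1.00000000000000000000001 * 2^min is the next smallest. Here min is -127, the exponent an all-zero exponent field gives with the single-precision bias of 127. (IEEE 754 actually reserves that field for zero and denormals, so its smallest normal exponent is -126.)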

The gap between zero and the smallest representable positive number:  
  gap_a_0 = a - 0 = 1.00000000000000000000000 * 2^min
The gap between the two smallest representable positive numbers:      
  gap_b_a = b - a = 0.00000000000000000000001 * 2^min

Clearly gap_a_0 is much larger than gap_b_a: it is 2^23 times as large.

If denormal numbers are allowed, the smallest representable positive number is the denormal a = 0.00000000000000000000001 * 2^(min+1), and the next smallest is the denormal b = 0.00000000000000000000010 * 2^(min+1).

The gap between zero and the smallest representable positive number:  
  gap_a_0 = a - 0 = 0.00000000000000000000001 * 2^(min+1)
The gap between the two smallest representable positive numbers:      
  gap_b_a = b - a = 0.00000000000000000000001 * 2^(min+1)

Now gap_b_a is equal to gap_a_0: both are 2^(min+1-23) = 2^-149, the smallest positive denormal.

Reference