31 August 2015

(1) gives a good overview of IEE 754.

Float.floatToIntBits(0.1f) gives 3dcccccd.

#include <stdio.h>
int main(int argc, char *argv[])
{
float f = 0.1;
printf("0x%8x\n", *((int*) &f));
printf("0x%x\n", *((unsigned char*) &f) & 0xff);
return 0;
}


The above code gives:

0x3dcccccd
0xcd


The memory layout of 0.1 in little endian format is:

Address
lower     cd
|       cc
|       cc
V       3d
higher



Float.toString’s Javadoc explains how to get the string representation of a Float.

The following Java code:

    System.out.println(0.1);
System.out.printf("%f\n", 0.1);
System.out.printf("%.14f\n", 0.1);
float f = 0.1f;
System.out.printf("%f\n", f);
System.out.printf("%.14f\n", f);


gives:

0.1
0.100000
0.10000000000000
0.100000
0.10000000149012


The following C code:

  printf("%f\n", 0.1);
printf("%.14f\n", 0.1);
printf("%f\n", f);
printf("%.14f\n", f);


gives:

0.100000
0.10000000000000
0.100000
0.10000000149012


12.12.5 Floating-Pointersions says that the default precision for %f is 6. So 0.1 with %f produces 0.100000.

Decimal 0.1 can’t be represented precisely in IEEE 754.

0.1 x 2 = 0.2      0
0.2 x 2 = 0.4      0
0.4 x 2 = 0.8      0
0.8 x 2 = 1.6      1
0.6 x 2 = 1.2      1
0.2 x 2 = 0.4      0


(0.1)10 = (0.00011)2. 0011 is the repeated part.

### Denormal Numbers

(1) says:

Many older floating point standards disallow such denormal numbers, leading to a gap between zero and the smallest representable positive number that is larger than the gap between the two smallest representable positive numbers.

If denormal numbers are disallowed, a = 1.00000000000000000000000 * 2^min (min is -127 for single presion floating numbers) is the smallest representable positive number. b = 1.00000000000000000000001 * 2^min is the next smallest representable positive number.

The gap between zero and the smallest representable positive number:
gap_a_0 = a - 0 = 1.00000000000000000000000 * 2^min
The gap between the two smallest representable positive numbers:
gap_b_a = b - a = 0.00000000000000000000001 * 2^min


It is obvious that gap_a_0 is much larger thant gap_b_a.

If denormal numbers are allowed, the denormal number 0.00000000000000000000001 * 2^(min+1) is the smallest representable positive number. And the denormal number 0.00000000000000000000002 * 2^(min+1) is the next smallest representable positive number.

The gap between zero and the smallest representable positive number:
gap_a_0 = a - 0 = 0.00000000000000000000001 * 2^(min+1)
The gap between the two smallest representable positive numbers:
gap_b_a = b - a = 0.00000000000000000000001 * 2^(min+1)


Now gap_b_a is equal to gap_a_0.