(1) gives a good overview of IEE 754.
Float.floatToIntBits(0.1f)
gives 3dcccccd
.
#include <stdio.h>
int main(int argc, char *argv[])
{
float f = 0.1;
printf("0x%8x\n", *((int*) &f));
printf("0x%x\n", *((unsigned char*) &f) & 0xff);
return 0;
}
The above code gives:
0x3dcccccd
0xcd
The memory layout of 0.1 in little endian format is:
Address
lower cd
| cc
| cc
V 3d
higher
Float.toString’s Javadoc explains how to get the string representation of a Float.
The following Java code:
System.out.println(0.1);
System.out.printf("%f\n", 0.1);
System.out.printf("%.14f\n", 0.1);
float f = 0.1f;
System.out.printf("%f\n", f);
System.out.printf("%.14f\n", f);
gives:
0.1
0.100000
0.10000000000000
0.100000
0.10000000149012
The following C code:
printf("%f\n", 0.1);
printf("%.14f\n", 0.1);
printf("%f\n", f);
printf("%.14f\n", f);
gives:
0.100000
0.10000000000000
0.100000
0.10000000149012
12.12.5 Floating-Pointersions
says that the default precision for %f
is 6. So 0.1
with %f
produces
0.100000
.
Decimal 0.1
can’t be represented precisely in IEEE 754.
0.1 x 2 = 0.2 0
0.2 x 2 = 0.4 0
0.4 x 2 = 0.8 0
0.8 x 2 = 1.6 1
0.6 x 2 = 1.2 1
0.2 x 2 = 0.4 0
(0.1)10 = (0.00011)2. 0011 is the repeated part.
Denormal Numbers
(1) says:
Many older floating point standards disallow such denormal numbers, leading to a gap between zero and the smallest representable positive number that is larger than the gap between the two smallest representable positive numbers.
If denormal numbers are disallowed, a = 1.00000000000000000000000 * 2^min
(min
is -127
for single presion floating numbers) is the
smallest representable positive number. b = 1.00000000000000000000001 * 2^min
is the next smallest representable positive number.
The gap between zero and the smallest representable positive number:
gap_a_0 = a - 0 = 1.00000000000000000000000 * 2^min
The gap between the two smallest representable positive numbers:
gap_b_a = b - a = 0.00000000000000000000001 * 2^min
It is obvious that gap_a_0
is much larger thant gap_b_a
.
If denormal numbers are allowed, the denormal number
0.00000000000000000000001 * 2^(min+1)
is the smallest representable positive
number. And the denormal number 0.00000000000000000000002 * 2^(min+1)
is the next smallest representable positive number.
The gap between zero and the smallest representable positive number:
gap_a_0 = a - 0 = 0.00000000000000000000001 * 2^(min+1)
The gap between the two smallest representable positive numbers:
gap_b_a = b - a = 0.00000000000000000000001 * 2^(min+1)
Now gap_b_a
is equal to gap_a_0
.
Reference
- IEEE 754 Converter
- Decimals (fractions) to Binary conversion - Part 2
- Converting Decimal Fractions to Binary
- Decimal to Floating-Point Converter
- IEEE-754 Analysis
- (1) “5.2.1 Computer Arithmetic of Programming Language Pragmatics”