3.3 Number encoding *¶
Tip
In this book, chapters marked with an asterisk '*' are optional readings. If you are short on time or find them challenging, you may skip these initially and return to them after completing the essential chapters.
3.3.1 Integer encoding¶
In the table from the previous section, we observed that all integer types can represent one more negative number than positive numbers, such as the byte
range of
Firstly, it's important to note that numbers are stored in computers using the two's complement form. Before analyzing why this is the case, let's define these three encoding methods:
- Sign-magnitude: The highest bit of a binary representation of a number is considered the sign bit, where
represents a positive number and represents a negative number. The remaining bits represent the value of the number. - One's complement: The one's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by inverting all bits except the sign bit.
- Two's complement: The two's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by adding
to their one's complement.
Figure 3-4 illustrates the conversions among sign-magnitude, one's complement, and two's complement:
Figure 3-4 Conversions between sign-magnitude, one's complement, and two's complement
Although sign-magnitude is the most intuitive, it has limitations. For one, negative numbers in sign-magnitude cannot be directly used in calculations. For example, in sign-magnitude, calculating
To address this, computers introduced the one's complement. If we convert to one's complement and calculate
Additionally, there are two representations of zero in sign-magnitude:
Like sign-magnitude, one's complement also suffers from the positive and negative zero ambiguity. Therefore, computers further introduced the two's complement. Let's observe the conversion process for negative zero in sign-magnitude, one's complement, and two's complement:
Adding byte
length being only 8 bits, the carried-over
One last puzzle is the byte
, with an additional negative number,
However, the two's complement
As you might have noticed, all these calculations are additions, hinting at an important fact: computers' internal hardware circuits are primarily designed around addition operations. This is because addition is simpler to implement in hardware compared to other operations like multiplication, division, and subtraction, allowing for easier parallelization and faster computation.
It's important to note that this doesn't mean computers can only perform addition. By combining addition with basic logical operations, computers can execute a variety of other mathematical operations. For example, the subtraction
We can now summarize the reason for using two's complement in computers: with two's complement representation, computers can use the same circuits and operations to handle both positive and negative number addition, eliminating the need for special hardware circuits for subtraction and avoiding the ambiguity of positive and negative zero. This greatly simplifies hardware design and enhances computational efficiency.
The design of two's complement is quite ingenious, and due to space constraints, we'll stop here. Interested readers are encouraged to explore further.
3.3.2 Floating-point number encoding¶
You might have noticed something intriguing: despite having the same length of 4 bytes, why does a float
have a much larger range of values compared to an int
? This seems counterintuitive, as one would expect the range to shrink for float
since it needs to represent fractions.
In fact, this is due to the different representation method used by floating-point numbers (float
). Let's consider a 32-bit binary number as:
According to the IEEE 754 standard, a 32-bit float
consists of the following three parts:
- Sign bit
: Occupies 1 bit, corresponding to . - Exponent bit
: Occupies 8 bits, corresponding to . - Fraction bit
: Occupies 23 bits, corresponding to .
The value of a binary float
number is calculated as:
Converted to a decimal formula, this becomes:
The range of each component is:
Figure 3-5 Example calculation of a float in IEEE 754 standard
Observing Figure 3-5, given an example data
Now we can answer the initial question: The representation of float
includes an exponent bit, leading to a much larger range than int
. Based on the above calculation, the maximum positive number representable by float
is approximately
However, the trade-off for float
's expanded range is a sacrifice in precision. The integer type int
uses all 32 bits to represent the number, with values evenly distributed; but due to the exponent bit, the larger the value of a float
, the greater the difference between adjacent numbers.
As shown in Table 3-2, exponent bits
Table 3-2 Meaning of exponent bits
Exponent Bit E | Fraction Bit |
Fraction Bit |
Calculation Formula |
---|---|---|---|
Subnormal Numbers | |||
Normal Numbers | Normal Numbers | ||
It's worth noting that subnormal numbers significantly improve the precision of floating-point numbers. The smallest positive normal number is
Double-precision double
also uses a similar representation method to float
, which is not elaborated here for brevity.