A slide rule is an ancient mechanical calculator used for multiplying numbers.
It’s based on applying the Law of Logarithm of a Product, which says
log(a) * log(b) = log(a * b)
.
By drawing a simple log-scale ruler, you can enable a dumb
ape to multiply two numbers by simply adding the distances between them.
Representing floating point numbers in a computer is a game of tradeoffs. The more bits you use, the more accuracy you get. But bits are expensive. Where you use those bits matters though. What if you represented these numbers using a scheme that concentrated its accuracy in only the most important range?
In typical floating point formats, a number is broken into a mantissa and exponent. For example, the number 12.34 could be represented as 1.234 x 10^1. The mantissa holds the significant digits (1.234) while the exponent represents the magnitude (10^1).
With log encoding, more bits are used for the exponent and fewer for the mantissa. This means the exponent has a wider range of possible values. As a result, numbers near 1 can be represented very precisely by adjusting the exponent.
For example, a 4-bit log encoding could represent 1.0, 1.1, 1.2, 1.3 by storing the exponents 0, 1, 2, 3 rather than trying to store precise mantissas. This concentrates the accuracy in a narrow range at the cost of precision at extremes.
These two ideas are combined in one section of this talk by Bill Dally of nvidia. (Only a very small part, there is a lot more to this talk worth watching.)
Less bits and faster multiplication. Exactly the sort of thing to make the AI go brrrr.