Base conversion can be used to compress number representations and to hide the real value of a number. The algorithm is well known, anyone can program it, and it is not hard to get right. I was playing with Haskell, writing that algorithm to check how effective it is for building URL shorteners. My idea is quite simple. Each URL has a database ID, which is an integer that grows with the number of entries in the database. Once the database starts growing, the ID reaches a length that is not easy to remember. For example, the ID `999 999 999 999` is not easy to remember.
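A quick estimate of the saving, assuming a 64-symbol alphabet like the one used in the code below: a positive integer n needs floor(log64 n) + 1 symbols, so the 12-digit example ID fits in 7 characters.

```latex
\left\lfloor \log_{64} 999\,999\,999\,999 \right\rfloor + 1
  = \left\lfloor \frac{\log_{10} 999\,999\,999\,999}{\log_{10} 64} \right\rfloor + 1
  \approx \left\lfloor \frac{12}{1.806} \right\rfloor + 1
  = \lfloor 6.64 \rfloor + 1
  = 7
```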

The basic algorithm uses an ASCII alphabet as the symbols of a positional numeral system, and then applies the standard base conversion algorithm with that alphabet to get the string representation of the number. The *encoding* and *decoding* functions in Haskell can be written as follows.

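```haskell
import Data.List (elemIndex)
import Data.Maybe (fromJust)

-- The positional symbols: 10 digits, 26 lowercase and 26 uppercase letters,
-- plus '.' and '_', which gives a base of 64.
alpha :: String
alpha = "0123456789"
     ++ "abcdefghijklmnopqrstuvwxyz"
     ++ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
     ++ "._"

alphaLength :: Integer
alphaLength = fromIntegral $ length alpha

-- Encode a non-negative Integer as a base 64 string.
aenc :: Integer -> String
aenc i
  | i < 0     = error "aenc: negative numbers are not supported"
  | i == 0    = take 1 alpha  -- "0"; the loop below would return ""
  | otherwise = encn i []
  where
    fin = alphaLength
    -- Each step takes the least significant digit and prepends its symbol,
    -- so the accumulator ends up most significant first (no reverse needed).
    encn :: Integer -> String -> String
    encn x y
      | x == 0    = y
      | otherwise =
          let (n, m) = x `divMod` fin
              r      = alpha !! fromIntegral m
          in encn n (r : y)

-- Decode a base 64 string back into an Integer.
-- Fails (via fromJust) on symbols that are not in the alphabet.
adec :: String -> Integer
adec i = decn i 0 0
  where
    fin = alphaLength
    lns = fromIntegral (length i) :: Integer
    -- z is the position from the left; the symbol at z has weight fin^(lns - z - 1).
    decn :: String -> Integer -> Integer -> Integer
    decn (x:xs) y z =
      let pw, el, nm :: Integer
          pw = lns - (z + 1)
          el = fromIntegral $ fromJust $ elemIndex x alpha
          nm = (el * (fin ^ pw)) + y
      in decn xs nm (z + 1)
    decn [] y _ = y
```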
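For comparison, here is a more compact sketch of the same conversion, using `unfoldr` for encoding and Horner's rule for decoding, which avoids computing a power of the base for every position. The names `encode`, `decode`, `alphabet` and `base` are my own, and returning `Maybe` on an unknown symbol is a design choice of this sketch, not part of the code above.

```haskell
import Control.Monad (foldM)
import Data.List (elemIndex, unfoldr)

-- Same 64 symbols as above: digits, lowercase, uppercase, '.' and '_'.
alphabet :: String
alphabet = ['0'..'9'] ++ ['a'..'z'] ++ ['A'..'Z'] ++ "._"

base :: Integer
base = fromIntegral (length alphabet)

-- unfoldr emits digits least significant first, so the result is reversed.
encode :: Integer -> String
encode n
  | n < 0     = error "encode: negative input"
  | n == 0    = take 1 alphabet
  | otherwise = reverse (unfoldr step n)
  where
    step 0 = Nothing
    step x = let (q, r) = x `divMod` base
             in Just (alphabet !! fromIntegral r, q)

-- Horner's rule, left to right: acc * base + digit.
-- Returns Nothing as soon as a symbol is not in the alphabet.
decode :: String -> Maybe Integer
decode = foldM step 0
  where
    step acc c = do
      d <- elemIndex c alphabet
      pure (acc * base + fromIntegral d)
```

In GHCi, `encode 12345` gives `"30V"` and `decode "30V"` gives `Just 12345`, while `decode "3-V"` gives `Nothing` because `-` is not in the alphabet.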

The *encoding* and *decoding* functions above use a numeric base of *64*, and the alphabet is case-sensitive. So the number `12 345` is represented in base 64 as `"30V"`, the number `999 999 999 999` as `"ezkFg__"`, and `999 999 999 999 999` as `"3znWAND__"`. What makes base conversion of numbers this size easy is the data type used. The standard integer types in **C** are typically at most 64 bits wide (`long long`), and some compilers add a 128-bit extension such as `__int128`. Either way they have a fixed limit. The Haskell `Integer` type has practically no limit, because it is an *arbitrary-precision integer*. To use numbers like that in **C**, you must install a library such as *apcalc*.
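To see why the `Integer` type matters, here is a small sketch that encodes 2^64 once as an `Integer` and once after squeezing it into a fixed-width `Int`. The generic `encodeIntegral` is a hypothetical helper that restates the encoder so the block compiles on its own. On GHC, converting an out-of-range `Integer` with `fromIntegral` to `Int` silently wraps around, so the value is lost before the encoder ever sees it.

```haskell
import Data.List (unfoldr)

alphabet :: String
alphabet = ['0'..'9'] ++ ['a'..'z'] ++ ['A'..'Z'] ++ "._"

-- Hypothetical helper: generic over the integer type so Int and Integer can be compared.
encodeIntegral :: Integral a => a -> String
encodeIntegral n
  | n < 0     = error "encodeIntegral: negative input"
  | n == 0    = take 1 alphabet
  | otherwise = reverse (unfoldr step n)
  where
    step 0 = Nothing
    step x = let (q, r) = x `divMod` fromIntegral (length alphabet)
             in Just (alphabet !! fromIntegral r, q)

-- 2^64 = 16 * 64^10, so in base 64 it is 'g' (16) followed by ten '0's.
big :: Integer
big = 2 ^ (64 :: Int)

asInteger :: String
asInteger = encodeIntegral big

-- 2^64 does not fit in a 64-bit Int; the conversion wraps around to 0.
asInt :: String
asInt = encodeIntegral (fromIntegral big :: Int)
```

Here `asInteger` is `"g0000000000"` while `asInt` is just `"0"`. The same helper reproduces the values above, for example `encodeIntegral (999999999999 :: Integer)` is `"ezkFg__"`.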

You can also hide the number representation by keeping your alphabet *private* and reordering the positional symbols: depending on the alphabet, the same number gets a very different representation. Another option is to extend the alphabet with more symbols to get a larger base, for example a 7-bit alphabet of 128 symbols gives a base 128 encoded number. Keep in mind that ASCII has only 95 printable characters, so a base 128 alphabet has to include symbols beyond printable ASCII.
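A minimal sketch of the private-alphabet idea, assuming the alphabet is simply passed as a parameter. The names `encodeWith`, `decodeWith` and `privateAlpha` are hypothetical, and reversing the public alphabet stands in for a real secret permutation. Note that this obscures the number rather than encrypting it.

```haskell
import Control.Monad (foldM)
import Data.List (elemIndex, unfoldr)

-- Encode with any alphabet; the base is the alphabet's length.
-- Assumes at least 2 symbols and no duplicates.
encodeWith :: String -> Integer -> String
encodeWith alpha n
  | n < 0     = error "encodeWith: negative input"
  | n == 0    = take 1 alpha
  | otherwise = reverse (unfoldr step n)
  where
    b = fromIntegral (length alpha)
    step 0 = Nothing
    step x = let (q, r) = x `divMod` b in Just (alpha !! fromIntegral r, q)

-- Inverse of encodeWith; Nothing if a symbol is not in the alphabet.
decodeWith :: String -> String -> Maybe Integer
decodeWith alpha = foldM step 0
  where
    b = fromIntegral (length alpha)
    step acc c = do
      d <- elemIndex c alpha
      pure (acc * b + fromIntegral d)

-- The public alphabet used in this post.
publicAlpha :: String
publicAlpha = ['0'..'9'] ++ ['a'..'z'] ++ ['A'..'Z'] ++ "._"

-- Hypothetical "private" alphabet: the same 64 symbols, reversed.
privateAlpha :: String
privateAlpha = reverse publicAlpha
```

With these definitions `encodeWith publicAlpha 12345` is `"30V"`, while `encodeWith privateAlpha 12345` is `"Y_6"`, and `decodeWith privateAlpha "Y_6"` gives back `Just 12345`.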

This encoding is very similar to the *Base N* encodings described in RFC 4648, but the algorithm is quite different: RFC 4648 encodings split the input bytes into fixed-size groups of bits instead of treating the whole input as one number. I have implemented that algorithm, with support for any alphabet and any base, in the Caffeine Project (you can use base 128 if you want). The difference from other implementations is that it can handle *streaming data*, so you can create base encoded *streams* without requiring fixed-size buffers. The standard functions encode fixed-size buffers, while `encode_stream` and `decode_stream` encode and decode streaming buffers. You can see the source code here.
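To make the difference concrete, here is a minimal sketch of RFC 4648 Base64 (standard alphabet, with `=` padding) over a list of bytes. It is not the Caffeine Project code, and `encodeBase64` and `quad` are names of my own. It only shows why this family differs from numeric base conversion: the input is cut into 3-byte groups that are encoded independently, so leading zero bytes are preserved and data can be consumed chunk by chunk, which is what makes streaming possible.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- RFC 4648 base64 alphabet (a different order from the alphabet above).
b64 :: String
b64 = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Encode 3 bytes at a time (24 bits -> four 6-bit symbols).
-- A trailing group of 1 or 2 bytes is zero-filled and padded with '='.
encodeBase64 :: [Word8] -> String
encodeBase64 (a:b:c:rest) = quad a b c 4 ++ encodeBase64 rest
encodeBase64 [a, b]       = quad a b 0 3 ++ "="
encodeBase64 [a]          = quad a 0 0 2 ++ "=="
encodeBase64 []           = ""

-- Pack three bytes into 24 bits and emit the first k 6-bit symbols.
quad :: Word8 -> Word8 -> Word8 -> Int -> String
quad a b c k = take k [ b64 !! ((n `shiftR` s) .&. 63) | s <- [18, 12, 6, 0] ]
  where
    n :: Int
    n = (fromIntegral a `shiftL` 16) .|. (fromIntegral b `shiftL` 8) .|. fromIntegral c
```

For example, the bytes of `"Man"` encode to `"TWFu"` and `"Ma"` to `"TWE="`, and `encodeBase64 [0, 0, 1]` gives `"AAAB"`, keeping the two leading zero bytes that a numeric conversion of the value 1 would drop.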

Overall, it works quite well.