There
is a sentence in "Silence Like Diamonds" -- oh,that? How nice of you to ask. The fourth episode is up now, over at Light Reading,
and if this is your first time here you can find my explanation here and Light Reading's explanation there. If you really just wanted to read a good story, go
read! This will wait!
(Smiling and waving as a few folks leave the room)
RUN FOR YOUR LIFE, IT'S ABOUT TO BE MATH IN HERE!
(crouching behind podium until pandemonium subsides, addressing the few remaining cool people in the front row)
All
right then.
As I was saying, there's a sentence in "Silence Like
Diamonds" that draws a lot of questions from readers, so I thought today
I'd talk about that. It's in Episode 3,
which ran last Friday:
Ever since the Yan-Dimri fast factorization algorithm had flipped the advantage from the encryptors to the cryptanalysts, only isolated systems could be really secure (at the cost of being really useless).
I'm
happy to note that in the 72 hours after that, "Yan-Dimri fast factorization" drew a few search
hits on Google, so to begin with: I made that up. We call this stuff science
fiction, you know?
But
I'm guessing that the reason why it drew Google attention was because there are
plenty of people who recognize the implications in that sentence; in fact I'm a
bit surprised that more people don't.
So
to begin at the beginning: since about the 16th century in the European
cultural sphere, and no later than 1900 everywhere, there's been a relentless
arms race between encryptors (codemakers) and cryptanalysts (codebreakers).
Mathematical tools have gotten better and better on both sides, and theoretical
understanding of encryption/decryption has advanced rapidly, driven first by
the competing great powers in the last half of the 1800s, and then by war and
cold war, and now by a huge international information market.
The
balance began to swing -- apparently permanently -- toward encryption with the
introduction of keyword, master key, etc. systems. I'm skipping over a vast
amount of detail here, so real crypto people please forgive all the variations
and interesting sidelines that I might introduce. Here's a too-brief and too-undetailed version of how we got where we are now, and how the world of "Silence Like Diamonds" is different.
Think of the code that small children used to use, where A=1, B=2, etc. Suppose we
encode a block of text:
HOW
ALL OCCASIONS DO INFORM AGAINST ME AND SPUR MY DULL REVENGE
For
our purposes in this example, I'll use 0 for space. Encoded in that simple
fashion, the message is now:
8-15-23-0-1-12-12-0-15-3-3-1-19-9-15-14-19-0-4-15-0-9-14-6-15-18-13-0-1-7-1-9-14-19-20-0-13-5-0-1-14-4-0-19-16-21-18-0-13-25-0-4-21-12-12-0-18-5-22-5-14-7-5
To
read it, then, the recipient would simply replace the numbers with letters (and
zeros with spaces).
Of
course that cipher would be pathetically
easy to break, especially with modern gear like any old spreadsheet. In fact, if you knew it was that simple a code, you'd probably just set up a list of permutations in your spreadsheet (a row where you tried A=1 B=2, a row where you tried B=0, C=1, etc) and read down a few lines till you saw an English sentence.
Yet systems like that
were used down through the American Civil War; you can find Edgar Allen Poe
explaining how to break that sort of code in "The Gold Bug".
Many
other ideas were tried, but the road that proved most profitable was the one
that leads through the Enigma machine, originally created for high-security
business cables in the 1920s and then adapted by the Germans during World War
II. Leaving aside all the fascinating
mechanics -- and they really are worth your while if you've never read the
story -- the math of it was simply that if instead of a simple substitution,
you had some number of encoding rules, and a key that told you which rule to use for each successive character, then every character of the clear text
would be represented by several different characters, eventually all the
possible ones, in the code. For example,
suppose we add the successive digits of
π
to the substitution above; then the
new code is:
11-16-27-1-6-21-14-6-20-6-8-9-28-16-24-17-21-3-12-19-6-11-20-10-18-21-21-3-3-14-10-14-14-21-28-8-17-6-9-8-15-10-9-22-25-30-21-7-18-26-0-9-29-14-12-9-25-9-31-9-18-12-14
Of
course it doesn't have to be
π
and it doesn't have to be addition; I could
have multiplied each by the successive digits of √5 and then subtracted successive
digits of √22, so that the message recipient would then decode by first adding successive
digits of √22, then dividing by the successive digits of √5, and finally substituting that simple numeric code. Important thing to note right here -- you'll see why later -- in this primitive version, it takes just about as long to encode as it does to decode.
This
is a considerable improvement over the simple substitution ciphers; Herbert Yardley's codebreakers, in the United
States's Black Chamber around World War I, would have had to work at it for a few
days, especially with message this short.
Notice that the E's in "revenge" are now represented by 9, 9, and
14. The spaces are represented by
1,6,3,3,8, 9,9,7,0, and 9 successively, the double L in all is 21-14 but the
double L in dull is 14-12, and so forth. With a longer sample (say 1500 characters instead of the 62 here) and a long
afternoon (and ideally a spreadsheet), though, a good amateur cryptographer
could crack this while barely breaking a sweat. (I'm skipping how, here; let me recommend Simon Singh's THE CODE BOOK as a good place to start a fascination with cryptography -- I wish it had been around when I first became interested, confidential information ago).
But
rudimentary as my little demonstration here is, it introduces the basic tricks of modern encryption:
there's a key, there's an operation performed by combining the message with the
key, and there's a resulting coded message, which is hard to read without the
key, but easy to read by applying the key through a defined operation. As long as the sender and the intended receiver both have the key, and the
would-be interceptor doesn't, the would-be interceptor has a fairly-tough-up-to-impossible math
problem to crack.
Some
things make the problem harder than others. There are patterns in every
language -- ETAOINSHRDLU is a famous one, that is, in a long enough passage of
random English, E will be the most frequent letter, followed by T, A, etc. If
there is also a pattern in the key -- say, for example, it is the digits of
π,
beginning over again with 3 for every message -- then in a big enough sample,
the combined pattern will emerge and can be teased out, especially if something
is a frequent part of a message (you might have seen The Imitation Game and
remember what a difference it made when British cryptographers that realized Heil Hitler would occur frequently in the encrypted
messages).
But
what a complicated world is hidden in that phrase "the sender and the
intended receiver both have the key, and the would-be interceptor doesn't."
A perfect, unbreakable key would simply be a
list of random digits held by both sender and receiver, as long as the list was
used once and then thrown away; with no pattern in the key, there'd be no
breaking it. But that would require that
every spy, warship, infiltrator, or whatever go out with an extremely long
random number list of his/her own, in some cases smuggling it through enemy lines, and never accidentally losing it or having it confiscated. Whenever
they can, modern encrypters do use those unbreakable "one time pads"
("pad" because the original WW2 version was a printed, gridded pad of
sheets, on each of which you wrote your message on a top line, added the random numbers from the
middle line, and sent the resulting code from the bottom line, destroying it as
soon as you finished. For a really fascinating story about how that all worked
in practice, see Leo Marks's BETWEEN SILK AND CYANIDE).
Suppose,
though, that you can't send a one-time pad with someone. Perhaps they will be
going elsewhere indefinitely, or forever. Perhaps a one-time pad is too
dangerous for them to carry (since it's pretty obviously incriminating
evidence). What can you do instead?
For
World War II and most of the Cold War, the answer was highly unsatisfactory:
you made hard-to-figure-out keys that didn't look like keys. Another example
from The Imitation Game, (a great movie if you're not trying to substitute it
for reading Andrew Hodge's brilliant biography Alan Turing: The Enigma): the
Soviet mole, who is working inside a far-above-top-secret facility into which
he dare not take any suspicious object, was using passages from the King James
Bible as his key. Though it still
produces some very difficult code to break manually, presumably there will be
recurring patterns produced by overlaps of frequent words. For example, if
GERMAN and BEGAT line up every so often (one is likely to be frequent in the
message, the other in the key), and if what we're doing is adding the numbers
for the two letters, subtracting 26 if necessary, and then converting to a
letter, the letter combination IJYNU would show up extra frequently.
The
genius of the Enigma machine was that the key wasn't written out as a key, but an initial setting
for a machine that then automatically generated the key as needed; each time
you pushed a key to encode a letter, three internal rotors turned by specified
amounts, reorganizing the substitution of the next letter. To read the message you had to know which three (of a possible 5)
rotors were in which order at which starting positions, plus how a plugboard was arranged, and since the
pattern was changed every night at midnight, the Germans thought it couldn't be
broken at all, given the huge number of possible combinations. But again, it
was breakable because there was a pattern to the Enigma's generation of the key,
and a pattern to the German messages underneath, and an enormous sample (when
there's that much of a war going on, and everyone is talking to everyone else,
the message volume is huge).
So
the Cold War began in a rough tie between encrypters and decrypters (since the
British had shared some of their codebreaking methods with us voluntarily and
others with the Soviets via espionage).
One-time pads were unbreakable (and still are and always will be as long
as the numbers are truly random) but you had to hand off the pad between sender
and receiver somehow, and the pad itself could be stolen, intercepted, lost,
etc. Generated keys had patterns in them and were therefore breakable, even
with enormously complex and sophisticated key-generation machines (and later,
software). If a government, agency, business, or other organization changed keys often enough, and the key wasn't reused too
much, and the process of generating it was not too obvious or wasn't stolen by the opposition,
then most messages could be kept secret for long enough, most of the time, with
the codebreakers occasionally getting ahead of the game because they got
bigger, faster computers to run the breaking software on, or because the
encrypters made a mistake somewhere.
And
then in the 1970s some nice mathematicians came along and put the encrypters so far ahead that they've been there ever since. They figured out
a variety of processes using (what they hope are) one-way functions.
A
one-way function is simply a process with numbers that is easy to do in one
direction but hard to do in reverse. "Easy" and "hard"
might have a genuine universal meaning related to the P=NP question, but for
applied engineering purposes, it's always just "relative to the capabilities of the
people with the secret and the people trying to get the secret."
In
particular, the encryption mathematicians discovered a process in which it was
quite easy to encode using the digits of a very large number (say two thousand bits, which is about a 600-digit number in decimal) that was a semi-prime, i.e. the product of two primes, but
the process only worked backwards with enormous effort, so the message was hard
to decode -- unless you had the "trap door," the value of one of the
two primes, in which case it was easy.
So
the Chief of the Global GoodORBad Guys Org (GG|BGO) could simply broadcast the
semiprime and the encryption rule, and anyone who copied it down could send him a message, with no
one else being able to read it. Using
one of the two prime factors that he had multiplied to create the semi-prime,
the Chief could decode that same message quickly and easily. If he kept those factors
secret, to break the code and read his agents' messages, someone would have to
factor the public key.
Furthermore, his agents could simply mail him a public key (presumably generated on their laptops or phones) from any anonymous mailbox or email, keeping their trapdoor primes somewhere concealed (a few hundred digits is not hard to conceal, after all), and when he wanted to reply he could send it by any public channel. (Ever wonder what those numbers stations on the shortwave might be?)
And
there matters have rested, because factoring is a hard operation. You can see
this a little by just looking at the time it takes to factor a semi-prime by
hand: 35 is a semi-prime, and its primes are 5 and 7, as anyone who remembers
the multiplication table figured out in less time than it took you to read
this. How about 2491? How long did it take you to find 47*53? (highlight to
read) Now, if you're really a demon at
factoring, try 307,961. (If you really must know, that's 547 times
563).
Here's
the interesting thing, from a computing standpoint: not only is factoring a
hard operation but it gets much harder (in terms of machine time taken) the
bigger the number, and in modern crypto, it's just not practical at all.
So
that's why RSA and its many cousins, which rely on semiprimes, are nearly as
unbreakable as a 1-way pad. And a good thing too, because those methods of encryption are keeping thieves out of your bank account, your boss out of that naughty website you frequent, trolls out of taking over your i.d. and sending threatening notes to the president, and all the rest. True, if the white hats don't change the key often
enough, and the black hats can keep guessing for years or centuries, eventually
they might manage to factor that semiprime ... In fact a few
very limited tricks are known for doing just that on a few very special classes
of large semiprimes (look up Fermat's method for one). The public record so far
is for factoring a 232-digit number, which took hundreds of hours of computer
time across about two years, and was essentially just a very-well-thought out brute force search.
So
... since it's easy to create those really big semiprimes, and since factoring takes so long that the key
will change long before anyone factors the public key and gets in, it's an
encryption paradise. As long as the government doesn't make everyone give them
one of the factors and leave all of them in a pile on someone's desk over the
weekend we should be fine, right?
Except.
No
one has actually satisfactorily shown that there can't be a fast way to factor
a very large semiprime.
Except,
if it's really impossible, someone ought to have proved it's impossible by now,
given how big the economic incentive is.
Or if it is possible, you'd think someone would have done it.
And math problems can have a really long hang time for solution.
Remember,
Fermat's Last Theorem took 350 years. The four-color problem stood open for
about 130. This Wolfram article on unsolved problems contains many that have
been open for a century or more.
Right
now, somewhere out there, mathematicians who would like to have a name in the
history books are working on factoring large semi-primes, and other equally ambitious mathematicians are working on proving it's
impossible. (Technically, too, someone might show it to be possible without
showing how, but that idea didn't lead me to a story in whicht things blow up and people daringly do deeds of derring-do).
If
the factoring side wins the race -- which they could do at any time, and it's
always possible that a spy service somewhere might already have done so -- instantly,
"... there is nothing covered, that shall not be revealed; neither hid, that shall not be known. Therefore whatsoever ye have spoken in darkness shall be heard in the light; and that which ye have spoken in the ear in closets shall be proclaimed upon the housetops."
Everything.
All at once. Private bank accounts and passwords, credit card accounts, corporate
records, everything; the only limit will be how fast human beings can absorb
and use the information.
There
are other encryption methods possible, but most of them are also driven by
large semiprimes.
So
... in my imagined future world, two mathematicians named Yan and Dimri came up
with a fast-factorization algorithm somewhere between 2020 and 2025. It's a
world where all encryption, everywhere (except for one-time pads) is broken
quickly; encryption only slows corporate and political espionage down a little bit. And thus it's a world where
people like Yip, Yazzy, Dusan, and Markus are never short of work or things to
do; in a world where there are no good locks, there's always a job for a guard.
Do
I think that world will happen?
I
think no one yet knows (or at least no one yet admits they know) whether it
could. So, for the moment, it's perfect for science fiction.
And
if I were running a large operation's security, I might deputize a
math-literate person to watch the literature on semi-prime factorization.
Anyway,
enjoy the story, see you Friday with some other topic, and this time I got to
talk about two favorite subjects, math and fiction, so consider yourself lucky
that I only wrote this much.