Tuesday, January 3, 2012

Articles that start fiction ideas, #3: A couple of things by me over in the statistical semiotics world (or that number stuff Barnes does), and the surprising fact that math has consequences

The most common places I blog about the technical side of my analytic work are The CMO Site and All Analytics, two blogs over in the UBM system.  For those of you who are interested (and even those who are not), it happened that today I had a piece in each blog, and that led me to some thoughts about implications.  If any of the following is incomprehensible, you might want to skip over to the All Analytics piece about how to get started in hierarchical clustering (there are more ideas in the comments) or The CMO Site piece about Experian's Clone My Customer marketing tool. (Assuming you like incomprehensibility ... but then you read this blog, so why wouldn't you?)

Here's the thought.  Slowly, steadily, the math people and the stats people are finding ways to bridge the chasm between their approaches to the universe.  (If you didn't know there was a chasm, here's your quick summary: statsfolk think of numbers as representing reality, and try to see through them to the reality; the math tribe sees numbers as a kind of separate reality, which they try to see.  From this arise many questions about what we can know and whether knowing anything is actually interesting, and eventually we end up with the kinds of conundrums that make modern physics so hard for the rest of us to follow.)

Call the math world the realm of formal solutions: you prove the Pythagorean theorem and you now know it's true for right triangles on a continuous plane; later you prove it's a special case of a rule that is true for all triangles on a continuous plane; later you show the planar case is special and there's a more general case for all regular surfaces, and then for all surfaces, and so on.

The stats world is the world of empirical solutions: you measure a thousand right triangles in the lab, estimate an extended polynomial function, find that the coefficients for the squared sides are the only ones significantly different from zero, and that they don't differ significantly from 1, and write the Pythagorean theorem, plus or minus n%.  Later you add cases that are not right triangles, not planar, not regular, etc.
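For the curious, that lab exercise can be sketched in a few lines of Python. This is only an illustration under assumptions of my own choosing: the "measurements" are simulated with about 1% error on the hypotenuse, the "extended polynomial" stops at second-order terms, and the function name fit_triangle_law is made up for the example.

```python
import numpy as np

def fit_triangle_law(n=1000, noise=0.01, seed=0):
    """Regress c^2 on polynomial terms in a and b for simulated right triangles."""
    rng = np.random.default_rng(seed)
    # Simulated lab data (an assumption): legs drawn at random,
    # hypotenuse "measured" with roughly 1% multiplicative error.
    a = rng.uniform(1.0, 10.0, n)
    b = rng.uniform(1.0, 10.0, n)
    c = np.hypot(a, b) * (1.0 + noise * rng.standard_normal(n))

    # The "extended polynomial": intercept, linear, cross, and squared terms.
    terms = {"1": np.ones(n), "a": a, "b": b,
             "a*b": a * b, "a^2": a ** 2, "b^2": b ** 2}
    X = np.column_stack(list(terms.values()))
    y = c ** 2

    # Ordinary least squares, plus the usual standard errors.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return {name: (float(c_), float(s_)) for name, c_, s_ in zip(terms, coef, se)}

if __name__ == "__main__":
    for name, (est, se) in fit_triangle_law().items():
        print(f"{name:>4}: {est:+.3f} +/- {se:.3f}")
```

With data like this, the estimates for a^2 and b^2 should land close to 1 and the rest close to 0, each with its own error bar, which is the "plus or minus n%" version of the theorem.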

Now here's the thing.  Look at what Experian is doing: it's just about the purest empirical method there is.  You know you like the nuts from this tree, so you look through the tree index, find the similar trees, and go pick nuts there.  And advances in clustering are making this more and more powerful as a tool, because (as I said in the All Analytics comments) they're beginning to be able to discover and refine more complex rules. 
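The nut-tree idea can be sketched as a nearest-neighbor search. To be clear, this is not a description of how Experian's tool works, which I am not documenting here; it is a minimal illustration, and the function name find_lookalikes, the features, and the choice of plain Euclidean distance on standardized values are all assumptions made for the example.

```python
import numpy as np

def find_lookalikes(customers, prospects, k=3):
    """Return indices of the k prospects closest to the average known customer."""
    customers = np.asarray(customers, dtype=float)
    prospects = np.asarray(prospects, dtype=float)

    # Put every feature on a comparable scale, using the prospect pool as reference.
    mu = prospects.mean(axis=0)
    sd = prospects.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features

    # The "tree you like": the average profile of the customers you already have.
    profile = ((customers - mu) / sd).mean(axis=0)

    # The "tree index": distance from every prospect to that profile.
    dist = np.linalg.norm((prospects - mu) / sd - profile, axis=1)
    return np.argsort(dist)[:k]

if __name__ == "__main__":
    # Hypothetical features: (age, income in thousands).
    known = [[30, 50], [32, 55]]
    pool = [[31, 52], [60, 20], [29, 51], [65, 25], [33, 54]]
    print(find_lookalikes(known, pool, k=3))
```

Nothing in it explains why those prospects resemble your customers; it only reports that they do, which is the whole point.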

Notice that the rules found by statistics are not at all the same thing as the ones found by math: we know the Pythagorean theorem would be true for a right triangle drawn on a sheet of tantalum in the Andromeda Galaxy, provided the sheet is flat enough, because we know the math; if our only source for the theorem is empirical measurements made here, we can merely infer it.  But for a surveyor on Earth ... there's no difference that matters about how we know.

So imagine "Clone My Rule" -- a nonhierarchical search program that says, in effect, shucks, that Second Law of Thermodynamics is interesting, let's see how many other laws like it we can find.  Where Claude Shannon noticed that there is a connection between information noise/signal and thermodynamic entropy (whose implications are still being worked out almost 70 years later) because he saw that the equations are alike, Clone My Rule would just find that the shapes the data made in some imaginary space were alike; the "why" of the equations would be unnecessary.

In effect, the tools available might make it possible to think about a world where we've found all the magic but don't understand any of it; the machines apply all that empirically understood stuff for us, so that we just wish for a thing and the gigantic system in which we are all embedded says, "Here's the price, still want it?" or "Not available at this time at any price, but we'll keep looking."  You just ask it for ... oh, I don't know, world peace, self-licking ice cream cones, dragons, faster-than-light travel, perfect sandwiches.

My short story Things Undone* dealt with an alternate history where the math could give you all the answers, along with complete understanding -- and a human world that was bluntly horrible, which I suppose was my subconscious mind's way of saying that if we understood how to do everything we wanted to do, unfortunately, we would. Now I find myself wondering about the world where statistics gives us a similar gift -- and I'm wondering if that might not just be our old fictional friend, the Singularity, but with one joker in the deck: the machine can always instantly write exactly the book you want to read, but then it will have to read it to you.



*(in mobi and epub, also findable on Amazon and B&N)