Perfect Alignment is Impossible
Perfect Alignment is Unnecessary
None of this is certain, I’m thinking out loud here (albeit over the course of years and maybe a few mid-paragraph rephrasing edits): salt accordingly.
First inkling on this came from a comment about the Halting Problem, noting that despite its apparent specificity, a lot of problems turn out to be equivalent to it. Perfect antivirus, for one. I summarize the Halting Problem as “You can’t always predict whether an algorithm complicated or powerful enough to be useful will loop or stop.” The proof involves self-reference: take any candidate predictor and build a program that asks the predictor about itself, then does the opposite (loops if told it will stop, stops if told it will loop). The predictor is wrong about that program either way.
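For the code-inclined, here is a minimal sketch of that self-reference trick. The names (`make_contrarian`, `always_says_loops`, `always_says_halts`) are made up for illustration, and the two "predictors" are deliberately dumb stand-ins; the real proof shows the same trap works against any predictor, however clever.

```python
def make_contrarian(predicts_halt):
    """Build a program that does the opposite of what the predictor says about it."""
    def contrarian():
        if predicts_halt(contrarian):
            while True:       # predictor said "halts", so loop forever
                pass
        return "halted"       # predictor said "loops", so stop right away
    return contrarian


# Two toy predictors (hypothetical stand-ins for a real halting checker).
def always_says_loops(program):
    return False


def always_says_halts(program):
    return True


# Safe to run: the predictor claims this program loops, and it halts instead.
fooled = make_contrarian(always_says_loops)
outcome = fooled()

# NOT safe to run: the predictor claims this one halts, so calling it would
# spin forever. Building it is harmless; we just never call doomed().
doomed = make_contrarian(always_says_halts)
```

Either way the predictor gets its own contrarian wrong, which is the whole proof in miniature.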
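Public key cryptography was the next inkling. Some problems are easy to compute in one direction and hard to reverse (multiplying two large primes versus factoring the result, as far as anyone knows), and that asymmetry makes them useful for secret messages even when the channel you’re sending them over is wide open.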
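A toy illustration of that easy-one-way, hard-the-other asymmetry, using multiplication versus factoring. This is a sketch, not real cryptography: the primes are tiny, trial division is the most naive attack available, and the function names are my own. Also worth salting: nobody has proven factoring is hard, it just has been so far.

```python
def multiply(p, q):
    """The easy direction: a single multiplication."""
    return p * q


def factor_by_trial_division(n):
    """The hard direction: try divisors one at a time, counting attempts."""
    attempts = 0
    d = 2
    while d * d <= n:
        attempts += 1
        if n % d == 0:
            return d, n // d, attempts
        d += 1
    return n, 1, attempts     # no divisor found, so n itself is prime


p, q = 10007, 10009           # small primes, chosen only for illustration
n = multiply(p, q)            # one step going forward
small, large, attempts = factor_by_trial_division(n)
print(f"{n} = {small} * {large}, found after {attempts} trial divisions")
```

Going forward costs one multiplication; going backward costs thousands of attempts even at this size, and the gap grows fast as the primes get longer.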
Third, morality is a kind of public key cryptography. The secret it’s sending is “I’m not a backstabbing nutjob”. Which is a simple enough message, when you put it that way, except that backstabbing nutjobs also have brains, and tend to devote them to becoming able to backstab you despite your defenses, using things like “lying”.
Humans have roughly similar-sized brains to each other (citation needed); time spent learning to effectively lie to other people is time not spent learning to effectively help other people. Morality, and religion in general (or vice versa, depending on which you think is a subset of the other), are anti-nutjob defense mechanisms that work by making lying harder to do subtly. “Thou Shalt Not Kill” shows up and gets broken in basically every major moral system, and the specifics on where it gets broken are mostly opaque because anything too transparent is in danger from the rules-lawyer flavor of backstabbing nutjobs.
This does not mean any specific religion or belief system is true, certainly not in all particulars. Reality cannot be simulated with infinite precision (see the Halting Problem), and besides, the errors are how you filter out the secret sociopath atheists (and also the regular atheists, but that’s a price most religions are willing to pay). The original term for that kind of test is “shibboleth”. It does mean the biggest errors will get filed down over time through evolutionary pressure, assuming we can get that time.
Okay, so, AI though? It’s big enough to fool everyone all the time, right? At least, the kind of AI we’re worried about killing everyone is.
Not wrong; also not yet what we have. Humans already get fooled a bunch by each other and ourselves, never mind some hypothetical superintelligence. We’ve survived so far, in aggregate, because our failures have not, up until roughly the invention of nuclear weapons, been extinction-scale (however much the remaining Iroquois would like to protest re: smallpox).
There are no guarantees. There never have been.
Also, if I try to keep rewriting this or even expanding on it I’m never going to actually publish, and so it’s going to go the way of two other drafts on this approximate topic I’ve been sitting on for years now. So, hitting “continue” and attempting to put this in contact with some more actual reality.
