Proph3t's homepage

https://metaproph3t.github.io

The Meta-DAO Project


The Human Alignment Problem

Author's note: I have previously written about this topic here, although I called it 'the human coordination problem' then. This is my attempt to write it more simply, so that even the Emacs users among you can understand it ;).

In Isaac Asimov's Foundation series, there's a planet called Gaia. On Gaia, everyone ('everyone' refers to people, animals, and even some inanimate matter) is part of a group consciousness. Because of this, each being does its best to maximize for the goals of the planet as a whole.

From an alignment perspective, this is an ideal planet. We can semi-formally state that a population P is aligned when every member p optimizes for the expected value of the population as a whole: ∀p∈P, p maximizes EV(P).

Unfortunately, humans don't work like ps. We aren't born as machines that optimize for social welfare. Even those who signal their p status are often just posing.
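To make the definition above concrete, here is a toy sketch of what an aligned p would do: given a set of options, it picks whichever one maximizes the population's total value, not its own payoff. Everything here (the payoff numbers, the function names) is a hypothetical illustration, not a formal model.

```python
# Toy illustration of the alignment condition: ∀p ∈ P, p maximizes EV(P).
# All names and numbers are hypothetical, chosen only to make the idea concrete.

def population_ev(payoffs):
    """EV(P) in this toy model: the sum of everyone's payoffs."""
    return sum(payoffs.values())

def aligned_choice(agent, others, options):
    """An aligned agent picks the option that maximizes EV(P),
    holding everyone else's payoffs fixed."""
    best, best_ev = None, float("-inf")
    for option in options:
        ev = population_ev({**others, agent: option})
        if ev > best_ev:
            best, best_ev = option, ev
    return best

# A true 'p' chooses the pro-social option (+2 to the group total)
# over the one that would drag the group total down (-2).
others = {"alice": 1, "bob": 1}
choice = aligned_choice("carol", others, options=[2, -2])
print(choice)  # 2
```

The point of the sketch is just that an aligned agent's objective function is the group's, not its own; the rest of this post is about how hard that is to get out of actual humans.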

So we use a number of hacks to help us manage this problem. Or at least some of us do. Somalia, North Korea, and Turkmenistan are some cautionary tales of failures to manage the human alignment problem. The three main hacks are cultures / religions, philosopher-kingdoms, and "masses-elites-experts" institutions.

Hack #1: Cultures and Religions

The most obvious approach is to try to re-program humans to be more like ps, to be more in line with the Gaian ideal. The most successful versions of this look something like:

  1. Figure out a good set of rules that at least rule out the most anti-social activities. Thou shalt not kill, Thou shalt not steal, et cetera et cetera.
  2. Tell people that if they follow these rules, they receive some huge reward. They go to heaven. They get to sleep with virgins. They reach enlightenment. The grander and the less verifiable, the better. Punishment for not following the rules should be severe and also non-verifiable.
  3. For those who question the epistemic roots of your rules and their attendant rewards and punishments, laugh at them, ostracize them, or, when all else fails, give them a pokey pokey with a swordy swordy.

This hack, always questionable in its effectiveness, is working less and less in the 21st century.

Hack #2: Philosopher-Kingdoms

The second hack, invented a long, long time ago by this guy called Plato (credit to his parents, that's a pretty sick name), is philosopher-kingdoms. Plato reasoned that most people were dumb and selfish but that some people were smart and altruistic. According to him, we solve the alignment problem when we:

  1. Identify said smart and altruistic people (philosophers)
  2. Give them all the power (make them kings)

The problem, of course, is that (1) is pretty hard. If 1 out of 100 people is smart and altruistic, then even a filter that screens out 90% of the bad ones still leaves you with roughly a 1:10 good:bad ratio.
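The back-of-the-envelope arithmetic works out like this (the base rate and filter accuracy are the illustrative numbers from above, not empirical estimates):

```python
# Philosopher-selection arithmetic: why even a good filter isn't enough.
population = 100
good = 1                                 # 1 in 100 is smart and altruistic
bad = population - good                  # the other 99
filter_rate = 0.90                       # a filter that screens out 90% of the bad...
bad_remaining = bad * (1 - filter_rate)  # ...still leaves ~9.9 bad candidates
print(f"good:bad after filtering ≈ 1:{bad_remaining / good:.1f}")
# good:bad after filtering ≈ 1:9.9
```

In other words, with a rare trait and an imperfect filter, the pool of "identified philosophers" is still dominated by non-philosophers.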

Hack #3: Masses elect elites who oversee experts

This hack takes a much more pragmatic approach. Instead of assuming that we'll be able to identify philosophers, it tries to elicit philosopher-like behavior from non-philosophers.

Since politicians are elected by people, they will be forced to serve those people or lose their job, or so the theory goes. Since boards of directors are elected by shareholders, they will be forced to serve the shareholders' interests or lose their job, or so the theory goes.

Unfortunately, the theories are wrong. But they're not so far off as to make this hack unworkable, so it has become the predominant one in modern societies.

Hack #4?: DAOs

Decentralized autonomous organizations (DAOs) are an attempt to hack around this problem using a wombo combo of code and game theory. Bitcoin, Moloch DAO, and The Meta-DAO are examples. Since the design space is large, I'm going to focus my discussion on what I know best: the Meta-DAO. Vitalik Buterin covers the topic in greater detail here, if you're interested.

So the Meta-DAO is closer to a philosopher-kingdom than it is to a 'masses-elites-experts' institution. Just like a philosopher-kingdom, the Meta-DAO centralizes control in a hopefully-benevolent philosopher. The difference is that the philosopher is not a human being but a market. All of the important decisions create markets in assets that represent welfare, and a piece of code called 'autocrat' makes decisions purely based on the prices of those assets over time. If these details sound fuzzy, you can either read more here and here or just accept that DAOs are attempts to use code and game theory to bring us closer to an ideal society.
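A minimal sketch of what a price-based decider could look like, in the spirit of 'autocrat': compare the time-averaged price of a "welfare if the proposal passes" asset against a "welfare if it fails" asset, and approve the proposal only if the former is higher. This is my hypothetical illustration of the idea, not the actual program's logic or interface.

```python
# Hypothetical price-based decision rule: approve a proposal iff the market
# prices the 'pass' world above the 'fail' world on average over time.

def decide(pass_prices, fail_prices, threshold=0.0):
    """Return True (approve) if the time-averaged 'pass' price exceeds
    the time-averaged 'fail' price by more than `threshold`."""
    avg_pass = sum(pass_prices) / len(pass_prices)
    avg_fail = sum(fail_prices) / len(fail_prices)
    return avg_pass - avg_fail > threshold

# Traders consistently price the 'pass' world higher than the 'fail' world,
# so the code approves the proposal with no human vote involved.
print(decide([1.05, 1.10, 1.08], [1.00, 0.98, 1.01]))  # True
```

The philosophical move is the same as Plato's, except the "philosopher" is an aggregate of traders with money on the line rather than a person.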

DAOs are still in the experimental phase. We shall see whether this hack enters the great arc of history!

Conclusion

The human alignment problem arises from a difference between how humans would ideally behave and how they actually behave. Historical hacks to cope with this problem include cultures / religions, philosopher-kingdoms, and 'masses-elites-experts' institutions. DAOs are a new experimental way to cope with this problem, possibly superior to the prior hacks.