I’ve previously written about the LessWrong community’s tendency to take crazy ideas far more seriously than they deserve. A lot of what’s convinced me of that is my in-person experience with the community, in my last year of living in the Bay Area (I moved here November 2013). But a lot of that stuff is awkward to talk about online.
So in my last post on the subject, I vaguely gestured at things people might know about “if you’ve followed the LessWrong community online.” One of the things I had in mind was Roko’s Basilisk, something Eliezer Yudkowsky (the founder of LessWrong) infamously freaked out over, deleting the original thread and banning all discussion of the topic from LessWrong.
Because the original thread was deleted, it can be difficult to know if anything you hear about the basilisk incident online is accurate. One site that I thought used to have some pages with pretty good explanations appears to have taken that material down. The original thread is currently archived here, but I don’t know how long that will last.
So I’m making this post, partly so I’ll have something I know I’ll be able to refer people to if I want to talk about Roko’s Basilisk in the future. I’m also writing this because there was a reference to Roko’s Basilisk in the alt text of an xkcd strip last week, and it sparked some more discussion which included some seriously misleading statements by Eliezer and other members of the LessWrong community.
Oh, and just so you know, the ideas I’ll be discussing in this blog post have apparently given some people nightmares, and Eliezer apparently still believes that merely reading about them may put you at risk of eternal torture.
Got that? Okay.
I’m going to try to provide as much context as possible here, starting with why I care about this. Since moving to San Francisco a year ago, I’ve gotten heavily involved with the Bay Area’s rationalist and effective altruist communities. LessWrong, a blog/forum founded by Eliezer Yudkowsky, was one of the original pillars of those communities in the Bay Area.
Those communities have greatly enriched my life; I don’t know where I’d be today without them. But for the sake of the health of those communities, I think it’s important that we be able to criticize our heroes, and to see someone prominent in them trying to suppress criticism of themselves is scary.
(I should also note that I think the Bay Area rationalist and EA communities have already evolved a long way beyond their origins in Eliezer Yudkowsky’s personal fan club. If there’s one thing I regret about this post, it’s conflating MIRI and CFAR. They’re separate organizations now and deserve to have that recognized.)
Okay, so–for the sake of making this comprehensible to people coming in from outside the LessWrong bubble–who is Eliezer Yudkowsky? Eliezer is the founder of a non-profit that was originally called the Singularity Institute for Artificial Intelligence, then the Singularity Institute, and now the Machine Intelligence Research Institute (MIRI).
For many people, the reason they know about Eliezer is that for several years, he was a guest blogger at economist Robin Hanson’s blog Overcoming Bias. Eliezer’s posts at Overcoming Bias are what formed the seed material for LessWrong. Many other people know about Eliezer because of his Harry Potter and the Methods of Rationality, which presents the ideas from his LessWrong blog posts in the form of Harry Potter fanfiction.
Eliezer’s blog posts from the period when he was most active as a blogger (i.e. when he was on Overcoming Bias plus the early days of LessWrong) are collectively known as “the Sequences.” The Sequences are frequently cited in quasi-reverent tones on LessWrong. This includes silliness like responding to any mention of global warming with shouts of “politics is the mindkiller!” I’ve read the Sequences in their entirety.
For a long time, the stated purpose of the Singularity Institute was more or less to build a superintelligent AI to take over the world–for the greater good, of course. Note that Eliezer has objected to this characterization of his work, but I think it’s accurate. His position was that someone building a superintelligent AI that takes over the world is more or less inevitable, therefore we can’t shrink from the responsibility of making sure that the AI that takes over the world has the right values (“Friendly AI”, in Eliezer’s jargon, as opposed to “Unfriendly AI.”)
My understanding is that many people at MIRI still hope they’ll be the ones to build the Friendly Super-AI that will take over the world, but they’ve also partly moved towards saying maybe they won’t be able to do that, and instead they’ll end up doing research that helps someone else make their Super-AI be Friendly.
I’ve always been kinda blasé about the take-over-the-world plans, because I think there’s basically no chance they’ll work. I expect humans will eventually build AI that’s smarter than us, but it will probably be built by a big corporation like Google or IBM or else be the result of a Manhattan Project-style government effort. But if I thought MIRI had a chance of actually building an AI smart enough to take over the world, I’d be pretty freaked out that they had this as their goal.
Eliezer also appears to have believed, for a long time, that the world is in danger and he’s the only one who can save it. For example, he’s said:
I think my efforts could spell the difference between life and death for most of humanity, or even the difference between a Singularity and a lifeless, sterilized planet. I don’t mean to say, of course, that the entire causal load should be attributed to me; if I make it, then Ed Regis or Vernor Vinge, both of whom got me into this, would equally be able to say “My efforts made the difference between Singularity and destruction.” The same goes for Brian Atkins, and Eric Drexler, and so on. History is a fragile thing. So are our causal intuitions, where linear chains of dependencies are concerned. Nonetheless, I think that I can save the world, not just because I’m the one who happens to be making the effort, but because I’m the only one who can make the effort. And that is why I get up in the morning.
I believe the above paragraph was written in the year 2000, but he’s said similar things much more recently. In a series of Q&A videos he did in 2010, he implied he still believed he had a unique, near-irreplaceable role to play in the future of the world:
If I got hit by a meteor right now, what would happen is that Michael Vassar would take over responsibility for seeing the planet through to safety, and say ‘Yeah I’m personally just going to get this done, not going to rely on anyone else to do it for me, this is my problem, I have to handle it.’ And Marcello Herreshoff would be the one who would be tasked with recognizing another Eliezer Yudkowsky if one showed up and could take over the project, but at present I don’t know of any other person who could do that, or I’d be working with them.
Another important bit of context: The moral philosophy known as “utilitarianism” or “consequentialism” is quite popular on LessWrong. You can follow the second link for the Stanford Encyclopedia of Philosophy article, but relevant here is that utilitarianism is notorious for endorsing a lot of apparently crazy things. That’s not just something random uninformed critics of utilitarianism say. It’s a commonplace among philosophers who study these issues for a living.
For example, utilitarianism apparently endorses killing a single innocent person and harvesting their organs if it will save five other people. It also appears to imply that donating all your money to charity beyond what you need to survive isn’t just admirable but morally obligatory.
This second idea was definitely being discussed in the LessWrong community in the lead-up to the basilisk incident. For example, in that same series of 2010 video Q&As, Eliezer said things like:
I would be asking for more people to make as much money as possible if they’re the sorts of people who can make a lot of money and can donate a substantial amount fraction, never mind all the minimal living expenses, to the Singularity Institute.
If the thing that you’re best at is investment banking, then work for Wall Street and transfer as much money as your mind and will permit to the Singularity institute where [it] will be used by other people.
This idea is not totally novel, though it’s usually discussed in terms of donating to fight extreme poverty, not donating to organizations like the Singularity Institute.
I point all this out even though I’m much more sympathetic to utilitarianism than a lot of people are. I think it’s important context for what follows. (I recommend Joshua Greene’s Moral Tribes if you want to see how traditional objections to utilitarianism can maybe be resolved. You can also read what I’ve said about trying to make lots of money to donate to charity here.)
Roko’s Basilisk comes from a thread on LessWrong that famously caused Eliezer to freak out and delete the entire thread. Because the entire thread got deleted, for a long time it’s been hard to get accurate information on what really happened, but recently (I think) someone posted the entire cached thread here. Here’s the key paragraph from the original post by Roko, from July 2010:
In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn’t give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) might do if it were an acausal decision-maker.1 So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half). You could take this possibility into account and give even more to x-risk in an effort to avoid being punished. But of course, if you’re thinking like that, then the CEV-singleton is even more likely to want to punish you… nasty. Of course this would be unjust, but is the kind of unjust thing that is oh-so-very utilitarian. It is a concrete example of how falling for the just world fallacy might backfire on a person with respect to existential risk, especially against people who were implicitly or explicitly expecting some reward for their efforts in the future. And even if you only think that the probability of this happening is 1%, note that the probability of a CEV doing this to a random person who would casually brush off talk of existential risks as “nonsense” is essentially zero.
In the original post, the first appearance of the “CEV” acronym linked to an explanation of the concept. That link is now broken, but here’s one that currently works. The key thing is that building an AI that embodies the CEV of humanity is Eliezer’s preferred strategy for building “Friendly AI.” In other words, Roko’s original suggestion was that the basilisk was a possible consequence of Eliezer/the Singularity Institute succeeding at what it was trying to do.
Personally, I’m not worried about the basilisk. My understanding is that, even if you accept all the premises, including ones about acausal decision-makers (which I will make no attempt to explain here), the threat can still be evaded if you simply resolve to ignore all threats of acausal blackmail.
On the other hand, I’m not sure how you can rule out an AI of the sort Eliezer wants to build doing things that would be horrifying from the point of view of common-sense morality. As Roko says, that “kind of unjust thing that is oh-so-very utilitarian.”
I’m personally not clear on whether Roko was the first person to come up with the basilisk. The version of the original post linked above has a footnote which says “in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous,” implying the idea was floating around before Roko posted it. But possibly this footnote was added as an edit after the initial posting.
Before eventually deleting the entire thread, Eliezer posted this comment (emphasis in original):
I don’t usually talk like this, but I’m going to make an exception for this case.
Listen to me very closely, you idiot.
YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.
There’s an obvious equilibrium to this problem where you engage in all positive acausal trades and ignore all attempts at acausal blackmail.
Until we have a better worked-out version of TDT and we can prove that formally, it should just be OBVIOUS that you DO NOT THINK ABOUT DISTANT BLACKMAILERS in SUFFICIENT DETAIL that they have a motive to ACTUALLY BLACKMAIL YOU.
If there is any part of this acausal trade that is positive-sum and actually worth doing, that is exactly the sort of thing you leave up to an FAI. We probably also have the FAI take actions that cancel out the impact of anyone motivated by true rather than imagined blackmail, so as to obliterate the motive of any superintelligences to engage in blackmail.
Meanwhile I’m banning this post so that it doesn’t (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I’m not sure I know the sufficient detail.)
You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.
This post was STUPID.
(For those who have no idea why I’m using capital letters for something that just sounds like a random crazy idea, and worry that it means I’m as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.)
Then everything got deleted. All discussion of Roko’s Basilisk got banned from LessWrong. As I understand it, this ban remains in effect to this day, at least in theory, though if you do a Google site search of LessWrong for “Roko’s Basilisk,” you’ll find a lot of stuff that’s slipped through the cracks.
I know that a number of members of the LessWrong community have credited the basilisk incident with helping them see a lot of the things wrong with the community. I basically agree with this response–the basilisk incident is a clear case of Eliezer taking a crazy idea way more seriously than it deserved to be taken.
(Note that given various subsequent clarifications, Eliezer’s position appears to be that Roko’s exact version of the basilisk idea is wrong and wouldn’t work, but reading it might cause someone to stumble onto a working version of the basilisk, and therefore Roko’s Basilisk was dangerous and needed to be suppressed.)
Beyond that, the basilisk incident has brought out how awful Eliezer is at handling criticism. With a couple of exceptions I can think of, he generally defaults to dismissing his critics as haters, trolls, or liars, accusing them of having irrational vendettas, and even calling them evil.
In fact, Eliezer has sort-of admitted he handled the basilisk incident poorly, but appears to believe his main mistake was underestimating how evil his critics are:
If I had to state the basic quality of this situation which I overlooked, it wouldn’t so much be the Streisand Effect as the existence of a large fraction of humanity—thankfully not the whole species—that really really wants to sneer at people, and which will distort the facts as they please if it gives them a chance for a really good sneer. Especially if the targets can be made to look like nice bully-victims. Then the sneering is especially fun. To a large fraction of the Internet, targets who are overly intelleshual, or targets who go around talking using big words when they aren’t official licensed Harvard professors, or targets who seem like they take all that sciunce ficshun stuff seriously, seem like especially nice bully-victims.
Interpreting my deleting the post as uncritical belief in its contents let people get in a really good sneer at the fools who, haha, believed that their devil god would punish the unbelievers by going backward in time.
A more reasonable explanation is that “people who go to great lengths to suppress information generally have something to hide” is generally a pretty good heuristic. So when Eliezer deleted the post, some people naturally suspected he was hiding some embarrassing secret or something. Maybe not a correct inference, but an understandable one.
And it’s not as if, early on, people could easily get Eliezer’s side of the story, because Eliezer was trying to stop all discussion of Roko’s Basilisk, period. Normally, I’m pretty hard on people who pass on internet rumors without fact-checking, but in this case any attempt at fact-checking was mainly going to reveal Eliezer trying to suppress discussion–as if he had something to hide.
Eliezer may claim to understand his mistake here, but I don’t think he does. Really understanding his mistake would require putting himself in his critics’ shoes, and realizing how their conclusions could look reasonable from their point of view. I’m not sure he’s capable of seeing his critics that way.
Many of Eliezer’s complaints about things other people have said about Roko’s Basilisk are directed at RationalWiki, a wiki that’s mostly devoted to debunking pseudoscience. Eliezer claims that RationalWiki has conducted a “systematic campaign of lies and slander about LessWrong.”
I don’t particularly care for RationalWiki. Some of their articles have a political axe to grind; they have an article attacking effective altruism which seems largely driven by one editor’s opinion that finance is evil. But Eliezer’s accusation that they’ve engaged in systematic lying about LessWrong appears unsupported.
For example, Eliezer has claimed it’s a “malicious lie” to describe Roko’s Basilisk as:
an argument used to try and suggest people should subscribe to particular singularitarian ideas, or even donate money to them, by weighing up the prospect of punishment versus reward
On the contrary, Eliezer claims:
Neither Roko, nor anyone else I know about, ever tried to use this as an argument to persuade anyone that they should donate money. Roko’s original argument was, “CEV-based Friendly AI might do this so we should never build CEV-based Friendly AI”, that is, an argument against donating to MIRI. Which is transparently silly because to whatever extent you credit the argument it instantly generalizes beyond FAI and indeed FAI is exactly the kind of AI that would not do it. Regardless, nobody ever used this to try to argue for actually donating money to MIRI, not EVER that I’ve ever heard of. This is perhaps THE primary lie that RationalWiki crafted and originated in their systematic misrepresentation of the subject; I’m so used to RationalWiki telling this lie that I managed not to notice it on this read-through on the first scan.
I can’t read Roko’s mind, but if Roko meant the basilisk to be an argument against donating to MIRI, this isn’t at all clear from his original post. In fact, Roko said that “You could take this possibility [the basilisk] into account and give even more to x-risk in an effort to avoid being punished.” If Roko didn’t mean that to be an argument for donating to MIRI, it’s at least easy to see why people took it as one.
Okay, that’s all I have to say about this subject for now. TL;DR: Eliezer has quite a few weird beliefs, but probably the most disturbing part is his eagerness to assume his critics are evil liars who hate him for no reason, when more reasonable explanations are available.