Is ‘none pizza with left beef’ a modern Turing test?
I know that may sound like gibberish, but hear me out.
Welcome! If you like what you read, please consider forwarding this newsletter to someone else you think would enjoy it. And if someone forwarded this newsletter to you, please consider subscribing!
Some of my students used to (lovingly, I think) mock me anytime I started a sentence with “I was listening to a podcast…” so I spent a long time trying to think of another way to start this post but, well, it was a podcast that sparked this thought. I am nothing if not someone who listens to a lot of podcasts.
The podcast in question was an episode of Wonderful, in which Griffin and Rachel McElroy share things that they like, that’s good, that they’re into. For this episode, Griffin brought “none pizza with left beef,” a meme that is, shockingly, old enough to be starting college this fall (which means, even more shockingly, that 2007 was 18 years ago).
As I listened to Griffin describe this very specific moment in the history of human/computer interaction that is “none pizza with left beef,” I suddenly thought, “is ‘none pizza with left beef’ a modern Turing test?” Which, I know, sounds absurd in any number of ways, but I hope you’ll come on this journey with me.
For those of you who may be unfamiliar with “none pizza with left beef”, it’s the result of experimenting with online pizza ordering systems, which were new at the time. Steven Molaro was testing the accuracy of the ordering system which allowed you to specify, amongst other things, which half of the pizza you wanted toppings on. As you can see from the screenshot of Molaro’s order (and the pizza that was delivered) he was able to order a pizza with absolutely no toppings… except a normal amount of beef on the left side.
None pizza with left-ish beef
Wikipedia and Know Your Meme have good explanations of the origin and history of the meme (as does Griffin), so I’m not going to go into more depth. The important part of the history here (and core to the broader “special delivery” meme) is that this is not a pizza you could order from another human.
There is, of course, the obvious reason you wouldn’t order this pizza from another human: you would feel ridiculous, and you would probably worry about becoming a meme yourself. But even without those inhibitions, a human is going to have some follow-up questions if you try to order a circle of plain bread with some lumps of meat on top. Primarily, “are you sure that’s what you want?” Or they may say they can’t make a pizza like that because they don’t want to be set up for a complaint, or to become a meme themselves.
I’m guessing most folks reading this are familiar with the idea of the Turing test; in a nutshell, a human evaluator is asked to review a transcript of a conversation between a human and a machine. If the evaluator can’t tell the difference, the machine passes the test. There is plenty of philosophical debate about what “passing” this test actually means, but for many people it has become a shorthand for “if a machine can pass this test, it’s displaying ‘intelligence’ of some kind.”
I’m also guessing that most folks reading this know that “imitating human language without regard to accuracy” is a) something that LLMs can do quite well and b) an insufficient measure of intelligence. Imitating language is one thing. Understanding it is something else entirely.
Which brings me back to “none pizza with left beef.” I’m assuming an LLM or other machine learning model could be trained to recognize that this kind of pizza order is, at the very least, atypical. However, given LLMs’ sycophancy problem, I’m not confident that one would assertively reject this order.
More than that, I wonder if any of the various programs under the broad umbrella of “AI” would be capable of understanding that they were, for lack of a better term, being fucked with. Is knowing when to roll your eyes a sign of intelligence? How about being able to recognize a joke? Maybe we won’t truly achieve artificial intelligence until we can teach a machine how to “yes, and” a bit.
So much of the conversation about AI is framed as humans needing to be prepared for how to use these tools, but what if instead we talked about whether or not AI is ready for us? In an article from Anthropic about whether Claude can be trained to run a shop, there’s a line noting that some of the attempts to “jailbreak” the program with unusual orders were evidence that “Anthropic employees are not entirely typical customers” and… sure, maybe the typical person isn’t going to attempt to order tungsten cubes. But humans ARE going to use language in silly and unpredictable ways, and even in ways that are inscrutable to other humans – just ask any adult to explain why so many middle schoolers find the word “skibidi” funny.
“Google thinks you should put glue on your pizza” is a thing that happens because jokes and sarcasm are not always easy to recognize. You also have to factor in the meme-equivalent of linguistic drift. “Put glue in your pizza sauce” started as a joke response to a question on r/Pizza, but has become a shorthand for “this LLM output is nonsense.” “What color is the dress” could be a straightforward question, or it could be a shorthand for how perception differs based on perspective, or it could be a reference to what is often called “the last good day on the internet.” Some of that can be figured out by context, but in order to determine which it is you probably also need to know how old the people talking are, as well as their knowledge of meme culture. Frankly, I’m not going to believe a program is capable of intelligence until it can have a prolonged, passionate argument about whether the dress is black and blue or gold and white.
So, is “none pizza with left beef” a modern Turing test? I don’t know. But I do know that being able to make sense of it, both literally and figuratively, requires a lot more than being able to string together a statistically probable combination of words.
Did you enjoy this? Do you think I should write more than once every five months? Share it with someone else you think would enjoy it; getting more subscribers would be a good extrinsic motivator for me.