In 1979, Caroly Wilcox designed a muppet, “Short Red.” It joined the ranks of the faceless “Anything Muppet” (AM) monsters that donned whatever facial features were necessary to serve in choirs and background scenes.
It wasn’t until Season 11 that Elmo got a named appearance, and even then his personality and voice underwent radical changes as the muppet was picked up and discarded by a series of actors. He went from mumbling all his lines under Brian Muehl to yelling them in a gruff voice under Richard Hunt, who hated the muppet so much that he threw it at the feet of Kevin Clash. Clash created the high-pitched third-person-speaking muppet that in later years would become so popular as to have his own dedicated segment, “Elmo’s World.”
Embeddings from Language Models (ELMo) first appeared on the AI scene in 2018. The preceding years had introduced word embeddings, digital representations of the meanings of words learned from large quantities of naturally occurring text and stored in static databases. ELMo took the state of the art forward by calculating a representation of a word in context. Whereas all the possible meanings of a word had to coexist inside a single representation created by one of the older approaches, ELMo created a word’s representation on the fly in the context of the words around it. It created a stir, but it didn’t upend the field the way its namesake upended Sesame Street. Nevertheless, with help from an unlikely ally, the march of the muppets was about to begin.
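The static-versus-contextual distinction can be made concrete with a toy sketch. The vectors and the blending function below are invented for illustration; real ELMo computes its contextual vectors with a bidirectional language model, not a neighbor average.

```python
import numpy as np

# A static embedding table: one fixed vector per word, blind to context.
static = {
    "river": np.array([1.0, 0.0]),
    "money": np.array([0.0, 1.0]),
    "bank":  np.array([0.5, 0.5]),  # both senses squashed into one vector
}

def contextual(word, sentence):
    """Toy stand-in for a contextual embedder: blend a word's static
    vector with the average of its neighbors' vectors."""
    neighbors = [static[w] for w in sentence if w != word]
    return 0.5 * static[word] + 0.5 * np.mean(neighbors, axis=0)

v1 = contextual("bank", ["river", "bank"])  # leans toward the river sense
v2 = contextual("bank", ["money", "bank"])  # leans toward the money sense
print(np.allclose(v1, v2))  # False: same word, two different representations
```

A static table would hand back the same `bank` vector in both sentences; a contextual model produces a different vector each time, shaped by the surrounding words.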
In 1984, the Japanese company Takara struck a deal to rebrand its transforming mecha toys from its Diaclone and Micro Change toy lines to sell on the American market. The new name: Transformers.
In 2017, the sequence transduction model known as the Transformer made its first waves, in a publication asserting it could do everything other neural network architectures could and more. Previously, the recurrent neural network, which read text from left to right and maintained a single store of memory, dominated the scene. ELMo is one such model, although it reads each sentence both forwards and backwards and combines the results.
The Transformer compares every word in a sentence directly to every other word, enabling it to make connections between words far away from each other. This approach made a splash but, like ELMo, fell short of earthshaking. Like a bright child, a large Transformer model has a lot of room to learn complex patterns, but, child or network, a clever mind alone isn’t enough to build a skill. You need lots and lots of data.
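That every-word-to-every-word comparison is the Transformer’s attention mechanism. Here is a minimal sketch of scaled dot-product self-attention, stripped of the learned query/key/value projections a real Transformer would apply:

```python
import numpy as np

def self_attention(X):
    """Minimal scaled dot-product self-attention over word vectors X
    (one row per word). Every word is scored against every other word,
    so distant words connect in a single step."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise word-to-word similarity
    # softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X  # each output row is a context-weighted mix of all words

# A toy "sentence" of four words with 3-dimensional vectors.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
out = self_attention(X)
print(out.shape)  # one context-mixed vector per word
```

Unlike a recurrent network, which must carry information word by word through its memory, this single matrix operation lets the first word influence the last directly, however far apart they sit.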
It wouldn’t be until late 2018 that ELMo’s idea of general pretraining and the novel Transformer model joined forces. Less than half a year after ELMo, the world of natural language would get a knock on its door from a yellow fist on a marionette stick. Not Big Bird, but just as old and nearly as iconic.
I take requests. If you have a fictional AI and wonder how it could work, or any other topic you’d like to see me cover, mention it in the comments or on my Facebook page.