Natural Language Processing: How did we teach Computers to talk like Humans?
It has been quite a journey to arrive at a ChatGPT model!
ChatGPT was definitely not born out of thin air! Today we look at the whole history of research and development in the field of Natural Language Processing over the past 80 years. And something tells me we are just getting started. We look at:
History of Natural Language Processing
NLP metrics: quantifying human-like language performance
History of Natural Language Processing
It took some time before we thought about modeling language as a probabilistic generative process. NLP studies the interactions between computers and human language, and it is as old as computers themselves.
Warren Weaver was the first to suggest an algorithmic approach to machine translation (Weaver’s memorandum) in 1949, and this led to the Georgetown experiment, the first computer application of machine translation, in 1954 (“The first public demonstration of machine translation: the Georgetown-IBM system, 7th January 1954“). In 1957, Chomsky laid out his theory of generative grammar in Syntactic Structures. ELIZA (1964) and SHRDLU (1968) can be considered the first natural-language understanding computer programs.
The 60s and early 70s marked the era of grammar theories. Chomsky introduced Transformational-generative Grammar in 1965, and Fillmore’s Case Grammar in 1968 (“The Case for Case“) recognized the relationships among the various elements of a sentence. In 1969, Collins and Quillian proposed the semantic network (“Retrieval time from semantic memory“), a knowledge structure that depicts how concepts are related to one another and illustrates how they interconnect. In 1970, Woods introduced Augmented Transition Networks (“Transition Network Grammars for Natural Language Analysis“), a graph-theoretic structure used in the operational definition of formal languages. In 1972, Schank developed Conceptual Dependency Theory (“A conceptual dependency parser for natural language”) to represent the knowledge carried by natural language input to computers.
During the 70s, the concept of conceptual ontologies became quite fashionable. Conceptual ontologies are similar to knowledge graphs, where concepts are linked to one another by how they are associated. You can imagine generating sentences by following concept paths in an ontology, as sketched below. The famous ones are MARGIE (1975 - “MARGIE: Memory, Analysis, Response Generation, and Inference on English“), TaleSpin (1976 - “TALE-SPIN, An Interactive Program that Writes Stories“), QUALM (1977 - “The Process of Question Answering“), SAM (1978 - “Computer Understanding of Newspaper Stories“), PAM (1978 - “PAM - A Program That Infers Intentions“), Politics (1979 - “POLITICS: Automated Ideological Reasoning“) and Plot Units (1981 - “Plot units and narrative summarization“).
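To make the idea concrete, here is a minimal, hypothetical sketch (not a reproduction of MARGIE or any of the systems above): a toy ontology stored as a graph, with "generation" amounting to walking a path of concept-relation-concept triples. All concepts and relations are invented for illustration.

```python
# A toy conceptual ontology as a graph: concept -> [(relation, concept), ...].
# Everything here is made up for illustration.
ontology = {
    "dog":  [("chases", "cat"), ("is_a", "animal")],
    "cat":  [("eats", "fish"), ("is_a", "animal")],
    "fish": [("lives_in", "water")],
}

def generate_sentences(start, depth=3):
    """Walk a concept path from `start`, crudely verbalizing each hop."""
    sentences = []
    concept = start
    for _ in range(depth):
        edges = ontology.get(concept, [])
        if not edges:
            break
        relation, target = edges[0]  # naively follow the first edge
        sentences.append(f"{concept} {relation.replace('_', ' ')} {target}")
        concept = target
    return sentences

print(generate_sentences("dog"))
# ['dog chases cat', 'cat eats fish', 'fish lives in water']
```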
The 80s were a period of great success for symbolic methods. In 1983, Charniak proposed Passing Markers (“Passing Markers: A Theory of Contextual Influence in Language Comprehension“), a mechanism for resolving ambiguities in language comprehension by passing markers between related concepts in a semantic network. In 1986, Riesbeck and Martin proposed Uniform Parsing (“Uniform Parsing and Inferencing for Learning“), an approach that combines parsing and inferencing in a uniform framework for language learning. In 1987, Hirst proposed a new approach to resolving ambiguity: Semantic Interpretation (“Semantic Interpretation and Ambiguity“).
The 90s saw the advent of statistical models for NLP. It was the beginning of thinking about language as a probabilistic process. In 1989, Bahl proposed a tree-based method to predict the next word in a sentence (“A tree-based statistical language model for natural language speech recognition“). IBM presented a series of models for statistical machine translation (“The Mathematics of Statistical Machine Translation: Parameter Estimation”). In 1990, Chitrao and Grishman demonstrated the potential of statistical parsing techniques for processing messages (“Statistical Parsing of Messages“), and Brill et al. introduced a method for automatically inducing a part-of-speech tagger by training on a large corpus of text (“Tagging an Unfamiliar Text With Minimal Human Supervision“). In 1991, Brown proposed a method for aligning sentences in parallel corpora for machine translation applications (“Aligning Sentences in Parallel Corpora“).
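To illustrate the "language as a probabilistic process" idea, here is a minimal bigram language model sketch (not the method of any specific paper above): count word pairs in a corpus and turn the counts into next-word probabilities. The corpus below is a made-up toy example.

```python
from collections import defaultdict

# Toy corpus, made up for illustration
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: counts[previous_word][next_word]
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_prob(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev) from the bigram counts."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(next_word_prob("the", "cat"))  # 2 of the 4 words following "the" are "cat" -> 0.5
```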
In 2003, Bengio proposed the first neural language model (“A Neural Probabilistic Language Model“), a simple feed-forward model. In 2008, Collobert and Weston applied multi-task learning with convolutional networks (“A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning“). In 2011, Sutskever, Martens, and Hinton built a generative text model with Recurrent Neural Networks (“Generating Text with Recurrent Neural Networks“). In 2013, Mikolov introduced Word2Vec (“Efficient Estimation of Word Representations in Vector Space”), which completely changed the way we approach NLP with neural networks. In 2014, Sutskever proposed a model for sequence-to-sequence learning (“Sequence to Sequence Learning with Neural Networks“). In 2017, Vaswani gave us the Transformer architecture, which led to a revolution in model performance (“Attention Is All You Need“). In 2018, Devlin presented BERT (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding“), which popularized Transformers. And in 2022, we finally got to experience ChatGPT, which transformed the way the public perceives AI!
NLP metrics: quantifying human-like language performance
Have you read a paper on Large Language Models recently? "Our perplexity metric is 20.5." That is great, but what does that mean?! I feel that, more than in any other domain, NLP has a range of peculiar metrics that never seem to completely solve the problems they are trying to address. The evolution of NLP metrics in recent years has kept pace with algorithmic advances and highlights our inability to efficiently measure intelligence as a quantitative value.
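For intuition, perplexity is commonly computed as the exponential of the average negative log-likelihood per token, so a perplexity of 20.5 roughly means the model is, at each step, as uncertain as if it were choosing among about 20.5 equally likely tokens. Below is a minimal sketch with made-up token probabilities.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities a model assigned to the 5 tokens of a sentence
token_probs = [0.2, 0.05, 0.1, 0.08, 0.03]
print(round(perplexity(token_probs), 1))  # ~13.3 "equally likely choices" per token
```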