A Primer on Generative AI

Brian Ratajczak
The Modern Scientist
9 min read · Jan 4, 2023

--

Next year, I can probably create this update using ChatGPT; today, however, this tool (the most powerful AI chat interface released to the public) is only trained on data through 2021, so it has no sense of what happened in 2022 [1]. That said, if you have not played around with ChatGPT, I highly recommend signing up; all you need is an email. Then you can get a sense of how impressive this technology already is, responding effectively to prompts ranging from “what is the meaning of life” to “write a story in the style of the New Yorker about the Calabasas pickleball scene.”

Taking a step back, the big update this year in AI (perhaps even in all of big tech) has been the rapid advancement of Generative AI, and in particular, how these technologies have started to be productized for public use. I first came across these tools this summer, when a friend was creating pictures with Wombo Art. Initially unimpressed (how does writing a few words constitute art?), I soon began to appreciate the creativity involved in prompting these models and developing these technologies; grasp how this can be applied across multiple mediums: text, image, audio, and video; and grapple with the new legal, ethical, and societal implications (note that the video on deepfakes is from two years ago!). Now, I am as curious and bullish on Generative AI as on any other technology.

The remainder of this post explores a few questions related to this new technology: what is it; how might it impact us; how will it develop; and what is the business model (i.e. who will fund it).

What is Generative AI?

In its most basic form, machine learning is about designing an algorithm that learns from data to solve a problem. Generative AI is a relatively new paradigm designed to create entirely novel content. To set this in context, I’ll borrow from Benedict Evans (slightly edited):

The conceptual breakthrough of machine learning was to take a class of problem that is ‘easy for people to do, but hard for people to describe’ and turn that from logic problems into statistics problems. Instead of trying to write a series of logical tests to tell a photo of a cat from a photo of a dog, which sounded easy but never really worked, we give the computer a million samples of each and let it do the work to infer patterns in each set. This works tremendously well, but comes with the inherent limitation that such systems have no structural understanding of the question — they don’t necessarily have any concept of eyes or legs, let alone ’cats’.

To simplify hugely, generative networks run this in reverse — once you’ve identified a pattern, you can make something new that seems to fit that pattern. So you can make more pictures of ‘cats’ or ‘dogs’. To begin, these tended to have ten legs and fifteen eyes, but as the models have got better, the images have got very convincing. But they’re still not working from a canonical concept of ‘dog’ as we do (or at least, as we think we do) — they’re matching or recreating or remixing a pattern.
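Evans’s “run it in reverse” framing can be shown with a deliberately tiny toy sketch: summarize a pattern from example data, then sample new data points that fit that pattern. The data values here are made up for illustration; real generative models do this over billions of parameters, not two summary statistics.

```python
import random
import statistics

# Toy "pattern learning": summarize example cat weights (kg) as a
# mean and standard deviation -- the inferred "pattern" in the data.
cat_weights = [3.8, 4.2, 4.5, 3.9, 5.1, 4.4, 4.0, 4.7]
mu = statistics.mean(cat_weights)
sigma = statistics.stdev(cat_weights)

# "Run it in reverse": generate novel samples that fit the learned
# pattern. No sample is copied from the data; each merely matches
# the inferred distribution.
random.seed(42)
new_cats = [round(random.gauss(mu, sigma), 2) for _ in range(3)]
print(new_cats)
```

The generator has no concept of “cat,” only the statistics of the examples, which is exactly the limitation Evans describes.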

What has enabled this is the massive increase in compute capabilities and data sets (in fact, it’s possible that pictures of you have been used to train these large models). Accordingly, these models can reproduce the same biases we see in the real world, from racial stereotypes to sexualization. It’s worth noting that this is the same technology a Google engineer thought was sentient. While there are open questions about what consciousness means, Generative AI is, more than anything, applied statistics.

All said, this technology is not an end in itself; it is a tool. One of the key interfaces end users have with these models is how we prompt them (prompt engineering). This is no different from how things work today: in consulting, we would say a problem well framed is a problem half solved. Skillful prompting in this new medium is likely to differentiate those most effective at using these tools, and it is also a means to test these models’ limits, something the leading researchers in the space openly share. In short, Generative AI is a tool with the potential to transform everyday life.

How might Generative AI impact us in the next 3, 5, 10 years?

Of all the predictions I’ve heard, the one that resonated most compared Generative AI to email. In email’s early days, there were musings about how it would displace the postman. The postal service still exists, and email’s impact has gone much further, changing the way all of us work and communicate. Generative AI is likely to have similar implications for productivity and team effectiveness; our workflows and communication patterns; and homework (I highly recommend this article)!

One of the first questions that arises is around jobs, a common topic when discussing any rapidly advancing technology. To assess the potential impact, it helps to consider implications with both a short-term and a long-term view. In the long run, automation should not be a concern until we have 10:1 student-to-teacher ratios (and ideally 4-day work weeks!). In the short run, large-scale job displacement could introduce lots of friction and pain. With regard to Generative AI, one job that gets a lot of attention is that of artists. While these tools may reduce the total number of artists needed, they should also improve the creativity and productivity of the artists who remain [2].

As a side note, when discussing AI and jobs, it’s a bit ironic that the first jobs most impacted are the creative ones. Conventional wisdom held that blue-collar jobs would be the first to go (like the 3.5 million trucking jobs threatened by the advent of autonomous vehicles); in hindsight, however, this reversal makes sense. AI is exceptionally good at getting to an 80%+ answer, but not a 99%+ one. For jobs like driving, being wrong 1% of the time can lead to accidents and even deaths. In more creative domains like art (or jobs like mine), an 80% answer can be quite compelling: we won’t get hung up if a Wes Anderson depiction of The Shining doesn’t capture everything perfectly. That makes AI output far more applicable.

Personally, what gets me most excited about Generative AI is how it will totally transform storytelling. Just as Canva made photo editing accessible to everyone, video (and text and image) editing will rapidly become much simpler and more powerful, benefiting both professionals and amateurs (like me!). Initially, we will be able to enhance our current pictures and videos; soon, advances will enable us to recreate moments or design entirely new ones. For instance, by the time I have a young child, I might be able to hear about her day and then spend a few minutes with a tool creating a bedtime story by inputting a moral / lesson (based on whatever is top of mind for that day); the main characters (from her own grandma to Captain Hook); the style (from Dr. Seuss to J.K. Rowling); the format and length; and any description of a plot. If you’re curious about the current state of this technology, check out StoriesForKids.

Of course, I would be remiss not to address concerns of AI taking over the world. If you want to dig into that rabbit hole, the alignment problem and mesa-optimizers are a good place to start. If you want to apply these worries through technology that exists today, consider Meta’s impressive new AI agent, CICERO, which combines strategic reasoning with generative AI (natural language processing). As with many new technologies, we will likely appreciate progress most when we look back at our lives before it.

How will these tools develop?

What is exciting is not where the models are now; it’s their rapid pace of advancement and imagining where they can be in a few years, such as the proposed children’s bedtime story. We are seeing progress in two key areas that will get us there: better models and better products.

The first is continuing to refine the underlying LLMs (large language models), as several large gaps exist today. Notable limitations include (1) access to current data (ChatGPT cannot access anything post-2021); (2) accurate responses (the model will confidently give a wrong answer, a “hallucination” in industry parlance, leading to false information, mistakes in basic math, weirdly portrayed objects such as hands, and compositionality issues); (3) multimodal input (prompting these models with text, photos, and video instead of just text); and (4) improved filtering / moderation (so these cannot be used to promote suicide, terrorism, etc.).

The other area where we are seeing development is user-facing products that leverage these models. Until this year, all the currency in the space was research and white papers; now these models have some minimum-viable utility, so product and design hackers (with little to no understanding of the underlying models) can build on top of them. This has led to an explosion of new startups in the space, enabling us as end users to experience the potential of these models. Expect advancements in both domains, with the former coming from large organizations and the latter coming from startups (and the large organizations).

What is the business model (i.e. who will fund it)?

While this may be a less fun question, continued investment in these models and products likely requires some plausible financial return, especially with shareholders clamoring for cash flow in today’s economic climate. Improving these models costs billions of dollars each year, and that’s if you even have access to the compute power, top researchers, and data (tokens) to train them. Once a model is trained, deploying it is expensive too: each of the millions of daily ChatGPT queries costs a few cents to compute (these products won’t be free forever). As such, a perspective on the commercial market can be instructive for understanding the pace of investment and development.
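The serving-cost point is worth a back-of-the-envelope calculation. Neither the query volume nor the per-query cost is public beyond “millions” and “a few cents,” so both inputs below are assumptions, not disclosed figures:

```python
# Illustrative serving-cost estimate; both inputs are assumptions.
daily_queries = 5_000_000   # "millions of daily queries"
cost_per_query = 0.03       # "a few cents" per query

daily_cost = daily_queries * cost_per_query
annual_cost = daily_cost * 365
print(f"~${daily_cost:,.0f}/day, ~${annual_cost:,.0f}/year")
```

Even with these conservative guesses, inference alone runs into tens of millions of dollars a year, before any training costs, which is why the business-model question matters.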

Ultimately, this question comes down to which end markets will pay for these products. At one end of the spectrum, users could spend more on entirely new products (Hollywood or advertisers paying for new capabilities; or individual consumers paying $10 to train a model that creates personal avatars). In the middle of the spectrum, users may spend more on existing products (this could be the case as gaming moves to the metaverse, or if greater engagement with a product enables more monetization through channels such as advertising). At the other end of the spectrum, users may spend no additional money while receiving improved services [3]. In this last scenario, some companies may invest in these capabilities to protect existing products, such as Google Search or Disney Studios, implying a sort of “pay to play” dynamic in which companies have no other option to maintain market share (as competing services are only one click away).

The answer will of course be a mix of all of these, but a perspective on where things net out should inform the pace of these investments and whether they will be margin accretive or margin compressing (are they driving more costs, or more revenue?) for those investing.

To conclude, Generative AI raises many big questions with long-term timeframes, but it is also changing our lives in small ways now. It has the potential to substantially impact how we work and communicate — likely benefiting some while disadvantaging others. It’s inspiring and terrifying at the same time, which is why I encourage you to build a perspective on it, and just as importantly, have some fun with it!

[1] Limiting its ability to write a topical late-night opening monologue.

[2] This raises another question of whether we will see a concentration of top artists (more akin to the music industry), or a democratization of creators (more akin to TikTok and YouTube).

[3] This dynamic is why Hal Varian — and others — believe we are not fully capturing productivity growth.

If you liked this, you can find similar pieces on China and Climate. Special thanks to Jon Ma and Chris Lipcsei for encouraging me to write these.
