A new beginning.
That’s what most of us felt when we tuned in yesterday to OpenAI’s official presentation of GPT-4, the newest generation of their world-famous pre-trained transformer.
And although there are plenty of impressive features to cover, there are also many open questions that were left unanswered, questions that need answering.
Can we work out for ourselves what OpenAI is trying to hide, and what the future holds for all of us?
Your best work partner
During the 30-minute presentation, while more than a hundred thousand people watched in awe, OpenAI chose very carefully what they showed and how they showed it.
Understandable, but concerning in some aspects.
Truth be told, even though it was a heavily rehearsed presentation, they still managed to add some improvisation by letting a few viewers test the model, to give the impression of going off script.
But one way or another, they showed some mind-bending features.
Summarizing using the letter Q only
You read that title correctly: not only is GPT-4 capable of summarizing a long text into one sentence in a few seconds, it can do so using only words that begin with a ‘g’, or with a ‘q’ (the latter at the request of a viewer).
However silly the request, it showcases the amazing linguistic capabilities of this model, far exceeding those of GPT-3.5 Turbo, the model running behind ChatGPT’s API.
Also, it proved capable of cross-summarizing two different texts and finding a common theme between them, thanks to the fact that GPT-4 can now handle up to 32,000 tokens.
From OpenAI’s own documentation we know that 1,000 tokens equate to approximately 750 words. That means GPT-4 can receive and understand texts of up to 24,000 words in one go, around 14 times the length of this article, and roughly eight times ChatGPT’s previous 4,096-token limit.
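As a quick sanity check, here’s that arithmetic as a minimal Python sketch, using OpenAI’s roughly-750-words-per-1,000-tokens rule of thumb (the 4,096-token figure is ChatGPT’s published context limit):

```python
# Back-of-the-envelope word capacity of a context window,
# assuming OpenAI's rule of thumb of ~750 words per 1,000 tokens.
WORDS_PER_TOKEN = 750 / 1_000

def approx_word_capacity(context_tokens: int) -> int:
    """Approximate number of English words that fit in the window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(approx_word_capacity(32_000))  # GPT-4:   24000 words
print(approx_word_capacity(4_096))   # GPT-3.5: ~3072 words
```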
This is not trivial: the longer the input, the more context we can give the model, and the better its responses.
But this was just the beginning of what we were going to see.
A coding beast
Greg Brockman, OpenAI’s chairman, then moved on to examples of how GPT-4 can improve the coding experience for developers.
And guys, I was speechless.
First, he had it create a Discord bot from scratch, illustrating one of GPT-4’s most impressive features: instruction following.
Told that it was now an “AI programming assistant” and given some instructions, the model created the bot in the blink of an eye.
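We didn’t see the exact prompt on screen, but with OpenAI’s chat API the setup presumably looks something like this minimal sketch (the system-message wording, the user request, and the “gpt-4” model name are my assumptions, not OpenAI’s script):

```python
# A minimal sketch of the instruction-following setup from the demo.
# The prompt wording and model name are assumptions, not the real demo.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # assumed model identifier
    messages=[
        {"role": "system",
         "content": "You are an AI programming assistant. "
                    "Follow the user's requirements carefully."},
        {"role": "user",
         "content": "Write a Discord bot in Python that replies to "
                    "every message with a friendly greeting."},
    ],
)
print(response.choices[0].message.content)  # the generated bot code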
But that wasn’t the most impressive part, not by a long shot.
As Greg acknowledged, GPT-4’s training cutoff is 2021, which means it has no knowledge from that point onwards. The problem was that Discord had updated its API in 2022, something GPT-4, of course, wasn’t aware of.
With the current ChatGPT, that would be the end of the story. But not with GPT-4.
Demonstrating impressive in-context learning, the model used the new documentation Greg gave it to update its knowledge and create the bot while meeting Discord’s 2022 requirements.
Again, language and semantic capabilities from another world that, in this case, match a human programmer’s performance, even exceeding it if we consider the speed at which GPT-4 created the bot.
However, the show was far from over, as the model was also able to debug its own code when Greg sent it the error message.
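We only saw the results on screen, but both tricks boil down to putting new text into the context window. A hedged sketch of the whole exchange might look like this (the documentation snippet, the traceback, and the model name are all placeholders):

```python
# Sketch of the demo's two tricks: (1) pasting post-cutoff documentation
# into the context, (2) feeding the resulting error back for a fix.
# All pasted content below is a placeholder, not the demo's real material.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are an AI programming assistant."},
    {"role": "user", "content": (
        "Here is the 2022 Discord API documentation you don't know about:\n"
        "<paste updated documentation here>\n\n"
        "Using it, rewrite the bot so it meets the new requirements."
    )},
]
reply = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant",
                 "content": reply.choices[0].message.content})

# The generated code crashed? Paste the error back and ask for a fix.
messages.append({"role": "user", "content": (
    "Running your code raised this error, please fix it:\n"
    "<paste traceback here>"
)})
fixed = client.chat.completions.create(model="gpt-4", messages=messages)
print(fixed.choices[0].message.content)
```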
At this point one is tempted to ask: will human programmers still exist in a few years, or will they simply serve as code reviewers?
Yet, as with everything, OpenAI had saved the best for last… GPT-4's new vision capabilities.
The Multimodal Era
Although Microsoft Germany’s CTO had already leaked this, GPT-4 is multimodal, which in simple terms means the model can receive and understand input as text, images, or even video.
To showcase this, they premiered GPT-4's perception capabilities, showing it various images (some of them from viewers) and having the model describe them in accurate detail.
But probably the most impressive example of all came when Greg drew a mockup of a website with pen and paper, sent it to the model as a photograph, and asked it to become a programmer again and build an HTML/CSS/JS website based on the mockup.
Et voilà. An interactive website in 10 seconds.
Truly incredible.
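OpenAI didn’t show the plumbing behind that demo, and image input wasn’t publicly available at the time, but using the message format OpenAI later shipped for vision-capable models, a call along these lines is a plausible sketch (the model name, file names, and prompt are all assumptions):

```python
# Hypothetical sketch of the mockup-to-website demo.
# Assumes a vision-capable chat model; the demo's real setup is unknown.
import base64
from openai import OpenAI

client = OpenAI()

with open("mockup_photo.jpg", "rb") as f:  # placeholder file name
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "You are a programmer. Turn this hand-drawn mockup "
                     "into a working HTML/CSS/JS page in a single file."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

with open("site.html", "w") as f:
    f.write(response.choices[0].message.content)
```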
Even so, OpenAI managed to squeeze one more feature into the presentation, one I personally wasn’t expecting.
Fear taxes no more
Taxes. Yes, taxes.
GPT-4 helped work out a tax deduction after being told that its new role was ‘TaxGPT’ and being given a very long explanation of the deduction’s rules.
Again, not only did GPT-4 manage to understand the whole text, remarkable considering how convoluted these documents are, it also performed the calculations and gave the correct answer.
Considering how poor ChatGPT’s performance in mathematics has been, this was genuinely spectacular, especially because it was clarified that no calculator APIs were connected to the model.
However, we should stay cautious and see whether these capabilities break down with more complex mathematics.
At that point the presentation closed, leaving unanswered several questions people had been expecting answers to.
And the issue is that some are very concerning.
We. need. answers.
As I said earlier, many questions remain regarding Generative AI in general.
Size, size, and more size
My main takeaway from the presentation was that GPT-4 is considerably superior to past versions in its language skills.
Consequently, I’m led to believe that the biggest improvement we’ve seen is due to the fact that the model is simply much bigger.
That is, they are leveraging the fact that Large Language Models benefit from “scaling laws”, a fancy way of saying that these models get predictably better at language as they get bigger.
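For reference, the canonical form of these laws, from Kaplan et al.’s “Scaling Laws for Neural Language Models” (2020), says test loss falls as a smooth power law in the parameter count N:

```latex
% Test loss as a power law in non-embedding parameter count N
% (Kaplan et al., 2020); N_c is a fitted constant.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076
```

In other words, every order-of-magnitude jump in size buys a predictable drop in loss, which is exactly the bet a much bigger GPT-4 would be making.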
From the multimodality standpoint, although we don’t have access to the model, looking at the Kosmos-1 paper we can assume they made important changes to the encoder part of the model, especially with regard to the vision transformers used to interpret images.
But what about one of the most expected new features, video?
Video? Where?
With last week's news, we knew that GPT-4 would handle video. However, after yesterday’s presentation, we still don’t know in what sense.
As an input that it can then describe? Or will it be capable of actually generating videos?
Microsoft Germany’s CTO was unclear on this, and so was OpenAI at the official launch. I guess we will have to wait and see.
But we still haven’t tackled the most concerning issue:
There’s an overarching issue with Generative AI in general, and that’s none other than its tendency to make stuff up.
Unreliability and business can’t be friends
In business environments, outside of some specific teams like programming or marketing, no one will dare deploy Generative AI as a client-facing solution, simply because it’s not reliable.
Honestly, I was expecting OpenAI to tackle this issue head-on in yesterday’s presentation, and the fact that they didn’t screams that the problem persists.
Yes, models are getting more reliable as they get bigger, and yes, improvements in the reinforcement learning from human feedback (RLHF) layer are allowing them to make better decisions, but the risk is undoubtedly still there.
From a business perspective, I still see these solutions as great enhancers, ways to make non-intelligent technologies like process automation or rule-based chatbots smarter, but never as standalone solutions.
To me, it still doesn’t seem like an option except for use cases with a high tolerance for error.
I mean, would you risk using a chatbot that lies to your clients? Of course not.
But what happens in those places where GPT-4 does fit?
Oh my.
Serious threat
In spite of Greg’s unwavering efforts to portray GPT-4 as a partner and not a substitute, a framing I agree with, the productivity enhancements will be immense.
Therefore, unless you’re genuinely different or unique, artists and writers are going to have a tough time with GPT-4. Content writers, for instance, seem completely replaceable, unlike editors and others who will continue to play a role.
In a way, GPT-4 is going to elevate mediocre people into putting up a fight against well-established artists and writers. Thus, I feel that only the greats in their fields will continue to thrive.
Having a unique perspective, a unique style, or a unique message will be key differentiators in this AI world that’s coming.
So, are you really special?
A final word
If you’ve read this article, you’re now ahead of 95% of society when it comes to AI.
But that still leaves a lot of people at that level.
So what if you’re capable of being above 99% of society?
That’s a totally different level, and if that’s where you want to be, I have news for you.
Read more: https://medium.com/@ignacio.de.gregorio.noblejas/gpt-4-released-81f8fc697def