Watershed moments in AI...
...yesterday was one of them!
Wow - what a big day yesterday was for AI! Huge announcements within a span of 3 hours meant there was a lot to unravel. So that’s what I did. Here’s the juice for you:
Google teases Gemini 1.5
The first big announcement came courtesy of Google, with a teaser of Gemini 1.5 and its context window of a whopping (wait for it) 10 million tokens, multimodal.
Source: Google
The first obvious question: how does it fare on the needle-in-a-haystack eval as the context window scales? As per Jeff Dean, Chief Scientist at Google DeepMind, "For text, Gemini 1.5 Pro achieves 100 percent recall up to 530k tokens, 99.7 percent up to 1M tokens, and 99.2 percent accuracy up to 10M tokens." That is impressive. It still has to be evaluated by the community, but it is impressive nonetheless. Jeff's full Twitter thread has the details.
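If you have not seen a needle-in-a-haystack eval before, here is a rough sketch of the idea in Python: bury one fact (the needle) at different depths in a long pile of filler text, ask the model for it, and count how often it comes back. Everything here is made up for illustration, including `toy_model`, which is a hypothetical stand-in that only "reads" the last 2,000 characters of its prompt. This is not how Google ran its evaluation.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Repeat the filler sentence and insert the needle at a relative depth (0.0 to 1.0)."""
    sentences = [filler] * n_sentences
    position = int(depth * n_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)


def toy_model(prompt: str, question: str) -> str:
    """Hypothetical stand-in for a real LLM call: it only sees the last 2,000 characters."""
    window = prompt[-2000:]
    return "7421" if "7421" in window else "unknown"


def recall(model, needle: str, answer: str, n_sentences: int, depths: list) -> float:
    """Fraction of needle depths at which the model's reply contains the answer."""
    hits = 0
    for depth in depths:
        haystack = build_haystack("The sky was clear that day.", needle, n_sentences, depth)
        if answer in model(haystack, "What is the secret code?"):
            hits += 1
    return hits / len(depths)


NEEDLE = "The secret code is 7421."
DEPTHS = [0.0, 0.25, 0.5, 0.75, 1.0]

short_recall = recall(toy_model, NEEDLE, "7421", 20, DEPTHS)    # haystack fits in the window
long_recall = recall(toy_model, NEEDLE, "7421", 2000, DEPTHS)   # haystack far exceeds it
print(short_recall, long_recall)
```

The toy model gets perfect recall on the short haystack and finds the needle only when it sits at the very end of the long one. That drop-off is exactly what the recall numbers quoted above claim Gemini 1.5 Pro mostly avoids.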
As for its reasoning capabilities, the following demo from Demis Hassabis shows how good it can be. Huge if true!
Here’s a fun demo of long-context understanding. First, we asked the model to find 3 amusing moments in the 402-page pdf transcription of the iconic Apollo 11 mission. Then we uploaded a simple drawing of a boot and it identified the moment we had in mind: Neil’s one small step!
— Demis Hassabis (@demishassabis)
3:59 PM • Feb 15, 2024
In summary, why is Gemini 1.5 a huge deal? Because the jaw-dropping 10M token context:
a) is excellent at retrieval, unlike earlier SOTA models
b) promises real-world applications by generalizing zero-shot to extremely long context data sources
c) is truly multimodal, shown to work well across text, audio, and video.
Gemini-1.5 Pro has its spotlight stolen today, and people are poking fun at Sora vs Google memes. Well, I think it's the biggest boost in LLM capability so far in 2024. v1.5's 10M token context (1) excels at retrieval; (2) generalizes zero-shot to extremely long instructions like… twitter.com/i/web/status/1…
— Jim Fan (@DrJimFan)
11:42 PM • Feb 15, 2024
That is a good summary of why this launch is such a huge deal, despite OpenAI doing its thing with SORA (more on that below).
So, when can we use it? You may have to wait a while, as this seems like a targeted rollout to some developers and enterprise customers via AI Studio and Vertex AI.
But then OpenAI did what it does best: outdo! If you ever wondered whether you were living in a simulation, well…wonder harder!
OpenAI releases SORA:
The examples shown are truly exceptional. Not perfect, but great. Read more here: https://openai.com/sora
Here are a few to blow your mind:
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
openai.com/sora
Prompt: “Beautiful, snowy… twitter.com/i/web/status/1…
— OpenAI (@OpenAI)
6:14 PM • Feb 15, 2024
sama has been sharing a ton on his profile too. Take a look and wonder…harder.
The progress in one year has been astounding. Today's clips are not Will Smith eating pasta, but you get the point:
🚨text to video progress in one year🤯
2023: 2024:
— Sam Sheffer (@samsheffer)
7:03 PM • Feb 15, 2024
There is lots of speculation on what is powering SORA. The following is a good take, and a good starting point for what the future could look like as this evolves:
If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all… twitter.com/i/web/status/1…
— Jim Fan (@DrJimFan)
7:22 PM • Feb 15, 2024
How about this for a prediction? Is this how it ends?
> puts on vision pro 4
> opens sora app
> sora 5 runs in real time in 8K video with elevenlabs audio
> voice command to generate whatever you want
> perfect lucid dreaming while wide awake
> most addictive possible experience
> end of human civilization
— Siqi Chen (@blader)
3:35 AM • Feb 16, 2024
All in all, if you ever doubted what AI can and will do, this is the watershed moment. It is time to rethink how you want to be positioned for the future. There are multiple opportunities and threats (and insane alpha) in this post. You just need to find that needle in the haystack.
Happy weekend everyone!