I/O 2024: Key takeaways from Google developers' conclave

The event was all about generative Artificial Intelligence, with Google laying out plans to improve the user experience across all its applications and services.
DH Web Desk

Google I/O 2024 logo.

Photo Credit: Google

Search engine giant Google on Tuesday (May 14) introduced improved versions of its generative Artificial Intelligence (gen AI) Gemini Large Language Models (LLMs): Nano, Pro and Advanced.


These models feature multimodal capabilities, all-new gen AI media-generation tools and more.

The event focused on AI, with Google outlining its plans to enhance user experience on all its applications and services.

Here are the key takeaways from Google's annual I/O developer conference:

AI-infused Google Search 

As rumoured, the company is going ahead with deeper integration of Gemini into Google Search. The updated apps for phones and desktops will deliver search results that go beyond the familiar list of blue website URLs, offering detailed information on a topic.

Called AI Overview, the feature offers short summaries of a topic for easy understanding. Users can adjust an AI Overview to simplify the language or break it down in more detail.

With deeper Gemini integration, the Search app can understand and respond to long, complex queries using multi-step reasoning.

AI Overview feature of Search app.

Photo Credit: Google

For instance, a user can ask the Search app for the nearest yoga studio with good reviews and attractive discounts for new members. The app will list all the highly rated yoga studios with photos, subscription plans (and discounts) and walking/driving distance (with approximate travel time) in clean, visually appealing cards, and below them offer directions via a Google Maps link.

AI Overview will be available first in the US and will expand to more regions in the coming weeks.

Google Search with video

With the Google Lens feature in the Search app, users can point the camera at an object and interact with the app to get answers. For instance, say you bought a pre-owned keypad phone but are struggling to switch it on. You can open Google Lens, turn on the video option and, when prompted, ask how to troubleshoot the phone. It will offer a step-by-step guide, with voice and text, on how to check whether the phone is in working condition and how to turn it on.

Searching with video will be available soon for Search Labs users in English in the US. It’ll expand to more regions over time.

Generative AI-powered multimedia applications

Google Veo 

The Mountain View-based company announced Veo, its first generative video-creation model.

It can generate high-quality 1080p (full HD) videos from text prompts alone. Google says it understands several cinematography terms such as 'timelapse', 'aerial shot of a landscape' and more. The generated footage is consistent and coherent, so that people, animals and objects move realistically across shots. The demo videos looked very convincing and realistic, particularly the jellyfish under the sea and the timelapse of a pink lotus blooming in a pond.

Besides Veo, Google also unveiled Imagen 3, its latest text-to-image model. It can create stunning pictures from text prompts alone.

It is said to be better in terms of detail, producing photorealistic images with far fewer distracting visual artifacts than the previous iteration. The sample photo of a wolf was incredibly good; one can even see individual strands of the animal's fur clearly.

Image generated by Imagen 3 AI tool.

Photo Credit: Google

Google's YouTube announced Music AI Sandbox, a new set of gen AI tools to generate music and tunes, and said it will collaborate with top musicians, songwriters and producers for feedback. Google released a new experimental track, 'Right There', created with Music AI Sandbox by Grammy winner Wyclef Jean, Grammy-nominated songwriter Justin Tranter and electronic musician Marc Rebillet.

Both Veo and Imagen 3 are available for preview to registered developers.

And all media generated by Google's AI models will carry an imperceptible digital watermark called SynthID for easy identification.

Improvements to AI features on Android phones

Google has enhanced Gemini Nano with multimodal capabilities to make it more useful to device owners. It will be able to understand more information in context, including sights, sounds and spoken language.

Also, the 'Circle to Search' feature is getting better at understanding context, helping users get tasks done more efficiently than ever before.

It can even help kids solve mathematical equations. They just have to point the camera at a problem in the book and invoke the Gemini AI bot by performing the 'Circle to Search' gesture on the screen. It will offer a step-by-step guide to solving the problem quickly.

Users will also be able to make use of Gemini in more ways on Android phones. For instance, they can drag and drop generated images into Gmail, Google Messages and other apps, or tap 'Ask this video' to find specific information in a YouTube video.

And users with access to Gemini Advanced get an 'Ask this PDF' option to quickly get answers or summaries without having to scroll through multiple pages. This update will roll out to hundreds of millions of compatible devices over the next few months. [Note: Gemini Advanced is available only to premium Google One subscribers]

Later this year, Google will bring a new security feature to Pixel phones. With the multimodal Gemini Nano, the phone will be able to offer real-time alerts during a call if it detects conversation patterns commonly associated with scams. Google says this protection happens on-device, so the conversation stays private to the owner.

TalkBack feature on Pixel phone.

Photo Credit: Google

Additionally, Google announced an improved 'TalkBack' accessibility feature coming to Pixel phones later this year. Users with low vision will get richer, clearer descriptions of what's shown in an image, whether it's a photo in a text message or the style of clothes when shopping online.

Gemini-powered Photos app

On average, people capture hundreds, if not more, images per year, and over time the count in the Photos album reaches thousands. Finding a long-forgotten old photo then means tediously scrolling through the entire album. To save users' time, Google has infused Gemini into the Photos app. With just a text or voice prompt, it can understand the context and surface the photo at the top of the search results.

Ask Photos feature on Pixel phone.

Photo Credit: Google

Called 'Ask Photos', the feature lets users ask for what they're looking for in a natural way. Say, 'Show me the best photo from each national park I've visited.' It will instantly list those photos.

The new Ask Photos feature is still under testing and will be rolled out to the public in the coming months.

Gemini Pro 1.5 and Flash 1.5 for developers

For developers, Google announced the Gemini Pro 1.5 model. It comes with quality improvements across key use cases, such as translation, coding and reasoning.

The company also launched the brand-new Gemini Flash 1.5 model. It is a lighter version of Gemini Pro, built for smaller coding tasks, and runs twice as fast as the latter.

Both models are now available in preview in more than 200 countries and territories, and will be released to all clients in June.

Gemini Pro 1.5 coming soon to Google Workspace clients

It will be available in Gmail, Docs, Drive, Slides and Sheets. With this, users can ask Gemini to summarise long email threads. As it understands context, Gemini can also offer smart replies.

It can also analyse complex data sets in Google Sheets and Slides and present the findings as pointers or easily digestible nuggets of information.

Gemini also powers the 'Translate for me' feature in Google Meet. It can automatically detect the language being spoken and provide real-time translated captions in more than 60 languages.

These features will be available in June on desktop, for businesses and consumers, as Gemini for Workspace add-ons.

Get the latest news on new launches, gadget reviews, apps, cybersecurity, and more on personal technology only on DH Tech.