☕️ Google's Gemini Goes Multimodal to Power the Company's Latest AI Agent
Not long after OpenAI's blockbuster release of its full o1 model, Google tried to steal the spotlight by unveiling its latest prototype AI agent, powered by Gemini 2.0 Flash, the company's new de facto flagship model. Other key takeaways include:
OpenAI releasing Advanced Voice Mode's promised video and screen share capabilities to Pro and Plus users
Google going on a $20 billion spending spree to build renewable energy solutions to power AI's needs
Nvidia ramping up its workforce in China to boost its research and autonomous driving development
Join us at AI Tangle as we untangle this week's happenings in AI!
THE BIG AI STORY
On Wednesday this week, Google debuted Mariner, its latest move into agentic AI. Though still experimental, with some inaccuracies to be expected, Mariner is designed to work with a "human in the loop" to help users with browser tasks like automatically navigating websites, interacting with spreadsheets, or even filling shopping carts. Google is aiming to improve the user experience by delivering Mariner as a Chrome extension, and it is banking on a new generation of Gemini to make the difference.
Alongside Mariner came the launch of the model powering it behind the scenes: Gemini 2.0 Flash. Promising twice the speed on certain benchmarks, along with "significant improvements" in coding, image analysis, math, and "factuality," Gemini 2.0 Flash has, according to Google itself, taken over from Gemini 1.5 Pro as the company's flagship AI model. It also ships with multimodal capabilities out of the gate, such as image generation and a particular emphasis on audio generation, plus an API for developers to experiment with.
6 QUICK HITS
First showcased back in May, Advanced Voice Mode's (AVM) video and screen-share capabilities have finally been released by OpenAI, letting users interact with AVM through, for example, their phone cameras. During a livestream, OpenAI staff demonstrated AVM's visual understanding by having it guide them through making pour-over coffee, detailing the process step by step and accurately identifying objects along the way. The update, which also includes a festive Santa voice available until the end of the month, is now live for Plus and Pro users, with a broader rollout planned for January.
Google has partnered with Intersect Power and TPG Rise Climate to develop gigawatts of renewable energy, battery storage, and grid upgrades to support its AI-driven data centers as part of a $20 billion investment spree. This includes an $800 million equity injection into Intersect Power, with plans for gigawatt-scale renewable parks paired with data centers by 2027. With AI's rapid global expansion expected to cause energy shortfalls, more companies, now including Google, are tackling the problem while waiting for slower nuclear projects to come online.
Nvidia has expanded its workforce in China by about 200 employees this year, bringing its total in Beijing to nearly 600, according to Bloomberg. The company recently opened a new office in Beijing's Zhongguancun tech hub, aiming to boost its research and autonomous driving development in China. Nvidia, which employs around 29,600 people globally, is currently under investigation by Chinese authorities for alleged anti-monopoly violations, a probe seen as a response to recent US-China trade tensions.
Harvard University has launched an AI training dataset of nearly one million public-domain books through its Institutional Data Initiative, funded by Microsoft and OpenAI, with the books scanned by Google Books. According to Wired, the dataset ranges from classics to obscure math texts, offering legally accessible training material for AI models. However, the dataset's dated content highlights the demand for more exclusive, modern data, both to differentiate AI capabilities and to avoid legal risks, particularly those posed by news publishers in recent months.
Popular AI chatbot platform Character AI is rolling out a slate of new teen safety tools following lawsuits alleging that it exposed minors to hypersexualized content and promoted suicide. The additions include blocks on some sensitive topics, notifications about time spent on the platform, and persistent disclaimers that its AI characters aren't real people. The most significant change, though, is a separate model for under-18 users that dials down responses involving romance and violence, which is intended to reduce the chance of inappropriate replies.
AI language-learning startup Speak recently announced that it had raised $78 million in Series C funding in a round led by Accel, with participation from the OpenAI Startup Fund and others, doubling its valuation to $1 billion. Speak's app uses AI to help users practice English through conversation, offering features like linguistic feedback and customizable roleplays powered by OpenAI's Realtime API. Recent upgrades include adopting Google's Conformer-CTC for more accurate speech recognition, and the company plans to expand into Spanish and French next year.
4 AI TOOLS
Naaia - Naaia is an AI compliance and risk management system that turns regulatory obligations into concrete actions, backed by support and risk-prevention measures.
Supabase - Chat with your Postgres database using Supabase's AI assistant, which helps developers generate, run, and debug queries, chart data, create functions and policies, and more.
Spellar - Spellar is an AI-driven speech assistant that gives personalized feedback to enhance your speaking skills and refine your style.
Gem - Unify your recruiting tech stack with Gem, a platform that combines AI, a CRM, and analytics into an all-in-one solution for recruiters.
AI EXTRA READ
To Be Human in A World of AI (6-min read)
The rise of AI challenges humans to refine their moral judgment and intuition in decision-making, yet these skills are often underdeveloped. Organizations must actively cultivate these abilities to improve their decision-making, and a Harvard Business Review article outlines five key imperatives for doing so.
Your AI Sherpa,
Mark R. Hinkle
Enterprise (TheAIE) Network