☕️ AI Meets Advanced Reasoning: OpenAI's "Strawberry" Project
AI hallucination is a problem that's been discussed to death, but could OpenAI finally be close to leaving it in the past with Strawberry, a project that aims to give AI models advanced reasoning? Other key takeaways of the week include:
SK hynix, TSMC, and Nvidia teaming up to make a semiconductor super alliance
Google Gemini gets caught scanning private documents without permission
Amazon's Rufus shopping assistant goes live for all US customers
Join us at AI Tangle as we untangle this week's happenings in AI!
THE BIG AI STORY
AI startup poster child OpenAI is reportedly developing an AI project dubbed "Strawberry," which aims to give the Microsoft-backed company's models advanced reasoning by following a "novel approach." OpenAI's CEO, Sam Altman, has previously emphasized the importance of improving AI reasoning, and this development raises the stakes in the race to advance AI technology, with reasoning abilities seen as the next crucial step toward eliminating AI's persistent "hallucination" issues.
How much do we know of Strawberry?
Strawberry, which sources say is the renamed, elusive Q* project, reportedly involves a specialized post-training process for AI models, potentially similar to Stanford's "Self-Taught Reasoner" method. The project aims to enable AI to perform "long-horizon tasks" and conduct "deep research" by autonomously browsing the internet. OpenAI is testing these capabilities using a proprietary dataset and plans to integrate "computer-using agents" to act on the AI's findings. The company has also reportedly demonstrated impressive results on complex math problems, scoring over 90% on the MATH dataset, a benchmark of championship math problems, though Reuters was unable to verify whether this was thanks to Strawberry.
COLLABORATING WITH MEM
Meet Mem, your AI-powered second brain.
Mem organizes the information you save to it, making the process of remembering important meetings, ideas, and articles easy. With instant access to your personal knowledge, you'll never miss a detail. Ask Mem, "What was my last meeting with Sarah about?" and get a summary instantly. Level up your productivity and creativity with a 7-day free trial at get.mem.ai.
6 QUICK HITS
SK hynix, TSMC, and Nvidia are forming a three-company alliance to focus on next-gen technologies like HBM4. The upcoming SEMICON event will be crucial, with high-profile figures attending. HBM4 memory is expected to be a game-changer for AI hardware. SK hynix aims to integrate memory and logic semiconductors into a single package, optimizing performance. The alliance's solution is expected to be ready for production by 2026, roughly in line with Nvidia's next-gen architecture.
Google's Gemini AI service appears to be reading private Drive documents without explicit user consent, according to an X/Twitter thread by Kevin Bankston, a senior advisor on AI governance. In his case, Gemini summarized Bankston's tax return without being asked, implying that Gemini was free to scan and read through documents at will. While the exact cause remains unclear, the issue seems localized to Google Drive and may be related to enabling Google Workspace Labs. Google's handling of user consent, namely a total lack of it in this scenario, raises significant privacy concerns.
Rufus, Amazon's AI-powered shopping assistant in the Amazon Shopping mobile app, is now available to all US customers, the e-commerce giant said in a blog post. Announced back in February for beta testing, Rufus is powered by an internal large language model specialized for shopping, allowing customers to ask questions about products, such as which factors to consider when buying and how well products hold up. For the time being, however, Rufus is limited to Amazon's catalog and doesn't always get queries right, which leaves some room for improvement.
Google's Workspace Labs preview program is getting a new addition: Vids, a productivity app announced in April and designed to help turn boring slideshows into presentation videos. Vids allows users to drop docs, slides, voiceovers, and video recordings into a timeline to create a presentation video. It is not to be confused with video generation AI like Sora, which generates actual footage from a prompt. Google also released a demo video of Vids in action when it was announced, emphasizing the use of Gemini to do the heavy lifting.
Japan's government recently adopted a firm policy against the development of lethal autonomous weapons systems (LAWS), which rely significantly on artificial intelligence. The Foreign Ministry of Japan has previously stated that it believes that "human involvement is required, as it is humans who can be held accountable," which is in line with a paper it submitted to the United Nations in May. Despite the numerous potential benefits of LAWS, Japan believes that there are currently no assurances that LAWS will be used in compliance with international humanitarian law.
The Rabbit R1, the not-so-successful $199 AI handheld released back in March, recently received a critical update that users should be keen to download. The update addresses a potential security exploit: resold, lost, or stolen R1 devices could be jailbroken to access past queries and on-device data. Rabbit's recent update allows users to perform a "Factory Reset" to erase their devices, which prevents pairing data from being logged. Though Rabbit insists that the flaw hasn't yet been exploited, users should update their devices regardless, as devices without the update will remain vulnerable.
4 AI TOOLS
Roundtable - Roundtable cleans your survey responses with an easy-to-integrate API, behavioral tracking, and more to cut down on time spent analyzing survey data.
insMind - Generate studio-quality product photos with AI-generated backgrounds and customized designs with ease using insMind, a powerful AI-enhanced photo editor for production at scale.
Airtrain - Airtrain is the no-code compute platform for large language models, allowing you to scale beyond proprietary AI and fine-tune open-source models on your data.
Mailmodo - Create on-brand email templates in a minute from Mailmodo AI's wide selection of templates, with automatic integration for products and coupons.
AI READ & WATCH
The Current State of AI Financial Results (3-min read)
CIOs face a tough decision as AI becomes more widespread in markets around the world: do you invest in generative AI to stay competitive despite uncertain immediate returns, or do you play it safe and skip the hype? Without concrete business metrics, many are finding it difficult to take the leap of faith.
The AI Memory Machine (41-min listen)
Humans are terrible at remembering things. On this episode of The Vergecast, a podcast from The Verge about small gadgets, big tech, and everything in between, the hosts chat with Dan Siroker, the CEO of Limitless, about what it takes to build a great memory aid, how we might use them in the future, and why it's so tricky to get right.