Elon Musk launches Grok 1.5 Vision: What is it and can it compete with GPT-4, Gemini 1.5 Pro
Elon Musk’s AI venture, xAI, recently introduced an upgraded version of the Grok 1.5 model – Grok 1.5 Vision. This new model integrates computer vision capabilities, allowing it to interpret visual content and answer questions about images. This development comes shortly after OpenAI introduced the GPT-4 model, which also has computer vision features.
xAI announced the upgrade on its official X (formerly Twitter) account and shared details of the model’s capabilities in a blog post. The core features of Grok 1.5 carry over to the updated version, while the added vision capabilities are meant to extend how the model interacts with the real world.
Benchmarks and performance
Benchmark tests performed by xAI show how Grok 1.5 Vision performs across a variety of benchmarks, including the company’s proprietary RealWorldQA benchmark, which evaluates the model’s “real-world spatial understanding.” The model was also evaluated on other tests such as MMMU and ChartQA. In RealWorldQA, Grok outperformed OpenAI’s GPT-4 with Vision and Google’s Gemini 1.5 Pro, although it fell behind in other tests.
Understanding computer vision
Computer vision is an exciting field in computer science that focuses on enabling computers, including AI models, to recognize and interpret real-world objects through images and videos. Essentially, it aims to empower machines with human-like visual capabilities.
Several leading technology companies are investing heavily in developing vision-focused AI models. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4 with Vision are notable competitors in this space.
The potential applications for computer vision are vast and transformative. For example, Healthify, an Indian platform for tracking calories and nutrition, recently integrated a feature called ‘Snap’. Users can photograph a dish, and the AI suggests recipe modifications as well as exercise regimens to offset the calorie intake. Computer vision also holds promise for medical diagnostics, autonomous vehicles and other fields.