Close Menu
RoboNewsWire – Latest Insights on AI, Robotics, Crypto and Tech Innovations
  • Home
  • AI
  • Crypto
  • Cybersecurity
  • IT
  • Energy
  • Robotics
  • TechCrunch
  • Technology
What's Hot

Investors trust Google more than Meta when comes to spending on AI

April 30, 2026

Paragon is not collaborating with Italian authorities probing spyware attacks, report says

April 28, 2026

Microsoft cuts OpenAI revenue share as their AI alliance loosens

April 28, 2026
Facebook X (Twitter) Instagram
Trending
  • Investors trust Google more than Meta when comes to spending on AI
  • Paragon is not collaborating with Italian authorities probing spyware attacks, report says
  • Microsoft cuts OpenAI revenue share as their AI alliance loosens
  • Robotically assembled building blocks could make construction more efficient and sustainable | MIT News
  • AI showdown: Musk and Altman go to trial in fight over OpenAI’s beginnings
  • U.S., Iran seize ships as war evolves into standoff over Strait of Hormuz
  • Google launches training and inference TPUs in latest shot at Nvidia
  • Zoom teams up with World to verify humans in meetings
  • Home
  • About Us
  • Advertise
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
Facebook X (Twitter) Instagram
RoboNewsWire – Latest Insights on AI, Robotics, Crypto and Tech InnovationsRoboNewsWire – Latest Insights on AI, Robotics, Crypto and Tech Innovations
Thursday, May 7
  • Home
  • AI
  • Crypto
  • Cybersecurity
  • IT
  • Energy
  • Robotics
  • TechCrunch
  • Technology
RoboNewsWire – Latest Insights on AI, Robotics, Crypto and Tech Innovations
Home » Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks

Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks

GTBy GTNovember 13, 2025 AI No Comments4 Mins Read
Share
Facebook Twitter LinkedIn Pinterest Email


Baidu’s latest ERNIE model, a super-efficient multimodal AI, is beating GPT and Gemini on key benchmarks and targets enterprise data often ignored by text-focused models.

For many businesses, valuable insights are locked in engineering schematics, factory-floor video feeds, medical scans, and logistics dashboards. Baidu’s new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to fill this gap.

What’s interesting to enterprise architects is not just its multimodal capability, but its architecture. It’s described as a “lightweight” model, activating only three billion parameters during operation. This approach targets the high inference costs that often stall AI-scaling projects. Baidu is betting on efficiency as a path to adoption, training the system as a foundation for “multimodal agents” that can reason and act, not just perceive.

Complex visual data analysis capabilities supported by AI benchmarks

Baidu’s multimodal ERNIE AI model excels at handling dense, non-text data. For example, it can interpret a “Peak Time Reminder” chart to find optimal visiting hours, a task that reflects the resource-scheduling challenges in logistics or retail.

ERNIE 4.5 also shows capability in technical domains, like solving a bridge circuit diagram by applying Ohm’s and Kirchhoff’s laws. For R&D and engineering arms, a future assistant could validate designs or explain complex schematics to new hires.

This capability is supported by Baidu’s benchmarks, which show ERNIE-4.5-VL-28B-A3B-Thinking outperforming competitors like GPT-5-High and Gemini 2.5 Pro on some key tests:

MathVista: ERNIE (82.5) vs Gemini (82.3) and GPT (81.3)ChartQA: ERNIE (87.1) vs Gemini (76.3) and GPT (78.2)VLMs Are Blind: ERNIE (77.3) vs Gemini (76.5) and GPT (69.6)

It’s worth noting, of course, that AI benchmarks provide a guide but can be flawed. Always perform internal tests for your needs before deploying any AI model for mission-critical applications.

Baidu shifts from perception to automation with its latest ERNIE AI model

The primary hurdle for enterprise AI is moving from perception (“what is this?”) to automation (“what now?”). ERNIE 4.5 claims to address this by integrating visual grounding with tool use.

Asking the multimodal AI to find all people wearing suits in an image and return their coordinates in JSON format works. The model generates the structured data, a function easily transferable to a production line for visual inspection or to a system auditing site images for safety compliance.

The model also manages external tools and can autonomously zoom in on a photograph to read small text. If it faces an unknown object, it can trigger an image search to identify it. This represents a less passive form of AI that could power an agent to not only flag a data centre error, but also zoom in on the code, search the internal knowledge base, and suggest the fix.

Unlocking business intelligence with multimodal AI

Baidu’s latest ERNIE AI model also targets corporate video archives from training sessions and meetings to security footage. It can extract all on-screen subtitles and map them to their precise timestamps.

It also demonstrates temporal awareness, finding specific scenes (like those “filmed on a bridge”) by analysing visual cues. The clear end-goal is making vast video libraries searchable, allowing an employee to find the exact moment a specific topic was discussed in a two-hour webinar they may have dozed off a couple of times during.

Baidu provides deployment guidance for several paths, including transformers, vLLM, and FastDeploy. However, the hardware requirements are a major barrier. A single-card deployment needs 80GB of GPU memory. This is not a tool for casual experimentation, but for organisations with existing and high-performance AI infrastructure.

For those with the hardware, Baidu’s ERNIEKit toolkit allows fine-tuning on proprietary data; a necessity for most high-value use cases. Baidu is providing its latest ERNIE AI model with an Apache 2.0 licence that permits commercial use, which is essential for adoption.

The market is finally moving toward multimodal AI that can see, read, and act within a specific business context, and the benchmarks suggest it’s doing so with impressive capability. The immediate task is to identify high-value visual reasoning jobs within your own operation and weigh them against the substantial hardware and governance costs.

See also: Wiz: Security lapses emerge amid the global AI race

Banner for AI & Big Data Expo by TechEx events.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.



Source link

GT
  • Website

Keep Reading

Enterprise users swap AI pilots for deep integrations

Google, Sony Innovation Fund, and Okta back Resemble AI deepfake detection plan

Platform corrects AI algorithmic bias for eKYC

What ByteDance’s Launch Means for Enterprise

UK and Germany plan to commercialise quantum supercomputing

Frontier AI agents replace chatbots

Add A Comment
Leave A Reply Cancel Reply

Editors Picks

Investors trust Google more than Meta when comes to spending on AI

April 30, 2026

Google launches training and inference TPUs in latest shot at Nvidia

April 27, 2026

Meta tracks employee usage on Google, LinkedIn AI training project

April 25, 2026

Meta will cut 10% of workforce as company pushes deeper into AI

April 24, 2026
Latest Posts

Malicious Chrome Extension Steal ChatGPT and DeepSeek Conversations from 900K Users

April 1, 2026

Top 10 Best Server Monitoring Tools

April 1, 2026

10 Best Cybersecurity Risk Management Tools

March 31, 2026

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Welcome to RoboNewsWire, your trusted source for cutting-edge news and insights in the world of technology. We are dedicated to providing timely and accurate information on the most important trends shaping the future across multiple sectors. Our mission is to keep you informed and ahead of the curve with deep dives, expert analysis, and the latest updates in key industries that are transforming the world.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Facebook X (Twitter) Instagram
  • Home
  • About Us
  • Advertise
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2026 Robonewswire. Designed by robonewswire.

Type above and press Enter to search. Press Esc to cancel.