RoboNewsWire – Latest Insights on AI, Robotics, Crypto and Tech Innovations
DeepSeek’s AI reward models: What humans really want

By GT | April 9, 2025 | AI | 5 Mins Read


Chinese AI startup DeepSeek has solved a problem that has frustrated AI researchers for several years. Its breakthrough in AI reward models could dramatically improve how AI systems reason and respond to questions.

In partnership with Tsinghua University researchers, DeepSeek has created a technique detailed in a research paper titled “Inference-Time Scaling for Generalist Reward Modeling.” The paper outlines how the new approach outperforms existing methods and how the team “achieved competitive performance” compared to strong public reward models.

The innovation focuses on enhancing how AI systems learn from human preferences – an important aspect of creating more useful and aligned artificial intelligence.

What are AI reward models, and why do they matter?

AI reward models are important components in reinforcement learning for large language models. They provide feedback signals that help guide an AI’s behaviour toward preferred outcomes. In simpler terms, reward models are like digital teachers that help AI understand what humans want from their responses.
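The role of a reward model in a reinforcement learning loop can be sketched in a few lines. This is a minimal illustration, not DeepSeek's implementation: the scoring heuristic below is a hypothetical stand-in for a learned model, and all names are invented for the example.

```python
# Illustrative sketch: a scalar reward model scores candidate responses,
# and the higher-scoring one becomes the preference signal in RL.
# reward_score is a toy stand-in for a learned reward model.

def reward_score(prompt: str, response: str) -> float:
    """Toy heuristic: favour responses that address the prompt's
    terms and are reasonably detailed."""
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    length_bonus = min(len(response.split()), 50) / 50
    return overlap + length_bonus

def pick_preferred(prompt: str, candidates: list[str]) -> str:
    """Return the candidate the reward model prefers, as an RL
    training loop would when ranking rollouts."""
    return max(candidates, key=lambda r: reward_score(prompt, r))

prompt = "Explain why the sky is blue"
candidates = [
    "It just is.",
    "The sky is blue because air molecules scatter blue light more strongly.",
]
print(pick_preferred(prompt, candidates))
```

In a real system the heuristic would be a neural network trained on human preference data, but the interface is the same: responses in, preference signal out.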

“Reward modeling is a process that guides an LLM towards human preferences,” the DeepSeek paper states. Reward modeling becomes important as AI systems get more sophisticated and are deployed in scenarios beyond simple question-answering tasks.

The innovation from DeepSeek addresses the challenge of obtaining accurate reward signals for LLMs in different domains. While current reward models work well for verifiable questions or artificial rules, they struggle in general domains where criteria are more diverse and complex.

The dual approach: How DeepSeek’s method works

DeepSeek’s approach combines two methods:

Generative reward modeling (GRM): This approach enables flexibility in different input types and allows for scaling during inference time. Unlike previous scalar or semi-scalar approaches, GRM provides a richer representation of rewards through language.

Self-principled critique tuning (SPCT): A learning method that fosters scalable reward-generation behaviours in GRMs through online reinforcement learning, one that generates principles adaptively.

Zijun Liu, one of the paper’s authors affiliated with both Tsinghua University and DeepSeek-AI, explained that combining the methods allows “principles to be generated based on the input query and responses, adaptively aligning reward generation process.”
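The core idea of generative reward modelling – producing principles and a critique as text, then recovering a scalar from that text – can be sketched as follows. This is a hedged illustration under assumptions: the plain function below stands in for an LLM call, and the principle list, scoring rule, and output format are all invented for the example.

```python
# Hedged sketch of the generative-reward-model idea: rather than emit a
# bare scalar, the model writes principles and a critique in language,
# and a numeric score is parsed back out of that text.
import re

def generate_judgment(query: str, response: str) -> str:
    """Stand-in for an LLM-based GRM: produce principles, a critique,
    and a final score line, all as text."""
    principles = "Principles: relevance, factual accuracy, completeness."
    covered = sum(w in response.lower() for w in query.lower().split())
    score = min(10, covered * 2)
    critique = f"Critique: the response covers {covered} key terms of the query."
    return f"{principles}\n{critique}\nScore: {score}/10"

def extract_score(judgment: str) -> int:
    """Parse the scalar reward back out of the generated judgment."""
    match = re.search(r"Score:\s*(\d+)/10", judgment)
    return int(match.group(1)) if match else 0

judgment = generate_judgment(
    "reward model alignment",
    "A reward model guides alignment of an LLM with human preferences.",
)
print(extract_score(judgment))
```

The point of the language-based representation is that the principles can vary with the input, which is what the adaptive alignment quoted above refers to.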

The approach is particularly valuable for its potential for “inference-time scaling” – improving performance by increasing computational resources during inference rather than just during training.

The researchers found that their methods could achieve better results with increased sampling, letting models generate better rewards with more computing.
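The sampling-based scaling described above can be illustrated with a simple aggregation scheme. This is a sketch of the general principle only: the fixed list of scores below stands in for repeated calls to a generative reward model, and majority voting is one of several aggregation strategies the paper considers.

```python
# Illustrative sketch of inference-time scaling for reward generation:
# sample several reward judgments for the same response and aggregate
# them by voting, so extra compute at inference buys a more reliable
# reward signal.
from collections import Counter

def vote(judgments: list[int]) -> int:
    """Majority vote over sampled discrete reward scores."""
    return Counter(judgments).most_common(1)[0][0]

# Sampled judgments for one response; the first draw is an outlier.
samples = [3, 7, 8, 7, 7, 8, 7]

print(vote(samples[:1]))  # a single sample is misled by the outlier: 3
print(vote(samples))      # more samples vote the outlier down: 7
```

Spending more compute per judgment call, rather than training a larger model, is what makes this an inference-time rather than training-time trade-off.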

Implications for the AI Industry

DeepSeek’s innovation comes at an important time in AI development. The paper states “reinforcement learning (RL) has been widely adopted in post-training for large language models […] at scale,” leading to “remarkable improvements in human value alignment, long-term reasoning, and environment adaptation for LLMs.”

The new approach to reward modelling could have several implications:

More accurate AI feedback: By creating better reward models, AI systems can receive more precise feedback about their outputs, leading to improved responses over time.

Increased adaptability: The ability to scale model performance during inference means AI systems can adapt to different computational constraints and requirements.

Broader application: Systems can perform better in a broader range of tasks by improving reward modelling for general domains.

More efficient resource use: The research shows that inference-time scaling with DeepSeek’s method can outperform scaling model size at training time, potentially allowing smaller models to perform comparably to larger ones given appropriate inference-time resources.

DeepSeek’s growing influence

The latest development adds to DeepSeek’s rising profile in global AI. Founded in 2023 by entrepreneur Liang Wenfeng, the Hangzhou-based company has made waves with its V3 foundation and R1 reasoning models.

The company recently upgraded its V3 model (DeepSeek-V3-0324), which it said offered “enhanced reasoning capabilities, optimised front-end web development and upgraded Chinese writing proficiency.” DeepSeek has committed to open-source AI, releasing five code repositories in February that allow developers to review and contribute to its development.

While speculation continues about the potential release of DeepSeek-R2 (the successor to R1) – Reuters has reported on possible release dates – DeepSeek has not commented through its official channels.

What’s next for AI reward models?

According to the researchers, DeepSeek intends to make the GRM models open-source, although no specific timeline has been given. Open-sourcing them should accelerate progress in the field by allowing broader experimentation with reward models.

As reinforcement learning continues to play an important role in AI development, advances in reward modelling like those in DeepSeek and Tsinghua University’s work will likely have an impact on the abilities and behaviour of AI systems.

Work on AI reward models demonstrates that innovations in how and when models learn can be as important as increasing their size. By focusing on feedback quality and scalability, DeepSeek is addressing one of the fundamental challenges in creating AI that better understands and aligns with human preferences.

See also: DeepSeek disruption: Chinese AI innovation narrows global technology divide



