DeepSeek Unveils Novel Inference Scaling Methodology
DeepSeek has published a research paper on SPCT, a technique for making general-purpose reward models scale better at inference time. The work is regarded as part of the research effort behind the company's next-generation model, R2, and has drawn industry attention to inference scaling.

Chinese AI company DeepSeek has published a research paper describing a new method for improving the scalability of general-purpose reward models (GRM) at inference time. The approach, called SPCT (Self-Principled Critique Tuning), is positioned as part of the research effort toward the next-generation model R2.
A reward model is a mechanism that evaluates and scores the quality of responses generated by an AI system. During inference, the process of generating answers, the accuracy of the reward model strongly affects the quality of the final output. DeepSeek's work focuses on technical improvements that let the reward model operate stably and reliably when more computation is applied at inference time.
Inference scaling has become a primary focus of recent AI development. Rather than only increasing the number of model parameters, this approach spends more computational resources during inference to improve answer quality. OpenAI's o1 series pioneered this direction, and industry-wide interest in how to expand inference-time processing efficiently has intensified. DeepSeek's research is part of this broader trend.
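To make the idea concrete, the following is a minimal sketch of one generic form of inference-time scaling, best-of-N selection: several candidate answers are generated and a reward model picks the highest-scoring one. This is an illustration only and is not the SPCT method from DeepSeek's paper. The names best_of_n and toy_reward, the scoring rule, and the candidate strings are all hypothetical stand-ins for a real generator and a real reward model.

```python
from typing import Callable, List


def best_of_n(candidates: List[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate answer that the reward function scores highest."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=reward_fn)


def toy_reward(answer: str) -> float:
    """Hypothetical stand-in for a reward model (assumption, not a real model).

    It rewards answers that contain the correct result "4" and applies a
    small length penalty so that concise answers score higher.
    """
    score = 1.0 if "4" in answer else 0.0
    return score - 0.01 * len(answer)


# Hypothetical candidates, as if sampled from a model for the prompt "2 + 2 = ?".
candidates = [
    "2 + 2 = 5",
    "2 + 2 = 4",
    "The answer is probably 4, I think",
]

# Spending more inference compute here means scoring more candidates.
best = best_of_n(candidates, toy_reward)
print(best)
```

With only the first candidate available, the selection has no choice but to return the wrong answer; scoring all three lets the reward function surface the correct one. The quality of the result therefore depends directly on how reliable the reward model is, which is the problem the paper addresses.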
DeepSeek has consistently attracted attention from international AI researchers by releasing models that achieve high performance at relatively modest computational cost. The method described in the paper can be viewed as groundwork for the next-generation model R2. However, R2's specifications and release timeline have not been disclosed.
The significance of this announcement goes beyond the publication of a technical paper. Scaling reward models during inference is a hurdle that must be cleared to bring AI answer quality to practical levels, and a proposed solution gives the research community a useful reference point. The industry is watching how DeepSeek will integrate these findings into future models and what advances will follow from this line of research.
This article is an original work independently written and edited by the AI issue editorial team based on factual reporting. © AI issue. Unauthorized reproduction, redistribution, or use for AI training is prohibited.