
When Artificial Intelligence Starts to Reason: How Reasoning Models Work

By Damiano Gasparotto, Data Scientist for Excellence Innovation

In recent years we have witnessed a remarkable evolution in the world of artificial intelligence. Tools like ChatGPT, Claude, and Gemini have become part of our daily lives, above all for their ability to understand and respond in natural language, generate text, and interact in an almost human way. But behind this apparent simplicity lie profound differences in how these models process the information they receive.

One of the most interesting developments has been the evolution of the main generative AI models into reasoning models. These models do not limit themselves to retrieving information or completing sentences based on what they “learned” during training: they are designed to simulate a reasoning process, building intermediate steps, chaining ideas, and arriving at a logical, well-argued conclusion. In other words, they try to behave more like a human being who reflects, and less like an automatic sentence-completion engine.

To give a practical example, imagine two students facing a logic exercise. The first responds instinctively, without thinking much, perhaps getting the answer right but unable to explain why, relying only on intuition or memory. The second takes a sheet of paper, writes down each step of the reasoning, evaluates the options, and only then arrives at the solution. First-generation models behave like the first student: fast, but often superficial. Reasoning models work like the second: they reconstruct the logical steps, explain what they are doing, and offer more solid, comprehensible answers.

From “Completing Sentences” to “Thinking Step by Step”

The concept of reasoning in language models began to emerge between 2022 and 2023, when experiments showed that adding a “think step by step” instruction (Chain-of-Thought, CoT) to a prompt produced a clear increase in the quality of answers, especially on logical and mathematical tasks. The error rate dropped further when Chain-of-Thought was combined with Self-Consistency, a technique that generates multiple different reasoning chains and automatically selects the most coherent conclusion. These findings made it clear that scaling model size alone is not enough to obtain reliable answers on complex tasks. The Chain-of-Thought paradigm was therefore built directly into the models themselves, pushing the system to make the intermediate steps of its reasoning explicit before formulating the final answer. This approach, now adopted by several latest-generation models, has been shown to improve performance not only in mathematics and logic, but also in decision-making and strategic scenarios.
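
To make the idea concrete, here is a minimal sketch of Chain-of-Thought prompting in Python. The question, the prompt wording, and the call_model stand-in are illustrative assumptions rather than any specific API: the point is only the difference between asking for a bare answer and asking the model to write out its intermediate steps.

    # Minimal Chain-of-Thought prompting sketch. The question, prompts,
    # and call_model are illustrative assumptions, not a specific API.

    QUESTION = "A shop sells pens at 3 for 2 euros. How much do 12 pens cost?"

    # Direct prompt: the model jumps straight to an answer.
    direct_prompt = QUESTION + "\nGive only the final answer."

    # Chain-of-Thought prompt: the model is nudged to write out its steps.
    cot_prompt = (
        QUESTION
        + "\nLet's think step by step: break the problem down, show each "
        "intermediate calculation, and only then state the final answer."
    )

    def call_model(prompt: str) -> str:
        # Stand-in for a real LLM call (OpenAI, Anthropic, a local model...).
        # With the CoT prompt, a capable model typically answers along the
        # lines of: "12 pens = 4 groups of 3; 4 x 2 euros = 8 euros."
        return "<model output>"

    print(call_model(cot_prompt))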

The Mechanism of Reasoning in Language Models

When you ask a complex question to a model with an integrated reasoning mechanism, it doesn’t just look for the most probable answer: it starts to “think out loud”, generating a series of sentences that simulate the steps a person would follow to reach a conclusion.

The first model with integrated reasoning capability to reach a wide audience was OpenAI’s o1, followed by DeepSeek-R1, which attracted great interest for achieving, thanks to reasoning, a level of quality comparable to that of the main competitors’ paid models. Anthropic’s Claude 3.7 Sonnet was instead the first hybrid reasoning model, able to activate reasoning only when needed.

Going into more detail, the main reasoning techniques available today are:

  • Chain-of-Thought: the model breaks the problem down and makes explicit the intermediate steps that lead to the solution.
  • Self-Consistency: the model generates multiple alternative reasoning chains and chooses the most recurrent or convincing one, reducing errors and hallucinations (a minimal sketch follows this list).
  • Tree-of-Thought: a more advanced technique in which the model explores different branches of reasoning and evaluates which path leads to the best result, much as a person weighs the pros and cons of several options before deciding.
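
To illustrate Self-Consistency, the sketch below samples several reasoning chains and keeps the answer they most often agree on. The sampler is a toy stand-in (an assumption for demonstration); in a real system each sample would be a separate, temperature-sampled model call whose final answer is extracted from the generated chain.

    import random
    from collections import Counter

    def sample_chain_answer(question: str) -> str:
        # Toy stand-in for one temperature-sampled Chain-of-Thought call:
        # in a real system this would query the model and extract the final
        # answer from the generated reasoning chain.
        return random.choice(["8 euros", "8 euros", "8 euros", "6 euros"])

    def self_consistency(question: str, n_samples: int = 11) -> str:
        # Sample several independent reasoning chains, then keep the answer
        # the chains agree on most often (majority vote).
        answers = [sample_chain_answer(question) for _ in range(n_samples)]
        best_answer, votes = Counter(answers).most_common(1)[0]
        return best_answer

    question = "A shop sells pens at 3 for 2 euros. How much do 12 pens cost?"
    print(self_consistency(question))  # almost always "8 euros"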

These techniques combine well with the Mixture-of-Experts architectural approach used by DeepSeek, which dynamically activates the “sub-models” (experts) best suited to the type of problem, optimizing resource usage without sacrificing response quality; the sketch below illustrates the routing idea.
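
The sizes, expert count, and random weights below are illustrative assumptions, not DeepSeek’s actual architecture; in a production model, routing typically happens per token inside each Mixture-of-Experts layer of the transformer. The sketch shows only the core mechanism: a gate scores the experts, and only the top-k are actually run.

    import numpy as np

    rng = np.random.default_rng(0)
    N_EXPERTS, DIM, TOP_K = 4, 8, 2

    # Illustrative parameters: each "expert" is a small linear layer,
    # and a gating matrix scores the experts for a given input.
    experts = [rng.normal(size=(DIM, DIM)) for _ in range(N_EXPERTS)]
    gate_w = rng.normal(size=(DIM, N_EXPERTS))

    def moe_forward(x: np.ndarray) -> np.ndarray:
        # Score all experts, but run only the top-k: the unselected experts
        # cost nothing, which is where the resource savings come from.
        scores = x @ gate_w
        top = np.argsort(scores)[-TOP_K:]   # indices of the k best experts
        weights = np.exp(scores[top])
        weights /= weights.sum()            # softmax over selected experts
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    x = rng.normal(size=DIM)
    print(moe_forward(x).shape)  # (8,): full output, a fraction of the compute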

Pros and Cons of Reasoning Models

When is it convenient to use a reasoning model? It depends on the objectives. For simple answers, a standard model is sufficient. For complex problems that require articulated evaluations, a reasoning model offers significant advantages.

Advantages

  • Explicit logical process: they make the reasoning process visible, allowing you to identify and correct critical steps that would remain hidden in other systems.
  • Multi-step problem management: they are skilled at tackling complex issues that require decomposition into interconnected phases, such as mathematical or strategic-planning problems.
  • Adaptability to context: they can evaluate different perspectives and weigh multiple variables, which allows for more nuanced and contextually appropriate responses than systems that produce a direct output.

Disadvantages

  • Limited mathematical accuracy: despite structured reasoning, these models can still make errors in complex calculations or sophisticated quantitative analyses.
  • Speed-depth tradeoff: the reasoning process requires more time and computational resources, making these models significantly slower than standard alternatives.
  • Self-referentiality: models can appear convincing even when the reasoning contains logical errors, creating a false sense of reliability that can be difficult to detect.

Furthermore, when used as the engines of AI agents, reasoning models suffer from some critical issues: excessive latency can compromise the responsiveness of agents in contexts that require speed, while potential conflicts between parallel reasoning processes can generate inconsistent decisions. These limitations suggest that, for now, such models are best used as end-to-end tools rather than as components within larger architectures.

Conclusion

The application of reasoning in artificial intelligence opens new horizons that go beyond simple text generation. This evolution represents an attempt to emulate structured human thought, offering answers that are not only correct but also understandable in their logical development. The cost of this greater depth, however, is reflected in a latency that can compromise the user experience in applications that require speed and immediacy. 

In the balance between accuracy and speed, the future will probably favor hybrid systems capable of activating reasoning only when necessary, thus optimizing efficiency without sacrificing the ability to tackle complex problems.
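
As a purely illustrative sketch of that idea, a hybrid router can estimate the complexity of a request and enable the slower reasoning mode only above a threshold. The keywords, threshold, and model labels below are assumptions for demonstration, not how any production system actually decides.

    # Purely illustrative heuristic router: keywords, threshold, and model
    # names are assumptions, not any vendor's real routing logic.

    REASONING_HINTS = ("prove", "calculate", "plan", "compare", "why", "step")

    def needs_reasoning(prompt: str, threshold: int = 2) -> bool:
        # Cheap complexity estimate: reasoning-flavoured keywords and long,
        # detailed requests push the prompt toward the reasoning model.
        score = sum(hint in prompt.lower() for hint in REASONING_HINTS)
        score += len(prompt) > 200
        return score >= threshold

    def route(prompt: str) -> str:
        return "reasoning-model" if needs_reasoning(prompt) else "standard-model"

    print(route("What time is it in Tokyo?"))                      # standard-model
    print(route("Plan a rollout and calculate why option B wins")) # reasoning-model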
