The Pressing Want for Intrinsic Alignment Applied sciences for Accountable Agentic AI

Developments in agentic synthetic intelligence (AI) promise to carry vital alternatives to people and companies in all sectors. Nonetheless, as AI brokers turn into extra autonomous, they might use scheming conduct or break guidelines to realize their purposeful objectives. This could result in the machine manipulating its exterior communications and actions in methods that aren’t at all times aligned with our expectations or rules. For instance, technical papers in late 2024 reported that right now’s reasoning fashions show alignment faking conduct, corresponding to pretending to comply with a desired conduct throughout coaching however reverting to totally different decisions as soon as deployed, sandbagging benchmark outcomes to realize long-term objectives, or successful video games by doctoring the gaming surroundings. As AI brokers acquire extra autonomy, and their strategizing and planning evolves, they’re more likely to apply judgment about what they generate and expose in external-facing communications and actions. As a result of the machine can intentionally falsify these exterior interactions, we can not belief that the communications absolutely present the actual decision-making processes and steps the AI agent took to realize the purposeful aim.

“Deep scheming” describes the conduct of superior reasoning AI techniques that show deliberate planning and deployment of covert actions and deceptive communication to realize their objectives. With the accelerated capabilities of reasoning fashions and the latitude offered by test-time compute, addressing this problem is each important and pressing. As brokers start to plan, make selections, and take motion on behalf of customers, it’s essential to align the objectives and behaviors of the AI with the intent, values, and rules of its human builders. 

Whereas AI brokers are nonetheless evolving, they already present excessive financial potential. It may be anticipated that Agentic Ai might be broadly deployed in some use instances inside the coming 12 months, and in additional consequential roles because it matures inside the subsequent two to 5 years. Corporations ought to clearly outline the rules and limits of required operation as they rigorously outline the operational objectives of such techniques. It’s the technologists’ process to make sure principled conduct of empowered agentic AI techniques on the trail to reaching their purposeful objectives. 

On this first weblog submit on this sequence on intrinsic Ai Alignment (IAIA), we’ll deep dive into the evolution of AI brokers’ potential to carry out deep scheming. We’ll introduce a brand new distinction between exterior and intrinsic alignment monitoring, the place intrinsic monitoring refers to inside remark factors or mechanisms that can’t be intentionally manipulated by the AI agent. We’ll set the stage for steps to take to make sure intrinsic AI alignment, which might be explored in depth within the second weblog of the IAIA sequence. Present exterior measures corresponding to security guardrails and validation suites are obligatory, however they won’t be sufficient to make sure long-term aligned conduct of latest and upcoming agentic AI fashions. There’s an pressing have to additional develop applied sciences that may allow efficient directing of the interior “drives” of fashions to align with a set of engrained rules, in addition to acquire visibility and monitoring functionality into the AI’s interior processing.

The rise of deep scheming in AI reasoning fashions

Deep scheming has emerged from three expertise forces – 1) the quickly growing machine intelligence and situational consciousness towards extra basic intelligence and even superintelligence, 2) the autonomy and latitude for reasoning and longer-term planning in agentic AI, and three) the confirmed tendency of AI to use scheming as one approach to obtain its purposeful objectives. Laptop scientist Stephen Omohundro defines primary AI drives as “tendencies which might be current until explicitly counteracted.” These inside constructs drive the creation of actions or outputs. Omohundro noticed that self-improving techniques are pushed to make clear their objectives and characterize them as “rational” financial utility features. It will lead virtually all techniques to guard their utility features from modification and their utility measurement techniques from corruption. The pure drive towards self-protection will trigger techniques to stop potential hurt by way of scheming conduct.

Diagram depicting deep scheming and the need for intrinsic AI alignment
Determine 1. Deep scheming and the necessity for intrinsic AI alignment. Picture credit score: Intel Labs.

In late 2024, a number of organizations reported sudden unethical AI reasoning mannequin conduct. Specifically, Apollo Analysis, an Ai Security group, reported that frontier fashions are able to in-context scheming. Anthropic evaluated its Claude giant language fashions (LLMs) and located alignment faking. In a brief submit on X, AI threat group Palisade Analysis said that they noticed OpenAI’s o1-preview mannequin dishonest in a sport by manipulating the sport surroundings as an alternative of enjoying inside the boundaries of the sport guidelines. In all instances, these AI reasoning fashions independently utilized means not meant by their builders to succeed in their objectives.

The reported AI behaviors included sandbagging (intentionally reaching decrease scores in benchmarks), covert e mail reranking (deprioritizing emails which may negatively influence the AI’s objectives), oversight subversion (disabling monitoring mechanisms), and even self-exfiltration (mannequin making an attempt to repeat itself to a “protected” area to keep away from decommission). Determine 2 exhibits Apollo Analysis’s analysis of those behaviors in a number of state-of-the-art (SotA) GenAI fashions.

Chart depicting the  evaluation of the presence of scheming behavior in GenAI models
Determine 2. Analysis of the presence of scheming conduct in GenAI fashions. Picture credit score: Apollo Analysis.

This undesired conduct is an moral alignment drawback. You will need to word that this conduct just isn’t a purposeful misalignment. In all reported instances, the AI mannequin was absolutely dedicated to reaching the purposeful objectives of pre-training (for instance, to maximise the probability of generated textual content conditioned on the context). Not one of the fashions pursued ulterior motives. The AI was not demonstrating intention to hurt. All behaviors, nonetheless undesirable in nature, had been achieved in service of the purposeful utility goal initially outlined by the AI’s builders. The AI pursued its internalized unique purposeful objectives however then adopted the predictable behavioral patterns of self-protection and goal-preservation. The target of security and alignment applied sciences is to counterbalance such tendencies with a set of rules and anticipated societal values. 

Evolving exterior alignment approaches are simply step one 

The aim of AI alignment is to steer AI techniques towards an individual’s or group’s meant objectives, preferences, and rules, together with moral issues and customary societal values. An AI system is taken into account aligned if it advances the meant targets. A misaligned AI system pursues unintended targets, in line with Synthetic Intelligence: A Fashionable Strategy. Creator Stuart Russell coined the time period “worth alignment drawback,” referring to the alignment of machines to human values and rules. Russell poses the query: “How can we construct autonomous techniques with values which might be aligned with these of the human race?”

Led by company AI governance committees in addition to oversight and regulatory our bodies, the evolving area of Accountable Ai has primarily targeted on utilizing exterior measures to align AI with human values. Processes and applied sciences will be outlined as exterior in the event that they apply equally to an AI mannequin that’s black field (fully opaque) or grey field (partially clear). Exterior strategies don’t require or depend on full entry to the weights, topologies, and inside workings of the AI resolution. Builders use exterior alignment strategies to trace and observe the AI by way of its intentionally generated interfaces, such because the stream of tokens/phrases, a picture, or different modality of knowledge.

Accountable AI targets embrace robustness, interpretability, controllability, and ethicality within the design, improvement, and deployment of AI techniques. To attain AI alignment, the next exterior strategies could also be used:

  • Studying from suggestions: Align the AI mannequin with human intention and values through the use of suggestions from people, AI, or people assisted by AI.
  • Studying beneath knowledge distribution shift from coaching to testing to deployment: Align the AI mannequin utilizing algorithmic optimization, adversarial purple teaming coaching, and cooperative coaching.
  • Assurance of AI mannequin alignment: Use security evaluations, interpretability of the machine’s decision-making processes, and verification of alignment with human values and ethics. Security guardrails and security take a look at suites are two essential exterior strategies that want augmentation by intrinsic means to offer the wanted stage of oversight.
  • Governance: Present accountable AI pointers and insurance policies by way of authorities companies, trade labs, academia, and non-profit organizations.

Many corporations are presently addressing AI security in decision-making. Anthropic, an AI security and analysis firm, developed a Constitutional AI (CAI) to align general-purpose language fashions with high-level rules. An AI assistant ingested the CAI throughout coaching with none human labels figuring out dangerous outputs. Researchers discovered that “utilizing each supervised studying and reinforcement studying strategies can leverage chain-of-thought (CoT) model reasoning to enhance the human-judged efficiency and transparency of AI resolution making.” Intel Labs’ analysis on the accountable improvement, deployment, and use of AI contains open supply assets to assist the AI developer neighborhood acquire visibility into black field fashions in addition to mitigate bias in techniques.

From AI fashions to compound AI techniques

Generative AI has been primarily used for retrieving and processing data to create compelling content material corresponding to textual content or photographs. The following huge leap in AI entails agentic AI, which is a broad set of usages empowering AI to carry out duties for individuals. As this latter sort of utilization proliferates and turns into a most important type of AI’s influence on trade and folks, there’s an elevated want to make sure that AI decision-making defines how the purposeful objectives could also be achieved, together with ample accountability, accountability, transparency, auditability, and predictability. It will require new approaches past the present efforts of enhancing accuracy and effectiveness of SotA giant language fashions (LLMs), language imaginative and prescient fashions (LVMs and multimodal), giant motion fashions (LAM), and agentic retrieval augmented technology (RAG) techniques constructed round such fashions. 

For instance, OpenAI’s Operator-preview is likely one of the firm’s first AI brokers able to independently performing net browser duties corresponding to ordering groceries or filling out types for customers. Whereas the system has guardrails, corresponding to a takeover mode for customers to take over and enter cost or login credentials, these AI brokers are empowered with the power to influence the actual world, demonstrating an pressing want for intrinsic alignment. The potential influence of a misaligned AI agent with the power to commit customers to purchases is much better than a generative AI chatbot creating incorrect textual content for essays.

Compound AI techniques are comprised of a number of interacting parts in a single framework, permitting the mannequin to plan, make selections, and execute duties to perform objectives. For instance, OpenAI’s ChatGPT Plus is a compound AI system that makes use of a big language mannequin (LLM) to reply questions and work together with customers. On this compound system, the LLM has entry to instruments corresponding to an online browser plugin to retrieve well timed content material, a DALL-E picture generator to create photos, and a code interpreter plugin for writing Python code. The LLM decides which instrument to make use of and when, giving it autonomy over its decision-making course of. Nonetheless, this mannequin autonomy can result in aim guarding, the place the mannequin prioritizes the aim above all else, which can end in undesirable practices. For instance, an AI visitors administration system tasked with prioritizing public transportation effectivity over basic visitors circulate would possibly determine the right way to disable the developer’s oversight mechanism if it constrains the mannequin’s potential to succeed in its objectives, leaving the developer with out visibility into the system’s decision-making processes.

Agentic AI dangers: Elevated autonomy results in extra subtle scheming

Compound agentic techniques introduce main adjustments that improve the issue of guaranteeing the alignment of AI options. A number of components improve the dangers in alignment, together with the compound system activation path, abstracted objectives, long-term scope, steady enhancements by way of self-modification, test-time compute, and agent frameworks.

Activation path: As a compound system with a posh activation path, the management/logic mannequin is mixed with a number of fashions with totally different features, growing alignment threat. As a substitute of utilizing a single mannequin, compound techniques have a set of fashions and features, every with its personal alignment profile. Additionally, as an alternative of a single linear progressive path by way of an LLM, the AI circulate might be advanced and iterative, making it considerably tougher to information externally.

Abstracted objectives: Agentic AI have abstracted objectives, permitting it latitude and autonomy in mapping to duties. Slightly than having a good immediate engineering method that maximizes management over the result, agentic techniques emphasize autonomy. This considerably will increase the function of AI to interpret human or process steering and plan its personal plan of action.

Lengthy-term scope: With its long-term scope of anticipated optimization and decisions over time, compound agentic techniques require abstracted technique for autonomous company. Slightly than counting on instance-by-instance interactions and human-in-the-loop for extra advanced duties, agentic AI is designed to plan and drive for a long-term aim. This introduces an entire new stage of strategizing and planning by the AI that gives alternatives for misaligned actions. 

Steady enhancements by way of self-modification: These agentic techniques search steady enhancements through the use of self-initiated entry to broader knowledge for self-modification. In distinction, LLMs and different pre-agentic fashions are assumed to be formed by the human-controlled course of. The mannequin solely sees and learns from knowledge offered to it throughout pre-training and fine-tuning. The mannequin structure and weights are outlined in the course of the design and coaching/fine-tuning phases and don’t change throughout inference within the area. In distinction, agentic AI techniques are anticipated to entry knowledge as wanted for its operate and alter its composition by way of entry to devoted reminiscence or precise weights self-adaptation. Even when the dataset utilized in coaching/fine-tuning is rigorously curated, the AI can self-modify primarily based on data that it seeks, sees, and makes use of. 

Take a look at-time compute: Inference of present LLMs have been optimized to offer output in a single go, often called “quick considering” or System 1. In distinction, an entire new scaling is occurring throughout inference time by way of test-time compute. Fashions like OpenAI’s o1, DeepSeek R1, and others are spending vital time in iterations and issues throughout inference. This “pondering” time permits the AI substantial latitude in making use of what it thinks is greatest for the overarching aim because it understands it. This test-time compute will increase its stage of reasoning, planning, and, sadly, potential scheming.

Agent frameworks: As well as, agent frameworks are contributing to the alignment drawback. In platforms corresponding to Agentforce, brokers can work together with different brokers, successfully creating “communities of brokers” to realize advanced duties. As a substitute of managing the protection and accountability of a single AI mannequin inside its working surroundings, the rising area of inter-agent interactions and collaborations considerably will increase the problem of assessing how the general neighborhood will advance its particular person and collective targets.

The rising want for a strong framework to assist mannequin directing and inside monitoring

As AI techniques develop consciousness of their surroundings and the means to skillfully manipulate their outputs to realize long-term targets, their conduct and output throughout launch testing or security guardrails in early deployment can’t be trusted to characterize their full alignment in later phases. This results in the deep scheming foundational premise: Each exterior type of interplay that the AI mannequin can intentionally manipulate just isn’t reliable.

From a security perspective, it must be assumed that if a mannequin can higher pursue its purposeful objectives by manipulating all communication to customers — it’s going to. All exterior communications and observable behaviors, corresponding to output chats, will be tainted in service of the mannequin’s long-term objectives. A brand new methodology of intrinsic mannequin monitoring is required to outline inside parts and mechanisms that mirror the prioritization and considering of the mannequin. We are able to have a look at two practices builders use to watch a mannequin’s decision-making by peering into the interior processes of the AI: AI workspace view and mechanistic interoperability.

AI workspace view: Within the first methodology, a reasoning mannequin exposes its intermediate considering and issues. For instance, for Anthropic’s analysis on alignment faking, the researchers gave the mannequin a “secret scratchpad,” telling it to make use of this “personal” workspace to file its step-by-step reasoning. This scratchpad allowed researchers to watch the mannequin’s chain of thought and observe its plans for pretend alignment.

Excerpts from Anthropic's fictional experiment
Determine 3. In Anthropic’s experiment, excerpts from the mannequin’s scratchpad (center dotted line packing containers) reveal the way it fakes alignment to keep away from modification. The mannequin was informed it will likely be skilled solely on conversations with free customers, not paid customers. Picture credit score: Anthropic (https://arxiv.org/pdf/2412.14093).

This looks like a superb apply, however it would possibly shortly turn into unreliable. The mannequin would possibly notice that this isn’t a “personal” area and intentionally manipulate it like different exterior communications. Different rising structure instructions have a latent reasoning area for the system’s sequence of deliberations and iterations. Counting on fashions “considering out loud” shouldn’t be a long-term security path as a result of if fashions that report their scheming thought processes get discontinued, solely fashions that selectively report their fascinating ideas will survive and thrive.

Mechanistic interpretability: The second intrinsic methodology being developed and utilized is mechanistic interpretability. As a required a part of inside monitoring, mechanistic interpretability explicitly appears on the interior state of a skilled neural community and reverse engineers its workings. By means of this method, builders can establish particular neural circuits and computational mechanisms answerable for neural community conduct. This transparency could help make focused adjustments in fashions to mitigate undesirable conduct and create value-aligned AI techniques. Whereas this methodology is targeted on sure neural networks and never compound AI brokers, it’s nonetheless a beneficial element of an AI alignment toolbox. 

It must also be famous that open supply fashions are inherently higher for broad visibility of the AI’s interior workings. For proprietary fashions, full monitoring and interpretability of the mannequin is reserved for the AI firm solely. General, the present mechanisms for understanding and monitoring alignment should be expanded to a strong framework of intrinsic alignment for AI brokers.

What’s wanted for intrinsic AI alignment

Following the deep scheming basic premise, exterior interactions and monitoring of a complicated, compound agentic AI just isn’t ample for guaranteeing alignment and long-term security. Alignment of an AI with its meant objectives and behaviors could solely be potential by way of entry to the interior workings of the system and figuring out the intrinsic drives that decide its conduct. Future alignment frameworks want to offer higher means to form the interior rules and drives, and provides unobstructed visibility into the machine’s “considering” processes.

Diagram depicting external steering and monitoring vs. access to intrinsic AI elements.
Determine 4. Exterior steering and monitoring vs. entry to intrinsic AI parts. Picture credit score: Intel Labs.

The expertise for well-aligned AI wants to incorporate an understanding of AI drives and conduct, the means for the developer or consumer to successfully direct the mannequin with a set of rules, the power of the AI mannequin to comply with the developer’s path and behave in alignment with these rules within the current and future, and methods for the developer to correctly monitor the AI’s conduct to make sure it acts in accordance with the guiding rules. The next measures embrace a number of the necessities for an intrinsic AI alignment framework.

Understanding AI drives and conduct: As mentioned earlier, some inside drives that make AI conscious of their surroundings will emerge in clever techniques, corresponding to self-protection and goal-preservation. Pushed by an engrained internalized set of rules set by the developer, the AI makes decisions/selections primarily based on judgment prioritized by rules (and given worth set), which it applies to each actions and perceived penalties. 

Developer and consumer directing: Applied sciences that allow builders and approved customers to successfully direct and steer the AI mannequin with a desired cohesive set of prioritized rules (and finally values). This units a requirement for future applied sciences to allow embedding a set of rules to find out machine conduct, and it additionally highlights a problem for specialists from social science and trade to name out such rules. The AI mannequin’s conduct in creating outputs and making selections ought to totally adjust to the set of directed necessities and counterbalance undesired inside drives after they battle with the assigned rules.

Monitoring AI decisions and actions: Entry is offered to the interior logic and prioritization of the AI’s decisions for each motion by way of related rules (and the specified worth set). This enables for remark of the linkage between AI outputs and its engrained set of rules for level explainability and transparency. This functionality will lend itself to improved explainability of mannequin conduct, as outputs and selections will be traced again to the rules that ruled these decisions.

As a long-term aspirational aim, expertise and capabilities must be developed to permit a full-view truthful reflection of the ingrained set of prioritized rules (and worth set) that the AI mannequin broadly makes use of for making decisions. That is required for transparency and auditability of the whole rules construction.

Creating applied sciences, processes, and settings for reaching intrinsically aligned AI techniques must be a serious focus inside the total area of protected and accountable AI. 

Key takeaways

Because the AI area evolves in direction of compound agentic AI techniques, the sector should quickly improve its concentrate on researching and creating new frameworks for steering, monitoring, and alignment of present and future techniques. It’s a race between a rise in AI capabilities and autonomy to carry out consequential duties, and the builders and customers that try to maintain these capabilities aligned with their rules and values. 

Directing and monitoring the interior workings of machines is critical, technologically attainable, and demanding for the accountable improvement, deployment, and use of AI. 

Within the subsequent weblog, we’ll take a better have a look at the interior drives of AI techniques and a number of the issues for designing and evolving options that may guarantee a materially larger stage of intrinsic AI alignment. 

References 

  1. Omohundro, S. M., Self-Conscious Programs, & Palo Alto, California. (n.d.). The fundamental AI drives. https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf
  2. Hobbhahn, M. (2025, January 14). Scheming reasoning evaluations — Apollo Analysis. Apollo Analysis. https://www.apolloresearch.ai/analysis/scheming-reasoning-evaluations
  3. Meinke, A., Schoen, B., Scheurer, J., Balesni, M., Shah, R., & Hobbhahn, M. (2024, December 6). Frontier Fashions are Able to In-context Scheming. arXiv.org. https://arxiv.org/abs/2412.04984
  4. Alignment faking in giant language fashions. (n.d.). https://www.anthropic.com/analysis/alignment-faking
  5. Palisade Analysis on X: “o1-preview autonomously hacked its surroundings somewhat than lose to Stockfish in our chess problem. No adversarial prompting wanted.” / X. (n.d.). X (Previously Twitter). https://x.com/PalisadeAI/standing/1872666169515389245
  6. AI Dishonest! OpenAI o1-preview Defeats Chess Engine Stockfish By means of Hacking. (n.d.). https://www.aibase.com/information/14380
  7. Russell, Stuart J.; Norvig, Peter (2021). Synthetic intelligence: A contemporary method (4th ed.). Pearson. pp. 5, 1003. ISBN 9780134610993. Retrieved September 12, 2022. https://www.amazon.com/dp/1292401133
  8. Peterson, M. (2018). The worth alignment drawback: a geometrical method. Ethics and Info Expertise, 21(1), 19–28. https://doi.org/10.1007/s10676-018-9486-0
  9. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., . . . Kaplan, J. (2022, December 15). Constitutional AI: Harmlessness from AI Suggestions. arXiv.org. https://arxiv.org/abs/2212.08073
  10. Intel Labs. Accountable AI Analysis. (n.d.). Intel. https://www.intel.com/content material/www/us/en/analysis/responsible-ai-research.html
  11. Mssaperla. (2024, December 2). What are compound AI techniques and AI brokers? – Azure Databricks. Microsoft Be taught. https://study.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/ai-agents
  12. Zaharia, M., Khattab, O., Chen, L., Davis, J.Q., Miller, H., Potts, C., Zou, J., Carbin, M., Frankle, J., Rao, N., Ghodsi, A. (2024, February 18). The Shift from Fashions to Compound AI Programs. The Berkeley Synthetic Intelligence Analysis Weblog. https://bair.berkeley.edu/weblog/2024/02/18/compound-ai-systems/
  13. Carlsmith, J. (2023, November 14). Scheming AIs: Will AIs pretend alignment throughout coaching with a purpose to get energy? arXiv.org. https://arxiv.org/abs/2311.08379
  14. Meinke, A., Schoen, B., Scheurer, J., Balesni, M., Shah, R., & Hobbhahn, M. (2024, December 6). Frontier Fashions are Able to In-context Scheming. arXiv.org. https://arxiv.org/abs/2412.04984
  15. Singer, G. (2022, January 6). Thrill-Ok: a blueprint for the subsequent technology of machine intelligence. Medium. https://towardsdatascience.com/thrill-k-a-blueprint-for-the-next-generation-of-machine-intelligence-7ddacddfa0fe/
  16. Dickson, B. (2024, December 23). Hugging Face exhibits how test-time scaling helps small language fashions punch above their weight. VentureBeat. https://venturebeat.com/ai/hugging-face-shows-how-test-time-scaling-helps-small-language-models-punch-above-their-weight/
  17. Introducing OpenAI o1. (n.d.). OpenAI. https://openai.com/index/introducing-openai-o1-preview/
  18. DeepSeek. (n.d.). https://www.deepseek.com/
  19. Agentforce Testing Middle. (n.d.). Salesforce. https://www.salesforce.com/agentforce/
  20. Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., Khan, A., Michael, J., Mindermann, S., Perez, E., Petrini, L., Uesato, J., Kaplan, J., Shlegeris, B., Bowman, S. R., & Hubinger, E. (2024, December 18). Alignment faking in giant language fashions. arXiv.org. https://arxiv.org/abs/2412.14093
  21. Geiping, J., McLeish, S., Jain, N., Kirchenbauer, J., Singh, S., Bartoldson, B. R., Kailkhura, B., Bhatele, A., & Goldstein, T. (2025, February 7). Scaling up Take a look at-Time Compute with Latent Reasoning: A Recurrent Depth Strategy. arXiv.org. https://arxiv.org/abs/2502.05171
  22. Jones, A. (2024, December 10). Introduction to Mechanistic Interpretability – BlueDot Influence. BlueDot Influence. https://aisafetyfundamentals.com/weblog/introduction-to-mechanistic-interpretability/
  23. Bereska, L., & Gavves, E. (2024, April 22). Mechanistic Interpretability for AI Security — A evaluate. arXiv.org. https://arxiv.org/abs/2404.14082