Amazon will present its framework for engineering trustworthy AI agents at VB Transform 2026
Summary
<p>AI agents are increasingly proficient at executing business tasks autonomously, but IT leaders remain cautious about granting them permission to access enterprise systems.</p><p>Part of the challenge lies in how <a href="https://venturebeat.com/technology/karpathys-march-of-nines-shows-why-90-ai-reliability-isnt-even-close-to">AI reliability</a> is measured. Industry standards often rely on eval scores, which provide a static snapshot of performance rather than a measure of overall reliability. These metrics can fail to capture predictability across prompts, environments, and input types, said Bryan Silverthorn, director of the AGI Autonomy research lab at Amazon.</p><p>Amazon’s AGI Autonomy research lab is moving beyond raw performance benchmarks, focusing instead on a structured framework centered on consistency, robustness, predictability, and safety, Silverthorn told VentureBeat in an interview ahead of his session at <a href="https://venturebeat.com/vbtransform2026">VB Transform 2026</a>.</p><p>Rather than assuming that models can be harnessed into safety, Amazon’s approach emphasizes decoupled systems, such as sandboxed environments where agents propose changes that humans review before implementation.</p><p>This strategy aims to bridge the trust gap by prioritizing verifiable interactions, even in highly sensitive domains like finance, where the potential damage an agent can cause is significant.</p><p>In VentureBeat’s Q2 Pulse Research survey of over 100 senior technology leaders and buyers, just 4% said they are comfortable relying on model guardrails alone.
When asked what worries them most about model guardrails, 40% said unauthorized access to tools or data and 27% cited prompt manipulation or injection.</p><p>In his VB Transform session, titled <b>Closing the capability-reliability gap: Inside Amazon’s framework for engineering trustworthy agents</b>, Silverthorn will share details of Amazon’s approach to trustworthy agentic AI and explain how companies can move from single-agent wrappers to multi-tool architectures that can self-correct mid-execution.</p><p>Another session focused on agentic ops and evals at VentureBeat’s flagship conference, happening July 14 and 15 in Menlo Park, is <b>Intelligence at scale: How Waymo builds safe, efficient AI for the physical world</b> with speaker Manasi Joshi, director of systems intelligence and machine learning at Waymo.</p><p><i>Interested in attending VB Transform 2026? A select number of complimentary passes are available to senior technology leaders. </i><a href="mailto:events@venturebeat.com"><i>Contact us </i></a><i>to get yours. You can also purchase tickets </i><a href="https://web.cvent.com/event/27401f5a-f49e-46fc-90a3-eee31c2a4818/register"><i>here</i></a><i>.</i></p>