Archive 02.10.2025

Page 8 of 8

Unstructured document prep for agentic workflows

If you’ve ever burned hours wrangling PDFs, screenshots, or Word files into something an agent can use, you know how brittle OCR and one-off scripts can be. They break on layout changes, lose tables, and slow launches.

This isn’t just an occasional nuisance. Analysts estimate that ~80% of enterprise data is unstructured. And as retrieval-augmented generation (RAG) pipelines mature, they’re becoming “structure-aware,” because flat OCR output collapses under the weight of real-world documents.

Unstructured data is the bottleneck. Most agent workflows stall because documents are messy and inconsistent, and parsing quickly turns into a side project whose scope keeps growing.

But there’s a better option: Aryn DocParse, now integrated into DataRobot, lets agents turn messy documents into structured fields reliably and at scale, without custom parsing code.

What used to take days of scripting and troubleshooting can now take minutes: connect a source — even scanned PDFs — and feed structured outputs straight into RAG or tools. Preserving structure (headings, sections, tables, figures) reduces silent errors that cause rework, and answers improve because agents retain the hierarchy and table context needed for accurate retrieval and grounded reasoning.

Why this integration matters

For developers and practitioners, this isn’t just about convenience. It’s about whether your agent workflows make it to production without breaking under the chaos of real-world document formats.

The impact shows up in three key ways:

Easy document prep
What used to take days of scripting and cleanup now happens in a single step. Teams can add a new source — even scanned PDFs — and feed it into RAG pipelines the same day, with fewer scripts to maintain and faster time to production.

Structured, context-rich outputs
DocParse preserves hierarchy and semantics, so agents can tell the difference between an executive summary and a body paragraph, or a table cell and surrounding text. The result: simpler prompts, clearer citations, and more accurate answers.

More reliable pipelines at scale
A standardized output schema reduces breakage when document layouts change. Built-in OCR and table extraction handle scans without hand-tuned regex, lowering maintenance overhead and cutting down on incident noise.
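
As a rough illustration of why a standardized schema cuts down on breakage, the sketch below checks a batch of parsed elements before they enter a pipeline. The field names (`type`, `text`, `page`) and the set of element types are assumptions made for this example, not DocParse's documented output format.

```python
# Hypothetical schema for illustration only; not DocParse's actual output format.
REQUIRED_KEYS = {"type", "text", "page"}
KNOWN_TYPES = {"heading", "paragraph", "table", "figure"}


def validate_elements(elements):
    """Return a list of readable problems; an empty list means the batch is usable."""
    problems = []
    for i, el in enumerate(elements):
        missing = REQUIRED_KEYS - set(el)
        if missing:
            problems.append(f"element {i}: missing {sorted(missing)}")
        elif el["type"] not in KNOWN_TYPES:
            problems.append(f"element {i}: unknown type {el['type']!r}")
    return problems
```

Running a check like this at ingestion time means a layout change surfaces as one explicit message rather than as a silent retrieval error further downstream.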

What you can do with it

Under the hood, the integration brings together three capabilities practitioners have been asking for:

Broad format coverage
From PDFs and Word docs to PowerPoint slides and common image formats, DocParse handles the formats that usually trip up pipelines — so you don’t need separate parsers for every file type.

Layout preservation for precise retrieval
Document hierarchy and tables are retained, so answers reference the right sections and cells instead of collapsing into flat text. Retrieval stays grounded, and citations actually point to the right spot.
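
One way to picture layout-preserving retrieval is a chunker that keeps each passage attached to the heading it sits under and keeps tables whole. This is a minimal sketch that assumes the parser returns a flat list of elements with hypothetical `type` and `text` fields. The `Chunk` class and `chunk_by_section` function are illustrative names, not part of DocParse or DataRobot.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    section: str  # heading the chunk sits under, usable as a citation anchor
    kind: str     # "text" or "table"
    text: str


def chunk_by_section(elements):
    """Group parsed elements into chunks that remember their heading."""
    chunks = []
    section = "(no heading)"
    buffer = []

    def flush():
        # Emit buffered paragraphs as one text chunk under the current heading.
        if buffer:
            chunks.append(Chunk(section, "text", " ".join(buffer)))
            buffer.clear()

    for el in elements:
        if el["type"] == "heading":
            flush()
            section = el["text"]
        elif el["type"] == "table":
            flush()
            # Tables stay whole so cells are never mixed into surrounding prose.
            chunks.append(Chunk(section, "table", el["text"]))
        else:
            buffer.append(el["text"])
    flush()
    return chunks
```

Because every chunk carries its section name, a retrieved passage can be cited as "Financials, table" rather than as an offset into flat text.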

Seamless downstream use
Outputs flow directly into DataRobot workflows for retrieval, prompting, or function tools. No glue code, no brittle handoffs — just structured inputs ready for agents.

One place to build, operate, and govern AI agents

This integration isn’t just about cleaner document parsing. It closes a critical gap in the agent workflow. Most point tools or DIY scripts stall at the handoffs, breaking when layouts shift or pipelines expand. 

This integration is part of a bigger shift: moving from toy demos to agents that can reason over real enterprise knowledge, with governance and reliability built in so they can stand up in production.

That means you can build, operate, and govern agentic applications in one place, without juggling separate parsers, glue code, or fragile pipelines. It’s a foundational step in enabling agents that can reason over real enterprise knowledge with confidence.

From bottleneck to building block

Unstructured data doesn’t have to be the step that stalls your agent workflows. With Aryn now integrated into DataRobot, agents can treat PDFs, Word files, slides, and scans like clean, structured inputs — no brittle parsing required.

Connect a source, parse to structured JSON, and feed it into RAG or tools the same day. It’s a simple change that removes one of the biggest blockers to production-ready agents.

The best way to understand the difference is to try it on your own messy PDFs, slides, or scans and see how much smoother your workflows run when structure is preserved end to end.

Start a free trial and experience how quickly you can turn unstructured documents into structured, agent-ready inputs. Questions? Reach out to our team.

The post Unstructured document prep for agentic workflows appeared first on DataRobot.

Shape-changing robots: New AI-driven design tool optimizes performance and functionality

Like octopuses squeezing through a tiny sea cave, metatruss robots can adapt to demanding environments by changing their shape. These mighty morphing robots are made of trusses composed of hundreds of beams and joints that rotate and twist, enabling astonishing volumetric transformations.

Princeton’s AI reveals what fusion sensors can’t see

A powerful new AI tool called Diag2Diag is revolutionizing fusion research by filling in missing plasma data with synthetic yet highly detailed information. Developed by Princeton scientists and international collaborators, this system uses sensor input to predict readings other diagnostics can’t capture, especially in the crucial plasma edge region where stability determines performance. By reducing reliance on bulky hardware, it promises to make future fusion reactors more compact, affordable, and reliable.

Rethinking how robots move: Light and AI drive precise motion in soft robotic arm

Photo credit: Jeff Fitlow/Rice University

By Silvia Cernea Clark

Researchers at Rice University have developed a soft robotic arm capable of performing complex tasks such as navigating around an obstacle or hitting a ball, guided and powered remotely by laser beams without any onboard electronics or wiring. The research could inform new ways to control implantable surgical devices or industrial machines that need to handle delicate objects.

In a proof-of-concept study that integrates smart materials, machine learning and an optical control system, a team of Rice researchers led by materials scientist Hanyu Zhu used a light-patterning device to precisely induce motion in a robotic arm made from azobenzene liquid crystal elastomer, a type of polymer that responds to light.

According to the study published in Advanced Intelligent Systems, the new robotic system incorporates a neural network trained to predict the exact light pattern needed to create specific arm movements. This makes it easier for the robot to execute complex tasks without needing similarly complex input from an operator.

“This was the first demonstration of real-time, reconfigurable, automated control over a light-responsive material for a soft robotic arm,” said Elizabeth Blackert, a Rice doctoral alumna who is the first author on the study.

Elizabeth Blackert and Hanyu Zhu (Photo credit: Jeff Fitlow/Rice University).

Conventional robots typically involve rigid structures with mobile elements like hinges, wheels or grippers to enable a predefined, relatively constrained range of motion. Soft robots have opened up new areas of application in contexts like medicine, where safely interacting with delicate objects is required. So-called continuum robots are a type of soft robot that forgoes mobility constraints, enabling adaptive motion with vastly more degrees of freedom.

“A major challenge in using soft materials for robots is they are either tethered or have very simple, predetermined functionality,” said Zhu, assistant professor of materials science and nanoengineering. “Building remotely and arbitrarily programmable soft robots requires a unique blend of expertise involving materials development, optical system design and machine learning capabilities. Our research team was uniquely suited to take on this interdisciplinary work.”

The team created a new variation of an elastomer that shrinks under blue laser light, then relaxes and regrows in the dark. This fast relaxation time is what makes real-time control possible. Unlike other light-sensitive materials that require harmful ultraviolet light or take minutes to reset, this one works with safer, longer wavelengths and responds within seconds.

“When we shine a laser on one side of the material, the shrinking causes the material to bend in that direction,” Blackert said. “Our material bends toward laser light like a flower stem does toward sunlight.”

To control the material, the researchers used a spatial light modulator to split a single laser beam into multiple beamlets, each directed to a different part of the robotic arm. The beamlets can be turned on or off and adjusted in intensity, allowing the arm to bend or contract at any given point, much like the tentacles of an octopus. This technique can in principle create a robot with virtually infinite degrees of freedom, far beyond the capabilities of traditional robots with fixed joints.

“What is new here is using the light pattern to achieve complex changes in shape,” said Rafael Verduzco, professor and associate chair of chemical and biomolecular engineering and professor of materials science and nanoengineering. “In prior work, the material itself was patterned or programmed to change shape in one way, but here the material can change in multiple ways, depending on the laser beamlet pattern.”

To train such a multiparameter arm, the team ran a small number of combinations of light settings and recorded how the robot arm deformed in each case, using the data to train a convolutional neural network, a type of artificial intelligence used in image recognition. The model was then able to output the exact light pattern needed to create a desired shape such as flexing or a reach-around motion.
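
The record-then-invert idea can be sketched in a few lines. This is not the researchers' method: a nearest-neighbor lookup stands in for their convolutional neural network, and `toy_arm_tip` is an invented forward model in which each beamlet bends its arm segment in proportion to its intensity. All function names, the gain value and the intensity levels are assumptions made for this example.

```python
import itertools
import math


def toy_arm_tip(intensities, gain=0.5):
    """Toy forward model (assumption): each beamlet bends its unit-length
    segment by gain * intensity radians; returns the arm tip position."""
    x = y = angle = 0.0
    for level in intensities:
        angle += gain * level
        x += math.cos(angle)
        y += math.sin(angle)
    return (x, y)


def record_trials(n_beamlets=3, levels=(0.0, 0.5, 1.0)):
    """Try every combination of light settings and record the resulting tip."""
    patterns = itertools.product(levels, repeat=n_beamlets)
    return [(pattern, toy_arm_tip(pattern)) for pattern in patterns]


def pattern_for(target_tip, trials):
    """Inverse lookup: the recorded light pattern whose tip landed closest to the target."""
    best_pattern, _ = min(trials, key=lambda trial: math.dist(trial[1], target_tip))
    return best_pattern
```

A lookup like this can only return patterns it has already tried, whereas a trained network can interpolate between recorded trials, which matters when only a small number of combinations can be run on real hardware.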

The current prototype is flat and moves in 2D, but future versions could bend in three dimensions with additional sensors and cameras.

Photo credit: Jeff Fitlow/Rice University

“This is a step towards having safer, more capable robotics for various applications ranging from implantable biomedical devices to industrial robots that handle soft goods,” Blackert said.
