At Generalist, we're working towards a future where robots can "just do anything," and we're excited to share a step in this direction.
One of our newest internal benchmark tasks is one-shot assembly. Our team constructs a small structure, and the robot copies it. We're evaluating our models on how well they can build Legos – end-to-end, from pixels to 100Hz actions. No task-specific engineering, no custom instructions: it sees what you build and replicates it.
Why this matters:
Visual understanding: the model figures out “what to build” by looking at what is in front of it. Pixels in, Lego copies out (see the sketch after this list).
Next-level dexterity: Lego assembly demands sub-millimeter precision, careful re-grasps, nudges, and forceful interactions, e.g. presses timed to the instant studs align (see one third-party perspective).
A recent perspective slots this task into the highest level of sophistication of general-purpose robots: "Level 4 represents the final evolution where robots can perform force-dependent, delicate tasks with pinpoint accuracy. These tasks require the Dexterity to understand and react with nuance to the physical forces of the environment."
Sequential reasoning: for each brick, the model must choose the right one, orient it, stage it, and attach it correctly.
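To make “pixels in, Lego copies out” concrete, here is a minimal sketch of what a goal-image-conditioned policy interface could look like. This is an illustration only, not our actual stack; policy, camera, and robot are hypothetical stand-ins.

```python
# Hypothetical sketch: one-shot assembly as goal-image conditioning.
# The "instruction" is just a picture of the human-built structure;
# the policy maps (current pixels, goal pixels) -> actions.

def replicate_structure(policy, camera, robot):
    goal_image = camera.read()            # snapshot of the structure to copy
    while not policy.is_done():
        obs = camera.read()               # current view of the workspace
        action = policy(obs, goal_image)  # actions conditioned on the goal
        robot.send_action(action)
```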
We were inspired by your suggestions on our previous Lego throwing demos, and as far as we know, this is the world's first robot to assemble Legos with end-to-end visuomotor control. If you have tasks that you want to see robots do, we'd love to hear them.
Note: there are expected bounds to the generalization of what's shown in the video: we've only tested model capabilities on three-brick structures of two-by-four Lego bricks in four colors. Calculating how many possibilities this presents is not easy. (If this is easy for you, please reach out for a job.) If we agree that uncolored three-brick combinations of two-by-four Lego bricks have 1,560 combinations, then having 4 color options for each of the 3 bricks gives 4 × 4 × 4 × 1,560 = 99,840 possible combinations.
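For concreteness, here is that arithmetic as a short Python snippet, granting the premise above of 1,560 uncolored three-brick combinations:

```python
# Count of structures covered by the demo, taking as given the premise
# that three two-by-four Lego bricks form 1,560 uncolored combinations.
uncolored_combinations = 1_560
colors_per_brick = 4
bricks_per_structure = 3

# Each of the 3 bricks independently takes one of the 4 colors.
total = colors_per_brick ** bricks_per_structure * uncolored_combinations
print(total)  # prints 99840
```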
Research Preview
Jun 17, 2025
Today we're excited to share a glimpse of what we're building at Generalist.
As a first step towards our mission of deploying general-purpose robots, we are pushing the frontiers of what end-to-end AI models can achieve in the real world. We've been training models and evaluating their capabilities as dexterous sensorimotor policies across different embodiments, environments, and physical interactions. We're sharing capability demonstrations on tasks stressing different
aspects of manipulation: fine motor control, spatial and temporal precision, generalization across robots and
settings, and robustness to external disturbances.
In each of these videos, the robot is fully autonomous and controlled in real time by an end-to-end deep
neural network mapping pixels and other sensor data to 100Hz actions. The entire hardware and software stack
jointly enables reactive, smooth, and precise dexterous control from neural networks.
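As a rough illustration of what such a loop involves, here is a sketch with hypothetical names (policy, camera, proprio, robot), not our actual software:

```python
import time

CONTROL_HZ = 100       # actions emitted at 100Hz, as in the demos
DT = 1.0 / CONTROL_HZ  # 10 ms budget per tick

def control_loop(policy, camera, proprio, robot):
    """Hypothetical end-to-end loop: pixels and sensor data in, actions out."""
    next_tick = time.monotonic()
    while True:
        obs = {
            "pixels": camera.read(),    # latest available camera frame(s)
            "proprio": proprio.read(),  # joint angles, torques, etc.
        }
        action = policy(obs)            # one forward pass of the network
        robot.send_action(action)       # low-level command to the arm
        next_tick += DT                 # hold a fixed 100Hz control rate
        time.sleep(max(0.0, next_tick - time.monotonic()))
```

The point of the sketch is only that perception, inference, and actuation must all fit inside each 10 ms tick for the loop to stay reactive.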
We're using these tasks to test model capabilities along various axes of autonomous dexterity. The tasks
require nuanced behaviors like pushing, pulling, twisting, and multi-step re-grasping. Bi-manual
coordination allows actions like stabilizing and breaking apart Lego structures, tensioning flexible
materials, and dynamically creating funnels for small part handling. High-frequency control is important for
real-time behaviors like wiggling, throwing, or adjusting in-flight grasps. Precision is a prerequisite for closing a box with millimeter-level tolerances.
Further, the cross-embodiment model transfers across different arms (e.g., the 7-DoF Flexiv Rizon 4 and the 6-DoF UR5) and generalizes well to entirely new environments. For example, the fasteners task used no training data from UR5 arms, and no data for that task was collected in the environment where it was evaluated.
We're encouraged by the early results and the potential this system demonstrates. More to come.
Task: Pick & sort fasteners
Evaluates the ability of an end-to-end model to quickly pick and sort small, thin objects from clutter and place them, correctly oriented, into corresponding compartments. Hardware torque is the limiting factor for cycle time.
Task: Fold a box, pack a bike chain lock & close
Evaluates capabilities in handling articulated and deformable objects with precision over long-horizon sequences; the model adapts to disturbances and modulates force precisely. After assembling the box, the long bike chain lock needs to be coiled into the box in order to fit. Precision is particularly tested in closing the box, which requires simultaneously aligning flaps on both sides of the box, each with millimeter-level tolerance. Note also that the arm is strong enough to crush the box at any moment.
Task: Get the screws back into the glass jar
Evaluates tool use, precision, and bi-manual coordination across a number of maneuvers. The robot is
tasked with efficiently getting all the shiny M4 screws back into a clear container. As needed, it can
scrape them off a magnetic bit holder, bend the paper plate to form a makeshift funnel to pour them,
or pick them up one by one. Scraping requires precise interhand coordination (e.g. when does a scrape
become a grasp?), as does forming and transporting the funnel without spilling.
Task: Break apart, sort, & throw Legos
Evaluates capabilities in precise regrasping, forceful interhand coordination, generalization, and
high-velocity maneuvers. The robot is tasked with deconstructing assembled Legos and sorting the bricks into their color-corresponding bins. This can require re-grasping the bricks to get a better grip before wiggling and twisting them apart. The robot generalizes over a distribution of brick
formations and works with any ordering and positioning of the bins, via visual conditioning. This task
can't be done slowly, due to the physics of throwing bricks.