arXiv digest

2026-06-03 · 4 papers · curated by paper-scout

First digest from the paper-scout pipeline. Heavy 3D day on cs.CV — trajectory-conditioned dynamic shape generation (T2Mo) is the standout: explicit spatial guidance over text-only conditioning is a pattern I expect to see everywhere in 3D generation this year.

★ Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text

2606.05162

Jaeyeong Kim, Ines Kim, Jahyeok Koo, Seungryong Kim · cs.CV

We introduce T2Mo, a feed-forward framework for controllable dynamic 3D shape generation conditioned on 3D trajectories and text. Due to the inherent ambiguity of language, generating precisely intended motions using text alone remains challenging. To address this, we adopt 3D trajectories as controllable spatial guidance, specifying the exact paths along which selected points should move.

take · Text-to-3D-motion has always been mushy because language underspecifies geometry; anchoring generation to explicit 3D point trajectories is the right fix. Feed-forward inference (no per-scene optimization) makes this actually usable in a pipeline.

abs · pdf · html · ar5iv

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

2606.05142

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl · cs.CV cs.AI

Recent developments in multi-view image editing with generative models have brought us a step closer toward general 3D content generation and customization. Most existing works focus on rigid or appearance-only edits by utilizing the geometry of the unedited scene. This naturally limits these methods to edits that preserve the underlying scene structure.

take · Multi-view consistency for *nonrigid* edits is the hard version of the problem — most methods cheat by keeping the original geometry. Worth a read if you care about editable digital twins.

abs · pdf · html · ar5iv

Reinforcement Learning from Rich Feedback with Distributional DAgger

2606.05152

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad · cs.LG cs.AI cs.CL

Reasoning models have advanced rapidly, but the dominant reinforcement learning from verifiable rewards (RLVR) recipe remains surprisingly narrow: sample many responses and reward each with a single bit indicating whether the final answer is correct. Yet many settings provide rich feedback, including execution traces, tool outputs, expert corrections, and model self-evaluations.

take · RLVR throws away everything except one bit per rollout — execution traces and tool outputs are sitting right there. Using them is obvious in hindsight, which is usually the mark of a good idea.

abs · pdf · html · ar5iv

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

2606.05145

Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar et al. · cs.LG cs.AI cs.CL

When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend more compute on additional attempts, and the failed traces play no further role. We argue this discards a crucial signal; some failures come from unlucky sampling, where more rollouts help, while others are structural and resist resampling regardless of budget.

take · Separating "unlucky sampling" failures from structural ones before you burn test-time compute is a genuinely useful triage: resampling a structurally broken trace is just paying to fail again.

abs · pdf · html · ar5iv
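
note · A back-of-envelope sketch of why that triage matters. This is my own arithmetic, not the paper's method: it assumes each retry succeeds independently with a fixed probability p, and both the function name and the example probabilities are made up for illustration. Under that assumption the chance of at least one success in k attempts is 1 - (1 - p)^k, which climbs quickly when p is merely small and stays at zero when p is zero.

```python
def success_within_budget(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds.

    Hypothetical helper for illustration; assumes a fixed per-attempt
    success probability p, which real reasoning rollouts may not satisfy.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Illustrative probabilities only, not measurements from any paper.
    for label, p in [("unlucky sampling", 0.2), ("structural", 0.0)]:
        rates = [round(success_within_budget(p, k), 3) for k in (1, 8, 64)]
        print(f"{label:>16}: {rates}")
```

With p = 0 the budget never matters, so the practical question is whether you can estimate which regime a failure is in before spending the extra rollouts.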