The latest edition of Import AI, issue 461, raises significant concerns about the trajectory of AI alignment research, stating bluntly that "alignment is not on track." The newsletter points to a growing gap between the rapid advance of model capabilities and the slower progress in ensuring these systems behave safely and as intended.
Alongside the alignment warning, the issue introduces FrontierCode, a new benchmark designed to evaluate code generation models on complex, real-world programming tasks. FrontierCode aims to go beyond existing coding benchmarks by testing models on multi-file, dependency-heavy scenarios that more closely resemble actual software engineering workflows.
The issue also highlights the emergence of "synthetic research interns": AI systems trained to autonomously conduct literature reviews, generate hypotheses, and even run simulated experiments. These agents are being used in academic and industrial labs to accelerate early-stage research, though their reliability and creativity remain under scrutiny.
The newsletter also observes that many of these AI agents are already operating in the wild, raising questions about oversight and accountability. "Where are your agents right now?" the author asks, underscoring the difficulty of tracking deployed autonomous systems.
Despite the promise of tools like FrontierCode and synthetic interns, the alignment gap remains a central worry. Critics argue that without concrete safeguards, the deployment of such agents could outpace our ability to control them, echoing broader debates about AI safety and regulation.