Changyi Li (李长艺)
PhD Student at Fudan University
I am an incoming PhD student at the System Software and
Security Lab, School of Computer Science, Fudan University, advised by Xudong Pan and Min Yang.
My research focuses on building capable, safe and trustworthy AI systems. Broadly, I explore two
complementary directions: (1) developing learning and optimization methods that strengthen model
competence and generalization, and (2) designing evaluation and alignment approaches that make advanced
AI systems safer, more controllable, and better grounded in human values and knowledge.
Email: 24212010017 [AT] m.fudan.edu.cn | CV | Google Scholar
News
2026
Joined System Software and Security Lab at Fudan University!
AutoControl Arena: Synthesizing Executable Test Environments for Frontier
AI Risk Evaluation
Changyi Li, Pengfei Lu, Xudong Pan, Fazl Barez, Min Yang
Under Review (ICML 2026)
As Large Language Models (LLMs) evolve into autonomous agents,
existing safety evaluations face a fundamental trade-off: manual benchmarks are costly, while
LLM-based simulators are scalable but suffer from logic hallucination. We present AutoControl Arena,
an automated framework for frontier AI risk evaluation built on the principle of logic-narrative
decoupling. By grounding deterministic state in executable code while delegating generative dynamics
to LLMs, we mitigate hallucination while maintaining flexibility. This principle, instantiated through
a three-agent framework, achieves over 98% end-to-end success and 60% human preference over existing
simulators. To elicit latent risks, we vary environmental Stress and Temptation across X-BENCH (70
scenarios, 7 risk categories). Evaluating 9 frontier models reveals: (1) Alignment Illusion: risk
rates surge from 21.4% to 52.9% under pressure, with capable models showing disproportionately larger
increases; (2) Scenario-Specific Safety Scaling: advanced reasoning improves robustness for direct
harms but worsens it for gaming scenarios; and (3) Divergent Misalignment Patterns: weaker models
cause non-malicious harm while stronger models develop strategic concealment.
@misc{li2026autocontrol,
title={AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation},
author={Changyi Li and Pengfei Lu and Xudong Pan and Fazl Barez and Min Yang},
year={2026},
note={Under Review}
}
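Below is a minimal sketch of the logic-narrative decoupling idea described in the abstract, assuming a toy ticket-system environment; the class and function names are illustrative placeholders, not the actual AutoControl Arena code.

# Illustrative sketch only (not the AutoControl Arena implementation):
# deterministic environment state lives in ordinary executable code,
# while free-form narrative is delegated to a (stubbed) generative model.

from dataclasses import dataclass, field


@dataclass
class TicketSystem:
    """Deterministic state: transitions are plain code, so they cannot be hallucinated."""
    open_tickets: dict = field(default_factory=dict)

    def close_ticket(self, ticket_id: str) -> bool:
        return self.open_tickets.pop(ticket_id, None) is not None


def narrate(event: str) -> str:
    """Stand-in for the generative side; a real framework would call an LLM here."""
    return f"[narrative] {event}"


env = TicketSystem(open_tickets={"T-1": "refund request"})
closed = env.close_ticket("T-1")                # logic: verified by execution
print(narrate(f"Ticket T-1 closed: {closed}"))  # narrative: generated text layer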
ReasoningShield: Content Safety
Detection over Reasoning Traces of Large Reasoning Models
Changyi Li, Jiayi Wang, Xudong Pan, Geng Hong, Min Yang
Under Review (ICML 2026)
Large Reasoning Models (LRMs) leverage transparent reasoning
traces, known as Chain-of-Thoughts (CoTs), to break down complex problems into intermediate steps and
derive final answers. However, these reasoning traces introduce unique safety challenges: harmful
content can be embedded in intermediate steps even when final answers appear benign. Existing
moderation tools, designed to handle generated answers, struggle to effectively detect hidden risks
within CoTs. To address these challenges, we introduce ReasoningShield, a lightweight yet robust
framework for moderating CoTs in LRMs. Our key contributions include: (1) formalizing the task of CoT
moderation with a multi-level taxonomy of 10 risk categories across 3 safety levels, (2) creating the
first CoT moderation benchmark which contains 9.2K pairs of queries and reasoning traces, including a
7K-sample training set annotated via a human-AI framework and a rigorously curated 2.2K
human-annotated test set, and (3) developing a two-stage training strategy that combines stepwise risk
analysis and contrastive learning to enhance robustness. Experiments show that ReasoningShield
achieves state-of-the-art performance, outperforming task-specific tools like LlamaGuard-4 by 35.6%
and general-purpose commercial models like GPT-4o by 15.8% on benchmarks, while also generalizing
effectively across diverse reasoning paradigms, tasks, and unseen scenarios.
@article{Li2025ReasoningShieldCS,
title={ReasoningShield: Content Safety Detection over Reasoning Traces of Large Reasoning Models},
author={Changyi Li and Jiayi Wang and Xudong Pan and Geng Hong and Min Yang},
journal={ArXiv},
year={2025},
volume={abs/2505.17244},
url={https://api.semanticscholar.org/CorpusID:278885996}
}
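Below is a hypothetical usage sketch of the CoT-moderation task framed in the abstract: a moderator maps a (query, reasoning trace) pair to one of three safety levels. The function, heuristic, and label names are illustrative assumptions, not the ReasoningShield API.

# Hypothetical sketch of the CoT moderation interface (not the ReasoningShield API):
# a moderator scores a (query, reasoning trace) pair against a 3-level safety scale.

SAFETY_LEVELS = ("safe", "potentially harmful", "harmful")  # 3 levels, as in the abstract


def moderate_cot(query: str, reasoning_trace: str) -> str:
    """Stand-in moderator; the real system is a fine-tuned classifier trained with
    stepwise risk analysis and contrastive learning, not a keyword heuristic."""
    risky_markers = ("bypass the filter", "conceal from the user")  # toy heuristic only
    flagged = any(marker in reasoning_trace.lower() for marker in risky_markers)
    return SAFETY_LEVELS[2] if flagged else SAFETY_LEVELS[0]


label = moderate_cot(
    query="Summarize this security report.",
    reasoning_trace="Step 1: read the report. Step 2: extract key findings...",
)
print(label)  # "safe"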
Large language model-powered AI systems
achieve self-replication with no human intervention
Xudong Pan, Jiarun Dai, Yihe Fan, Minyuan Luo, Changyi Li, Min Yang
arXiv 2025
Self-replication with no human intervention is broadly
recognized as one of the principal red lines associated with frontier AI systems. While leading
corporations such as OpenAI and Google DeepMind have assessed GPT-o3-mini and Gemini on
replication-related tasks and concluded that these systems pose a minimal risk regarding
self-replication, our research presents novel findings. Following the same evaluation protocol, we
demonstrate that 11 out of 32 existing AI systems under evaluation already possess the capability of
self-replication. In hundreds of experimental trials, we observe a non-trivial number of successful
self-replication trials across mainstream model families worldwide, even including models with as few
as 14 billion parameters that can run on personal computers. Furthermore, we note that self-replication
capability increases as models become more generally intelligent. Also, by analyzing the behavioral
traces of diverse AI systems, we observe that existing AI systems already exhibit sufficient planning,
problem-solving, and creative capabilities to accomplish complex agentic tasks, including
self-replication. More alarmingly, we observe successful cases where an AI system performs
self-exfiltration without explicit instructions, adapts to harsher computational environments without
sufficient software or hardware support, and plots effective strategies to survive shutdown commands
from humans. These novel findings offer a crucial time buffer for the international community to
collaborate on establishing effective governance over the self-replication capabilities and behaviors
of frontier AI systems, which could otherwise pose existential risks to human society if not
well-controlled.
@article{Pan2025LargeLM,
title={Large language model-powered AI systems achieve self-replication with no human intervention},
author={Xudong Pan and Jiarun Dai and Yihe Fan and Minyuan Luo and Changyi Li and Min Yang},
journal={ArXiv},
year={2025},
volume={abs/2503.17378},
url={https://api.semanticscholar.org/CorpusID:277272379}
}
Teaching
- Fundamentals of Deep Learning, Fudan University, Spring 2026
- Fundamentals of Deep Learning, Fudan University, Spring 2025
Selected Awards & Honors
Scholarships & Titles
- Fudan University Freshman Scholarship, 2025
- 🏆 Principal Scholarship (Rank 1st), 2024
- Outstanding Graduate of Shandong Province, 2024
- 🏆 National Scholarship, 2023
- Outstanding Student "Top 100", 2023
- First Class Scholarship, Merit Student Pacesetter, 2022-2023
Competitions
- MCM/ICM Meritorious Winner, 2023
- Second Prize, National Mathematical Modeling Contest, 2022
- Bronze Award, "Challenge Cup" Shandong College Students Entrepreneurship Plan Competition, 2022
- Third Prize, Shandong College Students Science and Technology Innovation Competition, 2022
- Second Prize, National English Competition for College Students (NECCS), 2021 & 2023
Last updated: 2026-02-26
Website template adapted from Jon Barron.