🙋 Zhan Shaoxiong (詹少雄)
📧 zhansx24@mails.tsinghua.edu.cn / jasaxion@gmail.com
📍 Shenzhen, China · Tsinghua SIGS
I am an M.S. student at the Knowledge Engineering Lab at Tsinghua Shenzhen International Graduate School, supervised by Prof. Haitao Zheng. My research interests lie in natural language processing, information retrieval, and large language and vision-language models. I received my B.Eng. in Computer Science from Huazhong Agricultural University.
🔥 News
📝 Selected Papers
Shaoxiong Zhan, Yanlin Lai, Ziyu Lu, Dahua Lin, Ziqing Yang, Fei Tan
An RL-based framework that forges olympiad-level math problems from concept–explanation pairs, yielding 9.8%–18.1% relative gains on AIME/Olympiad benchmarks.
Shaoxiong Zhan, Hai Lin, Hongming Tan, Xiaodong Cai, Hai-Tao Zheng
Lexical-semantic bridging to enhance fine-grained matching in dense retrieval without modifying backbone encoders.
Shaoxiong Zhan, Yanlin Lai, Zheng Liu, Zijian Lin, Hai Lin, Xiaodong Cai, Shen Li, Hai-Tao Zheng
Identified VLM bottleneck in spatial tasks as lacking view-consistent intermediate representations; proposed "simulate-then-reason" mechanism with orthographic views.
Hongming Tan*, Shaoxiong Zhan*, Hai Lin, Hai-Tao Zheng, Wai Kin Chan
First unified text augmentation framework for dense retrieval via LLM-generated QA pairs and event structures.
Hongming Tan*, Shaoxiong Zhan*, Fengwei Jia, Hai-Tao Zheng, Wai Kin Chan
Hierarchical paper-section-QA decomposition framework for measuring research innovation with confidence-weighted aggregation.
Hai Lin*, Shaoxiong Zhan*, Junyou Su, Hai-Tao Zheng, Hui Wang
Zero-shot retrieval benchmark with 5 tasks and cross-lingual evaluation; introduced SSCI and RCCI metrics.
Shen Li, Li Huang, Shaoxiong Zhan, Weifeng Sun, Tao Yin, Zhongxin Liu, Meng Yan
Difficulty-aware dynamic routing to avoid overthinking in code generation; 46% token reduction with SOTA performance.
🎖 Honors and Awards
💻 Internships
- Building pretraining data pipelines for the code domain: collected a high-quality code dataset for the pretraining and mid-training stages.
- Restructured the GitHub commit data processing pipeline; studying how data distribution impacts code generation capability.
- Proposed a multimodal issue localization benchmark, extending software engineering fault localization to scenarios involving UI screenshots and complex error logs, systematically studying VLM capabilities in code understanding.
- Focused on math reasoning enhancement for LLM foundation models via data-centric approaches.
- Designed an RL-based hard-problem synthesis strategy, producing the MathSmith dataset, which improved performance on public math benchmarks. Published as a first-author paper at AAAI'26.
- Contributed to VLM foundation model iteration for the "photo-solve" product line. Identified VLM spatial reasoning gaps, leading to the first-author paper 3ViewSense.
- Delivered an enterprise-grade RAG backend system built with LangChain, featuring embedding fine-tuning, RAG-fusion, and reranking.
- Implemented OCR and multimodal document processing; contributed to system design docs and client demos that secured partnerships and funding.
🛠 Skills
Tools: Claude Code, Cursor, Codex · Experienced with NAS, soft routers, and self-hosted infra
🎨 Miscellaneous
👋 I'm a hands-on tech enthusiast who enjoys tinkering with gadgets and experimenting with new ideas💡—even if things break along the way. Outside of work, you'll find me playing badminton🏸, swimming🏊‍♀️, or dancing💃 (hip-hop/K-pop). Always happy to connect—feel free to add me on WeChat: Jasaxion_Taurus0405 🤝
