Research
Overview. My research focuses on data management, graph mining, and large language models for big data. My current research topics, aligned with my publications and ongoing projects, include:
🧭 Graph-based LLM Systems. Cost-efficient graph-based RAG systems, graph memory, and agentic retrieval for knowledge-intensive tasks.
🔨 Large (Language) Models for Data. LLM-powered and pretrained methods for data systems, including dataset search, latency prediction, cardinality estimation, and automated DBMS testing.
⚡️ Graph Mining & Algorithms. Scalable algorithms for densest subgraph discovery, community search, clique counting/listing, temporal graph analytics, and graph edit distance estimation.
Representative Research Topics
Graph-based LLM Systems
I study graph-based retrieval, memory, and reasoning for large language model systems, with a focus on efficient RAG frameworks, indexing, and tuning.
- Graph-based RAG prototype systems: VLDB 2025.
- Automatic RAG optimization systems: SIGMOD 2026.
- Graph-based RAG methods: ICDE 2026, AAAI 2026.
Large (Language) Models for Data
I build LLM-powered and pretrained methods for data systems, covering dataset search, latency prediction, cardinality estimation, and DBMS testing.
- Pretrained models for database optimization: VLDB 2025, VLDB Journal 2026.
- LLM-based test-case generation for DBMS: ICSE 2026.
Graph Mining & Algorithms
I design scalable graph mining algorithms for densest subgraph discovery, community search, clique counting/listing, and graph similarity tasks.
- Densest subgraph discovery: SIGMOD 2024, VLDB 2025 (2 papers), SIGMOD 2026 (2 papers).
- Community search: VLDB 2023, VLDB 2026, SIGMOD 2026.
- Clique counting/listing: VLDB 2024, VLDB 2025, VLDB 2026.
Selected Open-Source Projects
🧭 Graph-based LLM Systems
- DIGIMON / GraphRAG: a graph-based RAG system for structured retrieval and reasoning.
- EraRAG: a unified benchmark and analysis framework for graph-based RAG.