Research

Overview. My research focuses on data management, graph mining, and large language models for big data. My current research topics, aligned with my publications and ongoing projects, include:

🧭 Graph-based LLM Systems. Cost-efficient graph-based RAG systems, graph memory, and agentic retrieval for knowledge-intensive tasks.

🔨 Large (Language) Models for Data. LLM-powered and pretrained methods for data systems, including dataset search, latency prediction, cardinality estimation, and automated DBMS testing.

⚡️ Graph Mining & Algorithms. Scalable algorithms for densest subgraph discovery, community search, clique counting/listing, temporal graph analytics, and graph edit distance estimation.

Representative Research Topics

Topic 1

Graph-based LLM Systems

I study graph-based retrieval, memory, and reasoning for large language model systems, with a focus on efficient RAG frameworks, indexing, and tuning.

  • Graph-based RAG prototype systems: VLDB 2025.
  • Automatic RAG optimization systems: SIGMOD 2026.
  • Graph-based RAG methods: ICDE 2026, AAAI 2026.

Topic 2

Large (Language) Models for Data

I build LLM-powered and pretrained methods for data systems, covering dataset search, latency prediction, cardinality estimation, and DBMS testing.

  • Pretrained models for database optimization: VLDB 2025, VLDB Journal 2026.
  • LLM-based test-case generation for DBMS: ICSE 2026.

Topic 3

Graph Mining & Algorithms

I design scalable graph mining algorithms for densest subgraph discovery, community search, clique counting/listing, and graph similarity tasks.

  • Densest subgraph discovery: SIGMOD 2024, 2 × VLDB 2025, 2 × SIGMOD 2026.
  • Community search: VLDB 2023, VLDB 2026, SIGMOD 2026.
  • Clique counting/listing: VLDB 2024, VLDB 2025, VLDB 2026.

Selected Open-Source Projects

🧭 Graph-based LLM Systems