Jongha Kim

M.S. & Ph.D. integrated student in the MLV Lab, advised by Prof. Hyunwoo J. Kim.

Department of Computer Science and Engineering at Korea University, Seoul, Republic of Korea.

My research focuses on multimodal foundation models that understand and reason over real-world human-generated data (e.g., videos, documents, and the web). I am interested in advancing training, inference, and systems for multimodal reasoning, knowledge grounding, and scalable real-world deployment. For more information, please see my CV.

If you are interested in collaboration, opportunities, or just a quick chat, please feel free to reach out to me via email.

selected publications [full list]

(*) denotes equal contribution

WACV

Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering

Jongha Kim, Byungoh Ko, Jeehye Na, Jinsung Yoon, and Hyunwoo J. Kim

In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)

PDF
AAAI

TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing

Jongha Kim, Minseong Bae, Sanghyeok Lee, Jinsung Yoon, and Hyunwoo J. Kim

In AAAI Conference on Artificial Intelligence (AAAI 2026)

PDF
InfoScience

Improved Query Specialization for Transformer-based Visual Relationship Detection

Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, and Hyunwoo J. Kim

In Information Sciences (2026)

PDF
AAAI

VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning

Ji Soo Lee*, Jongha Kim*, Jeehye Na, Jinyoung Park, and Hyunwoo J. Kim

In AAAI Conference on Artificial Intelligence (AAAI 2025)

PDF Code
CVPR

Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

Jongha Kim*, Jihwan Park*, Jinyoung Park*, Jinyoung Kim, Sehyung Kim, and Hyunwoo J. Kim

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

PDF Code