Jongha Kim
Integrated M.S./Ph.D. student in the MLV Lab, advised by Prof. Hyunwoo J. Kim.
Department of Computer Science and Engineering at Korea University, Seoul, Republic of Korea.
My research focuses on multimodal foundation models that understand and reason over real-world human-generated data (e.g., videos, documents, and the web). I am interested in advancing training, inference, and systems for multimodal reasoning, knowledge grounding, and scalable real-world deployment. For more information, please see my CV.
If you are interested in collaboration, opportunities, or just a quick chat, please feel free to reach out to me via email.
selected publications [full list]
(*) denotes equal contribution
- WACV: Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)
- AAAI: TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing. In AAAI Conference on Artificial Intelligence (AAAI 2026)
- InfoScience: Improved Query Specialization for Transformer-based Visual Relationship Detection. In Information Sciences (2026)