I am an M.S./Ph.D. student at the Machine Intelligence Lab at Seoul National University, advised by Prof. Kyomin Jung. My research focuses on Natural Language Processing.
As AI rapidly becomes a trusty sidekick in our daily lives, I believe it is paramount that we can rely on it without hesitation. It's about making sure that as we develop these advanced systems, they not only understand us but do so in a way that is unbiased and reliable. My research is driven by the challenge of reducing biases and ensuring that the knowledge shared by language models isn't just vast, but also trustworthy.
Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing language priors is crucial, as they can lead to undesirable biases or hallucinations when dealing with images that are out of the training distribution. Despite its importance, how to accurately measure language priors in LVLMs remains poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose VLind-Bench, the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also a series of tests to evaluate more basic capabilities such as instance comprehension, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all of these basic tests are passed before evaluating the language prior, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs on our benchmark reveal that almost all models exhibit a significant reliance on language priors.
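The gating idea behind this evaluation can be sketched in a few lines of Python. This is a minimal illustration, not VLind-Bench's actual implementation: the class and field names below are hypothetical stand-ins for per-instance pass/fail records of each test.

```python
# Minimal sketch of the gated evaluation described above.
# All names (BenchmarkInstance and its fields) are hypothetical,
# not the benchmark's real code.
from dataclasses import dataclass

@dataclass
class BenchmarkInstance:
    # Pass/fail results of the basic capability tests for one instance.
    instance_comprehension_ok: bool  # model understands the instance
    visual_perception_ok: bool       # model perceives the image correctly
    commonsense_bias_ok: bool        # model is not misled by commonsense bias
    language_prior_ok: bool          # correct answer on the counterfactual image

def gated_language_prior_score(instances: list[BenchmarkInstance]) -> float:
    """Score the language-prior test only on instances whose basic tests
    all pass, so failures of basic capabilities cannot confound the result."""
    eligible = [
        inst for inst in instances
        if inst.instance_comprehension_ok
        and inst.visual_perception_ok
        and inst.commonsense_bias_ok
    ]
    if not eligible:
        return 0.0
    return sum(inst.language_prior_ok for inst in eligible) / len(eligible)
```

The design choice is that a failed language-prior test is only informative once the model has already demonstrated the prerequisite capabilities on that same instance; otherwise the failure could stem from perception or comprehension errors instead.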
MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors