Paper Title
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Paper Authors
Paper Abstract
In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting, where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions about them. However, most existing benchmark datasets for VQA focus on machine "understanding," and it remains unclear how progress on those datasets translates to improvements in this real-world use case. We aim to answer this question by evaluating a variety of VQA models on both a machine "understanding" dataset (VQA-v2) and an accessibility dataset (VizWiz) and analyzing the discrepancies between them. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
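
As background for the task definition above, the following is a minimal Python sketch of the VQA setup: a model answers a free-form question about an image, and its answer is scored against ten human-provided answers. The Hugging Face pipeline and the dandelin/vilt-b32-finetuned-vqa checkpoint are illustrative assumptions, not the models evaluated in the paper; the accuracy function implements the standard VQA metric shared by the VQA-v2 and VizWiz benchmarks.

# Minimal VQA sketch (assumes the Hugging Face `transformers` library;
# the ViLT checkpoint below is illustrative, not the paper's model set).
from transformers import pipeline

vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

# An image a user might capture, plus a question about it.
predictions = vqa(image="photo.jpg", question="What color is this shirt?")
best_answer = predictions[0]["answer"]

def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Standard VQA accuracy used by both VQA-v2 and VizWiz:
    an answer counts as fully correct if at least 3 of the 10
    human annotators gave the same answer."""
    matches = sum(a == predicted for a in human_answers)
    return min(matches / 3.0, 1.0)

print(best_answer, vqa_accuracy(best_answer, ["red"] * 10))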