Paper Title

One-shot recognition of any material anywhere using contrastive learning with physics-based rendering

Authors

Manuel S. Drehwald, Sagi Eppel, Jolina Li, Han Hao, Alan Aspuru-Guzik

Abstract

Visual recognition of materials and their states is essential for understanding most aspects of the world, from determining whether food is cooked or metal is rusted to detecting whether a chemical reaction has occurred. However, current image recognition methods are limited to specific classes and properties and cannot handle the vast number of material states in the world. To address this, we present MatSim: the first dataset and benchmark for computer-vision-based recognition of similarities and transitions between materials and textures, focusing on identifying any material under any conditions using one or a few examples. The dataset contains synthetic and natural images. The synthetic images were rendered using large collections of textures, objects, and environments created by computer graphics artists. We use mixtures and gradual transitions between materials so that the system can learn cases with smooth transitions between states (such as gradually cooked food). We also render images of materials inside transparent containers to support beverage and chemistry lab use cases. We use this dataset to train a Siamese network that identifies the same material across different objects, mixtures, and environments. The descriptor generated by this network can be used to identify the states of materials and their subclasses from a single example image. We also present the first few-shot material recognition benchmark, with images from a wide range of fields, including the states of foods and drinks, ground types, and many other use cases. We show that a network trained on the MatSim synthetic dataset outperforms state-of-the-art models such as CLIP on the benchmark and also achieves good results on other unsupervised material classification tasks.
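
The one-shot use of descriptors mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each image has already been mapped to a descriptor vector by a trained network and that descriptors are compared by cosine similarity; the abstract does not state the similarity measure, so that choice, the function names, and the toy three-dimensional vectors below are all hypothetical and are not taken from the MatSim code.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two descriptor vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("descriptors must be non-zero vectors")
    return dot / (norm_a * norm_b)


def one_shot_classify(
    query: Sequence[float], references: Dict[str, Sequence[float]]
) -> str:
    """Return the label whose single reference descriptor best matches the query."""
    if not references:
        raise ValueError("at least one reference descriptor is required")
    return max(
        references,
        key=lambda label: cosine_similarity(query, references[label]),
    )


# Toy descriptors standing in for real network outputs (hypothetical values).
# Each material state is represented by exactly one reference example.
references = {
    "raw": [0.9, 0.1, 0.0],
    "cooked": [0.1, 0.9, 0.2],
    "burnt": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]

predicted = one_shot_classify(query, references)
print(predicted)  # prints "cooked"
```

With one reference descriptor per state, recognizing a new state only requires adding one more entry to `references`; no retraining is involved in this sketch.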
