Please use this identifier to cite or link to this item:
https://hdl.handle.net/2445/215722
Title: | FoodMem: A Fast and Precise Food Video Segmentation |
Author: | Galán Pacheco, Adrián |
Director/Tutor: | Radeva, Petia; Rodrigues Sepúlveda Marques, Ricardo Jorge; Ahmad Almughrabi, Sami Mohammad |
Keywords: | Reconeixement de formes (Informàtica); Visualització tridimensional; Aliments; Visió per ordinador; Programari; Treballs de fi de grau; Pattern recognition systems; Three-dimensional display systems; Food; Computer vision; Computer software; Bachelor's theses |
Issue Date: | 10-Jun-2024 |
Abstract: | Food segmentation is crucial in various research fields, such as health, agriculture, and food biotechnology. Segmenting and tracking different types of food in images and videos is a challenging and rapidly emerging research topic. Our study aims to develop a production-grade framework for segmenting and tracking various types of food in a given set of images or videos with high segmentation quality, near-real-time speed, and minimal hardware resources. This enables many real-world applications, such as food volume estimation, calorie estimation, 3D reconstruction, augmented and virtual reality, and digital twins. We introduce FoodMem, a novel framework for segmenting food in 360° scenes. Our framework can effectively segment food portions in a given video and generate accurate masks. Most semantic segmentation models, especially those for food-related tasks, have limitations that affect their performance, such as handling camera viewpoints absent from the training set. Moreover, their per-image inference speed does not meet the needs of real-world applications, especially those focused on video processing. In contrast, memory-based models are becoming popular in object-tracking applications because of their accuracy and speed, but they rely on user input, such as manually drawn initial masks, which limits automation. To overcome these limitations, we propose FoodMem, a novel food video segmentation framework that combines (1) the SETR model, which generates masks of the food portions for one or a few frames of a given scene, and (2) XMem++, a memory-based tracking model, which propagates these masks through complex scenes.
Our framework outperforms state-of-the-art food segmentation frameworks across varying camera viewpoints, illumination, reflection, scene complexity, and food diversity, achieving significant reduction of segmentation noise, elimination of artifacts, and completion of missing regions. We also introduce an annotated food dataset covering new challenging use cases not found in previous benchmarks. We conduct extensive experiments on the Nutrition5k and Vegetables & Fruits datasets, showing that FoodMem improves the state of the art by 2.5% mean average precision in food video segmentation. Moreover, FoodMem is on average 58 times faster than the state of the art on both datasets. The source code is accessible at: https://amughrabi.github.io/foodmem/ |
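The two-stage design described in the abstract (a segmentation model producing masks for one or a few seed frames, then a memory-based tracker propagating them across the video) can be sketched as follows. This is a minimal illustration only: `segment_seed_frames` and `propagate_masks` are hypothetical stand-ins for SETR and XMem++, using a trivial intensity threshold and a last-known-mask "memory" in place of the real models.

```python
import numpy as np

def segment_seed_frames(frames, n_seeds=1):
    """Stage 1 (stand-in for SETR): produce masks for the first few frames.
    A simple per-frame intensity threshold stands in for the real model."""
    seeds = {}
    for i in range(min(n_seeds, len(frames))):
        seeds[i] = frames[i] > frames[i].mean()
    return seeds

def propagate_masks(frames, seed_masks):
    """Stage 2 (stand-in for XMem++): propagate seed masks to every frame.
    Here the 'memory' is just the most recent known mask; the real tracker
    maintains a learned feature memory and refines the mask per frame."""
    masks = [None] * len(frames)
    memory = None
    for i, _frame in enumerate(frames):
        if i in seed_masks:          # seed frame: refresh the memory
            memory = seed_masks[i]
        masks[i] = memory.copy()     # non-seed frame: reuse tracked mask
    return masks

# Usage: a synthetic 5-frame "video" of 8x8 grayscale frames.
rng = np.random.default_rng(0)
video = [rng.random((8, 8)) for _ in range(5)]
seeds = segment_seed_frames(video, n_seeds=1)
masks = propagate_masks(video, seeds)
```

The key point of the design is that the expensive semantic model runs on only one or a few frames, while the cheap tracking stage handles the rest of the video, which is what makes the pipeline fast without per-frame user input.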
Note: | Bachelor's thesis, Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona. Year: 2024. Directors: Petia Radeva, Ricardo Jorge Rodrigues Sepúlveda Marques, and Sami Mohammad Ahmad Almughrabi |
URI: | https://hdl.handle.net/2445/215722 |
Appears in Collections: | Treballs Finals de Grau (TFG) - Enginyeria Informàtica; Programari - Treballs de l'alumnat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
tfg_galan_pacheco_adrian.pdf | Report | 7.43 MB | Adobe PDF | View/Open |
codi.zip | Source code | 49.13 MB | ZIP | View/Open |
This item is licensed under a Creative Commons License.