Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/215722
Title: FoodMem: A Fast and precise food video segmentation
Author: Galán Pacheco, Adrián
Director/Tutor: Radeva, Petia
Rodrigues Sepúlveda Marques, Ricardo Jorge
Ahmad Almughrabi, Sami Mohammad
Keywords: Pattern recognition systems
Three-dimensional display systems
Food
Computer vision
Computer software
Bachelor's theses
Issue Date: 10-Jun-2024
Abstract: Food segmentation is crucial in various research fields, such as health, agriculture, and food biotechnology. Segmenting and tracking different types of food in images or videos is a significant achievement and is currently an emerging topic. Our study aims to develop a production-grade framework for segmenting and tracking various types of food in a given set of images or videos with high segmentation quality and near-real-time speed on minimal hardware resources. This unlocks many real-world applications, such as food volume estimation, calorie estimation, 3D reconstruction, augmented and virtual reality, and digital twins. We introduce FoodMem, a novel framework for segmenting food in 360° scenes. Our framework can effectively segment food portions in a given video and generate accurate masks. Most semantic segmentation models, especially for food-related tasks, have limitations that affect their performance, such as handling camera viewpoints not present in the training set. Moreover, their per-image inference speed does not meet the demands of real-world applications, especially those focused on video processing. In contrast, memory-based models are becoming popular in object-tracking applications because of their performance and speed. Still, they are limited because they rely on manual user input, such as a hand-drawn initial mask, and thus lack automation. To overcome these limitations, we propose FoodMem, a novel food video segmentation framework that combines (1) the SETR model, to generate segmentation masks of the food portions on one or a few frames of a given scene, and (2) XMem++, a memory-based tracking model, to track the food masks through complex scenes.
Our framework outperforms state-of-the-art food segmentation frameworks across different camera viewpoints, illumination conditions, reflections, scene complexities, and food types, achieving significant reduction of segmentation noise, elimination of artifacts, and completion of missing parts. We also introduce an annotated food dataset covering new challenging use cases not found in previous benchmarks. We conduct extensive experiments on the Nutrition5k and Vegetables & Fruits datasets, showing that FoodMem improves the state of the art by 2.5% mean average precision in food video segmentation. Moreover, FoodMem is 58 times faster than the state of the art on average across both datasets. The source code is accessible at: https://amughrabi.github.io/foodmem/
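The abstract describes a two-stage pipeline: SETR produces segmentation masks on one or a few frames, and XMem++ propagates those masks through the rest of the video. A minimal sketch of that idea is shown below; the function names and the trivial stub implementations are hypothetical placeholders (the real framework would invoke the actual SETR and XMem++ models), so this illustrates only the control flow, not the models themselves.

```python
import numpy as np

def setr_segment(frame):
    # Placeholder for SETR semantic segmentation: in FoodMem, this stage
    # produces the initial food mask(s) on one or a few frames only.
    # Hypothetical stub: a brightness threshold stands in for the model.
    return (frame.mean(axis=-1) > 0.5).astype(np.uint8)

def xmem_track(frames, seed_mask):
    # Placeholder for XMem++ memory-based propagation: given the seed
    # mask, track the food region through the remaining frames.
    # Hypothetical stub: reuses the seed mask for every frame.
    return [seed_mask.copy() for _ in frames]

def foodmem_pipeline(frames):
    """Two-stage sketch: segment once, then track (no per-frame model)."""
    seed_mask = setr_segment(frames[0])    # stage 1: one-shot segmentation
    return xmem_track(frames, seed_mask)   # stage 2: memory-based tracking

# Toy usage: five random 4x4 RGB frames.
frames = [np.random.rand(4, 4, 3) for _ in range(5)]
masks = foodmem_pipeline(frames)
```

Because the expensive segmentation model runs on only one or a few frames while the lightweight tracker handles the rest, this design is what yields the near-real-time speed claimed in the abstract.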
Note: Bachelor's thesis (Treball Final de Grau) in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2024, Directors: Petia Radeva, Ricardo Jorge Rodrigues Sepúlveda Marques and Sami Mohammad Ahmad Almughrabi
URI: https://hdl.handle.net/2445/215722
Appears in Collections: Treballs Finals de Grau (TFG) - Enginyeria Informàtica [Bachelor's Theses - Computer Engineering]
Programari - Treballs de l'alumnat [Software - Student works]

Files in This Item:
File                          Description              Size      Format
tfg_galan_pacheco_adrian.pdf  Report (Memòria)         7.43 MB   Adobe PDF  View/Open
codi.zip                      Source code (Codi font)  49.13 MB  zip        View/Open


This item is licensed under a Creative Commons License.