FoodMem: A Fast and precise food video segmentation

Galán Pacheco, Adrián

dc.contributor.advisor	Radeva, Petia
dc.contributor.advisor	Rodrigues Sepúlveda Marques, Ricardo Jorge
dc.contributor.advisor	Ahmad Almughrabi, Sami Mohammad
dc.contributor.author	Galán Pacheco, Adrián
dc.date.accessioned	2024-10-14T07:15:04Z
dc.date.available	2024-10-14T07:15:04Z
dc.date.issued	2024-06-10
dc.description	Bachelor's Degree Final Projects in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2024, Supervisors: Petia Radeva, Ricardo Jorge Rodrigues Sepúlveda Marques and Sami Mohammad Ahmad Almughrabi	ca
dc.description.abstract	Food segmentation is crucial in various research fields, such as health, agriculture, and food biotechnology. Reliably segmenting and tracking different types of food in images or videos would be a significant achievement, and the problem is currently considered an emerging topic. Our study aims to develop a production-grade framework for segmenting and tracking various types of food in a given set of images or videos, with high-quality results and near-real-time speed on minimal hardware resources. This helps address many challenges in real-world applications, such as food volume estimation, calorie estimation, 3D reconstruction, augmented and virtual reality, or digital twins. We introduce FoodMem, a novel framework for segmenting food in 360° scenes. Our framework can effectively segment food portions in a given video and generate accurate masks. Most semantic segmentation models, especially for food-related tasks, have limitations that affect their performance, such as difficulty handling camera locations that were not present in the training set. In addition, their inference speed on individual images does not suit real-world applications, especially those that focus on video processing. In contrast, memory-based models are becoming popular in object-tracking applications because of their performance and speed. Still, they are limited because they rely on user input, such as a manually drawn input mask, which indicates a lack of automation. To overcome these limitations, we propose FoodMem, a novel food video segmentation framework that combines (1) the SETR model, to generate one or a few masks of the food portions in a given scene, and (2) XMem++, a memory-based tracking model, to track the food masks in complex scenes.
Our framework outperforms state-of-the-art food segmentation frameworks across different camera-capturing locations, illumination, reflection, scene complexity, and food diversity, achieving a significant reduction in segmentation noise, eliminating artifacts, and completing missing parts. We also introduce an annotated food dataset, which covers new challenging use cases not found in previous benchmarks. We conduct extensive experiments on the Nutrition5k and Vegetables & Fruits datasets, showing that FoodMem improves on the state of the art by 2.5% in mean average precision in food video segmentation. Moreover, FoodMem is 58 times faster than the state of the art on average across both datasets. The source code is accessible at: https://amughrabi.github.io/foodmem/	ca
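The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: `segment_video`, `threshold_segmenter`, and `nearest_reference_tracker` are hypothetical names, the toy segmenter and tracker only stand in for the roles played by SETR and XMem++, and the evenly spaced choice of reference frames is an assumption, since the abstract only says that one or a few masks are generated.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[List[int]]  # toy grayscale image: rows of pixel values
Mask = List[List[int]]   # same shape as a frame: 1 = food, 0 = background


def segment_video(
    frames: Sequence[Frame],
    segment_frame: Callable[[Frame], Mask],
    propagate: Callable[[Sequence[Frame], Dict[int, Mask]], List[Mask]],
    num_reference_frames: int = 1,
) -> List[Mask]:
    """Segment a few reference frames, then track those masks across all frames."""
    if not frames:
        return []
    # Assumption: reference frames are evenly spaced over the video.
    count = max(1, min(num_reference_frames, len(frames)))
    step = len(frames) / count
    reference_indices = sorted({int(i * step) for i in range(count)})
    # Stage 1: an image segmenter (the role SETR plays) labels the reference frames.
    reference_masks = {i: segment_frame(frames[i]) for i in reference_indices}
    # Stage 2: a tracker (the role XMem++ plays) extends the masks to every frame.
    return propagate(frames, reference_masks)


def threshold_segmenter(frame: Frame, threshold: int = 128) -> Mask:
    """Hypothetical stand-in segmenter: bright pixels count as food."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in frame]


def nearest_reference_tracker(
    frames: Sequence[Frame], reference_masks: Dict[int, Mask]
) -> List[Mask]:
    """Hypothetical stand-in tracker: reuse the mask of the nearest reference frame."""
    refs = sorted(reference_masks)
    return [
        reference_masks[min(refs, key=lambda r: abs(r - i))]
        for i in range(len(frames))
    ]


video = [[[200, 10], [10, 200]] for _ in range(4)]
masks = segment_video(
    video, threshold_segmenter, nearest_reference_tracker, num_reference_frames=2
)
```

Because the expensive segmenter runs only on the reference frames, the cost per video is dominated by the tracker, which matches the motivation given in the abstract for pairing an image segmenter with a memory-based model.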
dc.format.extent	83 p.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/2445/215722
dc.language.iso	eng	ca
dc.rights	thesis report: cc-nc-nd (c) Adrián Galán Pacheco, 2024
dc.rights	code: MIT (c) Adrián Galán Pacheco, 2024
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri	https://opensource.org/license/mit	*
dc.source	Treballs Finals de Grau (TFG) - Enginyeria Informàtica
dc.subject.classification	Reconeixement de formes (Informàtica)	ca
dc.subject.classification	Visualització tridimensional	ca
dc.subject.classification	Aliments	ca
dc.subject.classification	Visió per ordinador	ca
dc.subject.classification	Programari	ca
dc.subject.classification	Treballs de fi de grau	ca
dc.subject.other	Pattern recognition systems	en
dc.subject.other	Three-dimensional display systems	en
dc.subject.other	Food	en
dc.subject.other	Computer vision	en
dc.subject.other	Computer software	en
dc.subject.other	Bachelor's theses	en
dc.title	FoodMem: A Fast and precise food video segmentation	ca
dc.type	info:eu-repo/semantics/bachelorThesis	ca

Files

Original bundle

Name: tfg_galan_pacheco_adrian.pdf
Size: 7.26 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: codi.zip
Size: 47.98 MB
Format: ZIP file
Description: Source code

Collections

Treballs Finals de Grau (TFG) - Enginyeria Informàtica
Programari - Treballs de l'alumnat