FoodMem: A Fast and precise food video segmentation

dc.contributor.advisor: Radeva, Petia
dc.contributor.advisor: Rodrigues Sepúlveda Marques, Ricardo Jorge
dc.contributor.advisor: Ahmad Almughrabi, Sami Mohammad
dc.contributor.author: Galán Pacheco, Adrián
dc.date.accessioned: 2024-10-14T07:15:04Z
dc.date.available: 2024-10-14T07:15:04Z
dc.date.issued: 2024-06-10
dc.description: Bachelor's thesis in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2024, Advisors: Petia Radeva, Ricardo Jorge Rodrigues Sepúlveda Marques, and Sami Mohammad Ahmad Almughrabi
dc.description.abstract: Food segmentation is crucial in various research fields, such as health, agriculture, and food biotechnology. Reliably segmenting and tracking different types of food in images or videos is a significant achievement and an emerging research topic. Our study aims to develop a production-grade framework that segments and tracks various types of food in a given set of images or videos with high quality, at near-real-time speed, and with minimal hardware resources. Such a framework enables many real-world applications, such as food volume estimation, calorie estimation, 3D reconstruction, augmented and virtual reality, and digital twins. We introduce FoodMem, a novel framework for segmenting food in 360° scenes. Our framework can effectively segment food portions in a given video and generate accurate masks. Most semantic segmentation models, especially for food-related tasks, have limitations that affect their performance, such as difficulty handling camera positions that were not present in the training set. In addition, their per-image inference speed is too slow for real-world applications, especially those that focus on video processing. In contrast, memory-based models are becoming popular in object-tracking applications because of their performance and speed. Still, they rely on user input, such as a manually drawn initial mask, so they are not fully automated. To overcome these limitations, FoodMem combines (1) the SETR model, which generates one or a few masks of the food portions in a given scene, and (2) XMem++, a memory-based tracking model, which tracks the food masks through complex scenes.
Our framework outperforms state-of-the-art food segmentation frameworks across different camera positions, illumination, reflections, scene complexity, and food diversity, significantly reducing segmentation noise, eliminating artifacts, and completing missing parts. We also introduce an annotated food dataset that covers new, challenging use cases not found in previous benchmarks. We conduct extensive experiments on the Nutrition5k and Vegetables & Fruits datasets, showing that FoodMem improves the state of the art by 2.5% mean average precision in food video segmentation. Moreover, FoodMem is on average 58 times faster than the state of the art on both datasets. The source code is accessible at: https://amughrabi.github.io/foodmem/
dc.format.extent: 83 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/215722
dc.language.iso: eng
dc.rights: thesis report: cc-nc-nd (c) Adrián Galán Pacheco, 2024
dc.rights: code: MIT (c) Adrián Galán Pacheco, 2024
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri: https://opensource.org/license/mit
dc.source: Bachelor's Theses (TFG) - Computer Engineering
dc.subject.classification: Pattern recognition (Computer science)
dc.subject.classification: Three-dimensional visualization
dc.subject.classification: Food
dc.subject.classification: Computer vision
dc.subject.classification: Software
dc.subject.classification: Bachelor's theses
dc.subject.other: Pattern recognition systems
dc.subject.other: Three-dimensional display systems
dc.subject.other: Food
dc.subject.other: Computer vision
dc.subject.other: Computer software
dc.subject.other: Bachelor's theses
dc.title: FoodMem: A Fast and precise food video segmentation
dc.type: info:eu-repo/semantics/bachelorThesis
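
The abstract describes a two-stage design: an image segmentation model produces the initial mask or masks automatically, and a memory-based tracker carries them through the rest of the video. The following Python sketch illustrates only that control flow. It is not the thesis code and does not use the SETR or XMem++ APIs: `segment_video`, `threshold_segmenter`, and `neighborhood_tracker` are hypothetical names, and the two toy functions are simple stand-ins chosen so the example runs with the standard library alone.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]  # grayscale frame as nested lists (assumption for this sketch)
Mask = List[List[int]]   # binary mask with the same shape as a frame


def segment_video(
    frames: Sequence[Frame],
    segment_frame: Callable[[Frame], Mask],
    propagate_mask: Callable[[Mask, Frame], Mask],
    num_initial: int = 1,
) -> List[Mask]:
    """Segment the first `num_initial` frames with an image model,
    then propagate the latest mask through the remaining frames."""
    masks: List[Mask] = []
    for index, frame in enumerate(frames):
        if index < num_initial:
            # Stage 1: automatic initial mask (the role SETR plays in the abstract).
            masks.append(segment_frame(frame))
        else:
            # Stage 2: tracking from the previous mask (the role XMem++ plays).
            masks.append(propagate_mask(masks[-1], frame))
    return masks


def threshold_segmenter(frame: Frame, threshold: int = 128) -> Mask:
    """Toy stand-in for an image segmentation model: bright pixels are food."""
    return [[1 if px > threshold else 0 for px in row] for row in frame]


def neighborhood_tracker(prev_mask: Mask, frame: Frame, threshold: int = 128) -> Mask:
    """Toy stand-in for a tracker: keep bright pixels that touch the previous mask."""
    height, width = len(frame), len(frame[0])
    new_mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if frame[y][x] <= threshold:
                continue
            # Look at the 3x3 neighborhood of (y, x) in the previous mask.
            touches_previous = any(
                prev_mask[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            new_mask[y][x] = 1 if touches_previous else 0
    return new_mask


if __name__ == "__main__":
    video = [
        [[0, 200, 0], [0, 0, 0], [0, 0, 0]],
        [[0, 0, 200], [0, 0, 0], [200, 0, 0]],
    ]
    for mask in segment_video(video, threshold_segmenter, neighborhood_tracker):
        print(mask)
```

In this sketch the image model runs only on the initial frames, which mirrors the motivation stated in the abstract: per-image segmentation is slow, so the tracker handles every other frame, and no manually drawn mask is needed.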

Files

Original bundle

Name: tfg_galan_pacheco_adrian.pdf
Size: 7.26 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: codi.zip
Size: 47.98 MB
Format: ZIP file
Description: Source code