Deep Learning and Uncertainty Modeling in Visual Food Analysis

Bolaños Solà, Marc

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/177330

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Radeva, Petia	-
dc.contributor.author	Bolaños Solà, Marc	-
dc.contributor.other	Universitat de Barcelona. Departament de Matemàtiques i Informàtica	-
dc.date.accessioned	2021-05-17T10:01:56Z	-
dc.date.available	2021-05-17T10:01:56Z	-
dc.date.issued	2021-04-09	-
dc.identifier.uri	https://hdl.handle.net/2445/177330	-
dc.description.abstract	[eng] Machine Learning and Computer Vision have undergone a revolution in recent years. The appearance of Deep Learning algorithms and Convolutional Neural Networks, together with the increased processing capabilities of modern GPUs and the enormous amounts of publicly available annotated data, has driven unprecedented progress in the field. These improvements have led to the appearance of new fields such as Multimodal Learning, which encompasses and learns from many subfields. New applications have also taken advantage of these advances to reach high levels of performance. The large improvement in the results of currently available algorithms has not only revolutionized the academic world but also brought to market AI-based solutions that looked like science fiction barely 10 years ago. This thesis, which is written as a compendium of papers, delves into the novel topic of Deep Multimodal Learning by proposing new algorithms and solutions for both existing and newly defined problems. From the applications perspective, most of the papers presented fall into two areas. The first is Egocentric Vision and Storytelling, which consists of acquiring images from a person's daily life in order to analyse their behaviour patterns, such as social interactions, activities and events, and interactions with objects. The second is Food Recognition and Analysis, which consists of visually analysing and recognizing the food that appears in images in multiple contexts and at different levels of complexity, from food group recognition to nutritional analysis. In both applications, the final purpose of the proposed papers is to build tools that provide information that could lead to a better quality of life for users.	ca
dc.description.abstract	[spa] El mundo del Machine Learning y la Visión por Computador ha experimentado una revolución en los últimos años. La aparición de algoritmos de Deep Learning y Convolutional Neural Networks, junto con las mayores capacidades de procesamiento proporcionadas por las GPU modernas y las enormes cantidades de datos anotados disponibles públicamente, ha permitido un impulso en el campo como nunca antes se había visto. Estas notables mejoras logradas en el mundo del Machine Learning han llevado a la aparición de nuevos campos como el Aprendizaje Multimodal, que engloba y aprende de muchos subcampos. Además, nuevas aplicaciones han aprovechado estos avances para alcanzar altos niveles de rendimiento. La enorme mejora en los resultados de los algoritmos disponibles actualmente ha permitido no solo revolucionar el mundo académico, sino también llevar al mercado soluciones basadas en IA que parecían ciencia ficción hace apenas 10 años. Esta tesis, que está escrita como un compendio de artículos, se enfoca en profundizar en el novedoso tema del Aprendizaje Multimodal Profundo al proponer nuevos algoritmos y soluciones para problemas ya existentes y recientemente definidos. Desde la perspectiva de las aplicaciones, la mayoría de los trabajos presentados se pueden dividir en dos áreas de aplicabilidad. Por un lado, la Visión Egocéntrica y el Storytelling, que consiste en la adquisición de imágenes de la vida diaria de una persona para analizar su comportamiento y extraer patrones asociados a este, como por ejemplo interacciones sociales, actividades y eventos, interacciones con objetos, etc. Y por otro lado, el Reconocimiento y Análisis de Alimentos, que consiste en analizar y reconocer visualmente la comida que aparece en imágenes en múltiples contextos y con diferentes niveles de complejidad, desde el reconocimiento de grupos de alimentos hasta el análisis nutricional. En ambas aplicaciones, el propósito final de los artículos propuestos es construir herramientas que brinden información que pueda conducir a una mejor calidad de vida de los usuarios.	ca
dc.format.extent	260 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	ca
dc.publisher	Universitat de Barcelona	-
dc.rights	(c) Bolaños Solà, Marc, 2021	-
dc.source	Tesis Doctorals - Departament - Matemàtiques i Informàtica	-
dc.subject.classification	Aprenentatge automàtic	-
dc.subject.classification	Algorismes	-
dc.subject.classification	Adquisició del coneixement (Sistemes experts)	-
dc.subject.classification	Percepció de les formes	-
dc.subject.classification	Xarxes neuronals convolucionals	-
dc.subject.other	Machine learning	-
dc.subject.other	Algorithms	-
dc.subject.other	Knowledge acquisition (Expert systems)	-
dc.subject.other	Form perception	-
dc.subject.other	Convolutional neural networks	-
dc.title	Deep Learning and Uncertainty Modeling in Visual Food Analysis	ca
dc.type	info:eu-repo/semantics/doctoralThesis	ca
dc.type	info:eu-repo/semantics/publishedVersion	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
dc.identifier.tdx	http://hdl.handle.net/10803/671672	-
Appears in Collections:	Tesis Doctorals - Departament - Matemàtiques i Informàtica

Files in This Item:

File	Description	Size	Format
MBS_PhD_THESIS.pdf		50.39 MB	Adobe PDF
