Assessment of the Resemblance Metrics for Synthetic data validation

dc.contributor.advisor: Cortés Martínez, Jordi
dc.contributor.advisor: Fernández Martínez, Daniel
dc.contributor.author: Chen, Xinnuo
dc.date.accessioned: 2026-02-04T11:30:20Z
dc.date.available: 2026-02-04T11:30:20Z
dc.date.issued: 2025
dc.description: Bachelor's theses in Statistics UB-UPC, Facultat d'Economia i Empresa (UB) and Facultat de Matemàtiques i Estadística (UPC), Academic year: 2024-2025, Advisors: Jordi Cortés Martínez; Daniel Fernández Martínez
dc.description.abstract: As artificial intelligence continues to grow, the requirement for large volumes of data has become one of its main challenges. Synthetic data is a viable alternative for addressing both the scarcity of real data and the need to protect information privacy. For synthetic data to be useful, it is essential to validate that the characteristics of the original data are preserved. This study analyses the reliability of the SPECKS metric for measuring similarity between real and synthetic data in cluster analysis. Simulations are used to examine several factors that affect the ability of algorithms to replicate the structure of the original clusters. The relationship between SPECKS and clustering metrics that evaluate the similarity of cluster structure is also studied, to determine whether SPECKS is a good indicator of how well synthetic data preserve cluster structure. The results suggest that SPECKS is insensitive to structural changes and is therefore not a suitable metric for evaluating structural quality in cluster analysis.
dc.format.extent: 51 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/226616
dc.language.iso: eng
dc.rights: cc-by-nc-nd (c) Chen, 2025
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.classification: Intel·ligència artificial [cat]
dc.subject.classification: Mètodes de simulació [cat]
dc.subject.classification: Computació distribuïda [cat]
dc.subject.classification: Treballs de fi de grau
dc.subject.other: Artificial intelligence [eng]
dc.subject.other: Simulation methods [eng]
dc.subject.other: Computational grids (Computer systems) [eng]
dc.subject.other: Bachelor's theses [eng]
dc.title: Assessment of the Resemblance Metrics for Synthetic data validation
dc.type: info:eu-repo/semantics/bachelorThesis

Files

Original bundle

Name: TFG-EST_ChenXinnuo_2025.pdf
Size: 15.3 MB
Format: Adobe Portable Document Format