Minimum cost multicuts for image and motion segmentation

Kardoost, Amirhossein

Citation link: http://dx.doi.org/10.25819/ubsi/10287

Files in This Item:

File	Description	Size	Format
Dissertation_Kardoost_Amirhossein.pdf		31.86 MB	Adobe PDF

Document Type:	Doctoral Thesis
Title:	Minimum cost multicuts for image and motion segmentation
Other Titles:	Minimum Cost Multicuts für Bild- und Bewegungssegmentierung
Authors:	Kardoost, Amirhossein
Institute:	Department Elektrotechnik - Informatik
Free keywords:	Image segmentation, Motion segmentation, Minimum cost multicut, Uncertainty estimation, Self-supervised learning
Dewey Decimal Classification:	004 Computer science
GHBS-Classes:	TVVC TDB TUH
Issue Date:	2023
Publish Date:	2023
Abstract:	This dissertation discusses clustering and its applications in computer vision, such as image, mesh, video, and motion segmentation. Clustering entities plays a crucial role in higher-level tasks such as action recognition, robot navigation, scene understanding, and 3D reconstruction. One well-known and widely used clustering framework is the minimum cost lifted multicut problem. This framework has recently found many applications, such as image and mesh decomposition and multiple object tracking. It poses such tasks in a graph-based model in which real-valued costs are assigned to the edges between entities, so that the minimum cost cut decomposes the graph into an optimal number of segments. Solving the multicut problem is NP-hard and computationally expensive. Therefore, we propose two variants of a primal feasible heuristic solver that greedily generate solutions within bounded time. Driven by a probabilistic formulation of minimum cost multicuts, we provide a measure of the uncertainty of the decisions made during optimization. We argue that access to such uncertainties is crucial for many practical applications and evaluate the proposed uncertainty measure on image and motion segmentation. To track object masks in video, we use low-level cues such as optical flow and image boundaries and study the importance of these cues for obtaining competitive, high-quality results. While high-end computer vision methods for this task rely on sequence-specific training of dedicated Convolutional Neural Network (CNN) architectures, we show the potential of a variational model based on generic video information from motion and color. Optical flow is also used for the motion segmentation task, where observable motion in videos can give rise to the definition of objects moving with respect to the scene. 
This problem is usually tackled either by aggregating motion information in long, sparse point trajectories or by directly producing dense per-frame segmentations, relying on large amounts of training data. In this dissertation, we address the problem using sparse motion trajectories and emphasize that generic cues such as optical flow and image boundaries are crucial for this and similar tasks. Complex motion patterns, such as out-of-plane rotation or scaling of the objects, add ambiguities to the segmentation problem. Using hypergraphs resolves such ambiguities by extending the translational motion model to Euclidean or affine transformations. We evaluate the proposed methods on well-known datasets for the addressed tasks and show that integrating low-level cues improves the results on the higher-level tasks.
DOI:	http://dx.doi.org/10.25819/ubsi/10287
URN:	urn:nbn:de:hbz:467-24806
URI:	https://dspace.ub.uni-siegen.de/handle/ubsi/2480
Appears in Collections:	Hochschulschriften

This item is protected by original copyright

Page view(s):	411 (checked on Dec 26, 2024)
Download(s):	171 (checked on Dec 26, 2024)
