Title: Segmentation and indexation of complex objects in comic book images
Language: English
File Size: 4.0 MB
Total Pages: 182
Table of Contents
Acknowledgement
Abstract
Résumé
Resumen
Introduction
	Presentation
	Motivations
	Objectives and contributions
	Outlines
State-of-the-art
	Document image analysis
	Comic book image analysis
		Balloon segmentation and tail detection
		Text extraction and recognition
		Comic character detection
	Holistic understanding
	Existing applications
	Conclusion
Sequential information extraction
	Introduction
	Panel and text
	From text to balloon
		Implicit balloon extraction
Independent information extraction
	Introduction
	Panel extraction
	Text localisation and recognition
		Introduction
		Bi-level segmentation
		Text/graphics separation
	Balloon extraction and classification
		Balloon classification
	Comic character spotting
		Colour quantization
		Input query
		Query descriptor
Knowledge-driven analysis
	Introduction
	Proposed models
		Image processing domain
		Comics domain
		Model interactions
	Expert system for contextual analysis
		Interactions between low and high level processing
		Constraints for the low level extractions
	Processing sequence
		Simple element extraction
Experimentations
	Dataset and ground truth
		Dataset description
		Ground truth information
	Metrics
		Object localisation metric
		Object segmentation metric
		Text recognition metric
		Tail detection metric
		Semantic links metric
	Panel extraction evaluation
		Arai's method
		Comparison and analysis
	Text extraction evaluation
		Arai's method
		Sequential approach
		Independent approach
		Knowledge-driven approach
		Comparison and analysis
	Text recognition evaluation
	Balloon extraction evaluation
		Arai's method
		Ho's method
		Sequential approach
		Independent approach
		Knowledge-driven approach
		Comparison and analysis
	Balloon classification
	Comic character extraction evaluation
		Independent approach
		Knowledge-driven approach
		Comparison and analysis
	Knowledge-driven analysis overall evaluation
		Comics model evaluation
		Framework evaluation
	Conclusions
Conclusions
	Summary and contributions
	Future perspectives
Pre-processing
	Segmentation
		Region-growing
		Split and merge
		Contour-based
		Bi-level grey-scale thresholding
		Multi-level colour thresholding
Feature extraction
	Connected-component labelling
Dataset
	Image categories
Ground truth
	Ground truth construction
		Visual annotation
		Semantic annotation
	Ground truth quality assessment
	Terms of use
List of Publications
Bibliography
                        
Document Text Contents
Page 1


Segmentation and indexation of complex
objects in comic book images

A dissertation submitted by Christophe
Rigaud at Universitat Autònoma de
Barcelona to fulfil the degree of Doctor of
Philosophy.

Bellaterra, December 11, 2014

http://www.christophe-rigaud.com

Page 2

UNIVERSITÉ DE LA ROCHELLE

ÉCOLE DOCTORALE S2IM
Laboratoire Informatique, Image et Interaction (L3i)

THESIS presented by:

Christophe RIGAUD

defended on: 11 December 2014
to obtain the degree of: Doctor of the Université de La Rochelle

Discipline: computer science and applications

Segmentation et indexation d’objets complexes dans les
images de bandes dessinées

JURY:
Bart LAMIROY, Associate Professor, Université de Lorraine (France), Examiner, President of the jury
Simone MARINAI, Associate Professor, University of Florence (Italy), Reviewer
Apostolos ANTONACOPOULOS, Associate Professor, University of Salford (Great Britain), Reviewer
Koichi KISE, Professor, University of Osaka (Japan), Examiner
Jean-Philippe DOMENGER, Professor, Université de Bordeaux (France), Examiner
Jean-Christophe BURIE, Professor, Université de La Rochelle (France), Thesis director
Dimosthenis KARATZAS, Associate Professor, Universitat Autònoma de Barcelona, Thesis supervisor
Jean-Marc OGIER, Professor, Université de La Rochelle, Thesis supervisor

Page 91

66 KNOWLEDGE-DRIVEN ANALYSIS

with a reading order from top to bottom. As with the Panel and Balloon concepts,
the attribute hasRank gives the position of each text line in the reading order
inside the balloon, and consecutive lines are chained through the property
hasNextTextLine. The text transcription is stored via the attribute hasText.
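The reading-order attributes described above can be illustrated with a minimal sketch. It is not the thesis implementation: the `TextLine` class, the `y_top` coordinate, the `order_text_lines` helper and the choice of starting ranks at 1 are all assumptions made for illustration; only the names hasRank, hasNextTextLine and hasText come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextLine:
    """Hypothetical stand-in for a TextLine instance of the ontology."""
    has_text: str                                    # transcription (hasText)
    y_top: float                                     # assumed: top coordinate in the image
    has_rank: Optional[int] = None                   # reading-order position (hasRank)
    has_next_text_line: Optional["TextLine"] = None  # successor (hasNextTextLine)


def order_text_lines(lines: List[TextLine]) -> List[TextLine]:
    """Rank the text lines of one balloon from top to bottom and chain them."""
    ordered = sorted(lines, key=lambda line: line.y_top)
    for rank, line in enumerate(ordered, start=1):   # ranks start at 1 (assumption)
        line.has_rank = rank
        line.has_next_text_line = None
    for current, following in zip(ordered, ordered[1:]):
        current.has_next_text_line = following
    return ordered
```

Calling `order_text_lines` on the lines of a single balloon returns them in top-to-bottom order, with the last line having no successor.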

The concepts Panel, Balloon, Tail, TextLine and Character are disjoint: each
element can be an instance of only one of them. Figure 5.5 illustrates the addition of
these concepts to the ontology.

Figure 5.5: Integration of the concepts Panel, Balloon, Tail, TextLine and Character
into the initial model of Figure 5.4.

Several properties are introduced into our ontology to represent the links between
the various components of a panel. Since a panel belongs to a page, the property
hasPanel binds an instance of Page to an instance of Panel.

The properties hasBalloon and hasCharacter are formally defined and represent the
membership between, on the one hand, a box and, on the other hand, an instance
of Balloon or Character. The property hasTextLine represents the link between a
text line and a balloon.
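As a rough illustration of the disjointness constraint and the membership properties described above, the sketch below checks both when instances and links are added. It is a toy model rather than the OWL ontology used in the thesis: the `MiniOntology` class and its methods are hypothetical, and the domain of hasBalloon and hasCharacter is assumed to be Panel (reading the "box" of the text as a panel).

```python
from typing import Dict, List, Set, Tuple

# Concepts declared pairwise disjoint in the text.
DISJOINT: Set[str] = {"Panel", "Balloon", "Tail", "TextLine", "Character"}

# property -> (domain, range); the Panel domain of hasBalloon and
# hasCharacter is an assumption, see the lead-in above.
PROPERTIES: Dict[str, Tuple[str, str]] = {
    "hasPanel": ("Page", "Panel"),
    "hasBalloon": ("Panel", "Balloon"),
    "hasCharacter": ("Panel", "Character"),
    "hasTextLine": ("Balloon", "TextLine"),
}


class MiniOntology:
    """Hypothetical in-memory store of typed instances and links."""

    def __init__(self) -> None:
        self.types: Dict[str, Set[str]] = {}
        self.triples: List[Tuple[str, str, str]] = []

    def add_instance(self, name: str, concept: str) -> None:
        """Type an element, refusing two disjoint concepts for the same element."""
        current = self.types.get(name, set())
        if len((current | {concept}) & DISJOINT) > 1:
            raise ValueError(f"{name} cannot be an instance of two disjoint concepts")
        self.types.setdefault(name, set()).add(concept)

    def link(self, subject: str, prop: str, obj: str) -> None:
        """Add a link after checking the domain and range of the property."""
        domain, range_ = PROPERTIES[prop]
        if domain not in self.types.get(subject, set()):
            raise TypeError(f"{subject} is not a {domain}")
        if range_ not in self.types.get(obj, set()):
            raise TypeError(f"{obj} is not a {range_}")
        self.triples.append((subject, prop, obj))
```

A real ontology reasoner would infer and check these constraints declaratively; the explicit checks here only make the constraints visible.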

Page 92

5.2. Proposed models 67

Specialisation of the content. The semantic level of the concepts presented so far
remains fairly coarse; here we present how to refine them.
Balloons can be divided into two subsets according to whether or not they relate to a
comic character: on the one hand, balloons emitted by characters (speech or thought)
or by elements of the scene (radio, television, etc.), and on the other hand, narrative
balloons. The shape of the speech balloon varies from one author to another. One feature
that seems to be agreed upon for discriminating narrative balloons from the others is the
presence or absence of a tail pointing to the source of the sound. The concepts
SpeechBalloon and NarrativeBalloon are introduced to represent balloons with and
without a tail, respectively.

The semantics of the text lines in these newly specialised balloons can then be refined
accordingly. Some text lines carry elements of speech, while others serve the
storytelling. The two corresponding concepts are SpokenTextLine and NarrativeTextLine,
respectively. They are simply defined as text lines belonging to an instance of
SpeechBalloon or NarrativeBalloon.

Speech balloons are usually issued by a character present in the panel. The link
between a character and a speech balloon is expressed through the property says,
whose domain is Character and whose range is SpeechBalloon. The concept Speaker
represents a character that emits a speech balloon.
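The specialisation rules above (tail or no tail, spoken or narrative text lines, speakers) can be sketched as follows. This is only an illustration under stated assumptions: the `Balloon` class, its `has_tail` flag and the helper functions are hypothetical names, and the `says` relation is represented as a plain list of (character, balloon) pairs rather than as an ontology property.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Balloon:
    """Hypothetical balloon record; has_tail would come from tail detection."""
    name: str
    has_tail: bool
    text_lines: List[str] = field(default_factory=list)


def balloon_concept(balloon: Balloon) -> str:
    """A balloon with a tail is a SpeechBalloon, otherwise a NarrativeBalloon."""
    return "SpeechBalloon" if balloon.has_tail else "NarrativeBalloon"


def text_line_concept(balloon: Balloon) -> str:
    """Text lines inherit their specialisation from the balloon they belong to."""
    if balloon_concept(balloon) == "SpeechBalloon":
        return "SpokenTextLine"
    return "NarrativeTextLine"


def speakers(says: List[Tuple[str, Balloon]]) -> Set[str]:
    """A Speaker is a character linked by 'says' to a speech balloon."""
    return {
        character
        for character, balloon in says
        if balloon_concept(balloon) == "SpeechBalloon"
    }
```

Pairs whose balloon has no tail are ignored by `speakers`, since the range of says is SpeechBalloon.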

Figure 5.6 illustrates the relations introduced for the concepts Balloon, TextLine
and Character.

Figure 5.6: Specification of the concepts Character, Balloon and TextLine

5.2.3 Model interactions

In this section we present the interactions between the image and comics ontologies so
that they can communicate and combine their reasoning capabilities. We call O_image
and O_comics the image and comics ontologies presented in Sections 5.2.1 and 5.2.2,
respectively.

Page 182


Segmentation et indexation d’objets complexes dans les images de bandes dessinées

Résumé:

In this thesis manuscript, we detail and illustrate the scientific challenges related to the automatic analysis
of comic book images, so as to give the reader all the elements concerning the latest scientific advances in the
field as well as the current scientific bottlenecks.

We propose three approaches for comic book image analysis, each composed of several processing steps.
The first approach is called “sequential” because the image content is described progressively and
intuitively. In this approach, the elements are extracted one after another, starting with the simplest ones, such as
panels, text and balloons, which are then used to guide the extraction of complex elements such as balloon
tails and the characters within the panels. The second approach proposes extractions that are independent of
one another so as to avoid error propagation between processing steps. Other elements, such as
balloon type classification and text recognition, are associated with it. The third approach introduces a
system based on an a priori knowledge base of the content of comic book images, which makes it possible to
build a semantic description of the image. This system, driven by knowledge models, combines
the advantages of the two previous approaches and enables a high-level semantic description that can
include information such as the reading order of panels, text and balloons, the relations between balloons
and their speakers, and the distinction between characters.

Keywords: image processing, pattern recognition, document analysis, comics understanding.

Segmentation and indexation of complex objects in comic book images

Summary:

In this thesis, we review, highlight and illustrate the challenges related to comic book image analysis in order to
give the reader a good overview of the latest research progress in this field and of the current issues.

We propose three different approaches for comic book image analysis, each composed of several processing steps.
The first approach is called “sequential” because the image content is described in an intuitive way, from simple
to complex elements, using previously extracted elements to guide further processing. Simple elements such as
panels, text and balloons are extracted first, followed by the balloon tail and then the comic character position in
the panel. The second approach addresses independent information extraction to overcome the main drawback
of the first approach: error propagation. This second method is called “independent” because it is composed
of several specific extractors, one for each element of the image, without any dependence between them. Extra
processing such as balloon type classification and text recognition is also covered. The third approach introduces
a knowledge-driven and scalable system for comics image understanding. This system, called an “expert system”, is
composed of an inference engine and two models, one for the comics domain and another for image processing,
stored in an ontology. This expert system combines the benefits of the first two approaches and enables high-level
semantic description such as the reading order of panels and text, the relations between the speech balloons and
their speakers, and the comic character identification.

Keywords: image processing, pattern recognition, document analysis, comics understanding.

Laboratoire L3i - Informatique, Image, Interaction

Pôle Sciences et Technologies, Université de La Rochelle,
avenue Michel Crépeau

17042 La Rochelle - Cedex 01 - France
