Tolga Kayadelen
Authored Publications
A Gold Standard Dependency Treebank for Turkish
Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association (2020), pp. 5156-5163
We introduce TWT, a new treebank for Turkish that consists of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech and dependency relations. To date, it is the largest publicly available human-annotated morpho-syntactic Turkish treebank in terms of annotated word count. It is also the first large Turkish dependency treebank that has a dedicated Wikipedia section. We present the tagsets and the methodology used in annotating the treebank, as well as the results of baseline experiments on Turkish dependency parsing with this treebank.
The Practical Challenges of Active Learning: A Case Study from Live Experimentation
Jean-François Kagy
ICML Workshop on Human In the Loop Learning (2019)
We tested, in a production setting, the use of active learning for selecting text documents for human annotations used to train a Thai segmentation machine learning model. In our study, two concurrent annotated samples were constructed, one through random sampling of documents from a text corpus, and the other through model-based scoring and ranking of documents from the same corpus. We observed that several of the assumptions forming the basis of offline (simulated) evaluation largely failed in the live setting. We present these challenges and propose guidelines addressing each of them which can be used for the design of live experimentation of active learning, and more generally for the application of active learning in live settings.
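As a rough illustration of the two sampling arms described in this abstract, the sketch below contrasts random selection with score-based ranking. It is not the authors' implementation: the function names, the `scores` mapping, and the convention that a higher score marks a more informative document are all assumptions made for the example.

```python
import random


def select_random(doc_ids, k, seed=0):
    """Random arm: draw k distinct documents uniformly from the corpus."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(doc_ids, k)


def select_by_score(scores, k):
    """Model-based arm: rank documents by score and keep the top k.

    `scores` maps a document id to a hypothetical model score;
    ties are broken by document id so the ranking is deterministic.
    """
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:k]


# Toy corpus with made-up scores, for illustration only.
toy_scores = {"doc_a": 0.1, "doc_b": 0.9, "doc_c": 0.5}
random_arm = select_random(list(toy_scores), k=2)
ranked_arm = select_by_score(toy_scores, k=2)
```

Both arms draw from the same corpus, which is what lets the two annotated samples be compared afterwards.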
A Syntactically Expressive Morphological Analyzer for Turkish
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, Association for Computational Linguistics, Dresden, Germany (2019), pp. 65-75
We present a broad-coverage model of Turkish morphology and an open-source morphological analyzer that implements it. The model captures intricacies of the Turkish morphology-syntax interface and could thus be used as a baseline that guides language model development. It introduces a novel fine part-of-speech tagset and a fine-grained affix inventory, and it represents morphotactics without zero-derivations. The morphological analyzer is freely available. It consists of modular, reusable components built on human-annotated gold standard lexicons, and it implements Turkish morphotactics as finite-state transducers using OpenFst and morphophonemic processes as Thrax grammars.
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman
Martin Popel
Milan Straka
Jan Hajic
Joakim Nivre
Filip Ginter
Juhani Luotolahti
Sampo Pyysalo
Slav Petrov
Martin Potthast
Francis Tyers
Elena Badmaeva
Memduh Gokırmak
Anna Nedoluzhko
Silvie Cinkova
Jan Hajic jr.
Jaroslava Hlava
Vaclava Kettnerov
Zdenka Ure
Jenna Kanerva
Stina Ojala
Anna Missil
Christopher Manning
Sebastian Schuster
Siva Reddy
Dima Taji
Nizar Habash
Herman Leung
Marie-Catherine de Marneffe
Manuela Sanguinetti
Maria Simi
Hiroshi Kanayama
Valeria de Paiva
Kira Droganova
Hector Martinez Alonso
Çağrı Çöltekin
Umut Sulubacak
Hans Uszkoreit
Vivien Macketanz
Aljoscha Burchardt
Kim Harris
Katrin Marheinecke
Georg Rehm
Mohammed Attia
Ali Elkahky
Zhuoran Yu
Emily Pitler
Saran Lertpradit
Michael Mandl
Jesse Kirchner
Hector Fernandez Alcalde
Jana Strnadova
Esha Banerjee
Ruli Manurung
Antonio Stella
Atsuko Shimada
Sookyoung Kwak
Gustavo Mendonça
Tatiana Lando
Rattima Nitisaroj
Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2017), pp. 1-19
The aim of this document is to provide a list of dependency tags that are to be used for the Arabic dependency annotation task, with examples provided for each tag. The dependency representation is a simple description of the grammatical relationships in a sentence. It represents all sentence relations uniformly typed as dependency relations. The dependencies are all binary relations between a governor (also known as the head) and a dependant (any complement of or modifier to the head).
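The binary governor-dependant relation described in this abstract can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions: the `Dependency` class, the `dependents_of` helper, the English example sentence, the index 0 standing for an artificial root, and the labels `root`, `nsubj` and `obj` are all hypothetical and are not taken from the tag list the abstract refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One typed binary relation between a governor and a dependant."""
    head: int       # position of the governor (0 = artificial root, an assumption)
    dependent: int  # position of the dependant token, counted from 1
    label: str      # relation type (illustrative labels only)


def dependents_of(arcs, head):
    """Return (position, label) pairs for every dependant of `head`, in sentence order."""
    found = [(arc.dependent, arc.label) for arc in arcs if arc.head == head]
    return sorted(found)


# Hypothetical analysis of "She reads books".
tokens = ["She", "reads", "books"]
arcs = [
    Dependency(head=0, dependent=2, label="root"),   # "reads" heads the sentence
    Dependency(head=2, dependent=1, label="nsubj"),  # "She" depends on "reads"
    Dependency(head=2, dependent=3, label="obj"),    # "books" depends on "reads"
]
verb_dependents = dependents_of(arcs, head=2)
```

Because every relation has the same shape (head, dependant, label), the whole sentence is described uniformly by a flat list of arcs.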