Tolga Kayadelen

Authored Publications
    A Gold Standard Dependency Treebank for Turkish
    Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association (2020), pp. 5156-5163
    We introduce TWT, a new treebank for Turkish that consists of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech and dependency relations. To date, it is the largest publicly available human-annotated morpho-syntactic Turkish treebank in terms of annotated word count. It is also the first large Turkish dependency treebank that has a dedicated Wikipedia section. We present the tagsets and the methodology used in annotating the treebank, as well as the results of baseline experiments on Turkish dependency parsing with this treebank.
    We tested, in a production setting, the use of active learning for selecting text documents for human annotations used to train a Thai segmentation machine learning model. In our study, two concurrent annotated samples were constructed, one through random sampling of documents from a text corpus, and the other through model-based scoring and ranking of documents from the same corpus. We observed that several of the assumptions forming the basis of offline (simulated) evaluation largely failed in the live setting. We present these challenges and propose guidelines addressing each of them which can be used for the design of live experimentation of active learning, and more generally for the application of active learning in live settings.
    A Syntactically Expressive Morphological Analyzer for Turkish
    Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, Association for Computational Linguistics, Dresden, Germany (2019), pp. 65-75
    We present a broad-coverage model of Turkish morphology and an open-source morphological analyzer that implements it. The model captures intricacies of the Turkish morphology-syntax interface and thus could be used as a baseline that guides language model development. It introduces a novel fine part-of-speech tagset and a fine-grained affix inventory, and represents morphotactics without zero-derivations. The morphological analyzer is freely available. It consists of modular, reusable components built on human-annotated gold standard lexicons, and implements Turkish morphotactics as finite-state transducers using OpenFst and morphophonemic processes as Thrax grammars.
    CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
    Daniel Zeman
    Martin Popel
    Milan Straka
    Jan Hajic
    Joakim Nivre
    Filip Ginter
    Juhani Luotolahti
    Sampo Pyysalo
    Slav Petrov
    Martin Potthast
    Francis Tyers
    Elena Badmaeva
    Memduh Gökırmak
    Anna Nedoluzhko
    Silvie Cinkova
    Jan Hajic jr.
    Jaroslava Hlaváčová
    Václava Kettnerová
    Zdeňka Urešová
    Jenna Kanerva
    Stina Ojala
    Anna Missilä
    Christopher Manning
    Sebastian Schuster
    Siva Reddy
    Dima Taji
    Nizar Habash
    Herman Leung
    Marie-Catherine de Marneffe
    Manuela Sanguinetti
    Maria Simi
    Hiroshi Kanayama
    Valeria de Paiva
    Kira Droganova
    Hector Martinez Alonso
    Çağrı Çöltekin
    Umut Sulubacak
    Hans Uszkoreit
    Vivien Macketanz
    Aljoscha Burchardt
    Kim Harris
    Katrin Marheinecke
    Georg Rehm
    Mohammed Attia
    Ali Elkahky
    Zhuoran Yu
    Emily Pitler
    Saran Lertpradit
    Michael Mandl
    Jesse Kirchner
    Hector Fernandez Alcalde
    Jana Strnadova
    Esha Banerjee
    Ruli Manurung
    Antonio Stella
    Atsuko Shimada
    Sookyoung Kwak
    Gustavo Mendonça
    Tatiana Lando
    Rattima Nitisaroj
    Josie Li
    Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2017), pp. 1-19
    The aim of this document is to provide a list of dependency tags that are to be used for the Arabic dependency annotation task, with examples provided for each tag. The dependency representation is a simple description of the grammatical relationships in a sentence. It represents all sentence relations uniformly as typed dependency relations. The dependencies are all binary relations between a governor (also known as the head) and a dependent (any complement of or modifier to the head).