Poio API in linguistic analysis

Example of word order analysis with Poio API

This week we published new versions of our software libraries Poio API (v. 0.3.6) and graf-python (v. 0.3.1). One use case for Poio API is linguistic analysis, because Poio API provides unified access to a variety of file formats such as Elan’s EAF, Toolbox .TXT/.XML and CSV files. This lets the linguist focus on research questions instead of on managing file formats and data.

To test Poio API in a real-world project, we are collaborating with Diana Forker, who is currently working on the projects “Agreement in Discourse” and “Documenting Dargi languages in Daghestan” at the University of Bamberg. Diana collected and annotated several stories in Hinuq and Avar, two languages of the Nakh-Daghestanian language family. To annotate the data, Diana developed her own customized workflow, with the final GRAID annotation done in Excel. We could easily export the Excel data to a CSV file and start analyzing it with the help of Poio API.
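To illustrate what the data looks like after the CSV export, here is a minimal sketch in plain Python. It does not use Poio API itself, and it assumes a hypothetical CSV layout with one row per word and the columns `clause_id`, `word` and `graid`; the real spreadsheet may be organized differently, and the words are placeholders, not Hinuq data. The hypothetical helper `read_clauses` groups the GRAID tags by clause while preserving word order:

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export: one row per word, with the clause it belongs to
# and its GRAID annotation. Column names and words are placeholders.
SAMPLE_CSV = """clause_id,word,graid
1,word1,np:a
1,word2,np:p
1,word3,v:pred
2,word4,np:s
2,word5,v:pred
"""

def read_clauses(csv_file):
    """Group the GRAID tags of each word by clause id, keeping word order."""
    clauses = defaultdict(list)
    for row in csv.DictReader(csv_file):
        clauses[row["clause_id"]].append(row["graid"])
    return dict(clauses)

clauses = read_clauses(io.StringIO(SAMPLE_CSV))
print(clauses)
```

For a real export, `io.StringIO(SAMPLE_CSV)` would be replaced by a file opened with `open(path, newline="")`, as the `csv` module recommends.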

One analysis from this cooperation is now available online as an IPython notebook. The document combines research questions (as headings, by Diana), Python code, and the results of the quantitative analysis (by Peter Bouda at CIDLeS). It can be viewed online here:

http://nbviewer.ipython.org/github/pbouda/notebooks/blob/master/Diana%20Hinuq%20Word%20Order.ipynb

The analysis mainly consists of summarized counts and statistical tests of the annotations on different tiers. This is work in progress, but it already demonstrates that Poio API can be useful for analyzing data from language documentation. Any feedback is very welcome, either as a comment on this blog post or by e-mail to Peter (pbouda@cidles.eu). More information about Poio API, further use cases, and examples are available in the official documentation:
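To give an idea of what such counts and tests can look like for word order, here is a minimal sketch in plain Python, again without any Poio API calls. It assumes that clauses are already available as ordered lists of GRAID tags, treats the GRAID functions `:a`, `:p` and `:pred` as transitive subject (A), object (P) and verb (V), and uses toy placeholder clauses rather than the actual Hinuq data. The hypothetical helpers `order_pattern` and `chi_square_uniform` derive the A/P/V order of each clause and compute a chi-square goodness-of-fit statistic against equal frequencies:

```python
from collections import Counter

# Toy placeholder clauses, each an ordered list of GRAID tags (not real data).
CLAUSES = [
    ["np:a", "np:p", "v:pred"],
    ["np:p", "np:a", "v:pred"],
    ["np:a", "np:p", "v:pred"],
    ["np.h:a", "v:pred", "np:p"],
]

# Assumed mapping from GRAID syntactic functions to word-order labels.
ROLES = {"a": "A", "p": "P", "pred": "V"}

def order_pattern(tags):
    """Return the relative order of A, P and V in a clause, e.g. 'APV'."""
    pattern = ""
    for tag in tags:
        function = tag.split(":")[-1]  # "np.h:a" -> "a"
        if function in ROLES and ROLES[function] not in pattern:
            pattern += ROLES[function]
    return pattern

def chi_square_uniform(counts):
    """Chi-square statistic of the observed counts against equal expected counts."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts.values())

# Keep only clauses that contain all three of A, P and V.
patterns = Counter(p for p in map(order_pattern, CLAUSES) if len(p) == 3)
print(patterns.most_common())
print(chi_square_uniform(patterns), "df =", len(patterns) - 1)
```

This statistic only compares the orders that actually occur; a real analysis would include all six possible orders and would need larger expected counts (a common rule of thumb is at least 5 per cell) before relying on the chi-square approximation. For a p-value, `scipy.stats.chisquare` computes the same statistic from a list of observed counts.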

https://poio-api.readthedocs.org/en/latest/
