A central puzzle in linguistic theory is how humans map speech, which is highly variable, onto abstract linguistic categories. I explore the extent to which phonological and phonetic models, together with machine learning methods such as deep neural architectures, can offer insights into the cognitive foundations of speech that make this mapping possible. Below, I organize my research into four projects.

The Phonology of Prosody

Prosody refers to the melodic patterns of speech, whose units are characterized by continuous dynamic properties. A long-standing problem in the phonology of prosody is to provide a formal definition of pitch accents, the units that make up the melodic pattern of speech (intonation). A related problem in phonological theory is how pitch accents are timed with respect to vowels and consonants, and several hypotheses have been proposed to define the timing of pitch accents relative to the segmental structure. In Themistocleous (2016), I explored the timing of the Cypriot Greek L*+H prenuclear pitch accent to test the predictions of three hypotheses: the invariance hypothesis, the segmental anchoring hypothesis, and the segmental anchorage hypothesis. I used two experiments: the first manipulates the syllable patterns of the stressed syllable, and the second modifies the distance of the L*+H from the following pitch accent. The findings on the alignment of the low tone (L) corroborate the predictions of the segmental anchoring hypothesis: the L consistently aligns inside the onset consonant, a few milliseconds before the stressed vowel. However, the findings on the alignment of the high tone (H) are intriguing and unexpected: the alignment of the H depends on the number of unstressed syllables that follow the prenuclear pitch accent. This ‘wandering’ of the H over multiple syllables is extremely rare across languages. It casts doubt on the invariance hypothesis and the segmental anchoring hypothesis, and it indicates the need for a modified version of the segmental anchorage hypothesis. To account for the alignment of the H, I suggest that it aligns within a segmental anchorage, the area that follows the prenuclear pitch accent, in such a way as to protect the paradigmatic contrast between the L*+H prenuclear pitch accent and the L+H* nuclear pitch accent.

Another relevant issue in the phonology of intonation is the formal description of the pitch accents that make up the phonemic inventory of a language or dialect. Themistocleous (2011b) provides the first description of Cypriot Greek pitch accents and compares their production with those of Standard Modern Greek. Specifically, the study provides a model of the prenuclear and nuclear pitch accents of Cypriot Greek. It also shows that there is significant dialectal variation in the realization of pitch accents with respect to their alignment and pitch range. Overall, the study provides a model of the way pitch accents manifest categories of information structure, namely information focus, topic, contrastive focus, and contrastive topic (see also Themistocleous, 2012).

Prosodic structure refers to the phonological constituents of the prosodic hierarchy, such as the syllable, the foot, the phonological word, the intermediate phrase, and the intonational phrase. These phonological constituents account for various phenomena that take place at their edges; one of these phenomena is final lengthening. In Themistocleous (2014), I conducted two experiments that investigate the interaction of edge-tones and final lengthening. The study shows that in Cypriot Greek the following occur: (a) lengthening applies primarily to the syllable nucleus, not the syllable onset, which suggests variety-specific effects of lengthening; (b) lengthening depends on the edge-tones, namely, polar questions trigger more lengthening than statements and wh-questions; (c) lengthening provides support for at least two distinct prosodic domains above the phonological word, the intonational phrase and the intermediate phrase; greater lengthening is associated with the former and less lengthening with the latter; (d) finally, syllable duration depends on the syllable's distance from the boundary, i.e., lengthening applies locally to penultimate and ultimate syllables, whereas antepenultimates are affected the least. Additionally, by pointing to the distinct lengthening effects of edge-tones and domain boundaries, the findings provide evidence for distinct lengthening devices.

Phonetics and Phonology of Vowels and Consonants

Vowels and consonants are the basic units of speech, as they make up the segmental composition of an utterance. Their acoustics are extremely complex because their acoustic manifestation varies greatly depending on the physiological properties of the speaker, the language variety, and the segmental context. For instance, assimilation and dissimilation processes apply depending on the preceding and following sound. In several studies, I explore how dialects affect the acoustic structure of consonants and vowels.

In Themistocleous (2017), I investigate the acoustic properties of the vowels of Standard Modern Greek (SMG) and Cypriot Greek (CG). The study shows that the two varieties differ in their vowels. Specifically, (1) stressed vowels are more peripheral than unstressed vowels, (2) SMG unstressed /i u/ vowels are more raised than the corresponding CG vowels, (3) SMG unstressed vowels are shorter than CG unstressed vowels, and (4) SMG /i o u/ are more rounded than the corresponding CG vowels. Moreover, it shows that variation applies to specific subsystems: it is the unstressed vowels that vary cross-varietally, whereas the stressed vowels display only minor differences. The implications of these findings with respect to vowel raising and vowel reduction are discussed.

Another issue that I explore is vowel dynamics. In a submitted article, I show that Standard Modern Greek and Cypriot Greek differ in their vowel dynamics. The findings show that although Modern Greek vowels are considered relatively monophthongal, their formants change significantly during articulation. Low and mid vowels shift from low to high; the high vowel [i] remains relatively static, and the vowel [u] becomes more fronted. Most importantly, the vowel quality, the stress, and the dialect of the speaker significantly affect the formant dynamics. The study argues that dynamic approaches can constitute a significant methodological improvement over static approaches to vowels.

In a short article published in the Journal of the Acoustical Society of America (JASA), I investigated the effects of the dialect of the speaker on the spectral properties of stop bursts. The findings show that besides linguistic information, i.e., the place of articulation and the stress, the speech signals of bursts can encode social information, i.e., the dialect. A classification model using decision trees showed that skewness and standard deviation contribute most to the classification of bursts across dialects.
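Skewness and standard deviation are spectral moments: they describe the shape of a spectrum when its power values are treated as a probability distribution over frequency. The following is a minimal sketch of that computation, not the procedure used in the article. The function name `spectral_moments` and the frequency and power values are hypothetical, and the windowing and spectrum estimation steps that would precede it are assumed to have been done already.

```python
import math

def spectral_moments(freqs, power):
    """Return (centroid, standard deviation, skewness) of a power spectrum.

    The power values are normalized to sum to 1, so the spectrum is treated
    as a probability distribution over the frequencies in `freqs`.
    """
    total = sum(power)
    probs = [p / total for p in power]
    # First moment: the centre of gravity of the spectrum.
    centroid = sum(f * p for f, p in zip(freqs, probs))
    # Second central moment: spread of energy around the centroid.
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, probs))
    sd = math.sqrt(variance)
    # Third standardized moment: asymmetry of the energy distribution.
    skewness = sum((f - centroid) ** 3 * p for f, p in zip(freqs, probs)) / sd ** 3
    return centroid, sd, skewness

# Hypothetical burst spectrum: frequencies in Hz and made-up power values.
freqs = [1000.0, 2000.0, 3000.0, 4000.0]
power = [4.0, 3.0, 2.0, 1.0]
centroid, sd, skewness = spectral_moments(freqs, power)
print(centroid, sd, skewness)
```

In this toy spectrum the energy is concentrated at the lower frequencies with a tail toward the higher ones, so the skewness is positive. Values like these, computed per burst, are the kind of features a decision tree could split on.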

In further submitted articles, I explore the dynamic properties of Greek vowels and show that vowel dynamics convey dialectal information even in languages such as Greek, whose vowels are relatively monophthongal and lack significant transitions. I also explore the acoustics of fricatives and their coarticulatory properties with the following vowels. See Themistocleous, Charalambos (submitted/a).

From Phonetics and Phonology to Written Representations

Cypriot Greek is a language variety that has no official writing system. Phonetics and phonology can inform the way we write non-standardized varieties such as Cypriot Greek. Armosti, Christodoulou, Katsoyannou, & Themistocleous (2014) is a proposal for the codification of Cypriot Greek (graphemes and writing rules), which was incorporated in the dictionary (Themistocleous, 2017; Themistocleous, Katsoyannou, Armosti, & Christodoulou, 2012a, 2012b).

The International Phonetic Alphabet (IPA) is a system for representing the sounds of the world's languages as produced by typical and atypical speakers. Themistocleous (2011a) presents a computational phonological grammar and computer application that converts text written in Standard Modern Greek and Cypriot Greek into IPA. The software application is titled IPAGreek, and its phonological grammar contains both the lexical and postlexical phonological rules of Greek (e.g., assimilation, dissimilation, palatalization). The application has been employed for the transcription of the pronunciation in the Cypriot Greek dictionary mentioned above.

Speech and Machine Learning

Deep neural networks, SVMs, and HMMs are behind the impressive achievements in text-to-speech, speech perception, machine translation, image description generation, and semantic interpretation. They can also offer insights into human perception and cognition, namely how humans process, store, and retrieve information from speech signals, texts, vision, etc. I explore these architectures to understand how humans map speech, which is highly variable, onto abstract linguistic categories.

Specifically, in Themistocleous (forthcoming), I provide a classification model of two Modern Greek dialects, namely Athenian Greek and Cypriot Greek, using information from the formant dynamics of F1, F2, F3, and F4 and from vowel duration. The measurements were employed in classification experiments using three classifiers: Linear Discriminant Analysis, Flexible Discriminant Analysis, and C5.0. C5.0 outperformed the other two classifiers, yielding a higher classification accuracy for dialect. The C5.0 classification shows that duration and the zeroth coefficient of F2, F3, and F4 contribute more to the classification of dialect than the other measurements; it also shows that formant dynamics are important for the classification of dialect.
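To illustrate what a "zeroth coefficient" of a formant can look like, the sketch below summarizes a formant trajectory with its first few discrete cosine transform (DCT) coefficients. This is an assumption made for illustration only: the text above does not state which parameterization the study used, and the function name `dct_coefficients` and the F2 values are hypothetical. Under the scaling chosen here, the zeroth coefficient equals the mean of the trajectory, while higher coefficients capture its slope and curvature.

```python
import math

def dct_coefficients(trajectory, n_coeffs=3):
    """Summarize a formant trajectory with its first few DCT-II coefficients.

    Each coefficient is divided by the number of samples, so coefficient 0
    equals the mean of the trajectory (an illustrative scaling choice).
    """
    n = len(trajectory)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(
            value * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, value in enumerate(trajectory)
        )
        coeffs.append(total / n)
    return coeffs

# Hypothetical F2 values (Hz) sampled at equal intervals across one vowel.
f2_track = [1400.0, 1450.0, 1500.0, 1550.0, 1600.0]
c0, c1, c2 = dct_coefficients(f2_track)
print(c0, c1, c2)
```

For this steadily rising track, the zeroth coefficient is the mean F2, the first coefficient is negative (the first cosine basis function falls over time, so a rising trajectory correlates negatively with it), and the second is close to zero because the track has no curvature. A feature vector for a classifier could then combine such coefficients for each formant with vowel duration.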

I am excited to conduct research that combines seemingly distinct research fields (phonology, sociophonetics, signal processing, statistics, and machine learning), because I believe that new knowledge arises at the intersections of research fields. The findings can have promising applications in conversational agents and speech technology. They also address issues in typical and atypical speech production and perception. For instance, the models can be implemented in applications for the diagnosis of atypical speech production (e.g., forms of aphasia, SLI, etc.).

For a complete list of my publications please see my CV.

