Auslautverhärtung: They Sound the Same to Me
by Claire Lutrick
Final devoicing, or Auslautverhärtung in German, is the linguistic phenomenon whereby voiced obstruents in syllable-final and word-final positions are devoiced. This paper analyzes three English L1, German L2 speakers at differing levels of fluency and one German L1, English L3 speaker to ascertain the degree of final devoicing in their speech. Because German has Auslautverhärtung and English does not, it is difficult for German L2 speakers to acquire the ability to produce Auslautverhärtung in their speech. Furthermore, because it is not present in English, German L2 speakers are often entirely unaware that it exists, and this lack of knowledge entrenches its absence. This paper argues that Auslautverhärtung becomes more prominent as one progresses along the scale of speaking ability from nonnative speaker to native-like speaker, heritage speaker, and native speaker. Final voice onset time (VOT final) is measured with the software Praat and recorded in milliseconds in order to obtain precise durations of individual sounds.
The current work is a study of variation in spoken German, assessing patterns in the realization of the consonants /d/, /t/, /g/, and /k/ through close examination of minimal pairs, such as bang and bank in English. Whereas these words are audibly different in English, the same words are audibly identical in German. The difference between voiced and voiceless consonants can be determined by measuring voice onset time (VOT). Final devoicing, or Auslautverhärtung in German, is the linguistic phenomenon whereby voiced obstruents (/d/, /g/) in syllable-final position are devoiced.
This paper analyzes three English L1, German L2 speakers at differing levels of fluency and one German L1, English L2 speaker to ascertain the degree of final devoicing in their speech. Because German uses Auslautverhärtung and English does not, it is difficult for German L2 speakers to learn to realize and apply Auslautverhärtung in their speech. Furthermore, because it is not present in English, German L2 speakers are often entirely unaware that it exists, and this lack of knowledge entrenches its absence. Results from this study add support to the hypothesis that nonnative speakers apply Auslautverhärtung less than native-like speakers, heritage speakers, and native speakers.
To determine whether Auslautverhärtung is present, the expected VOT values of voiceless and voiced obstruents in the two languages are compared. Voiceless obstruents in both English and German are expected to fall within the 50-80 ms range, whereas voiced obstruents in both languages are expected to fall within the 0-20 ms range. Auslautverhärtung neutralizes this distinction: all German obstruents in syllable-final (coda) position are expected to fall within the 50-80 ms range; that is, voiced obstruents are devoiced. To obtain precise durations of individual sounds, VOT final times are measured and recorded in milliseconds.
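The range-based comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this study: the function name `classify_vot` and its label strings are hypothetical, and the boundaries simply encode the 0-20 ms and 50-80 ms ranges stated in this paper, with values between them labeled "intermediate".

```python
def classify_vot(vot_ms: float) -> str:
    """Label a VOT final measurement (in ms) using the expected ranges in this paper.

    Hypothetical helper: 0-20 ms = voiced, 50-80 ms = voiceless,
    anything strictly between 20 and 50 ms = intermediate (neither range).
    """
    if 0.0 <= vot_ms <= 20.0:
        return "voiced"
    if 20.0 < vot_ms < 50.0:
        return "intermediate"
    if 50.0 <= vot_ms <= 80.0:
        return "voiceless"
    # Negative values or values above 80 ms fall outside both expected ranges.
    return "outside expected ranges"


if __name__ == "__main__":
    # 29.3 ms is the value reported later for Speaker 1's token 3.
    print(classify_vot(29.3))  # intermediate
```

Under Auslautverhärtung, a native speaker's syllable-final /d/ or /g/ would be expected to fall in the "voiceless" band, just like final /t/ or /k/.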
This paper argues that Auslautverhärtung becomes more prominent as one progresses along the scale of speaking ability from nonnative speaker to native-like speaker, heritage speaker, and native speaker. Specifically, I expect Auslautverhärtung to be fully expressed in native speakers and rarely, if at all, in nonnative speakers. I anticipate that native-like speakers will apply it more than nonnative speakers but less than heritage speakers, who I expect to apply it more because of natural exposure and language acquisition from an early age.
Equipment and Methodology
Fluency in this study is gauged by the Goethe Institute certificate system, which follows the Common European Framework of Reference for Languages (CEFR), in which level A1 is novice and the highest level, C2, is taken here as the native-speaker benchmark. Speaker 1 is at the B1 level, Speaker 2 at the C1 level, and Speaker 3, the heritage speaker, at the B2 level. To collect data, word lists were read in the sentence frame Kannst du ___ für mich sagen? ‘Can you say ___ for me?’ in order to ensure consistency and provide a natural rhythm. Minimal pairs were also randomized within the word list so that speakers could not recognize a pattern that might skew the data. The freely available software package Praat was fundamental to the analysis of the data in this paper. Praat can record speech and provides an environment for analyzing components such as intonation, intensity, and volume, and for isolating individual sounds. Each sentence frame was recorded three times; in Praat, the three repetitions were isolated, and the cleanest and clearest token was chosen as the sample. The second repetition in each set was chosen most often, because it tended to sound more fluid and less forced. When the second repetition sounded forced or unnatural, the first or third was used instead, in order to maintain the integrity of the data collection and analysis.
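The reading-list procedure described above can be sketched as follows. This is a hypothetical reconstruction, not the author's actual list: the function `build_prompts`, the seed, the assumption that the list contains only the six target tokens (the paper does not say whether fillers were used), and the assumption that the three repetitions of each frame are read consecutively as one set are all illustrative choices.

```python
import random

# Sentence frame used in the study; "{}" marks the slot for the target word.
FRAME = "Kannst du {} für mich sagen?"


def build_prompts(words, repetitions=3, seed=None):
    """Return a randomized reading list with each word placed in the sentence frame.

    The word order is shuffled once; each word is then repeated
    `repetitions` times in a row so that its repetitions form one set.
    """
    rng = random.Random(seed)  # a seeded generator makes the order reproducible
    order = list(words)
    rng.shuffle(order)
    prompts = []
    for word in order:
        prompts.extend([FRAME.format(word)] * repetitions)
    return prompts


if __name__ == "__main__":
    # Assumption: the list contains only the six target tokens.
    targets = ["Rad", "Rat", "Bad", "Bat", "Bank", "Bang"]
    for line in build_prompts(targets, seed=2024):
        print(line)
```

Shuffling the words before repeating them keeps each set of three together, which matches the paper's practice of choosing the best token from within a set.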
In Praat, the data are displayed visually, as seen in Figure 1. The spectrogram makes voicing visible while the audio is played back: the waveform appears in the upper panel, and the grayscale spectrogram beneath it displays the voicing and devoicing characteristics. Using these audio and visual cues, the target consonants were isolated and their VOTs measured in milliseconds.
The differences in VOT final between the members of each minimal pair (tokens 1 and 2, tokens 3 and 4, and tokens 5 and 6) were calculated in order to quantify the contrast within each pair. These differences were used to determine whether each pair fell above or below the threshold of perception, that is, the smallest difference the human ear can perceive, and thus whether a listener would distinguish the two phonemes.
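The pairwise calculation described above can be expressed as a short Python sketch. The function name `pair_differences` is hypothetical, and the 20 ms default reflects the approximate threshold of perception used in this paper; a difference of exactly 20 ms is treated as not perceivable, following the paper's "20 ms or less" criterion.

```python
def pair_differences(tokens_ms, threshold_ms=20.0):
    """Compute VOT final differences for consecutive minimal pairs.

    tokens_ms lists measurements in token order (1, 2, 3, ...), so
    tokens 1 and 2 form the first pair, 3 and 4 the second, and so on.
    Returns (first_token, second_token, difference_ms, perceivable) tuples,
    where perceivable is True when the difference exceeds threshold_ms.
    """
    if len(tokens_ms) % 2 != 0:
        raise ValueError("expected an even number of tokens (complete minimal pairs)")
    results = []
    for i in range(0, len(tokens_ms), 2):
        diff = abs(tokens_ms[i] - tokens_ms[i + 1])  # absolute difference in ms
        results.append((i + 1, i + 2, round(diff, 1), diff > threshold_ms))
    return results
```

Passing one speaker's six measurements in token order returns three tuples, one per minimal pair.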
The results are shown in Table 1 below, which lists each token with its minimal-pair counterpart, the individual VOT final measurements, and the VOT final difference within each minimal pair. Speaker 1, the nonnative speaker, displayed VOT final differences for Rad/Rat (‘wheel/advice’) and Bad/Bat (‘bathroom/requested’) of 44.7 ms and 54.1 ms, respectively. The tokens Bank/Bang (‘bench/afraid’) display a VOT final difference of 12.3 ms. Speaker 2, the native-like speaker, displayed VOT final differences for the tokens Rad/Rat, Bad/Bat, and Bank/Bang of 19.5 ms, 20.9 ms, and 31.3 ms, respectively. Speaker 3, the heritage speaker, displayed VOT final differences for Rad/Rat and Bad/Bat of 4.9 ms and 10.4 ms, respectively, and a VOT final difference of 53.7 ms for Bank/Bang. Speaker 4, the native control, displayed VOT final differences for Rad/Rat, Bad/Bat, and Bank/Bang of 17.1 ms, 7.8 ms, and 6.9 ms, respectively.
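As a cross-check, the reported differences can be summarized per speaker in Python. The sketch below copies the values stated in the paragraph above; the dictionary layout and the `summarize` function are illustrative assumptions, and the strict "greater than 20 ms" test is one reading of the threshold of perception.

```python
from statistics import mean

# VOT final differences (ms) for each minimal pair, as reported above.
REPORTED_DIFFERENCES = {
    "Speaker 1 (nonnative)": {"Rad/Rat": 44.7, "Bad/Bat": 54.1, "Bank/Bang": 12.3},
    "Speaker 2 (native-like)": {"Rad/Rat": 19.5, "Bad/Bat": 20.9, "Bank/Bang": 31.3},
    "Speaker 3 (heritage)": {"Rad/Rat": 4.9, "Bad/Bat": 10.4, "Bank/Bang": 53.7},
    "Speaker 4 (native control)": {"Rad/Rat": 17.1, "Bad/Bat": 7.8, "Bank/Bang": 6.9},
}


def summarize(differences, threshold_ms=20.0):
    """Per speaker: mean difference (ms, 1 decimal) and pairs exceeding the threshold."""
    summary = {}
    for speaker, pairs in differences.items():
        above = [pair for pair, diff in pairs.items() if diff > threshold_ms]
        summary[speaker] = (round(mean(list(pairs.values())), 1), above)
    return summary


if __name__ == "__main__":
    for speaker, (avg, above) in summarize(REPORTED_DIFFERENCES).items():
        print(f"{speaker}: mean {avg} ms; above 20 ms: {', '.join(above) or 'none'}")
```

With a strict cutoff, Speaker 2's Bad/Bat difference (20.9 ms) counts as above 20 ms, whereas the discussion treats it as lying at the threshold; a tolerance band around 20 ms would be a reasonable alternative design.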
Discussion and Analysis
With few exceptions, the data matched my hypothesis. The threshold of perception, approximately 20 ms, is the smallest difference in duration that the human ear can perceive between two sounds. Minimal pairs were used in this study because they isolate differences between sounds: they differ in only one dimension, and the minimal pairs in this study differ by a single phoneme, the final one. The VOT final difference between the members of a minimal pair therefore indicates whether a perceivable difference exists between the two consonants. A difference of 20 ms or less indicates the absence of final voicing, that is, neutralization of the distinction between voiced and voiceless obstruents in final position. For German syllable-final obstruents, a difference greater than 20 ms means that the human ear perceives a contrast in VOT final duration that is absent from native German speech, so the pronunciation is English-like.
In Speaker 1’s data, token 1 falls within the 0-20 ms range expected of a voiced obstruent, which represents an English-like pronunciation not present in German. Its minimal-pair counterpart, token 2, falls within the 50-80 ms range, which is the expected VOT for both English and German. Tokens 3 and 4, also a minimal pair, displayed similar data, except that token 3 measured 29.3 ms, placing it above the range expected in English and below the range expected in German. This suggests that the speaker’s interlanguage resembles neither purely English nor purely German pronunciation, but falls somewhere in between. Tokens 5 and 6 show a smaller VOT final difference, below 20 ms, and thus no perceivable difference. However, both final obstruents are voiced, which does not match the control data.
In Speaker 2’s data, tokens 1, 2, 3, and 4 fall within the 50-80 ms range expected in German pronunciation. With VOT final differences of 19.5 ms and 20.9 ms, respectively, these pairs lie at the threshold of perception, which, again, is 20 ms. This is target-like and mirrors the control data. Tokens 5 and 6, however, are not target-like. While token 5 falls within the expected 50-80 ms range, token 6 falls above the expected English value and below the expected German value. Token 6 thus suggests that the speaker’s interlanguage resembles neither English nor German in this measurement.
In Speaker 3’s data, tokens 1, 2, 3, and 4 also fall within the 50-80 ms range, which is the expected value in German pronunciation. Both minimal pairs had VOT final differences below the threshold of perception, at 17.1 ms and 7.8 ms, respectively. Tokens 5 and 6, as with Speaker 2, show unexpected values: while token 5 falls within the expected range, token 6 falls within the range expected in English. This can presumably be attributed, in part, to unfamiliarity with the word Bang. Speaker 3 later disclosed that she had not known the German word Bang and was therefore unsure of its pronunciation.
In the data from Speaker 4, the native control, final devoicing is clearly evident. The clustered bar graph above shows the minimal pairs side by side, color-coded to display the differences in VOT final times, with Speaker 1’s data as the first set and the control data from Speaker 4 as the last.
According to the data collected, Auslautverhärtung is applied more frequently and consistently by native-like speakers, heritage speakers, and native speakers than by nonnative speakers. These data support my hypothesis. VOT final measurements are not exclusively indicative of native German speech, but they correlate with target-like German speech. Measuring VOT final made it possible to quantify the spoken data and to analyze it precisely.
A replication of this study that varies the speakers, tokens, and word lists would further the research on the use of Auslautverhärtung. Including speakers of regional dialects, such as those of Bavaria, Switzerland, and Austria, which are occasionally mutually unintelligible, would yield a more diverse data set with which to study regional applications of final devoicing. These additional variables would deepen our understanding of how L2 German speakers acquire syllable-final devoicing of voiced obstruents.
Works Cited
Boersma, P., and D. Weenink. Praat. Version 6.0.17, Softonic International, S.A., 2014, http://www.praat.org/.
“Education and Languages, Language Policy.” Council of Europe, 2014, http://www.coe.int/t/dg4/linguistic/cadre1_en.asp.
König, Ekkehard, and Volker Gast. Understanding English-German Contrasts. 3rd ed., Erich Schmidt Verlag, 2012.
Acknowledgments: I would like to thank Sandra Mcgury for the suggestions, feedback, and encouragement she offered while I worked on this project.
Citation style: MLA