You write to the narrator device by passing a narrator_rb I/O request to the device with CMD_WRITE set in io_Command, the number of bytes to be written set in io_Length, and the address of the write buffer set in io_Data.

    VoiceIO->message.io_Command = CMD_WRITE;
    VoiceIO->message.io_Offset  = 0;
    VoiceIO->message.io_Data    = PhonBuffer;
    VoiceIO->message.io_Length  = strlen(PhonBuffer);
    DoIO((struct IORequest *)VoiceIO);

You can control several characteristics of the speech, as indicated in the narrator_rb structure shown in the device interface section.

Generally, the narrator device attempts to speak in a non-regional dialect of American English. With pre-V37 versions of the device, the user could change only a few of the more basic aspects of the speaking voice, such as pitch, male/female, and speaking rate. With the V37 and later versions of the narrator device, the user can change many more aspects of the speaking voice. In addition, in the pre-V37 device, only mouth shape changes could be queried by the user. With the V37 device, the user can also receive start-of-word and start-of-syllable synchronization events. These events can be generated independently, giving the user much greater flexibility in synchronizing voice to animation or other effects.

The following describes the fields of the narrator_rb structure:

message.io_Data
    Points to a NULL-terminated ASCII phonetic input string. For backwards compatibility, the string may also be terminated with a "#" symbol. See the "How to Write Phonetically for Narrator" section of this chapter for details.

message.io_Length
    Length of the input string. The narrator device will parse the input string until either a NULL or a "#" is encountered, or until io_Length characters have been processed.

rate
    The speaking rate in words/minute. Range is from 40 to 400 wpm.

pitch
    The baseline pitch of the speaking voice. Range is 65 to 320 Hertz.

mode
    The F0 (pitch) mode.
    ROBOTICF0 produces a monotone pitch, NATURALF0 produces a normal pitch contour, and MANUALF0 (new for V37 and later) gives the user more explicit control over the pitch contour by creative use of accent numbers. In MANUALF0 mode, a given accent number will have the same effect on the pitch regardless of its position in the sentence and its relation to other accented syllables. In NATURALF0 mode, accent numbers have a reduced effect towards the end of sentences (especially long ones). In addition, the proximity of other accented syllables, the number of syllables in the word, and the number of phrases and words in the sentence all affect the pitch contour. In MANUALF0 mode these things are ignored and it is up to the user to do the controlling. The advantage is that the pitch can be made more expressive. The F0enthusiasm field will scale the effect.

sex
    Controls the sex of the speaking voice (MALE or FEMALE). In actuality, only the formant targets are changed. The user must still change the pitch and speaking rate of the voice to get the correct sounding sex. See the include files for default pitch and rate settings.

ch_masks
    Pointer to a set of audio allocation maps. See the "Audio Device" chapter for details.

nm_masks
    Number of audio allocation maps. See the "Audio Device" chapter for details.

volume
    Sets the volume of the speaking voice. Range is 0 to 64.

sampfreq
    The synthesizer is "tuned" to a sampling frequency of 22,200 Hz. Changing sampfreq affects pitch and formant tunings and can be used to create unusual vocal effects. For V37 and later, it is recommended that F1adj, F2adj, and F3adj be used instead to achieve this effect.

mouths
    If set to a non-zero value, directs the narrator device to generate mouth shape changes and send this data to the user in response to read requests. See the "Reading from the Narrator Device" section for more details.

chanmask
    Used internally by the narrator device. The user should not modify this field.

numchan
    Used internally by the narrator device. The user should not modify this field.

flags (V37)
    Used to specify V37 features of the device. Possible bit settings are:

        NDB_NEWIORB  - I/O request block uses V37 features.
        NDB_WORDSYNC - Device should generate start of word sync events.
        NDB_SYLSYNC  - Device should generate start of syllable sync events.

    These bit definitions and their corresponding flag definitions (NDF_NEWIORB, NDF_WORDSYNC, and NDF_SYLSYNC) can be found in the include files.

F0enthusiasm (V37)
    The value of this field controls the scaling of pitch (F0) excursions used on accented syllables and has the effect of making the narrator device sound more or less "enthusiastic" about what it is saying. It is calibrated in 1/32s, with unity (32) being the default value. Higher values cause more F0 variation, lower values cause less. This feature is most useful in manual F0 mode.

F0perturb (V37)
    Non-zero values in this field cause varying amounts of random low-frequency modulation of the pitch (F0). In other words, the pitch shakes in much the same way as an elderly person's voice does. Range is 0 to 255.

F1adj, F2adj, F3adj (V37)
    Changes the tuning of the formant frequencies. A formant is a major vocal tract resonance, and the frequencies of these formants move continuously as we speak. Traditionally, they have been given the abbreviations F1, F2, F3... with F1 being the one lowest in frequency. Moving these formants away from their normal positions causes drastic changes in the sound of the voice and is a very powerful tool in the creation of character voices. This adjustment is in ±5% steps. Positive values raise the formant frequencies and negative values lower them. The default is zero. Use these adjustments instead of changing sampfreq.

A1adj, A2adj, A3adj (V37)
    In a parallel formant synthesizer, the amplitudes of the formants need to be specified along with their frequencies. These fields bias the amplitudes computed by the narrator device.
    This is useful for creating different tonal balances (bass or treble), and for listening to formants in isolation for educational purposes. The adjustments are calibrated directly in ±1 dB (decibel) steps. Using negative values will cause no problems; use of positive numbers can cause clipping. If you want to raise an amplitude, try cutting the others by the same relative amount, then bring them all up equally until clipping is heard, then back them off. This should produce an optimum setting. These fields have a +31 to -32 dB range, and the value -32 dB is equivalent to -infinity, shutting that formant off completely.

articulate (V37)
    According to the popular theories of speech production, we move our articulators (jaw, tongue, lips, etc.) smoothly from one "target" position to the next. These articulatory targets correspond to acoustic targets specified by the narrator device for each phoneme. The device calculates the time it should take to get from one target to the next, and this field allows you to intervene in that process. Values larger than the default will cause the transitions to be proportionately longer, and smaller values will make them shorter. This field is calibrated in percent, with 100 being the default. For example, a value of 50 will cause the transitions to take half the normal time, with the result being "sharper", more deliberate sounding speech (not necessarily more natural). A value of 200 will cause the transitions to be twice as long, slurring the speech. Zero is a special value: the narrator device will take special measures to create no transitions at all, and each phoneme will simply be abutted to the next.

centralize (V37)
    This field, together with centphon, can be used to create regional accent effects by modifying vowel sounds. centralize specifies the degree (in percent) to which vowel targets are "pulled" towards the targets of the vowel specified by centphon. The default value of 0% indicates that each vowel in the utterance retains its own target values.
    The maximum value of 100% indicates that each vowel's targets are replaced by the targets of the specified vowel. Intermediate values control the degree of interpolation between the utterance vowel's targets and the targets of the vowel specified by centphon.

centphon (V37)
    Pointer to an ASCII string specifying the vowel whose targets are used in the interpolation specified by centralize. The vowels which can be specified are: IY, IH, EH, AE, AA, AH, AO, OW, UH, ER, UW. Specifying anything other than these will result in an error code being returned.

AVbias, AFbias (V37)
    Controls the relative amplitudes of the voiced and unvoiced speech sounds. Voiced sounds are those made with the vocal cords vibrating, such as vowels and some consonants like y, r, w, and m. Unvoiced sounds are made without the vocal cords vibrating and use the sound of turbulent air, such as s, t, sh, and f. Some sounds, such as z and v, are combinations of both. AVbias and AFbias change the default amplitude of the voiced and unvoiced components of the sounds respectively. (AV stands for Amplitude of Voicing and AF stands for Amplitude of Frication.) These fields are calibrated in ±1 dB steps and have the same range as the other amplitude biases, namely +31 to -32 dB. Again, positive values may cause clipping. Negative values are the most useful.

priority (V37)
    Task priority while speaking. When the narrator device begins to synthesize a sentence, the task priority remains unchanged while it is calculating acoustic parameters. However, when speech begins at the end of this process, the priority is bumped to 100 (the default value). You may change this to any value you wish. Higher values will tend to lock out almost everything else while speech is going on, and lower values may cause audible breaks in the speech output.

The following example shows how to issue a write request to the narrator device. The first write is done with the default parameter settings.
The second write is done after modifying the first and third formant loudness and using the centralization feature.

speak_narrator.c
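
Before changing voice parameters for a second write, it can help to check them against the ranges given in the field descriptions above. The following is only a portable sketch of such a check, not part of the narrator device or its include files. The structure demo_voice and the functions demo_valid_centphon(), demo_voice_ok() and demo_centralize() are hypothetical names invented for this illustration; the real narrator_rb structure has more fields and a different layout. The linear formula in demo_centralize() is an assumption about what "interpolation" means here, as are the upper-case vowel spelling and the rule that centphon is checked only when centralize is non-zero.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL stand-in for a few narrator_rb fields.  This is NOT the
 * real structure layout; it only carries the values being checked.    */
struct demo_voice {
    int         rate;        /* words/minute, 40 to 400           */
    int         pitch;       /* Hertz, 65 to 320                  */
    int         volume;      /* 0 to 64                           */
    int         A1adj;       /* dB bias, -32 to +31               */
    int         A3adj;       /* dB bias, -32 to +31               */
    int         centralize;  /* percent, 0 to 100                 */
    const char *centphon;    /* vowel name used by centralize     */
};

/* Vowels the text lists as acceptable for centphon. */
static const char *const demo_vowels[] = {
    "IY", "IH", "EH", "AE", "AA", "AH", "AO", "OW", "UH", "ER", "UW"
};

static int demo_in_range(int v, int lo, int hi)
{
    return v >= lo && v <= hi;
}

/* Returns 1 if s names one of the listed vowels, 0 otherwise.
 * Assumption: names are matched exactly as printed (upper case). */
int demo_valid_centphon(const char *s)
{
    size_t i;

    if (s == NULL)
        return 0;
    for (i = 0; i < sizeof demo_vowels / sizeof demo_vowels[0]; i++)
        if (strcmp(s, demo_vowels[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 if every field lies inside its documented range. */
int demo_voice_ok(const struct demo_voice *v)
{
    if (!demo_in_range(v->rate, 40, 400))       return 0;
    if (!demo_in_range(v->pitch, 65, 320))      return 0;
    if (!demo_in_range(v->volume, 0, 64))       return 0;
    if (!demo_in_range(v->A1adj, -32, 31))      return 0;
    if (!demo_in_range(v->A3adj, -32, 31))      return 0;
    if (!demo_in_range(v->centralize, 0, 100))  return 0;
    /* Assumption: centphon only matters when centralization is on. */
    if (v->centralize > 0 && !demo_valid_centphon(v->centphon))
        return 0;
    return 1;
}

/* ASSUMED linear interpolation: 0% keeps the vowel's own target,
 * 100% replaces it with the centphon vowel's target.             */
int demo_centralize(int own, int cent, int percent)
{
    return own + (cent - own) * percent / 100;
}
```

In a real program, a check like demo_voice_ok() would run on the values about to be stored in the request block before DoIO() is called, so that an out-of-range value is caught by the application instead of being discovered through a device error code or clipped output.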