6.10 Discussion

At first glance it may seem that predicting prosodic form from text is an impossible task. Two significant barriers stand in our way. First, the text is greatly underspecified for the type of information we require, as textual encoding of prosody is more or less non-existent. Second, the uncertainty surrounding what prosodic form should be makes it hard to label data, train algorithms and basically even know if we are headed in the right direction.

6.10.1 Labelling schemes and labelling accuracy

Taking the second issue first, we see that in every area of prosody there is considerable disagreement as to how to represent prosodic form. The problem isn't so much that researchers disagree completely; there is widespread agreement about how to label many utterances. In many cases it is quite clear that a particular utterance has a strongly prominent syllable, has a clear phrase break and has a distinctive intonational tune. There are many black and white cases where all theories and all labellers agree. The problem is rather that there is a significant grey area, where we can't tell if this really sounds prominent, where we aren't sure that there is a phrase break between two words and where we can't really decide on the intonational tune for an utterance.

There are of course grey areas in other aspects of linguistics. As we saw in Section 4.2, pinning down the exact definition of a word can be tricky, and we will come across similar difficulties in Section 7.3.2 when we consider the definition of the phoneme. But in general the agreement among researchers is much, much higher with regard to these phenomena.

Furthermore, so long as we are aware of the difficulties, the definitions of word and phoneme which we use in this book are good enough, meaning that, as engineers, using these models does not seem to result in significant loss of quality in synthesis.

Studies that have examined inter-labeller reliability of prosodic schemes have shown the true extent of the problem. For instance, in the original publication of the ToBI system, Silverman et al. reported that the agreement on phrase breaks was only 69% for four labellers, and 86% for whether a syllable is prominent (bears a pitch accent, in their paper). These figures might not seem too bad, but they are about an order of magnitude worse than the equivalent results for verbal, phonetic transcription, and up to two or three orders of magnitude worse than those for the transcription used for text analysis in Chapter 5.

Furthermore, when we consider that the number of choices to be made in prosodic labelling is often small (say, choose one break index value from 5), we see just how difficult labellers find this task. These results for ToBI are not particularly unusual; consistently similar figures have been reported for many other schemes [426], [488], [487]. Furthermore, it is misleading to blame poor labelling agreement on the labellers not being expert enough: non-experts can readily label many of the types of data we require in TTS, and so in comparing the figures for other types of labelling we are in fact comparing like with like.
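To make the point about small label sets concrete, here is a minimal sketch of how inter-labeller agreement can be measured. It is an illustration only: the break-index labels are invented, the function names are hypothetical, and the measures shown (pooled pairwise agreement and Fleiss' kappa) are common choices, not necessarily the exact ones used in the studies cited above. Kappa discounts the agreement that would be expected by chance, which is substantial when labellers choose from only a handful of values.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels_by_item):
    """Fraction of labeller pairs that chose the same label, pooled over items."""
    agree = total = 0
    for labels in labels_by_item:
        for a, b in combinations(labels, 2):
            agree += a == b
            total += 1
    return agree / total


def fleiss_kappa(labels_by_item):
    """Chance-corrected agreement; assumes the same number of labellers per item."""
    n_items = len(labels_by_item)
    n_raters = len(labels_by_item[0])
    category_totals = Counter()
    observed = 0.0
    for labels in labels_by_item:
        counts = Counter(labels)
        category_totals.update(counts)
        # Proportion of labeller pairs that agree on this item.
        observed += sum(c * (c - 1) for c in counts.values()) / (
            n_raters * (n_raters - 1)
        )
    observed /= n_items
    # Agreement expected if labels were drawn from the pooled label distribution.
    expected = sum(
        (t / (n_items * n_raters)) ** 2 for t in category_totals.values()
    )
    return (observed - expected) / (1 - expected)


# Invented break-index labels (0-4) from four labellers at six word boundaries.
toy = [
    [4, 4, 4, 4],
    [1, 1, 1, 1],
    [3, 3, 4, 3],
    [1, 1, 1, 2],
    [3, 4, 3, 4],
    [1, 1, 1, 1],
]
raw = pairwise_agreement(toy)
kappa = fleiss_kappa(toy)
print(f"pairwise agreement: {raw:.2f}, Fleiss' kappa: {kappa:.2f}")
```

On this toy data the chance-corrected figure comes out noticeably lower than the raw one, which is the effect described above: with few categories, a good part of any raw agreement score is not evidence of a shared judgement.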

Why do these problems arise? Essentially we face two related issues: which labelling scheme, model or theory to use, and how to assign the labels arising from this to a corpus of speech. Most researchers involved in both the scientific investigation and engineering implementation are aware that there are a huge number of differing and often incompatible theories and models of prosody.

The temptation is therefore to seek a "theory-neutral", "common-standard" or "engineering" system which we can use to label the data, and so avoid having to nail our colours to any one particular theoretical mast. We have to emphasise at this point that such an approach is folly; there is simply no such thing as a theory-neutral scheme. If we take the break index scheme for instance, this explicitly states that there are a fixed number of types of breaks, that no recursion or limited recursion exists, that phrasing is not explicitly linked to prominence patterns and so on.

Furthermore, there is no sense in which we can just label what we hear in the data. All labelling schemes are based on a model, the fundamentals of that model are implied when we label speech, and if our scheme is wrong, inaccurate or at fault in some other way, we run a high risk of imposing inappropriate labels on our data. If we then use these labels as ground truth, we are running a severe risk of forcing our training and run-time algorithms to make choices that.