Attention-based end-to-end speech synthesis achieves better performance in both prosody and quality than the conventional "front-end"-"back-end" pipeline. Training such an end-to-end framework, however, is usually time-consuming because of its reliance on recurrent neural networks. To enable parallel computation and long-range dependency modeling, a solely self-attention-based framework named Transformer has recently been proposed within the end-to-end family. However, it lacks position information in sequential modeling, so that the extra position