Voice generation using text: A deep-learning method   By Aditya Abeysinghe Using text to generate speech similar to human voice is the main function of a text-to-speech (TTS) system. The process of converting text to speech is known as speech synthesis. Speech recorded is used to generate new speech, based on the input of the TTS. Since 1960s, several TTS systems have been developed for speech synthesis for current systems. However, these systems have several issues which led to the use of deep learning methods to synthesize speech. Current methods Two main methods exist for speech synthesis in traditional systems: concatenative and parametric. In concatenation-based synthesis the waveforms in the speech are concatenated to produce a speech stream. This type uses a waveform database to store and retrieve recorded speech. The speech appropriate for each text supplied is selected and joined to the stream to produce the final speech. In ...

Read More →