The Heart of Linux
The lead developer of a new text-to-speech app based on MaryTTS talks about what’s been done and what remains to do.
It’s been a while now since we talked about creating a front end GUI to the open source text-to-speech program, MaryTTS. I have a personal stake in this, as I lost my larynx, and thus my voice, due to throat cancer.
The state of text-to-speech software in the Linuxsphere is horrible. Don’t get me wrong, the software is out there and much of it is fairly good. Where we fall down is that it requires using the command line to get much of it to work. Often TTS software in Linux comes in parts and pieces that have to be assembled in order to get it to work, and the terminal is where most of that work needs to be done.
This is the case with MaryTTS. It’s well-maintained, superb software that would meet most any TTS needs, but when I first discovered it, there wasn’t a GUI available for it. After fooling around with it for a couple of days, I realized that without a decent interface, Mary would be fairly well ignored by those who need her the most. That’s when I decided I needed to approach the community about getting this fixed.
With perfect timing, Rijk Theodoor Oosterhoff introduced himself to me.
Oosterhoff is a native of the Netherlands with a command of the English language many native English speakers do not possess. After he and three other software engineers talked among themselves, it was decided that Oosterhoff would be the project lead. He has been steadily working on the project since.
I could give a blow-by-blow, a this-and-that account of how it’s gone since then, but I thought I would ask Oosterhoff to give us his account. I began by asking him about his background in programming and specifically in Java, since that is the language in which MaryTTS is written.
“Programming is just an amazing thing to do. Before I became a professional software engineer I was a project manager at the Ministry of Infrastructure and the Environment. We managed an automatic monitoring network and to keep up with modern measurement techniques we developed new instruments. That is to say, we let companies do all the fun stuff.
“My work only consisted of managing the contracts about research and so on. I was not up to this task and really hated the work. During weekends and in the evenings, I spent much time on hobby projects involving websites and my personal groupware/Linux server. In the end my boss gave me the opportunity to study Java and work temporarily at the Royal Netherlands Meteorological Institute (KNMI) as a junior software engineer. He reasoned that I would be better off chasing a career that would be more in line with my talents, something for which I am still extremely grateful today.
“I never looked back one second and really love my job every day. In programming I find something of myself. It takes all your intelligence, all your creativity. You make something out of nothing. You have to work very closely together with other people who are much smarter than you are (well, in most cases). Altogether it is really the best job in the world. And when I come home I simply cannot stop programming. I therefore have several pet projects to work on when there is nothing else to do. That is not as often as I would like, as our house is from 1904 and constantly needs some fixing.
“My simple and humble pet projects are just for fun, but I play around with new frameworks. By fooling around with all the new stuff I can find, I find useful exercises that come in handy in my professional work as well. Besides this, I really believe in free software and want to give something back to this fine community. In my professional and personal life I try to use free software as often as possible. To paraphrase a popular statement: ‘Free software is eating the world.’”
It’s that “something out of nothing” part that leaves me dumbfounded. I mean, more dumbfounded than I usually am. I’ve never thought of programming as a creative thing. I’ve thought of it as math, but never once as a creative tool. Yet, now that I give it some thought, someone has to figure out those bullet and death ray vectors as to where they originate and to where they go. So I asked Oosterhoff about his thoughts and processes in deciding to help get this job done. He is as clear as he always is in answering my question.
“TypeTalk [the project’s temporary name, which will be changed soon] came to me as a question I could not put down. When I first read your blog I thought that there would be dozens of people rushing in, but I kept it as a bookmark to see what would come of it. I cloned MaryTTS from GitHub and found that I was able to run it right from the start. Just one night fumbling around and I was able to fire up the engine and get some speech from my PC. But as the days went by I did not see anyone really jump to your help with an announcement that a new project had started. It seemed as if this job was just waiting for me.
“I put together a little GUI for myself and wrote you an email. As I only work on this project when I find time for it, the project only progresses as fast as you have noticed, which is not that fast. But you probably don’t want to test a new version every day anyhow. For now I think we have come quite far. We have a website (https://github.com/TypeTalk/TypeTalk). It’s very simple, but still there are some hard-to-miss download buttons on it. We have a project GitHub account, and if anyone wants to get involved with this project, I can create some credentials for you to become a participant.
“We have packages in rpm and deb formats and those can be found under the ‘releases’ button under the brown bar. We have had a couple of releases and the GUI is a simple but effective front end. The last release is more or less what I intended to build in the first place. I wanted to create a GUI that would enable the user to keep his hands on the keyboard and, whenever he needs to, strike a specific combination to let the computer speak whatever he wants to say. All buttons have keybindings and using the tab key you can cycle through the GUI elements.
“The only thing I find disappointing is the fact that users keep having trouble installing the software. Do not think that I blame you or any user for this, not at all. This is not your fault, but I would like it to install every time without any error. It might have to do with the fact that I chose Java 8. For non-programming people, it is hard to understand this makes all the difference in the world. I just hope that in time this issue goes away.
“For me this is a valuable lesson, not to jump to a new version too soon. This month a new Ubuntu LTS is coming to us and hopefully everybody will switch to this and its derivatives. Java 8 will be standard in the repos with that release. But keep in mind, TypeTalk has worked for most testers by right-clicking the TypeTalk jar and choosing to open it with the OpenJDK Java 7 or Java 8 runtime, as well as with Oracle’s closed-source releases. It’s the .deb and .rpm files that have been difficult.”
While Oosterhoff has done a huge amount of work, he acknowledges that there is still a long way to go before he’ll be truly satisfied with the way it is. The 1.3 release introduces a couple new voices and some other more subtle changes to how they work and how they can be manipulated in the voices configuration.
Here is a short list, in his words, of the things he thinks important to the ongoing development of TypeTalk:
- “Auto-completion: Typing is slow compared to speech, so we need to find a way for people to type faster.
- More control over sound: Maybe we can put in a pause or slow down when people put in a couple of spaces or a comma. This is extremely important in making the voices as realistic as possible. If you have ideas in this area, please share these as I could use some guidance as to where to start.
- Better integration with the desktop: I fooled around with a systray icon, but this is not well supported in Java and many distros are moving away from this concept. I am thinking of getting better integration anyhow.
- Speed startup time: Maybe I can start the server in the background and then you only have to start the frontend.
- A real package manager: A human, I mean, that is able to get it in the Debian repos.
- A way of installing extra voices: I provide a couple of voices, but I found that some voices take 500MB to download. I cannot include this in the package, but if people are able to download and install new voices that would be nice.”
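Since Oosterhoff asks where to start on pauses, here is one possible starting point, offered strictly as a sketch and not as anything TypeTalk actually does. It rewrites typed text into SSML, the W3C speech markup, turning each comma and each run of two or more spaces into a `<break>` element. The class name `PauseMarkup`, the method `toSsml`, the pause lengths and the `en-US` language tag are all assumptions made up for this illustration; the real values would have to be tuned by ear.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of TypeTalk or MaryTTS. */
public final class PauseMarkup {
    // Assumed pause lengths in milliseconds; tune by listening.
    private static final int COMMA_PAUSE_MS = 250;
    private static final int PER_SPACE_PAUSE_MS = 200;

    // Matches a comma plus any whitespace after it, or a run of 2+ spaces.
    private static final Pattern PAUSE = Pattern.compile(",\\s*| {2,}");

    // SSML root element; the language tag is an assumption.
    private static final String OPEN =
        "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\""
        + " xml:lang=\"en-US\">";

    private PauseMarkup() {}

    /** Wraps plain text in SSML, inserting a break at each pause marker. */
    public static String toSsml(String text) {
        StringBuilder out = new StringBuilder(OPEN);
        Matcher m = PAUSE.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the ordinary text that precedes this pause marker.
            out.append(escape(text.substring(last, m.start())));
            String hit = m.group();
            if (hit.startsWith(",")) {
                // Keep the comma and add a fixed pause after it.
                out.append(',').append(breakTag(COMMA_PAUSE_MS)).append(' ');
            } else {
                // More spaces typed means a longer pause.
                out.append(breakTag(hit.length() * PER_SPACE_PAUSE_MS)).append(' ');
            }
            last = m.end();
        }
        out.append(escape(text.substring(last)));
        return out.append("</speak>").toString();
    }

    private static String breakTag(int ms) {
        return "<break time=\"" + ms + "ms\"/>";
    }

    // Escape characters that would otherwise break the XML.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling `PauseMarkup.toSsml("wait   here")` would place a 600 ms break between the two words, since three spaces were typed. The resulting string would then have to be handed to the speech engine as SSML instead of plain text; whether that fits the way TypeTalk currently talks to MaryTTS is something only the developers can say.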
So there we are, folks. This is the way software should be built, the way it should be created and the manner by which it can be improved. Some of the sharpest minds I’ve ever encountered will read this and it’s from this that we hope you choose to assist. There is no ego. There is no arbitrary choice of who’s in charge, nor will there be anyone demanding anything at all. There is just the code and the minds that will make it brilliant for the people who will need it.
It will be FOSS and it will belong to everyone.
Ken Starks is the founder of the Helios Project and Reglue, which for 20 years provided refurbished older computers running Linux to disadvantaged school kids, as well as providing digital help for senior citizens, in the Austin, Texas area. He was a columnist for FOSS Force from 2013-2016, and remains part of our family. Follow him on Twitter: @Reglue
Auto-completion] Two ideas come to my mind now: soundboards (there are a lot online, search flash-based soundboard) coupled with a tag cloud (http://www.jqueryscript.net/demo/3D-Interactive-SVG-Tag-Cloud-Plugin-With-jQuery-SVG-3D-Tag-Cloud/), which needs to be prepared beforehand with key terminology about the subject to be discussed. Alternatively, it could have pre-searched terminology sets.
More control over sound] Usually we’ll need a parallel input to modify words as they are pronounced (e.g. a mouse used with the feet, like the pedals of an organ). Or maybe modifiers applied to words selected with different mouse buttons (e.g. right click to mean emphasis etc.)
Better integration with the desktop] I’m not sure I understand this, especially regarding the systray icon. I notice though that if the TTS application has an input area, the Linux default of select and middle-click to paste would be a very nice way to speak text visible on the screen, even simplifying the ideas I cited.
Speed startup time] One day it will work on smartphones, and there apps can be made to run continuously (for example, for normal conversations). For keynotes on a desktop, startup time is not such a problem, methinks.
A real package manager] AFAIK, one initially makes a deb package which can be downloaded by early adopters. In time, when the TTS app is more fleshed out, people might ask for it to be included in the Debian repos (this is someone who never made a package talking, so take with a rock of salt). Even if it gets accepted into Debian, it probably will go to Sid at first (where alpha/unstable packages go…).
A way of installing extra voices] 500MB looks like a lot, but one can make small samples so people download only the ones they like/need, and more can be added by volunteers as time passes.
OK I guess it’s early days yet. But I can’t see any real difference between this and eSpeak, which also has a GUI. In fact by and large I find the eSpeak synthesis easier to understand.
The other thing is, ultimately, quite amusing. I had designed a similar GUI for the text-based TTSs, and scrapped it, on the premise that as I’m not in Ken’s position, I wasn’t really understanding the problem, and was missing something important… Oh well, never mind. However, perhaps that’s why nobody else jumped in too.
Nothing I have tried comes close to Ivona on Android. (Ivona is free but not FOSS).