Anyone who has read FOSS Force for the last couple of months knows that I lost my voice to cancer and that I’ve become personally involved in getting a decent text to speech (TTS) application developed. Some of you have reminded me that there is a good assortment of text to speech applications for Linux, especially in the mobile market, such as Android and the iExperience. Granted on both counts, but we need an application that can either come preinstalled or be easily installed on almost any Linux distribution. That leads us back to the plentiful choices within the Linuxsphere you feel the need to mention. Yes, there are a lot of them, but when it all gets boiled down, they all share one simple trait.
None of them even approach usability for the everyday computer user. None. And you would think that of all these choices, one of them has to work…or provide documentation reasonable enough for everyone. You would think.
They are a hodge-podge assortment of half-finished, half-baked good intentions wrapped in the shiny label of “open source.” Isn’t that nice? Many of us rush “open source” to the fore, like it’s a magic phrase — a sparkling breath of glittery mist that coats the project like it was the long-sought solution to whatever problem it promises to fix. It’s nice to have that idea by which we can comfort ourselves. It’s a warm blanket of security, knowing that all of those tools are available to anyone, with all forms of “freedom” intact.
Except that it’s not true. When you peel back the shimmering wrapper of open source expectations, only to find a promissory note…well, that’s known as a rude awakening. The rude awakening that I received when I took all my expectations and beliefs to the Well of Open Source, only to find it with a broken crank and a cut rope.
It didn’t start out that way. The people who wrote these programs and applications wrote them to the best of their ability. They wrote them and left a promissory note, in hope that someone would be along shortly and complete the work that they did not know how to finish. And that’s the way it’s supposed to work. That, at least in theory, was the purpose behind places like SourceForge, back when SourceForge was a great idea with almost unlimited pools of talent spilling out open source products for the world to use. That’s the golden heart of open source. But somewhere the system got broken.
Applications like Festival started out to be promising, but the developer had to rely on others or other sources to provide the actual “voices” for the application. I needed to find out how to install a decent voice in Festival. “Oh…that can’t be that hard, now can it?”
This is the first thing I found when I searched for and discovered the answer:
Setting up Festival and using better voices
Section 1: Get Festival up and running with basic voices
- Additional reading:
Here are some places I pulled notes from:
http://gentoo-wiki.com/HOWTO_speechd (speechd and festival notes – mbrola too)
#Someone elses notes on how to build your own festival voice based on recordings of your voice:
- You definately need to use Festival 1.96 or better, the older version sound very poor:
#Get these two packages to start:
Unpack these tars in the same parent directory, festival will unpack into a directory called “festival”, speech tools into “speech_tools”. Compile speech_tools first, then compile festival. Next unpack these other packages in the same parent directory (these get loaded into directory “festival”).
- Next are the voice packages:
- These packages help the voices to sound MUCH better:
- Now edit this file to use the new voices:
;And add this line:
;And then change the line like this to your new voice (notice the prepended “voice_” to the voice name):
(set! voice_default 'voice_nitech_us_clb_arctic_hts)
What we expected to be a cool pool of promise turned into a fetid dunk tank for the masses. Really? This is what a new Linux user will find when seeking a substitute voice? No one can honestly say that the default voices for any of these apps are ready for prime time. So this is how I get better voices into Festival? Tell me you are kidding…
If that wasn’t enough, it gets worse.
In the case of Festival and other TTS programs, the voices tend to be extremely robotic and harsh. Other programs, like Mbrola, can supply voices to improve those default voices. Each of the different developers upgraded/updated their voice apps without telling the synthesis folks that the directories for their files and voices had changed. So the few of us who stiffened our lips and dove into all those lines of code found the whole damned mess to be broken. And we found it useless. That refreshing pool of promise that represented open source was quickly discovered to be a stagnant pool where even mosquitoes wouldn’t nest.
Who is at fault? Who do we blame? This isn’t about blame; it is about solutions. Solutions to fix a broken system. Solutions that many of us have the talent to provide. But talent isn’t the only ingredient in this success. It needs people with the will to help create the solutions to these problems.
I began a quest to find people to help me build a front end for the TTS application, MaryTTS. It’s an open source Java app that is beautiful in production, but an absolute nightmare to get working on the everyday computer user’s machine.
Luckily, a good guy by the name of David walked me through the process of making it work on my computer. It’s not easy, trust me. You basically turn your computer into the server for the app. You can create a server off-site, but that creates some latency issues you might not find comfortable.
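For readers wondering what "turning your computer into the server" looks like from the client side, here is a minimal sketch in Python. It assumes a MaryTTS 5.x server is already running locally on its default HTTP port (59125) and that it accepts the usual `/process` query parameters. The function names `build_marytts_url` and `synthesize`, the output file name, and the `en_US` locale are illustrative choices, not something David or the MaryTTS project prescribes.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_marytts_url(text, host="localhost", port=59125, locale="en_US"):
    """Build the request URL for a MaryTTS HTTP server.

    Assumption: the server exposes /process with these parameter names,
    as MaryTTS 5.x does by default. Adjust them if your install differs.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"


def synthesize(text, out_path="speech.wav", **server_options):
    """Ask the server to speak `text` and save the returned WAV audio.

    This only works while the MaryTTS server is running; otherwise
    urlopen raises an error.
    """
    url = build_marytts_url(text, **server_options)
    with urlopen(url, timeout=10) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return out_path
```

With the server running, `synthesize("Just speak for me.")` should leave a `speech.wav` you can play with any audio player. Pointing `host` at an off-site machine works the same way, which is where the latency mentioned above comes in.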
Neil Munro worked up a Chrome extension that provided basic TTS. Although he writes his efforts off as “nothing special,” it’s a step forward. It matches the other Chrome addon efforts, but stays simple instead of inundating users with an array of hacks and cosmetic options. I honestly don’t care about the text color; it’s not important. Just speak for me. That’s all I want. I just need the app to provide me a voice.
I promised not to mention their names, but a couple of guys, and maybe a third, are working on this GUI for MaryTTS. And when it comes down to it, it’s not really that fancy. You use your browser, or the browser GUI our development friends are building, to pull the whole thing together. I don’t have to rely on an online tool or importing voices that are all but impossible to incorporate. And while most ISPs are showing off some fairly admirable “up time” statistics, it’s good to have it all under one roof. Because when an outage does occur, it’s good to know that the tool you need isn’t dependent on an internet connection.
For me, this is an important step, and a step that wouldn’t be taken if I had not lost my voice to cancer. Had I not lost my voice, I couldn’t have cared less. But here I am and here’s where it gets personal.
I spend a lot of time in small groups. We all do. I spend time in groups of peers or groups of parents and kids that are helped by Reglue. And of course, within intimate groups of my family. Anyone who has studied the social group dynamic knows how important voice inflection and timing can be as the conversations within take place. Let me give you a case in point. Someone I have begun an email friendship with is an extremely high functioning autistic. Actually, she is brilliant. She mentioned this dynamic in her own life and I smiled as I read how she deals with the varying voice inflections and timing and how difficult they can be to interpret as the waves of other conversations intermingle.
A few weeks ago, my oldest daughter and her family drove from Copperas Cove to spend the day. Some would remark that our living room space is, uh…cozy. That’s a polite word for “small.” Add two more people and we can move up to the term “cramped.” So everyone gets seated and the conversations begin. I have on my lap my Nexus 7 for the Android app “Speech Assistant” and my “handy-dandy-this-is-what-etch-a-sketch-has-evolved-into” Boogie Board.
It didn’t take long for me to realize that I was the proverbial salt on the bird’s tail, when it came to adding or commenting in the group. When I felt that I had something to contribute, the group would wait politely for me to write or type out what I wanted to say. That includes the times necessary to erase a mistake or redo something I had written by accident.
As Ron White would remind us, “Now that there was some awkward social presence, I’m here to tell ya.”
When I felt the need to interject something, by the time I had it written or typed in, the subject had moved in a completely new direction, and introducing the comment on my boogie board would have caused confusion, at the very least, not unlike watching a movie when the lips and the words are way out of sync. Had I not been in front of family, I would have excused myself and left the room, not to return.
I have begun experimenting with the electronic larynx for use at home and around friends. You know…the one I swore I would never, ever, use? According to my ENT surgeon, I have the toughest and the thickest neck tissue he has ever seen. Trying to find just the right pressure and placement for the head of the device can change from use to use. What bothers me most is that the electronic larynx is the gold standard, the go-to tool for communication after a laryngectomy. Unfortunately, my specific situation is not suitable for a TEP device, which is a surgically-implanted device that allows one to speak.
After a while, one begins to realize that (s)he has no real place in that specific social group. The inability to contribute in a timely manner, combined with the just plain awkwardness of the situation, can lead to one becoming a social deportee of sorts. I found myself being the “go’fer.” I was the one to refresh drinks and foodstuffs. I was the one to entertain my granddaughters when they got unruly or bored. In short, I was a stage prop for the play that was being acted out around me.
That’s an uncomfortable place to be, I don’t care how you try to analyse it.
But you know…in all of this I realized that the speechless are not the only ones in this situation of being a social deportee of sorts. While we are not castigated, we most certainly learn to adapt to our place in the group…and we tend to avoid social situations in which we are expected to react.
One of my best friends is a brilliant guy and he’s an absolute scream to be around. Sometimes I laugh so hard that I slobber down the front of my shirt when I’m around him. His Ph.D. isn’t a captured, glass-framed moment in time for him. It’s a reminder that no matter what we accomplish, we can always accomplish more.
While he isn’t speechless, he does suffer acute hearing loss. We talked about this recently, and for him, it’s the exact same thing: A physical condition that regulates his social activity. He doesn’t find being part of a social crowd comfortable, especially if they are strangers. Often he makes sure that his wife or a friend is close, so he doesn’t ask the person speaking to repeat themselves over and over. The person with him can act as a repeater of sorts. Unfortunately, his hearing issues have no medical or hardware solutions. Hopefully, someone with an open source mindset can, one day, fix this.
And that’s what it all comes down to. Being willing to donate your time and talent to help those who have no other alternative. Whether the barriers are financial or geographic, many people don’t have the options they need to make their lives better. That’s where the real open source community can help.
I started out by offering a bounty for the MaryTTS GUI.
Everyone who has responded politely told me to stuff my money in my, uh…ear. They said they would make a solution shortly…they just need to find the time to do it. From what I understand, this project is moving along at a decent rate. Hopefully, one of the greatest problems speechless people experience can be lessened or even wiped off the table completely, at least in the Linuxsphere.
It just depends on how attuned we are to those needs. All of us. And I can understand the hesitancy to accept money. That contract can be perceived as a binding lever to control the rate, quality and gauge of your work. I understand that. But in return, I’ll demonstrate how important this is to me.
I have a bit of money put away. One day I would like to take my youngest daughter to the Chicago Museum of Science and Industry. My dad took me there when I was eleven years old and it was a mesmerizing experience, all the way down to walking through a WWII German submarine. It was pure sensory overload as I took it all in. The hair on my arms stood at rapt attention as the history of transportation display took my breath.
My daughter, who helps with Reglue stuff at times, told me that this need for TTS software was way more important than a trip to Chicago. She made me promise to use the money for this project. She even tried to donate a $100 bill to the effort, an offer I of course refused. I took the $100 bill from her and put it in her shirt pocket, along with a kiss to her forehead. A $100 donation via PayPal showed up later in the day. I’m not at all sure the name of the donor was correct.
Yeah…that’s my girl.
So this isn’t about just me…or you, for that matter. This is about a young lady who works two jobs, selflessly offering the only value she has at her disposal. This is about a fourteen-year-old boy who had his tongue splayed as a warning to keep his mouth shut after witnessing a murder in the ghettos of Anfield in Liverpool. This is about people with problems we can help. People who have no idea that a loose-knit community of technologists might be able to give them at least a part of their life back.
This isn’t about your money. It’s about your time and talent and how you can spend that to help those who cannot help themselves. And if you need money for your efforts, I don’t begrudge you a dime…just keep in mind that we don’t have a lot of it.
The only real problem is in finding out where to go to help with a particular program or effort that matches our talents and skills. I’m a good place to start. There are four brilliant people now working on providing an easy-to-use GUI for the MaryTTS app, or other apps for that matter. That is indeed a start. Let’s talk about your skills and ways you would like to help.
Ken Starks is the founder of the Helios Project and Reglue, which for 20 years provided refurbished older computers running Linux to disadvantaged school kids, as well as providing digital help for senior citizens, in the Austin, Texas area. He was a columnist for FOSS Force from 2013-2016, and remains part of our family. Follow him on Twitter: @Reglue