Teaching Linux to Speak

Free open source software. FOSS. The vision of one man…a vision tenacious enough to catch fire and spread around the globe. Free open source software is a staple of the enterprise for most of the world. We have one man to thank for that. Richard Stallman’s courage and foresight will be known and built upon long after we are gone. His contribution is truly his timeless, global legacy.

The only sad part of the story, at least so far, is that the United States has stubbornly dug in her heels. She has chosen to pay homage to the Microsofts and the Apples in our nation. We remain one of the few nations in the world that openly shun FOSS in the enterprise. We not only shun it, we work directly against it in the halls of our Senate. That’s due to one simple thing.

Money.

It is indeed the most persuasive force in the world. Some would argue that it is the most pervasive thing in our world as well. Look at the lobby efforts that exist just to keep FOSS tucked safely away. Away from the purview of American business. The US copyright lobby has boldly claimed that “Free software weakens the software industry.” Well yeah, ya think? It most certainly is disruptive. Whether tactics like this are effective or not…this isn’t the time or place for that argument. We can hash that out at a later date.

We, I mean all of us who use or count on FOSS, sit upon an entire world of wealth. The amount of software we can use and/or alter for our own purpose or for the betterment of others is immense. I don’t know if many of us stop to give that any thought. The thought that we can change the world with the software literally under our fingertips. There is an important thing afoot, and I want you to help me make that important thing a reality.

Most who are steady patrons of FOSS Force know that I opted to have my larynx removed early this year when it became evident that the cancer in my throat was attempting a curtain call…a bid to return. Removal of my larynx quashed that nonsense, but the trade off was expensive. I’ve completely lost my ability to speak. I no longer have the ability to communicate verbally.

“Well Ken,” you would begin…“I’m glad you are gonna beat this thing but what about Reglue? How are you going to run that organization, being that you have to talk to dozens of people in a week’s time?”

I’m glad you asked, and just so you have early knowledge of this, I will be working on a weekly video blog, using the best text to speech tools I can find. And these tools are amazing. You don’t have to beat yourself half to death, trying to figure out how to add extra “voices” into the mix. You choose your voice and cadence, then you use the application via your keyboard. How easy can it get?

It’s one of the major ways I’ve communicated in public since my surgery. That’s the way I accepted our FSF award at Libre Planet and that’s the way I gave my talk in an MIT lecture hall during that event. And yes, that was a surreal moment. The moment I stood behind the podium and addressed people in an MIT lecture hall. I used said text to speech software to do it. “That’s fantastic Ken,” you might say. “Tell us about this great Linux app you use.”

Well see, ahem…that’s the rub.

It’s neither open source nor is it Linux or Linux-based. I paid $100 for a year’s subscription. It translates my typed words into speech and allows me to record them for later use. You can get a good sample of how “my voice” sounds here. With such a rich FOSS infrastructure, why would I choose to use a proprietary solution? I’ll tell you.

Support for text to speech within the Linuxsphere is terrible. I’ve found nothing in open source that has the ease and flow of the online app I use. Nothing. The text to speech applications available within our realm are horribly under-written, and some appear to be abandoned.

When I began to prepare for my talk on the MIT campus, I made many assumptions. One of those assumptions was that I could choose from a number of text to speech apps for my purpose and the real problem would be in choosing which app was best. Hooo-leee cow, I could not have been any more incorrect.

The first thing I thought of was the Linux distro made for people without sight or with poor eyesight. I was guided to two distros, Sonar and Vinux. I burned Sonar and installed it on a laptop used for distro testing here at Reglue. I was informed that Sonar tweaked apps such as the Orca screen reader, and it worked out of the box. It’s based on GNOME, but I figured I was after the app not the distro. I’d make myself work around GNOME.

When the distro booted, I began to get this scratchy, static sound in my headset. It was uncomfortably loud and it reminded me of a record player cartridge scratching out some hip hop junk. I rebooted and tried again. This time I was able to discover the source of the horrible sound. It was the Orca software introducing itself and telling me how to go about using it.

How to go about using it? How about let’s go about making it painless and understandable? And forgoing that, tone it down so it doesn’t sound like glass shards being ground in a blender. I was finally able to turn the noise off and I began to grasp the necessity, and ordeal, of adding “voices” so that the speech was understandable.

I could spend 30 minutes here and tell you about the rage-inducing knot that must be untangled in order to get “audio-legible” voices into Orca…or eSpeak for that matter. Here’s the “Reader’s Digest” version.

After ten minutes I wanted to run knitting needles through my eyes and set my head on fire, then put the fire out by repeatedly slamming my face into Pete Rose’s baseball cleats. If I had just done that in the first place, the pain would have been less by at least half.

And look…this isn’t on Jonathan Nadeau, creator of the Sonar distro. Not at all. Jonathan is, in my eyes, one of the bravest people in the FOSS world. Being completely sightless, he’s put together a pretty good Linux distribution, not only for people with low or no sight, but which also includes software to help dyslexic people as well. Jonathan took the best available open source tools and built Sonar around them. It’s not a case of a distro being less than helpful due to difficult software; it’s a case of using the only open source software that is available. Unfortunately, much of that available software is not good enough.

Let me repeat: Much of this software isn’t nearly good enough or easy enough for daily use. I am surprised that some of it doesn’t still sport the “beta” tag. How some of this stuff was allowed into distro repositories is beyond me.

I’m not throwing down a gauntlet or trying to start a flame war. I’m stating an opinion that is shared by almost everyone in my circle of disabled people. I have in my Google Plus circles dozens of people who, like me, deal with disabilities day after day. To a person, they believe as I do: Text to speech software in Linux is horrible. My focus is, and will remain, to create software or improve existing software in order to make it usable and accessible by anyone. There is good news, at least for people who rely on text to speech on a daily basis: That software exists today. Right now. We are soooo close to having a professionally built text to speech tool. We are so close…

Then Along Comes Mary…

When I came to realize how bad Linux text to speech software was I partnered with Google for a four-day search binge. I was in search of any software or hardware that could give me a voice again. It was a lot of hit and miss. It was even more hit and “holy-friggin’-crap-are-you-kidding-me-$7000.00-for-a-type-and-talk-keyboard?”

Yeah, there was a lot of that. A Lot.

While I was flailing around the Interwebz, a friend sent me a link to a free open source Java app called MaryTTS. Yeah, Java. I had to learn how to script my own dead man’s switch (no, really, I do have one). Java is so far over my head, I get nosebleeds just reading the man pages.

In a public discussion about MaryTTS, I was introduced to David E. I will let David introduce himself should he desire to do so; I won’t divulge his name without asking first. He’s a humble and kind man, who would never say aloud just how crazy mad his skills are. Ever. So I will do it for him.

MaryTTS is an amazing app. In no time David was able to show me how to get it to work on my individual computer and not have to go online to use it. Trouble is…it’s a total mind-bash to someone who isn’t comfortable at the command line or with editing exotic text files. To the novice, or even some power users, getting MaryTTS to run presents some daunting challenges. And that’s a shame because MaryTTS is more than likely the tool a speechless world is waiting for. Solid, dependable and free.

Now let’s add “ease of use.” Why? Because of this, an excerpt from the MaryTTS users digest I get daily:

Now, I’m trying to run the MaryClientUser.java example.

Compiling doesn’t work, the .class is not created.

I’ve spent a few hours over the last couple of days on this issue but with no success.

The following note is mentioned in the example file, (p.s. I can’t find ‘maryclient.jar’):

/**
 * A demo class illustrating how to use the MaryClient class.
 * This will connect to a MARY server, version 4.x.
 * It requires maryclient.jar from MARY 4.0.
 * This works transparently with MARY servers in both http and socket server mode.
 *
 * Compile this as follows:
 * javac -cp maryclient.jar MaryClientUser.java
 *
 * And run as:
 * java -cp .:maryclient.jar MaryClientUser
 */

This is far, far too common in the FOSS world. Absolutely killer software is created. I am talking about software that has no rival, no equal in the software world on any platform. Software that could change untold lives for the better. Then the author walks away from it with a broken install script or bin file that the everyday computer user can’t even begin to understand or know how to use. Should a user track him down and ask him a question, the reply, if answered at all, will simply be…

RTFM

I want to enter into dialog and negotiations with anyone who has the skill to create an easily usable GUI/front end for MaryTTS. It needs to be on the user’s computer in whole, not somewhere in the cloud. If that’s you, I want to know how much money you will accept to create this GUI and to maintain it for one year. While it’s not much, I will pledge $500.00 of my personal savings to prime the pump. I live on less than 10K a year; that’s how important this is to me. If my directors will agree, they will each donate whatever they can reasonably afford. I can guarantee you, the impact of your work will be global.
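
For anyone sizing up the job, here is a rough sketch of what the core of such a front end might look like. It is written in Python rather than Java only to keep it short, and it rests on assumptions that are not from this article: that a MaryTTS 5.x server is already running on the same machine, that it listens on port 59125, and that it answers "/process" requests with the parameters shown. The function names are made up for illustration. Check all of that against the MaryTTS documentation for your version before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a MaryTTS 5.x server is running locally on this host and port.
MARY_HOST = "localhost"
MARY_PORT = 59125


def build_mary_url(text, locale="en_US", host=MARY_HOST, port=MARY_PORT):
    """Build the request URL asking the local server to speak `text` as WAV audio."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    # urlencode escapes spaces and punctuation so the typed sentence survives intact.
    return f"http://{host}:{port}/process?{urlencode(params)}"


def synthesize_to_file(text, wav_path, locale="en_US"):
    """Fetch the synthesized speech from the local server and save it as a WAV file."""
    with urlopen(build_mary_url(text, locale)) as response, open(wav_path, "wb") as out:
        out.write(response.read())
```

With a server running, calling synthesize_to_file("Let's get this done.", "hello.wav") should leave a WAV file that any audio player can play. A real front end would wrap this in a text box and a Speak button, and nothing in it needs the cloud.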

Now, about that elephant in the living room; let’s talk about her — TTS on Android/iPhone.

It does exist and it exists as anything from professionally-created software to weekend hack-attack enthusiast projects. I personally use an Android app named, ever so quaintly, Speech Assistant.

I use this app on my Nexus 7 tablet, but regardless of how good the app is, it succeeds or fails on the user’s ability to input text fast enough to keep up with the ambient conversation. So I’ve taken to practicing my “swyping” technique and my two thumb technique. Right now, I am practicing two well known typing exercises:

“Now is the time for all good men to come to the aid of their country.”

“I want to watch zebras that know how to jump and spin in the air.”

On my Model M desktop keyboard I timed both exercises. The first one times at ten seconds and the second at nine seconds. Compare that with 45 seconds and 51 seconds using a stylus and the Swype keyboard on my Nexus. Yeah, I don’t know many people who will want to hang out with me when it takes that long to type fifteen words or so. I don’t see many meaningful conversations happening given those numbers.
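
To put those timings in conversational terms, the arithmetic can be sketched in a few lines of Python. The only assumption is that the 45 and 51 second tablet timings belong to the first and second sentence respectively. The function name is just for illustration.

```python
def words_per_minute(sentence, seconds):
    """Typing speed for one sentence: word count scaled up to a full minute."""
    return len(sentence.split()) * 60 / seconds


FIRST = "Now is the time for all good men to come to the aid of their country."
SECOND = "I want to watch zebras that know how to jump and spin in the air."

# Seconds taken for (first sentence, second sentence) on each device.
timings = {
    "Model M keyboard": (10, 9),
    "Nexus 7 with stylus": (45, 51),  # assumed order: first sentence, then second
}

for device, (t_first, t_second) in timings.items():
    print(
        f"{device}: {words_per_minute(FIRST, t_first):.0f} and "
        f"{words_per_minute(SECOND, t_second):.0f} words per minute"
    )
```

That works out to roughly 96 and 100 words per minute on the Model M against about 21 and 18 on the tablet.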

Will getting a user-friendly front end to MaryTTS do anything to aid that? It most certainly will and I will be your personal 24/7 press agent. My bluetooth keyboard has a good feel to it and while it wouldn’t be much help for conversations on the hoof, from a stationary position, it could improve things a bunch.

So that’s where we are. If you are interested in talking to us more about getting this front end to a fantastic text to speech app written, do so here in the comments or you can email me at ken (at) reglue dot org.

Some of the brightest people I know read FOSS Force. Let’s get this done.

Ken Starks

Ken Starks is the founder of the Helios Project and Reglue, which for 20 years provided refurbished older computers running Linux to disadvantaged school kids, as well as providing digital help for senior citizens, in the Austin, Texas area. He was a columnist for FOSS Force from 2013-2016, and remains part of our family. Follow him on Twitter: @Reglue

linuxlock.blogspot.com/

14 Comments

Jeff Sadowski June 23, 2015

I’ll take a look at it. I can build somewhat of a gui. I know enough about java gui’s and compiling java that I might be able to help. Not as a full time developer but in some spare time after I put the kids to bed every night. While watching TV with my wife.

Carling June 23, 2015

Hi Ken, I can see another donation coming your way for this, another worthwhile project that will help the disadvantaged. Keep up the good work.

ken June 23, 2015

Jeff, thank you. I am sure whatever you build can be passed along the dev chain. I appreciate this. Contact me off list if you like. ken at reglue(dot) org

John Morris June 23, 2015

Sounds like what you really need is an Android device that supports USB host mode and a USB OTG cable. Plug a spare Model M into a PS/2 to USB converter, plug that into the OTG cable and declare victory on the ‘home front.’ Or get a good USB keyboard (Unicomp sells them built on the original Lexmark tooling for the Model M but with USB and optional Windows keys.) and save some of the clutter. Should at least work until you get better options up and running.

Brian June 23, 2015

Ken, are you aware of the UbiDuo from sComm? While not a TTS device, it is one way to accomplish face to face communication, particularly for deaf persons like me. Check it out at http://www.scomm.com.

Brian June 23, 2015

A simple Tkinter GUI on the MaryTTS on a Raspberry Pi 2 ($35) ought to do the job. That will have decent performance and low hardware costs. Later you can think about adding a small display, battery and speakers to make a portable unit.

If you can’t find anybody to handle the work soon send me a note.

Ken Starks June 24, 2015

John, Brian…thank you both. The reason I key on MaryTTS as much as I do is the complete professionalism with which it is built. And yes, I do need to look into a decent portable keyboard. I have a Bluetooth keyboard that’s pretty decent but I can only seem to pair one device on my Nexus. I would normally pair my hang-from-a-belt-loop round Bluetooth speaker then pair the keyboard. I’m sure every other Bluetooth device I’ve messed with allows 2 pairings.

I did look into a few of the all in one keyboards that have some amazing features, but they are several grand and of course I can serve my country in two wars but they won’t buy me a frickin’ keyboard. You want paperwork? You should see.

But back to the real world, thank you guys. I’ll be toying and working with as many combinations as my meager budget allows. Hopefully we can get this MaryTTS girl going. It is, and I kid you not, the best device out there. And it’s open source.

CFWhitman June 24, 2015

I think part of the problem with the existing software is that there are two applications for TTS software: people who are unable to speak and people who are unable to see. A lot of what exists is geared for people who are unable to see, and they have different needs than people who are unable to speak.

People who are unable to see want TTS software that can read large passages of text fast in a way you can learn to interpret correctly so that they can read faster.

People who are unable to speak want TTS software that can read single lines of text interactively and speak at a normal rate and be readily understood by the uninitiated.

I’m not saying that existing software is perfectly acceptable to the people who are unable to see, but it’s much better for them than it is for people who are unable to speak.

Mike June 24, 2015

@CFWhitman

An interesting observation.

Unbeknownst June 26, 2015

Two things…

1) Speed
--------

We’re gonna have trouble with interfaces. Maybe we’d need to extend sign language to some 2-D stenography-like (*) signs (and maybe use a soundboard for common phrases — maybe a kind of bash or Google completion could work here).

I don’t see ordinary input methods as speedy enough for a debate, for instance.

(*) not steganography, ok?

2) Parsing problems
-------------------

I managed to make MaryTTS work in Portuguese (more or less) by tricking it with the Italian “Lucia” voice.

For example, Italians say “segue” as “Segway”, while Portuguese say it as “segay” (it means “follows”). That’s kinda easy to do because Italian and Portuguese have very similar phonemes (or maybe more exactly, “phones”). For the same reason, it’s advisable to write “ke” instead of the correct form “que” (“what”/“that”). In time, one learns to type in that way just like young’ums use shorthands like BTW, BRB, PLS…

French and Portuguese, for a contrast, would give a lot more trouble.

English is even harder: I see problems with words with different pronunciations, e.g. “laughter”.

I wouldn’t know how to fool MaryTTS to speak American “laughter” with a British voice. “Leicester” is close, but not really equal.

Mathias June 26, 2015

I have not tried MaryTTS, but you really should consider looking into Festival Speech Synthesis as well.

Try out the voices here:
http://www.cstr.ed.ac.uk/projects/festival/morevoices.html

I think they sound really good.

Unbeknownst June 26, 2015

@ Mathias
> Try out the voices here:

Sounds great. Maybe that will be very useful to English speakers.

Terms of use should be discussed with authors first, AFAIU.

kirk June 26, 2015

Just for temporary usage, you might look at some extensions for the Chrome browser. There are a few there: type what you’d like to say into the “message” body of an email, highlight, hit the text to speech button and voila. It might be far from ideal as you are stuck having to type “in” a browser window, but it would work in a pinch, you can use your PC keyboard, and it won’t cost a dime.

Ken Starks July 1, 2015

Kirk thank you. I will look at that here in a bit. I am out and about. A good friend has also made a text to speech extension for chrome. We will add this as well to our lists. Thanks a bunch and write more often…hope you are doing well.