Although the numbers behind the name do not reflect it, the currently-named “SpeechLess” front end for MaryTTS is now being released as beta software. I was able to assemble a three-man team to create a GUI and, to my way of thinking, it has come along nicely. Although the demo is web-based, these guys have been able to construct it so the entire thing is local. That means little to no latency between hitting enter and hearing the text converted to speech.
I’ve talked at length about how TTS in the Linuxsphere is less than user friendly at about every turn. Our goal is to create a front end that makes MaryTTS easy to use for everyone. We’re getting there.
The first thing to do is download the jar file, the current release, from the GitHub page. You will see the download cue box on the right. Yeah, I know…the numbering doesn’t reflect a beta release. We’ll fix that. Once you have it downloaded and extracted, change directories within your terminal and issue one of any of these commands:
java -jar SpeechLess.one-jar.jar
java -jar SpeechLess.one-jar.jar nimbus
java -jar SpeechLess.one-jar.jar metal
java -jar SpeechLess.one-jar.jar motif
java -jar SpeechLess.one-jar.jar gtk
I’m running KDE, and if I run the first or the last of the above commands, the fonts are huge. However, if I execute any of the other commands, it’s fine. Also, you can right click the jar file once extracted and choose to open it with any of the Java runtimes currently on your computer. At this time, SpeechLess is built against Oracle Java 8, but it should work with any of the open source Java offerings as well. You can comment below if you’ve encountered any problems.
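For the curious, here is a minimal sketch of how a launcher argument such as nimbus or gtk could be mapped onto a Swing look and feel. This is an assumption about the mechanism, not the actual SpeechLess code: the class LookAndFeelChooser and the method classNameFor are hypothetical names made up for illustration. The look-and-feel class names themselves are the standard Swing ones, though not every Java runtime ships all of them.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import javax.swing.UIManager;

// Hypothetical helper, not taken from the SpeechLess sources.
public class LookAndFeelChooser {
    // Standard Swing look-and-feel class names, keyed by the launcher argument.
    // Availability depends on the Java runtime and the platform.
    private static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("nimbus", "javax.swing.plaf.nimbus.NimbusLookAndFeel");
        KNOWN.put("metal", "javax.swing.plaf.metal.MetalLookAndFeel");
        KNOWN.put("motif", "com.sun.java.swing.plaf.motif.MotifLookAndFeel");
        KNOWN.put("gtk", "com.sun.java.swing.plaf.gtk.GTKLookAndFeel");
    }

    // Returns the class name for a recognized argument, or empty if the
    // argument is missing or unknown. Matching ignores case.
    public static Optional<String> classNameFor(String requested) {
        if (requested == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(KNOWN.get(requested.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String requested = args.length > 0 ? args[0] : null;
        // With no (or an unknown) argument, fall back to the platform default.
        String className = classNameFor(requested)
                .orElseGet(UIManager::getSystemLookAndFeelClassName);
        try {
            UIManager.setLookAndFeel(className);
            System.out.println("Using look and feel: " + className);
        } catch (Exception e) {
            // The requested look and feel is not available on this runtime;
            // Swing keeps whatever look and feel was already in place.
            System.out.println("Could not load " + className + ": " + e);
        }
    }
}
```

If the real launcher works anything like this, running with no argument on KDE would pick the platform default, which might explain why the first and last commands behave alike on my machine. That is a guess, not something confirmed by the developers.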
Once executed, the icon for the application is placed in your system tray. Clicking it with either mouse button allows you to “show SpeechLess” or quit. Upon opening the app, just type something in the input field at the bottom and hit enter. It will “speak” the text you typed in. At this time, if it runs across a word that isn’t recognized or if it’s a potty-mouth word, it will tell you in the upper right hand corner that the word isn’t recognized. I am suggesting today that if it does not recognize the word, you be presented with an option to open a thesaurus to seek an alternative.
Rijk, if you see this, let’s talk about it.
The quality of the voices is acceptable, especially when measured against the majority of voices already available in Linux TTS applications. My current tool for using TTS is a paid subscription website at www.spokentext.net. The voices are amazingly clear, and while some of the vocal inflections are a bit quirky, overall it’s great. For a yearly C-note, I can’t complain. But two things strike me as not so great.
- It’s proprietary.
- It’s web-based.
With MaryTTS the whole thing is on your machine, so there isn’t any latency to speak of. It’s a great tool, but open source applications could be so much better.
That being said, let’s talk about a growing concern of mine: The mindset that an application or operating system should not necessarily be easy to use, that ease of use isn’t a prerequisite. The developer creates the app to meet his or her needs and calls it done. So what if you have to compile from source? So what if it is a command line tool only with no GUI?
I am still uncomfortable with this opinion. I will illustrate a common thread that runs through the meat of the argument below. This is the type of conversation I’ve had with a number of developers and other people who wanted to “help” in the last month. It does not reflect the opinions or beliefs of current project developers or others who assist.
“Ken, I think this app is ready for prime time.”
“No, not yet. There is still work left to do in order to make it more intuitive for the new user.”
“I disagree. The current state is fine. There doesn’t have to be a GUI for everything. How are they going to learn if they are not challenged to do so?”
“This isn’t about learning. This is about having a tool that is available for everyone to use. Think about giving this to your mom or your aunt. Would you make her jump through hoops to learn the commands to open and use the application?”
“Well, sure. They don’t need to be spoon fed. All they have to do is call me if they have problems.”
Really?
First off, you are not going to be around every time she is going to need your help. Secondly, there is no reason for her or anyone to have to learn complex tools to use a simple application. That’s just laziness. Or worse yet…stick around.
Again, this isn’t about learning how to use the command line. This is about offering a tool to those who need it for day to day matters. Adding any layer of complexity to this tool’s use is not only counterproductive, in some cases it’s downright mean, passive-aggressive behavior taken straight from the textbooks.
Let me tell you what I think, based on the emails and messages I’ve received.
Aunt Betty can now use her computer on her own, meaning you are no longer the Great and Powerful Oz, the man behind the curtain who is hiding the easy tools from Aunt Betty or anyone else needing your help. You, the person who seems to be almost magical in the way you can fix things on the computer, are afraid of losing your power.
The truth of the matter…Linux isn’t hard at all. I’ve got hundreds of 12-15 year-old kids to back that up.
Two of you have emailed me and said just about as much. You were looking to argue the point. If you want to argue your point, do so in the comments below this article. We can hash it out here.
That being said, let me present the first beta release of SpeechLess: A GUI that makes using MaryTTS much easier…a lot easier. And with your help, it will get even easier than it is now. Play with the controls…they are much too geeky at this time. Tell us how to make them better…more intuitive. How can we improve them? How can we remove the “geek” from the tool names? What can we do to make it easier to use, not only for Aunt Betty, but for everyone who needs a text to speech tool that talks nice to them?
Someone is going to directly benefit from your suggestion.
I promise.
Oh, and a bit of assistance here. Who can create a butler-type graphic character to represent the current application? The name “speechless” is only temporary. We’ll decide on a more permanent name once you show us a great servant for the people.
Ken Starks is the founder of the Helios Project and Reglue, which for 20 years provided refurbished older computers running Linux to disadvantaged school kids, as well as providing digital help for senior citizens, in the Austin, Texas area. He was a columnist for FOSS Force from 2013-2016, and remains part of our family. Follow him on Twitter: @Reglue
Hi Ken. You are right, a program is not done until it is easy and intuitive to use! I never considered myself a wizard or a computer guru…just someone who knows more than the average person about computers (with 2 degrees in computer science). I have helped a fair number of friends and acquaintances with their computers, even convinced a few to use Linux. One of my goals is for those that I help to be able to do what they need and want to do without me having to help them on a daily or even weekly or monthly basis. One of the best ways to do this is via easy to use, intuitive software. The software is not done until it has an easy to use, intuitive GUI.
“All they have to do is call me if they have problems.”
That approach doesn’t scale. Not to hundreds of users. Not even to dozens.
There doesn’t appear to be a .jar file at the github link.
Tanja, my apologies. The jar file is in the zip file on the right side, in the middle of the page.
“Once you have it downloaded and extracted, change directories within your terminal and issue one of any of these commands:”
The zip file contains a copy of the git repository. The only jar files in there are voices:
$ cd speechless-master/
$ find . -name '*.jar'
./lib/voice-cmu-slt-hsmm-5.1.2.jar
./lib/voice-cmu-bdl-hsmm-5.1.jar
./lib/voice-dfki-poppy-hsmm-5.1.jar
./lib/voice-dfki-obadiah-hsmm-5.1.jar
./lib/voice-cmu-rms-hsmm-5.1.jar
Yeah, that’s a problem. Here’s the download link for the jar file I am working from. Rijk isn’t around at this time of the day (for me) so I will leave him a message. Thanks for the heads-up Tanja.
https://drive.google.com/file/d/0B2bdSLJMw_v1X1hnUnhtYTI4U0E/view?usp=sharing
I question the philosophy of not supporting so-called potty-mouth words. Like it or not they are legitimately a part of our language. Artificially limiting what is apparently meant to be a globally available general use product based on some arbitrary sense of bible-belt prudishness seems like a dubious thing to do at best.
I downloaded and ran the application.
I liked it very much. It’s simple and very fast.
I needed java 8 to run it.
I also liked the effects for mary voices.
I’ve been developing a GUI for festival/flite/marytts.
Take a look here:
http://sourceforge.net/projects/o-milo/?source=directory
Maybe we can exchange views.
Have you tried flite voices?
You can test them online:
http://tts.speech.cs.cmu.edu:8083/
Flite is very fast and some voices are quite clear.
Thank you. I have family in from out of town and a deadline I am trying to make so let me take a look at this tomorrow and we can exchange ideas and notes.
@justniz You raise a very good point.
To my mind, “Artificially limiting what is apparently meant to be a globally available general use product based on some arbitrary sense of bible-belt prudishness” is no better than the thing Ken complains about: limiting the usefulness of the application by not providing a GUI that anyone can use.
For a lot of people, and especially those who do not subscribe to ‘bible-belt prudishness’, not being able to use ‘obscene’ language will not only limit their vocabulary, but is also insulting in ways that Ken should already be familiar with, given his newfound inability to speak.
There are two sides to this software and everyone in this discussion knows it. There is the actual Java app that makes it do your bidding and then there is the graphical user interface that controls the software to say what you want it to say. I am to a software developer as a coal miner is to a medical clean room, so I have no place mucking around a compiler. No offense to coal miners. Our lead guy is back this week so we are going to talk about this and see what he thinks and how this might be fixed, or as many of us believe, to be corrected. As grateful as I am for this software, I side with Tracyanne and justniz. The software is hobbled by this inability.
Understand, I spent a career in the United States Army and I’ve also been around the rodeo for much of my adult life. Those are probably the most cursinist bunch you can imagine.
“Those are probably the most cursinist bunch you can imagine”
You obviously haven’t spent any time around 3 Sheilas trying to build a viable farm in the middle of the Australian bush. ;)
@Ken Starks
Take your time. If you need instructions on installing and running the application just mail me.
Nick, I will need some assistance but I am not seeing your email address. I don’t have a github login so I could not message you that way. You can email me your address: ken at reglue dot org