
Accepting Speech Input in HTML5 Forms

The way we interact with computers has changed dramatically during the last decade. Touch-screen devices and laptop trackpads have enabled a far more intuitive style of interaction than is achievable with a conventional mouse. These changes haven't been limited to hardware. Gestures, predictive text, and speech recognition are all examples of software innovations that have improved the way we interact with our devices.

Speech recognition has eluded innovators for many years. Many organizations have tried, with varying levels of success, to create reliable speech recognition technologies. One company, however, appears to have cracked the problem: Google.

In this post you'll learn how to make use of Google's speech recognition technology to enhance your web forms. You'll discover how to give Chrome users the ability to fill in text fields using speech, and how to detect support for this new speech input capability in browsers.

Let's get started.

Enabling Speech Input

Enabling support for speech input is as simple as adding an attribute to your <input> elements. The x-webkit-speech attribute tells the browser that the user should be given the option to complete this form field using speech input.

<input type="text" x-webkit-speech>

When speech input is enabled, the element will have a small microphone icon displayed on the right of the input. Clicking this icon launches a small tooltip to indicate that your voice is now being recorded. You can also start speech input by focusing the element and pressing Ctrl + Shift + . on Windows, or Command + Shift + . on Mac.

In JavaScript, you can test whether an element has speech input enabled by examining its webkitSpeech property. This is a boolean property and can therefore be set to true or false. You can override this property to enable or disable speech input on a particular element.

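// Grab the input element (the 'search' id is just a hypothetical example)
var element = document.getElementById('search');

// Enable speech input
element.webkitSpeech = true;

// Disable speech input
element.webkitSpeech = false;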

A Caveat About Input Types

Speech input isn't available for all of the different HTML5 input types. In my testing I discovered that the text, number, and tel types do support speech input, whereas the email, url, date, and month input types don't.

If you apply the x-webkit-speech attribute to an <input> element with an unsupported input type, the webkitSpeech property on that element will still be set to true. You therefore cannot rely on this property to tell you whether the browser is displaying the speech input controls, only that the browser supports speech input in general.
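Because the property alone can't tell you whether the microphone is actually shown, one workaround is to combine it with a check of the input's type. The sketch below hard-codes the types that worked in my testing (text, number, tel). The function name speechControlsLikelyShown and that list are assumptions on my part, not part of any browser API, so treat the result as a best guess rather than a guarantee.

```javascript
// Input types that displayed speech controls in the testing described above.
// This list is an assumption based on that testing, not a browser-provided value.
const SPEECH_CAPABLE_TYPES = ['text', 'number', 'tel'];

// Hypothetical helper: returns true only when speech input is enabled on the
// element AND its type is one that appeared to show the microphone icon.
function speechControlsLikelyShown(element) {
  if (element.webkitSpeech !== true) {
    return false;
  }
  // Missing type behaves like 'text'; lower-case it to compare safely
  const type = (element.type || 'text').toLowerCase();
  return SPEECH_CAPABLE_TYPES.includes(type);
}
```

For example, calling speechControlsLikelyShown on an <input type="email" x-webkit-speech> element returns false even though its webkitSpeech property is true.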

Detecting Browser Support

A simple way of checking whether the user's browser supports speech input is to look for the webkitSpeech property on an <input> element. An example of how to do this is shown below.

if (document.createElement('input').webkitSpeech === undefined) {
    // Not supported
} else {
    // Supported!
}
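Building on that check, a page can add the attribute only where it's likely to help, leaving other browsers with plain text fields. In the sketch below the document is passed in as a parameter so the function can be exercised outside a browser. The function name addSpeechInput and the selector are my own assumptions; the selector simply mirrors the input types that worked in my testing, plus inputs with no type attribute (which default to text).

```javascript
// Inputs that showed the microphone icon in the testing described earlier
// (an assumption based on that testing, not a browser-provided list).
const SPEECH_SELECTOR =
  'input:not([type]), input[type="text"], input[type="number"], input[type="tel"]';

// Hypothetical helper: adds x-webkit-speech to matching inputs when the
// browser exposes the webkitSpeech property. Returns how many inputs changed.
function addSpeechInput(doc) {
  // Same feature test as above: unsupported browsers leave the property undefined
  if (doc.createElement('input').webkitSpeech === undefined) {
    return 0;
  }
  const inputs = doc.querySelectorAll(SPEECH_SELECTOR);
  for (let i = 0; i < inputs.length; i++) {
    // Boolean attribute: its presence alone enables speech input
    inputs[i].setAttribute('x-webkit-speech', '');
  }
  return inputs.length;
}
```

In a page you would call addSpeechInput(document) once the form has loaded. Doing the feature test first means the markup stays untouched in browsers that don't understand the attribute.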

Google Chrome is the only browser that currently supports speech input. We'll examine the reasons for this in the next section.

How Speech Recognition Works

Speech-to-Text with a Web Service


The browser relies on an external service to handle speech-to-text conversion. The recording of your voice is sent to this service, which analyses the audio and constructs a textual representation. The text is then sent back to the browser, which populates the <input> element to complete the process. Many speech-to-text services incorporate machine-learning algorithms that allow them to become more accurate over time.

Note: a side effect of using an external service to handle speech-to-text is that you're going to need an internet connection for speech input to work. This is something to keep in mind if you plan for your web application to work offline.
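One way to handle the offline case is to switch speech input off while the browser reports no connection and back on when the connection returns. The sketch below uses the standard navigator.onLine flag and the window online/offline events. The function name syncSpeechWithConnection is hypothetical, and navigator.onLine only tells you whether the machine appears to have a network connection, not whether Google's service is actually reachable.

```javascript
// Hypothetical helper: keeps speech input on an element in step with the
// browser's reported connectivity. `win` is normally the global window object.
function syncSpeechWithConnection(win, element) {
  function update() {
    // navigator.onLine is false when the browser knows it is offline
    element.webkitSpeech = win.navigator.onLine;
  }
  // The browser fires these events when connectivity changes
  win.addEventListener('online', update);
  win.addEventListener('offline', update);
  update(); // apply the current state straight away
}
```

In a page this would be called as syncSpeechWithConnection(window, element), so the microphone only appears when a recording has a chance of reaching the service.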

The Chrome browser relies on Google's proprietary speech recognition technology to provide the functionality behind x-webkit-speech. Google has had a team working on speech recognition and natural language processing for a long time. It's this team that has been responsible for developing the complex systems needed to provide a competent speech-to-text service for products like Google Translate and Voice Search.

Note: If you're interested in learning more about how speech-to-text works, check out the research papers published by Google engineers.

Developing speech-to-text services is extremely difficult and requires a significant amount of investment. This is probably the main reason that no other browser vendor has implemented speech recognition yet. However, now that Apple has acquired Siri, I'm interested to see whether speech recognition will make its way into Safari some time soon.

Summary

In this post you've learned about the x-webkit-speech attribute and how it can be used to add speech input capabilities to your web forms. There's also a more advanced Web Speech API that we haven't covered in this post. This API allows developers to add speech recognition functionality to more aspects of their applications, or even to synthesize speech from text.
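For a taste of that API, here is a rough sketch of dictating into a text field with its SpeechRecognition interface. Chrome exposes the constructor with a webkit prefix, so the code checks for both names. The function name startDictation is hypothetical, and the window and input are passed in so the function can return null cleanly where the API isn't available.

```javascript
// Hypothetical helper: starts one round of recognition and writes the best
// transcript into `input`. Returns the recognition object, or null if unsupported.
function startDictation(win, input, lang) {
  const Recognition = win.SpeechRecognition || win.webkitSpeechRecognition;
  if (!Recognition) {
    return null; // no speech recognition support in this browser
  }
  const recognition = new Recognition();
  recognition.lang = lang || 'en-US';
  recognition.interimResults = false; // only deliver final results
  recognition.onresult = function (event) {
    // results[0] is the first result; [0] is its most likely alternative
    input.value = event.results[0][0].transcript;
  };
  recognition.start();
  return recognition;
}
```

Unlike x-webkit-speech, nothing here depends on the input's type or on a browser-drawn microphone icon; you decide when recognition starts (for example, from your own button's click handler).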

Whether it's on the computer on your desk or the phone in your pocket, software innovations like Google Voice Search and Siri are paving the way for a revolution in how we interact with computers. Welcome to the future, my friends. Now if only someone could work out the whole teleportation thing.