Would you like to try something using the Web Speech API? How about scrolling a web page using voice commands?
Before we start, here’s a brief introduction to the Web Speech API.
What is the Web Speech API?
The Web Speech API allows you to add voice capabilities to your web app. It has two components: SpeechSynthesis and SpeechRecognition.
SpeechSynthesis is used to convert text into speech and SpeechRecognition is used to convert speech into text.
For the purpose of this demonstration we’ll be focusing only on the SpeechRecognition part of the Web Speech API.
Speech recognition works, in simpler terms, by matching the sounds we produce when we speak (phonemes) with written words.
Let’s start
In the following sections we will code a simple HTML page with a long scrollable text and the JavaScript code needed to make this page scroll via speech commands.
HTML
Create a simple page with lots of text to scroll through.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Page Scroller using Web Speech API</title>
<meta name="description" content="Page Scroller using Web Speech API">
<meta name="author" content="Cloudoki">
<!-- Mobile Specific Metas
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- CSS
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<link rel="stylesheet" href="styles/custom.css">
<!-- Favicon
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<link rel="icon" type="image/png" href="images/icon.png">
</head>
<body>
<div class="container">
<h1>Page Scroller using Web Speech API</h1>
<h6>click the scroller button then say:</h6>
<h6>"scroll + [up, down, top, bottom]"</h6>
<h6>click again to stop</h6>
<div class="lipsum">
[LARGE TEXT HERE]
</div>
<button class="scroller">SCROLLER</button>
</div>
<script src="scripts/index.js"></script>
</body>
</html>
JAVASCRIPT
In our script, let’s first see if the browser supports the Web Speech API.
try {
var SpeechRecognition = SpeechRecognition || webkitSpeechRecognition || null;
}
catch(err) {
console.error('Starting Web Speech API Error:', err.message);
var SpeechRecognition = null;
}
function init () {
// initialize speechRecognition if supported
if (SpeechRecognition === null) {
alert('Web Speech API is not supported.');
} else {
console.log('Web Speech API is supported.');
}
}
window.addEventListener('load', function() {
init();
}, false);
If the SpeechRecognition variable is null, the browser does not support the Web Speech API. Otherwise, we can use it to transcribe our speech.
Note:
At the time of writing this article the browsers that supported Web Speech API were limited. We recommend using Google Chrome for this.
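The same check can also be written without relying on try/catch to absorb the ReferenceError, by looking the constructor up as a property of the global object. The following is only a sketch: getSpeechRecognition is a hypothetical helper name, not part of the Web Speech API, and it takes the global object as a parameter so it can be tried outside a browser.

```javascript
/**
 * Hypothetical helper: returns the SpeechRecognition constructor exposed by
 * the given global object (normally `window`), or null if there is none.
 */
function getSpeechRecognition(globalObject) {
  if (!globalObject) {
    return null;
  }
  // Standard name first, then the webkit-prefixed name.
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// In a browser this would be called as:
// var SpeechRecognition = getSpeechRecognition(window);
```

Reading a missing property returns undefined instead of throwing, which is why no try/catch is needed here.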
Starting the speech recognition
To start the recogniser, we first create a new SpeechRecognition instance with var recognizer = new SpeechRecognition(); and then set its properties.
SpeechRecognition.continuous
Controls whether continuous results are returned for each recognition (true), or only a single result (false). Defaults to false.
SpeechRecognition.interimResults
Controls whether interim results should be returned (true) or not (false). Interim results are results that are not yet final.
SpeechRecognition.lang
Returns and sets the language of the current SpeechRecognition. If not set, it uses the HTML lang attribute value or the user agent language. It’s a good practice to set this value.
SpeechRecognition.maxAlternatives
Sets the maximum number of alternatives provided per result.
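// continuous defaults to false, so test that the property exists instead of testing its value
if ('continuous' in recognizer) {
recognizer.continuous = true;
}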
recognizer.interimResults = true; // we want partial result
recognizer.lang = 'en-US'; // set language
recognizer.maxAlternatives = 2; // number of alternatives for the recognized speech
When we click the scroller button the recogniser should start, using recognizer.start(), and clicking it again should stop it, using recognizer.stop(). We’ll also use a state variable to track whether the recogniser is listening and has really started.
var scrollerClass = '.scroller';
// try to get SpeechRecognition
try {
  var SpeechRecognition = SpeechRecognition || webkitSpeechRecognition || null;
}
catch(err) {
  console.error('Starting Web Speech API Error:', err.message);
  var SpeechRecognition = null;
}
/**
 * Initialize the Speech Recognition functions
 */
function startSpeechRecognizer () {
  // state used to start and stop the detection
  var state = {
    'listening': false,
    'started': false,
  };
  var scroller = document.querySelector(scrollerClass); // button to start and stop the recognizer
  var recognizer = new SpeechRecognition();
  // set recognizer to be continuous; the property defaults to false,
  // so test that it exists instead of testing its value
  if ('continuous' in recognizer) {
    recognizer.continuous = true;
  }
  recognizer.interimResults = true; // we want partial results
  recognizer.lang = 'en-US'; // set language
  recognizer.maxAlternatives = 2; // number of alternatives for the recognized speech
  recognizer.onstart = function () {
    // listening started
    state.started = true;
    scroller.innerHTML = 'listening';
    console.log('onstart');
  };
  scroller.onclick = function () {
    if (state.listening === false) {
      try {
        state.listening = true;
        // start recognizer
        recognizer.start();
        console.log('start clicked');
        // if it has not started after 3 seconds, stop and show a message to the user
        setTimeout(function () {
          if (!state.started && state.listening) {
            scroller.click();
            alert('Web Speech API seems to not be working. Check if you gave permission to access the microphone or try with another browser.');
          }
        }, 3000);
      } catch(ex) {
        console.log('Recognition error: ' + ex.message);
        alert('Failed to start recognizer.');
      }
    } else {
      state.listening = false;
      state.started = false;
      // stop recognizer
      recognizer.stop();
      scroller.innerHTML = 'scroller';
      console.log('stop clicked');
    }
  };
}
function init () {
  // initialize speechRecognition if supported
  if (SpeechRecognition === null) {
    alert('Web Speech API is not supported.');
  } else {
    startSpeechRecognizer();
    console.log('initialized...');
  }
}
window.addEventListener('load', function() {
  init();
}, false);
Getting and handling the results
After starting the recogniser, several events will occur that we can use to get results or information (like recognizer.onstart in the code above, which is triggered when the service starts to listen to voice input). The event handler we are going to use to get the results is SpeechRecognition.onresult, fired every time we get a successful result.
So inside the startSpeechRecognizer function we’ll add:
recognizer.onresult = function (event) {
// got results
// the event holds the results
if (typeof(event.results) === 'undefined') {
// something went wrong...
recognizer.stop();
return;
}
for (var i = event.resultIndex; i < event.results.length; ++i) {
if(event.results[i].isFinal) {
// get all the final detected text into an array
var results = [];
for(var j = 0; j < event.results[i].length; ++j) {
// how confident (between 0 and 1) the service is that the transcription is correct
var confidence = event.results[i][j].confidence.toFixed(4);
// the resulting transcription
var transcript = event.results[i][j].transcript;
results.push({ 'confidence': confidence, 'text': transcript });
}
console.log('Final results:', results);
} else {
// got partial result
console.log('Partial:', event.results[i][0].transcript, event.results[i].length);
}
}
};
The event parameter holds the results of the service in its results property, a SpeechRecognitionResultList. Each entry in that list has an isFinal property, which tells us whether it is a final result or a partial one, and is itself an array-like list of alternatives, each with the properties confidence and transcript. The confidence value (between 0 and 1) tells us how sure the service is that the speech matches the transcription we received. The transcript is the text that the service generates based on what it understands from our voice input.
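To make the nesting described above easier to see, here is a small sketch that walks the same structure and keeps only the alternatives of final results. The helper name collectFinalAlternatives and the hand-built fakeEvent object are assumptions made for illustration; a real event comes from the browser.

```javascript
/**
 * Hypothetical helper: collects the alternatives of every final result in a
 * recognition event into plain objects.
 */
function collectFinalAlternatives(event) {
  var collected = [];
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    var result = event.results[i]; // one result: a list of alternatives
    if (!result.isFinal) {
      continue; // skip interim (partial) results
    }
    for (var j = 0; j < result.length; ++j) {
      var alternative = result[j]; // has confidence and transcript
      collected.push({
        confidence: alternative.confidence,
        text: alternative.transcript
      });
    }
  }
  return collected;
}

// A hand-built object that only mimics the shape of a recognition event.
var fakeEvent = {
  resultIndex: 0,
  results: [
    Object.assign(
      [{ transcript: 'scroll down', confidence: 0.92 },
       { transcript: 'scroll town', confidence: 0.41 }],
      { isFinal: true }
    ),
    Object.assign(
      [{ transcript: 'scroll', confidence: 0.5 }],
      { isFinal: false }
    )
  ]
};

// Logs the two alternatives of the final result; the interim one is skipped.
console.log(collectFinalAlternatives(fakeEvent));
```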
We’ll also use the events onend, onspeechend and onerror. The first tells us when the service has stopped, so we can start listening again; the second tells us when the service detected that our speech stopped, so we know it’s not listening to new sentences; and the third tells us that an error occurred.
recognizer.onend = function () {
// listening ended
console.log('onend');
if (state.listening) {
recognizer.start();
}
};
recognizer.onerror = function (error) {
// an error occurred
console.log('onerror:', error);
};
recognizer.onspeechend = function () {
// stopped detecting speech
console.log('Speech has stopped being detected');
scroller.innerHTML = 'wait';
};
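Because onend restarts the recogniser whenever state.listening is true, an error that cannot be recovered from (a denied microphone permission, for example) may lead to repeated restarts. One possible way to guard against this is to inspect the error property of the event passed to onerror. This is a sketch, not part of the original demo: the helper name isFatalRecognitionError and the choice of which error codes count as fatal are assumptions.

```javascript
// Error codes after which restarting is unlikely to help.
// Which codes belong in this list is an assumption.
var FATAL_ERRORS = [
  'not-allowed',
  'service-not-allowed',
  'audio-capture',
  'language-not-supported'
];

/**
 * Hypothetical helper: tells whether a recognition error event carries an
 * error code that we treat as fatal.
 */
function isFatalRecognitionError(errorEvent) {
  return FATAL_ERRORS.indexOf(errorEvent.error) !== -1;
}

// Possible use inside startSpeechRecognizer:
// recognizer.onerror = function (error) {
//   console.log('onerror:', error);
//   if (isFatalRecognitionError(error)) {
//     state.listening = false; // keeps onend from restarting the recogniser
//   }
// };
```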
Scrolling based on the transcript
In order to scroll we’ll need to add the following code to the onresult event handler, inside the isFinal branch, once the results array has been filled:
// scroll according to detected command
var scroll = sortByConfidence(results).shift();
console.log('Final results:', results, scroll);
autoScroll(scroll);
This will sort the results by confidence values, get the best one and send it to be executed if it’s a command matching our trigger words. Adding the next bit of code will achieve the behaviour explained.
/**
* Returns a list of transcripts ordered by confidence values, in descending order.
* @param {array} list - A list of objects containing the confidence and text values.
* @return {array} Ordered list of transcripts
*/
function sortByConfidence(list) {
list.sort(function(a, b) {
return a.confidence - b.confidence;
}).reverse();
var sortedResult = list.map(function(obj) {
return obj.text;
});
return sortedResult;
}
/**
* Execute the scroll command found in the given speech, if any.
*
* @param {String} speech The command to evaluate
*/
function autoScroll (speech) {
var body = document.body,
html = document.documentElement;
var pageHeight = Math.max(body.scrollHeight, body.offsetHeight,
html.clientHeight, html.scrollHeight, html.offsetHeight);
var currentHeight = Math.max(body.scrollTop, html.scrollTop, window.pageYOffset)
if (typeof speech === 'string' || speech instanceof String) {
if (speech.indexOf('up') > -1) {
console.log('Scrolling up...')
window.scrollTo({
top: currentHeight - 250,
behavior: 'smooth'
})
} else if (speech.indexOf('down') > -1) {
console.log('Scrolling down...')
window.scrollTo({
top: currentHeight + 250,
behavior: 'smooth'
})
} else if (speech.indexOf('top') > -1) {
console.log('Scrolling top...')
window.scrollTo({
top: 0,
behavior: 'smooth'
})
} else if (speech.indexOf('bottom') > -1) {
console.log('Scrolling bottom...')
window.scrollTo({
top: pageHeight,
behavior: 'smooth'
})
}
}
}
And with this we should now be able to scroll up, down, to the top or to the bottom of our web page. Just be aware that the transcriptions are not always a perfect match to what is being said, but this can still help with the accessibility of your page. As you can see, it can be easily adapted to follow more instructions and create your own simple web assistant.
You can check the full demo code here.
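As one way of adapting the code to more instructions, the if/else chain in autoScroll could be replaced by a table of commands, so that a new command is a new entry rather than a new branch. This is a sketch under assumptions: the names commands, STEP and resolveScrollTarget are hypothetical, and the function only computes the target position so that it can be tried without a browser.

```javascript
var STEP = 250; // same scroll step as in autoScroll

// Hypothetical command table: each entry pairs a keyword with a function
// that computes the target scroll position.
var commands = [
  { keyword: 'bottom', target: function (current, pageHeight) { return pageHeight; } },
  { keyword: 'top', target: function () { return 0; } },
  { keyword: 'up', target: function (current) { return Math.max(0, current - STEP); } },
  { keyword: 'down', target: function (current, pageHeight) { return Math.min(pageHeight, current + STEP); } }
];

/**
 * Returns the scroll position for the first command whose keyword appears in
 * the speech, or null if no command matches.
 */
function resolveScrollTarget(speech, current, pageHeight) {
  if (typeof speech !== 'string') {
    return null;
  }
  var text = speech.toLowerCase();
  for (var i = 0; i < commands.length; ++i) {
    if (text.indexOf(commands[i].keyword) > -1) {
      return commands[i].target(current, pageHeight);
    }
  }
  return null;
}

console.log(resolveScrollTarget('scroll down', 100, 2000)); // 350
console.log(resolveScrollTarget('Scroll top', 100, 2000));  // 0
console.log(resolveScrollTarget('hello', 100, 2000));       // null
```

In the page, a non-null result would be passed to window.scrollTo({ top: target, behavior: 'smooth' }), as autoScroll does.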