Some languages provide special classes for splitting a string into tokens, e.g. StringTokenizer in Java. JavaScript has no dedicated tokenizer class, but it does have a very powerful mechanism for the job: regular expressions.

Let's take the string " Splitting String   Into  Tokens in JavaScript " and try to extract only the word tokens from it: Splitting, String, Into, etc. Note that the string has leading and trailing spaces, and some words are separated by more than one space.

Let's use the String.prototype.split() method and pass a single space as the separator:

var text = " Splitting String   Into  Tokens in JavaScript ";
console.log(text.split(' '));

It will give something like:

[ '', 'Splitting', 'String', '', '', 'Into', '', 'Tokens', 'in', 'JavaScript', '' ]

This is not what we expected: there are too many empty tokens that we need to get rid of. One solution would be to remove the empty strings from the output array afterwards, for example with Array.prototype.splice(). That is not a great approach, though, because it adds a separate clean-up step (and more code to test). We want something that avoids producing empty tokens during the split itself.
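For comparison, here is a minimal sketch of that clean-up step, assuming we really do use splice(). The helper name removeEmptyTokens is hypothetical. The loop walks the array backwards so that removing an element does not shift the indices that are still to be visited.

```javascript
// Hypothetical helper: removes empty strings from a token array in place.
function removeEmptyTokens(tokens) {
  // Iterate from the end so splice() does not shift unvisited indices.
  for (let i = tokens.length - 1; i >= 0; i--) {
    if (tokens[i] === '') {
      tokens.splice(i, 1); // remove one element at index i
    }
  }
  return tokens;
}

const text = " Splitting String   Into  Tokens in JavaScript ";
console.log(removeEmptyTokens(text.split(' ')));
// [ 'Splitting', 'String', 'Into', 'Tokens', 'in', 'JavaScript' ]
```

A shorter variant is text.split(' ').filter(token => token !== ''), but it still cleans up after split() instead of avoiding the empty tokens in the first place.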

Let's try the following regular expression instead, which treats any run of whitespace characters as a single separator:

// we use the same text variable as defined above
console.log(text.split(/\s+/));

It gives the following output:

[ '', 'Splitting', 'String', 'Into', 'Tokens', 'in', 'JavaScript', '' ]

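So we have got rid of the empty tokens in the middle of the output array. Now we need to get rid of the empty tokens at the beginning and the end of the array. Let's try the following regular expression, which splits only on whitespace that directly follows a word character (\b marks the boundary between a word character and a non-word character):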

// we use the same text variable as defined above
console.log(text.split(/\b\s+/));

It gives the following output:

[ ' Splitting', 'String', 'Into', 'Tokens', 'in', 'JavaScript', '' ]

We have got rid of the first empty token, but there is still an empty token at the end of the array. Let's modify the regular expression so that the matched whitespace may not end at the end of the string. The negative lookahead (?!$) rejects a match that is immediately followed by the end of the input:

// we use the same text variable as defined above
console.log(text.split(/\b\s+(?!$)/));

It gives the following output:

[ ' Splitting', 'String', 'Into', 'Tokens', 'in', 'JavaScript ' ]
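One caveat: (?!$) only rejects a whitespace match that ends exactly at the end of the string. Because \s+ can backtrack to a shorter match, a string ending in two or more spaces is still split before its last space, which leaves a whitespace-only token. A quick check, using made-up inputs:

```javascript
const pattern = /\b\s+(?!$)/;

// One trailing space: the lookahead blocks the final split.
console.log("word1 word2 ".split(pattern)); // [ 'word1', 'word2 ' ]

// Two trailing spaces: \s+ backtracks to a single space, so (?!$) succeeds
// (the next character is a space, not the end) and the last space becomes its own token.
console.log("word1 word2  ".split(pattern)); // [ 'word1', 'word2', ' ' ]
```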

Now it all looks OK-ish, although the first and last tokens still contain spaces. Let's switch to a different String method that does the tokenisation better.

We will use String.prototype.match(). The match() method retrieves the results of matching a string against a regular expression. So instead of describing the separators, we need a regular expression that matches the words themselves, i.e. everything except whitespace characters.

JavaScript regular expressions have a special character class, \S, which matches any single character other than whitespace.
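A quick way to see what \S accepts is to test single characters against an anchored pattern. The helper isNonWhitespace below is hypothetical. Note that \S excludes not only the plain space but also tabs, line breaks and other whitespace such as the non-breaking space:

```javascript
// Hypothetical helper: ^\S$ matches a string of exactly one non-whitespace character.
const isNonWhitespace = (ch) => /^\S$/.test(ch);

console.log(isNonWhitespace('a'));      // true
console.log(isNonWhitespace('7'));      // true
console.log(isNonWhitespace(' '));      // false (space)
console.log(isNonWhitespace('\t'));     // false (tab)
console.log(isNonWhitespace('\n'));     // false (line feed)
console.log(isNonWhitespace('\u00A0')); // false (non-breaking space)
```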

Let's check whether match() helps us. Note that we use the g (global) flag so that all tokens in the string are matched, not just the first one:

// we use the same text variable as defined above
console.log(text.match(/\S+/g));

And the output is:

[ 'Splitting', 'String', 'Into', 'Tokens', 'in', 'JavaScript' ]

Now the output is exactly what we expected: there are no empty tokens and no spaces in the extracted tokens.
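Two details of match() are worth keeping in mind, sketched below with a hypothetical tokenize() helper: without the g flag it returns only the first match (as an array that also carries index and input properties), and when nothing matches it returns null rather than an empty array.

```javascript
// Hypothetical helper: returns all non-whitespace tokens, or [] if there are none.
function tokenize(str) {
  // match() returns null when there is no match, so fall back to [].
  return str.match(/\S+/g) || [];
}

const sample = " Splitting String   Into  Tokens in JavaScript ";

// Without the g flag only the first match is returned.
console.log(sample.match(/\S+/)[0]); // Splitting

console.log(tokenize(sample));     // [ 'Splitting', 'String', 'Into', 'Tokens', 'in', 'JavaScript' ]
console.log('   '.match(/\S+/g));  // null
console.log(tokenize('   '));      // []
```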

By providing a different regular expression we can extract different kinds of tokens. The example below extracts all digit tokens from the sentence "I've found 4 ducks on 11th street":

"I've found 4 ducks on 11th street.".match(/\d+/g)

That will give something like:

[ '4', '11' ]
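Tightening the pattern changes which tokens count. As an illustration (the stricter pattern is our own choice, not part of the example above), wrapping \d+ in word boundaries keeps only numbers that stand alone, so the digits of 11th are no longer extracted:

```javascript
const sentence = "I've found 4 ducks on 11th street.";

// Any run of digits, including the digits inside "11th".
console.log(sentence.match(/\d+/g)); // [ '4', '11' ]

// Only digit runs with a word boundary on both sides.
// In "11th" there is no boundary between "1" and "t" (both are word characters).
console.log(sentence.match(/\b\d+\b/g)); // [ '4' ]
```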

Enjoy!