In my previous post I shared how to tokenize a string in JavaScript.

In this post, I show how to split a string into tokens of X characters each, counting from the end of the string so that only the first token can be shorter. This is handy for number formatting, e.g.

  • Group a decimal number with a thousands separator, from 1233232 to 1,233,232
  • Group a binary number every fourth digit (bit), from 1011010010101001 to 1011 0100 1010 1001
  • Group a hexadecimal number every second character, from 0xfab22883b0ada0 to 0xfa b2 28 83 b0 ad a0

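For that purpose we can use the regular expression .{1,X}(?=(.{X})+(?!.))|.{1,X}$ with the g flag and the string.match() function, where X is the number of characters in a single token. The first alternative takes 1 to X characters only when the rest of the string consists of whole groups of X characters; the second alternative takes whatever is left at the end of the string.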
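To make the two alternatives easier to follow, here is a minimal sketch that builds the same pattern piece by piece for X = 3 and applies it to the decimal example from the list above. The variable names (X, pattern, thousands) are only illustrative.

```javascript
// The pattern from the text with X = 3, assembled from its parts.
const X = 3; // token length; 3 gives thousands grouping

const pattern = new RegExp(
  '.{1,' + X + '}' +            // take 1 to X characters...
  '(?=(.{' + X + '})+(?!.))' +  // ...only if whole groups of X characters follow, up to the end
  '|.{1,' + X + '}$',           // otherwise take the last 1 to X characters
  'g'                           // global flag, so match() returns every token
);

const thousands = '1233232'.match(pattern);
console.log(thousands);           // [ '1', '233', '232' ]
console.log(thousands.join(',')); // 1,233,232
```

Because the lookahead demands whole groups up to the end of the string, the short token always ends up at the front, which is what number formatting needs.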

Some examples

Let's split a hexadecimal number into 2-character tokens. The first token can have 1 or 2 characters.

console.log("fab22883b0ada0".match(/.{1,2}(?=(.{2})+(?!.))|.{1,2}$/g));
console.log("ab22883b0ada0".match(/.{1,2}(?=(.{2})+(?!.))|.{1,2}$/g));

This prints:

[ 'fa', 'b2', '28', '83', 'b0', 'ad', 'a0' ]
[ 'a', 'b2', '28', '83', 'b0', 'ad', 'a0' ]

Let's split a binary number into 4-character tokens. The first token can have 1 to 4 characters.

console.log("1110101010101101".match(/.{1,4}(?=(.{4})+(?!.))|.{1,4}$/g));
console.log("10101010101101".match(/.{1,4}(?=(.{4})+(?!.))|.{1,4}$/g));

This prints:

[ '1110', '1010', '1010', '1101' ]
[ '10', '1010', '1010', '1101' ]

Alternatively, you can build the regular expression for any token length with the JavaScript RegExp constructor.

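// Split input into tokens of len characters; only the first token can be shorter.
// Note: match() returns null when nothing matches, e.g. for an empty string.
function split (input, len) {
    return input.match(new RegExp('.{1,'+len+'}(?=(.{'+len+'})+(?!.))|.{1,'+len+'}$', 'g'));
}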
console.log(split('11010101101', 4));
console.log(split('ab22883b0ada0', 2));

This prints:

[ '110', '1010', '1101' ]
[ 'a', 'b2', '28', '83', 'b0', 'ad', 'a0' ]
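To get from tokens to the formatted numbers mentioned at the start of the post, the tokens only need to be joined with a separator. The sketch below is one possible way to do it; the helper name group and its sep parameter are my own additions, and stripping the 0x prefix with slice() assumes the input always starts with exactly those two characters.

```javascript
// Hypothetical helper: split input into tokens of len characters
// (same pattern as above) and join them with sep.
function group(input, len, sep) {
  const re = new RegExp('.{1,' + len + '}(?=(.{' + len + '})+(?!.))|.{1,' + len + '}$', 'g');
  // match() returns null when nothing matches (e.g. an empty string), so fall back to [].
  return (input.match(re) || []).join(sep);
}

console.log(group('1233232', 3, ','));          // 1,233,232
console.log(group('1011010010101001', 4, ' ')); // 1011 0100 1010 1001

// The "0x" prefix is not a digit pair, so it is kept out of the grouping.
const hex = '0xfab22883b0ada0';
console.log(hex.slice(0, 2) + group(hex.slice(2), 2, ' ')); // 0xfa b2 28 83 b0 ad a0
```

The fallback to an empty array means an empty input gives an empty string instead of throwing on null.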