Base64 verification?
I'm building an email application (HapiJS) and found that some emails have their text encoded in base64, while others don't.
The application will need to receive emails from all services (Gmail, Hotmail, ...), so I need a method that checks whether the text is in base64 and only then either decodes it or forwards it directly to the client.
I've searched a lot and so far I couldn't find anything that worked 100% as I need to, and as I am new to programming, I still do not have enough knowledge to figure out how to do it myself...
Code I'm using to try to check:
let base64 = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;
let isBase64Valid = base64.test(mail.text); // mail.text is the string being checked
if (isBase64Valid) {
    // the text is in base64 format
    console.log('base64');
} else {
    // the text is not in base64 format
    console.log('String');
}
2 answers
Base64 strings contain only the characters a-z, A-Z, 0-9, '+', '/' and '=' (used for padding). If the string contains any other character, such as a space ' ', it is not base64. This is the test the regex should perform.
Try changing the Regex to this one here: ^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$
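As a rough illustration of what the three alternatives at the end of that pattern cover, the sketch below runs it against the textbook encodings of "Man", "Ma" and "M" (no padding, one '=', two '='), plus two inputs it should reject. The name BASE64_RE is only a placeholder chosen for this example.

```javascript
// Same pattern as above, stored under a hypothetical name for the example.
const BASE64_RE = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;

// 'TWFu' -> full group, 'TWE=' -> one pad, 'TQ==' -> two pads,
// 'TWE' -> missing padding, 'TW Fu' -> contains a space.
const samples = ['TWFu', 'TWE=', 'TQ==', 'TWE', 'TW Fu'];
const results = samples.map((s) => BASE64_RE.test(s));

console.log(results); // [ true, true, true, false, false ]
```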
To encode and decode, use the functions btoa() (to encode) and atob() (to decode). See the following reference for these functions:
js base64 Encode and Decode
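A minimal round trip with these two functions might look like the sketch below. It assumes an environment where btoa() and atob() exist as globals (browsers, and recent Node.js versions); note also that btoa() only accepts characters in the Latin-1 range.

```javascript
// btoa() encodes a binary string to base64; atob() decodes it back.
const encoded = btoa('Hello World!');
const decoded = atob(encoded);

console.log(encoded); // "SGVsbG8gV29ybGQh"
console.log(decoded); // "Hello World!"
```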
Using JavaScript, the most correct way to check on the front end whether a given string is base64-encoded is to call atob() inside a try/catch block and compare the re-encoded result against the input, since the browser's JavaScript VM already throws an exception when decoding fails.
Some examples here from the StackOverflow community (Portuguese, English) say that the following approach is the most correct:
function isBase64(str) {
    try {
        return atob(str) ? true : false
    } catch(e) {
        return false
    }
}
However, this approach is incorrect, since the following example returns a false positive:
isBase64('jgjhgj hg') // true
In fact, atob() silently discards the space and decodes the rest, so the result for the example above is just meaningless bytes:
console.log(atob('jgjhgj hg')) // "á8`"
The most correct front-end approach
The correct approach is to re-encode the decoded value and compare it with the input, like this:
function isBase64(str) {
    try {
        return btoa(atob(str)) === str
    } catch(e) {
        return false
    }
}
This eliminates the false positive:
isBase64('jgjhgj hg') // false
In the backend (NodeJs)
Node.js did not originally provide native functions like btoa() or atob() (they were only added as legacy globals in v16), so it is very common to use third-party modules or Buffer to get the same result.
It is important to note that not all third-party libraries throw exceptions or compare the result against the input, so it is easy to run into false positives.
The following example uses Buffer to encode and decode, and also checks the result against the input:
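function atob(str) {
    // Buffer.from() is the supported replacement for the deprecated new Buffer()
    return Buffer.from(str, 'base64').toString('binary');
}
function btoa(str) {
    // Accept either an existing Buffer or anything that can be turned into a string
    const buffer = str instanceof Buffer ? str : Buffer.from(str.toString(), 'binary');
    return buffer.toString('base64');
}
function isBase64(str) {
    try {
        // Valid only if decoding and re-encoding reproduces the input exactly
        return btoa(atob(str)) === str;
    } catch (ex) {
        return false;
    }
}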
Testing shows that it does not report the false positive:
console.log(isBase64('SGVsbG8gV29ybGQh')) // true
console.log(isBase64('jgjhgj hg')) // false
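The comparison against the input is what does the work here. As far as the Node.js documentation describes it, whitespace inside a base64 string is ignored by Buffer when decoding, so decoding alone raises no error for the second input. A small sketch of that behaviour (the variable names are illustrative only):

```javascript
// Decoding does not throw: the space is skipped and the rest is decoded.
const input = 'jgjhgj hg';
const bytes = Buffer.from(input, 'base64');
const roundTrip = bytes.toString('base64');

console.log(bytes.length);        // 6
console.log(roundTrip);           // "jgjhgjhg"
console.log(roundTrip === input); // false
```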
The use of RegExp (opinionated question)
If you cannot trust that the input string really is encoded (hence the need for verification), a RegExp should not always be regarded as "the best option". The following example illustrates the problem:
function isBase64(str) {
    return /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/.test(str)
}
isBase64('SGVsbG8gV29ybGQh') // true
isBase64('jgjhgj hg') // false
isBase64("regexnaofunciona") // true
isBase64("hoje") // true
isBase64("errado+tanto+faz") // true
The expression above fails because it accepts any string whose length is a multiple of 4 and that uses only characters from the base64 alphabet.
It is worth noting that, if you cannot be sure the input string was actually encoded in base64, there is no guarantee that the RegExp above will not match it anyway, producing a false positive.
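For comparison, a quick sketch suggests that a decode-and-re-encode check accepts the same three words, since each one happens to be well-formed base64 that re-encodes to itself. It is written with Buffer so that it runs under Node.js; roundTrips is a hypothetical helper name, not something taken from the answers above.

```javascript
// Hypothetical helper: decode, re-encode, and compare with the input.
function roundTrips(str) {
    return Buffer.from(str, 'base64').toString('base64') === str;
}

const words = ['regexnaofunciona', 'hoje', 'errado+tanto+faz'];
const accepted = words.map(roundTrips);

console.log(accepted); // [ true, true, true ]
```

In other words, neither test can tell an ordinary word from base64 by shape alone, which is why the source of the input matters so much.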