How can I check whether a string is base64-encoded?

I'm building an email application (HapiJS) and found that some emails have their text encoded in base64, while others do not.

The application needs to receive emails from all services (Gmail, Hotmail, ...), so I need a method that checks whether the text is base64 and only then sends it on for decoding, or otherwise passes it straight to the client.

I've searched a lot and so far haven't found anything that works reliably for what I need, and since I'm new to programming I don't yet know enough to work it out myself.

Code I'm using to try to check:

let base64 = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;

let isBase64Valid = base64.test(mail.text); // mail.text is the string being checked

if (isBase64Valid) {
    // the text matches the base64 format
    console.log('base64');
} else {
    // the text does not match the base64 format
    console.log('String');
}
Author: LeonardoEbert, 2017-08-29

2 answers

Base64 strings contain only the characters a-z, A-Z, 0-9, '+', '/' and '='. If a string contains any other character, such as a space ' ', it is not base64. This is the test the regex should perform.

Try changing the Regex to this one here: ^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

To encode and decode, use the functions btoa() (encode) and atob() (decode). Reference for these functions: js base64 Encode and Decode
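As a minimal sketch of the round trip these two functions perform, assuming a runtime that exposes btoa() and atob() as globals (browsers do, and so do recent Node.js versions); the variable names are only illustrative:

```javascript
// Encode a plain ASCII string to base64, then decode it back.
// Assumption: btoa/atob are available as globals in this runtime.
const plain = 'Hello World!';
const encoded = btoa(plain);   // 'SGVsbG8gV29ybGQh'
const decoded = atob(encoded); // 'Hello World!'

console.log(encoded, decoded);
```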

 0
Author: Alexandre Cavaloti, 2017-08-29 16:52:05

In JavaScript on the front end, the most reliable way to check whether a given {String} is base64-encoded is to call atob() inside a try/catch block, re-encode the result, and compare it against the original input. The browser's JavaScript engine already throws an exception when decoding fails.

Some examples here from the StackOverflow community (Portuguese, English) say that the following approach is the most correct:

function isBase64(str) {
    try {
        return atob(str) ? true : false
    } catch(e) {
        return false
    }
}

However, this approach is incorrect, because the following example returns a false positive:

isBase64('jgjhgj hg') // true

In fact, atob() does not reject that input; it decodes it to garbage:

console.log(atob('jgjhgj hg')) // "á8`"
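The reason no exception is thrown is that atob() discards ASCII whitespace before decoding, so the space never causes an error. A small sketch of that behaviour, assuming atob() is available as a global (browser or recent Node.js):

```javascript
// atob() strips ASCII whitespace before decoding, so both inputs
// decode to the same six bytes and neither call throws.
const withSpace = atob('jgjhgj hg');
const withoutSpace = atob('jgjhgjhg');

console.log(withSpace === withoutSpace); // true
console.log(withSpace.length);           // 6
```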

The most correct front-end approach

The correct approach is to re-encode the decoded value and compare it with the input, like this:

function isBase64(str) {
    try {
        // the comparison already yields a boolean
        return btoa(atob(str)) === str
    } catch(e) {
        return false
    }
}

This eliminates that false positive:

isBase64('jgjhgj hg') // false

In the backend (NodeJs)

Node.js did not provide native btoa() or atob() functions when this answer was written (newer releases expose them as globals), so it is very common to use third-party modules or Buffer to get the same result.

It is important to note that not all third-party libraries throw exceptions or compare the result against the input, so it is easy to end up with false positives.
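Node's own Buffer is one example of this silent behaviour. A minimal sketch, with illustrative variable names, showing that decoding input that is clearly not base64 raises no error, and that only the comparison against the input exposes the problem:

```javascript
// Buffer.from(..., 'base64') does not raise an error for input that is
// not base64; it returns whatever bytes it managed to decode.
const input = 'not base64!';
let threw = false;
let roundTrip;
try {
  roundTrip = Buffer.from(input, 'base64').toString('base64');
} catch (e) {
  threw = true;
}

console.log(threw);               // false
console.log(roundTrip === input); // false
```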

The following example uses Buffer to encode and decode, and also checks the result against the input:

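function atob(str) {
    // Buffer.from replaces the deprecated new Buffer() constructor
    return Buffer.from(str, 'base64').toString('binary');
}

function btoa(str) {
    let buffer;
    if (str instanceof Buffer) {
        buffer = str;
    } else {
        buffer = Buffer.from(str.toString(), 'binary');
    }
    return buffer.toString('base64');
}

function isBase64(str) {
    try {
        // valid base64 survives a decode/encode round trip unchanged
        return btoa(atob(str)) === str;
    } catch (ex) {
        // any error while decoding means the input is not base64
        return false;
    }
}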

Testing shows that it rejects the earlier false positive:

console.log(isBase64('SGVsbG8gV29ybGQh')) // true

console.log(isBase64('jgjhgj hg')) // false

The use of RegExp (a matter of opinion)

If you cannot be sure that the source of the {String} really is encoded (hence the need for verification), a RegExp should not automatically be considered "the best option". The following example illustrates the problem:

function isBase64(str) {
    return /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/.test(str)
}

isBase64('SGVsbG8gV29ybGQh') // true

isBase64('jgjhgj hg') // false

isBase64("regexnaofunciona") // true

isBase64("hoje") // true

isBase64("errado+tanto+faz") // true

The expression above fails because it accepts any {String} made up only of base64 alphabet characters whose length is a multiple of 4.

It is worth noting that if you cannot assert that the input {String} really was encoded in base64, there is no guarantee that the RegExp above will not accept it anyway, producing a false positive.
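This ambiguity is not specific to RegExp. As a hedged sketch, the hypothetical helper roundTrips() below applies the decode/re-encode comparison using Node's Buffer, and an ordinary word such as "hoje" still passes, because it happens to be well-formed base64:

```javascript
// roundTrips is a hypothetical helper: decode as base64, re-encode,
// and compare with the input (the round-trip idea shown earlier).
function roundTrips(str) {
  return Buffer.from(str, 'base64').toString('base64') === str;
}

// "hoje" is a plain Portuguese word, yet it is also well-formed base64
// that decodes to three arbitrary bytes.
const word = 'hoje';
const decodedLength = Buffer.from(word, 'base64').length;

console.log(roundTrips(word)); // true
console.log(decodedLength);    // 3
```

No check based only on the content of the string can tell these cases apart, so any such test is best treated as a heuristic.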

 1
Author: Lauro Moraes, 2017-12-18 09:11:36