Validate an email in JavaScript that accepts all Latin characters

Question

How to validate an email that accepts all Latin characters?

  • By "Latin characters" I mean accented letters, ñ, ç, and every other letter used by Latin-script languages such as Spanish, Portuguese, and Italian.

Context

  • The goal is to display an icon next to the field as the user types their email address.
  • I'm not interested in accepting every valid case. It was a design decision to cover only the most frequent addresses, that is, letters (including accented ones and the like) and the symbols ._%+-.
  • I can use code from other sources, as long as they are popular (e.g. jQuery).

Code

document.getElementById('email').addEventListener('input', function(event) {
    // Declare locals explicitly instead of relying on implicit globals
    const campo = event.target;
    const valido = document.getElementById('emailOK');

    const emailRegex = /^[-\w.%+]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i;
    // Text is shown as a placeholder for now; later it will be an icon
    if (emailRegex.test(campo.value)) {
      valido.innerText = "valid";
    } else {
      valido.innerText = "invalid";
    }
});
<p>
    Email:
    <input id="email">
    <span id="emailOK"></span>
</p>

Cases

I'm using the regex

/^[-\w.%+]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i

It works perfectly in cases like

[email protected]
[email protected]

But it fails with accents and other Latin letters

germá[email protected]
yo@mi-compañía.com
estaçã[email protected]
 96
Author: PAGANA, 2015-12-01

6 answers

With this regular expression you can validate any email address containing Unicode characters:

/^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i

If you test it in a JavaScript console:

> emailRegex.test("[email protected]");
< true
> emailRegex.test("germá[email protected]");
< true

Source


Starting from there, and as you rightly pointed out, an expression that better fits your needs would be the following:

/^(?:[^<>()[\].,;:\s@"]+(\.[^<>()[\].,;:\s@"]+)*|"[^\n"]+")@(?:[^<>()[\].,;:\s@"]+\.)+[^<>()[\]\.,;:\s@"]{2,63}$/i
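As a quick check, the stricter expression above can be run against a few addresses. The sample addresses are illustrative assumptions (only yo@mi-compañía.com comes from the question), and this is a sketch rather than an exhaustive test.

```javascript
// The second regex from this answer, copied verbatim.
const emailRegex = /^(?:[^<>()[\].,;:\s@"]+(\.[^<>()[\].,;:\s@"]+)*|"[^\n"]+")@(?:[^<>()[\].,;:\s@"]+\.)+[^<>()[\]\.,;:\s@"]{2,63}$/i;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "yo@mi-compañía.com",      // accented letters in the domain
  "josé.pérez@example.com",  // accented letters in the local part
  "a..b@example.com",        // two consecutive dots in the local part
  "user@example",            // no dot in the domain
];

for (const address of samples) {
  console.log(address, emailRegex.test(address));
}
```

The first two samples should be accepted and the last two rejected, since the pattern requires non-empty pieces between dots and at least one dot in the domain.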
 93
Author: Hewbot, 2017-06-22 13:57:57

Email addresses are subject to certain restrictions; as a general guide, they should follow these rules:

  • Uppercase and lowercase letters of the English alphabet.
  • Digits from 0 to 9.
  • A period is allowed, but not as the first character and not repeated consecutively.
  • The following characters are allowed: !#$%&'*+-/=?^_`{|}~
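The rules above could be expressed, for the part before the @, roughly as follows. This is a minimal sketch: the name isValidLocalPart is hypothetical, and rejecting a trailing period is an extra assumption in line with the RFC 5322 dot-atom form.

```javascript
// One run of allowed characters: ASCII letters, digits and the listed symbols.
const atom = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]+";

// Runs may be joined by single periods, so a leading, trailing
// or doubled period never matches.
const localPartRegex = new RegExp("^" + atom + "(?:\\." + atom + ")*$");

// Hypothetical helper name, used only for this illustration.
function isValidLocalPart(localPart) {
  return localPartRegex.test(localPart);
}

console.log(isValidLocalPart("john.smith"));  // letters joined by one period
console.log(isValidLocalPart("john..smith")); // doubled period
console.log(isValidLocalPart("josé"));        // letter outside the English alphabet
```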

Some kinds of addresses are restricted, for example those that contain:

  • Greek alphabet.
  • Cyrillic characters.
  • Japanese characters.
  • Latin alphabet with diacritics.

Examples not accepted as valid email addresses:

червь.ca®[email protected]

josé.patroñ[email protected]

See more:

https://en.wikipedia.org/wiki/Email_address
http://tools.ietf.org/html/rfc5322

Imagine an email address with Cyrillic characters; it gets even worse if you want to store that data in a database: which SQL collation would you use?

Still, the question is about how to validate that kind of address, and this script would help with the task:

function validarEmail(valor) {
  // Permissive pattern: the local part may contain any character except separators, whitespace, @ and quotes
  if (/^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i.test(valor)){
   alert("The email address " + valor + " is correct!");
  } else {
   alert("The email address is incorrect!");
  }
}

For example:

validarEmail("jorgé[email protected]");

The script would show you that the email address is correct.


Update:

It is now possible to use international characters in domain names and email addresses.

Traditional email addresses are limited to English alphabet characters and some other special characters. The following are valid traditional email addresses:

  [email protected]                                (English, ASCII)
  [email protected]                            (English, ASCII)
  user+mailbox/[email protected]   (English, ASCII)
  !#$%&'*+-/=?^_`.{|}[email protected]               (English, ASCII)
  "Abc@def"@example.com                          (English, ASCII)
  "Fred Bloggs"@example.com                      (English, ASCII)
  "Joe.\\Blow"@example.com                       (English, ASCII)

International email, by contrast, uses Unicode characters encoded as UTF-8, making it possible to write addresses in most of the world's writing systems.

The following are all valid international email addresses:

  用户@例子.广告                   (Chinese, Unicode)
  अजय@डाटा.भारत                    (Hindi, Unicode)
  квіточка@пошта.укр             (Ukrainian, Unicode)
  θσερ@εχαμπλε.ψομ               (Greek, Unicode)
  Dörte@Sörensen.example.com     (German, Unicode)
  аджай@экзампл.рус              (Russian, Unicode)
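For the domain part, one way to see how an international name maps to plain ASCII is to let the standard URL parser do the conversion. This is a sketch under the assumption that an ASCII form of the domain is wanted (for storage or comparison); the helper name toAsciiDomain is hypothetical, and the local part of the address is not covered.

```javascript
// Hypothetical helper: returns the ASCII ("xn--...") form of a domain name.
function toAsciiDomain(domain) {
  // The URL parser applies IDNA processing to the host name.
  return new URL("http://" + domain).hostname;
}

console.log(toAsciiDomain("mi-compañía.com")); // label with non-ASCII letters
console.log(toAsciiDomain("example.com"));     // already ASCII, returned unchanged
```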
 33
Author: Jorgesys, 2018-09-28 20:49:13

I found an article here that discusses several regular expressions for verifying email addresses against the RFC standard. Many different regular expressions are recommended, and there is no single all-in-one solution. This is probably the one I would go with, adding accented characters to the set of valid characters as well.

\A[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z
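A possible JavaScript adaptation of the expression above, with accented letters added, might look like this. Several things here are assumptions rather than part of the answer: the exact accented range (Latin-1 letters only), the i flag, and allowing the same letters in the domain.

```javascript
// Extra letters to accept: À-Ö, Ø-ö and ø-ÿ (Latin-1 letters without × and ÷).
// This range is an assumption; widen or narrow it to suit your needs.
const accented = "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF";

// One dot-separated piece of the local part.
const atom = "[a-z0-9" + accented + "!#$%&'*+/=?^_`{|}~-]+";
// One domain label: starts and ends with a letter or digit.
const label = "[a-z0-9" + accented + "](?:[a-z0-9" + accented + "-]*[a-z0-9" + accented + "])?";

// \A and \z are not anchors in JavaScript, so ^ and $ are used instead.
const emailRegex = new RegExp(
  "^" + atom + "(?:\\." + atom + ")*@(?:" + label + "\\.)+" + label + "$",
  "i"
);

console.log(emailRegex.test("yo@mi-compañía.com"));
console.log(emailRegex.test("user@example"));
```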
 18
Author: SnareChops, 2015-12-01 21:19:21

How to validate an email that accepts all Latin characters?

The only 100% reliable way to verify that an email address is valid is to send a message to it. If the user mistyped the address, they will simply try again.

According to RFC 5322, [email protected] is a "valid" email address, but will anyone receive mail sent to it? Is there a server behind the domain that accepts email? Those are the concerns you should have. Whatever you're building (a distribution list, a registration form, etc.), you must send a confirmation email to validate the address. The implementation will depend on the stack you use (C#, PHP, Java?), and that way you end up with valid addresses that someone actually receives mail at.

You can implement something on the client side that at least says "this is an email address", but it shouldn't be your "validation" tool; it only helps the user notice that what they typed is #($^%#$@^(#$^.with

 15
Author: Braiam, 2015-12-02 22:39:42

Just to point out that, according to the official specification, the regex representing a syntactically valid email address is as follows:

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
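To get a feel for what this expression accepts, it can be run against a few inputs. The sample addresses are illustrative assumptions (only yo@mi-compañía.com comes from the question).

```javascript
// The regex from the specification, copied verbatim from above.
const regOficial = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "user@example.com",   // plain ASCII address
  "user@localhost",     // a domain without any dot
  "yo@mi-compañía.com", // non-ASCII letters in the domain
  "user@-example.com",  // label starting with a hyphen
];

for (const address of samples) {
  console.log(address, regOficial.test(address));
}
```

The first two samples should be accepted and the last two rejected: the pattern is ASCII-only, and it does not require a dot in the domain.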

I deliberately say syntactically valid, because what makes an email address truly valid is that it works, that is, that it exists and can receive email.

It follows that verification with JavaScript alone is not enough. It can help with syntax validation, provided JavaScript is enabled on the client side.

If you want to verify that the address actually exists, there is no other way than sending an email and having the recipient respond. Only that can properly be called real validation of an email address.

In fact, that is what all serious subscription services do: they send an email that we must confirm before we are permanently registered on their sites or distribution lists.

Allow me to lay out the steps for validating an email address. What is discussed here is just stage 2 of a validation process that comprises 5 stages:

  • Stage 1: the user types an email address
  • Stage 2: syntax validation of the address typed by the user
  • Stage 3: check whether the domain of the syntactically valid address has a mail server
  • Stage 4: send a request (ping) or an email to verify that the server is accepting mail
  • Stage 5: the email was received successfully at that address

Until we reach Stage 5, we cannot say that the email address has been validated.
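Stage 3 cannot run in the browser, but on a Node.js server it could be sketched with an MX lookup. The helper name domainHasMailServer and its injectable lookupMx parameter are assumptions made for this illustration; only dns.promises.resolveMx is an actual Node.js API.

```javascript
const dns = require("node:dns").promises;

// Hypothetical helper: resolves to true when the domain publishes at least one MX record.
// lookupMx is injectable so the function can be exercised without network access.
async function domainHasMailServer(address, lookupMx = (domain) => dns.resolveMx(domain)) {
  const domain = address.slice(address.lastIndexOf("@") + 1);
  try {
    const records = await lookupMx(domain);
    return records.length > 0;
  } catch (error) {
    // Lookup failures (unknown domain, no data, timeout) are treated as "no mail server found".
    return false;
  }
}
```

It would be called as domainHasMailServer(address).then(...) after the syntax check has passed. A false result is only a hint, because a domain without MX records may still accept mail through its address record, and a true result still says nothing about stages 4 and 5.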


If the OP nevertheless wants a validation method that accepts addresses with ñ and other characters not yet covered by the official w3.org specification (link above), the regex mentioned in a previous answer works.

The following code is the same as in the question, but it applies both the official regex and a regex that allows Latin characters such as ñ.

document.getElementById('email').addEventListener('input', function(event) {
    const campo = event.target;
    const valido = document.getElementById('emailOK');

    // Permissive regex: accepts letters such as ñ in the local part
    const reg = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

    // Regex from the official specification (ASCII only)
    const regOficial = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

    // Text is shown as a placeholder for now; later it will be an icon
    if (reg.test(campo.value) && regOficial.test(campo.value)) {
      valido.innerText = "valid officially and unofficially";
    } else if (reg.test(campo.value)) {
      valido.innerText = "valid unofficially";
    } else {
      valido.innerText = "invalid";
    }
});
<p>
    Email:
    <input id="email">
    <span id="emailOK"></span>
</p>

Syntax validation in HTML5

HTML5 lets us declare our input as type email and takes care (in part) of the validation for us, as MDN says:

email: the field's value must be an email address. Line breaks are automatically removed from the entered value. An invalid email address can be typed, but the field will only validate if the address satisfies the ABNF production 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str ), where atext is defined in RFC 5322, section 3.2.3, and ldh-str is defined in RFC 1034, section 3.5.

You can combine email with the pattern attribute:

pattern: a regular expression against which the value is checked. The pattern must match the entire value, not just a part of it. The title attribute can be used to describe the pattern and help the user. This attribute applies when the type attribute is text, search, tel, url, email, or password; otherwise it is ignored. The regular expression language is the same as JavaScript's RegExp, with the 'u' flag, which treats the pattern as a sequence of Unicode code points. The pattern is not surrounded by slashes.

The disadvantage is that not all clients are compatible with HTML5.

<form>
<input type="email" pattern='^(([^<>\(\)\[\]\\.,;:\s@"]+(\.[^<>\(\)\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$' title="Enter a valid email" placeholder="Enter your email">
<input type="submit" value="Submit">
</form>
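If the result of the browser's built-in check is needed from script (for instance to show the icon from the question), the input's validity object can be read. A minimal sketch, where describeEmailValidity is a hypothetical helper name and the messages are placeholders:

```javascript
// Hypothetical helper: turns the browser's validity flags into a short message.
function describeEmailValidity(input) {
  const state = input.validity; // ValidityState object provided by the browser
  if (state.valid) return "valid";
  if (state.valueMissing) return "empty";
  if (state.typeMismatch) return "not an email address";
  if (state.patternMismatch) return "does not match the pattern";
  return "invalid";
}

// In a page it could be wired up like this:
// const input = document.querySelector('input[type="email"]');
// input.addEventListener("input", () => console.log(describeEmailValidity(input)));
```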
 14
Author: A. Cedano, 2017-12-29 23:56:55

According to RFC 6531, addresses would have to support more characters than we are used to, but servers still restrict them according to the earlier RFCs. I don't see a solution with a single range that covers "all Latin characters". Even though they seem to sit together (as in this table, from 0080 to 00ff), there are other characters in between.

A possible regex for the Latin characters you might be interested in (source), with the addition from the (suggestion):

/[A-Za-z\u0021-\u007F\u00C0-\u00D6\u00D8-\u00f6\u00f8-\u00ff]+/g

It could be combined with your regex, with the ones already given above, or with one based on RFC 2822 such as the following, in such a way that it does not exclude the ranges you are interested in (there are many kinds of accent marks) (source):

^([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22))*\x40([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d))*$
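To illustrate the point about ranges, the shorter character-range regex from this answer can be compared with a Unicode property escape. The \p{Script=Latin} alternative is an assumption added here, not something taken from the sources cited in this answer, and the sample words are only illustrative.

```javascript
// The ranges from the shorter regex in this answer, anchored and without the g flag
// (with g, repeated .test() calls keep state in lastIndex).
const latin1Only = /^[A-Za-z\u0021-\u007F\u00C0-\u00D6\u00D8-\u00f6\u00f8-\u00ff]+$/;

// Alternative (an assumption): any letter of the Latin script,
// via a Unicode property escape, which requires the u flag.
const anyLatinLetter = /^\p{Script=Latin}+$/u;

for (const word of ["compañía", "Sörensen", "Erdős", "пошта"]) {
  const nfc = word.normalize("NFC"); // compose letter + combining accent pairs
  console.log(word, latin1Only.test(nfc), anyLatinLetter.test(nfc));
}
```

A word such as Erdős shows the gap: ő lies outside the 00C0 to 00FF block, so only the property escape accepts it, while the Cyrillic word is rejected by both.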
 11
Author: dayer, 2017-05-23 12:39:21