Get text in quotes with regex in JavaScript
I am capturing what the user types into a textarea. My goal is to extract from it anything that is between double quotes or single quotes.
Note: I assume there are no line breaks inside a quoted string before its closing quote.
For example, if the textarea contains the following:
aaa 'bbb' ccc "ddd"
Then the regex should capture
bbb
ddd
The regex I'm using:
/((\".*?\")|(\'.*?\'))/g
It works fine for the case above, but I get an "Unterminated group" error in the console when the textarea contains the following:
aaa 'rgba(255,255,255,'
What I need is for any quoted string, regardless of what it contains, to be captured just like the strings in the first example.
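As a quick check (a sketch run in Node rather than in the browser), the regex literal above compiles and runs on both inputs without throwing; note that it returns the whole matches, quotes included. Chrome and Node report "Unterminated group" when a pattern contains an unclosed "(", so one possible cause, purely an assumption since the code that triggers the error is not shown, is that the textarea text is being passed to `new RegExp()` somewhere. `compileAsPattern` is a hypothetical helper that illustrates that scenario.

```javascript
// The question's pattern, unchanged. \" and \' are allowed identity escapes in a regex literal.
const questionRegex = /((\".*?\")|(\'.*?\'))/g;

// match() with the g flag returns the whole matches, so the quotes are included.
const simpleMatches = "aaa 'bbb' ccc \"ddd\"".match(questionRegex);
console.log(simpleMatches); // two matches: 'bbb' and "ddd", with their quotes
const rgbaMatches = "aaa 'rgba(255,255,255,'".match(questionRegex);
console.log(rgbaMatches); // one match: 'rgba(255,255,255,' (no error is thrown)

// Hypothetical helper (assumption): compiling the user's text itself as a pattern.
// The unclosed "(" in "rgba(" makes the RegExp constructor throw a SyntaxError.
function compileAsPattern(userText) {
  try {
    return new RegExp(userText);
  } catch (err) {
    return err;
  }
}
console.log(compileAsPattern("aaa 'rgba(255,255,255,'") instanceof SyntaxError); // true
```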
3 answers
Three ways to capture text in quotes (single or double)
1. Simple
To get the text in single or double quotes, we use two capture groups. After each match, only one of the two groups holds the text, and we use only that value. This way you get just the text inside the quotes (without the quotes themselves).
/"([^"]*)"|'([^']*)'/g
function obtenerTextoEnComillas() {
  const regex = /"([^"]*)"|'([^']*)'/g,
    texto = document.getElementById("ingreso").value;
  let grupo,
    resultado = [];
  while ((grupo = regex.exec(texto)) !== null) {
    // if it matches double quotes, the content is in
    // grupo[1], with grupo[2] undefined, and vice versa
    // (?? rather than || so that an empty "" match is not lost)
    resultado.push(grupo[1] ?? grupo[2]);
  }
  // resultado is an array with all the matches;
  // show the values separated by line breaks
  document.getElementById("resultado").innerText = resultado.join("\n");
}
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'bbb' ccc "ddd"a
aaa 'rgba(255,255,255,'
</textarea>
<input type="button" value="Get text in quotes" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
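The snippet above depends on the DOM. A DOM-free sketch of the same alternation, runnable in Node, might look like this; `extractQuotedSimple` is a hypothetical name, and it assumes ES2020 for `matchAll` and `??` instead of the `exec()` loop.

```javascript
// Hypothetical DOM-free version of approach 1 (the name extractQuotedSimple is illustrative).
function extractQuotedSimple(text) {
  const regex = /"([^"]*)"|'([^']*)'/g;
  const result = [];
  for (const match of text.matchAll(regex)) {
    // Only one alternative participates in each match; the other group is undefined.
    // ?? (not ||) keeps an empty string such as "" instead of skipping to the other group.
    result.push(match[1] ?? match[2]);
  }
  return result;
}

console.log(extractQuotedSimple("aaa 'bbb' ccc \"ddd\"a")); // [ 'bbb', 'ddd' ]
console.log(extractQuotedSimple("aaa 'rgba(255,255,255,'")); // [ 'rgba(255,255,255,' ]
console.log(extractQuotedSimple("empty: \"\" and ''")); // [ '', '' ]
```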
2. All in one
We can always get the searched text from the same group (grupo[2]).
At the end of the expression, we use \1, a backreference to group 1, to ensure that the match ends with the same character that was captured at the beginning (the opening quote).
/(["'])(.*?)\1/g
function obtenerTextoEnComillas() {
  const regex = /(["'])(.*?)\1/g,
    texto = document.getElementById("ingreso").value;
  let grupo,
    resultado = [];
  while ((grupo = regex.exec(texto)) !== null) {
    // group 1 contains the quote character used
    // group 2 is the text inside the quotes
    resultado.push(grupo[2]);
  }
  // resultado is an array with all the matches;
  // show the values separated by line breaks
  document.getElementById("resultado").innerText = resultado.join("\n");
}
Text:
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'rgba(255,255,255,'
"text with 'single' quotes inside" ... 'and "vice versa"'
</textarea>
<input type="button" value="Get text in quotes" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
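A DOM-free sketch of approach 2 (the function name `extractQuotedBackref` is hypothetical). The last call shows that the dot does not match line breaks, so a quoted string split across lines is not found, which is what the next variant addresses.

```javascript
// Hypothetical DOM-free version of approach 2 (extractQuotedBackref is an illustrative name).
function extractQuotedBackref(text) {
  // Group 1 is the opening quote, \1 requires the same quote to close,
  // and group 2 is the text in between.
  const regex = /(["'])(.*?)\1/g;
  return Array.from(text.matchAll(regex), (match) => match[2]);
}

console.log(extractQuotedBackref("aaa 'rgba(255,255,255,'"));
// [ 'rgba(255,255,255,' ]
console.log(extractQuotedBackref("\"text with 'single' quotes inside\" ... 'and \"vice versa\"'"));
// the two strings, each keeping the other kind of quote inside
console.log(extractQuotedBackref("aaa 'rgba(255,\n255,255,'"));
// [] because . does not match the line break
```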
Or, to allow line breaks between the quotes, replace the dot with [\s\S]:
/(["'])([\s\S]*?)\1/g
function obtenerTextoEnComillas() {
  const regex = /(["'])([\s\S]*?)\1/g,
    texto = document.getElementById("ingreso").value;
  let grupo,
    resultado = [];
  while ((grupo = regex.exec(texto)) !== null) {
    // group 1 contains the quote character used
    // group 2 is the text inside the quotes
    resultado.push(grupo[2]);
  }
  // resultado is an array with all the matches;
  // separate them with blank lines, since a result may itself contain line breaks
  document.getElementById("resultado").innerText = resultado.join("\n\n");
}
Text:
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'rgba(255,
255,255,'
"text with 'single' quotes inside" ... 'and "vice versa"'
</textarea>
<input type="button" value="Get text in quotes" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
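Since ES2018, JavaScript also has the s (dotAll) flag, which makes the dot match line breaks. As a sketch, /(["'])(.*?)\1/gs should return the same groups as the [\s\S] version; the helper name `groupTwo` is hypothetical, and the s flag is only an alternative if the target browsers support it.

```javascript
// [\s\S] matches any character, including line breaks.
const withClass = /(["'])([\s\S]*?)\1/g;
// The s (dotAll) flag, available since ES2018, makes . match line breaks too.
const withDotAll = /(["'])(.*?)\1/gs;

// Hypothetical helper: collects group 2 of every match.
function groupTwo(regex, text) {
  return Array.from(text.matchAll(regex), (match) => match[2]);
}

const multiLineSample = "aaa 'rgba(255,\n255,255,'\n\"text with 'single' quotes\"";
console.log(groupTwo(withClass, multiLineSample));
console.log(groupTwo(withDotAll, multiLineSample));
// both return the same two strings; the first one contains a line break
```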
Often, though, you want something more elaborate than .*? between the quotes. The following expression is barely less efficient than the previous one, but it works much better as a base for more complex structures (such as the regex in section 3).
/(["'])([^"']*(?:(?!\1)["'][^"']*)*)\1/g
- We define the first group to match either of the two types of quotes: (["'])
- At the end of the expression, we use \1, a backreference to group 1 (the opening quote).
- In the middle, group 2, ([^"']*(?:(?!\1)["'][^"']*)*), contains the searched text. It matches:
  - any text without either type of quote: [^"']*
  - followed (optionally, any number of times) by a quote that is not the one captured in group 1: (?!\1)["']
  - followed by more allowed text: [^"']*
(?!...) is a negative lookahead.
In this structure we use a technique known as unrolling the loop, which follows the format normal* (?: special normal* )*, where normal is [^"'] and special is (?!\1)["'].
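A quick sketch comparing the unrolled expression with the lazy [\s\S]*? version from the previous example (helper and variable names are hypothetical). Because group 2 can never contain the opening quote, its greedy match stops at the first matching quote, so both should return the same groups; note that [^"'] also matches line breaks.

```javascript
// Lazy version from the previous example.
const lazyClass = /(["'])([\s\S]*?)\1/g;
// Unrolled version: normal = [^"'], special = (?!\1)["'] (the other kind of quote).
const unrolled = /(["'])([^"']*(?:(?!\1)["'][^"']*)*)\1/g;

// Hypothetical helper: collects group 2 of every match.
function collectGroups(regex, text) {
  return Array.from(text.matchAll(regex), (match) => match[2]);
}

const mixedSample = "aaa 'rgba(255,\n255,255,'\n\"text with 'single' quotes\" ... 'and \"vice versa\"'";
const lazyResult = collectGroups(lazyClass, mixedSample);
const unrolledResult = collectGroups(unrolled, mixedSample);
console.log(unrolledResult);
console.log(JSON.stringify(lazyResult) === JSON.stringify(unrolledResult)); // true
```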
3. "With \"escapes\""
We can also treat quotes escaped with a backslash (\") as valid, just as most languages do.
In this case, we use the /y (sticky) flag, which forces each match to start at the beginning of the text or exactly where the last match ended, and thus ensures that the quotes are balanced. *See browser compatibility for the sticky flag.
/[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/gy
Description:
/
[^'"\\]*                # Text before the quotes
(?:                     # Non-capturing group
  \\.[^'"\\]*           # An \escape and more text
)*                      # repeated 0 or more times
(["'])                  # Opening quote (group 1)
(                       # Group 2: text between the quotes
  [^"'\\]*              # Characters that are neither quotes nor \
  (?:                   # Non-capturing group
    (?:(?!\1)["']|\\.)  # A quote other than the opening one, or an \escape
    [^"'\\]*            # Followed by more allowed characters
  )*                    # repeated 0 or more times (unrolling the loop)
)                       # end of group 2
\1                      # Closing quote (\1 is the text captured in group 1)
/gy                     # Flags: g (all matches), y (sticky, anchored)
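To see what the sticky flag adds, here is a sketch (the helper name `collectGroupTwo` is hypothetical) running the same pattern with and without y on a string whose first double quote is escaped. Without y, the engine retries at every position, including right after the backslash, and treats the escaped quote as an opening quote; with y, each match must continue from where the previous one ended, so the escape is consumed as an escape and the unclosed real quote yields nothing.

```javascript
// The pattern from this section, without y and with y.
const globalOnly = /[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/g;
const stickyVersion = new RegExp(globalOnly.source, "gy");

// Hypothetical helper: runs an exec() loop and collects group 2 (the text between quotes).
function collectGroupTwo(regex, text) {
  regex.lastIndex = 0;
  const found = [];
  let match;
  // Every match contains at least two quote characters, so exec() always advances.
  while ((match = regex.exec(text)) !== null) {
    found.push(match[2]);
  }
  return found;
}

// x \"y" z : the first double quote is escaped, the second one is never closed.
const trickySample = 'x \\"y" z';
console.log(collectGroupTwo(globalOnly, trickySample)); // [ 'y' ]
console.log(collectGroupTwo(stickyVersion, trickySample)); // []

// Balanced input: both versions agree and keep the escapes in group 2.
const balancedSample = 'say "a \\"b\\" c"';
console.log(collectGroupTwo(stickyVersion, balancedSample)); // one string: a \"b\" c
```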
Code:
function obtenerTextoEnComillas() {
  const regex = /[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/gy,
    texto = document.getElementById("ingreso").value;
  let grupo,
    resultado = [];
  while ((grupo = regex.exec(texto)) !== null) {
    // group 1 contains the quote character used
    // group 2 is the text inside the quotes
    resultado.push(grupo[2]);
  }
  // resultado is an array with all the matches;
  // show the values separated by line breaks
  document.getElementById("resultado").innerText = resultado.join("\n");
}
Text:
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'bbb' ccc "ddd"a
aaa 'rgba(255,255,255,'
here "\"escaped\" 'quotes' are allowed"
</textarea>
<input type="button" value="Get text in quotes" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
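A DOM-free sketch of the same loop, run on the sample text from the snippet (function names are hypothetical). Group 2 keeps the backslashes of the escapes; `removeEscapes` is an optional extra step, not part of the answer, that assumes every backslash simply escapes the next character.

```javascript
// Hypothetical DOM-free version of the exec() loop above.
function extractQuotedWithEscapes(text) {
  const regex = /[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/gy;
  const result = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    result.push(match[2]); // group 2, escapes still included
  }
  return result;
}

// Optional extra step (assumption): turn every \x into x.
function removeEscapes(str) {
  return str.replace(/\\([\s\S])/g, "$1");
}

// String.raw keeps the backslashes exactly as typed in the textarea.
const textareaSample = String.raw`aaa 'bbb' ccc "ddd"a
aaa 'rgba(255,255,255,'
here "\"escaped\" 'quotes' are allowed"`;

const escapedResults = extractQuotedWithEscapes(textareaSample);
console.log(escapedResults.map(removeEscapes).join("\n"));
// bbb
// ddd
// rgba(255,255,255,
// "escaped" 'quotes' are allowed
```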
Try it like this:
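var str = 'aaa \'bbb\' ccc "ddd" aaa \'b"bb\' ccc "d\'dd"',
    re = /"[^"]*"|'[^']*'/, // no g flag: exec() always searches from the start of str
    match;
while ((match = re.exec(str)) !== null) {
    console.log(match[0]);
    // remove the match so that the next exec() finds the following quoted string
    str = str.replace(match[0], '');
}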
Console output:
'bbb'
"ddd"
'b"bb'
"d'dd"
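The loop above mutates str and rescans it after each match. An alternative sketch, assuming ES2020 for `matchAll`, adds the g flag and lets the engine resume after the previous match; it should log the same four strings without modifying the input.

```javascript
const answerText = 'aaa \'bbb\' ccc "ddd" aaa \'b"bb\' ccc "d\'dd"';
// With the g flag, matchAll() resumes after each match, so answerText is never modified.
const quotedParts = Array.from(answerText.matchAll(/"[^"]*"|'[^']*'/g), (match) => match[0]);
quotedParts.forEach((part) => console.log(part));
// 'bbb'
// "ddd"
// 'b"bb'
// "d'dd"
```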
(?:'|")(.+)(?:'|")
This captures whatever content is between single or double quotes.
https://regex101.com/r/ZOLlyd/1
You can try it there with all your cases.
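A quick check of this pattern on the question's first example, as a sketch (variable names are hypothetical): .+ is greedy, so without the g flag it captures everything from the first quote to the last one. A lazy .+? with g separates the two strings, but since any quote can close any other, 'b"bb' is cut at the double quote, which is the case the backreference \1 in the first answer avoids.

```javascript
const greedyAny = /(?:'|")(.+)(?:'|")/;
const greedyMatch = "aaa 'bbb' ccc \"ddd\"".match(greedyAny);
console.log(greedyMatch[1]); // bbb' ccc "ddd

// Lazy variant with g: separate strings, but the closing quote need not match the opening one.
const lazyAny = /(?:'|")(.+?)(?:'|")/g;
const lazyGroups = (text) => Array.from(text.matchAll(lazyAny), (match) => match[1]);
console.log(lazyGroups("aaa 'bbb' ccc \"ddd\"")); // [ 'bbb', 'ddd' ]
console.log(lazyGroups("'b\"bb'")); // [ 'b' ]
```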