A regular expression to detect Brazilian Highway acronyms
I am trying to detect if a given address corresponds to a Brazilian Highway.
For example, br-101 matches.
My initial plan was to list the state acronyms (mg, sp, rn ...) plus the acronym br, and write something like /sigla1-[0-9]{3}|sigla2-[0-9]{3}.../, with one alternative per acronym.
But a look at Wikipedia surprised me: there are other prefixes besides the states (for example, prc, in https://pt.wikipedia.org/wiki/Rodovias_do_Paran%C3%A1 ).
So I ask: what is the most correct way to detect highways?
We could take (duas_ou_tres_letras)-(tres_numeros), that is, two or three letters, a hyphen, and three digits, for example. Does the part before the hyphen necessarily have two or three letters? Can the part after the hyphen have fewer than three digits?
Would anyone happen to have a list of the possible acronyms that may come before the hyphen?
2 answers
I found the question interesting and looked into how the naming of Brazilian highways works.
According to the government's federal highways website, there is a standard for naming federal highways. From what I researched, this standard is also adopted for state highways, though there are exceptions.
The first digit of the highway number, for example the 3 in BR-307, has a meaning and ranges from 0 to 6. This also applies to state highways.
- Radial highways: BR-0XX, highways that depart from the federal capital towards the extremes of the country
- Longitudinal highways: BR-1XX, highways that cross the country in the north-south direction
- Transverse highways: BR-2XX, highways that cross the country in the east-west direction
- Diagonal highways: BR-3XX, highways with one of two orientations: northwest-southeast or northeast-southwest
- Connecting highways: BR-4XX, highways that run in any direction
There are also highways starting with BR-6XX, but they are few and short.
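The classification above can be sketched as a small lookup on the first digit. This is only an illustration in Python: the names `CATEGORY_BY_FIRST_DIGIT` and `highway_category` are made up for this example, and the label used for 6 is a placeholder, since the list above gives no category name for BR-6XX.

```python
from typing import Optional

# First digit of the number -> category, following the list above.
# The entry for "6" is a placeholder label (assumption): the text only says
# that a few short BR-6XX highways exist, without naming a category.
CATEGORY_BY_FIRST_DIGIT = {
    "0": "radial",
    "1": "longitudinal",
    "2": "transverse",
    "3": "diagonal",
    "4": "connecting",
    "6": "other (BR-6XX)",
}


def highway_category(code: str) -> Optional[str]:
    """Return the category for a code such as 'BR-307', or None if unknown."""
    _prefix, sep, number = code.partition("-")
    # Require a hyphen followed by exactly three digits.
    if not sep or len(number) != 3 or not number.isdigit():
        return None
    return CATEGORY_BY_FIRST_DIGIT.get(number[0])


print(highway_category("BR-307"))  # diagonal
print(highway_category("BR-101"))  # longitudinal
print(highway_category("BR-500"))  # None
```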
It would be interesting to confirm this information so that the regex is more accurate, for example:
- the first part is 2 to 3 capital letters: [A-Z]{2,3}
- there is a hyphen between the letters and the numbers: -
- the first digit ranges from 0 to 6: [0-6]
- and it ends with two more digits: [0-9]{2}
Finally, your regex would look like this: [A-Z]{2,3}-[0-6][0-9]{2}
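As a quick check, here is a minimal Python sketch of that pattern. The helper name `looks_like_highway` is invented for the example, and `fullmatch` is used on the assumption that the input is the code alone rather than a full address.

```python
import re

# Pattern from the answer; fullmatch requires the whole string to fit it.
HIGHWAY_RE = re.compile(r"[A-Z]{2,3}-[0-6][0-9]{2}")


def looks_like_highway(code: str) -> bool:
    """True if the code has 2-3 capital letters, a hyphen, and a number 000-699."""
    return HIGHWAY_RE.fullmatch(code) is not None


for sample in ["BR-307", "PRC-280", "BR-707", "B-101", "BR-10"]:
    print(sample, looks_like_highway(sample))
# BR-307 True
# PRC-280 True
# BR-707 False  (first digit 7 is outside 0-6)
# B-101 False   (only one letter)
# BR-10 False   (only two digits)
```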
You can build two regexes: a more generic one that validates only the format of the highway code, and a more specialized one that makes it more likely the highway actually exists.
From the research I did, some highways receive a C after the state acronym because they are coincident, that is, a stretch of a federal highway shares the same route as a state highway and the state is responsible for maintaining it. I did not find any centralized list, though; each state maintains its own.
Not all states have coincident highways, so the second regex matches invalid values like BRC-000 or ACC-999. Further treatment is therefore needed in the application, such as a list of exceptions, or you can find out which states have these highways and refine the regex further.
A generic one would be:
[A-Z]{2,3}-[0-9]{3}
Entries:
BR-101 //OK
ABC-100 //OK
ZZ-000 //OK
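Since the question is about detecting a highway inside an address, and uses lowercase input such as br-101, the generic pattern could be applied roughly as follows. This is a sketch under assumptions: the function name `find_highway_codes` and the sample addresses are made up, and the word boundaries and `re.IGNORECASE` are additions that are not part of the regex above.

```python
import re
from typing import List

# Generic format check, searched inside a longer string.
# \b keeps a match from starting or ending in the middle of a word or number;
# IGNORECASE accepts lowercase input such as "br-101".
GENERIC_RE = re.compile(r"\b[A-Z]{2,3}-[0-9]{3}\b", re.IGNORECASE)


def find_highway_codes(address: str) -> List[str]:
    """Return every highway-like code in the address, normalised to upper case."""
    return [m.group(0).upper() for m in GENERIC_RE.finditer(address)]


print(find_highway_codes("Rodovia br-101, km 12"))  # ['BR-101']
print(find_highway_codes("Av. Paulista, 1000"))     # []
print(find_highway_codes("ABCD-1000"))              # []
```

Like the generic regex itself, this only checks the format, so values such as ZZ-000 are still accepted.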
The other would be the list of state acronyms, followed by an optional C (for coincident), followed by a hyphen and three digits.
(AC|AL|AP|AM|BA|CE|DF|ES|GO|MA|MT|MS|MG|PA|PB|PR|PE|PI|RJ|RN|RS|RO|RR|SC|SP|SE|TO|BR)C?-[0-9]{3}
Entries:
BR-101 //OK
ABC-000 //does not match the pattern
ZZZ-999 //does not match the pattern
PRC-280 //OK
RSC-453 //OK
BRC-000 //OK, but invalid
ACC-999 //OK, but invalid
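The extra treatment for the C suffix could look something like the Python sketch below. The set `STATES_WITH_COINCIDENT` is hypothetical and incomplete: it contains only PR and RS because PRC-280 and RSC-453 are the examples given here, and it would have to be filled in from each state's own list. The name `is_valid_highway` is also invented for the example.

```python
import re

# Same pattern as above, with groups for the prefix, the optional C and the number.
STATE_RE = re.compile(
    r"(AC|AL|AP|AM|BA|CE|DF|ES|GO|MA|MT|MS|MG|PA|PB|PR|PE|PI|RJ|RN|RS|RO|RR|SC|SP|SE|TO|BR)"
    r"(C?)-([0-9]{3})"
)

# Hypothetical, incomplete allow-list (assumption): only the prefixes seen in
# the examples above (PRC, RSC). Extend it from each state's own list.
STATES_WITH_COINCIDENT = {"PR", "RS"}


def is_valid_highway(code: str) -> bool:
    """Check the format, then reject a C suffix on prefixes not in the allow-list."""
    m = STATE_RE.fullmatch(code)
    if m is None:
        return False
    prefix, coincident, _number = m.groups()
    if coincident and prefix not in STATES_WITH_COINCIDENT:
        return False
    return True


for sample in ["BR-101", "PRC-280", "RSC-453", "BRC-000", "ACC-999", "ZZZ-999"]:
    print(sample, is_valid_highway(sample))
# BR-101 True
# PRC-280 True
# RSC-453 True
# BRC-000 False
# ACC-999 False
# ZZZ-999 False
```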