Doubt about Chomsky's hierarchy

Question

Doubt about Chomsky's hierarchy

Regarding the 4 types of languages, I have doubt in understanding about the context-dependent or context-sensitive language.

I find very confusing the explanation I find by the sites, I would like someone to help me understand what is a context-sensitive language and a unrestricted language.

3

terminologia ciência-da-computação teoria-da-computação linguagens-formais

Author: Victor Stafusa, 2019-03-29

Source

1 answers

score 6 · Answer 1

To hierarchy

The hierarchy of languages is this:

What Chomsky considered at the time were regular, context-free, context-sensitive, and unrestricted languages. However, I will address all of them, including the other three.

Finite languages

A finite language is one that corresponds to a finite set of strings. Therefore, a language is not finite if there is an infinite set of strings in that language.

Regular languages

All finite languages are regular, but not all regular languages are finite.

A regular language is one that can be expressed by a regular expression. A regular expression consists of strings that can have:

(a ) repetitions of certain subsequences in an indeterminate number of times and;
(b ) choices between two or more follow-ups.

The characteristic b exists in finite languages, and therefore it is the a that increases the recognition power of languages and allows regular languages to express an infinite number of strings. Thus, a regular expression that does not make use of the characteristic to expresses a finite language.

Other characteristics often attributed to regular expressions such as repetition of subsequences one or more times, repetition of subsequences a certain number of times and subsequences that may or may not appear, are all characteristics that arise from combinations of the characteristics a and b described above.

Context-free languages

All regular languages are context-free, but not all context-free languages are regular.

A context-free language is one that can be expressed by means of a Context-Free Grammar.

A context-free grammar is a set of nested recursive derivation rules beyond what regular languages are already able to recognize. Each rule, called production, is defined as how a non-terminal symbol can be expanded into a sequence of terminal and non-terminal symbols. The string to be recognized or generated by a Context-Free Grammar consists only of symbols terminals, being the non-terminal symbols used only in the intermediate steps of generation or recognition of the string.

Even without the use of recursion in rules, the possibility of expanding non-terminal symbols into sequences of terminal and non-terminal symbols is sufficient to express the characteristics a and b of regular languages. Therefore, it is the possibility of using such recursion that makes languages context-free more expressive than regular ones. A context-free grammar that does not use this feature is equivalent to a regular expression.

I explain more about context-free languages here .

Context-sensitive languages

All context-free languages are context-sensitive, but not all context-sensitive languages are context-free.

A context-sensitive language is one that can be expressed by some context-sensitive grammar.

A context-sensitive grammar is somewhat similar to a context-free grammar, but with one big difference: productions do not necessarily map only non-terminal symbols to Terminal and non-terminal sequences. They can map terminal and non-terminal sequences to other terminal and non-terminal sequences, and this feature makes them more expressive than free grammars context.

An example of context-sensitive grammar, that I took from Wikipedia and that recognizes the language {a ^N b ^N c ^N: n ≥ 1} that is not Context-Free is this:

S → aBC
S → aSBC
CB → CZ
CZ → WZ
WZ → WC
WC → BC
aB → ab
bB → bb 
bC → bc
cC → cc

Note that unlike in a context-free grammar, the left side of the productions is not restricted to a singular non-terminal symbol.

Context-sensitive grammars are often confusing, difficult to understand understand and difficult to analyze. The main reason for this is that they define rules that can be used to replace anything from the input with anything else in any way in any order and anywhere. Effectively, they are computationally equivalent to Turing machines with memory limited to input size.

Therefore, using these production rules is not the only way to specify how a context-free language works. A more natural way is to define it as a deterministic Turing machine whose tape on which it operates is limited only to the size of the input. Being a Turing machine, it can perform any type of operations of reading, writing and symbol manipulations on this input tape. However, since your tape is limited and finite, it still has less computational power than a Turing machine with unlimited memory.

When using a Turing machine deterministic with an alphabet much larger than the input alphabet, it is possible to simulate a deterministic Turing machine operating on a memory tape whose maximum size is the input size multiplied by some arbitrary constant. Therefore, an equivalent definition for a context-sensitive language is one that can be recognized by a deterministic Turing machine operating on a tape whose size maintains a linear relationship with the size of the tape. entry.

A practical example of a context-sensitive language that is not Context-Free is the one given above, {a^NB^Nc^N: n ≥ 1} . That is, the language that corresponds to a sequence of a's, followed by a sequence of B's and then a sequence of C's, where the number of a's, B's and C's are equal.

Another example of context - sensitive language is to check if pairs of italics ( - ), bold ( - ) and underscore ( - ) open and close correctly, even if they are not properly nested. In this language, the string abcdefghijklmno is a valid string. There is no context-free language capable of recognizing this.

Another simple example of a context-sensitive language is that language that consists of the sequence of digits between 0 and 9 that express a prime number in the decimal base. Again, there is no context-free language capable of recognizing it.

Another example, and one that abuses the available computational power, is to determine whether a sequence of symbols representing variables, parentheses and The Logical Operators E, or e is or is not satisfiable. This is the satisfiability problem which is the best known NP-complete problem. The class of context-sensitive languages is so comprehensive that it encompasses all problems in NP.

Recursive languages

All languages context-sensitive are recursive, but not all recursives are context-sensitive.

The definition of a recursive language is simple: everything for which there is some Turing machine that recognizes or rejects it in finite time.

It is difficult to think of a recursive language that is not context sensitive. Most of the computational problems we face on a day-to-day basis are computationally equivalent to recognizing some language context sensitive.

A recursive language that is not context sensitive must necessarily be impossible to recognize with the use of an amount of memory limited to the input size multiplied by any constant, however arbitrarily high it may be. This already implies a computational complexity outside of NP and absolutely intractable, although still solvable problems.

Problem examples that are recursive but are not context sensitive:

Determine whether two regular expressions are equivalent.
Check or refute the validity of First-Order sentences (which include variables, and, or, not, for all (∀) and exists (existe)) with real numbers, addition and comparison (but without multiplication).
Determine if there is any perfect strategy to win a chess game being disregarded some rule that forces the draw after some number arbitrary moves without capture for a board of arbitrary size.

Since recursive languages are those that can be decided in finite time, then non-recursive languages are those that cannot be decided in finite time for all cases. In other words, languages that are not recursive are Undecidable.

Recursively enumerable languages

All recursive languages are recursively enumerable, but not all recursively enumerable are recursive.

A recursively enumerable (or semi-Decidable) language is one that is accepted by some finite-time Turing machine, but can require infinite time to be rejected.

For a language to be recursively enumerable without being recursive, it is necessary that there exist strings that do not belong to the language, but that it is impossible to determine this in a finite sequence of steps (after all, it must necessarily be Undecidable). However, strings belonging to the language in question can always be decided as such in a finite (even very large) number of steps.

The classic example of a recursively enumerable language that is not recursive is that of the stop problem. Given a description of a Turing machine and an input chain, determine whether or not the Turing machine accepts such a chain. Although this problem seems to be simple, in truth is a very cruel problem, as there is simply no algorithm capable of solving it in finite time for all possible inputs. I talk more about this problem here .

Unrestricted languages

All recursively enumerable languages are unrestricted, but not all unrestricted languages are recursively enumerable.

The unrestricted word already says what language is: anything and everything. This is the set that encompasses all the languages, including languages that are not recursively enumerable.

Languages that are not recursively enumerable are insoluble. They contain strings that are impossible to be determined as belonging to the language or not.

According to rice's theorem deciding on any non-trivial semantic property of some program is an Undecidable Problem. By trivial property means those that are valid for all programs or for no programs (and therefore non-trivia are those that are true for some and false for others). By semantic property is understood by properties about the behavior of the program. Being Undecidable, therefore, these problems are not recursive and many of them are also not recursively enumerable. We can use this to design some languages that are not recursively enumerable. For example:

the set of strings of characters describing Turing machines that decide whether their inputs are prime numbers.

Another known example, which does not depend on Rice's theorem is this:

the set of strings that represent context-free grammatical pairs that recognize the same language.