Are there objective advantages in a language being "case sensitive" or not?

Or is this just taste?

I don't want to know whether you like one more than the other, and I don't care why people prefer one or the other. I don't want shallow answers or opinions. I don't care about historical reasons.

I want a reasoned account of what is gained or lost with each approach.

Examples of case-sensitive languages: C, C++, C#, Java, JavaScript, Python, Ruby, Objective-C.

Examples of case-insensitive languages: SQL, COBOL, BASIC (I think all dialects), PHP (partially, I'm not sure), and Clipper and its dialects.

This question is related to this other one.

To make it clear for those unfamiliar with the subject: I'm talking about the syntax of the language, that is, the keywords and the identifiers.

Author: Comunidade, 2015-07-13

6 answers

The main advantage of case sensitivity is that it enlarges the set of possible symbols (names). In traditional languages, its main effect is to create an implicit relationship between a type and an instance of that type. Another, less common use (Prolog, Erlang) is to give different semantics to a name depending on its capitalization. Finally, there are practical issues involving compilation and runtime management, where implementing a case-sensitive language is simpler.

The main advantage of case insensitivity is that the language's symbols are easier to memorize, with a consequent reduction in errors caused by incorrect capitalization.

Types vs. instances

In natural language, a name ideally represents a single concept. There are exceptions (e.g. the Portuguese word "manga", which means both the mango fruit and the sleeve of a shirt), but they are rare and the ambiguity is usually resolved by context. In formal languages (e.g. controlled vocabularies, taxonomies, thesauri, ontologies) names must be unique, and this applies especially to computer programs, where ambiguity would hinder automated text processing.

However, ambiguity is not always bad: given context, using the same name for distinct but related things not only does little harm but is also useful. If I say "dog", I refer to a kind of animal. If I say "a dog", I mean one specimen of that animal. If I say "the dog", it is not just any specimen but a specific one, which may have been mentioned before or is unique in the context (e.g. if there is only one dog in a group of animals). This makes communication easier.

Similarly, programming seeks conciseness, since it is one measure of a language's expressiveness. However, conciseness must always be weighed against clarity ("programs must be written for people to read, and only incidentally for machines to execute"). In the absence of these semantic "shortcuts", we look for other ways to make a name convey a concept:

  • sigils ($foo is a scalar, @foo is an array);
  • Hungarian notation (iSize is an integer, szLastName is a NULL-terminated string);
  • naming conventions (IFoo is an interface, _foo is a private field).
  • etc.

These conventions are generally implicit, but ultimately they help communicate the code's semantics to human readers. One such convention, quite close to the natural-language case mentioned above, is to use the same name for a type and an instance of that type:

cachorro = new Cachorro();

In a case-insensitive language you would need a different name, and the programmer would most likely choose something like:

oCachorro = new Cachorro();
cachorro1 = new Cachorro();

This makes the code less readable. Incidentally, the tendency to give the object the same name as its class becomes evident when that name is a reserved word. See if this sounds familiar:

clazz = new Class();

Choosing names is difficult. Someone might insist, "so just give the variable a different name, like rex = new Cachorro()", but in practice this makes the code harder to understand. Even if the programmer sticks to a particular coding style (classes start with uppercase, variables with lowercase, constants are all uppercase), he loses this implicit semantic association between the type and the instance. Or, at the very least, he is forced to adopt a different convention, such as those exemplified above.
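A minimal Python sketch of the collision described above. It assumes a case-insensitive language behaves as if every name were folded to a single canonical form before being stored in its scope. The `bind` helper and the use of a dictionary as a scope are hypothetical and only simulate name resolution. Python itself is case sensitive.

```python
class Cachorro:
    """Hypothetical class used only for this illustration."""


def bind(scope: dict, name: str, value, case_sensitive: bool) -> None:
    # A case-insensitive language would, in effect, fold every name to one
    # canonical form before storing it in the scope.
    key = name if case_sensitive else name.casefold()
    scope[key] = value


sensitive = {}
bind(sensitive, "Cachorro", Cachorro, case_sensitive=True)
bind(sensitive, "cachorro", Cachorro(), case_sensitive=True)

insensitive = {}
bind(insensitive, "Cachorro", Cachorro, case_sensitive=False)
bind(insensitive, "cachorro", Cachorro(), case_sensitive=False)

print(sorted(sensitive))    # both the type and the instance are reachable
print(sorted(insensitive))  # the instance has replaced the type
```

In the case-insensitive scope, the second binding overwrites the class, so a later `new Cachorro()` would no longer find the type.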

Differentiated semantics

In most languages, any difference between capitalized and uncapitalized symbols is purely conventional. But nothing prevents the compiler from enforcing this convention (as suggested in Victor Stafusa's answer), and in my opinion that would be a major step forward. This not only creates consistency but can also increase expressiveness, as happens, for example, in Prolog.

Prolog (and languages inspired by it, such as Erlang) is case sensitive and, specifically for the first letter of a name, assigns different semantics: foo is an "atom" (a constant) and Foo is a variable. This sounds like a minor detail, but in my experience with the language the gain in expressiveness is huge.

As an example, some modern languages, such as Python, have a feature called destructuring assignment (or destructuring bind):

for chave, valor in dicionario.items():
    print(chave, valor)

If Python supported unification like Prolog, a single expression could perform a destructuring bind, a type check and even filtering at the same time:

from math import sqrt

# Assuming a data structure Cachorro(nome, raca, idade)

# Hypothetical syntax, not valid Python:
for chave, Cachorro(nome, MALTES, sqrt(4)) in dicionario.items():
    print(nome)

# which would be equivalent to:

for chave, valor in dicionario.items():
    if isinstance(valor, Cachorro):
        nome = valor.nome
        if valor.raca == MALTES: # Assuming MALTES is a global constant
            if valor.idade == sqrt(4):
                print(nome)

If the language were case insensitive, on the other hand, such unification would not be possible. How would the compiler know that:

  • Cachorro(...) is not to be called as a function, but sqrt(...) is;
  • nome is a variable, which should take as its value the first field of the instance being iterated;
  • MALTES, on the other hand, is a constant, which must be compared with the second field of the instance being iterated?
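Below is a minimal Python sketch of how Prolog's rule answers the second and third questions. It assumes terms are encoded as nested tuples whose first element is the functor name. It implements only one-way matching against a ground term, which is a simplification of full unification, and it does not deal with evaluating calls such as sqrt(...). `is_variable` and `match_term` are hypothetical names. The only thing that distinguishes a variable from a constant is the case of the first letter.

```python
def is_variable(name: str) -> bool:
    # Prolog's rule, simplified: a name whose first letter is uppercase is a
    # variable; anything else (e.g. "maltes", "cachorro") is an atom (a constant).
    return name[:1].isupper()


def match_term(pattern, term, bindings=None):
    """One-way pattern matching, a simplified form of unification.

    Terms are nested tuples whose first element is the functor name, e.g.
    ("cachorro", "rex", "maltes", 2). Returns a dict of bindings, or None.
    """
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and is_variable(pattern):
        if pattern in bindings:  # already bound: must agree with the earlier value
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match_term(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    # Atoms and numbers must simply be equal.
    return bindings if pattern == term else None


print(match_term(("cachorro", "Nome", "maltes", 2), ("cachorro", "rex", "maltes", 2)))
```

Changing `maltes` to `Maltes` in the pattern turns a comparison into a new binding. A case-insensitive language would lose exactly this distinction.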

Note that all this could be done with sigils, quoting (e.g. Lisp), etc., but notice how much cleaner the code is without the visual pollution of special symbols everywhere.

(P.S. A more relatable example is regular expressions: \w matches a character class and \W matches its complement. If regexes were case insensitive, you would have to "spend" one more symbol, making the language slightly less dense.)
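Here is a quick Python illustration of the regex point using the standard `re` module. The sample string is arbitrary. On a text string, `\w` and `\W` split the characters between them, so each character is matched by exactly one of the two.

```python
import re

text = "rex_2, maltes!"
word_chars = re.findall(r"\w", text)      # letters, digits and underscore
non_word_chars = re.findall(r"\W", text)  # the complement of \w

print("".join(word_chars))
print(non_word_chars)
```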

Memorization and miscapitalization

It is already hard to memorize the names of all the classes, functions, etc. of an API well enough to be productive without going back to a reference every hour. If capitalization is not used very consistently, you end up having to memorize it too, and this should not be underestimated: our brains are good at memorizing concepts (and East Asians are even better at it than Westerners), but not so good at memorizing symbols and spellings:

  • if you try to memorize "dona de casa" (housewife), your brain will most likely store it as a sequence of sounds.
  • an East Asian reader could memorize it as an image, which is even easier (婦).
  • a programmer would also have to worry about how to write it:
    • dona de casa? No, because identifiers cannot contain more than one word;
    • dona-de-casa? No, because - is an operator;
    • dona_de_casa? Could be... or would it be donaDeCasa?
    • DonaDeCasa? DONA_DE_CASA? Let me see, am I dealing with a variable, a class or a constant?

If a language follows a rigid capitalization convention, and especially if the compiler enforces it (as also discussed in Victor's answer), the problem is not so serious. But when acronyms come into play, things get more complicated:

  • IDCachorro or IdCachorro?
  • UTF8Regex, Utf8Regex, UTF8RegEx or Utf8RegEx?

Unless the compiler knows what an ID is, that there is something called UTF8, that regex stands for REGular EXpression, etc., nothing prevents one programmer from defining a name that is perfectly valid within the language's coding style while another programmer has no idea how to spell it...
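Nim's "style insensitivity" is one existing compromise along these lines. According to the Nim manual, two identifiers are equal if their first characters match exactly and the rest match after underscores are removed and ASCII letters are lowercased. Below is a Python sketch of that rule as we understand it, applied to the spellings above. `nim_style_equal` is a hypothetical name.

```python
import string

# Lowercase ASCII letters only, as the Nim rule does (non-ASCII is left alone).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def nim_style_equal(a: str, b: str) -> bool:
    """Sketch of Nim's identifier equality rule.

    The first character is compared exactly, so Cachorro and cachorro stay
    distinct. After that, underscores are dropped and ASCII letters are
    lowercased before comparing.
    """
    def norm(s: str) -> str:
        return s.replace("_", "").translate(_ASCII_LOWER)

    return a[:1] == b[:1] and norm(a) == norm(b)


for a, b in [("IDCachorro", "IdCachorro"), ("UTF8Regex", "Utf8RegEx"),
             ("dona_de_casa", "donaDeCasa"), ("Cachorro", "cachorro")]:
    print(a, b, nim_style_equal(a, b))
```

The first letter stays case sensitive, which preserves the type/instance distinction. The acronym and word-separator variants become interchangeable.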

Frustration with this situation is one of the most common arguments in favor of case insensitivity. As a compromise, some people advocate case-preserving, case-insensitive languages, which as I understand it means "capitalization does not matter, but once a symbol is declared it must always be spelled the same way". Personally, I see this as the worst of both worlds: you lose all the advantages of case sensitivity, yet the programmer is still forced to memorize how the name is spelled... (or is my interpretation of this concept wrong?)

Anyway, case insensitivity has at least one definite advantage: if the API defines RegEx and the programmer writes Regex, the two are not treated as distinct things. This works well as long as the language requires every variable/type/etc. to be declared and forbids declaring it more than once (otherwise you would simply be trading false negatives for false positives). It makes the language and its APIs easier to learn, at the cost of a smaller set of usable names.
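Below is a sketch of a symbol table with the behavior described above. It follows one common reading of "case-preserving, case-insensitive", which is, for example, the default behavior of file names on Windows. Lookups ignore case, a second declaration that differs only by case is rejected, and the spelling used at declaration is kept. The class and method names are hypothetical.

```python
class CasePreservingTable:
    """Hypothetical symbol table: case-insensitive lookup, case-preserving storage."""

    def __init__(self) -> None:
        # casefolded name -> (spelling used at declaration, value)
        self._entries = {}

    def declare(self, name: str, value: object) -> None:
        key = name.casefold()
        if key in self._entries:
            declared = self._entries[key][0]
            raise NameError(f"{name!r} is already declared as {declared!r}")
        self._entries[key] = (name, value)

    def lookup(self, name: str) -> object:
        # Any capitalization finds the symbol (KeyError if it was never declared).
        return self._entries[name.casefold()][1]

    def spelling(self, name: str) -> str:
        # The spelling preserved from the declaration, e.g. for error messages.
        return self._entries[name.casefold()][0]


api = CasePreservingTable()
api.declare("RegEx", "regex class")
print(api.lookup("Regex"), api.spelling("REGEX"))
```

Under this reading, the programmer does not need to remember the declared spelling to use the name. The preserved spelling only matters when the tool shows the name back to the programmer.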

Practical questions

Finally, there are the practical implications of adopting one approach or the other. Much of this has already been described both in Victor's answer and in my answer to the related question: the extra effort required of the compiler/interpreter to normalize names (with respect to Unicode), convert them to a single canonical form and intern them. But more important than this implementation complexity (which, incidentally, is exactly the computer's job: making human life easier, even if the compiler designer has to work harder for it) is the question of localization: the same program can be interpreted differently if case conversion is an integral part of its compilation process.
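The following minimal Python sketch shows the normalize, fold and intern steps mentioned above, as a hypothetical compiler front end might perform them. It assumes NFKC normalization, which is what Python itself applies to identifiers (PEP 3131), and it uses `str.casefold()` for the case-insensitive variant. `canonical_name` is a hypothetical name.

```python
import sys
import unicodedata


def canonical_name(name: str, case_sensitive: bool) -> str:
    """Hypothetical front-end step: map an identifier to one canonical, interned form."""
    # NFKC makes "cafe" + U+0301 (combining acute) equal to the precomposed "café".
    name = unicodedata.normalize("NFKC", name)
    if not case_sensitive:
        # casefold() may change the string's composition, so normalize again.
        name = unicodedata.normalize("NFKC", name.casefold())
    # Interning turns later comparisons of equal names into cheap identity checks.
    return sys.intern(name)


print(canonical_name("CAFE\u0301", case_sensitive=False))
```

The case-insensitive path adds a folding step. Its result depends on which folding rules the implementation chooses, and that choice is where localization comes in.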

For example, in a case-insensitive language, how would a compiler running under a Turkish locale treat names like MAİL and maıl? And how would an "international" compiler treat them? Would the Turkish programmer's expectations be met or thwarted by either compiler, and which one? As a user of systems that do not always handle Unicode well, I know how these details can get on your nerves and make you angry. If I also had to worry about them while developing (when all my attention is "spent" on the problem at hand), I think I would end up rejecting a language that did not handle this well...
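The following small Python sketch illustrates the Turkish problem. It assumes an "international" compiler that compares names with Python's `str.casefold()`, which applies default, locale-independent Unicode case folding. `str` methods have no Turkish mode. `same_name` is a hypothetical helper.

```python
def same_name(a: str, b: str) -> bool:
    # Default Unicode full case folding; no locale is consulted.
    return a.casefold() == b.casefold()


# Pairs a Turkish speaker would read as the same word in different case...
turkish_pairs = [("MAİL", "mail"), ("MAIL", "maıl")]
# ...and a pair that only matches under non-Turkish rules.
other_pair = ("MAIL", "mail")

for a, b in turkish_pairs + [other_pair]:
    print(a, b, same_name(a, b))
```

In all three cases the locale-independent comparison gives the opposite of the Turkish expectation. One reason is that lowercasing "İ" yields "i" followed by a combining dot (U+0307), not a plain "i".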

Conclusion

The tradeoffs seem to me to be basically the following:

  • ease of memorization (case insensitive) vs. semantic expressiveness (case sensitive);
  • false positives (case insensitive) vs. false negatives (case sensitive) when determining whether two names refer to the same thing.

Other characteristics of the language also influence these factors, sometimes improving one side and worsening the other. E.g.: if Python became case insensitive, collisions between variable names would increase, and if mandatory variable declarations were introduced to counterbalance this (e.g. var RegEx), conciseness would suffer.

Author: mgibsonbr, 2017-04-13 12:59:38

One advantage of a case-sensitive language is that it makes naming rules easier to enforce. Even so, none of the case-sensitive languages you mentioned actually enforce naming rules, and I don't know of any language that does so and has seen the light of day.

For example, in a case-sensitive language that enforced naming rules similar to Java's (just as an example), I would get a compile error if I tried to declare a variable whose name starts with an uppercase letter. The same would happen if I declared a class whose name starts with a lowercase letter. However, because these languages do not enforce naming rules, they lose this advantage: you can still name a class with a lowercase letter or a variable with a capital letter, disobeying the language's conventions.

On the other hand, the disadvantages are many. Case-sensitive languages are harder to learn, because beginners struggle to get used to the idea that fileName, FileName, filename and FILENAME are different things. Even experienced programmers sometimes get the case wrong, and the compiler will catch it... if the language is compiled!

Suppose the language is interpreted, case sensitive and allows implicit declaration of variables, as JavaScript does. If you unintentionally write resultARray instead of resultArray, you will have a headache and waste a lot of time debugging. This is because case-sensitive languages naturally allow identifiers in the same scope to differ only by case, making resultARray and resultArray different variables. Worse still is inheriting hacked-together code written by some idiot who uses the variables xy, Xy, XY and xY in the same scope for completely different purposes.
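Python has the same trap, since it is case sensitive and creates a variable on its first assignment. In the minimal sketch below, `soma` is a hypothetical function. The miscapitalized name silently becomes a new local variable, and no error is reported.

```python
def soma(valores):
    """Intended to return the sum of `valores`, but contains a case typo."""
    total = 0
    for v in valores:
        Total = total + v  # typo: binds a NEW local variable `Total`; no error
    return total  # `total` was never updated, so this is always 0


print(soma([1, 2, 3]))
```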

It is true that case sensitivity discourages the programmer from writing identifiers inconsistently and forces attention to case, which should make the code more uniform. However, bad programmers will always find a way to write inconsistently named identifiers in case-sensitive languages, and good programmers will always find a way to name identifiers consistently even in case-insensitive languages.

So let's summarize what's good:

  • Enforcing a naming rule that keeps the code uniform. Case-insensitive languages do not even try, and fail. Case-sensitive languages try, but fail just the same.

  • Identifiers that differ only in case should refer to the same identifier. - Point for case insensitive.

  • There should be no distinct identifiers that differ only in uppercase/lowercase. - Point for case insensitive.
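Below is a minimal sketch of the check in the last bullet, written as a Python linter on top of the standard `ast` module. It is a simplification that rests on some assumptions. It looks only at `ast.Name` nodes, so function parameters, attributes and definition names are ignored, and it treats the whole module as a single scope. `case_collisions` is a hypothetical name.

```python
import ast
from collections import defaultdict


def case_collisions(source: str) -> list:
    """Return groups of variable names in `source` that differ only by case."""
    spellings = defaultdict(set)  # casefolded name -> set of spellings seen
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            spellings[node.id.casefold()].add(node.id)
    return [names for names in spellings.values() if len(names) > 1]


codigo = "resultArray = []\nresultARray = resultArray + [1]\n"
print(case_collisions(codigo))
```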

So the scoreboard shows two points for case insensitive and zero for case sensitive (or half a point, if you consider an unsuccessful attempt better than nothing).

So is case insensitive better? Yes, it is better than case sensitive, as it is less confusing, less error-prone and more natural. But that doesn't mean there can't be something even better: would it be possible to create a programming language that scores three points?

Yes, if the language aims to ensure consistent identifiers without creating confusion. To do so, it would have to be case sensitive when checking the source code but case insensitive when resolving names. For example, if the language dictates that variable names must be written only in lowercase, then declaring a variable named Minha_Variavel instead of minha_variavel should cause a compile error (or at least a warning). On the other hand, if I wrote minha_variavel = Minha_Variavel * 2, the compiler would understand that Minha_Variavel is just a misspelling of minha_variavel and report a compile error or warning about it, but it would not treat it as a different variable. Such a language would score 3 points and beat both purely case-sensitive and purely case-insensitive languages. But I don't know of any language like this. The closest thing is IDEs for case-insensitive languages that automatically convert names to their standard form as the programmer types (as happens with Visual Basic).
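The following sketch shows the compiler behavior proposed above. It assumes a hypothetical rule that variable names must be lowercase. A declaration that breaks the rule is reported as an error. A reference that differs from the declaration only by case resolves to the declared variable but produces a warning. `Resolver`, `declare` and `reference` are illustrative names, not an existing API.

```python
import re

# Hypothetical naming rule for this sketch: variable names are lowercase
# letters, digits and underscores, not starting with a digit.
VARIABLE_NAME = re.compile(r"[a-z_][a-z0-9_]*\Z")


class Resolver:
    def __init__(self) -> None:
        self.declared = {}     # casefolded name -> spelling used at declaration
        self.diagnostics = []  # collected errors and warnings, in order

    def declare(self, name: str) -> None:
        if not VARIABLE_NAME.match(name):
            self.diagnostics.append(
                f"error: variable {name!r} does not follow the lowercase naming rule")
        self.declared[name.casefold()] = name

    def reference(self, name: str):
        """Resolve a use of `name`; returns the declared spelling or None."""
        declared = self.declared.get(name.casefold())
        if declared is None:
            self.diagnostics.append(f"error: {name!r} is not declared")
        elif declared != name:
            # Same variable, wrong capitalization: resolve it, but warn.
            self.diagnostics.append(f"warning: {name!r} should be spelled {declared!r}")
        return declared


r = Resolver()
r.declare("minha_variavel")
r.reference("Minha_Variavel")  # minha_variavel = Minha_Variavel * 2
print(r.diagnostics)
```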

There is also a rarely mentioned disadvantage of case-insensitive languages: Unicode characters outside ASCII. An interesting example is Turkish, where the uppercase form of i is İ while the lowercase form of I is ı. So if a language that supports international characters in identifiers is case insensitive, readers who do not know the alphabet in question may not be able to tell which identifiers are actually distinct (another example: few people would recognize ΓΩξνσς as equivalent to γωΞΝΣΣ). For writing systems such as Chinese, which have no concept of uppercase/lowercase, things get even murkier, since their characters are neither uppercase nor lowercase. However, few programming languages accept identifiers containing Unicode characters, and even in those that do, using them is generally considered bad practice. So in the end this disadvantage weighs very little or is irrelevant in practice.

Author: Victor Stafusa, 2015-07-16 02:51:05

Beyond the cosmetic side, of course (there is an attempt to standardize code, which, though it should, does not happen very often)... There are some significant advantages, for example in encodings such as base64_encode and base64_decode, data encryption, creating more secure passwords, and the use of hashes and unlimited combinations. SEO page indexing. Different behavior between similar methods. Naming related things from the same domain: for example, you can have a class called Animal and a constant called ANIMAL. But none of this is indispensable, since many people dismiss its importance...

Author: Ivan Ferrer, 2015-07-13 18:59:20

I'll stick to the definition. Case-sensitive is an anglicism referring to a kind of typographic analysis in computing. In Portuguese, it means something like "sensitive to letter case" or "sensitive to uppercase and lowercase". Software is said to be case-sensitive, or to have "case sensitivity", when it can analyze a string, detect uppercase and lowercase letters, and behave differently depending on them.

Case-sensitive means that uppercase and lowercase characters are treated differently. For example, the words sum and SUM are considered different.

Author: Miro Eduardo, 2016-12-15 08:50:32

Basically, case sensitivity:

  • expands the number of symbols you can use
  • standardizes the code
  • makes it easy to read
Author: Jonathan, 2016-12-15 10:09:36

In languages that require variables to be declared before use, the question is generally irrelevant.

In languages that do not require variables to be declared before use (typically scripting languages), being case sensitive is shooting yourself in the foot. A typo in a variable name is not a syntax error, so the compiler/precompiler/interpreter/etc. does not report it as an error.

Author: zentrunix, 2016-12-17 19:10:27