Table normalization to 2nd Normal Form
A relation is in 2FN if and only if it is in 1FN and not contains partial dependencies.
From this definition I am normalizing a BD to 2fn. This database has 2 tables, which illustrates the figure with their respective attributes:
To candidate key cpf
of the Table cliente
, it's causing partial dependence on this format, right? For only id
already the reference.
Rearranging the table cliente
according to 2FN I ended up defining it like this, is the central idea of my normalization thinking correct?
Is there any other better way for me to normalize it?
3 answers
Your tables Cliente
and Agência
are in the first normal form because no field is multivalued, so let's focus on the second normal form.
There are two candidate keys in the table Cliente
: id
and cpf
. Clearly from the id
we can get any other field and from the cpf
as well. Nor can we use only a part of id
or cpf
for this purpose.
Already in the table Agencia
, the only candidate key is id
and the other fields are defined based on id
and not just part of id
.
However, the fields estado
are determined by the fields cidade
which in turn are determined by the fields rua
in Table Cliente
and cep
in Table Agencia
. That is, the fields cidade
and estado
are violations of the second normal form because they do not depend on the primary key, but on some other field. Your attempt to normalize does not correct this problem completely, but it is already a step in some direction.
The solution Series:
-
Create a table
Estado
with the fieldscodigo
andnome
.The
codigo
is the primary key, while thenome
is another candidate key because we cannot have two states with the same name. -
Create a table
Cidade
with the fieldscodigo_estado
,codigo_cidade
andnome
.The
codigo_cidade
is the primary key, thecodigo_estado
is the foreign key for the tableEstado
.The fields
codigo_estado
andnome
are also a candidate key because they do not there may be two cities with the same name in the same state.The field
nome
alone is not a candidate key because there may be cities with the same name in different states (such as Rattlesnake, name of a city in Paraná and one in Ceará). -
Create a table
Logradouro
with the fieldscep
,codigo_cidade
,bairro
andnome
.cep
it's the primary key.codigo_cidade
it is the foreign key for the tableCidade
.One would imagine that
codigo_cidade
andnome
together were a key candidate, and in almost every city that would be true, but it is not always. In São Paulo, for example, there are three different streets called "Rua Piracicaba". Therefore, the candidate key iscodigo_cidade
,nome
andbairro
. -
Create a table
Endereco
with the fieldsid
,cep
,numero_logradouro
andcomplemento
.id
it's the primary key.cep
it is the foreign key for the tableLogradouro
.Interestingly, the number of the plot is not always numerical. For example: "Rua João da Silva, 148-B , casa dos fundos ".
In the tables
Agencia
andCliente
, put a foreign key forEndereco
and remove the other fields for addresses.
This model is not yet perfect, because the same street can have different ZIP codes and maybe it was the case to have a table of neighborhoods and also a table of types of logradouros (Avenue, Street, Square, alameda, transom, Highway, court, etc.). However, for its purpose, this should be enough.
Normalization essentially exists to resolve redundancies. See any redundancy there?
Address
None shown. Is it possible for the customer to have more than one address? Is it possible for more than one customer to have the same address? If you can, maybe makes sense to apply normalization in this case. With only one address it makes no sense to make this separation. Neither the second, nor any normal form applies.
Note that in the second example you created a table called Cliente Endereço
. Why are you putting an agency's address on it? It doesn't make sense.
Let's assume that it is actually a general address table. You could even put the address there. But for what?
Nothing prevents you from having a table only for addresses, but if there is no repetition of the data you are not doing this by normalization. This makes sense if entities can have more than one address or more than one of them has the same address.
I did not enter the table normalization address because it does not seem to be the focus of the question and in the form presented may even be normalized, it has no data indicating what are the columns, not even the types, it could very well be ids
of the normalized data. It may also be that you do not want to normalize this.
Agency
A table has been created to separate the agency it belongs to. Again, do you have more than one agency that the client can have? It's the same question of the address. What gain do I think you got doing this? That problem you think you solved? Normalization needs to solve problems, not cause new ones. I saw no advantage.
Best Model
Even these cases can be questioned in modern databases. It doesn't cost so much to keep space for more than one address in the entity itself. It is not always a problem to have a very possible repetition of registration of the address when two entities are in the same address. It is a matter of pragmatism.
If you can have these cases, strictly speaking, it should normalize, but an experienced developer will analyze how much the effort is worth because it complicates the model hindering all the code and performance. So you have to think if it compensates, if it is so necessary.
On the other hand the whole model may be wrong. This idea of separating entities into customer, supplier, Bank, etc. it's wrong by nature. At least in the presented form.
Entities are natural and legal persons (separate). Customer, supplier, bank, carrier, seller, employee, etc. these are roles that these people play in this organization. In these tables should only have data on the papers. The data of the person himself should be in the tables of natural and legal persons. This is the most correct, but again it is possible pragmatically to do otherwise if it makes sense. But you need to choose another shape because it's the best and not because it's the only shape you know. Not all cases make sense to separate so, but in general is the most correct.
I could point out other possible errors in this model. But it would be speculation because it is only right or wrong knowing all the requirements. The errors I see would be according to my experience, not with the actual case, are could be right or not.
Normalization
To know how to normalize you need to know the goals. Be the most formally correct, be the fastest to develop, be the easiest to maintain, be the most performatic, to be what the teacher taught or the boss ordered even if it is not the best, or whatever. You can't blindly normalize. How far to normalize and where to stop is something you will learn over time.
Based on the conversion examples in the book, below is an example of 1FN to 2fn and 2FN to 3fn.
Based on the drawing of the book, I made a drawing to try to help with your question.
Since your question limited the conversion only up to 2FN , I did not convert from 2fn to 3FN as the example in the book shows.
Consider the primary keys the words with full underscore, e.g. (
CPF
) and the foreign keys like (_CEP
) with the underline before the word.
I added some non-key attributes to the CLIENT_AGENCIA relationship table just to make the illustration more coherent, since you did not cite any relevant attributes to compose that table.
Source: database systems 6th edition, authors: Elmasri, Ramez Navathe, Shamkant B. year: 2011