Normal form theory of relational databases
Relational database design should follow certain rules. This section briefly introduces the normal forms of database design: 1NF (first normal form), 2NF (second normal form), 3NF (third normal form), and BCNF. The fourth and fifth normal forms are left for later. If your database designs satisfy these normal forms, you are a master of database design.
First normal form (1NF): a relation schema R is in first normal form if, in every relation r of R, each attribute value is an atomic unit of data that cannot be subdivided. Example: suppose employee number, name, and phone number form a table, and a person may have both an office phone number and a home phone number. There are three ways to normalize it to 1NF:
One: store the employee number and name repeatedly, once per phone number. In this case the key can only be the phone number.
Two: the employee number is the key, and the phone number is split into two attributes, the office phone number and the home phone number.
Three: the employee number is the key, but each record is required to hold only one phone number.
Of these three methods, the first is the least desirable; choose between the latter two according to the actual situation.
Second normal form (2NF): a relation schema R (U, F) is in second normal form if every non-prime attribute is fully functionally dependent on every candidate key.
Example: the course selection relation SCI (SNO, CNO, GRADE, CREDIT), where SNO is the student number, CNO is the course number, GRADE is the grade, and CREDIT is the course credit. Under these conditions, the key is the composite key (SNO, CNO).
Using this relation schema in applications causes the following problems:
a. Data redundancy: if the same course is taken by 40 students, the course number and its credits are repeated 40 times.
b. Update anomaly: if the credits of a course are adjusted, the CREDIT value must be updated in every corresponding tuple; if some tuples are missed, the same course ends up with different credits.
c. Insertion anomaly: if a new course is planned but no one has enrolled yet, there is no student number to complete the key, so the course and its credits cannot be stored until someone takes the course.
d. Deletion anomaly: if students have finished a course and their enrollment records are deleted from the current database, and no new students have taken the course yet, the course and its credit information are lost.
Cause: the non-key attribute CREDIT is functionally dependent on CNO only; that is, CREDIT depends partially, not fully, on the composite key (SNO, CNO).
Solution: split the schema into two relation schemas, SC1 (SNO, CNO, GRADE) and C2 (CNO, CREDIT). The two are linked by the foreign key CNO in SC1 and can be combined by a natural join when needed, which restores the original relation.
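The decomposition can be tried out on a few rows. The following is a minimal sketch in Python, assuming made-up student, course, grade, and credit values (none of them come from the text); it projects SCI onto SC1 and C2 and then joins the two on CNO.

```python
# Hypothetical sample rows for SCI(SNO, CNO, GRADE, CREDIT); all values are made up.
sci = {
    ("s1", "c1", 90, 4),
    ("s2", "c1", 85, 4),
    ("s1", "c2", 70, 2),
}

# Projection onto SC1(SNO, CNO, GRADE) and C2(CNO, CREDIT).
sc1 = {(sno, cno, grade) for sno, cno, grade, _ in sci}
c2 = {(cno, credit) for _, cno, _, credit in sci}

# Natural join of SC1 and C2 on the shared attribute CNO.
rejoined = {
    (sno, cno, grade, credit)
    for sno, cno, grade in sc1
    for c2_cno, credit in c2
    if cno == c2_cno
}

print(len(c2))          # number of (course, credit) pairs stored after the split
print(rejoined == sci)  # whether the join restores the original relation
```

In this sample each course's credit is stored once in C2, so adjusting a credit touches a single row, and a course with no students yet can still be recorded in C2.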
Third normal form (3NF): a relation schema R (U, F) is in third normal form if no non-prime attribute is transitively dependent on any candidate key.
Example: S1 (SNO, SNAME, DNO, DNAME, LOCATION), whose attributes represent the student number, name, department number, department name, and department address.
The key SNO determines every attribute. Because the key is a single attribute, there can be no partial dependency, so the relation is certainly in 2NF. However, it contains a lot of redundancy: the department attributes DNO, DNAME, and LOCATION are stored repeatedly for every student in the department, and insertion, deletion, and modification cause anomalies similar to those in the previous example. The cause is a transitive dependency: SNO->DNO and DNO->LOCATION, so SNO determines LOCATION only through DNO. The usual remedy is to split the schema into two relations, S (SNO, SNAME, DNO) and D (DNO, DNAME, LOCATION). The relation S must keep the foreign key DNO; otherwise the link between the two relations is lost.
BCNF: a relation schema R (U, F) is in BCNF if no attribute, prime or non-prime, is transitively dependent on any candidate key of R. Equivalently, a relation schema R is in BCNF if every determinant contains a key (rather than being contained in one).
Example: the parts management relation schema WPE (WNO, PNO, ENO, QNT), whose attributes are the warehouse number, part number, employee number, and quantity respectively. The following conditions hold:
a. A warehouse has more than one employee.
b. An employee works in only one warehouse.
c. In each warehouse, one type of part is the responsibility of one designated employee, but one employee can manage several types of parts.
d. Parts of the same type can be stored in several warehouses.
From the above, PNO alone cannot determine QNT; QNT is determined by the combination (WNO, PNO), which gives the functional dependency (WNO, PNO)->QNT. Since in each warehouse one type of part is the responsibility of one employee, while one employee can manage several types of parts, the combination (WNO, PNO) is needed to determine the person in charge, which gives (WNO, PNO)->ENO. Since an employee works in only one warehouse, ENO->WNO. Since in each warehouse one type of part is the responsibility of one designated employee, and an employee works in only one warehouse, (ENO, PNO)->QNT.
Find the candidate keys. Since (WNO, PNO)->QNT and (WNO, PNO)->ENO, (WNO, PNO) determines the entire tuple and is a candidate key. From ENO->WNO and (ENO, PNO)->QNT, (ENO, PNO) also determines the entire tuple and is another candidate key. The attributes ENO, WNO, and PNO are all prime attributes, and the only non-prime attribute is QNT. QNT is fully and directly functionally dependent on each candidate key, so the relation schema is in 3NF.
Now analyze the prime attributes. Because ENO->WNO, the prime attribute ENO is a determinant of WNO, yet ENO is not itself a key; it is only part of a composite key. This makes the prime attribute WNO partially dependent on the other candidate key (ENO, PNO). Moreover, (ENO, PNO)->ENO while the reverse does not hold, and ENO->WNO, so (ENO, PNO)->WNO is also a transitive dependency.
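The analysis above can be checked mechanically with attribute closures. The sketch below is an illustration only, not part of the original text: the helper names `closure` and `is_superkey` are hypothetical, and the dependency list is simply the four dependencies stated for WPE.

```python
ALL_ATTRS = frozenset({"WNO", "PNO", "ENO", "QNT"})

# The functional dependencies stated for WPE, as (determinant, dependent) pairs.
FDS = [
    (frozenset({"WNO", "PNO"}), frozenset({"QNT"})),
    (frozenset({"WNO", "PNO"}), frozenset({"ENO"})),
    (frozenset({"ENO"}), frozenset({"WNO"})),
    (frozenset({"ENO", "PNO"}), frozenset({"QNT"})),
]

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_superkey(attrs):
    """True if attrs determines the whole schema."""
    return closure(attrs, FDS) == ALL_ATTRS

print(is_superkey({"WNO", "PNO"}))  # True
print(is_superkey({"ENO", "PNO"}))  # True

# BCNF requires every determinant to contain a key; list the ones that do not.
violations = [(sorted(lhs), sorted(rhs)) for lhs, rhs in FDS if not is_superkey(lhs)]
print(violations)  # [(['ENO'], ['WNO'])]
```

The only determinant that is not a superkey is ENO, which is the dependency ENO->WNO singled out in the text.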
Although no non-prime attribute is transitively dependent on a candidate key, a prime attribute is, and this causes similar trouble. For example, a new employee is assigned to a warehouse but is still in an internship period and not yet independently responsible for any parts. The employee cannot be inserted into the relation because PNO, which is part of the key, is missing. Likewise, if an employee is reassigned to security duty and no longer manages parts, deleting the parts also deletes the employee.
Solution: split the schema into the management relation EP (ENO, PNO, QNT), whose key is (ENO, PNO), and the work relation EW (ENO, WNO), whose key is ENO.
Disadvantage: the decomposition preserves functional dependencies poorly. In this example, the functional dependency (WNO, PNO)->ENO is lost, which damages the original semantics: the decomposed schemas no longer express that in each warehouse one type of part is the responsibility of one employee, so the same part in a warehouse could be managed by two or more people at the same time. The decomposed relation schemas therefore enforce fewer integrity constraints.
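The lost dependency can be made concrete with a small sketch. The rows below are hypothetical (two made-up employees in the same warehouse who both manage the same part), and `key_is_unique` is an illustrative helper, not something from the text. Each decomposed relation satisfies its own key, yet the join breaks (WNO, PNO)->ENO.

```python
# Hypothetical rows; the employee, warehouse, and part identifiers are made up.
ew = {("e1", "w1"), ("e2", "w1")}          # EW(ENO, WNO), key ENO
ep = {("e1", "p1", 10), ("e2", "p1", 5)}   # EP(ENO, PNO, QNT), key (ENO, PNO)

def key_is_unique(rows, key_positions):
    """True if no two rows share the same values at key_positions."""
    seen = set()
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        if key in seen:
            return False
        seen.add(key)
    return True

# Each decomposed relation satisfies its own key constraint.
ew_ok = key_is_unique(ew, [0])
ep_ok = key_is_unique(ep, [0, 1])

# Natural join on ENO, producing rows in the order (WNO, PNO, ENO, QNT).
wpe = {
    (wno, pno, eno, qnt)
    for eno, wno in ew
    for ep_eno, pno, qnt in ep
    if eno == ep_eno
}

# (WNO, PNO) is a candidate key of WPE, so it would have to be unique in the join.
fd_ok = key_is_unique(wpe, [0, 1])

print(ew_ok, ep_ok, fd_ok)
```

Here the key constraints on EW and EP alone cannot reject the second manager for part p1 in warehouse w1; enforcing the rule would need an extra check across both tables.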
For the decomposition of one relation into several to be meaningful, the minimum requirement is that no information from the original is lost. This information includes not only the data itself but also the constraints between data items that the functional dependencies express. The goal of decomposition is to reach a higher level of normalization, but two properties must be considered at the same time: lossless join and preservation of functional dependencies. It is not always possible to achieve both a lossless join and full dependency preservation, so trade-offs must be made as needed.
The four normal forms from 1NF up to BCNF are related as follows:
BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF; that is, every relation schema in BCNF is also in 3NF, every schema in 3NF is also in 2NF, and every schema in 2NF is also in 1NF.
Purpose: normalization aims to make the structure more rational, eliminate storage anomalies, keep data redundancy as small as possible, and make insertion, deletion, and updating easier.
Principle: follow the principle of conceptual unity, "one fact in one place"; that is, each relation schema describes one entity or one relationship between entities. The essence of normalization is making each schema represent a single concept.
Method: decompose a relation schema by projection into two or more relation schemas.
Requirements: the decomposed set of relation schemas should be equivalent to the original one; that is, the original relation can be restored by natural join without loss of information, and the meaningful dependencies between attributes are preserved.
Note: the decomposition of a relation schema is not unique; it can yield different sets of relation schemas. Minimum redundancy must be pursued on the premise that the decomposed database can express all the information of the original database. The fundamental goals are to save storage space, avoid data inconsistency, improve the efficiency of operations on relations, and at the same time meet application requirements. In practice, not every schema is required to reach BCNF. Sometimes deliberately retaining some redundancy makes data queries more convenient, especially for database systems that are updated infrequently and queried very frequently.
In relational databases, besides functional dependencies there are also multivalued dependencies and join dependencies, which give rise to the fourth normal form, the fifth normal form, and other higher-level normalization requirements. These are left for later.
Friends, whatever you thought while reading this, any book on basic database theory covers the same material. Many readers came to database work partway through their careers, so I copied this out of a book specially. If you have questions, do not ask me; find a book on relational database theory and read it, and it may help you a great deal. Everything above is basic theory. Ask yourself: when you design a database, do you consider complying with these normal forms? Have you ever produced a design that was not good? Compared against the above, which normal form did it violate?
Of the database designs I have seen, very few comply well with these normal forms. In general, everyone complies with the first normal form, but few comply fully with the second and third, and those who do are surely masters of database design. BCNF violations arise less often, and decomposing to BCNF can damage the integrity constraints of the design, so you can leave it out of consideration; in ORACLE, its shortcomings can be addressed with triggers. In future, when we do design work together, I hope you will comply with the normal forms above.
An analysis of the three main database normal forms
In order to build databases with little redundancy and sound structure, certain rules must be followed when designing a database. In relational databases these rules are called normal forms. A normal form is a summary of the requirements a design must meet at a certain level. To design a well-structured relational database, certain normal forms must be satisfied.
To really understand what “normal form (NF)” means, first look at the textbook definition: a normal form is “a collection of relation schemas that conform to a certain level, indicating the degree of rationalization of the links between the attributes within a relation”. It can be roughly understood as the level of a design standard that the structure of a table meets. It is like buying building materials for home renovation: the most environmentally friendly grade is E0, followed by E1, then E2, and so on. Database normal forms are likewise divided into 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. In ordinary relational database design, considering up to BCNF is enough. A design that conforms to a higher normal form necessarily conforms to the lower ones; for example, a relation schema in 2NF must also be in 1NF.
Three normal forms are the most common in real-world development:
First, the first normal form (1NF).
A relation can be understood as a data table. The difference between a “relation” and a “relation schema” is similar to the difference between an “object” and a “class” in object-oriented programming: a relation is an instance of a relation schema, so you can think of a relation as a table with data and of the relation schema as the structure of that table. A relation conforms to 1NF if each of its attributes is indivisible. The case shown in Table 1 does not conform to 1NF.
Table 1
In fact, 1NF is the most basic requirement for all relational databases. When you create a data table in a relational database management system (RDBMS) such as SQLServer, Oracle, or MySQL, the operation will certainly fail if the table design does not meet this requirement. In other words, any data table that already exists in an RDBMS must conform to 1NF. To represent the data of Table 1 in an RDBMS, it has to be redesigned in the form of Table 2:
Table 2
But a design that conforms only to 1NF still suffers from excessive data redundancy, insertion anomalies, deletion anomalies, and modification anomalies, as in the design of Table 3:
Excessive data redundancy: the student number, name, department name, and department head of each student are repeated many times, and so is each department together with its department head.
Insertion anomaly: if the school has created a new department but has not yet enrolled any students (e.g., the department was created in March, but enrollment will not occur until August), the department name and department head cannot be added to the data table on their own.
Deletion anomaly: if all records of the students in a department are deleted, the department and its department head disappear as well, even though a department whose students have all left still exists.
Modification anomaly: if Li Xiaoming transfers to the Law Department, then to keep the data in the database consistent, the department and department head must be modified in three records.
Precisely because a design that conforms only to 1NF has these problems, we need to raise the design standard to a higher normal form (2NF) by removing the factors that cause the four problems above. This process is known as “normalization”.
The second normal form (2NF)
The second normal form is one level above the first. 2NF builds on 1NF by eliminating partial functional dependencies of non-prime attributes on keys.
Functional dependency: if, in a table, fixing the value of attribute (or attribute group) X always determines the value of attribute Y, then Y is said to be functionally dependent on X, written X → Y.
Examples of functional dependencies in the table:
Department name → Department head
Student number → department head
(student number, class name) → Score
But the following functional dependency does not hold:
College number→Class name
College number→Score
College name→Department chairperson
(College number, class name)→Name
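Whether a functional dependency holds in a given set of rows can be tested directly from the definition. The sketch below is only an illustration: the column names and all sample values are assumptions (the original table is not reproduced here), and `fd_holds` is a hypothetical helper. A check on sample rows can refute a dependency, but passing it does not prove the dependency holds for all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first y seen for each x is remembered; any later mismatch refutes the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows; column names and values are made up for illustration.
rows = [
    {"student_no": 1, "name": "A", "dept": "Economics", "head": "H1", "class": "Math", "score": 95},
    {"student_no": 1, "name": "A", "dept": "Economics", "head": "H1", "class": "English", "score": 87},
    {"student_no": 2, "name": "B", "dept": "Law", "head": "H2", "class": "Math", "score": 78},
]

print(fd_holds(rows, ["dept"], ["head"]))                  # department name -> department head
print(fd_holds(rows, ["student_no"], ["head"]))            # student number -> department head
print(fd_holds(rows, ["student_no", "class"], ["score"]))  # (student number, class name) -> score
print(fd_holds(rows, ["student_no"], ["score"]))           # refuted: student 1 has two scores
```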
Key (also called a code): K is a key if, once the value of K is fixed, the values of all other attributes of the table are determined as well. In practice a key can be understood as the primary key.
The second normal form requires that every column in a table depend on the whole primary key, not just on part of it (this mainly concerns composite primary keys). In other words, one table should store only one kind of data, not several kinds.
For example, consider designing an order information table. Because an order may contain multiple items, the order number and the item number together serve as the composite primary key of the table, as shown in the following table.
Order Information Table
This creates a problem: the order number and item number together form the composite primary key of this table, but information such as the product name, unit, and price does not depend on the whole primary key, only on the item number. The design therefore violates the second normal form.
The problem is solved by splitting the order information table: the product information is moved into its own table, and the order items are moved into another table, as shown below.
Order Information Table
Order Items Table
Merchandise Information Table
Designed in this way, the redundancy in the database is greatly reduced. To get the product information for an order, simply look up the item number in the product information table.
The check for 2NF can therefore be summarized as follows:
Step 1: find all the keys of the data table.
Step 2: from the keys obtained in step 1, find all the prime attributes.
Step 3: remove all the prime attributes from the data table; the remaining attributes are the non-prime attributes.
Step 4: check whether any non-prime attribute is partially functionally dependent on a key. If so, the table does not satisfy 2NF.
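The four steps can be sketched in Python for the order table discussed earlier. This is a rough illustration under stated assumptions: the attribute names, the `quantity` column, and the dependency list are my own reading of the example, and the helper functions are hypothetical. The key search is brute force, which is fine for a handful of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Step 1: minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys

def partial_dependencies(attrs, fds):
    keys = candidate_keys(attrs, fds)   # step 1: all keys
    prime = set().union(*keys)          # step 2: prime attributes
    non_prime = set(attrs) - prime      # step 3: non-prime attributes
    found = []                          # step 4: partial dependencies
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in sorted(closure(part, fds) & non_prime):
                    found.append((part, attr))
    return keys, found

# Assumed attributes and dependencies for the order information table.
ATTRS = frozenset({"order_no", "item_no", "quantity", "product_name", "unit", "price"})
FDS = [
    (frozenset({"order_no", "item_no"}), frozenset({"quantity"})),
    (frozenset({"item_no"}), frozenset({"product_name", "unit", "price"})),
]

keys, found = partial_dependencies(ATTRS, FDS)
print(keys)
print(found)
```

An empty `found` list would mean the table passes the 2NF check; here the product columns depend on `item_no` alone, which matches the problem described for the order table.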
The third normal form (3NF)
3NF builds on 2NF by eliminating transitive functional dependencies of non-prime attributes on keys. That is, if any non-prime attribute is transitively dependent on a key, the table does not satisfy 3NF.
In other words, the third normal form requires that every column in a data table depend on the primary key directly, not indirectly.
For example, when designing an order data table, you can use the customer number as a foreign key that links the order table to the customer table. Other customer information (such as name, company, etc.) must not be added as fields in the order table. The design shown in the following two tables satisfies the third normal form.
Order Information Table
Customer Information Table
This way, when querying order information, you can use the customer number to refer to the records in the customer information table, and the customer information does not have to be entered repeatedly in the order information table, which reduces data redundancy.
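A minimal sketch of this two-table design, using SQLite from the Python standard library. The table names, column names, and sample values are assumptions chosen for illustration, not taken from the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: customer details live only in the customer table.
conn.executescript("""
CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    company     TEXT
);
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES customer(customer_no)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'Example Co')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1001, 1), (1002, 1)])

# A change to the customer is made in one row, however many orders exist.
conn.execute("UPDATE customer SET company = 'New Co' WHERE customer_no = 1")

rows = conn.execute("""
    SELECT o.order_no, c.name, c.company
    FROM orders AS o
    JOIN customer AS c ON c.customer_no = o.customer_no
    ORDER BY o.order_no
""").fetchall()
print(rows)
```

Both orders pick up the updated company through the join, because the company is stored once per customer rather than once per order.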
This shows that a 3NF-compliant database design basically solves the problems of excessive data redundancy, insertion anomalies, modification anomalies, and deletion anomalies. Of course, in practice, designs are often relaxed to 2NF or even 1NF for the sake of performance or to accommodate future expansion, but a database designer should at least know what 3NF requires.
How many normal forms are there for a relational database, and how does each one build on the previous one?
First normal form (1NF): every attribute is atomic and not divisible.
“Not divisible”, as used in 1NF, means that whatever can meaningfully be divided must be divided. This is judged in the context of the application: when the attribute is a document, for example, it should be treated as indivisible even though the document is broken up by paragraph marks.
Second normal form (2NF): every non-prime attribute must be fully functionally dependent on the candidate key (or primary key).
The key phrase is “full dependency”, as opposed to “partial dependency”: if the candidate or primary key consists of two or more attributes, a non-prime attribute must not depend on only one or some of them.
For example, suppose a daily stock quote table consists of four attributes: ticker symbol, stock name, date, and closing price. With (ticker symbol, date) as the key, this violates 2NF because “stock name” depends only on “ticker symbol”, which is just part of the key.
Third normal form (3NF): no non-prime attribute is transitively dependent on any candidate key.
The key phrase is “transitive dependency”: a non-prime attribute is transitively dependent on the primary key if it depends on the key through another non-prime attribute.
For example, suppose a basic stock information table consists of stock code, stock name, company name, region, and province. “Province” depends on “region”, which in turn depends on the stock code, so there is a transitive dependency.
Several related terms:
Superkey: a set of attributes that uniquely identifies a tuple in a relation is called a superkey of the relation schema.
Candidate key: a superkey that contains no redundant attributes is called a candidate key.
Primary key: the candidate key selected by the user to identify tuples is called the primary key.
Prime attribute: an attribute that belongs to some candidate key is called a prime attribute.
Non-prime attribute (non-key attribute): an attribute that is not included in any candidate key is called a non-prime attribute.