What are the design principles of relational databases


Guideline 1: Minimize the number of records accessed in a transaction.

Guideline 2: Keep transactions as simple as possible.

First, don’t pack too many update or delete statements into the same transaction.

Second, if a large batch of updates has to be issued at once, it is better to schedule them for an appropriate (low-activity) time.

Guideline 3: Don’t wait for user input in the middle of a transaction.

Guideline 4: Try not to open a transaction when browsing data.
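As a concrete illustration of Guidelines 2, 3, and 4, the minimal sketch below (with hypothetical table names such as orders, order_items, and inventory) collects all user input before the transaction starts and wraps only the few statements that must be atomic, while browsing queries run outside any explicit transaction.

    -- All user input is gathered before BEGIN; the transaction stays short and simple.
    BEGIN TRANSACTION;
        INSERT INTO orders (order_id, customer_id, order_date)
        VALUES (1001, 42, '2024-01-15');
        INSERT INTO order_items (order_id, product_id, quantity, unit_price)
        VALUES (1001, 7, 3, 19.90);
        UPDATE inventory SET stock = stock - 3 WHERE product_id = 7;
    COMMIT;
    -- Browsing data needs no explicit transaction (Guideline 4).
    SELECT order_id, order_date FROM orders WHERE customer_id = 42;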

Database design principles

When designing the database for a system, the following basic principles of database design should be considered and followed in order to build a stable, safe, and reliable database.

1) Consistency principle: analyze and design the data sources in a unified, systematic way, and coordinate the various data sources to ensure that the data is consistent and valid.

2) Integrity principle: database integrity refers to the correctness and compatibility of the data. Legitimate users must be prevented from adding semantically invalid data to the database, so there should be validation and constraint mechanisms for the data entered into the database (see the constraint sketch after this list).

3) Security principle: database security means protecting the data against leakage, alteration, or destruction caused by unauthorized users accessing the database, or by legitimate users misusing it. Authentication and authorization mechanisms are required.

4) Scalability and extensibility principle: the design of the database structure should take full account of future development and porting needs, with good scalability, good extensibility, and a moderate amount of redundancy.

5) Normalization principle: the database design should follow normalization theory. A normalized design reduces anomalies and errors during insert, delete, and update operations and reduces data redundancy.
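As a minimal sketch of the integrity principle above (all table and column names are hypothetical), the DDL below uses NOT NULL, CHECK, and FOREIGN KEY constraints so that the database itself rejects semantically invalid data:

    -- Constraints act as the validation and constraint mechanism for incoming data.
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        name        VARCHAR(200) NOT NULL,                            -- value is required
        unit_price  DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),   -- reject negative prices
        category_id INTEGER NOT NULL REFERENCES category(category_id) -- referential integrity
    );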

Could someone explain the basic principles of database design?

Three normal forms of database design

A so-called normal form is a standard for normalizing the relational model of a relational database. From loose to strict, the increasingly rigorous levels of normalization correspond to different normal forms; the ones commonly used are first normal form, second normal form, third normal form, and Boyce-Codd normal form (BCNF). Normal forms are based on functional dependencies.

Functional Dependency

Definition: given a relational schema R(U), let X and Y be subsets of the attribute set U. A functional dependency is a proposition of the form X → Y: if, for any two tuples t and s of R, t[X] = s[X] implies t[Y] = s[Y], then the functional dependency X → Y holds in the relational schema R(U). X → Y is read as ‘X functionally determines Y’, or ‘Y is functionally dependent on X’. In layman’s terms, if the value of a field Y in a table is determined by the value of another field or set of fields X, then Y is said to be functionally dependent on X. Functional dependencies should be identified by understanding the meaning of the data items and the business rules; dependencies inferred only from the current contents of a table may be incorrect.

First Normal Form (1NF)

Definition: a relational schema R is in first normal form if, in every relation r of the schema, every attribute value is an indivisible data item.

Simply put, every attribute is atomic and cannot be decomposed. 1NF is the minimum condition a relational schema should satisfy; a database design that does not meet first normal form cannot be called a relational database. The study of relational normalization in relational database design is built on top of 1NF.
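For example, a hypothetical sketch: packing several phone numbers into one column violates 1NF, while the second design keeps every attribute atomic by moving the repeating group into its own table.

    -- Violates 1NF: 'phones' packs several values into a single field.
    -- CREATE TABLE customer_bad (customer_id INTEGER PRIMARY KEY, phones VARCHAR(200));

    -- 1NF design: one atomic value per column.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );
    CREATE TABLE customer_phone (
        customer_id INTEGER REFERENCES customer(customer_id),
        phone       VARCHAR(20),
        PRIMARY KEY (customer_id, phone)
    );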

Second Normal Form (2NF)

Definition: a relational schema R is in second normal form if it is in 1NF and every non-prime attribute is fully functionally dependent on the candidate key.

Simply put, second normal form must satisfy two conditions: first, the schema must be in first normal form; second, every non-prime attribute must be fully functionally dependent on the candidate key (or primary key). That is, every non-prime attribute is determined by the entire primary key, not by just a part of it. For example:

The primary key of a daily stock-quote table consists of the stock code and the trade date. Non-prime attributes such as closing price and volume are functionally determined by the whole primary key, i.e., by the stock code and trade date together; neither the stock code nor the trade date alone can functionally determine them. If the table also contains a non-prime attribute ‘stock short name’, that attribute is functionally determined by the stock code alone, so it is not fully functionally dependent on the candidate key, and the design does not satisfy second normal form.
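A minimal sketch of the decomposition described above (all names are hypothetical): the short name, which depends only on the stock code, moves to its own table, so every non-prime attribute in the quote table depends on the whole key.

    -- Stock master data: short_name depends on stock_code alone.
    CREATE TABLE stock (
        stock_code CHAR(6) PRIMARY KEY,
        short_name VARCHAR(50) NOT NULL
    );
    -- Daily quotes: non-prime attributes depend on the full key (stock_code, trade_date).
    CREATE TABLE daily_quote (
        stock_code  CHAR(6) REFERENCES stock(stock_code),
        trade_date  DATE,
        close_price DECIMAL(10,2),
        volume      BIGINT,
        PRIMARY KEY (stock_code, trade_date)
    );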

Third Normal Form (3NF)

Definition: a relational schema R(U, F) is in third normal form if it is in 2NF and no non-prime attribute is transitively dependent on any candidate key.

Simply put, third normal form must satisfy two conditions: first, the schema must be in second normal form; second, there must be no functional dependencies between non-prime attributes. Since second normal form is satisfied, every non-prime attribute is functionally dependent on the primary key; if there were also a functional dependency between non-prime attributes, it would create a transitive dependency, which violates third normal form.

As an example: in a table of basic stock facts, the primary key is the stock code, and there are non-prime attributes ‘primary industry’ and ‘secondary industry’. According to the business rules, the secondary industry functionally determines the primary industry, so we have: the stock code functionally determines the secondary industry, and the secondary industry functionally determines the primary industry. This forms a transitive dependency, and the design does not satisfy third normal form. In practice, however, third normal form is sometimes deliberately violated for the convenience of queries. In the example above, if the primary-industry attribute were removed, any query on the stocks of a given primary industry would have to derive the primary industry from the secondary industry, which hurts performance. Therefore the primary-industry attribute is usually kept.
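A hypothetical sketch of the strictly 3NF version: the secondary-to-primary industry mapping lives in its own table, which removes the transitive dependency at the cost of a join when querying by primary industry.

    -- Industry hierarchy: primary_industry depends on secondary_industry, not on the stock.
    CREATE TABLE industry (
        secondary_industry VARCHAR(50) PRIMARY KEY,
        primary_industry   VARCHAR(50) NOT NULL
    );
    CREATE TABLE stock_basic (
        stock_code         CHAR(6) PRIMARY KEY,
        stock_name         VARCHAR(50) NOT NULL,
        secondary_industry VARCHAR(50) REFERENCES industry(secondary_industry)
    );
    -- Querying by primary industry now requires a join (the performance cost mentioned above).
    SELECT s.stock_code
    FROM stock_basic s
    JOIN industry i ON i.secondary_industry = s.secondary_industry
    WHERE i.primary_industry = 'Finance';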

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of third normal form, although some describe it as developing directly from 1NF: every attribute, whether prime or non-prime, must be fully functionally dependent on a candidate key, with no transitive dependencies. More precisely, a schema is in BCNF when every determinant of a nontrivial functional dependency is a candidate key.

Database design principles in system implementation

1. The relationship between the original documents and entities

The relationship can be one-to-one, one-to-many, or many-to-many. In general it is one-to-one: an original document corresponds to one and only one entity.

In special cases it may be one-to-many or many-to-one: that is, one original document corresponds to several entities, or several original documents correspond to one entity.

The entities here can be understood as basic tables.

〖Example 1〗: an employee’s résumé in a human-resources information system corresponds to three basic tables: the basic employee information table, the social relations table, and the work history table. This is a typical example of “one original document corresponds to multiple entities”.

2. Primary key and foreign key

Generally speaking, an entity cannot lack both a primary key and a foreign key. In an E-R diagram, an entity at a leaf position may or may not define a primary key (because it has no descendants), but it must have a foreign key (because it has a parent).

The design of primary and foreign keys is important in global database design. The primary key is a high-level abstraction of the entity, and the pairing of a primary key with a foreign key expresses the connection between entities.
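A minimal sketch of a primary/foreign key pairing between a parent and a child entity; the table names (department, employee) are hypothetical.

    -- Parent entity: its primary key is a high-level abstraction of the department.
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        dept_name     VARCHAR(100) NOT NULL
    );
    -- Child entity: the foreign key pairs with the parent's primary key to express the connection.
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        name          VARCHAR(100) NOT NULL,
        department_id INTEGER NOT NULL REFERENCES department(department_id)
    );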

3. Properties of Basic Tables

Basic tables are different from intermediate and temporary tables because they have the following four properties:

(1) Atomicity. The fields in a basic table cannot be decomposed further.

(2) Rawness. The records in the basic table are records of the original data (base data).

(3) Derivability. All output data can be derived from the data in the basic tables and code tables.

(4) Stability. The structure of the basic table is relatively stable, and the records in the table are to be stored for a long time.

Understanding the nature of basic tables makes it possible, when designing a database, to distinguish basic tables from intermediate and temporary tables.

4. Normal Form Criteria

The relationships between basic tables and their fields should, as far as possible, satisfy third normal form. However, a design that satisfies third normal form is often not the best design.

In order to improve the operational efficiency of the database, it is often necessary to lower the normal-form standard: add redundancy appropriately, trading space for time.

[Example 2]: consider a basic table for commodities, as shown in Table 1. The presence of the field “Amount” means the design does not satisfy third normal form, because “Amount” can be obtained by multiplying “Unit Price” by “Quantity”, i.e., “Amount” is a redundant field. However, adding this redundant field speeds up query statistics; this is the space-for-time approach. In Rose 2002 there are two kinds of columns: data columns and calculated columns. A column like “Amount” is called a “calculated column”, while columns like “Unit Price” and “Quantity” are called “data columns”.
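A hedged sketch of the space-for-time approach in Example 2, with hypothetical names: the redundant calculated column “amount” is stored alongside the data columns so statistics can read it directly. (Some DBMSs can maintain such a column automatically as a generated/computed column, but support and syntax vary.)

    CREATE TABLE commodity_order (
        item_id    INTEGER PRIMARY KEY,
        unit_price DECIMAL(10,2) NOT NULL,   -- data column
        quantity   INTEGER       NOT NULL,   -- data column
        amount     DECIMAL(12,2)             -- redundant calculated column: unit_price * quantity
    );
    -- The redundancy trades space for time: statistics sum the stored column directly.
    SELECT SUM(amount) FROM commodity_order;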

5. Understanding the three normal forms in layman’s terms

A plain-language understanding of the three normal forms is very helpful in database design. In order to apply the three normal forms well, it is enough to understand them in everyday terms (a working understanding, not the most scientific and precise one):

First normal form: 1NF is an atomicity constraint on attributes; it requires attributes to be atomic and not further decomposable;

Second normal form: 2NF is a uniqueness constraint on records; it requires every record to have a unique identifier, i.e., entity uniqueness;

Third normal form: 3NF is a constraint on field redundancy; no field may be derivable from other fields, i.e., fields must not be redundant.

A database design with no redundancy is achievable. However, a database without redundancy is not necessarily the best database; sometimes, to improve operational efficiency, the normal-form standard has to be lowered and redundant data retained deliberately. The specific approach is: comply with third normal form in the conceptual data model design, and leave the work of lowering the normal-form standard to the physical data model design. Lowering the normal form means adding fields and allowing redundancy.

6. Be good at recognizing and correctly handling many-to-many relationships

If a many-to-many relationship exists between two entities, it should be eliminated. The way to eliminate it is to add a third entity between the two, turning the one many-to-many relationship into two one-to-many relationships, and distributing the attributes of the original two entities sensibly among the three. The third entity here is essentially a more complex relationship, and it corresponds to a basic table. In general, database design tools cannot recognize many-to-many relationships, but they can handle them.

[Example 3]: in a library information system, “Book” is an entity and “Reader” is also an entity. The relationship between these two entities is a typical many-to-many relationship: a book can be borrowed by many readers at different times, and a reader can borrow many books. Therefore a third entity is added between the two, named “Borrow/Return Record”. Its attributes are the borrow/return time and a borrow/return flag (0 for borrowing, 1 for returning); in addition it has two foreign keys (the primary key of “Book” and the primary key of “Reader”), which connect it to both “Book” and “Reader”.
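A minimal sketch of the junction entity described in Example 3 (all names are hypothetical):

    CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title VARCHAR(200) NOT NULL);
    CREATE TABLE reader (reader_id INTEGER PRIMARY KEY, name  VARCHAR(100) NOT NULL);
    -- Junction entity: turns the many-to-many relationship into two one-to-many relationships.
    CREATE TABLE borrow_return (
        record_id   INTEGER PRIMARY KEY,
        book_id     INTEGER NOT NULL REFERENCES book(book_id),
        reader_id   INTEGER NOT NULL REFERENCES reader(reader_id),
        action_time TIMESTAMP NOT NULL,
        action_flag SMALLINT  NOT NULL CHECK (action_flag IN (0, 1))  -- 0 = borrow, 1 = return
    );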

7. How to assign primary key (PK) values

The PK is a tool programmers use to join tables. It can be a number or string with no physical meaning, generated automatically by the program (for example by adding 1 each time), or it can be a physically meaningful field or combination of fields. The former is better than the latter. When the PK is a combination of fields, it is recommended not to use too many of them: too many fields not only makes the index take up a lot of space, it also makes it slow.
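A hypothetical sketch of a surrogate primary key with no physical meaning, generated by automatic increment; the exact auto-increment syntax differs between DBMSs (IDENTITY, AUTO_INCREMENT, sequences), so the form below is only illustrative.

    CREATE TABLE contract (
        contract_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate PK; syntax varies by DBMS
        contract_no VARCHAR(30) NOT NULL UNIQUE,  -- the business identifier stays an ordinary unique column
        signed_date DATE        NOT NULL
    );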

8. Correct understanding of data redundancy

The repeated appearance of primary keys and foreign keys in multiple tables does not count as data redundancy; this concept must be clear, although in fact many people are still not clear about it. The repetition of non-key fields, however, is data redundancy, and a low-level one, i.e., repetitive redundancy. High-level redundancy is not a field appearing repeatedly, but a field that is derived from others.

〖Example 4〗: among the three commodity fields “unit price, quantity, amount”, “amount” is derived from “unit price” multiplied by “quantity”; it is redundant, and it is a high-level redundancy. The purpose of the redundancy is to increase processing speed. Only low-level redundancy increases data inconsistency, because the same data may be entered more than once, at different times, in different places, and by different roles. Therefore, we advocate high-level redundancy (derived redundancy) and oppose low-level redundancy (repetitive redundancy).

9. There is no standard answer to the E–R diagram

There is no single standard answer for the E–R diagram of an information system, because its design and drawing are not unique: any diagram that covers the business scope and functional content of the system requirements is feasible; otherwise, the E–R diagram must be modified. Although there is no unique standard answer, this does not mean it can be designed arbitrarily. The criteria for a good E–R diagram are: a clear structure, concise associations, a moderate number of entities, a reasonable distribution of attributes, and no low-level redundancy.

10. View techniques are useful in database design

Unlike basic tables, code tables, and intermediate tables, a view is a virtual table; it depends on the real tables of the data source for its existence. A view is a window through which programmers use the database, a form that synthesizes base-table data, a method of data processing, and a means of keeping user data confidential. For complex processing, to improve computing speed and save storage space, the nesting depth of view definitions should generally not exceed three levels. If three levels of views are not enough, a temporary table should be defined on top of the views, and further views defined on the temporary table. By iterating this definition, the effective depth of views is unlimited.

For certain information systems involving national political, economic, technological, military, or security interests, views are even more important. In such systems, as soon as the physical design of the basic tables is complete, a first layer of views is built on top of them; the number and structure of these views match the basic tables exactly. It is then stipulated that all programmers are only allowed to operate on the views. Only the database administrator, holding the “security key” together with several other people, may operate directly on the basic tables.
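A minimal sketch of the first-layer-of-views idea, with hypothetical names: the view mirrors the basic table one-to-one, and only the view is granted to the application role.

    -- First-layer view: same structure as the basic table it covers.
    CREATE VIEW v_employee AS
        SELECT employee_id, name, department_id
        FROM employee;
    -- Programmers work only through the view; direct access to the basic table is not granted.
    GRANT SELECT ON v_employee TO app_role;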

11. Intermediate, Reporting and Temporary Tables

An intermediate table is a table that stores statistical data; it is designed for data warehousing, output reports, or queries, and sometimes has neither a primary key nor a foreign key (data warehouses are an exception). Temporary tables are designed by individual programmers to hold temporary records for their own use. Basic tables and intermediate tables are maintained by the DBA; temporary tables are maintained automatically by the programmers’ own programs.

12. Integrity constraints are manifested in three ways

Field (domain) integrity: implemented with Check constraints. In the database design tool, when the range of values of a field is defined, there is a Check button through which the permitted values of the field are specified.

Referential Integrity: It is implemented with PK, FK, and table level triggers.

User-defined integrity: business rules implemented with stored procedures and triggers (see the sketch below).
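A hedged sketch showing the three kinds of integrity in one place (all names are hypothetical, and the trigger uses PostgreSQL-style syntax purely for illustration; other DBMSs differ): a CHECK for field integrity, a FOREIGN KEY for referential integrity, and a trigger standing in for a user-defined business rule.

    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        owner_id   INTEGER NOT NULL REFERENCES customer(customer_id), -- referential integrity (customer table as sketched earlier)
        balance    DECIMAL(12,2) NOT NULL CHECK (balance >= 0)        -- field integrity
    );
    -- User-defined integrity: a business rule enforced by a trigger (illustrative syntax).
    CREATE FUNCTION forbid_large_withdrawal() RETURNS trigger AS $$
    BEGIN
        IF OLD.balance - NEW.balance > 50000 THEN
            RAISE EXCEPTION 'single withdrawal exceeds the business limit';
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    CREATE TRIGGER trg_withdrawal_limit
        BEFORE UPDATE ON account
        FOR EACH ROW EXECUTE FUNCTION forbid_large_withdrawal();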

13. The way to prevent a patched-up database design is the “three less” principle

(1) The fewer the number of tables in a database, the better. Only when the number of tables is small can it be said that the system’s E–R diagram is lean and precise, that duplicate and redundant entities have been removed, that a highly abstract model of the objective world has been formed, and that the system’s data have been integrated, preventing a patched-together design;

(2) The fewer the fields that make up a table’s composite primary key, the better. The primary key serves two purposes: building the primary key index, and acting as the foreign key of child tables. A composite primary key with fewer fields therefore not only saves running time but also saves index storage space;

(3) The fewer the number of fields in a table, the better. Few fields show that the data in the system is not duplicated and that there is little data redundancy. More importantly, it pushes the designer to learn to “turn columns into rows”, which prevents fields of a child table from being pulled into the main table and leaving many empty fields there. “Turning columns into rows” means pulling part of the contents of the main table out into a separate child table. This method is very simple, but some people are not used to it and do not adopt or apply it.

The practical principle of database design is to find the right balance between data redundancy and processing speed. “Three less” is a holistic concept and an overall view; the individual principles cannot be applied in isolation. The principle is relative, not absolute; a “three more” principle would certainly be wrong. Imagine: for the same functional coverage, an E–R diagram with one hundred entities (one thousand attributes in total) is certainly much better than one with two hundred entities (two thousand attributes in total). The “three less” principle is advocated to teach the reader to use database design techniques to integrate the system’s data. The steps of data integration are: integrate file systems into application databases, integrate application databases into subject databases, and integrate subject databases into a global integrated database. The higher the degree of integration, the stronger the data sharing, the fewer the information islands, and the smaller the numbers of entities, primary keys, and attributes in the global E–R diagram of the whole enterprise information system.

The purpose of advocating the “three less” principle is to prevent readers from relying on patching techniques, constantly adding to, deleting from, and changing the database, turning the enterprise database into a “garbage dump” or “grab bag” of arbitrarily designed tables, until the basic tables, code tables, intermediate tables, and temporary tables pile up in countless disorder and the information systems of the enterprise or institution become unmaintainable and grind to a halt. Anyone can follow a “three more” approach; it is the distortion of database design by the “patching method”. The “three less” principle is a principle of “fewer but better”; it demands a high level of database design skill and artistry, which not everyone can achieve, because this principle is the theoretical basis for putting an end to designing databases by the “patching method”.

14. Improve the operational efficiency of the database

Under given system hardware and system software conditions, the ways to improve the operational efficiency of the database system are:

(1) In the physical database design, lower the normal form, increase redundancy, use fewer triggers, and use more stored procedures.

(2) When the calculations are very complex and the number of records is very large (for example, ten million), the complex calculations should first be done outside the database, in the file system, using a language such as C++, and only at the end should the results be loaded into the database and appended to the table. This is a lesson from the design of telecom billing systems.

(3) If a table is found to have too many records, for example more than ten million, the table should be partitioned horizontally. Horizontal partitioning divides the table’s records horizontally into two tables, using a certain value of the table’s primary key (PK) as the boundary. If a table is found to have too many fields, for example more than eighty, split the table vertically, breaking the original table into two (a sketch follows after this list).

(4) Optimize the database management system (DBMS) at the system level, i.e., tune the various system parameters, such as the number of buffers.

(5) Try to adopt optimization algorithms when using data-oriented SQL language for programming.
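A hypothetical sketch of the partitioning described in point (3); the table names and the boundary value are made up, and CREATE TABLE ... AS SELECT is used as a generic way to materialize each part.

    -- Horizontal partitioning: records are divided by a boundary value of the PK.
    CREATE TABLE call_record_old AS
        SELECT * FROM call_record WHERE record_id <  50000000;
    CREATE TABLE call_record_new AS
        SELECT * FROM call_record WHERE record_id >= 50000000;

    -- Vertical partitioning: rarely used columns move to a second table sharing the same PK.
    CREATE TABLE customer_core  (customer_id INTEGER PRIMARY KEY,
                                 name        VARCHAR(100) NOT NULL);
    CREATE TABLE customer_extra (customer_id INTEGER PRIMARY KEY
                                             REFERENCES customer_core(customer_id),
                                 remarks     VARCHAR(2000));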

In short, to improve the operational efficiency of the database, optimization must be carried out at the database system level, the database design level, and the program implementation level at the same time.

The above fourteen techniques have been summarized by many people through a great deal of database analysis and design practice. Readers should not apply these lessons mechanically or memorize them by rote, but digest and understand them, apply them pragmatically, and grasp them flexibly, gradually learning to apply them while developing and to develop them further while applying them.