What is a compiler used for?
A compiler is a computer program that converts source code written in one programming language into another.
Its main purpose is to translate a program written in a high-level language, which is easy for a human to write, read, and maintain, into a program (an executable) in a lower-level machine language that a computer can execute. The compiler takes the original program as input and produces an equivalent program in the target language. The source language is usually a high-level language such as Pascal, C, C++, C#, or Java, while the target language is assembly language or the object code of the target machine, sometimes called machine code.
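As a small illustration of "source program in, equivalent target program out", here is a minimal sketch using Python's built-in compile() function. Note the assumption: CPython's compiler targets bytecode for the Python virtual machine rather than machine code for real hardware, so this is only an analogy for the translation step described above.

```python
import dis

source = "result = 2 + 3 * 4\n"

# compile() translates the source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")

# Running the translated program gives the same result the source describes.
namespace = {}
exec(code, namespace)
print(namespace["result"])

# dis.dis() prints the generated instructions in a human-readable form.
dis.dis(code)
```

The exact instructions printed by dis.dis() differ between Python versions, which is why no listing is shown here.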
The main workflow of a modern compiler is as follows:
Source code → preprocessor → compiler → assembly program → assembler → object code → linker → executable file. The resulting executable file can then be loaded and run by the computer.
A program that translates an assembly language source program into a target program is called ( ). A. compiler B. interpreter C. editor D. assembler
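The correct answer is D, assembler. A compiler translates a high-level language, whereas the program that translates assembly language into a target program is an assembler. This is a basic question in compiler principles.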
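To make concrete what an assembler does, here is a minimal sketch of one. The instruction set (LOAD, ADD, STORE, HALT), its opcode values, and the one-byte operand encoding are all invented for illustration and do not correspond to any real machine.

```python
# Hypothetical 8-bit instruction set, invented for this example only.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(text):
    """Translate assembly source lines into a list of byte values."""
    program = []
    for line in text.splitlines():
        line = line.split(";")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        program.append(OPCODES[mnemonic])
        # Each operand is encoded as a single byte (an assumption of this sketch).
        program.extend(int(p, 0) & 0xFF for p in parts[1:])
    return program

machine_code = assemble("""
    LOAD 10     ; load the value at address 10
    ADD 11      ; add the value at address 11
    STORE 12    ; store the result at address 12
    HALT
""")
print(machine_code)  # [1, 10, 2, 11, 3, 12, 255]
```

Each mnemonic maps almost one-to-one onto a numeric opcode, which is the main difference from a compiler: a compiler has to analyze the structure and meaning of a high-level program before it can generate code.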
The basic function of a compiler is to translate a source program into a target program. A compilation system of practical value, however, should offer more than this basic function: syntax checking, debugging aids, means of modification, overlay processing, optimization of the target program, mixed-language use, and human-computer interaction.
① Syntax checking: check whether the source program conforms to the grammar of the language. If it does not, the compiler should report the location, nature, and details of each syntax error, and it should let the user find as many errors as possible in a single run on the machine.
② Debugging aids: check whether the source program behaves as its designer intended. To this end, the compiler places output instructions in the compiled target program so that, when the target program runs, it reports information about its dynamic execution, such as changes in the values of variables and the lines that were executed. This information helps the user verify that the source program expresses what the algorithm requires.
③ Means of modification: give users convenient ways to modify the source program. A compiler usually provides batch modification (for a large number of errors, or for errors that are hard to fix on the spot) and on-site modification (for a small number of errors that are easy to fix on the spot at run time).
④ Overlay processing: intended mainly for large programs with long code and large amounts of data.
The basic idea is to let certain program segments and data share the same storage areas, which hold only the code or data currently in use; the remaining code and data are kept on disk or other auxiliary storage and loaded dynamically when needed.
⑤ Target program optimization: improve the quality of the target program, that is, make it occupy less storage and run in less time. Depending on the optimization goal, the compiler may implement expression optimization, loop optimization, or global optimization of the whole program. Some optimizations are carried out at the source program level and some at the target program level.
⑥ Mixed-language use: lets users write applications in several programming languages or reuse existing modules written in different languages. The most common case is combining a high-level language with assembly language. This makes up for the difficulty high-level languages have in expressing certain non-numerical operations or in directly controlling and accessing peripheral devices and hardware registers, and it allows the core of a program to be written in assembly language to improve run-time efficiency.
⑦ Human-computer interaction: determines whether the well-designed functions of the compilation system can actually be realized in use. Its purpose is to let the user understand the internal workings of the system in good time during compilation and execution, so that the user can monitor and control its operation effectively.
In early compiler implementations, all of the above functions were built into the compiler itself.
The customary practice today, however, is to rely on a debugger, an editor, and a linker running under the operating system to support debugging, modification, overlay processing, and mixed-language use. When designing a compiler, careful thought must therefore be given to how it interfaces with these subsystems.
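The "expression optimization" mentioned under target program optimization can be illustrated with constant folding, one simple technique of that kind. The sketch below works at the source program level using Python's standard ast module; the names ConstantFolder and fold_constants are made up for this example, and only +, - and * on numeric constants are handled.

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Replace a binary operation on two numeric constants with its result."""

    OPS = {
        ast.Add: lambda a, b: a + b,
        ast.Sub: lambda a, b: a - b,
        ast.Mult: lambda a, b: a * b,
    }

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = self.OPS.get(type(node.op))
        if (fold is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node

def fold_constants(expr):
    """Parse an expression, fold constant subexpressions, return new source."""
    tree = ast.parse(expr, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or later

print(fold_constants("x * (2 + 3) + 4 * 5"))  # x * 5 + 20
```

The subexpressions 2 + 3 and 4 * 5 are computed once at compile time, so the optimized program does less work at run time while producing the same value for every x.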
Read more about compilation principles from the library!
Components of the compiler and the functions and roles of each part
1. Lexical Analysis
The lexical analyzer identifies the individual tokens in the source program according to lexical rules; each token represents a class of words (lexemes). Common tokens in source programs fall into several categories: keywords, identifiers, literals, and special symbols. The input to the lexical analyzer is the source program and the output is a stream of recognized tokens. In other words, its task is to convert the stream of characters in the source file into a stream of tokens: it looks at consecutive characters and recognizes them as “words”.
2. Syntax Analysis
The syntax analyzer (parser) identifies the structures (phrases, sentences) in the token stream according to the syntax rules and builds a syntax tree that correctly reflects those structures.
3. Semantic Analysis
The semantic analyzer performs static semantic checks, such as type checking and type conversion, on the syntactic units in the syntax tree according to the semantic rules. Its purpose is to ensure that syntactically correct constructs are also semantically legal.
4. Intermediate Code Generation
The intermediate code generator produces intermediate code from the output of the semantic analyzer. Intermediate code can take several forms, whose common feature is that they are machine-independent. The most commonly used form is three-address code, which is usually implemented as quadruples. The advantage of three-address code is that it is easy to read and easy to optimize.
5. Intermediate Code Optimization
Optimization is an important part of a compiler. Because the compiler translates the source program into intermediate code mechanically, following fixed patterns, the intermediate code it generates is often wasteful in time and space. Optimization is therefore necessary when efficient target code is required.
6. Target Code Generation
Target code generation is the final stage of the compiler. The following issues must be considered when generating target code: the architecture of the computer, its instruction set, register allocation, and memory organization. The target code generated by the compiler can take several forms: assembly language, relocatable binary code, or an in-memory form.
7. Symbol Table Management
The symbol table records the necessary information about the symbols in the source program and organizes it so that it can be looked up and updated quickly and accurately in every stage of the compiler. Some of the contents of the symbol table must even be retained until the program runs.
8. Error Handling
Source programs written by users often contain errors, which fall into two categories: static errors and dynamic errors. Dynamic errors, also known as dynamic semantic errors, are logical errors in the source program that only show up while the program is running, such as a variable used as a divisor taking the value zero, or a subscript going out of bounds when an array element is referenced. Static errors can be further divided into syntax errors and static semantic errors. Syntax errors concern the structure of the language, such as misspelled words, missing operands in expressions, or mismatched begin and end. Static semantic errors are errors of meaning that can be found while analyzing the source program, for example when one operand of an addition is the name of an integer variable and the other is the name of an array.
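The first stages described above (lexical analysis, syntax analysis, and intermediate code generation as quadruples) can be sketched for a tiny assignment language. This is only a minimal illustration under stated assumptions: the token categories, the grammar, the class name Translator, and the temporary names t1, t2, ... are all invented here, and the parser emits three-address code directly instead of first building an explicit syntax tree.

```python
import re

# Token categories of a hypothetical mini-language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Lexical analysis: character stream -> list of (kind, lexeme) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # whitespace produces no token
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

class Translator:
    """Recursive-descent parser emitting quadruples (op, arg1, arg2, result)."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0
        self.quads, self.temp_count = [], 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def binary(self, operand, ops):
        """Parse operand (op operand)*, emitting one quadruple per operator."""
        left = operand()
        while self.peek() in [("OP", o) for o in ops]:
            op = self.advance()[1]
            right = operand()
            self.temp_count += 1
            temp = f"t{self.temp_count}"   # fresh temporary for the result
            self.quads.append((op, left, right, temp))
            left = temp
        return left

    def assignment(self):                  # assignment -> IDENT '=' expr
        kind, name = self.advance()
        if kind != "IDENT" or self.advance() != ("OP", "="):
            raise SyntaxError("expected: identifier = expression")
        value = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        self.quads.append(("=", value, None, name))
        return self.quads

    def expr(self):                        # expr -> term (('+'|'-') term)*
        return self.binary(self.term, "+-")

    def term(self):                        # term -> factor (('*'|'/') factor)*
        return self.binary(self.factor, "*/")

    def factor(self):                      # factor -> NUMBER | IDENT | '(' expr ')'
        kind, text = self.advance()
        if kind in ("NUMBER", "IDENT"):
            return text
        if (kind, text) == ("OP", "("):
            value = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")

tokens = tokenize("x = a + b * 2")
quads = Translator(tokens).assignment()
for quad in quads:
    print(quad)
```

For the input x = a + b * 2 the sketch emits ('*', 'b', '2', 't1'), then ('+', 'a', 't1', 't2'), then ('=', 't2', None, 'x'). A missing operand, as in x = a +, raises SyntaxError, which corresponds to the syntax errors described under error handling.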