All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler
IntroductionHello it's a me again @drifter1! First of all, happy new year to all of you! I was very busy with exercises and studying, but here we are again!Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison.
In this article we will start talking about the Assignment checking problem. Function calls are being used inside of the assignment expressions, which means that we will have to perform the assignment check as a revisit check, similar to how we did function parameter checking, in the previous articles! This topic is quite large and so I will split it into some parts!
The topics that we will cover today are:
- Visualizing the Problem
Requirements:Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases, including function parameters
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for AST nodes and more
Difficulty:Talking about the series in general this series can be rated:
- Intermediate to Advanced
Actual Tutorial Content
Visualizing the ProblemAn assignment statement consists of an variable that we assign a value too, the '=' sign and the actual value that we assign. The variable is of a specific datatype and so is the expression/value that we assign. The datatypes need to be compatible with each other. Getting the datatype of each branch is quite easy in our compiler, as we defined structures and functions that give us that specific information. So, in general, such a statement can be visualized as following:
As you can see the datatype of the variable is stored inside of the symbol table entry, and more specifically in the "st_type" variable, but we will still use the more general "get_type()" function, to be more safe, as pointers and arrays store this information in the "inf_type" entry. In the same way, getting the datatype of the expression "assign_val" is also easy and done using a specified function. But, don't think that we are done, simply like that! The expression used in the right part of the assignment, might contain a function call, for which we might not know the return datatype yet!
The need of a "contains revisit" flag variableTo know if such a revisit case occurred inside of that expression, we will define a new flag variable "cont_revisit" that tells us if the expression "contains a revisit". This can be managed very easily during the call of the "expression_data_type()" function, as the function identifier will have "UNDEF" in both entries!
Whenever this value has a value of '1' we know that we have to do something with revisit queues. If the value is '0' we can already perform the assignment check, as no function call was inside of the assignment expression!
The revisit queue structureTo be able to perform the assignment check afterwards, during a revisit, we will have to store some information in the revisit queue entry! More specifically, we just have to store the "assign_val" expression node in a "nodes" array and the count of those (num_of_assigns). As a visualization this looks like this:
Main DiagramThe steps that we have to follow to solve this problem can be visualized by the following diagram:
You can see that the "new" case of revisiting is quite interesting! We should first check if an entry exists, to not create a new one with the same name, for the same variable, that just occurs in more assignment statements. Either way, if we create or don't create a new entry, we will end up managing the "nodes" array, to add the information of the newly found assignment, and also increment the count of them. In the end, we should not forget to reset the flag variable, so that it is '0' again for the next assignment! The actual assignment check will happen much later on, after the parsing is done, so that all the functions have been declared, to let us use the correct return datatype for the assignment check!
Example program assignmentsThinking about the "classic" example program "full_example.c" we know that we have to check the following assignments:
The compiler should add these 3 assignments, in two separate entries, depending on the variable. After the parsing is finished and the identifier of the function call stores the correct return datatype, we can perform the revisit, so that we can check if the variable and assign-expression are inter-compatible or not. Of course, this example program is build in such a way, that no errors exist...
References:No references, just using code that I implemented in my previous articles.
Images:All of the images are custom-made!
Previous parts of the series
General Knowledge and Lexical Analysis
- A simple C Language
- Lexical Analysis using Flex
- Symbol Table (basic structure)
- Using Symbol Table in the Lexer
Semantic Analysis (1)
Intermediate Code Generation (AST)
Semantic Analysis (2)
Final words | Next up on the projectAnd this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:
- Semantic analysis (using even more action rules in Bison)
- Machine Code generation (MIPS Assembly)
So, see ya next time!
Keep on drifting! ;)