Writing a simple Compiler on my own - Action Rules for If-Else Statements [C][Flex][Bison]
[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will write the needed Action Rules for If-Else Statements.More specifically, the topics that we will cover today are:
- The need of a Statements-node
- Action rules for "simple" if-else statements
- Action rules for if-else statements that contains else if's
- Running the compiler for "full_example.c"
Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases, including function parameters
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for other cases
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
The need of a Statements-Node
If-else statements, for statements, while statements, optional functions and even the "main" function, all mostly contain more than one statements in their "body" or "tail" as I call it sometimes in the grammar. So, there is the need of creating a new node that will store many "statement" nodes, to be able to access them later on. Knowing that statements are described recursively in our grammar, we can visualize it as follows:-||- means that the second "Statements" does the same as the first one.
So, let's create such a structure!
The Statements AST Node
This node will store an dynamic pointer array to AST_Node structures, where each of them will of course be a "statement". To know the amount of statements that are stored in this structure, we have to also define a variable to store the statement count. The whole structure will be of a new Node-type "STATEMENTS" and will have the name "AST_Node_Statements". So, let's define an array "statements" and the integer variable "statement_count" as entries of this structure.In code this looks as following:
typedef enum Node_Type {
...
STATEMENTS, // statements
...
}Node_Type;
...
typedef struct AST_Node_Statements{
enum Node_Type type; // node type
// statements
struct AST_Node **statements;
int statement_count;
}AST_Node_Statements;
...
Of course this code is inside of "ast.h".
Creation Function
The way with which the rule "statements" is build up makes it easy for us to use the creation function to add new statements to the "statements"-parent rule. So, the creation function, let's say "new_statements_node", will have 3 parameters which are:- The statements array of the previous "statements"-rule
- The statement_count value of the previous "statements"-rule
- The "statement" to be added
statements -> statements statement | statement ;
we understand that the first two parameters are useless for the second sub-rule of "statements". For this rule we will just pass NULL and 0 as parameters!
The function declaration in "ast.h" and actual code of "ast.c" are:
ast.h:
AST_Node *new_statements_node(AST_Node **statements,
int statement_count, AST_Node *statement);
-----------------------------------------------------------------
ast.c:
AST_Node *new_statements_node(AST_Node **statements,
int statement_count, AST_Node *statement){
// allocate memory
AST_Node_Statements *v = malloc (sizeof (AST_Node_Statements));
// set node type
v->type = STATEMENTS;
// first statement
if(statements == NULL){
statements = (AST_Node**) malloc (sizeof (AST_Node*));
statements[0] = statement;
statement_count = 1;
}
// add new statement
else{
statements = (AST_Node**) realloc (statements,
(statement_count + 1) * sizeof (AST_Node*));
statements[statement_count] = statement;
statement_count++;
}
// set entries
v->statements = statements;
v->statement_count = statement_count;
// return type-casted result
return (struct AST_Node *) v;
}
You can see that there are two cases, which correspond to the second and first sub-rule of "statements". Of course I also did some changes in the traversal and print functions of the nodes. To see these changes go to GitHub, where I always upload the code of the series.
Integrate with the "statements"-rule
In the second sub-rule we just create a new node with parameter NULL, 0 and the already created node stored in "statement". Knowing that all nodes are type-casted to "AST_Node", the second sub-rule will need to type-cast the already created "statements" node of the "statements"-part and give the statements-array and statement_count of this node as a parameter of the creation function, together with the "statement" node.So, the new rule now becomes:
statements:
statements statement
{
AST_Node_Statements *temp = (AST_Node_Statements*) $1;
$$ = new_statements_node(temp->statements, temp->statement_count, $2);
}
| statement
{
$$ = new_statements_node(NULL, 0, $1);
}
;
Also don't forget to include the token definition of "statements":
%type <node> statements
The same stuff will have to be done for "declarations" as well, but I will leave it for when we get to function declarations!
Action Rules for "Simple" If-Else Statements
The if statement is a somewhat complicated statement as it can include a dynamic amount of else if's and an optional else part. So, we will have to use concepts we already used in previous topics like "declarations", to take all the needed information/attributes and pass them over to the actual If-node! To solve a conflict with the else ifs and else, I had to split the "if_statement" rule into two parts: one with else if's and one without. So, the whole problem can be visualized as following:Let's first get into the "simple" case that does not require the complicated passing of Else-If nodes. Visualizing this simple case we see that:
(as you can see I put lots of visualizations today!)
So, what do we understand? Well, the condition is pretty simple as we already have the information stored in the expression. The actual if_branch is represented by the "tail"-rule, which will have to store the "statements"-node that it get's from the "statements"-rule inside of the bracelets. This tells us that the "tail" also has to be of type "node" and so we add it alongside "statements". In the same way the if_statement, optional_else and else_if rule are also of type node. And so we can already write the following non-terminal type-definitions:
%type <node> statements tail
%type <node> if_statement else_if optional_else
The "tail"-rule becomes:
tail: LBRACE statements RBRACE
{
$$ = $2; /* just pass information */
ast_traversal($2);
}
;
So, what remains now is the optional else. For the optional else we will just return the "statements"-node stored in the "tail"-part (else exists) or NULL (no else). The rule becomes:
optional_else:
ELSE tail
{
/* else exists */
$$ = $2;
}
| /* empty */
{
/* no else */
$$ = NULL;
}
;
Knowing what we store in each of these non-terminals, the action rule for this simple if statement looks as following:
if_statement:
IF LPAREN expression RPAREN tail else_if optional_else
{
/* later */
}
| IF LPAREN expression RPAREN tail optional_else
{
$$ = new_ast_if_node($3, $5, NULL, 0, $6);
}
;
You can see that we pass NULL and 0 for the else ifs. For the rest we just find the correct number to put after '$'. All the parameters have the correct type already and so there is no need of type-casting or creating temporary nodes..
Action Rules for If-Else Statements that contains Else If's
So, now for the more complicated case! We will be using the same idea that we used for declarations and initialization, where we had global variables to store the newly found names and values for the array initialization. In this case we will have a AST_Node pointer array to store the else-if branches (elsifs) and a counter variable for the number of else ifs (elseif_count). To add a new entry we will define a new function "add_elseif" that takes the newly found Else-if node as a parameter and adds it to the global array. We of course have two cases for that:- First entry of the array, where we will have to allocate the array for the first time
- General case, where we allocate memory for the new entry and add the entry to the end of the array
%{
...
// for else ifs
void add_elseif(AST_Node *elsif);
AST_Node **elsifs;
int elseif_count = 0;
%}
...
void add_elseif(AST_Node *elsif){
// first entry
if(elseif_count == 0){
elseif_count = 1;
elsifs = (AST_Node **) malloc(1 * sizeof(AST_Node));
elsifs[0] = elsif;
}
// general case
else{
elseif_count++;
elsifs = (AST_Node **) realloc(elsifs, elseif_count * sizeof(AST_Node));
elsifs[elseif_count - 1] = elsif;
}
}
...
All that we said until now, can be visualized as following
Using the function "add_elseif", the "else_if"-rule is pretty easy to write now! Both sub-rules do the same thing, but will have a different numeration for the non-terminals that are passed as parameters. The "else_if"-rule looks as following:
else_if:
else_if ELSE IF LPAREN expression RPAREN tail
{
AST_Node *temp = new_ast_elsif_node($5, $7);
add_elseif(temp);
}
| ELSE IF LPAREN expression RPAREN tail
{
AST_Node *temp = new_ast_elsif_node($4, $6);
add_elseif(temp);
}
;
Thinking about the "if_statement"-rule now, the only thing that changes is that we will use these global variables as parameters, by also re-initializing them back to 0 and NULL. So, with all that in mind the final "if_statement" rule becomes:
if_statement:
IF LPAREN expression RPAREN tail else_if optional_else
{
$$ = new_ast_if_node($3, $5, elsifs, elseif_count, $7);
elseif_count = 0;
elsifs = NULL;
}
| IF LPAREN expression RPAREN tail optional_else
{
$$ = new_ast_if_node($3, $5, NULL, 0, $6);
}
;
And this is it actually! Wasn't that bad right?
Running the Compiler for "full_example.c"
Before running the compiler I would like to point out that I removed many "ast_traversal" functions from the action rules, as they would just produce duplicate messages. We will only put "ast_traversal" functions for the declarations and statements rules now. Later on there will be only one traversal of the whole AST, that will be found in the "program"-rule. Of course this will be used for the machine code generation :)So, let's run the compiler:
You can see that we get new messages in the console! I also included some explanations for what got printed out. The main outer statements of the "full_example.c" file are of course only 4! The other statement node messages correspond to the for and while statements and more specifically to the "tail" of them. In the beginning we can also see the statements node of the last if-node that appears in the for statement. Nice huh? Next time we will get into Loops ;)
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
- Introduction -> What is a compiler, what you have to know and what you will learn
- A simple C Language -> Simplified C, comparison with C, tokens, basic structure
- Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
- Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
- Using Symbol Table in the Lexer -> Flex and Symbol Table combination
- Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
- Bison basics -> Bison tutorial actually
- Creating a grammar for our Language -> Grammar and first Parser
- Combine Flex and Bison -> lexer and parser combined
- Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
- Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
- Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
- Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
- Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
- Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
- Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
- Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
- Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
- Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
- Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
- Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
- Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
- Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
- Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
Final words | Next up on the project
And this is actually it for today's post! I hope that I explained everything well.Next up on this series are:
- Semantic analysis (using more action rules in Bison)
- Intermediate Code generation (using the AST structure for more cases)
- Machine Code generation (MIPS Assembly)
So, see ya next time!
GitHub Account:
https://github.com/drifter1Keep on drifting! ;)
Thank you for your contribution @drifter1.
Very good tutorial, thank you for your good work in developing these tutorials. Keep up the good work.
We look forward to seeing more tutorials of this quality.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]
Thank you for your review, @portugalcoin! Keep up the good work!
Hey, @drifter1!
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Congratulations @drifter1! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :
Click here to view your Board of Honor
If you no longer want to receive notifications, reply to this comment with the word
STOP
To support your work, I also upvoted your post!
Hi @drifter1!
Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.506 which ranks you at #6317 across all Steem accounts.
Your rank has dropped 11 places in the last three days (old rank 6306).
In our last Algorithmic Curation Round, consisting of 290 contributions, your post is ranked at #253.
Evaluation of your UA score:
Feel free to join our @steem-ua Discord server