Writing a simple Compiler on my own - Action Rules for Function Declarations (part 1) [C][Flex][Bison]
[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will be writing the needed Action Rules for Function Declarations. This topic came out as an enormous article and so I will split it into 2 parts. Today's article will mostly be about the preparation of the AST structures, whilst the next article (tomorrow) will contain most of the actual use of today's code in the parser. The next article will also contain a compiler validation using examples. :)More specifically, the topics that we will cover today are:
- Small Tweaks and Fixes
- Visualizing the Problem
- The new AST nodes and creation functions
- Action Rules for Declarations
Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases, including function parameters
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for other cases
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
Small Tweaks and Fixes
There are some things that I miss-typed or forgot to change after copy-pasting, and also some other small things that I changed while adding code related to function declarations.A small change in the "symtab.c" function insert
A symbol table item defined in "symtab.h" stores the length of the stored entry in "st_size". But, we never really set the value of that entry inside of the function "insert". This might not be that important, but still worth noting as I did change it to be able to access the length of "st_name" without the use of the "strings.h" function "strlen()", which makes code smaller in some cases...So, a snipper of the code looks such as:
...
l = (list_t*) malloc(sizeof(list_t));
strncpy(l->st_name, name, len);
l->st_size = len;
...
Wrong type in creation function of Return_Node
While running the compiler I had a Segmentation Fault Error! I kept searching and searching for what caused the error. It really drove me insane! BUT, after putting lots and lots of console messages in each point of the new code (always do that!!) I found out that the problem must be caused by the return node somehow. Well, after taking a look at the creation function I saw that we used "FUNC_DECL" as the type of this node. This had to be some copy-paste error, as I kept copy-pasting to write all the functions faster. Well, either way the correct type that I've set now is "RETURN_NODE".Changes in printing and traversal
While writing the code I also made some changes in the messages that get printed out using the functions "ast_print_node" and "ast_traversal". I'm not talking about the new nodes that we will talk about later on, but about changes in the previous nodes :)I don't know every single change that I made, so just go to GitHub to see the changes for yourself in "ast.c". By the way, the code that I've uploaded on GitHub is about both parts. The changes in "ast.c" will be shown today tho..
After covering these small tweaks I made, let's now get into today's problem.
Visualizing the Problem
Our program is build up of declarations and statements of a main function (that we somewhat took care of already), but can also contain optional functions, with their own declarations and statements. A simple visualization of that looks as following:So, the topic of this article are these optional functions. But, before getting into them, we should first talk about the declarations in general, which of course are also used in functions. In previous articles we already defined a node that stores multiple statements, but we never really did that for declarations! Similar to statements, we have to define a new node that will store multiple declaration nodes that occur in a specific part of the program, which can be the main or any optional function. Knowing that the rule "declarations" is left-recursive we will use a similar concept to statements, where we take the information stored in the previous "declarations"-rule and add the newly found "declaration" to it. When there's no other declaration yet, we will just add this new declaration. A visualization of that is:
In the same way the function declarations can also be of any count, including zero. So, we will have to define a AST node that will store multiple function declarations as they occur. Here's a visualization of that:
So, how exactly is a function declaration build up? Well, the rule "function" is split up into two parts: "function_head" and "function_tail". The first part contains the return type, identifier (function name) and parameters, whilst the second part contains the "actual" code of the function, which includes optional declarations, optional statements and an optional "return_node". But, how can we access the information about the newly "forming" function declaration node? Similar to other times we will have to define a global variable that stores a temporary function declaration node, so that we can set the entries of that function, while they occur. A visualization of that is:
Taking a look at this visualization we can see that we will have to define nodes for the return type and parameters. The rest has already been done in previous articles. The return type will store the datatype and if we have a pointer or not. The parameters node will store Param structures and the count of them (num_of_pars), which means that a parameter can be easily defined by an entry "Param par" inside of the YYSTYPE union. We also created the function "def_param", which creates a parameter, taking a type and name (and also if it's being referenced or not, which will be used in function call parameters). Visualizing parameters we can can see that:
So, after visualizing all these structures let's now write them down!
The new AST nodes and creation functions
Adding these new nodes we now end up with these somewhat final Node_Type snippet:typedef enum Node_Type {
BASIC_NODE, // no special usage (for roots only)
// declarations
DECLARATIONS, // declarations
DECL_NODE, // declaration
...
// functions
FUNC_DECLS, // function declarations
FUNC_DECL, // function declaration
RET_TYPE, // function return type
DECL_PARAMS, // function parameters
RETURN_NODE, // return statement of functions
}Node_Type;
Let's get into the new nodes and their functions!
AST Node of Declarations
A "declarations"-node needs to have a dynamic array of "declaration"-node pointers (type-casted to AST_Node of course) and also the count of these declarations (declaration_count). So, this structure is defined as following:typedef struct AST_Node_Declarations{
enum Node_Type type; // node type
// declarations
struct AST_Node **declarations;
int declaration_count;
}AST_Node_Declarations;
The creation function of this type will take the previous declarations and declaration_count of another "declarations"-node (or NULL and 0 for a single "declaration") that stores the declarations that occurred up to this points and the newly occurred "declaration" that has to be inserted. So, the two files of the AST structure now contain:
ast.h:
...
AST_Node *new_declarations_node(AST_Node **declarations, int declaration_count, AST_Node *declaration);
...
--------------------------------------------------------------------------------------------------------
ast.c:
...
AST_Node *new_declarations_node(AST_Node **declarations, int declaration_count, AST_Node *declaration){
// allocate memory
AST_Node_Declarations *v = malloc (sizeof (AST_Node_Declarations));
// set node type
v->type = DECLARATIONS;
// first declaration
if(declarations == NULL){
declarations = (AST_Node**) malloc (sizeof (AST_Node*));
declarations[0] = declaration;
declaration_count = 1;
}
// add new declaration
else{
declarations = (AST_Node**) realloc (declarations, (declaration_count + 1) * sizeof (AST_Node*));
declarations[declaration_count] = declaration;
declaration_count++;
}
// set entries
v->declarations = declarations;
v->declaration_count = declaration_count;
// return type-casted result
return (struct AST_Node *) v;
}
...
AST Node of Function Declarations
Similarly, the structure that stores multiple function declarations looks as following:typedef struct AST_Node_Func_Declarations{
enum Node_Type type; // node type
// declarations
struct AST_Node **func_declarations;
int func_declaration_count;
}AST_Node_Func_Declarations;
The code with which we create such a node is also similar! Just add the newly found function declaration to the previously found function declarations of the "function declarations" node, by also incrementing the count. So, the code looks as following:
ast.h:
...
AST_Node *new_func_declarations_node(AST_Node **func_declarations, int func_declaration_count, AST_Node *func_declaration);
...
----------------------------------------------------------------------------------------------------------------------------
ast.c:
...
AST_Node *new_func_declarations_node(AST_Node **func_declarations, int func_declaration_count, AST_Node *func_declaration){
// allocate memory
AST_Node_Func_Declarations *v = malloc (sizeof (AST_Node_Func_Declarations));
// set node type
v->type = FUNC_DECLS;
// first declaration
if(func_declarations == NULL){
func_declarations = (AST_Node**) malloc (sizeof (AST_Node*));
func_declarations[0] = func_declaration;
func_declaration_count = 1;
}
// add new declaration
else{
func_declarations = (AST_Node**) realloc (func_declarations, (func_declaration_count + 1) * sizeof (AST_Node*));
func_declarations[func_declaration_count] = func_declaration;
func_declaration_count++;
}
// set entries
v->func_declarations = func_declarations;
v->func_declaration_count = func_declaration_count;
// return type-casted result
return (struct AST_Node *) v;
}
...
AST Node of a Function Declaration
The already defined function declaration node needs to be extended to be able to store if the return type is a pointer or not, and should also be able to include the declarations, statements and return_node AST nodes of the function tail. So, in the end the structure looks as following:typedef struct AST_Node_Func_Decl{
enum Node_Type type; // node type
// return type
int ret_type;
// is pointer or not
int pointer; // 0: not pointer, 1: pointer
// symbol table entry
list_t *entry;
// declarations, statements and return
struct AST_Node *declarations;
struct AST_Node *statements;
struct AST_Node *return_node;
}AST_Node_Func_Decl;
The creation function will only contain the first 3 entries, as the others will be managed later on! There's no need of any complicated behavior. We just the entries. So, the node creation code of this is:
ast.h:
...
AST_Node *new_ast_func_decl_node(int ret_type, int pointer, list_t *entry);
...
----------------------------------------------------------------------------
ast.c:
...
AST_Node *new_ast_func_decl_node(int ret_type, int pointer, list_t *entry){
// allocate memory
AST_Node_Func_Decl *v = malloc (sizeof (AST_Node_Func_Decl));
// set entries
v->type = FUNC_DECL;
v->ret_type = ret_type;
v->pointer = pointer;
v->entry = entry;
// return type-casted result
return (struct AST_Node *) v;
}
...
AST Node of the Return Type
As already told before a return type node has to store the actual return data type and if it's a pointer or not. So, the structure becomes:typedef struct AST_Node_Ret_Type{
enum Node_Type type; // node type
// return type
int ret_type;
// is pointer or not
int pointer; // 0: not pointer, 1: pointer
}AST_Node_Ret_Type;
The creation function is also fairly simple and looks as following:
ast.h:
...
AST_Node *new_ast_ret_type_node(int ret_type, int pointer);
...
----------------------------------------------------------------
ast.c:
...
AST_Node *new_ast_ret_type_node(int ret_type, int pointer){
// allocate memory
AST_Node_Ret_Type *v = malloc (sizeof (AST_Node_Ret_Type));
// set entries
v->type = RET_TYPE;
v->ret_type = ret_type;
v->pointer = pointer;
// return type-casted result
return (struct AST_Node *) v;
}
...
AST Node for Function Parameter Declarations
Function parameter declarations are again using the same concept as before with declarations. So, a function declaration parameters node has to store a dynamic array of parameters, by also storing the count of elements in it (num_of_pars). The structure is:typedef struct AST_Node_Decl_Params{
enum Node_Type type; // node type
// parameters
Param *parameters;
int num_of_pars;
}AST_Node_Decl_Params;
The code with which we create such a node is also similar to declarations and function declarations, taking the parameters array and parameter count from previous function declarations parameters rules and just adding the newly found parameter to this array. So, the code looks as following:
ast.h:
...
AST_Node *new_ast_decl_params_node(Param *parameters, int num_of_pars, Param param);
...
-----------------------------------------------------------------------------------
ast.c:
...
AST_Node *new_ast_decl_params_node(Param *parameters, int num_of_pars, Param param){
// allocate memory
AST_Node_Decl_Params *v = malloc (sizeof (AST_Node_Decl_Params));
// set node type
v->type = DECL_PARAMS;
// first declaration
if(parameters == NULL){
parameters = (Param*) malloc (sizeof (Param));
parameters[0] = param;
num_of_pars = 1;
}
// add new declaration
else{
parameters = (Param*) realloc (parameters, (num_of_pars + 1) * sizeof (Param));
parameters[num_of_pars] = param;
num_of_pars++;
}
// set entries
v->parameters = parameters;
v->num_of_pars = num_of_pars;
// return type-casted result
return (struct AST_Node *) v;
}
...
Non-terminal type definitions
Adding all these new node types and functions, and knowing which non-terminals will be used in our rules later on, we can also define the following new non-terminal types:%type <node> functions_optional functions function
%type <node> parameters_optional parameters
%type <par> parameter
%type <node> return_type
Action Rules for Declarations
So, before finishing of today's part, let's also get into the action rules for the declarations. By already creating the declaration in previous articles and talking about the creation function today, we can easily use this function as following:declarations:
declarations declaration
{
AST_Node_Declarations *temp = (AST_Node_Declarations*) $1;
$$ = new_declarations_node(temp->declarations, temp->declaration_count, $2);
}
| declaration
{
$$ = new_declarations_node(NULL, 0, $1);
}
;
Pretty simple right? We are just type-casting back to the correct type to be able to access the array and counter, so that we can pass them as parameters of the function, along the declaration that's "to be added". In the second sub-rule we put NULL and 0, as there's no other declaration yet!
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
- Introduction -> What is a compiler, what you have to know and what you will learn
- A simple C Language -> Simplified C, comparison with C, tokens, basic structure
- Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
- Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
- Using Symbol Table in the Lexer -> Flex and Symbol Table combination
- Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
- Bison basics -> Bison tutorial actually
- Creating a grammar for our Language -> Grammar and first Parser
- Combine Flex and Bison -> lexer and parser combined
- Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
- Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
- Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
- Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
- Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
- Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
- Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
- Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
- Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
- Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
- Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
- Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
- Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
- Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
- Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
- Action Rules for If-Else Statements -> Statements-node, Action rules for “simple” and “complicated” if-else statements, running the compiler
- Action Rules for Loop Statements and some Fixes -> Some fixes, Action rules of while statements, Action rules of for statements, running the compiler
Final words | Next up on the project
And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.Next up on this series are:
- Semantic analysis (using more action rules in Bison)
- Intermediate Code generation (using the AST structure for more cases)
- Machine Code generation (MIPS Assembly)
So, see ya next time!
GitHub Account:
https://github.com/drifter1Keep on drifting! ;)
Thank you for your contribution @drifter1.
Good tutorial and well explained as always. Your tutorials are very professional. Keep up the good work. We are waiting for more tutorials.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]
Thank you for your review, @portugalcoin! Keep up the good work!
Hey, @drifter1!
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Hi @drifter1!
Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.505 which ranks you at #6323 across all Steem accounts.
Your rank has not changed in the last three days.
In our last Algorithmic Curation Round, consisting of 325 contributions, your post is ranked at #174.
Evaluation of your UA score:
Feel free to join our @steem-ua Discord server