Writing a simple Compiler on my own - Action Rules for Function Declarations (part 1) [C][Flex][Bison]

drifter1 (67)in #utopian-io • 7 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will be writing the needed Action Rules for Function Declarations. This topic came out as an enormous article and so I will split it into 2 parts. Today's article will mostly be about the preparation of the AST structures, whilst the next article (tomorrow) will contain most of the actual use of today's code in the parser. The next article will also contain a compiler validation using examples. :)

More specifically, the topics that we will cover today are:

Small Tweaks and Fixes
Visualizing the Problem
The new AST nodes and creation functions
Action Rules for Declarations

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases, including function parameters
How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
Intermediate Code Representations, including AST's
How we implement an AST (structure and management)
Action Rules for other cases

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Small Tweaks and Fixes

There are some things that I miss-typed or forgot to change after copy-pasting, and also some other small things that I changed while adding code related to function declarations.

A small change in the "symtab.c" function insert

A symbol table item defined in "symtab.h" stores the length of the stored entry in "st_size". But, we never really set the value of that entry inside of the function "insert". This might not be that important, but still worth noting as I did change it to be able to access the length of "st_name" without the use of the "strings.h" function "strlen()", which makes code smaller in some cases...

So, a snipper of the code looks such as:

...

l = (list_t*) malloc(sizeof(list_t));
strncpy(l->st_name, name, len);
l->st_size = len;

...

Wrong type in creation function of Return_Node

While running the compiler I had a Segmentation Fault Error! I kept searching and searching for what caused the error. It really drove me insane! BUT, after putting lots and lots of console messages in each point of the new code (always do that!!) I found out that the problem must be caused by the return node somehow. Well, after taking a look at the creation function I saw that we used "FUNC_DECL" as the type of this node. This had to be some copy-paste error, as I kept copy-pasting to write all the functions faster. Well, either way the correct type that I've set now is "RETURN_NODE".

Changes in printing and traversal

While writing the code I also made some changes in the messages that get printed out using the functions "ast_print_node" and "ast_traversal". I'm not talking about the new nodes that we will talk about later on, but about changes in the previous nodes :)

I don't know every single change that I made, so just go to GitHub to see the changes for yourself in "ast.c". By the way, the code that I've uploaded on GitHub is about both parts. The changes in "ast.c" will be shown today tho..

After covering these small tweaks I made, let's now get into today's problem.

Visualizing the Problem

Our program is build up of declarations and statements of a main function (that we somewhat took care of already), but can also contain optional functions, with their own declarations and statements. A simple visualization of that looks as following:

So, the topic of this article are these optional functions. But, before getting into them, we should first talk about the declarations in general, which of course are also used in functions. In previous articles we already defined a node that stores multiple statements, but we never really did that for declarations! Similar to statements, we have to define a new node that will store multiple declaration nodes that occur in a specific part of the program, which can be the main or any optional function. Knowing that the rule "declarations" is left-recursive we will use a similar concept to statements, where we take the information stored in the previous "declarations"-rule and add the newly found "declaration" to it. When there's no other declaration yet, we will just add this new declaration. A visualization of that is:

In the same way the function declarations can also be of any count, including zero. So, we will have to define a AST node that will store multiple function declarations as they occur. Here's a visualization of that:

So, how exactly is a function declaration build up? Well, the rule "function" is split up into two parts: "function_head" and "function_tail". The first part contains the return type, identifier (function name) and parameters, whilst the second part contains the "actual" code of the function, which includes optional declarations, optional statements and an optional "return_node". But, how can we access the information about the newly "forming" function declaration node? Similar to other times we will have to define a global variable that stores a temporary function declaration node, so that we can set the entries of that function, while they occur. A visualization of that is:

Taking a look at this visualization we can see that we will have to define nodes for the return type and parameters. The rest has already been done in previous articles. The return type will store the datatype and if we have a pointer or not. The parameters node will store Param structures and the count of them (num_of_pars), which means that a parameter can be easily defined by an entry "Param par" inside of the YYSTYPE union. We also created the function "def_param", which creates a parameter, taking a type and name (and also if it's being referenced or not, which will be used in function call parameters). Visualizing parameters we can can see that:

So, after visualizing all these structures let's now write them down!

The new AST nodes and creation functions

Adding these new nodes we now end up with these somewhat final Node_Type snippet:

typedef enum Node_Type {
    BASIC_NODE,   // no special usage (for roots only)
    // declarations
    DECLARATIONS, // declarations
    DECL_NODE,    // declaration

    ...

    // functions
    FUNC_DECLS,   // function declarations
    FUNC_DECL,    // function declaration
    RET_TYPE,     // function return type
    DECL_PARAMS,  // function parameters
    RETURN_NODE,  // return statement of functions
}Node_Type;

Let's get into the new nodes and their functions!

AST Node of Declarations

A "declarations"-node needs to have a dynamic array of "declaration"-node pointers (type-casted to AST_Node of course) and also the count of these declarations (declaration_count). So, this structure is defined as following:

typedef struct AST_Node_Declarations{
    enum Node_Type type; // node type

    // declarations
    struct AST_Node **declarations;
    int declaration_count;
}AST_Node_Declarations;

The creation function of this type will take the previous declarations and declaration_count of another "declarations"-node (or NULL and 0 for a single "declaration") that stores the declarations that occurred up to this points and the newly occurred "declaration" that has to be inserted. So, the two files of the AST structure now contain:

ast.h:

...

AST_Node *new_declarations_node(AST_Node **declarations, int declaration_count, AST_Node *declaration);

...

--------------------------------------------------------------------------------------------------------

ast.c:

...

AST_Node *new_declarations_node(AST_Node **declarations, int declaration_count, AST_Node *declaration){
	// allocate memory
	AST_Node_Declarations *v = malloc (sizeof (AST_Node_Declarations));

	// set node type
	v->type = DECLARATIONS;

	// first declaration
	if(declarations == NULL){
		declarations = (AST_Node**) malloc (sizeof (AST_Node*));
		declarations[0] = declaration;
		declaration_count = 1;
	}
	// add new declaration
	else{
		declarations = (AST_Node**) realloc (declarations, (declaration_count + 1) * sizeof (AST_Node*));
		declarations[declaration_count] = declaration;
		declaration_count++;
	}

	// set entries
	v->declarations = declarations;
	v->declaration_count = declaration_count;

	// return type-casted result
	return (struct AST_Node *) v;
}

...

AST Node of Function Declarations

Similarly, the structure that stores multiple function declarations looks as following:

typedef struct AST_Node_Func_Declarations{
    enum Node_Type type; // node type

    // declarations
    struct AST_Node **func_declarations;
    int func_declaration_count;
}AST_Node_Func_Declarations;

The code with which we create such a node is also similar! Just add the newly found function declaration to the previously found function declarations of the "function declarations" node, by also incrementing the count. So, the code looks as following:

ast.h:

...

AST_Node *new_func_declarations_node(AST_Node **func_declarations, int func_declaration_count, AST_Node *func_declaration);

...

----------------------------------------------------------------------------------------------------------------------------

ast.c:

...

AST_Node *new_func_declarations_node(AST_Node **func_declarations, int func_declaration_count, AST_Node *func_declaration){
	// allocate memory
	AST_Node_Func_Declarations *v = malloc (sizeof (AST_Node_Func_Declarations));

	// set node type
	v->type = FUNC_DECLS;

	// first declaration
	if(func_declarations == NULL){
		func_declarations = (AST_Node**) malloc (sizeof (AST_Node*));
		func_declarations[0] = func_declaration;
		func_declaration_count = 1;
	}
	// add new declaration
	else{
		func_declarations = (AST_Node**) realloc (func_declarations, (func_declaration_count + 1) * sizeof (AST_Node*));
		func_declarations[func_declaration_count] = func_declaration;
		func_declaration_count++;
	}

	// set entries
	v->func_declarations = func_declarations;
	v->func_declaration_count = func_declaration_count;

	// return type-casted result
	return (struct AST_Node *) v;
}

...

AST Node of a Function Declaration

The already defined function declaration node needs to be extended to be able to store if the return type is a pointer or not, and should also be able to include the declarations, statements and return_node AST nodes of the function tail. So, in the end the structure looks as following:

typedef struct AST_Node_Func_Decl{
	enum Node_Type type; // node type

	// return type
	int ret_type;

	// is pointer or not
	int pointer; // 0: not pointer, 1: pointer

	// symbol table entry
	list_t *entry;

	// declarations, statements and return
	struct AST_Node *declarations;
	struct AST_Node *statements;
	struct AST_Node *return_node;
}AST_Node_Func_Decl;

The creation function will only contain the first 3 entries, as the others will be managed later on! There's no need of any complicated behavior. We just the entries. So, the node creation code of this is:

ast.h:

...

AST_Node *new_ast_func_decl_node(int ret_type, int pointer, list_t *entry);

...

----------------------------------------------------------------------------

ast.c:

...

AST_Node *new_ast_func_decl_node(int ret_type, int pointer, list_t *entry){
	// allocate memory
	AST_Node_Func_Decl *v = malloc (sizeof (AST_Node_Func_Decl));

	// set entries
	v->type = FUNC_DECL;
	v->ret_type = ret_type;
	v->pointer = pointer;
	v->entry = entry;

	// return type-casted result
	return (struct AST_Node *) v;
}

...

AST Node of the Return Type

As already told before a return type node has to store the actual return data type and if it's a pointer or not. So, the structure becomes:

typedef struct AST_Node_Ret_Type{
	enum Node_Type type; // node type

	// return type
	int ret_type;

	// is pointer or not
	int pointer; // 0: not pointer, 1: pointer
}AST_Node_Ret_Type;

The creation function is also fairly simple and looks as following:

ast.h:

...

AST_Node *new_ast_ret_type_node(int ret_type, int pointer); 

...

----------------------------------------------------------------

ast.c:

...

AST_Node *new_ast_ret_type_node(int ret_type, int pointer){
	// allocate memory
	AST_Node_Ret_Type *v = malloc (sizeof (AST_Node_Ret_Type));

	// set entries
	v->type = RET_TYPE;
	v->ret_type = ret_type;
	v->pointer = pointer;

	// return type-casted result
	return (struct AST_Node *) v;
}

...

AST Node for Function Parameter Declarations

Function parameter declarations are again using the same concept as before with declarations. So, a function declaration parameters node has to store a dynamic array of parameters, by also storing the count of elements in it (num_of_pars). The structure is:

typedef struct AST_Node_Decl_Params{
	enum Node_Type type; // node type

	// parameters
	Param *parameters;
	int num_of_pars;
}AST_Node_Decl_Params;

The code with which we create such a node is also similar to declarations and function declarations, taking the parameters array and parameter count from previous function declarations parameters rules and just adding the newly found parameter to this array. So, the code looks as following:

ast.h:

...

AST_Node *new_ast_decl_params_node(Param *parameters, int num_of_pars, Param param);

...

-----------------------------------------------------------------------------------

ast.c:

...

AST_Node *new_ast_decl_params_node(Param *parameters, int num_of_pars, Param param){
	// allocate memory
	AST_Node_Decl_Params *v = malloc (sizeof (AST_Node_Decl_Params));

	// set node type
	v->type = DECL_PARAMS;

	// first declaration
	if(parameters == NULL){
		parameters = (Param*) malloc (sizeof (Param));
		parameters[0] = param;
		num_of_pars = 1;
	}
	// add new declaration
	else{
		parameters = (Param*) realloc (parameters, (num_of_pars + 1) * sizeof (Param));
		parameters[num_of_pars] = param;
		num_of_pars++;
	}

	// set entries
	v->parameters = parameters;
	v->num_of_pars = num_of_pars;

	// return type-casted result
	return (struct AST_Node *) v;
}

...

Non-terminal type definitions

Adding all these new node types and functions, and knowing which non-terminals will be used in our rules later on, we can also define the following new non-terminal types:

%type <node> functions_optional functions function
%type <node> parameters_optional parameters
%type <par>  parameter
%type <node> return_type

Action Rules for Declarations

So, before finishing of today's part, let's also get into the action rules for the declarations. By already creating the declaration in previous articles and talking about the creation function today, we can easily use this function as following:

declarations: 
  declarations declaration
  {
    AST_Node_Declarations *temp = (AST_Node_Declarations*) $1;
    $$ = new_declarations_node(temp->declarations, temp->declaration_count, $2);
  }
  | declaration
  {
    $$ = new_declarations_node(NULL, 0, $1);
  }
;

Pretty simple right? We are just type-casting back to the correct type to be able to access the array and counter, so that we can pass them as parameters of the function, along the declaration that's "to be added". In the second sub-rule we put NULL and 0, as there's no other declaration yet!

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

Introduction -> What is a compiler, what you have to know and what you will learn
A simple C Language -> Simplified C, comparison with C, tokens, basic structure
Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
Using Symbol Table in the Lexer -> Flex and Symbol Table combination
Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
Bison basics -> Bison tutorial actually
Creating a grammar for our Language -> Grammar and first Parser
Combine Flex and Bison -> lexer and parser combined
Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
Action Rules for If-Else Statements -> Statements-node, Action rules for “simple” and “complicated” if-else statements, running the compiler
Action Rules for Loop Statements and some Fixes -> Some fixes, Action rules of while statements, Action rules of for statements, running the compiler

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:

Semantic analysis (using more action rules in Bison)
Intermediate Code generation (using the AST structure for more cases)
Machine Code generation (MIPS Assembly)

Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

#tutorials #programming #compiler #coding

7 years ago in #utopian-io by drifter1 (67)

$19.54

Sort:

Trending

[-]

portugalcoin (72) 7 years ago

Thank you for your contribution @drifter1.

Good tutorial and well explained as always. Your tutorials are very professional. Keep up the good work. We are waiting for more tutorials.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

$7.82

13 votes

[-]

utopian-io (71) 7 years ago

Thank you for your review, @portugalcoin! Keep up the good work!

$0.00

[-]

utopian-io (71) 7 years ago

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

1 vote

[-]

steem-ua (64) 7 years ago

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.505 which ranks you at #6323 across all Steem accounts.
Your rank has not changed in the last three days.

In our last Algorithmic Curation Round, consisting of 325 contributions, your post is ranked at #174.

Evaluation of your UA score:

You're on the right track, try to gather more followers.
The readers appreciate your great work!
Try to improve on your user engagement! The more interesting interaction in the comments of your post, the better!

Feel free to join our @steem-ua Discord server

$0.00

1 vote

STEEM 0.12

TRX 0.34

JST 0.033

BTC 120440.15

ETH 4483.35

SBD 0.78

Writing a simple Compiler on my own - Action Rules for Function Declarations (part 1) [C][Flex][Bison]

Introduction

More specifically, the topics that we will cover today are:

Requirements:

Difficulty:

Actual Tutorial Content

Small Tweaks and Fixes

A small change in the "symtab.c" function insert

Wrong type in creation function of Return_Node

Changes in printing and traversal

Visualizing the Problem

The new AST nodes and creation functions

AST Node of Declarations

AST Node of Function Declarations

AST Node of a Function Declaration

AST Node of the Return Type

AST Node for Function Parameter Declarations

Non-terminal type definitions

Action Rules for Declarations

RESOURCES

References:

Images:

Previous parts of the series

Final words | Next up on the project

GitHub Account:

Hi @drifter1!

Evaluation of your UA score:

Coin Marketplace