Writing a simple Compiler on my own - Action Rules for Function Declarations (part 2) [C][Flex][Bison]

drifter1 (67)in #utopian-io • 8 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. This is part 2 of the previous article that you should go and read before-hand! Today we will actually use all the AST Nodes and functions that we wrote last time as action code in our parser!

More specifically, the topics that we will cover today are:

Action Rules for Function declarations
Action Rules for Function declaration
Action Rules for Parameters
Traversing after parsing is done using "program"
Compiler validation using examples

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases, including function parameters
How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
Intermediate Code Representations, including AST's
How we implement an AST (structure and management)
Action Rules for other cases

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Action Rules for Function declarations

Continuing on with function declarations now, we know that functions are optional. So, we first have to pass the information from the actual "functions" rule that has to store a function declarations node to the "functions_optional" rule. When there's no function declaration we will pass NULL.

We can visualize this as following (from last time):

So, the "functions_optional"-rule looks like this:

functions_optional: 
  functions
  {
    $$ = $1;
  }
  | /* empty */
  {
    $$ = NULL;
  }
;

For the actual "functions" rule that gives us a Function Declarations node we will have to use the node creation function. The parameters will be gotten from the declarations array and declaration count of a previous "function" non-terminal or will be NULL and 0, when it's the first declaration. The function declaration that's to be added will be given as the third and final parameter! So, the "functions"-rule looks as following:

functions: 
  functions function
  {
    AST_Node_Func_Declarations *temp = (AST_Node_Func_Declarations*) $1;
    $$ = new_func_declarations_node(temp->func_declarations, temp->func_declaration_count, $2);
  }
  | function
  {
    $$ = new_func_declarations_node(NULL, 0, $1);
  }
;

But, to do this we also have to set up the function declaration...

Action Rules for Function declaration

As we already said last time, we have to create a global temporary variable that stores the current function declaration node. That way the "far" childs of the "function"-rule will be able to access and change the information from this node. So, the parser now starts with:

%{

  ...

  // for functions
  AST_Node_Func_Decl *temp_function;
%}

The functions-rule already has some actions that manage the scope. We increase it before the function_head hide it after the function_tail, and so function declaration is over. Knowing that the function node will be stored in the "temp_function" variable, we just have to type-cast this node and give it directly to the function rule! So, the action code of "functions" is:

function: { incr_scope(); } function_head function_tail
{ 
    hide_scope();
    $$ = (AST_Node *) temp_function;
} 
;

The visualization of a function declaration, that I also had last time is:

Function head

Things start getting interesting in the function_head rule, where we also manage if we are declaring using the "declare" variable (if you remember). After "finding" the return type, pointer "flag" and symbol table entry, which are stored inside of the return_type node and ID itself, we can create the temporary function node! After type-casting the Return Type node we can access the ret_type and pointer variable, which can be easily passed to the node creation function, which will return the node to "temp_function". After doing that we should also set the type of the symbol table entry to "FUNCTION_TYPE" and the pointing type stored in "inf_type" to the return type of the function, that can be gotten from many points. After that the rule continues, managing the parameters, that I will leave for later...
So, let's get into the "current" action code of function_head:

function_head: { declare = 1; } return_type ID LPAREN
  { 
    declare = 0;

    AST_Node_Ret_Type *temp = (AST_Node_Ret_Type *) $2;
    temp_function = (AST_Node_Func_Decl *)
         new_ast_func_decl_node(temp->ret_type, temp->pointer, $3);

    temp_function->entry->st_type = FUNCTION_TYPE;
    temp_function->entry->inf_type = temp->ret_type;
  }

  ...

;

Return Type

The return type is pretty easy to set up as we already know the type that is stored in "type" (in form of an integer) and only have to give '0' or '1' as second parameter of the node creation function, showing if we have a pointer or not. So, the "return_type"-rule looks as following:

return_type:
	type
	{
		$$ = new_ast_ret_type_node($1, 0);
	}
	| type pointer
	{
		$$ = new_ast_ret_type_node($1, 1);
	}
;

Optionals

The optional declarations, statements and return node are also pretty simple. In the first two we just have to set the entry of the temporary function node "temp_function" to the declarations or statements node, or NULL if there's no declaration or statement. In the same way the entry "return_node" of the temporary node will be set to NULL or to the result that we get from the node creation function. For now, where we don't know the data type of expressions, we will just expect the return expression to have the same type as the temporary function node. So, these three function_tail sub-rules look as following:

declarations_optional: 
  declarations 
  {
    temp_function->declarations = $1;
  }
  | /* empty */
  {
    temp_function->declarations = NULL;
  }
;

statements_optional: 
  statements
  {
    temp_function->statements = $1;
  } 
  | /* empty */
  {
    temp_function->statements = NULL;
  }
;

return_optional:
  RETURN expression SEMI
  {
    temp_function->return_node = new_ast_return_node(temp_function->ret_type, $2);
  }
  | /* empty */
  {
    temp_function->return_node = NULL;
  }
;

Of course all these, including function_tail, don't store any information! These nodes are of no actual type! We could of course get into a different implementation, but this way is much simpler to understand :)

Action Rules for Parameters

Parameters can be visualized as following (from last time):

Parameters are also optional and so the "parameters_optional" has to get it's value from the "parameters" rule. When there's no parameter we will pass NULL. So, this rule looks as following:

parameters_optional: 
  parameters
  {
    $$ = $1;
  }
  | /* empty */
  {
    $$ = NULL;
  }
;

The parameters-rule will use the node creation function, similar to the function declarations. This means that we will be adding the newly found parameter to the parameters array that we get from the previous parameter declarations node, by also incrementing the counter of parameters. When the parameter is the first or only parameter then we will pass NULL and 0 as parameters of the node creation function, else we will pass the "previous" array and counter, that has been gotten from the previous node. So, the "parameters"-rule looks as following:

parameters: 
  parameters COMMA parameter
  {
    AST_Node_Decl_Params *temp = (AST_Node_Decl_Params *) $1;
    $$ = new_ast_decl_params_node(temp->parameters, temp->lnum_of_pars, $3);
  }
  | parameter
  {
    $$ = new_ast_decl_params_node(NULL, 0, $1);
  }
;

Defining the parameter non-terminal as a Param structure and knowing that we have defined a function called "def_param" that allows the creation of parameters with a specific data type, name and "reference way" (which is not needed here), we can easily set up this new parameter such as:

parameter : { declare = 1; } type variable
{ 
    declare = 0;
    $$ = def_param($2, $3->st_name, 0);
}
;

Of course we also set the value of "declare" correspondingly to '1' and '0' afterwards, as we are declaring new variables!

After making the parameters we can now finish the function_head rule. The parameters and number of parameters will be stored in the symbol table entry of the function that we are able to function though the temporary function node. When there's no parameter we will pass NULL and 0, else we will type-cast the parameters_optional node to access the array and counter of it. So, in the end the "function_head"-rule becomes:

function_head: { declare = 1; } return_type ID LPAREN
  { 
    declare = 0;

    AST_Node_Ret_Type *temp = (AST_Node_Ret_Type *) $2;
    temp_function = (AST_Node_Func_Decl *)
    new_ast_func_decl_node(temp->ret_type, temp->pointer, $3);

    temp_function->entry->st_type = FUNCTION_TYPE;
    temp_function->entry->inf_type = temp->ret_type;
  }
  parameters_optional RPAREN
  {
    if($6 != NULL){
      AST_Node_Decl_Params *temp = (AST_Node_Decl_Params *) $6;

      temp_function->entry->parameters = temp->parameters;
      temp_function->entry->num_of_pars = temp->num_of_pars;
    }
    else{
      temp_function->entry->parameters = NULL;
      temp_function->entry->num_of_pars = 0;
    }		
  }
;

Traversing after parsing is done using "program"

Instead of printing each node on it's own, causing duplicate nodes and a difficulty understanding what the compiler "found", we will traverse the declarations, statements and function_optional non-terminals. The first stores the declarations in form of a declarations nodes. The second stores the statements in form of a statements node. The last one stores the function declarations inside of a function declaration nodes. So, traversing each one of them, we can easily traverse the whole program!

So, the new program-rule looks as following:


 program: 
    declarations { ast_traversal($1); }
    statements   { ast_traversal($3); }
    RETURN SEMI functions_optional { ast_traversal($7); }
;

Compiler valdation using examples

After doing that we can now get into some examples. Let's split them up based on the number of functions.

Example.c with no functions

The first example contains no functions at all. The code of it looks as following:

int i;
double val, res;
val = 2.5;
res = val + 1;
return;

Let's run the compiler for it to see what we get:

As you can see the compiler worked perfectly without functions, as we wanted!

Example2.c with one function

The code of this example is:

int i;
double val = 2.5, res[10];
for(i = 0; i < 10; i++){
    res[i] = operation(val, i);
    val = res[i];
    print(res[i]);
    print('\n');
}
return;
double operation (double value, int i){ 
    // declarations
    double res;
    // statements
    res = value*i + i;
    return res;
}

Running the compiler for this example we get:

You can see that the compiler found:

the data type double which is represented by '2'
the function name "operation"
The two parameters "value" and 'i', which are of type '2' and '1' that represent double and integer respectively
The declaration of the double "res"
The assignment statement of "res", and with the correct post-fix order of operators
The return node that returns the identifier "res"

Full_example.c with 3 functions

Let's finally also run it for the "famous" example that contains most of the statements of the grammar. The code of it is:

// declarations
int i;                    // simple variable
char c = 'c';             // one with init
double val = 2.5, res[6]; // two variables, one with init and one array
double *p;                // pointer variable
// statements
p = &res;
for(i = 0; i < 10; i++){
	if(i > 5){
    	break;
	}
	else if(i == 5){
		i = 2 * i;
		val = func1();
		*p = add(val, i);
		print(res[i]);
		print("\n");
		continue;
	}
	else{
		*p = add(val, i);
    	val = res[i];
   	 	print(res[i]);
    	print("\n");
    	p = p + 1;
	}

	if(i == 2 && val == 4.5){
		print("iteration: 3\n");
	}
}

while(i <<12){
	print(i);
	print(" ");
	func2(c);
	i++;
}
print("\n");
return; /* RETURN SEMI */
// other functions (functions)
int func1(){ 		/* without parameters */
	return 5;
}
void func2(char c){ /* with one parameter */
	char *s;
	*s = c;
	print(*s);
}
double add (double a, int b){  /* with two parameters */
    double res;
    res = a + b + (-5);
    return res;
}

Running the compiler for it we get:

The compiler truly found that:

we have 3 functions
the first function has no parameters, declarations and statements, but only a return node of '5'
the second function has one parameter (char 'c' represented by data type '1'), one declaration (of one variable/name with data type '2', which is double) and two statements (an assignment of 's' and the "invisible for now" print function call)
the third function which has two parameters, one declaration, a somewhat complicated statement and a return node

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

Introduction -> What is a compiler, what you have to know and what you will learn
A simple C Language -> Simplified C, comparison with C, tokens, basic structure
Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
Using Symbol Table in the Lexer -> Flex and Symbol Table combination
Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
Bison basics -> Bison tutorial actually
Creating a grammar for our Language -> Grammar and first Parser
Combine Flex and Bison -> lexer and parser combined
Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
Action Rules for If-Else Statements -> Statements-node, Action rules for “simple” and “complicated” if-else statements, running the compiler
Action Rules for Loop Statements and some Fixes -> Some fixes, Action rules of while statements, Action rules of for statements, running the compiler
Action Rules for Function Declarations (part 1) -> Small Tweaks and Fixes, Visualizing the Problem, The new AST nodes and creation functions, Action Rules for Declarations

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:

Semantic analysis (using more action rules in Bison)
Intermediate Code generation (using the AST structure for more cases)
Machine Code generation (MIPS Assembly)

Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

#tutorials #programming #compiler #coding

8 years ago in #utopian-io by drifter1 (67)

$23.34

Sort:

Trending

[-]

yokunjon (62) 8 years ago

I thank you for your contribution. Here are my thoughts. Note that, my thoughts are my personal ideas on your post and they are not directly related to the review and scoring unlike the answers I gave in the questionnaire;

Structure
- I appreciate the structure of your post. It is well done; organizes the post and makes it easier to read.
Language
- I personally think the usage of the first person reduces the efficiency of the post. I advise you to consider using less first person. Still, this is a personal thought, don't get me wrong! :)
Content
- The font in the images you used distracts and reduces the readability in my opinion. I advise using a font which distracts less.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

$7.96

14 votes

[-]

utopian-io (71) 8 years ago

Thank you for your review, @yokunjon! Keep up the good work!

$0.00

1 vote

[-]

utopian-io (71) 8 years ago

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

1 vote

[-]

steem-ua (64) 8 years ago

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.505 which ranks you at #6323 across all Steem accounts.
Your rank has not changed in the last three days.

In our last Algorithmic Curation Round, consisting of 275 contributions, your post is ranked at #86.

Evaluation of your UA score:

You're on the right track, try to gather more followers.
The readers appreciate your great work!
Good user engagement!

Feel free to join our @steem-ua Discord server

$0.00

1 vote

STEEM 0.04

TRX 0.32

JST 0.101

BTC 63963.67

ETH 1925.08

USDT 1.00

SBD 0.37

Writing a simple Compiler on my own - Action Rules for Function Declarations (part 2) [C][Flex][Bison]

Introduction

More specifically, the topics that we will cover today are:

Requirements:

Difficulty:

Actual Tutorial Content

Action Rules for Function declarations

Action Rules for Function declaration

Function head

Return Type

Optionals

Action Rules for Parameters

Traversing after parsing is done using "program"

Compiler valdation using examples

Example.c with no functions

Example2.c with one function

Full_example.c with 3 functions

RESOURCES

References:

Images:

Previous parts of the series

Final words | Next up on the project

GitHub Account:

Hi @drifter1!

Evaluation of your UA score:

Coin Marketplace