Writing a simple Compiler on my own - Function Semantics (part 1) [C][Flex][Bison]

drifter1 (67)in #utopian-io • 8 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will write the needed code for the parameter definition, function declaration and also parameter type-compatibility checking, which are some of the function semantics. Again all this will not be included in the "actual" compiler yet...

The topics that we will cover today are:

Parameter definition code
Function declaration code
Function parameter type compatibility checking code

In part 2 we will discuss how we can check this compatibility after the function is declared using a "revisit queue", cause functions are being called before they are declared!

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Parameter Definition

Knowing that a Parameter is a somewhat complex structure, I think that creating a function that gives back such a structure is not a bad idea. The parameter struct is defined as:

typedef struct Param{
	// parameter type and name
	int par_type;
	char param_name[MAXTOKENLEN];

	// to store the value
	int ival; double fval; char *st_sval;
	int passing; // value or reference
}Param;

This shows us that a Parameter is mainly described by it's data type, name and passing way (by reference or value). Of course, when being in a function call a parameter is a constant, identifier or even expression, which gives us some "final" type. Let's call the function that manages this "def_param" and let this function return a Param struct variable. This means that the declaration of this function looks as following:

Param def_param(int par_type, char *param_name, int passing);

What the function has to do is simply set the values of the corresponding entries. This means that the entry "par_type" will get the value of the parameter "par_type"etc. In the end we also have to return the created structure. So, the actual code looks as following:

Param def_param(int par_type, char *param_name, int passing){
    Param param; /* Parameter struct */

    /* set the information */
    param.par_type = par_type;
    strcpy(param.param_name, param_name);
    param.passing = passing;

    /* return the structure */
    return param;
}

This function will be used in the parser action code when we are in a rule that contains parameters. And so in both the function calls and function declarations. There is not much more to say really. You will see more when it's time! :)

Function Declaration

We already declared a function for type declaration called "set_type", but this function is not enough for functions declarations. When declaring functions we also have to define the number of parameters and also the actual parameters that the function has. The actual parameters will come as an input of this function, meaning that we will manage them outside of this function. This management will be discussed when we get more into bison and how we integrate all these semantics. Let's call the function that does the function declaration "func_declare". The declaration of this function looks as following:

int func_declare(char *name, int ret_type, int num_of_pars, Param *parameters);

This function of course has to lookup the entry from the symbol table. Afterwards we declare the function, a step where we can also check if the function was already declared! If no error occurs then we just give the corresponding values to the corresponding entries of the symbol table entry.
The code is:

int func_declare(char *name, int ret_type, int num_of_pars, Param *parameters){
    /* lookup entry */
    list_t *l = lookup(name);

    /* if type is not defined yet */
    if(l->st_type != UNDEF){
        /* entry is of function type */
        l->st_type = FUNCTION_TYPE;

        /* return type is ret_type */
        l->inf_type = ret_type;

        /* parameter stuff */
        l->num_of_pars = num_of_pars;
        l->parameters = parameters;

        return 0; /* success */
    }
    /* already declared error */
    else{
        fprintf(stderr, "Function %s already declared!\n", name);
        exit(1);
    }
}

This function will be called when declaring functions and more specifically in some part(s) of the "function_head" rule. I guess you already see the big problem here! Functions are declared after they are being used, meaning that we will have to come up with some trick that checks the function declaration and compatibility of the parameters (comes next) after we get to this point. This already shows us that our compiler can't be one-pass! We will have to do some "revisiting" of specific things, which include function semantics for example.

Function Parameter Type Compatibility Checking

Last time we created a function called "get_result_type", which also had a special case for no operator (NONE). This case is useful for assignments and parameter compatibility checks. Using this function and the types of all the parameters of a function call and the function declaration, we can check if the parameters are compatible or not! Let's call such a function "func_param_check", which of course needs the name of the function, number of parameters and parameters of the function call. So, the declaration of this function looks as following:

int func_param_check(char *name, int num_of_pars, Param *parameters);

Getting more in-depth this function has to lookup for the symbol table entry of the function, which actually is the entry that stores the function declaration details. Then, we first have to check if the number of parameters is right. The number of parameters of the function call and function declaration of course has to be the same. If it's not we will return an error! Lastly we then check if each function parameter is compatible with getting the function call parameter. Using the previously mentioned function "get_result_type", type errors get managed "automatically" from that function, without having to check the result of the function. Either way, the result would be '1' if they are compatible, as we saw last time.
The code for all that looks as following:

int func_param_check(char *name, int num_of_pars, Param *parameters){
    int i, type_1, type_2;

    /* lookup entry */
    list_t *l = lookup(name);

    /* check number of parameters */
    if(l->num_of_pars != num_of_pars){
        fprintf(stderr, "Function call of %s has wrong num of parameters!\n", name);
        exit(1);
    }

    /* check if parameters are compatible */
    for(i = 0; i < num_of_pars; i++){
        /* type of parameter in function declaration */
        type_1 = l->parameters[i].par_type; 

        /* type of parameter in function call*/
        type_2 = parameters[i].par_type; 

        /* check compatibility for function call */
        get_result_type(type_1, type_2, NONE);
        /* error occurs automatically in the function */
    }

    return 0; /* success */
}

It's easy to guess where exactly such a function will be used. Well, it's when calling a function and so in a "function_call" statement or expression, to check if the parameters of the function call are compatible with the types of the function declaration. The function declaration of course occurs much much later (as we already mentioned earlier) and so a "revisit queue" has to be created so that we can check the parameter compatibility and function declaration when we get to the point where a function get's declared. This is a so called "Backtracking" technique that we will discuss in part 2 and so next time!

Note: We might tweak today's code later on, when it actually gets into use!

RESOURCES

References:

No actual references. Only stuff we discussed through-out the series, but now "implemented" somewhat!

Images:

All of the images are custom-made!

Previous parts of the series

Introduction -> What is a compiler, what you have to know and what you will learn
A simple C Language -> Simplified C, comparison with C, tokens, basic structure
Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
Using Symbol Table in the Lexer -> Flex and Symbol Table combination
Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
Bison basics -> Bison tutorial actually
Creating a grammar for our Language -> Grammar and first Parser
Combine Flex and Bison -> lexer and parser combined
Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:

Semantic analysis (creating and using even more semantic rules)
Intermediate Code generation (Abstract Syntax Tree)
Machine Code generation (MIPS Assembly)

Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

#tutorials #compiler #programming #code

8 years ago in #utopian-io by drifter1 (67)

$32.20

Sort:

Trending

[-]

mcfarhat (70) 8 years ago

Another nice post, and great to see you progressing further through the compiler creation.
My advise for this time is you could probably have included more content into this session and possibly finalizing the semantics part.
How soon can we start seeing some of the real action of the compiler? at least maybe some unit tests of what has been created so far?
Screenshots with some results would be awesome!

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

$7.46

12 votes

[-]

drifter1 (67) 8 years ago (edited)

Thank you! I can clearly see that you are enjoying this project!

Right now the compilation steps that are finished are only the Lexical Analysis (via the flex lexer) and Syntax Analysis (via the bison parser). The first one combined characters to form lexical units or tokens, whilst the second one combined those tokens to form the syntax
Semantic Analysis and all that follows from now on is much more complex and extending the previous steps by adding action code to the parser rules (mainly), that try to control more things about the language.
One of the things that occurs in all the steps is the Symbol Table that stores information about the identifiers that get's a somewhat final form after Semantic analysis is done :)

Right now, compiling the example file "full_example.c" we get messages, such as:

These messages are mainly for debugging purposes, to check if the compiler works as it should, but you can see that:

we are being informed about identifiers and if they are inserted to the symbol table for the first time or are just references.
if we are leaving a scope (function in our language) we get notified about which identifiers are being hidden etc.

Some of the next articles will again be similar to today's article, cause we first have to create code that will be called as action code. Using the semantic code such as type checking, function declaration etc. in the actual compiler that we have right now, requires the introduction of attributes (attribute grammar), meaning that we will have to insert all the needed entries like data type, parameter etc. in a new structure that each non-terminal has. But, not all the non-terminals need all of these attributes! They don't have the same requirements of attributes and so we will end up creating an abstract syntax tree (AST), where the actual node can be of many types. But, all these types will be inter-compatible so that we can form a tree of node elements. The actual information/attributes will be stored in this AST, and together with the Symbol Table help us get into the final Machine Code Generation step.

I think I spoiled enough haha. We have a long road to take :)

$0.00

[-]

utopian-io (71) 8 years ago

Thank you for your review, @mcfarhat! Keep up the good work!

$0.00

[-]

steem-ua (64) 8 years ago

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server

$0.00

1 vote

[-]

ilovecoding (39) 8 years ago

Hello! Your post has been resteemed and upvoted by @ilovecoding because we love coding! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

Reply !stop to disable the comment. Thanks!

$0.00

2 votes

[-]

steemstem (73) 8 years ago

This post has been voted on by the SteemSTEM curation team and voting trail in collaboration with @curie.

If you appreciate the work we are doing then consider voting both projects for witness by selecting stem.witness and curie!

For additional information please join us on the SteemSTEM discord and to get to know the rest of the community!

$0.00

1 vote

[-]

utopian-io (71) 8 years ago

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

1 vote

STEEM 0.04

TRX 0.32

JST 0.100

BTC 63838.91

ETH 1916.53

USDT 1.00

SBD 0.37

Writing a simple Compiler on my own - Function Semantics (part 1) [C][Flex][Bison]

Introduction

The topics that we will cover today are:

Requirements:

Difficulty:

Actual Tutorial Content

Parameter Definition

Function Declaration

Function Parameter Type Compatibility Checking

RESOURCES

References:

Images:

Previous parts of the series

Final words | Next up on the project

GitHub Account:

Hi @drifter1!

Coin Marketplace