Writing a simple Compiler on my own - Function Semantics (part 1) [C][Flex][Bison]

in #utopian-io5 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will write the needed code for the parameter definition, function declaration and also parameter type-compatibility checking, which are some of the function semantics. Again all this will not be included in the "actual" compiler yet...

The topics that we will cover today are:

  1. Parameter definition code
  2. Function declaration code
  3. Function parameter type compatibility checking code
    In part 2 we will discuss how we can check this compatibility after the function is declared using a "revisit queue", cause functions are being called before they are declared!

Requirements:

    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases

Difficulty:

Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Parameter Definition

    Knowing that a Parameter is a somewhat complex structure, I think that creating a function that gives back such a structure is not a bad idea. The parameter struct is defined as:
typedef struct Param{
	// parameter type and name
	int par_type;
	char param_name[MAXTOKENLEN];
// to store the value int ival; double fval; char *st_sval; int passing; // value or reference }Param;
    This shows us that a Parameter is mainly described by it's data type, name and passing way (by reference or value). Of course, when being in a function call a parameter is a constant, identifier or even expression, which gives us some "final" type. Let's call the function that manages this "def_param" and let this function return a Param struct variable. This means that the declaration of this function looks as following:

Param def_param(int par_type, char *param_name, int passing);

    What the function has to do is simply set the values of the corresponding entries. This means that the entry "par_type" will get the value of the parameter "par_type"etc. In the end we also have to return the created structure. So, the actual code looks as following:

Param def_param(int par_type, char *param_name, int passing){
    Param param; /* Parameter struct */
/* set the information */ param.par_type = par_type; strcpy(param.param_name, param_name); param.passing = passing;
/* return the structure */ return param; }

    This function will be used in the parser action code when we are in a rule that contains parameters. And so in both the function calls and function declarations. There is not much more to say really. You will see more when it's time! :)

Function Declaration

    We already declared a function for type declaration called "set_type", but this function is not enough for functions declarations. When declaring functions we also have to define the number of parameters and also the actual parameters that the function has. The actual parameters will come as an input of this function, meaning that we will manage them outside of this function. This management will be discussed when we get more into bison and how we integrate all these semantics. Let's call the function that does the function declaration "func_declare". The declaration of this function looks as following:

int func_declare(char *name, int ret_type, int num_of_pars, Param *parameters);

    This function of course has to lookup the entry from the symbol table. Afterwards we declare the function, a step where we can also check if the function was already declared! If no error occurs then we just give the corresponding values to the corresponding entries of the symbol table entry.
The code is:
int func_declare(char *name, int ret_type, int num_of_pars, Param *parameters){
    /* lookup entry */
    list_t *l = lookup(name);
/* if type is not defined yet */ if(l->st_type != UNDEF){ /* entry is of function type */ l->st_type = FUNCTION_TYPE;
/* return type is ret_type */ l->inf_type = ret_type;
/* parameter stuff */ l->num_of_pars = num_of_pars; l->parameters = parameters;
return 0; /* success */ } /* already declared error */ else{ fprintf(stderr, "Function %s already declared!\n", name); exit(1); } }

    This function will be called when declaring functions and more specifically in some part(s) of the "function_head" rule. I guess you already see the big problem here! Functions are declared after they are being used, meaning that we will have to come up with some trick that checks the function declaration and compatibility of the parameters (comes next) after we get to this point. This already shows us that our compiler can't be one-pass! We will have to do some "revisiting" of specific things, which include function semantics for example.

Function Parameter Type Compatibility Checking

    Last time we created a function called "get_result_type", which also had a special case for no operator (NONE). This case is useful for assignments and parameter compatibility checks. Using this function and the types of all the parameters of a function call and the function declaration, we can check if the parameters are compatible or not! Let's call such a function "func_param_check", which of course needs the name of the function, number of parameters and parameters of the function call. So, the declaration of this function looks as following:

int func_param_check(char *name, int num_of_pars, Param *parameters);

    Getting more in-depth this function has to lookup for the symbol table entry of the function, which actually is the entry that stores the function declaration details. Then, we first have to check if the number of parameters is right. The number of parameters of the function call and function declaration of course has to be the same. If it's not we will return an error! Lastly we then check if each function parameter is compatible with getting the function call parameter. Using the previously mentioned function "get_result_type", type errors get managed "automatically" from that function, without having to check the result of the function. Either way, the result would be '1' if they are compatible, as we saw last time.
The code for all that looks as following:
int func_param_check(char *name, int num_of_pars, Param *parameters){
    int i, type_1, type_2;
/* lookup entry */ list_t *l = lookup(name);
/* check number of parameters */ if(l->num_of_pars != num_of_pars){ fprintf(stderr, "Function call of %s has wrong num of parameters!\n", name); exit(1); }
/* check if parameters are compatible */ for(i = 0; i < num_of_pars; i++){ /* type of parameter in function declaration */ type_1 = l->parameters[i].par_type;
/* type of parameter in function call*/ type_2 = parameters[i].par_type;
/* check compatibility for function call */ get_result_type(type_1, type_2, NONE); /* error occurs automatically in the function */ }
return 0; /* success */ }

    It's easy to guess where exactly such a function will be used. Well, it's when calling a function and so in a "function_call" statement or expression, to check if the parameters of the function call are compatible with the types of the function declaration. The function declaration of course occurs much much later (as we already mentioned earlier) and so a "revisit queue" has to be created so that we can check the parameter compatibility and function declaration when we get to the point where a function get's declared. This is a so called "Backtracking" technique that we will discuss in part 2 and so next time!

Note: We might tweak today's code later on, when it actually gets into use!

RESOURCES

References:

    No actual references. Only stuff we discussed through-out the series, but now "implemented" somewhat!

Images:

All of the images are custom-made!

Previous parts of the series


Final words | Next up on the project

     And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:
  • Semantic analysis (creating and using even more semantic rules)
  • Intermediate Code generation (Abstract Syntax Tree)
  • Machine Code generation (MIPS Assembly)
     Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)
Sort:  

Another nice post, and great to see you progressing further through the compiler creation.
My advise for this time is you could probably have included more content into this session and possibly finalizing the semantics part.
How soon can we start seeing some of the real action of the compiler? at least maybe some unit tests of what has been created so far?
Screenshots with some results would be awesome!

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.


Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

Thank you! I can clearly see that you are enjoying this project!

  • Right now the compilation steps that are finished are only the Lexical Analysis (via the flex lexer) and Syntax Analysis (via the bison parser). The first one combined characters to form lexical units or tokens, whilst the second one combined those tokens to form the syntax
  • Semantic Analysis and all that follows from now on is much more complex and extending the previous steps by adding action code to the parser rules (mainly), that try to control more things about the language.
  • One of the things that occurs in all the steps is the Symbol Table that stores information about the identifiers that get's a somewhat final form after Semantic analysis is done :)

Right now, compiling the example file "full_example.c" we get messages, such as:



These messages are mainly for debugging purposes, to check if the compiler works as it should, but you can see that:

  • we are being informed about identifiers and if they are inserted to the symbol table for the first time or are just references.
  • if we are leaving a scope (function in our language) we get notified about which identifiers are being hidden etc.

    Some of the next articles will again be similar to today's article, cause we first have to create code that will be called as action code. Using the semantic code such as type checking, function declaration etc. in the actual compiler that we have right now, requires the introduction of attributes (attribute grammar), meaning that we will have to insert all the needed entries like data type, parameter etc. in a new structure that each non-terminal has. But, not all the non-terminals need all of these attributes! They don't have the same requirements of attributes and so we will end up creating an abstract syntax tree (AST), where the actual node can be of many types. But, all these types will be inter-compatible so that we can form a tree of node elements. The actual information/attributes will be stored in this AST, and together with the Symbol Table help us get into the final Machine Code Generation step.

I think I spoiled enough haha. We have a long road to take :)

Thank you for your review, @mcfarhat! Keep up the good work!

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server

Hello! Your post has been resteemed and upvoted by @ilovecoding because we love coding! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

Reply !stop to disable the comment. Thanks!





This post has been voted on by the SteemSTEM curation team and voting trail in collaboration with @curie.

If you appreciate the work we are doing then consider voting both projects for witness by selecting stem.witness and curie!

For additional information please join us on the SteemSTEM discord and to get to know the rest of the community!

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

Coin Marketplace

STEEM 0.25
TRX 0.11
JST 0.032
BTC 63706.21
ETH 3073.80
USDT 1.00
SBD 3.76