Writing a simple Compiler on my own - Function Semantics (part 2) [C][Flex][Bison]

drifter1 (67)in #utopian-io • 8 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will talk about a way that let's us check the parameters of function call after the function get's declared.

The topics that we will cover today are:

Revisit Queue Concept
Revisit Queue Core Implementation
Inserting undeclared variables/identifiers in that queue
Run the compiler for the full example and discuss the results

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases, including function parameters

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Revisit Queue Concept

The language for which we are implementing our compiler has one very difficult aspect: "Functions get declared in the end". So, we have a problem when calling functions before they are declared, cause we don't know what the function returns and also don't know if the parameters are actually compatible! The first one is quite difficult, cause we will have to suspend assignment-compatibility checks, as we don't know the type of value that comes from the function call. This is something that we will do later on when we are doing stuff on the "actual" compiler. The second thing is what we will try to prepare today.

So, what's the concept of all that I'm trying to discuss and implement today? Well, I already explained it sort of. Whenever something is not declared yet we will not throw off an "declaration error" directly, but will insert that identifier in a so called revisit queue. When the identifier that corresponds to that entry in the revisit queue get's declared we will then do some Backtracking actions to set up that entry correctly by also feeding that information to the expressions and assignments where it belongs to. BUT, this will only be allowed and checked for function calls (for now), meaning that anything that's not a function, and so some undeclared variable, will remain in that queue even after the whole parsing procedure is finished. This will show us that some variables are really undeclared, meaning that we now have an "Variable not declared error"!I don't included any revisit action yet and so will just output a warning so that we know that this is working right!

So, let's now get into the actual implementation!

Revisit Queue Core Implementation

This revisit queue will of course be some kind of structure. As the name already says I will implement it as a queue, inserting new identifiers in the end and taking identifiers starting from the beginning. Of course each revisit will search for it's corresponding identifier and so the actual structure might be irrelevant in the end. Each entry of that queue will store the name of the identifier, so that we are able to lookup the entry in the symbol table, and also the type of revisit in a form of a integer code that get's defined in the same way that we defined data types, parameter passing types etc. Afterwards we might include additional information.
So, the struct for the queue is:

typedef struct revisit_queue{
    // name of identifier
    char *st_name;
	
    // type of revisit
    int revisit_type;

    // maybe additional information to simplify the process ...

    struct revisit_queue *next;
}revisit_queue;

Similar to the symbol table this structure will be declared as a global static variable in the "symtab.h" header file of the symbol table. So, the declaration code of this structure is:

static revisit_queue *queue;

Before starting the parsing procedure we should also initialize this queue-pointer to NULL, so that we know that the structure is empty. This is also something that we have to check after the procedure is done so that we know if there are still entries (undeclared identifiers) in the queue. So, the parser should contain the lines:

...
// initialize revisit queue
queue = NULL;

... parsing procedure ...

if(queue != NULL){
  printf("Warning: Something has not been checked in the revisit queue!\n");
}
...

So, what are the functions that we have to include to manage such a structure? It's actually very simple. We only have to add entries to the queue when finding an undeclared identifier, remove entries or better said revisit them when they are declared and we might also include a dump file generation similar to the one of the symbol table. So, the functions are:

add_to_queue -> for adding identifiers to the revisit queue
revisit -> to revisit and remove an entry by also returning a code that tells us if it succeeded or not
revisit_dump -> to generate a dump file for the remaining entries

Let's get into each one on it's own!

add_to_queue function

Right now an entry of the revisit queue needs the name of the identifier and type of revisit, which means that these are also the parameters of the function! We don't need to return anything and so the function will be void. The main purpose of this function is to create and add a new entry. Of course we have two cases for that:

Queue is empty, where we just have to set up an entry and make the queue equal to that entry.
Queue is not empty, where we will have to find the last element and add then new entry after that element

So, the code is:

void Add_to_queue(char *name, int type){
    revisit_queue *q;

    /* queue is empty */
    if(queue == NULL){
        /* set up entry */
        q = (revisit_queue*) malloc(sizeof(revisit_queue));
        q->st_name = name;
        q->revisit_type = type;
        q->next = NULL;

        /* q "becomes" the queue */
        queue = q;
    }
    /* queue not empty */
    else{
        /* find last element */
        q = queue;
        while(q->next != NULL) q = q->next;

        /* add element to the end */
        q->next = (revisit_queue*) malloc(sizeof(revisit_queue));
        q->next->st_name = name;
        q->next->revisit_type = type;
        q->next->next = NULL;
    }		
}

Where this function has to be called will be discussed later on in this article :)

Revisit function

When revisiting an entry we of course have to know which entry we are referring to. This shows us that one very important parameter is the name of the identifier, which is more than enough! The main purpose of this function is doing the corresponding backtracking action for that identifier by also removing the entry afterwards. When an entry was found and taken care of we will return '0' to know that it was successful. By returning '1' we let the caller function know that no entries where found. This is quite important when declaring functions, cause it will show us that an function got declared, but not used yet previously (it might be useless).
To able to search for the entry and remove it at any position it might be found at, we have to take care of some cases:

Special case of being the first entry, where we do the revisit actions depending on the revisit type and then remove the entry by setting the queue to it's "next"
Simple case, where we just search for the entry pointing to our entry, do the action and then make that "pointing entry", point to the "next" of the found entry
Not being in the queue at all!

So, the code is:

int revisit(char *name){
    revisit_queue *q;

    /* special case - first entry */
    if( strcmp(queue->st_name, name) == 0 ){

        /* revisit entry depending on the type */
        switch(queue->revisit_type){
            case PARAM_CHECK:
                /* TO DO: run parameter check */
                break;
            /* ... */
        }

        /* remove entry by setting queue to "next" */
        queue = queue->next;

        return 0; // success
    }

    /* search for the entry that points to it */
    q = queue;
    while( strcmp(q->next->st_name, name) != 0 ) q = q->next;

    /* check if entry was not found */
    if(q == NULL){
        return 1;  // not found
    }

    /* revisit entry depending on the type */
    switch(q->next->revisit_type){
        case PARAM_CHECK:
            /* TO DO: run parameter check */
            break;
        /* ... */
    }

    /* remove entry by making the previous entry point at */
    /* the "next" of the entry that we want to remove  */
    q->next = q->next->next;

    return 0; // success
}

This function will be called when functions get declared, to take care of function calls that has not been checked yet! This actual integration will come in later articles...

Revisit_dump function

Last but not least we need a function that prints out what remained in the queue. We just have to loop through the queue, print the name of the identifier, and depending on the revisit type also the revisit type, but in text (not an integer value). Very simple process actually.
So, the code is:

void revisit_dump(FILE *of){
    int i;
    revisit_queue *q;
    q = queue;

    fprintf(of,"------------ -------------\n");
    fprintf(of,"Identifier   Revisit Type\n");
    fprintf(of,"------------ -------------\n");
    while(q != NULL){
        fprintf(of, "%-13s", q->st_name);
        if(q->revisit_type == PARAM_CHECK){
            fprintf(of,"%s","Parameter Check");
        }
        // more later on
        fprintf(of, "\n");
        q = q->next;	
    }
}

This will be called after the parsing procedure is done in the same way as the symbol table dump. The code is:

yyout = fopen("revisit_dump.out", "w");
revisit_dump(yyout);
fclose(yyout);

Inserting undeclared identifiers in that queue

The first thing I thought about was declaring a flag variable similar to the declare variable, but there is one big problem. We can't run an action before finding the ID that easily, which is what happens in function call statements. This shows us that we can't do the same programming trick that we did with declarations. So, to be able to check all this we ended up making all the stuff discussed through-out the article.
The simplest way of integrating the new revisit queue is by adding it to the correct point of the "insert" function that also takes care of declarations. Knowing when exactly we declare, we should have already taken care of insertions to the symbol table, when we are currently not declaring! So, what does this mean? Well, we already took care of the case of having an entry already inside of the symbol table, but of different scope, by creating a new entry only when currently declaring. In the same way, there is also this case of having no entry at all, that I completely forgotten in the Scope resolution article! So, when there is no entry yet, we should only add an entry when currently declaring. BUT, to take care of function calls, we should add the identifier either way, by also adding the identifier to the revisit queue, whenever we have "declare = 0".
So, the first part of the insert function becomes:

/* variable not yet in table */
if (l == NULL){
    /* check if we are really declaring */
    if(declare == 1){
        /* set up entry */
        l = (list_t*) malloc(sizeof(list_t));
        strncpy(l->st_name, name, len);
        l->st_type = type;
        l->scope = cur_scope;
        l->lines = (RefList*) malloc(sizeof(RefList));
        l->lines->lineno = lineno;
        l->lines->next = NULL;

        /* add to hashtable */
        l->next = hash_table[hashval];
        hash_table[hashval] = l; 
        printf("Inserted %s for the first time with linenumber %d!\n", name, lineno);
    }
    else{
        /* add it to check it again later */
        l = (list_t*) malloc(sizeof(list_t));
        strncpy(l->st_name, name, len);
        l->st_type = type;
        l->scope = cur_scope;
        l->lines = (RefList*) malloc(sizeof(RefList));
        l->lines->lineno = lineno;
        l->lines->next = NULL;
        l->next = hash_table[hashval];
        hash_table[hashval] = l;
        printf("Inserted %s at line %d to check it again later!\n", name, lineno);
	
        /* Adding identifier to the revisit queue! */
        add_to_queue(l->st_name, PARAM_CHECK);
    }
}

Full example result discussion

Running the compiler for "full_example.c" we now get more messages and also a new "revisit_dump.out" file after the procedure is done.
Messages of "check later":

The "queue has entries" warning:

Queue dump file:

Very interesting indeed! We can see that the function call identifiers have been truly inserted to the revisit queue. Now what remains to do afterwards is to revisit them for parameter checking whenever a function get's declared. Of course the revisit function will be called always, but a result of '1' (as we mentioned earlier) will tell the "caller" function that we had no entries to revisit, meaning that the function will be used later on in another function or be completely useless!

All the code and the dump files are in GitHub, as always :)

RESOURCES

References:

No actual references. Only stuff we discussed through-out the series, but now "implemented"...

Images:

All of the images are either custom-made or screenshots!

Previous parts of the series

Introduction -> What is a compiler, what you have to know and what you will learn
A simple C Language -> Simplified C, comparison with C, tokens, basic structure
Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
Using Symbol Table in the Lexer -> Flex and Symbol Table combination
Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
Bison basics -> Bison tutorial actually
Creating a grammar for our Language -> Grammar and first Parser
Combine Flex and Bison -> lexer and parser combined
Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:

Semantic analysis (creating and using even more semantic rules)
Intermediate Code generation (Abstract Syntax Tree)
Machine Code generation (MIPS Assembly)

Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

#tutorials #compiler #programming #code

8 years ago in #utopian-io by drifter1 (67)

$24.14

Sort:

Trending

[-]

portugalcoin (72) 8 years ago

Thank you for your contribution @drifter1.
After analyzing your tutorial we suggest the following points below:

We suggest you put some more pictures on what you are explaining and developing.
Thank you for placing screenshots with the results.

Very good your work while developing this tutorial. We've been waiting for more tutorials.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

$6.18

10 votes

[-]

utopian-io (71) 8 years ago

Thank you for your review, @portugalcoin! Keep up the good work!

$0.00

[-]

utopian-io (71) 8 years ago

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

1 vote

[-]

ilovecoding (39) 8 years ago

Hello! Your post has been resteemed and upvoted by @ilovecoding because we love coding! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

Reply !stop to disable the comment. Thanks!

$0.00

2 votes

[-]

steem-ua (64) 8 years ago

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.501 which ranks you at #6371 across all Steem accounts.
Your rank has dropped 38 places in the last three days (old rank 6333).

In our last Algorithmic Curation Round, consisting of 239 contributions, your post is ranked at #25.

Evaluation of your UA score:

You're on the right track, try to gather more followers.
The readers appreciate your great work!
Try to improve on your user engagement! The more interesting interaction in the comments of your post, the better!

Feel free to join our @steem-ua Discord server

$0.00

1 vote

STEEM 0.04

TRX 0.32

JST 0.103

BTC 63564.47

ETH 1885.34

USDT 1.00

SBD 0.37

Writing a simple Compiler on my own - Function Semantics (part 2) [C][Flex][Bison]

Introduction

The topics that we will cover today are:

Requirements:

Difficulty:

Actual Tutorial Content

Revisit Queue Concept

Revisit Queue Core Implementation

add_to_queue function

Revisit function

Revisit_dump function

Inserting undeclared identifiers in that queue

Full example result discussion

RESOURCES

References:

Images:

Previous parts of the series

Final words | Next up on the project

GitHub Account:

Hi @drifter1!

Evaluation of your UA score:

Coin Marketplace