Writing a simple Compiler on my own - Revisit Queue and Assignment Checking (part 1) [C][Flex][Bison]

in utopian-io •  2 months ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

    Hello it's a me again @drifter1! First of all, happy new year to all of you! I was very busy with exercises and studying, but here we are again!Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison.

    In this article we will start talking about the Assignment checking problem. Function calls are being used inside of the assignment expressions, which means that we will have to perform the assignment check as a revisit check, similar to how we did function parameter checking, in the previous articles! This topic is quite large and so I will split it into some parts!

The topics that we will cover today are:

  1. Visualizing the Problem
The actual implementation will be done in the next part(s)...

Requirements:

    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters
  • How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
  • Intermediate Code Representations, including AST's
  • How we implement an AST (structure and management)
  • Action Rules for AST nodes and more

Difficulty:

Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Visualizing the Problem

    An assignment statement consists of an variable that we assign a value too, the '=' sign and the actual value that we assign. The variable is of a specific datatype and so is the expression/value that we assign. The datatypes need to be compatible with each other. Getting the datatype of each branch is quite easy in our compiler, as we defined structures and functions that give us that specific information. So, in general, such a statement can be visualized as following:


    As you can see the datatype of the variable is stored inside of the symbol table entry, and more specifically in the "st_type" variable, but we will still use the more general "get_type()" function, to be more safe, as pointers and arrays store this information in the "inf_type" entry. In the same way, getting the datatype of the expression "assign_val" is also easy and done using a specified function. But, don't think that we are done, simply like that! The expression used in the right part of the assignment, might contain a function call, for which we might not know the return datatype yet!

The need of a "contains revisit" flag variable

    To know if such a revisit case occurred inside of that expression, we will define a new flag variable "cont_revisit" that tells us if the expression "contains a revisit". This can be managed very easily during the call of the "expression_data_type()" function, as the function identifier will have "UNDEF" in both entries!

    Whenever this value has a value of '1' we know that we have to do something with revisit queues. If the value is '0' we can already perform the assignment check, as no function call was inside of the assignment expression!

The revisit queue structure

    To be able to perform the assignment check afterwards, during a revisit, we will have to store some information in the revisit queue entry! More specifically, we just have to store the "assign_val" expression node in a "nodes" array and the count of those (num_of_assigns). As a visualization this looks like this:

Main Diagram

    The steps that we have to follow to solve this problem can be visualized by the following diagram:


    You can see that the "new" case of revisiting is quite interesting! We should first check if an entry exists, to not create a new one with the same name, for the same variable, that just occurs in more assignment statements. Either way, if we create or don't create a new entry, we will end up managing the "nodes" array, to add the information of the newly found assignment, and also increment the count of them. In the end, we should not forget to reset the flag variable, so that it is '0' again for the next assignment! The actual assignment check will happen much later on, after the parsing is done, so that all the functions have been declared, to let us use the correct return datatype for the assignment check!

Example program assignments

    Thinking about the "classic" example program "full_example.c" we know that we have to check the following assignments:


    The compiler should add these 3 assignments, in two separate entries, depending on the variable. After the parsing is finished and the identifier of the function call stores the correct return datatype, we can perform the revisit, so that we can check if the variable and assign-expression are inter-compatible or not. Of course, this example program is build in such a way, that no errors exist...

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

General Knowledge and Lexical Analysis

Syntax Analysis

  • Syntax Analysis Theory
  • Bison basics
  • Creating a grammar for our Language
  • Combine Flex and Bison
  • Passing information from Lexer to Parser
  • Finishing Off The Grammar/Parser (part 1)
  • Finishing Off The Grammar/Parser (part 2)
  • Semantic Analysis (1)

  • Semantic Analysis Theory
  • Semantics Examples
  • Scope Resolution using the Symbol Table
  • Type Declaration and Checking
  • Function Semantics (part 1)
  • Function Semantics (part 2)
  • Intermediate Code Generation (AST)

  • Abstract Syntax Tree Principle
  • Abstract Syntax Tree Structure
  • Abstract Syntax Tree Management
  • Action Rules for Declarations and Initializations
  • Action Rules for Expressions
  • Action Rules for Assignments and Simple Statements
  • Action Rules for If-Else Statements
  • Action Rules for Loop Statements and some Fixes
  • Action Rules for Function Declarations (part 1)
  • Action Rules for Function Declarations (part 2)
  • Action Rules for Function Calls
  • Semantic Analysis (2)

  • Datatype attribute for Expressions
  • Type Checking for Assignments
  • Revisit Queue and Parameter Checking (part 1)
  • Revisit Queue and Parameter Checking (part 2)
  • Revisit Queue and Parameter Checking (part 3)
  • Revisit Queue and Parameter Checking (part 4)

  • Final words | Next up on the project

         And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
    Next up on this series are:
    • Semantic analysis (using even more action rules in Bison)
    • Machine Code generation (MIPS Assembly)
         Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
    So, see ya next time!

    GitHub Account:

    https://github.com/drifter1

    Keep on drifting! ;)
    Authors get paid when people like you upvote their post.
    If you enjoyed what you read here, create your account today and start earning FREE STEEM!
    Sort Order:  

    Thank you for your contribution @drifter1.
    After analyzing your tutorial we suggest the following point:

    • Code sections are better displayed using the code markup, instead of placing them as screenshots.

    Thank you for your work on this contribution, we look forward to your next tutorial.

    Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

    To view those questions and the relevant answers related to your post, click here.


    Need help? Chat with us on Discord.

    [utopian-moderator]

    ·

    Thank you for your review, @portugalcoin! Keep up the good work!

    Hi @drifter1!

    Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
    Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
    Feel free to join our @steem-ua Discord server

    Hey, @drifter1!

    Thanks for contributing on Utopian.
    We’re already looking forward to your next contribution!

    Get higher incentives and support Utopian.io!
    Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

    Want to chat? Join us on Discord https://discord.gg/h52nFrV.

    Vote for Utopian Witness!