Writing a simple Compiler on my own - Generating Code for Array Initializations and String Messages

drifter1 (67)in #utopian-io • 6 years ago

[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will generate Assembly Code for the case of Array Initializations (tweaking the code of the previous article) and also get into how we declare the String Messages, that get printed out using our custom print() function.

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases that include function parameters and assignments
How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
Intermediate Code Representations, including AST's
How we implement an AST (structure and management)
Action Rules for AST nodes and more
The target system's architecture and its assembly language (MIPS)
How we generate Assembly code for various cases

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Array Initializations

Various Fixes and Tweaks

Before we get into Array Initializations, we should first make sure that everything is working properly.

variable rule

First of all, we have to be able to check if an array has been initialized or not. This can be done quite easily by setting the dynamic array "vals"of the symbol table item that corresponds to this array to NULL, when this array is not being filled with values from the beginning. So, inside of the "variable"-rule we have to write:

variable:

    ...

    | ID array
    {
        if(declare == 1){
            $1->st_type = ARRAY_TYPE;
            $1->array_size = $2;
            $1->vals = NULL;
            $$ = $1;
        }		
}
;

array_init rule

Making an example file that initializes arrays, we can also find out that the error message: "Array init at lineno doesn't contain the right amount of values!" gets triggered. This tells us that we don't check the value of the correct sub-rule! More specifically, we took the value of the ID, even though that value gets set later on. So, to fix this, we simply have to check if the value of the "array" sub-rule, that contains the array_size of the currently declared array is equal to the number of elements in the "vals" array, which is "vc". Additionally, we should also set the "st_type" to ARRAY_TYPE, cause else the array is being treated like a variable.

The code of the new "array_init"-rule is:

array_init: ID array ASSIGN LBRACE values RBRACE
{
	// $2 contains array_size
	if($2 != vc){
	    fprintf(stderr, "Array init at %d doesn't contain the right amount of values!\n", lineno);
		exit(1);
	}
	$1->st_type = ARRAY_TYPE;
	$1->vals = vals;
	$1->array_size = $2;
	$$ = $1;
	vc = 0;
}
;

generate_data_declarations() tweaks

Declaring an array in MIPS Assembly, we can also initialize this array directly from .data. To do that, we simply write:

name: .datatype {values}

So, supposing that the "res" double array of full_example.c gets the values: 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, meaning that res[6] = {0.5, 1.5, 2.5, 3.5, 4.5, 5.5}, we can write the following Assembly code:

res: .double 0.500000, 1.500000, 2.500000, 3.500000, 4.500000, 5.500000

Thus, we will use the previously declared ".space" declaration, when an array is not initialized, while using this "new" form, when the entries of the array have been initialized (vals != NULL).

Inside of the code generation function, we simply have to make separate cases for the "initialized" and "not initialized". To print out the values, in the case of array inits, we simply have to loop through the "vals" array, putting ", " after each value.

Here, a diagram that explains what we will be doing:

The datatypes and their sizes are:

The final code is:

else if (l->st_type == ARRAY_TYPE){
	/* not initialized */
	if(l->vals == NULL){
		if (l->inf_type == INT_TYPE){
			fprintf(fp, "%s: .space %d\n", l->st_name, l->array_size*4);
		}
		else if (l->inf_type  == REAL_TYPE){
			fprintf(fp, "%s: .space %d\n", l->st_name, l->array_size*8);
		}
		else if (l->inf_type  == CHAR_TYPE){
			fprintf(fp, "%s: .space %d\n", l->st_name, l->array_size*1);
		}
	}
	/* initialized */
	else{
		if (l->inf_type == INT_TYPE){
			fprintf(fp, "%s: .word ", l->st_name);
			for(j = 0; j < l->array_size - 1; j++){
				fprintf(fp, "%d, ", l->vals[j].ival);
			}
			fprintf(fp, "%d\n", l->vals[l->array_size - 1].ival);
		}
		else if (l->inf_type  == REAL_TYPE){
			fprintf(fp, "%s: .double ", l->st_name);
			for(j = 0; j < l->array_size - 1; j++){
				fprintf(fp, "%f, ", l->vals[j].fval);
			}
			fprintf(fp, "%f\n", l->vals[l->array_size - 1].fval);
		}
		else if (l->inf_type  == CHAR_TYPE){
			fprintf(fp, "%s: .byte ", l->st_name);
			for(j = 0; j < l->array_size - 1; j++){
				fprintf(fp, "%c, ", l->vals[j].cval);
			}
			fprintf(fp, "%c\n", l->vals[l->array_size - 1].cval);
		}
	}
}

String Messages

We don't store the string messages in any point except the AST Node that corresponds to calling parameters of function, supposing a STRING also occurs in the custom print() function. This now has to be changed!

Dynamic String Message Array

The strings that occur through-out the file parsing process will be stored in a new array called "str_messages". This array needs a counter variable "num_of_msg" that will store the number of elements (strings) that have been stored inside of this new array. To add an element to this dynamic array, we will define a new function called "add_to_str_messages(), that we will cover later on.

So, to summarize, we make the following changes to the symbol table files:

symtab.h:

...

static char **str_messages;

...

// String Messages
void add_to_str_messages(char *str);

--------------------------

symtab.c:

...

/* number of messages */
int num_of_msg = 0;

...

/* String Messages */
void add_to_str_messages(char *str){
    ...

}

add_to_str_messages() function

This function is quite simple to write, we just need a separate case for the first insertion, as the array has to be allocated for the first time using "malloc". Afterwards, we will reallocate that space using "realloc". To insert the new string, we have to allocate enough space for that entry in the corresponding entry of that array using "malloc" again. To copy the new string to that newly allocated space, we have to use "strcpy". Lastly, we also have to increase the counter "num_of_msg".

The final code is:

void add_to_str_messages(char *str){
	/* manage space for strings */
	if(num_of_msg == 0){
		str_messages = (char**) malloc(1*sizeof(char*));
	}
	else{
		str_messages = (char**) realloc(str_messages, (num_of_msg + 1)*sizeof(char*));
	}
	/* allocate space for the string */
	str_messages[num_of_msg] = (char*) malloc((strlen(str) + 1) * sizeof(char));

	/* copy string */
	strcpy(str_messages[num_of_msg], str);

	/* increase counter */
	num_of_msg++;
}

lexer.l

This function will be called inside of the lexer, in the {STRING}-rule, and of course before the string gets returned to the parser!

As a diagram we have:

We simply change the code to:

{STRING} 	{ 
			  add_to_str_messages(yytext);
			  yylval.val.sval = malloc(yyleng * sizeof(char));
			  strcpy(yylval.val.sval, yytext);  return STRING;
			}

generate_data_declarations() changes

To generate the code for those strings, which is of the format: "name: .asciiz msg", we of course have to change the generation function of ".data". We basically, just have to loop through the dynamic array "str_messages", printing out the string messages in that format.

So, the code looks like this:

for(i = 0; i < num_of_msg; i++){
	fprintf(fp, "msg%d: .asciiz %s\n", (i + 1), str_messages[i]);
}

Running for tweaked version of full_example.c

Consider the full_example.c file tweaked to initialize the res array with the values that we spoke of before. Running our compiler for that example we will basically get this new "initialized array" declaration, and also message declarations.

The "out.s" looks like this:

.data
c: .byte 'c'
i: .word 0
p: .word 0
val: .double 2.500000
res: .double 0.500000, 1.500000, 2.500000, 3.500000, 4.500000, 5.500000
msg1: .asciiz "\n"
msg2: .asciiz "\n"
msg3: .asciiz "iteration: 3\n"
msg4: .asciiz " "
msg5: .asciiz "\n"

Note, that the code is not optimized, creating the same message ("\n") for each occurrence in the file! Either way, we get all of them, and can work with them much easier, as they are labeled incrementally. The print() function will just have to use msgX as the parameter, when a string gets printed out!

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

General Knowledge and Lexical Analysis

Syntax Analysis

Semantic Analysis (1)

Intermediate Code Generation (AST)

Semantic Analysis (2)

Machine Code Generation

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything as much as I needed to. Next time we will continue with more stuff around Machine Code generation!

Next up on this series, in general, are:

Machine Code generation (MIPS Assembly)
Various Optimizations in the Compiler's Code

Which are all topics for which we will need more than one article to complete them. Also, note that we might also get into AST Code Optimizations later on, or that we could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

#tutorials #busy #programming #compiler

6 years ago in #utopian-io by drifter1 (67)

$16.90

Sort:

Trending

[-]

portugalcoin (72) 6 years ago

Thank you for your contribution @drifter1.
We have been analyzing your tutorial and we suggest the following points:

Your tutorial is very good. Good job again for your contribution!
The images with the flowcharts looked very good in your contribution. The tutorial was more enjoyable to read.
Thanks for following our suggestions in your previous tutorial. This contribution was much more complete and intuitive.

Thank you for your work in developing this tutorial.
Looking forward to your upcoming tutorials.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Chat with us on Discord.

[utopian-moderator]

$7.33

[-]

utopian-io (71) 6 years ago

Thank you for your review, @portugalcoin! Keep up the good work!

$0.00

[-]

steemitboard (66) 6 years ago

Congratulations @drifter1! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :

You published more than 400 posts. Your next target is to reach 450 posts.

_{You can view your badges on your Steem Board and compare to others on the Steem Ranking}
_{If you no longer want to receive notifications, reply to this comment with the word STOP}

To support your work, I also upvoted your post!

Do not miss the last post from @steemitboard:

	The Steem community has lost an epic member! Farewell @woflhart!
	SteemitBoard - Witness Update
	Do not miss the coming Rocky Mountain Steem Meetup and get a new community badge!

You can upvote this notification to help all Steem users. Learn how here!

$0.00

1 vote

[-]

steem-ua (64) 6 years ago

Hi @drifter1!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.094 which ranks you at #9398 across all Steem accounts.
Your rank has improved 47 places in the last three days (old rank 9445).

In our last Algorithmic Curation Round, consisting of 167 contributions, your post is ranked at #151.

Evaluation of your UA score:

You're on the right track, try to gather more followers.
The readers appreciate your great work!
Try to work on user engagement: the more people that interact with you via the comments, the higher your UA score!

Feel free to join our @steem-ua Discord server

$0.00

1 vote

[-]

utopian-io (71) 6 years ago

Hey, @drifter1!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

1 vote

STEEM 0.13

TRX 0.34

JST 0.035

BTC 108074.45

ETH 4399.12

USDT 1.00

SBD 0.83

Writing a simple Compiler on my own - Generating Code for Array Initializations and String Messages

Introduction

Requirements:

Difficulty:

Actual Tutorial Content

Array Initializations

Various Fixes and Tweaks

variable rule

array_init rule

generate_data_declarations() tweaks

String Messages

Dynamic String Message Array

add_to_str_messages() function

lexer.l

generate_data_declarations() changes

Running for tweaked version of full_example.c

RESOURCES

References:

Images:

Previous parts of the series

General Knowledge and Lexical Analysis

Syntax Analysis

Semantic Analysis (1)

Intermediate Code Generation (AST)

Semantic Analysis (2)

Machine Code Generation

Final words | Next up on the project

GitHub Account:

Hi @drifter1!

Evaluation of your UA score:

Coin Marketplace