Own interpreted programming language with prolog. Part 1 - Lexer

in #science9 years ago

lex

I would like to encourage you to dig into this topic. Lexing is very simple process perhaps everyone has done it manualy in elementary school - Birds chirp ---Lexing process ---- (Subject – birds; predicate – chirp)

So what is lexer?

Lexer is a program which changes sequence of characters(source code) into a sequence of lexemes

Alphabet

Every programming language is limited, it has own alphabet known as "reserved keywords". Also programming language may have rules about how to create, define and name variables or any other program element, also operators like + - * / are part of programming language alphabet.

Why we need lexers?

Lexer role is to change characters understood by programmers to characters understood by the parser(what is parser i will explain in part 2). Eg. 2 + 2 may be translated into variableToken(2) plusToken variableToken(2)
Also lexing allows as to find errors like usage of unknown symbols or typos in most cases.

Our Alphabet

Our programming language will be lazy and functional
So our alphabet will consist of following characters/words
{, }, where, ), (, ;, \\(lambda), ->, <, >, <=. >=, =, /=, +, -, *, div, mod
The following conversion will be performed
{ -> tkLBrace
{ -> tkRBrace more you will find in the code
123 -> tkConstructor(123) number/digit constructor
a**** -> tkVariable(a****) new variable
A**** -> tkConstructor(A****) new Atom

Why prolog?

Prolog is great for such tasks as it is declarative programming language. Our lexer has only 68 lines of code and i even didn't make any effort to make it that short.

Talk is cheap show me the code

Here you are!

lekser(Tokens) -->
    white_space,
    (   (          "{",  !, { Token = tkLBrace }
                ;  "}",  !, { Token = tkRBrace }
                    ;  "where", !,{ Token = tkWhere }
                ;  ")",  !, { Token = tkRParen }
                ;  "(",  !, { Token = tkLParen }
                ;  ";",  !, { Token = tkSColon }
                ;  "\\", !, { Token = tkLambda }
                ;  "->", !, { Token = tkImpli  }
                ;  "<",  !, { Token = tkLT     }
                ;  ">",  !, { Token = tkGT     }
                ;  "<=", !, { Token = tkLeq    }
                ;  ">=", !, { Token = tkGeq    }
                ;  "=",  !, { Token = tkAssgn  }
                ;  "/=", !, { Token = tkNeq    }
                ;  "+",  !, { Token = tkPlus   }
                ;  "-",  !, { Token = tkMinus  }
                ;  "*",  !, { Token = tkTimes  }
                ; "div", !, { Token = tkDiv    }
                ; "mod", !, { Token = tkMod    }
                ;  digit(D),  !, number(D, N), { Token = tkKonstruktor(N) }
                        ;  lowletter(L), !, identifier(L, Id),{  Token = tkVariable(Id)}
                    ;  upletter(L), !, identifier(L, Id), { Token = tkKonstruktor(Id) }
      ;   [Un], { Token = tkUnknown, throw((unrecognized_token, Un)) }
      ),
      !,
         { Tokens = [Token | TokList] },
      lekser(TokList)
   ;  [],
         { Tokens = [] }
   ).
comment -->
    "--", !, uncomment.
comment --> [].
uncomment -->
    [Char], { code_type(Char, newline) }, !, white_space.
uncomment -->
    [_], uncomment.
white_space -->
    [Char], { code_type(Char,space) }, !, white_space.
white_space --> comment, !.
white_space -->
    [].
digit(D) -->
   [D],
      { code_type(D, digit) }.
digits([D|T]) -->
   digit(D),
   !,
   digits(T).
digits([]) -->
   [].
number(D, N) -->
   digits(Ds),
      { number_chars(N, [D|Ds]) }.
upletter(L) -->
    [L], { code_type(L, upper) }.
lowletter(L) -->
   [L], { code_type(L, lower) }.
alphanum([A|T]) -->
   [A], { code_type(A, csym) }, !, alphanum(T).
alphanum([]) -->
   [].
identifier(L, Id) -->
   alphanum(As),
      { atom_codes(Id, [L|As]) }.

Part 2

As parser part is a little bit harder to explain(and code) i will post it later. You can download and read about prolog here -> http://www.swi-prolog.org/ there are others implementations of prolog but i prefer swi. In case of any questions, feel free to ask in comments

Sort:  

I'm glad to see a post on Prolog. I used it when I worked in an AI research lab some years ago, for Natural Language Processing. It's well suited for that, as well as processing programming languages as you write about here.

i know prolog can be used i AI related cases, but how? Could you provide some paper about it?

Prolog was big in the '80s but it is still used today. For example, IBM used it in Watson:
https://www.cs.nmsu.edu/ALP/2011/03/natural-language-processing-with-prolog-in-the-ibm-watson-system/

Congratulations! This post has been upvoted from the communal account, @minnowsupport, by Whyduck from the Minnow Support Project. It's a witness project run by aggroed, ausbitbank, teamsteem, theprophet0, someguy123, neoxian, followbtcnews/crimsonclad, and netuoso. The goal is to help Steemit grow by supporting Minnows and creating a social network. Please find us in the Peace, Abundance, and Liberty Network (PALnet) Discord Channel. It's a completely public and open space to all members of the Steemit community who voluntarily choose to be there.

so good..
very very nice...

Coin Marketplace

STEEM 0.04
TRX 0.32
JST 0.082
BTC 59522.21
ETH 1570.73
USDT 1.00
SBD 0.42