Pogo Devlog (15 Part Series)
1 Compiling Python to Go (Pogo Pt:1)
2 Creating a Target (Pogo Pt: 2)
… 11 more parts…
3 Defining the Tokens (Pogo Pt: 3)
4 Lexing the Source (Pogo Pt: 4)
5 Upgrading Tokens (Pogo Pt: 5)
6 Parsing Python (Pogo Pt: 6)
7 Creating Python’s AST (Pogo Pt:7)
8 Finally, Golang from Python (Pogo Pt:8)
9 Semantic Analysis (Pogo Pt:9)
10 Implementing Functions (Pogo Pt:10)
11 Line Numbers in Errors (Pogo Pt:11)
12 Validating Function Parameters (Pogo Pt:12)
13 Calling Functions (Pogo Pt:13)
14 Returning From Functions (Pogo Pt:14)
15 Compiler Cleanup (Pogo Pt:15)
Intro
In this series I am creating a transpiler from Python to Golang called Pogo. In the last post we created a great bulk of the lexer, and we’re now starting to get ready for the parser.
This Post
The parser turns our slice of tokens into an abstract syntax tree. Each node of the tree will be one token, but our tokens don’t have the capability to be part of a tree, because they don’t have any concept of children. In this case, we will make a new type of upgraded token, and we will call it a Structure
, as it will define the structure of our program.
The Structure Struct
<span>type</span> <span>Structure</span> <span>struct</span> <span>{</span><span>children</span> <span>[]</span><span>Structure</span><span>code</span> <span>int</span><span>text</span> <span>string</span><span>}</span><span>type</span> <span>Structure</span> <span>struct</span> <span>{</span> <span>children</span> <span>[]</span><span>Structure</span> <span>code</span> <span>int</span> <span>text</span> <span>string</span> <span>}</span>type Structure struct { children []Structure code int text string }
Enter fullscreen mode Exit fullscreen mode
This may seem very similar to the Token
struct, if you remember it from one of the previous posts. Another note to remember is that the code
property of the Structure
will not be the same as a Token
‘s code
. This is because there will be more types of Structure
s. For example, there will not only be a Structure
for the keyword if
, but also a Structure
for an entire if
statement. Furthermore, there will be another Structure
for an if
–else
block, which will hold the structure of an if
, as many else if
s as are laid out, and the final else
. In this way we are trying to separate and group sections of our code into nice chunks that are easy and simple to use.
For debugging, I have created a method that can turn a Structure
into a string in a nice readable way. This method is recursive, calling all of the Structure
s children
.
<span>func</span> <span>(</span><span>st</span> <span>Structure</span><span>)</span> <span>stringify</span><span>()</span> <span>string</span> <span>{</span><span>text</span> <span>:=</span> <span>""</span><span>for</span> <span>i</span> <span>:=</span> <span>0</span><span>;</span> <span>i</span> <span><</span> <span>len</span><span>(</span><span>st</span><span>.</span><span>children</span><span>);</span> <span>i</span><span>++</span> <span>{</span><span>text</span> <span>+=</span> <span>"</span><span>\n</span><span>"</span> <span>+</span> <span>st</span><span>.</span><span>children</span><span>[</span><span>i</span><span>]</span><span>.</span><span>stringify</span><span>()</span><span>}</span><span>return</span> <span>st</span><span>.</span><span>text</span> <span>+</span> <span>strings</span><span>.</span><span>ReplaceAll</span><span>(</span><span>text</span><span>,</span> <span>"</span><span>\n</span><span>"</span><span>,</span> <span>"</span><span>\n\t</span><span>"</span><span>)</span><span>}</span><span>func</span> <span>(</span><span>st</span> <span>Structure</span><span>)</span> <span>stringify</span><span>()</span> <span>string</span> <span>{</span> <span>text</span> <span>:=</span> <span>""</span> <span>for</span> <span>i</span> <span>:=</span> <span>0</span><span>;</span> <span>i</span> <span><</span> <span>len</span><span>(</span><span>st</span><span>.</span><span>children</span><span>);</span> <span>i</span><span>++</span> <span>{</span> <span>text</span> <span>+=</span> <span>"</span><span>\n</span><span>"</span> <span>+</span> <span>st</span><span>.</span><span>children</span><span>[</span><span>i</span><span>]</span><span>.</span><span>stringify</span><span>()</span> <span>}</span> <span>return</span> <span>st</span><span>.</span><span>text</span> <span>+</span> <span>strings</span><span>.</span><span>ReplaceAll</span><span>(</span><span>text</span><span>,</span> <span>"</span><span>\n</span><span>"</span><span>,</span> <span>"</span><span>\n\t</span><span>"</span><span>)</span> <span>}</span>func (st Structure) stringify() string { text := "" for i := 0; i < len(st.children); i++ { text += "\n" + st.children[i].stringify() } return st.text + strings.ReplaceAll(text, "\n", "\n\t") }
Enter fullscreen mode Exit fullscreen mode
In this method we add a new line for every child, and using strings.ReplaceAll
we also indent every child. This means that when we have a child of a child, they will be shown with 2 indents in the terminal.
Now we are going to make the same sort of map we made for the tokens.
<span>var</span> <span>structureCode</span> <span>map</span><span>[</span><span>string</span><span>]</span><span>int</span> <span>=</span> <span>map</span><span>[</span><span>string</span><span>]</span><span>int</span><span>{</span><span>// Not implemented</span><span>"ILLEGAL"</span><span>:</span> <span>-</span><span>1</span><span>,</span><span>// Statements</span><span>"ST_IMPORT"</span><span>:</span> <span>0</span><span>,</span><span>"IF_ELSE_BLOCK"</span><span>:</span> <span>1</span><span>,</span><span>"ST_IF"</span><span>:</span> <span>2</span><span>,</span><span>"ST_ELIF"</span><span>:</span> <span>3</span><span>,</span><span>"ST_ELSE"</span><span>:</span> <span>4</span><span>,</span><span>"ST_FOR"</span><span>:</span> <span>5</span><span>,</span><span>"ST_DECLARATION"</span><span>:</span> <span>6</span><span>,</span><span>"ST_MANIPULATION"</span><span>:</span> <span>7</span><span>,</span><span>// Other</span><span>"BLOCK"</span><span>:</span> <span>32</span><span>,</span><span>"CALL"</span><span>:</span> <span>33</span><span>,</span><span>"EXPRESSION"</span><span>:</span> <span>34</span><span>,</span><span>"COMPARISON"</span><span>:</span> <span>35</span><span>,</span><span>"PROGRAM"</span><span>:</span> <span>36</span><span>,</span><span>"ASTERISK"</span><span>:</span> <span>37</span><span>,</span><span>"IDENTIFIER"</span><span>:</span> <span>38</span><span>,</span><span>"NEWLINE"</span><span>:</span> <span>39</span><span>,</span><span>"INDENT"</span><span>:</span> <span>40</span><span>,</span><span>"L_PAREN"</span><span>:</span> <span>41</span><span>,</span><span>"R_PAREN"</span><span>:</span> <span>42</span><span>,</span><span>"L_BLOCK"</span><span>:</span> <span>43</span><span>,</span><span>"R_BLOCK"</span><span>:</span> <span>44</span><span>,</span><span>"L_SQUIRLY"</span><span>:</span> <span>45</span><span>,</span><span>"R_SQUIRLY"</span><span>:</span> <span>46</span><span>,</span><span>"SEP"</span><span>:</span> <span>47</span><span>,</span><span>"COLON"</span><span>:</span> <span>48</span><span>,</span><span>"ANTI_COLON"</span><span>:</span> <span>49</span><span>,</span><span>"ASSIGN"</span><span>:</span> <span>50</span><span>,</span><span>"UNDETERMINED"</span><span>:</span> <span>51</span><span>,</span><span>"COMMENT_ONE"</span><span>:</span> <span>52</span><span>,</span><span>// Keywords</span><span>"K_IMPORT"</span><span>:</span> <span>64</span><span>,</span><span>"K_FROM"</span><span>:</span> <span>65</span><span>,</span><span>"K_FOR"</span><span>:</span> <span>66</span><span>,</span><span>"K_IN"</span><span>:</span> <span>67</span><span>,</span><span>"K_IF"</span><span>:</span> <span>68</span><span>,</span><span>"K_ELIF"</span><span>:</span> <span>69</span><span>,</span><span>"K_ELSE"</span><span>:</span> <span>70</span><span>,</span><span>// In-built functions</span><span>"IB_PRINT"</span><span>:</span> <span>96</span><span>,</span><span>"IB_RANGE"</span><span>:</span> <span>97</span><span>,</span><span>// Bool operands</span><span>"BO_NOT"</span><span>:</span> <span>128</span><span>,</span><span>"BO_AND"</span><span>:</span> <span>129</span><span>,</span><span>"BO_OR"</span><span>:</span> <span>130</span><span>,</span><span>// Math operands</span><span>"MO_PLUS"</span><span>:</span> <span>160</span><span>,</span><span>"MO_SUB"</span><span>:</span> <span>161</span><span>,</span><span>"MO_MUL"</span><span>:</span> <span>162</span><span>,</span><span>"MO_DIV"</span><span>:</span> <span>163</span><span>,</span><span>"MO_MODULO"</span><span>:</span> <span>164</span><span>,</span><span>// Literal</span><span>"L_BOOL"</span><span>:</span> <span>192</span><span>,</span><span>"L_INT"</span><span>:</span> <span>193</span><span>,</span><span>"L_STRING"</span><span>:</span> <span>194</span><span>,</span><span>// Comparison operands</span><span>"CO_EQUALS"</span><span>:</span> <span>224</span><span>,</span><span>"CO_NOT_EQUALS"</span><span>:</span> <span>225</span><span>,</span><span>"CO_GT"</span><span>:</span> <span>226</span><span>,</span><span>"CO_GT_EQUALS"</span><span>:</span> <span>227</span><span>,</span><span>"CO_LT"</span><span>:</span> <span>228</span><span>,</span><span>"CO_LT_EQUALS"</span><span>:</span> <span>229</span><span>,</span><span>}</span><span>var</span> <span>structureCode</span> <span>map</span><span>[</span><span>string</span><span>]</span><span>int</span> <span>=</span> <span>map</span><span>[</span><span>string</span><span>]</span><span>int</span><span>{</span> <span>// Not implemented</span> <span>"ILLEGAL"</span><span>:</span> <span>-</span><span>1</span><span>,</span> <span>// Statements</span> <span>"ST_IMPORT"</span><span>:</span> <span>0</span><span>,</span> <span>"IF_ELSE_BLOCK"</span><span>:</span> <span>1</span><span>,</span> <span>"ST_IF"</span><span>:</span> <span>2</span><span>,</span> <span>"ST_ELIF"</span><span>:</span> <span>3</span><span>,</span> <span>"ST_ELSE"</span><span>:</span> <span>4</span><span>,</span> <span>"ST_FOR"</span><span>:</span> <span>5</span><span>,</span> <span>"ST_DECLARATION"</span><span>:</span> <span>6</span><span>,</span> <span>"ST_MANIPULATION"</span><span>:</span> <span>7</span><span>,</span> <span>// Other</span> <span>"BLOCK"</span><span>:</span> <span>32</span><span>,</span> <span>"CALL"</span><span>:</span> <span>33</span><span>,</span> <span>"EXPRESSION"</span><span>:</span> <span>34</span><span>,</span> <span>"COMPARISON"</span><span>:</span> <span>35</span><span>,</span> <span>"PROGRAM"</span><span>:</span> <span>36</span><span>,</span> <span>"ASTERISK"</span><span>:</span> <span>37</span><span>,</span> <span>"IDENTIFIER"</span><span>:</span> <span>38</span><span>,</span> <span>"NEWLINE"</span><span>:</span> <span>39</span><span>,</span> <span>"INDENT"</span><span>:</span> <span>40</span><span>,</span> <span>"L_PAREN"</span><span>:</span> <span>41</span><span>,</span> <span>"R_PAREN"</span><span>:</span> <span>42</span><span>,</span> <span>"L_BLOCK"</span><span>:</span> <span>43</span><span>,</span> <span>"R_BLOCK"</span><span>:</span> <span>44</span><span>,</span> <span>"L_SQUIRLY"</span><span>:</span> <span>45</span><span>,</span> <span>"R_SQUIRLY"</span><span>:</span> <span>46</span><span>,</span> <span>"SEP"</span><span>:</span> <span>47</span><span>,</span> <span>"COLON"</span><span>:</span> <span>48</span><span>,</span> <span>"ANTI_COLON"</span><span>:</span> <span>49</span><span>,</span> <span>"ASSIGN"</span><span>:</span> <span>50</span><span>,</span> <span>"UNDETERMINED"</span><span>:</span> <span>51</span><span>,</span> <span>"COMMENT_ONE"</span><span>:</span> <span>52</span><span>,</span> <span>// Keywords</span> <span>"K_IMPORT"</span><span>:</span> <span>64</span><span>,</span> <span>"K_FROM"</span><span>:</span> <span>65</span><span>,</span> <span>"K_FOR"</span><span>:</span> <span>66</span><span>,</span> <span>"K_IN"</span><span>:</span> <span>67</span><span>,</span> <span>"K_IF"</span><span>:</span> <span>68</span><span>,</span> <span>"K_ELIF"</span><span>:</span> <span>69</span><span>,</span> <span>"K_ELSE"</span><span>:</span> <span>70</span><span>,</span> <span>// In-built functions</span> <span>"IB_PRINT"</span><span>:</span> <span>96</span><span>,</span> <span>"IB_RANGE"</span><span>:</span> <span>97</span><span>,</span> <span>// Bool operands</span> <span>"BO_NOT"</span><span>:</span> <span>128</span><span>,</span> <span>"BO_AND"</span><span>:</span> <span>129</span><span>,</span> <span>"BO_OR"</span><span>:</span> <span>130</span><span>,</span> <span>// Math operands</span> <span>"MO_PLUS"</span><span>:</span> <span>160</span><span>,</span> <span>"MO_SUB"</span><span>:</span> <span>161</span><span>,</span> <span>"MO_MUL"</span><span>:</span> <span>162</span><span>,</span> <span>"MO_DIV"</span><span>:</span> <span>163</span><span>,</span> <span>"MO_MODULO"</span><span>:</span> <span>164</span><span>,</span> <span>// Literal</span> <span>"L_BOOL"</span><span>:</span> <span>192</span><span>,</span> <span>"L_INT"</span><span>:</span> <span>193</span><span>,</span> <span>"L_STRING"</span><span>:</span> <span>194</span><span>,</span> <span>// Comparison operands</span> <span>"CO_EQUALS"</span><span>:</span> <span>224</span><span>,</span> <span>"CO_NOT_EQUALS"</span><span>:</span> <span>225</span><span>,</span> <span>"CO_GT"</span><span>:</span> <span>226</span><span>,</span> <span>"CO_GT_EQUALS"</span><span>:</span> <span>227</span><span>,</span> <span>"CO_LT"</span><span>:</span> <span>228</span><span>,</span> <span>"CO_LT_EQUALS"</span><span>:</span> <span>229</span><span>,</span> <span>}</span>var structureCode map[string]int = map[string]int{ // Not implemented "ILLEGAL": -1, // Statements "ST_IMPORT": 0, "IF_ELSE_BLOCK": 1, "ST_IF": 2, "ST_ELIF": 3, "ST_ELSE": 4, "ST_FOR": 5, "ST_DECLARATION": 6, "ST_MANIPULATION": 7, // Other "BLOCK": 32, "CALL": 33, "EXPRESSION": 34, "COMPARISON": 35, "PROGRAM": 36, "ASTERISK": 37, "IDENTIFIER": 38, "NEWLINE": 39, "INDENT": 40, "L_PAREN": 41, "R_PAREN": 42, "L_BLOCK": 43, "R_BLOCK": 44, "L_SQUIRLY": 45, "R_SQUIRLY": 46, "SEP": 47, "COLON": 48, "ANTI_COLON": 49, "ASSIGN": 50, "UNDETERMINED": 51, "COMMENT_ONE": 52, // Keywords "K_IMPORT": 64, "K_FROM": 65, "K_FOR": 66, "K_IN": 67, "K_IF": 68, "K_ELIF": 69, "K_ELSE": 70, // In-built functions "IB_PRINT": 96, "IB_RANGE": 97, // Bool operands "BO_NOT": 128, "BO_AND": 129, "BO_OR": 130, // Math operands "MO_PLUS": 160, "MO_SUB": 161, "MO_MUL": 162, "MO_DIV": 163, "MO_MODULO": 164, // Literal "L_BOOL": 192, "L_INT": 193, "L_STRING": 194, // Comparison operands "CO_EQUALS": 224, "CO_NOT_EQUALS": 225, "CO_GT": 226, "CO_GT_EQUALS": 227, "CO_LT": 228, "CO_LT_EQUALS": 229, }
Enter fullscreen mode Exit fullscreen mode
This may look very familiar, but with the slight difference of the statement section.
Why?
Token
s and Structure
s are very similar, so why didn’t I just use one?
Firstly, it separates the 2 processes of the lexer and the parser very well
Secondly, I didn’t want to have to give each Token
children
. This is inefficient both space-wise and takes up more space when initializing these Tokens
s in the compiler’s source code.
Next
Now that we have created Structure
s, we can start work on our parser. The parser will turn our linear form of Token
s into an abstract syntax tree. This is the last step we will do before we start emitting code, which creates the final Go code as output, completing the project in the most minimal way.
Pogo Devlog (15 Part Series)
1 Compiling Python to Go (Pogo Pt:1)
2 Creating a Target (Pogo Pt: 2)
… 11 more parts…
3 Defining the Tokens (Pogo Pt: 3)
4 Lexing the Source (Pogo Pt: 4)
5 Upgrading Tokens (Pogo Pt: 5)
6 Parsing Python (Pogo Pt: 6)
7 Creating Python’s AST (Pogo Pt:7)
8 Finally, Golang from Python (Pogo Pt:8)
9 Semantic Analysis (Pogo Pt:9)
10 Implementing Functions (Pogo Pt:10)
11 Line Numbers in Errors (Pogo Pt:11)
12 Validating Function Parameters (Pogo Pt:12)
13 Calling Functions (Pogo Pt:13)
14 Returning From Functions (Pogo Pt:14)
15 Compiler Cleanup (Pogo Pt:15)
暂无评论内容