ir.md (8892B)
1 # scc intermediate representation # 2 3 The scc IR tries to be be a simple and easily parseable intermediate 4 representation, and it makes it a bit terse and cryptic. The main 5 characteristic of the IR is that all the types and operations are 6 represented with only one letter, so parsing tables can be used 7 to parse it. 8 9 The language is composed of lines, representing statements. 10 Each statement is composed of tab-separated fields. 11 Declaration statements begin in column 0, expressions and 12 control flow begin with a tabulator. 13 When the frontend detects an error, it closes the output stream. 14 15 ## Types ## 16 17 Types are represented with uppercase letters: 18 19 * C -- signed 8-Bit integer 20 * I -- signed 16-Bit integer 21 * W -- signed 32-Bit integer 22 * Q -- signed 64-Bit integer 23 * K -- unsigned 8-Bit integer 24 * N -- unsigned 16-Bit integer 25 * Z -- unsigned 32-Bit integer 26 * O -- unsigned 64-Bit integer 27 * 0 -- void 28 * P -- pointer 29 * F -- function 30 * V -- vector 31 * U -- union 32 * S -- struct 33 * B -- bool 34 * J -- float 35 * D -- double 36 * H -- long double 37 38 This list has been built for the original Z80 backend, where 'int' 39 has the same size as 'short'. Several types (S, F, V, U and others) need 40 an identifier after the type letter for better differentiation 41 between multiple structs, functions, vectors and unions (S1, V12 ...) 42 naturally occuring in a C-program. 43 44 ## Storage classes ## 45 46 The storage classes are represented using uppercase letters: 47 48 * A -- automatic 49 * R -- register 50 * G -- public (global variable declared in the module) 51 * X -- extern (global variable declared in another module) 52 * Y -- private (variable in file-scope) 53 * T -- local (static variable in function-scope) 54 * M -- member (struct/union member) 55 * L -- label 56 57 ## Declarations/definitions ## 58 59 Variable names are composed of a storage class and an identifier 60 (e.g. A1, R2, T3). 61 Declarations and definitions are composed of a variable 62 name, a type and the name of the variable: 63 64 A1 I maxweight 65 R2 C flag 66 A3 S4 statstruct 67 68 ### Type declarations ### 69 70 Some declarations (e.g. structs) involve the declaration of member 71 variables. 72 Struct members are declared normally after the type declaration in 73 parentheses. 74 75 For example the struct declaration 76 77 struct foo { 78 int i; 79 long c; 80 } var1; 81 82 generates 83 84 S2 foo ( 85 M3 I i 86 M4 W c 87 ) 88 G5 S2 var1 89 90 ## Functions ## 91 92 A function prototype 93 94 int printf(char *cmd, int flag, void *data); 95 96 will generate a type declaration and a variable declaration 97 98 F5 P I P 99 X1 F5 printf 100 101 The first line gives the function-type specification 'F' with 102 an identifier '5' and subsequently lists the types of the 103 function parameters. 104 The second line declares the 'printf' function as a publicly 105 scoped variable. 106 107 Analogously, a statically declared function in file scope 108 109 static int printf(char *cmd, int flag, void *data); 110 111 generates 112 113 F5 P I P 114 T1 F5 printf 115 116 Thus, the 'printf' variable went into local scope ('T'). 117 118 A '{' in the first column starts the body of the previously 119 declared function: 120 121 int printf(char *cmd, int flag, void *data) {} 122 123 generates 124 125 F5 P I P 126 G1 F5 printf 127 { 128 A2 P cmd 129 A3 I flag 130 A4 P data 131 - 132 } 133 134 Again, the frontend must ensure that '{' appears only after the 135 declaration of a function. The character '-' marks the separation 136 between parameters and local variables: 137 138 int printf(register char *cmd, int flag, void *data) {int i;}; 139 140 generates 141 142 F5 P I P 143 G1 F5 printf 144 { 145 R2 P cmd 146 A3 I flag 147 A4 P data 148 - 149 A6 I i 150 } 151 152 ### Expressions ### 153 154 Expressions are emitted in reverse polish notation, simplifying 155 parsing and converting into a tree representation. 156 157 #### Operators #### 158 159 Operators allowed in expressions are: 160 161 * \+ -- addition 162 * \- -- substraction 163 * \* -- multiplication 164 * % -- modulo 165 * / -- division 166 * l -- left shift 167 * r -- right shift 168 * < -- less than 169 * > -- greather than 170 * ] -- greather or equal than 171 * [ -- less or equal than 172 * = -- equal than 173 * ! -- different than 174 * & -- bitwise and 175 * | -- bitwise or 176 * ^ -- bitwise xor 177 * ~ -- bitwise complement 178 * : -- asignation 179 * _ -- unary negation 180 * c -- function call 181 * p -- parameter 182 * . -- field 183 * , -- comma operator 184 * ? -- ternary operator 185 * ' -- take address 186 * a -- logical shortcut and 187 * o -- logical shortcut or 188 * @ -- content of pointer 189 190 Assignation has some suboperators: 191 192 * :/ -- divide and assign 193 * :% -- modulo and assign 194 * :+ -- addition and assign 195 * :- -- substraction and assign 196 * :l -- left shift and assign 197 * :r -- right shift and assign 198 * :& -- bitwise and and assign 199 * :^ -- bitwise xor and assign 200 * :| -- bitwise or and assign 201 * :i -- post increment 202 * :d -- post decrement 203 204 Every operator in an expression has a type descriptor. 205 206 #### Constants #### 207 208 Constants are introduced with the character '#'. For instance, 10 is 209 translated to #IA (all constants are emitted in hexadecimal), 210 where I indicates that it is an integer constant. 211 Strings are a special case because they are represented with 212 the " character. 213 The constant "hello" is emitted as "68656C6C6F. For example 214 215 int 216 main(void) 217 { 218 int i, j; 219 220 i = j+2*3; 221 } 222 223 generates 224 225 F1 226 G1 F1 main 227 { 228 - 229 A2 I i 230 A3 I j 231 A2 A3 #I6 +I :I 232 } 233 234 Type casts are expressed with a tuple denoting the 235 type conversion 236 237 int 238 main(void) 239 { 240 int i; 241 long j; 242 243 j = (long)i; 244 } 245 246 generates 247 248 F1 249 G1 F1 main 250 { 251 - 252 A2 I i 253 A3 W j 254 A2 A3 WI :I 255 } 256 257 ### Statements ### 258 #### Jumps ##### 259 260 Jumps have the following form: 261 262 j L# [expression] 263 264 the optional expression field indicates some condition which 265 must be satisfied to jump. Example: 266 267 int 268 main(void) 269 { 270 int i; 271 272 goto label; 273 label: 274 i -= i; 275 } 276 277 generates 278 279 F1 280 G1 F1 main 281 { 282 - 283 A2 I i 284 j L3 285 L3 286 A2 A2 :-I 287 } 288 289 Another form of jump is the return statement, which uses the 290 letter 'y' followed by a type identifier. 291 Depending on the type, an optional expression follows. 292 293 int 294 main(void) 295 { 296 return 16; 297 } 298 299 generates 300 301 F1 302 G1 F1 main 303 { 304 - 305 yI #I10 306 } 307 308 309 #### Loops #### 310 311 There are two special characters that are used to indicate 312 to the backend that the following statements are part of 313 a loop body. 314 315 * b -- beginning of loop 316 * e -- end of loop 317 318 #### Switch statement #### 319 320 Switches are represented using a table, in which the labels 321 where to jump for each case are indicated. Common cases are 322 represented with 'v' and default with 'f'. 323 The switch statement itself is represented with 's' followed 324 by the label where the jump table is located, and the 325 expression of the switch: 326 327 int 328 func(int n) 329 { 330 switch (n+1) { 331 case 1: 332 case 2: 333 case 3: 334 default: 335 ++n; 336 } 337 } 338 339 generates 340 341 F2 I 342 G1 F2 func 343 { 344 A1 I n 345 - 346 s L4 A1 #I1 +I 347 L5 348 L6 349 L7 350 L8 351 A1 #I1 :+I 352 j L3 353 L4 354 t #4 355 v L7 #I3 356 v L6 #I2 357 v L5 #I1 358 f L8 359 L3 360 } 361 362 The beginning of the jump table is indicated by the the letter 't', 363 followed by the number of cases (including default case) of the 364 switch. 365 366 ## Resumen ## 367 368 * C -- signed 8-Bit integer 369 * I -- signed 16-Bit integer 370 * W -- signed 32-Bit integer 371 * O -- signed 64-Bit integer 372 * M -- unsigned 8-Bit integer 373 * N -- unsigned 16-Bit integer 374 * Z -- unsigned 32-Bit integer 375 * Q -- unsigned 64-Bit integer 376 * 0 -- void 377 * P -- pointer 378 * F -- function 379 * V -- vector 380 * U -- union 381 * S -- struct 382 * B -- bool 383 * J -- float 384 * D -- double 385 * H -- long double 386 * A -- automatic 387 * R -- register 388 * G -- public (global variable declared in the module) 389 * X -- extern (global variable declared in another module) 390 * Y -- private (variable in file-scope) 391 * T -- local (static variable in function-scope) 392 * M -- member (struct/union member) 393 * L -- label 394 * { -- beginning of function body 395 * } -- end of function body 396 * \\ -- end of function parameters 397 * \+ -- addition 398 * \- -- substraction 399 * \* -- multiplication 400 * % -- modulo 401 * / -- division 402 * l -- left shift 403 * r -- right shift 404 * < -- less than 405 * > -- greather than 406 * ] -- greather or equal than 407 * [ -- less or equal than 408 * = -- equal than 409 * ! -- different than 410 * & -- bitwise and 411 * | -- bitwise or 412 * ^ -- bitwise xor 413 * ~ -- bitwise complement 414 * : -- asignation 415 * _ -- unary negation 416 * c -- function call 417 * p -- parameter 418 * . -- field 419 * , -- comma operator 420 * ? -- ternary operator 421 * ' -- take address 422 * a -- logical shortcut and 423 * o -- logical shortcut or 424 * @ -- content of pointer 425 * :/ -- divide and assign 426 * :% -- modulo and assign 427 * :+ -- addition and assign 428 * :- -- substraction and assign 429 * :l -- left shift and assign 430 * :r -- right shift and assign 431 * :& -- bitwise and and assign 432 * :^ -- bitwise xor and assign 433 * :| -- bitwise or and assign 434 * ;+ -- post increment 435 * ;- -- post decrement 436 * j -- jump 437 * y -- return 438 * b -- begin of loop 439 * d -- end of loop 440 * s -- switch statement 441 * t -- switch table 442 * v -- case entry in switch table 443 * f -- default entry in switch table