.TH SCC-IR 7 scc\-VERSION
.SH NAME
scc-ir \- scc intermediate representation
.SH DESCRIPTION
The scc intermediate representation (IR) is a text-based format
used to communicate between the compiler frontend
.RB ( cc1 )
and the compiler backend
.RB ( cc2 ).
It is designed to be simple and easily parseable:
all types and operators are represented by one or two characters,
so parsing tables can be used to process it.
.PP
The language is composed of lines representing statements.
Each line is composed of tab-separated fields.
Declarations begin in column 0;
expressions and control flow statements begin with a tab character.
When the frontend detects an error,
it closes the output stream.
.SH TYPES
Types are represented with single characters:
.PP
.TS
l l.
B	bool
C	signed 8-bit integer
K	unsigned 8-bit integer
I	signed 16-bit integer
N	unsigned 16-bit integer
W	signed 32-bit integer
Z	unsigned 32-bit integer
Q	signed 64-bit integer
O	unsigned 64-bit integer
J	float
D	double
H	long double
0	void
P	pointer
F	function
E	function with ellipsis
V	array (vector)
U	union
S	struct
1	\fI__builtin_va_arg\fR
.TE
.PP
Aggregate and composed types
.RB ( S ,
.BR U ,
.BR V )
are followed by a numeric identifier
to distinguish between multiple types of the same kind:
.BR S3 ,
.BR V5 ,
.BR U2 .
.PP
The sizes in the table above are nominal.
Actual sizes depend on the target architecture.
For example, on amd64-sysv,
.B int
is 32-bit and uses
.BR W ,
while on z80-scc it is 16-bit and uses
.BR I .
.SH STORAGE CLASSES
Storage classes are represented with uppercase letters:
.PP
.TS
l l.
A	automatic (local variable)
R	register
G	global (public, defined in this module)
X	extern (declared in another module)
Y	private (file-scope static)
T	local (function-scope static)
M	struct/union member
L	label
.TE
.PP
A variable name in the IR is composed of a storage class letter
followed by a numeric identifier, for example:
.BR A1 ,
.BR G2 ,
.BR T3 ,
.BR L4 .
.SH DECLARATIONS
.SS Variable declarations
A variable declaration consists of a variable name,
its type, and a quoted source name:
.PP
.RS
.I var
.B \et
.I type
.B \et "
.I name
.RE
.PP
For example:
.PP
.RS
.nf
A4	W	"i
G2	W	"g
X3	P	"ptr
.fi
.RE
.SS Function declarations
Function declarations include the return type
and use
.B F
for the function type
.RB ( E
if the function has an ellipsis parameter):
.PP
.RS
.I var
.B \et
.I return-type
.B \et F \et "
.I name
.RE
.PP
For example:
.PP
.RS
.nf
.ta 8n 16n 24n
G2	W	F	"main
X3	W	E	"printf
T4	0	F	"helper
.fi
.RE
.PP
.B G
marks a public function,
.B T
a file-scope static function, and
.B X
an extern declaration.
.SS Function definitions
A function definition starts with the function declaration,
followed by
.B {
on its own line.
Function parameters are declared inside the body.
A
.B \e
(backslash) on its own line separates
parameters from local variable declarations.
The body ends with
.BR } .
.PP
For example, the C source:
.PP
.RS
.nf
int func(int a, int b) {
	int c;
	return a + b;
}
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n
G2	W	F	"func
{
A3	W	"a
A4	W	"b
\e
A6	W	"c
	h	A3	A4	+W
}
.fi
.RE
.SS Struct and union declarations
A struct or union type declaration starts with a header line
containing the type letter and identifier,
a quoted tag name,
a hex-encoded size and a hex-encoded alignment:
.PP
.RS
.I type-id
.B \et "
.I tag
.B \et #
.IR size-letter size
.B \et #
.IR size-letter align
.RE
.PP
Member declarations follow, each including an offset field:
.PP
.RS
.I member-var
.B \et
.I type
.B \et "
.I name
.B \et #
.IR size-letter offset
.RE
.PP
For example, the C source:
.PP
.RS
.nf
struct point {
	int x;
	int y;
};
struct point p;
.fi
.RE
.PP
generates (on amd64-sysv):
.PP
.RS
.nf
.ta 8n 16n 24n
S3	"point	#O8	#O4
M4	W	"x	#O0
M5	W	"y	#O4
G6	S3	"p
.fi
.RE
.PP
Unions use
.B U
instead of
.BR S .
Members of a union typically share offset 0.
.SS Array type declarations
Array types are declared with
.BR V ,
the element type,
and the number of elements in hexadecimal:
.PP
.RS
.nf
.ta 8n 16n 24n
V5	W	#OA
.fi
.RE
.PP
This declares array type V5 with element type
.B W
(signed 32-bit integer) and 0xA (10) elements.
Array variable declarations reference the array type:
.PP
.RS
.nf
.ta 8n 16n 24n
A4	V5	"a
.fi
.RE
.SS Enum declarations
Enumerations are not emitted as types.
Enum variables are emitted with their underlying integer type
(typically
.BR W ):
.PP
.RS
.nf
G7	W	"c
.fi
.RE
.SH INITIALIZERS
When a variable has an initializer,
the declaration line ends without a newline and is followed by
.B (
on the same line.
The initializer expressions follow,
one per line,
and the initializer is closed with
.B )
on its own line.
.PP
For example:
.PP
.RS
.nf
int g = 42;
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n
G2	W	"g	(
	#W2A
)
.fi
.RE
.PP
Array and struct initializers list each element:
.PP
.RS
.nf
int a[3] = {1, 2, 3};
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n
V3	W	#O3
G2	V3	"a	(
	#W1
	#W2
	#W3
)
.fi
.RE
.PP
String initializers use a quoted form for printable runs
and individual byte constants for non-printable characters:
.PP
.RS
.nf
.ta 8n 16n 24n
	#"hello
	#C0
.fi
.RE
.SH EXPRESSIONS
Expressions are emitted in reverse Polish notation (RPN),
with tab-separated tokens on a single line.
Every operator is followed by a type letter.
.SS Constants
Constants are introduced with
.BR # ,
followed by a type letter and a hexadecimal value:
.PP
.RS
.nf
#W2A
.fi
.RE
.PP
This represents the integer constant 42 (0x2A) of type
.BR W .
.PP
Floating-point constants are emitted as the hexadecimal encoding
of their IEEE 754 representation:
.PP
.RS
.nf
#J3FC00000
#D4004000000000000
.fi
.RE
.PP
These represent float 1.5 and double 2.5, respectively.
.PP
String constants are emitted using
.B #"
for printable character runs:
.PP
.RS
.nf
#"hello
.fi
.RE
.SS Arithmetic operators
.TS
l l.
+	addition
\-	subtraction
*	multiplication
/	division
%	modulo
l	left shift
r	right shift
.TE
.SS Comparison operators
.TS
l l.
<	less than
>	greater than
[	less or equal
]	greater or equal
\&=	equal
!	not equal
.TE
.SS Bitwise operators
.TS
l l.
&	bitwise and
|	bitwise or
^	bitwise xor
~	bitwise complement (unary)
.TE
.SS Logical operators
.TS
l l.
a	logical and (short-circuit)
o	logical or (short-circuit)
n	logical negation
.TE
.SS Unary operators
.TS
l l.
\&_	arithmetic negation
~	bitwise complement
n	logical negation
\&'	address-of
@	pointer dereference
.TE
.SS Assignment
.TS
l l.
:	assignment
:*	multiply and assign
:/	divide and assign
:%	modulo and assign
:+	add and assign
:\-	subtract and assign
:l	left shift and assign
:r	right shift and assign
:&	bitwise and and assign
:^	bitwise xor and assign
:|	bitwise or and assign
:i	post-increment
:d	post-decrement
.TE
.SS Other operators
.TS
l l.
,	comma
?	ternary (conditional)
\&.	struct/union field access
g	type cast (followed by target type letter)
.TE
.SS Function calls
Function calls use
.B p
to push each argument,
.B c
for the call itself,
and
.B z
for calls to variadic functions.
Each is followed by the type of the result:
.PP
.RS
.nf
.ta 8n 16n 24n 32n 40n 48n
	X2	Y9	'P	pP	#W2A	pW	zW
.fi
.RE
.PP
This pushes a pointer argument
.RB ( pP ),
pushes an integer argument
.RB ( pW ),
and calls a variadic function returning
.BR W
.RB ( zW ).
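.PP
As an illustration of how a consumer might walk the reverse Polish form
used for expressions, the following C sketch evaluates an expression line
that contains only integer constants and a few binary operators.
It is not taken from scc:
the function name
.B evalrpn
and the stack limit are hypothetical,
the type letter after each operator is checked for presence but otherwise ignored,
and all arithmetic is done on
.BR "unsigned long long" ,
so results are not truncated to the width of the IR type.
Any token it does not handle (variables, strings, assignments, calls)
makes it fail.
.PP
.RS
.nf
```c
#include <stdlib.h>
#include <string.h>

#define TAB 9		/* ASCII horizontal tab, the IR field separator */
#define MAXSTACK 32	/* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper, not part of scc: evaluate one expression line
 * made only of integer constants and binary operators.
 * Stores the value in *res and returns 0, or returns -1 on any
 * token it does not handle.
 */
static int
evalrpn(const char *line, unsigned long long *res)
{
	char buf[256], sep[2] = { TAB, 0 };
	unsigned long long stack[MAXSTACK], a, b;
	char *tok;
	int sp = 0;

	if (strlen(line) >= sizeof(buf))
		return -1;
	strcpy(buf, line);

	/* strtok skips empty fields, so the leading tab is ignored */
	for (tok = strtok(buf, sep); tok; tok = strtok(NULL, sep)) {
		if (tok[0] == '#') {
			/* constant: '#', a type letter, then hex digits */
			if (tok[1] == 0 || tok[1] == '"' || tok[2] == 0)
				return -1;
			if (sp == MAXSTACK)
				return -1;
			stack[sp++] = strtoull(tok + 2, NULL, 16);
			continue;
		}
		/* binary operator: one character plus a type letter */
		if (sp < 2 || tok[1] == 0 || tok[2] != 0)
			return -1;
		b = stack[--sp];
		a = stack[--sp];
		switch (tok[0]) {
		case '+': a += b; break;
		case '-': a -= b; break;
		case '*': a *= b; break;
		case '&': a &= b; break;
		case '|': a |= b; break;
		case '^': a ^= b; break;
		case '/':
			if (b == 0)
				return -1;
			a /= b;
			break;
		case '%':
			if (b == 0)
				return -1;
			a %= b;
			break;
		default:
			return -1;
		}
		stack[sp++] = a;
	}
	if (sp != 1)
		return -1;
	*res = stack[0];
	return 0;
}
```
.fi
.RE
.PP
Given a line holding the tab-separated tokens
.BR #W2 ,
.BR #W3 ,
.BR *W ,
.B #W4
and
.BR +W ,
the sketch stores 10 in the result and returns 0.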
.SS Builtin functions
Builtin function calls use
.B m
as the operator,
preceded by a quoted builtin name:
.PP
.RS
.nf
"__builtin_va_arg	m
.fi
.RE
.SS Expression example
The C expression:
.PP
.RS
.nf
i = j + 2 * 3;
.fi
.RE
.PP
generates (on amd64-sysv):
.PP
.RS
.nf
.ta 8n 16n 24n 32n 40n
	A4	A5	#W6	+W	:W
.fi
.RE
.PP
Note that constant folding has reduced
.I 2*3
to
.IR 6 .
The expression is in RPN:
push A4, push A5, push #W6, add (yielding W), assign (yielding W).
.SS Type casts
Casts are emitted as the operator
.B g
followed by the target type letter.
A cast to
.B void
is not emitted.
For example:
.PP
.RS
.nf
j = (long)i;
.fi
.RE
.PP
generates (on amd64-sysv):
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	A5	A4	gQ	:Q
.fi
.RE
.SH STATEMENTS
.SS Labels
Labels begin in column 0 and consist of
.B L
followed by a numeric identifier:
.PP
.RS
.nf
L3
.fi
.RE
.SS Unconditional jumps
An unconditional jump uses
.B j
followed by a label:
.PP
.RS
.nf
.ta 8n 16n
	j	L3
.fi
.RE
.SS Conditional branches
A conditional branch uses
.BR y ,
followed by a label.
The expression to evaluate follows on the same line.
If the expression evaluates to true (non-zero), the branch is taken:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	y	L5	A4	#W5	<W
.fi
.RE
.PP
Note that the frontend negates the condition:
the C code
.I "if (i > 5)"
is emitted as a branch on
.IR "i <= 5" ,
jumping past the then-block when the original condition is false.
.SS Return
The return statement uses
.BR h .
If the function returns a value,
the expression follows on the same line:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	h	A3	A4	+W
.fi
.RE
.PP
A void return is emitted as
.B h
alone, followed by a blank expression line.
.SS Loops
Two markers indicate loop boundaries to the backend:
.PP
.TS
l l.
b	beginning of loop body
e	end of loop body
.TE
.PP
For example, a
.B while
loop:
.PP
.RS
.nf
while (i < 10) { ++i; }
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	j	L5
L4
	b
	A4	#W1	:+W
L5
	e
	y	L4	A4	#WA	<W
L6
.fi
.RE
.SS Switch statements
A switch statement is bracketed by
.B s
(begin) and
.B t
(end).
The
.B s
marker is followed by the switch expression.
Case entries are emitted with
.BR v ,
and the default entry with
.BR f .
The
.B t
marker takes the label where execution continues after the switch.
.PP
For example:
.PP
.RS
.nf
switch (n+1) {
case 1:
case 2:
case 3:
default:
	++n;
}
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	s	A3	#W1	+W
	v	L6	#W1
L6
	v	L7	#W2
L7
	v	L8	#W3
L8
	f	L9
L9
	A3	#W1	:+W
	t	L5
L5
.fi
.RE
.PP
Each
.B v
entry is followed by a label and a constant value.
The
.B f
(default) entry is followed by a label only.
.SH SEE ALSO
.BR scc-cc (1)