scc

simple c99 compiler
git clone git://git.simple-cc.org/scc

scc-ir.man


.TH SCC-IR 7 scc\-VERSION
.SH NAME
scc-ir \- scc intermediate representation
.SH DESCRIPTION
The scc intermediate representation (IR) is a text-based format
used to communicate between the compiler frontend
.RB ( cc1 )
and the compiler backend
.RB ( cc2 ).
It is designed to be simple and easily parseable:
all types and operators are represented by one or two characters,
so parsing tables can be used to process it.
.PP
The language is composed of lines representing statements.
Each line is composed of tab-separated fields.
Declarations begin in column 0;
expressions and control flow statements begin with a tab character.
When the frontend detects an error,
it closes the output stream.
.SH TYPES
Types are represented with single characters:
.PP
.TS
l l.
B	bool
C	signed 8-bit integer
K	unsigned 8-bit integer
I	signed 16-bit integer
N	unsigned 16-bit integer
W	signed 32-bit integer
Z	unsigned 32-bit integer
Q	signed 64-bit integer
O	unsigned 64-bit integer
J	float
D	double
H	long double
0	void
P	pointer
F	function
E	function with ellipsis
V	array (vector)
U	union
S	struct
1	\fI__builtin_va_arg\fR
.TE
.PP
Struct, union and array types
.RB ( S ,
.BR U ,
.BR V )
are followed by a numeric identifier
to distinguish between multiple types of the same kind:
.BR S3 ,
.BR V5 ,
.BR U2 .
.PP
The sizes in the table above are nominal.
Actual sizes depend on the target architecture.
For example, on amd64-sysv,
.B int
is 32-bit and uses
.BR W ,
while on z80-scc it is 16-bit and uses
.BR I .
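.PP
Because every type is a single character,
a consumer of the IR can classify types with a table indexed by the character itself.
The following C sketch illustrates this for the integer types only.
It is not taken from cc2:
the names irint, irint_bits() and irint_signed() are hypothetical,
and the widths are the nominal ones from the table above
rather than target-specific sizes.
.PP
.RS
.nf
```c
#include <assert.h>

/* Nominal width and signedness of one IR integer type letter. */
struct irint {
    int bits;       /* nominal width in bits, 0 if not an integer type */
    int is_signed;  /* 1 for signed, 0 for unsigned */
};

/* Parsing table indexed directly by the type letter; letters that are
   not integer types (B, J, D, P and so on) stay zero-initialized. */
static const struct irint irints[128] = {
    ['C'] = {8, 1},  ['K'] = {8, 0},
    ['I'] = {16, 1}, ['N'] = {16, 0},
    ['W'] = {32, 1}, ['Z'] = {32, 0},
    ['Q'] = {64, 1}, ['O'] = {64, 0},
};

/* Return the nominal width of an integer type letter, or 0 if the
   letter is not one of the eight integer types. */
static int
irint_bits(unsigned char letter)
{
    return letter < 128 ? irints[letter].bits : 0;
}

/* Return 1 if an integer type letter is signed, 0 if unsigned. */
static int
irint_signed(unsigned char letter)
{
    assert(irint_bits(letter) != 0);   /* only integer letters allowed */
    return irints[letter].is_signed;
}
```
.fi
.RE
.PP
Indexing by the character keeps classification to a single array access,
which is the kind of parsing table the one-character encoding is meant to allow.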
.SH STORAGE CLASSES
Storage classes are represented with uppercase letters:
.PP
.TS
l l.
A	automatic (local variable)
R	register
G	global (public, defined in this module)
X	extern (declared in another module)
Y	private (file-scope static)
T	local (function-scope static)
M	struct/union member
L	label
.TE
.PP
A variable name in the IR is composed of a storage class letter
followed by a numeric identifier, for example:
.BR A1 ,
.BR G2 ,
.BR T3 ,
.BR L4 .
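.PP
Splitting a name back into its two parts takes a one-character test plus a number conversion.
A minimal C sketch, assuming the numeric identifier is written in decimal
(the examples above do not settle this);
parse_varname() is a hypothetical helper, not part of scc:
.PP
.RS
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split an IR variable name such as "A1" or "L4" into its storage
   class letter and numeric identifier (decimal is assumed).
   Returns 0 on success, -1 if the name is malformed. */
static int
parse_varname(const char *name, char *sclass, unsigned long *id)
{
    char *end;
    unsigned long n;

    assert(name != NULL && sclass != NULL && id != NULL);
    /* Test for the empty string first: strchr() would otherwise match
       the terminating null byte of the letter set. */
    if (name[0] == 0 || strchr("ARGXYTML", name[0]) == NULL)
        return -1;
    /* Require a digit so that the blanks and signs that strtoul()
       accepts are rejected. */
    if (name[1] < '0' || name[1] > '9')
        return -1;
    n = strtoul(name + 1, &end, 10);
    if (*end != 0)
        return -1;
    *sclass = name[0];
    *id = n;
    return 0;
}
```
.fi
.RE
.PP
For example, parse_varname("T3", &c, &id) stores
.B T
in c and 3 in id, while "Q3" is rejected because
.B Q
is a type letter, not a storage class.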
.SH DECLARATIONS
.SS Variable declarations
A variable declaration consists of a variable name,
its type, and its source name prefixed with a double quote
(there is no closing quote):
.PP
.RS
.I var
.B \et
.I type
.B \et \(dq
.I name
.RE
.PP
For example:
.PP
.RS
.nf
A4	W	"i
G2	W	"g
X3	P	"ptr
.fi
.RE
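.PP
Since the fields are tab-separated,
a declaration line can be split in place without copying.
A minimal C sketch under these assumptions:
the line has already been read without its trailing newline,
it has exactly the three fields shown above,
and struct vardecl and split_vardecl() are hypothetical names.
.PP
.RS
.nf
```c
#include <assert.h>
#include <string.h>

enum { TAB = 9 };   /* ASCII horizontal tab, the IR field separator */

/* Fields of one variable declaration line (hypothetical record). */
struct vardecl {
    char *var;    /* variable name, e.g. "A4" */
    char *type;   /* type, e.g. "W" or "V5" */
    char *name;   /* source name without its leading quote, e.g. "i" */
};

/* Split a three-field declaration line in place.  The line must not
   contain a trailing newline.  Returns 0 on success, -1 if the line
   does not have exactly the var, type and quoted-name fields. */
static int
split_vardecl(char *line, struct vardecl *d)
{
    char *t1, *t2;

    assert(line != NULL && d != NULL);
    t1 = strchr(line, TAB);
    if (t1 == NULL)
        return -1;
    t2 = strchr(t1 + 1, TAB);
    if (t2 == NULL || t2[1] != '"')
        return -1;
    if (strchr(t2 + 1, TAB) != NULL)
        return -1;        /* extra field, e.g. an initializer or offset */
    *t1 = 0;              /* terminate the var field */
    *t2 = 0;              /* terminate the type field */
    d->var = line;
    d->type = t1 + 1;
    d->name = t2 + 2;     /* skip the tab and the opening quote */
    return 0;
}
```
.fi
.RE
.PP
Lines with a fourth field,
such as a struct member's offset or the opening parenthesis of an initializer,
are rejected rather than misparsed;
a fuller parser would keep splitting at further tabs.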
.SS Function declarations
Function declarations include the return type
and use
.B F
for the function type
.RB ( E
if the function has an ellipsis parameter):
.PP
.RS
.I var
.B \et
.I return-type
.B \et F \et \(dq
.I name
.RE
.PP
For example:
.PP
.RS
.nf
.ta 8n 16n 24n
G2	W	F	"main
X3	W	E	"printf
Y4	0	F	"helper
.fi
.RE
.PP
.B G
marks a public function,
.B Y
a file-scope static function, and
.B X
an extern declaration.
.SS Function definitions
A function definition starts with the function declaration,
followed by
.B {
on its own line.
Function parameters are declared inside the body.
A
.B \e
(backslash) on its own line separates
parameters from local variable declarations.
The body ends with
.BR } .
.PP
For example, the C source:
.PP
.RS
.nf
int func(int a, int b) {
	int c;
	return a + b;
}
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n
G2	W	F	"func
{
A3	W	"a
A4	W	"b
\e
A6	W	"c
	h	A3	A4	+W
}
.fi
.RE
.SS Struct and union declarations
A struct or union type declaration starts with a header line
containing the type letter and identifier,
the tag name prefixed with a double quote,
a hex-encoded size and a hex-encoded alignment:
.PP
.RS
.I type-id
.B \et \(dq
.I tag
.B \et #
.IR size-letter size
.B \et #
.IR size-letter align
.RE
.PP
Member declarations follow, each including an offset field:
.PP
.RS
.I member-var
.B \et
.I type
.B \et \(dq
.I name
.B \et #
.IR size-letter offset
.RE
.PP
The size, alignment and offset fields are integer constants;
.I size-letter
is their type letter
.RB ( O
in the amd64-sysv example below).
.PP
For example, the C source:
.PP
.RS
.nf
struct point {
	int x;
	int y;
};
struct point p;
.fi
.RE
.PP
generates (on amd64-sysv):
.PP
.RS
.nf
.ta 8n 16n 24n
S3	"point	#O8	#O4
M4	W	"x	#O0
M5	W	"y	#O4
G6	S3	"p
.fi
.RE
.PP
Unions use
.B U
instead of
.BR S .
All members of a union have offset 0.
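.PP
The size, alignment and offset values follow the conventional layout rule
used by common ABIs, including amd64-sysv:
each member is placed at the next multiple of its alignment,
and the total size is rounded up to the largest member alignment.
A minimal C sketch, assuming each member's size and alignment are already known
(4 and 4 for
.B int
on amd64-sysv);
struct member and layout() are hypothetical names, not scc code.
For struct point it yields offsets 0 and 4, size 8 and alignment 4,
matching
.BR #O0 ,
.BR #O4 ,
.B #O8
and
.B #O4
in the example above.
.PP
.RS
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment of one member's type on the target. */
struct member {
    unsigned long size;
    unsigned long align;   /* must be a power of two */
};

/* Round off up to the next multiple of align (a power of two). */
static unsigned long
round_up(unsigned long off, unsigned long align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Compute member offsets plus the total size and alignment of a struct
   (is_union == 0) or a union (is_union != 0).  offsets[] receives one
   entry per member. */
static void
layout(const struct member *m, size_t n, int is_union,
       unsigned long *offsets, unsigned long *size, unsigned long *align)
{
    unsigned long off = 0, maxsize = 0, maxalign = 1;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(m[i].align != 0 && (m[i].align & (m[i].align - 1)) == 0);
        if (m[i].align > maxalign)
            maxalign = m[i].align;
        if (is_union) {
            offsets[i] = 0;                    /* union members start at 0 */
            if (m[i].size > maxsize)
                maxsize = m[i].size;
        } else {
            off = round_up(off, m[i].align);   /* pad to member alignment */
            offsets[i] = off;
            off += m[i].size;
        }
    }
    *align = maxalign;
    /* Trailing padding makes the size a multiple of the alignment. */
    *size = round_up(is_union ? maxsize : off, maxalign);
}
```
.fi
.RE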
.SS Array type declarations
Array types are declared with
.BR V ,
the element type,
and the number of elements in hexadecimal:
.PP
.RS
.nf
.ta 8n 16n 24n
V5	W	#OA
.fi
.RE
.PP
This declares array type V5 with element type
.B W
(signed 32-bit integer) and 0xA (10) elements.
Array variable declarations reference the array type:
.PP
.RS
.nf
.ta 8n 16n 24n
A4	V5	"a
.fi
.RE
.SS Enum declarations
Enumerations are not emitted as types.
Enum variables are emitted with their underlying integer type
(typically
.BR W ):
.PP
.RS
.nf
G7	W	"c
.fi
.RE
.SH INITIALIZERS
When a variable has an initializer,
its declaration line does not end after the name:
a tab and
.B (
are appended to the same line.
The initializer expressions follow,
one per line,
and the initializer is closed with
.B )
on its own line.
.PP
For example:
.PP
.RS
.nf
int g = 42;
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n
G2	W	"g	(
	#W2A
)
.fi
.RE
.PP
Array and struct initializers list each element:
.PP
.RS
.nf
int a[3] = {1, 2, 3};
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n
V3	W	#O3
G2	V3	"a	(
	#W1
	#W2
	#W3
)
.fi
.RE
.PP
String initializers use a quoted form for runs of printable characters
and individual byte constants for non-printable characters,
such as the terminating null byte:
.PP
.RS
.nf
.ta 8n 16n 24n
	#"hello
	#C0
.fi
.RE
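.PP
An encoder for this form only has to split the string into runs.
A minimal C sketch, assuming that
.BR isprint (3)
in the C locale decides what counts as printable,
that a non-printable byte is written as its unsigned value in uppercase hexadecimal,
and that a double quote inside a run needs no escaping;
none of these details is specified above,
and encode_string_init() is a hypothetical name.
The length passed in includes the terminating null byte when it is part of the initializer.
.PP
.RS
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { TAB = 9, NL = 10 };   /* ASCII tab and newline */

/* Write the initializer lines for the len bytes at s into out (of size
   outsz): each run of printable bytes as a #" line, every other byte
   as a #C constant in uppercase hexadecimal.  Returns the number of
   characters written, or -1 if out is too small. */
static int
encode_string_init(const char *s, size_t len, char *out, size_t outsz)
{
    size_t i = 0, pos = 0;
    int n;

    assert(s != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    out[0] = 0;
    while (i < len) {
        unsigned char c = (unsigned char)s[i];

        if (isprint(c)) {
            size_t start = i;

            while (i < len && isprint((unsigned char)s[i]))
                i++;
            n = snprintf(out + pos, outsz - pos, "%c#%c%.*s%c",
                         TAB, '"', (int)(i - start), s + start, NL);
        } else {
            i++;
            n = snprintf(out + pos, outsz - pos, "%c#C%X%c",
                         TAB, (unsigned)c, NL);
        }
        if (n < 0 || (size_t)n >= outsz - pos)
            return -1;   /* output would be truncated */
        pos += (size_t)n;
    }
    return (int)pos;
}
```
.fi
.RE
.PP
Called on "hello" with a length of 6,
it produces the two lines of the example above.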
.SH EXPRESSIONS
Expressions are emitted in reverse Polish notation (RPN),
with tab-separated tokens on a single line.
Every operator is followed by a type letter.
.SS Constants
Constants are introduced with
.BR # ,
followed by a type letter and a hexadecimal value:
.PP
.RS
.nf
#W2A
.fi
.RE
.PP
This represents the integer constant 42 (0x2A) of type
.BR W .
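.PP
Reading a constant token back means taking the type letter
and converting the remaining hexadecimal digits.
A minimal C sketch,
assuming the digits are bare hexadecimal without a sign;
parse_intconst() is a hypothetical helper.
For floating-point constants it returns the raw IEEE 754 bit pattern
rather than a floating-point value.
.PP
.RS
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse an integer constant token such as "#W2A" into its type letter
   and value.  Returns 0 on success, -1 if the token is malformed. */
static int
parse_intconst(const char *tok, char *type, unsigned long long *value)
{
    const char *digits;
    char *end;
    unsigned long long v;

    assert(tok != NULL && type != NULL && value != NULL);
    if (tok[0] != '#' || tok[1] == 0 || tok[1] == '"')
        return -1;            /* not a constant, or a string constant */
    digits = tok + 2;
    /* strtoull() skips blanks and accepts a sign; the IR uses bare
       hex digits, so insist that the value starts with one. */
    if (!isxdigit((unsigned char)digits[0]))
        return -1;
    errno = 0;
    v = strtoull(digits, &end, 16);
    if (errno != 0 || *end != 0)
        return -1;
    *type = tok[1];
    *value = v;
    return 0;
}
```
.fi
.RE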
.PP
Floating-point constants are emitted as the hexadecimal encoding
of their IEEE 754 representation:
.PP
.RS
.nf
#J3FC00000
#D4004000000000000
.fi
.RE
.PP
These represent float 1.5 and double 2.5, respectively.
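.PP
On a host whose float and double are IEEE 754 binary32 and binary64,
the encoding can be produced by copying the value's bytes
into an unsigned integer of the same size.
A minimal C sketch;
format_float() and format_double() are hypothetical names,
and printing without leading zeros, as the integer constants above do,
is an assumption for values whose high-order hex digits are zero.
.PP
.RS
.nf
```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a float as a #J constant, assuming IEEE 754 binary32. */
static void
format_float(float f, char *buf, size_t n)
{
    uint32_t bits;

    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);   /* copy the bit pattern safely */
    snprintf(buf, n, "#J%" PRIX32, bits);
}

/* Format a double as a #D constant, assuming IEEE 754 binary64. */
static void
format_double(double d, char *buf, size_t n)
{
    uint64_t bits;

    assert(sizeof d == sizeof bits);
    memcpy(&bits, &d, sizeof bits);
    snprintf(buf, n, "#D%" PRIX64, bits);
}
```
.fi
.RE
.PP
Using memcpy() rather than a pointer cast avoids violating C's aliasing rules.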
.PP
String constants are emitted using
.B #"
for printable character runs:
.PP
.RS
.nf
#"hello
.fi
.RE
.SS Arithmetic operators
.TS
l l.
+	addition
\-	subtraction
*	multiplication
/	division
%	modulo
l	left shift
r	right shift
.TE
.SS Comparison operators
.TS
l l.
<	less than
>	greater than
[	less or equal
]	greater or equal
\&=	equal
!	not equal
.TE
.SS Bitwise operators
.TS
l l.
&	bitwise and
|	bitwise or
^	bitwise xor
~	bitwise complement (unary)
.TE
.SS Logical operators
.TS
l l.
a	logical and (short-circuit)
o	logical or (short-circuit)
n	logical negation
.TE
.SS Unary operators
.TS
l l.
\&_	arithmetic negation
~	bitwise complement
n	logical negation
\&'	address-of
@	pointer dereference
.TE
.SS Assignment
.TS
l l.
:	assignment
:*	multiply and assign
:/	divide and assign
:%	modulo and assign
:+	add and assign
:\-	subtract and assign
:l	left shift and assign
:r	right shift and assign
:&	bitwise and and assign
:^	bitwise xor and assign
:|	bitwise or and assign
:i	post-increment
:d	post-decrement
.TE
.SS Other operators
.TS
l l.
,	comma
?	ternary (conditional)
\&.	struct/union field access
g	type cast (followed by target type letter)
.TE
.SS Function calls
Function calls use
.B p
to push each argument,
.B c
for the call itself,
and
.B z
for calls to variadic functions.
Each is followed by a type letter:
the type of the pushed argument for
.BR p ,
and the return type of the call for
.B c
and
.BR z :
.PP
.RS
.nf
.ta 8n 16n 24n 32n 40n 48n
	X2	Y9	'P	pP	#W2A	pW	zW
.fi
.RE
.PP
Here
.B X2
is the called function.
The expression takes the address of
.B Y9
.RB ( 'P )
and pushes it as a pointer argument
.RB ( pP ),
pushes the integer constant 42 as an argument
.RB ( pW ),
and calls the variadic function, whose result has type
.B W
.RB ( zW ).
.SS Builtin functions
Builtin function calls use
.B m
as the operator,
preceded by a quoted builtin name:
.PP
.RS
.nf
"__builtin_va_arg	m
.fi
.RE
.SS Expression example
The C expression:
.PP
.RS
.nf
i = j + 2 * 3;
.fi
.RE
.PP
generates (on amd64-sysv):
.PP
.RS
.nf
.ta 8n 16n 24n 32n 40n
	A4	A5	#W6	+W	:W
.fi
.RE
.PP
Note that constant folding has reduced
.I 2*3
to
.IR 6 .
The expression is in RPN:
push A4, push A5, push #W6, add (yielding W), assign (yielding W).
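.PP
The RPN order can be checked by evaluating tokens with a stack:
operands are pushed, and each operator pops its operands and pushes the result.
The following C sketch is not the folding code of cc1.
eval_rpn() is a hypothetical helper that accepts only integer constants
and the operators +, \-, *, &, |, ^, _ and ~,
whose bit-level results do not depend on signedness,
and it computes modulo 2^64 without truncating to the width of each type letter.
.PP
.RS
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

enum { STACKMAX = 32 };

/* Evaluate an RPN token list made of integer constants such as "#W6"
   and the operators + - * & | ^ _ ~, each followed by a type letter.
   Arithmetic wraps modulo 2^64; type widths are ignored.
   Returns 0 and stores the result on success, -1 on bad input. */
static int
eval_rpn(const char *const *tok, size_t ntok, unsigned long long *result)
{
    unsigned long long stack[STACKMAX], a, b;
    size_t sp = 0, i;

    assert(tok != NULL && result != NULL);
    for (i = 0; i < ntok; i++) {
        const char *t = tok[i];

        if (t[0] == '#') {                   /* constant: push it */
            char *end;

            if (t[1] == 0 || t[1] == '"' || sp == STACKMAX ||
                !isxdigit((unsigned char)t[2]))
                return -1;
            stack[sp++] = strtoull(t + 2, &end, 16);
            if (*end != 0)
                return -1;
            continue;
        }
        if (t[0] == 0 || t[1] == 0 || t[2] != 0)
            return -1;                       /* operator + type letter */
        if (t[0] == '_' || t[0] == '~') {    /* unary: replace the top */
            if (sp < 1)
                return -1;
            a = stack[sp - 1];
            stack[sp - 1] = (t[0] == '_') ? 0 - a : ~a;
            continue;
        }
        if (sp < 2)                          /* binary: pop two, push one */
            return -1;
        b = stack[--sp];
        a = stack[--sp];
        switch (t[0]) {
        case '+': a += b; break;
        case '-': a -= b; break;
        case '*': a *= b; break;
        case '&': a &= b; break;
        case '|': a |= b; break;
        case '^': a ^= b; break;
        default:  return -1;                 /* operator not handled here */
        }
        stack[sp++] = a;
    }
    if (sp != 1)
        return -1;
    *result = stack[0];
    return 0;
}
```
.fi
.RE
.PP
For the token list #W2, #W3, *W it returns 6,
the value that 2*3 was folded into above.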
.SS Type casts
Casts are emitted as the operator
.B g
followed by the target type letter.
A cast to
.B void
is not emitted.
For example:
.PP
.RS
.nf
j = (long)i;
.fi
.RE
.PP
generates (on amd64-sysv):
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	A5	A4	gQ	:Q
.fi
.RE
.SH STATEMENTS
.SS Labels
Labels begin in column 0 and consist of
.B L
followed by a numeric identifier:
.PP
.RS
.nf
L3
.fi
.RE
.SS Unconditional jumps
An unconditional jump uses
.B j
followed by a label:
.PP
.RS
.nf
.ta 8n 16n
	j	L3
.fi
.RE
.SS Conditional branches
A conditional branch uses
.BR y ,
followed by a label and,
on the same line,
the expression to evaluate.
If the expression evaluates to true (non-zero), the branch is taken:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	y	L5	A4	#W5	<W
.fi
.RE
.PP
Note that the frontend negates the condition:
the C code
.I "if (i > 5)"
is emitted as a branch on
.I "i <= 5"
(using the
.B [
operator),
jumping past the then-block when the original condition is false.
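.PP
Because every comparison operator has an opposite in the table of comparison operators,
negating a branch condition only requires replacing the operator character.
A minimal C sketch; negate_cmp() is a hypothetical helper,
and the swap is only valid for integer and pointer operands:
when a floating-point operand is a NaN,
both a < b and a >= b are false,
so replacing
.B <
with
.B ]
does not negate the condition.
.PP
.RS
.nf
```c
#include <assert.h>
#include <string.h>

/* Each comparison operator and, at the same index, the operator that
   tests the opposite condition: < and ] (less than / greater or
   equal), > and [ (greater than / less or equal), = and ! (equal /
   not equal). */
static const char cmp_ops[] = "<>[]=!";
static const char neg_ops[] = "][><!=";

/* Negate a comparison token such as ">W" in place, giving "[W".
   Returns 0 on success, -1 if tok is not a comparison token. */
static int
negate_cmp(char *tok)
{
    const char *p;

    assert(tok != NULL);
    if (tok[0] == 0)
        return -1;        /* strchr() would match the terminator */
    p = strchr(cmp_ops, tok[0]);
    if (p == NULL)
        return -1;
    tok[0] = neg_ops[p - cmp_ops];
    return 0;
}
```
.fi
.RE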
.SS Return
The return statement uses
.BR h .
If the function returns a value,
the expression follows on the same line:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	h	A3	A4	+W
.fi
.RE
.PP
A void return is emitted as
.B h
alone, followed by a blank expression line.
.SS Loops
Two markers indicate loop boundaries to the backend:
.PP
.TS
l l.
b	beginning of loop body
e	end of loop body
.TE
.PP
For example, a
.B while
loop:
.PP
.RS
.nf
while (i < 10) { ++i; }
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	j	L5
L4
	b
	A4	#W1	:+W
L5
	e
	y	L4	A4	#WA	<W
L6
.fi
.RE
.SS Switch statements
A switch statement is bracketed by
.B s
(begin) and
.B t
(end).
The
.B s
marker is followed by the switch expression.
Case entries are emitted with
.BR v ,
and the default entry with
.BR f .
The
.B t
marker takes the label where execution continues after the switch.
.PP
For example:
.PP
.RS
.nf
switch (n+1) {
case 1:
case 2:
case 3:
default:
	++n;
}
.fi
.RE
.PP
generates:
.PP
.RS
.nf
.ta 8n 16n 24n 32n
	s	A3	#W1	+W
	v	L6	#W1
L6
	v	L7	#W2
L7
	v	L8	#W3
L8
	f	L9
L9
	A3	#W1	:+W
	t	L5
L5
.fi
.RE
.PP
Each
.B v
entry is followed by a label and a constant value.
The
.B f
(default) entry is followed by a label only.
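.PP
Together, the
.B v
and
.B f
entries describe where control goes for each value of the switch expression.
A minimal C sketch of that selection;
struct swcase, struct swtable and switch_target() are hypothetical names,
labels are stored by their numeric part only,
and label 0 is assumed never to be used so that it can mean "no default".
For the example above
(cases 1, 2 and 3 at L6, L7 and L8, default at L9, end at L5),
any value other than 1, 2 or 3 selects L9.
.PP
.RS
.nf
```c
#include <assert.h>
#include <stddef.h>

/* One v entry: a case constant and the label that starts its code. */
struct swcase {
    long long value;
    unsigned label;      /* numeric part of the L label */
};

/* Summary of one s ... t switch (hypothetical record). */
struct swtable {
    const struct swcase *cases;
    size_t ncases;
    unsigned deflabel;   /* label from f, or 0 if there is no default */
    unsigned endlabel;   /* label taken by t */
};

/* Return the label at which execution continues for switch value x:
   the matching case, else the default, else the end of the switch. */
static unsigned
switch_target(const struct swtable *sw, long long x)
{
    size_t i;

    assert(sw != NULL);
    for (i = 0; i < sw->ncases; i++) {
        if (sw->cases[i].value == x)
            return sw->cases[i].label;
    }
    return sw->deflabel != 0 ? sw->deflabel : sw->endlabel;
}
```
.fi
.RE
.PP
A linear search keeps the sketch short;
a backend may instead emit a jump table or a binary search
when there are many dense or sorted cases.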
.SH SEE ALSO
.BR scc-cc (1)