SSD is a simple meta language for defining the syntax of programming languages. The advantages of SSD as a notation are that it
SSD has been used as the input language of the altgen parser generator.
The best way to introduce SSD is with an example. Here we specify the concrete syntax of a small programming language featuring variables and functions. All expressions are numeric; no distinction is made between integers and floats. There is a unary negation operator and the usual four binary operators (+, −, *, and /).
ID -> \p{L}[\p{L}\d_]* NUMLIT -> \d+(\.\d+)?([Ee][+-]?\d+)? STRLIT -> "([^\p{Cc}]|\\[^\p{Cc}])*" SKIP -> [\p{Zs}\t\n\r]|//.* PROGRAM => BLOCK BLOCK => (DEC ";")* (STMT ";")+ DEC => "var" ID ("=" EXP)? | "fun" ID "(" IDLIST? ")" "=" EXP STMT => ID "=" EXP | "read" IDLIST | "write" EXPLIST | "while" EXP "do" BLOCK "end" IDLIST => ID ("," ID)* EXPLIST => EXP ("," EXP)* EXP => TERM (("+" | "-") TERM)* TERM => FACTOR (("*" | "/") FACTOR)* FACTOR => NUMLIT | STRLIT | ID | CALL | "(" EXP ")" CALL => ID "(" EXPLIST? ")"
Note the following
Note how the token set for the language defined in SSD is inferred from both the microsyntax rules and the appearance of token literals in the macrosyntax. It is not necessary to create a list of tokens. Tokens that appear directly in the macrosyntax will override those defined with regexes in the microsyntax; in the example above, "while" is its own kind of token, and not an ID because it was given literally in the macrosyntax.
Compiler generators using SSD can easily manufacture scanners (lexical analyzers) by directly using the SSD regexes. For languages with complex token formation rules, SSD allows a kind of preprocessing of regexes. For example
ID -> \p{L}[\p{L}\d_]* NUMLIT -> \d+(\.\d+)?([Ee][+-]?\d+)? HEX -> [0-0a-fA-F] ESCAPE -> \\[nt'"\\]|\\#HEX#{1,8}; CHAR -> [^\p{Cc}'"\\]|#ESCAPE# CHARLIT -> '(#CHAR#|")' STRINGLIT -> "(#CHAR#|')" SKIP -> [\x20\x09\x0A\x0D]|//.*
Here HEX, ESCAPE, and CHAR, are not tokens, but rather macro-like regex fragments that are interpolated into other token rules. SSD does not allow any recursion in the use of these fragments.
SSD supports the definition of abstract syntax trees as well. TODO!
The following sketches of SSD are attempts at describing various existing and experimental languages. They are not substitutes for official specifications and may contain bugs; they are provided as-is without warranties, etc.
Iki is a very small language. Here is a complete syntactic description, including the abstract syntax, using the original SSD notation:
The most recent version of SSD makes three changes to the original:
ID -> \p{L}[\p{L}\d_]* NUMLIT -> \d+(\.\d+)?([Ee][+-]?\d+)? STRLIT -> "([^\p{Cc}]|\\[^\p{Cc}])*" SKIP -> [\p{Zs}\t\n\r]|//.* PROGRAM => BLOCK {Program block} BLOCK => (decs::DEC ";")* (stmts::STMT ";")+ {Block decs stmts} DEC => "var" ID ("=" EXP)? {Var id exp} | "fun" ID "(" params:IDLIST? ")" "=" EXP {Fun id params exp} STMT => ID "=" EXP {Assign id exp} | "read" ids:IDLIST {Read ids} | "write" EXPLIST {Write explist} | "while" EXP "do" BLOCK "end" {While exp block} IDLIST => id::ID ("," id::ID)* EXPLIST => exp::EXP ("," exp::EXP)* EXP => t1:TERM (o:[+-] t2:TERM t1:{Bin o t1 t2})* TERM => f1:FACTOR (o:[*/] f2:FACTOR f1:{Bin o f1 f2})* FACTOR => value:NUMLIT {Numlit value} | value:STRLIT {Strlit value} | ID {Ref id} | CALL | "(" EXP ")" ^exp CALL => ID "(" args:EXPLIST? ")" {Call id args}
Because of the implicit naming of fields, we have had to introduce an ^-expression. This expression isn't parsed; it exists to make the "last evaluation" rule for the RHS work in certain cases. We are looking for cleaner ways to handle cases like this.
Carlos is a small, simple imperative language. It is statically typed with functions that can be nested and overloaded. Arrays and structs are heap-allocated and automatically garbage collected.
ID -> \p{L}[\p{L}\d_]* FLOATLIT -> \d+(\.\d+)?([Ee][+-]?\d+)? ESCAPE -> \\[nt'"\\]|\\[0-9a-fA-F]{1,8}; CHAR -> [^\p{Cc}'"\\]|#ESCAPE# CHARLIT -> '(#CHAR#|")' STRINGLIT -> "(#CHAR#|')" SKIP -> [\x20\x09\x0A\x0D]|//.* PROGRAM => STMT+ STMT => DEC | ASSIGNMENT ";" | CALL ";" | "break" ";" | "return" EXP? ";" | "print" ARGS ";" | "if" "(" EXP ")" BLOCK ("else" "if" "(" EXP ")" BLOCK)* ("else" BLOCK)? | "while" "(" EXP ")" BLOCK | "for" "(" (TYPE ID "=" EXP)? ";" EXP? ";" ASSIGNMENT? ")" BLOCK ASSIGNMENT => INCREMENT | VAR "=" EXP INCREMENT => INCOP VAR | VAR INCOP DEC => TYPEDEC | VARDEC | FUNDEC TYPEDEC => "struct" ID "{" (TYPE ID ";")* "}" TYPE => ("boolean" | "char" | "int" | "real" | "string" | ID) ("[" "]")? VARDEC => TYPE ID ("=" EXP)? ";" FUNDEC => (TYPE | "void") ID "(" PARAMS ")" BLOCK PARAMS => (TYPE ID ("," TYPE ID)*)? BLOCK => "{" STMT* "}" EXP => EXP1 ("||" EXP1)* EXP1 => EXP2 ("&&" EXP2)* EXP2 => EXP3 ("|" EXP3)* EXP3 => EXP4 ("^" EXP4)* EXP4 => EXP5 ("&" EXP5)* EXP5 => EXP6 (RELOP EXP6)? EXP6 => EXP7 (SHIFTOP EXP7)* EXP7 => EXP8 (ADDOP EXP8)* EXP8 => EXP9 (MULOP EXP9)* EXP9 => PREFIXOP? EXP10 EXP10 => LITERAL | VAR | INCREMENT | NEWOBJECT | "(" EXP ")" LITERAL => "null" | "true" | "false" | NUMLIT | CHARLIT | STRINGLIT VAR => (ID | CALL) ("[" EXP "]" | "." ID)? NEWOBJECT => "new" ID "{" ARGS "}" | "new" TYPE "[" "]" "{" ARGS "}" | "new" TYPE ("[" EXP "]")+ CALL => ID "(" ARGS ")" ARGS => (EXP ("," EXP)*)? RELOP -> <=?|==|!=|>=? SHIFTOP -> <<|>> ADDOP -> [+-] MULOP -> [*/%] PREFIXOP -> [-!~]|char|int|string|length INCOP -> \+\+|--
The following is a syntax for C89 as given by Kernighan and Ritchie in "The C Programming Language, 2nd edition" (Prentice Hall, 1988), Section A13, pages 234-239. Kernighan and Ritchie left out the microsyntax, so we added in rules based on a common sense reading of their text (Section A2).
For now, we have left out the preprocessor directives and the trigraph sequences.
H -> [0-9A-Fa-f] ID -> [A-Za-z_]([A-Za-z\d_])* DECINTLIT -> [1-9][0-9]* OCTINTLIT -> 0[0-7]* HEXINTLIT -> 0[Xx]#H#+ INTLIT -> (#DECINTLIT#|#OCTINTLIT#|#HEXINTLIT#)([Uu][Ll]?|[Ll][Uu]?)? EXP -> [Ee][+-]?\d+ FLOATLIT -> (\d+#EXP#|\.\d+#EXP#?|\d+\.\d+#EXP#?)[FfLl]? ENUMLIT -> ID ESCAPE -> \\(['"?\abfnrtv]|OO?O?|x#H##H#) CHARLIT -> '([^'\x0A]|#ESCAPE#)' STRINGLIT -> "([^"\x0A]|#ESCAPE#)*" LITERAL -> #INTLIT#|#FLOATLIT#|#ENUMLIT#|#CHARLIT#|#STRINGLIT# SKIP -> [\x09-\x0C\x20]|/\*([^*]|([*]+[^*/]))*[*]+/ TRANSLATION_UNIT => EXTERNAL_DECL+ EXTERNAL_DECL => DECL | FUNCTION_DEF FUNCTION_DEF => DECL_SPECIFIER* DECLARATOR DECL* COMPOUND_STMT DECL => DECL_SPECIFIER+ INIT_DECLARATORS? ';' DECL_SPECIFIER => STORAGE_CLASS_SPEC | TYPE_SPEC | TYPE_QUALIFIER STORAGE_CLASS_SPEC => 'auto' | 'register' | 'static' | 'extern' | 'typedef' TYPE_SPEC => 'void' | 'char' | 'short' | 'int' | 'long' | 'float' | 'double' | 'signed' | 'unsigned' | STRUCT_OR_UNION_SPEC | ENUM_SPEC | TYPEDEF_NAME TYPE_QUALIFIER => 'const' | 'volatile' STRUCT_OR_UNION_SPEC => ('struct' | 'union') (ID | ID? STRUCT_OR_UNION_BODY) STRUCT_OR_UNION_BODY => '{' STRUCT_DECL+ '}' INIT_DECLARATORS => INIT_DECLARATOR (',' INIT_DECLARATOR)* INIT_DECLARATOR => DECLARATOR ('=' INITIALIZER)? STRUCT_DECL => (TYPE_SPEC | TYPE_QUALIFIER)+ STRUCT_DECLARATORS ';' STRUCT_DECLARATORS => STRUCT_DECLARATOR (',' STRUCT_DECLARATOR)* STRUCT_DECLARATOR => DECLARATOR (':' CONST_EXPR)? | ':' CONST_EXPR ENUM_SPECIFIER => 'enum' (ID | ENUM_BODY | ID ENUM_BODY) ENUM_BODY => '{' ENUMERATOR (',' ENUMERATOR)* '}' ENUMERATOR => ID ('=' CONST_EXPR)? DECLARATOR => POINTER? DIRECT_DECLARATOR DIRECT_DECLARATOR => (ID | '(' DECLARATOR ')' ) ('[' CONST_EXPR? ']' | '(' PARAM_TYPES ')' | '(' IDS ')')* POINTER => ('*' TYPE_QUALIFIER*)+ PARAM_TYPES => PARAMS (',' '...')? PARAMS => PARAM_DECL (',' PARAM_DECL)* PARAM_DECL => DECL_SPECIFIER+ (DECLARATOR | ABS_DECLARATOR?) IDS => ID (',' ID)* INITIALIZER => EXP1 ('{' INITIALIZERS [,]? '}')? INITIALIZERS => INITIALIZER (',' INITIALIZER)* TYPE_NAME => (TYPE_SPEC | TYPE_QUALIFIER)+ ABS_DECLARATOR? ABS_DECLARATOR => POINTER? DIRECT_ABS_DECLARATOR DIRECT_ABS_DECLARATOR => '(' ABS_DECLARATOR ')' ( '[' CONST_EXPR? ']' | '(' PARAM_TYPES ')')* TYPEDEF_NAME => ID STMT => LABELED_STMT | EXPR_STMT | COMPOUND_STMT | SELECTION_STMT | ITERATION_STMT | JUMP_STMT LABELED_STMT => (ID | 'case' CONST_EXPR | 'default') ':' STMT EXPR_STMT => EXP? ';' COMPOUND_STMT => '{' DECL* STMT* '}' SELECTION_STMT => 'if' '(' EXP ')' STMT ('else' STMT)? | 'switch' '(' EXP ')' STMT ITERATION_STMT => 'while' '(' EXP ')' STMT | 'do' STMT 'while' '(' EXP ')' ';' | 'for' '(' EXP? ';' EXP? ';' EXP? ')' STMT JUMP_STMT => 'goto' ID ';' | 'continue' ';' | 'break' ';' | 'return' EXP? ';' EXP => EXP1 (',' EXP1)* EXP1 => (EXP14 ASSIGNOP)* EXP2 EXP2 => EXP3 ('?' EXP ':' EXP2)? EXP3 => EXP4 ('||' EXP4)* EXP4 => EXP5 ('&&' EXP5)* EXP5 => EXP6 ('|' EXP6)* EXP6 => EXP7 ('^' EXP7)* EXP7 => EXP8 ('&' EXP8)* EXP8 => EXP9 (('==' | '!=') EXP9)* EXP9 => EXP10 (('<' | '>' | '<=' | '>=') EXP10)* EXP10 => EXP11 (('<<' | '>>') EXP11)* EXP11 => EXP12 ([+-] EXP12)* EXP12 => EXP13 ([*/%] EXP13)* EXP13 => ('(' TYPE_NAME ')')* EXP14 EXP14 => ('++' | '--' | 'sizeof')* (EXP15 | UNARYOP EXP13 | 'sizeof' '(' TYPE_NAME ')') EXP15 => EXP16 ('++' | '--' | '.' ID | '->' ID | '[' EXP ']' | '(' ARGS? ')')* EXP16 => ID | LITERAL | '(' EXP ')' ARGS => EXP1 (',' EXP1)* ASSIGNOP => '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|=' UNARYOP => '&' | '*' | '+' | '-' | '~' | '!'
JSON is a data interchange format, designed by Douglas Crockford. The SSD for JSON is only six lines long. This means that JSON is spectacular, not that SSD is.
STRING -> "([^"\\\p{Cc}]|\\["\/bfnrt]|\\u[0-9A-Fa-f]{4})*" NUMBER -> -?(0|[1-9]\d*)(\.\d+)?([Ee][+-]?\d+)? SKIP -> \x20\x09\x0A\x0D VALUE => OBJECT | ARRAY | STRING | NUMBER | 'true' | 'false' | 'null' OBJECT => '{' (STRING ':' VALUE (',' STRING ':' VALUE)*)? '}' ARRAY => '[' (VALUE (',' VALUE)*)? ']'
Coming Soon
Here's a description of the description, using its own formalism.
NAME -> <L> [<L><Nd>_]* KEYWORD -> 'SKIP' HEX -> [0-9A-Fa-f] CODEPOINT -> '#' HEX^2 | '##' HEX^4 | '###' HEX^8 CATEGORY -> '<' [A-Z] [a-z]? '>' STRING -> ['] [^'<Cc>]* ['] SET -> '[' [^<Cc>]+ ']' SKIP -> [<Zs>#09#0A#0D] | '//' [^#0A#0D]* [#0A#0D] DOC => MICRORULE+ SKIPRULE? MACRORULE+ MICRORULE => NAME '->' MICROEXP SKIPRULE => 'SKIP' '->' MICROEXP MICROEXP => MICROEXP1 ('|' MICROEXP1 )* MICROEXP1 => MICROEXP2+ MICROEXP2 => MICROEXP3 [?*+]? | MICROEXP3 '^' <Nd>+ (',' <Nd>+)? MICROEXP3 => PRIMARY | '(' MICROEXP ')' MACRORULE => NAME '=>' MACROEXP MACROEXP => MACROEXP1 ('|' MACROEXP1)* MACROEXP1 => (MACROEXP2 ASTNODE?)+ MACROEXP2 => MACROEXP3 [?*+]? | MACROEXP3 '^' <Nd>+ (',' <Nd>+)? MACROEXP3 => TAG? PRIMARY | '(' MACROEXP ')' TAG => NAME [:=] ASTNODE => TAG? '{' NAME+ '}' PRIMARY => STRING | SET | CATEGORY | CODEPOINT | NAME