SSD — Simple Syntax Description

SSD is a simple meta language for defining the syntax of programming languages. The advantages of SSD as a notation are that it

SSD has been used as the input language of the altgen parser generator.

An Informal Overview of SSD

The best way to introduce SSD is with an example. Here we specify the concrete syntax of a small programming language featuring variables and functions. All expressions are numeric; no distinction is made between integers and floats. There is a unary negation operator and the usual four binary operators (+, −, *, and /).

ID       -> \p{L}[\p{L}\d_]*
NUMLIT   -> \d+(\.\d+)?([Ee][+-]?\d+)?
STRLIT   -> "([^\p{Cc}]|\\[^\p{Cc}])*"
SKIP     -> [\p{Zs}\t\n\r]|//.*
PROGRAM  => BLOCK
BLOCK    => (DEC ";")* (STMT ";")+
DEC      => "var" ID ("=" EXP)?
         |  "fun" ID "(" IDLIST? ")" "=" EXP
STMT     => ID "=" EXP
         |  "read" IDLIST
         |  "write" EXPLIST
         |  "while" EXP "do" BLOCK "end"
IDLIST   => ID ("," ID)*
EXPLIST  => EXP ("," EXP)*
EXP      => TERM (("+" | "-") TERM)*
TERM     => FACTOR (("*" | "/") FACTOR)*
FACTOR   => NUMLIT | STRLIT | ID | CALL | "(" EXP ")"
CALL     => ID "(" EXPLIST? ")"

Note the following

Note how the token set for the language defined in SSD is inferred from both the microsyntax rules and the appearance of token literals in the macrosyntax. It is not necessary to create a list of tokens. Tokens that appear directly in the macrosyntax will override those defined with regexes in the microsyntax; in the example above, "while" is its own kind of token, and not an ID because it was given literally in the macrosyntax.

Compiler generators using SSD can easily manufacture scanners (lexical analyzers) by directly using the SSD regexes. For languages with complex token formation rules, SSD allows a kind of preprocessing of regexes. For example

ID        ->  \p{L}[\p{L}\d_]*
NUMLIT    ->  \d+(\.\d+)?([Ee][+-]?\d+)?
HEX       ->  [0-0a-fA-F]
ESCAPE    ->  \\[nt'"\\]|\\#HEX#{1,8};
CHAR      ->  [^\p{Cc}'"\\]|#ESCAPE#
CHARLIT   ->  '(#CHAR#|")'
STRINGLIT ->  "(#CHAR#|')"
SKIP      ->  [\x20\x09\x0A\x0D]|//.*

Here HEX, ESCAPE, and CHAR, are not tokens, but rather macro-like regex fragments that are interpolated into other token rules. SSD does not allow any recursion in the use of these fragments.

SSD supports the definition of abstract syntax trees as well. TODO!

Examples

The following sketches of SSD are attempts at describing various existing and experimental languages. They are not substitutes for official specifications and may contain bugs; they are provided as-is without warranties, etc.

SSD for Iki

Iki is a very small language. Here is a complete syntactic description, including the abstract syntax, using the original SSD notation:

The most recent version of SSD makes three changes to the original:

  1. Rather than a custom microsyntactic language, we make use of existing and well-known regular expression notations. Regexes are introduced with the "@" symbol.
  2. We no longer permit raw regexes on the right hand sides of macrosyntax rules.
  3. We no longer require bindings for the production of abstract syntax tree expressions. Values in node expressions can now be inferred.
ID       -> \p{L}[\p{L}\d_]*
NUMLIT   -> \d+(\.\d+)?([Ee][+-]?\d+)?
STRLIT   -> "([^\p{Cc}]|\\[^\p{Cc}])*"
SKIP     -> [\p{Zs}\t\n\r]|//.*
PROGRAM  => BLOCK {Program block}
BLOCK    => (decs::DEC ";")* (stmts::STMT ";")+ {Block decs stmts}
DEC      => "var" ID ("=" EXP)? {Var id exp}
         |  "fun" ID "(" params:IDLIST? ")" "=" EXP {Fun id params exp}
STMT     => ID "=" EXP {Assign id exp}
         |  "read" ids:IDLIST {Read ids}
         |  "write" EXPLIST {Write explist}
         |  "while" EXP "do" BLOCK "end" {While exp block}
IDLIST   => id::ID ("," id::ID)*
EXPLIST  => exp::EXP ("," exp::EXP)*
EXP      => t1:TERM (o:[+-] t2:TERM t1:{Bin o t1 t2})*
TERM     => f1:FACTOR (o:[*/] f2:FACTOR f1:{Bin o f1 f2})*
FACTOR   => value:NUMLIT {Numlit value}
         |  value:STRLIT {Strlit value}
         |  ID {Ref id}
         |  CALL
         |  "(" EXP ")" ^exp
CALL     => ID "(" args:EXPLIST? ")" {Call id args}

Because of the implicit naming of fields, we have had to introduce an ^-expression. This expression isn't parsed; it exists to make the "last evaluation" rule for the RHS work in certain cases. We are looking for cleaner ways to handle cases like this.

SSD for Carlos

Carlos is a small, simple imperative language. It is statically typed with functions that can be nested and overloaded. Arrays and structs are heap-allocated and automatically garbage collected.

ID          ->  \p{L}[\p{L}\d_]*
FLOATLIT    ->  \d+(\.\d+)?([Ee][+-]?\d+)?
ESCAPE      ->  \\[nt'"\\]|\\[0-9a-fA-F]{1,8};
CHAR        ->  [^\p{Cc}'"\\]|#ESCAPE#
CHARLIT     ->  '(#CHAR#|")'
STRINGLIT   ->  "(#CHAR#|')"
SKIP        ->  [\x20\x09\x0A\x0D]|//.*
PROGRAM     =>  STMT+
STMT        =>  DEC
            |   ASSIGNMENT ";"
            |   CALL ";"
            |   "break" ";"
            |   "return" EXP? ";"
            |   "print" ARGS ";"
            |   "if" "(" EXP ")" BLOCK ("else" "if" "(" EXP ")" BLOCK)* ("else" BLOCK)?
            |   "while" "(" EXP ")" BLOCK
            |   "for" "(" (TYPE ID "=" EXP)? ";" EXP? ";" ASSIGNMENT? ")" BLOCK
ASSIGNMENT  =>  INCREMENT | VAR "=" EXP
INCREMENT   =>  INCOP VAR | VAR INCOP
DEC         =>  TYPEDEC | VARDEC | FUNDEC
TYPEDEC     =>  "struct" ID "{" (TYPE ID ";")* "}"
TYPE        =>  ("boolean" | "char" | "int" | "real" | "string" | ID) ("[" "]")?
VARDEC      =>  TYPE ID ("=" EXP)? ";"
FUNDEC      =>  (TYPE | "void") ID "(" PARAMS ")" BLOCK
PARAMS      =>  (TYPE ID ("," TYPE ID)*)?
BLOCK       =>  "{" STMT* "}"
EXP         =>  EXP1 ("||" EXP1)*
EXP1        =>  EXP2 ("&&" EXP2)*
EXP2        =>  EXP3 ("|" EXP3)*
EXP3        =>  EXP4 ("^" EXP4)*
EXP4        =>  EXP5 ("&" EXP5)*
EXP5        =>  EXP6 (RELOP EXP6)?
EXP6        =>  EXP7 (SHIFTOP EXP7)*
EXP7        =>  EXP8 (ADDOP EXP8)*
EXP8        =>  EXP9 (MULOP EXP9)*
EXP9        =>  PREFIXOP? EXP10
EXP10       =>  LITERAL
            |   VAR
            |   INCREMENT
            |   NEWOBJECT
            |   "(" EXP ")"
LITERAL     =>  "null"
            |   "true"
            |   "false"
            |   NUMLIT
            |   CHARLIT
            |   STRINGLIT
VAR         =>  (ID | CALL) ("[" EXP "]" | "." ID)?
NEWOBJECT   =>  "new" ID "{" ARGS "}"
            |   "new" TYPE "[" "]" "{" ARGS "}"
            |   "new" TYPE ("[" EXP "]")+
CALL        =>  ID "(" ARGS ")"
ARGS        =>  (EXP ("," EXP)*)?
RELOP       ->  <=?|==|!=|>=?
SHIFTOP     ->  <<|>>
ADDOP       ->  [+-]
MULOP       ->  [*/%]
PREFIXOP    ->  [-!~]|char|int|string|length
INCOP       ->  \+\+|--

SSD for C89 (Partial)

The following is a syntax for C89 as given by Kernighan and Ritchie in "The C Programming Language, 2nd edition" (Prentice Hall, 1988), Section A13, pages 234-239. Kernighan and Ritchie left out the microsyntax, so we added in rules based on a common sense reading of their text (Section A2).

For now, we have left out the preprocessor directives and the trigraph sequences.

H          -> [0-9A-Fa-f]
ID         -> [A-Za-z_]([A-Za-z\d_])*
DECINTLIT  -> [1-9][0-9]*
OCTINTLIT  -> 0[0-7]*
HEXINTLIT  -> 0[Xx]#H#+
INTLIT     -> (#DECINTLIT#|#OCTINTLIT#|#HEXINTLIT#)([Uu][Ll]?|[Ll][Uu]?)?
EXP        -> [Ee][+-]?\d+
FLOATLIT   -> (\d+#EXP#|\.\d+#EXP#?|\d+\.\d+#EXP#?)[FfLl]?
ENUMLIT    -> ID
ESCAPE     -> \\(['"?\abfnrtv]|OO?O?|x#H##H#)
CHARLIT    -> '([^'\x0A]|#ESCAPE#)'
STRINGLIT  -> "([^"\x0A]|#ESCAPE#)*"
LITERAL    -> #INTLIT#|#FLOATLIT#|#ENUMLIT#|#CHARLIT#|#STRINGLIT#
SKIP       -> [\x09-\x0C\x20]|/\*([^*]|([*]+[^*/]))*[*]+/

TRANSLATION_UNIT      => EXTERNAL_DECL+
EXTERNAL_DECL         => DECL | FUNCTION_DEF
FUNCTION_DEF          => DECL_SPECIFIER* DECLARATOR DECL* COMPOUND_STMT
DECL                  => DECL_SPECIFIER+ INIT_DECLARATORS? ';'
DECL_SPECIFIER        => STORAGE_CLASS_SPEC | TYPE_SPEC | TYPE_QUALIFIER
STORAGE_CLASS_SPEC    => 'auto' | 'register' | 'static' | 'extern' | 'typedef'
TYPE_SPEC             => 'void' | 'char' | 'short' | 'int' | 'long' | 'float'
                      |  'double' | 'signed' | 'unsigned'
                      |  STRUCT_OR_UNION_SPEC | ENUM_SPEC | TYPEDEF_NAME
TYPE_QUALIFIER        => 'const' | 'volatile'
STRUCT_OR_UNION_SPEC  => ('struct' | 'union') (ID | ID? STRUCT_OR_UNION_BODY)
STRUCT_OR_UNION_BODY  => '{' STRUCT_DECL+ '}'
INIT_DECLARATORS      => INIT_DECLARATOR (',' INIT_DECLARATOR)*
INIT_DECLARATOR       => DECLARATOR ('=' INITIALIZER)?
STRUCT_DECL           => (TYPE_SPEC | TYPE_QUALIFIER)+ STRUCT_DECLARATORS ';'
STRUCT_DECLARATORS    => STRUCT_DECLARATOR (',' STRUCT_DECLARATOR)*
STRUCT_DECLARATOR     => DECLARATOR (':' CONST_EXPR)? | ':' CONST_EXPR
ENUM_SPECIFIER        => 'enum' (ID | ENUM_BODY | ID ENUM_BODY)
ENUM_BODY             => '{' ENUMERATOR (',' ENUMERATOR)* '}'
ENUMERATOR            => ID ('=' CONST_EXPR)?
DECLARATOR            => POINTER? DIRECT_DECLARATOR
DIRECT_DECLARATOR     => (ID | '(' DECLARATOR ')' )
                             ('[' CONST_EXPR? ']' | '(' PARAM_TYPES ')'
                             | '(' IDS ')')*
POINTER               => ('*' TYPE_QUALIFIER*)+
PARAM_TYPES           => PARAMS (',' '...')?
PARAMS                => PARAM_DECL (',' PARAM_DECL)*
PARAM_DECL            => DECL_SPECIFIER+ (DECLARATOR | ABS_DECLARATOR?)
IDS                   => ID (',' ID)*
INITIALIZER           => EXP1 ('{' INITIALIZERS [,]? '}')?
INITIALIZERS          => INITIALIZER (',' INITIALIZER)*
TYPE_NAME             => (TYPE_SPEC | TYPE_QUALIFIER)+ ABS_DECLARATOR?
ABS_DECLARATOR        => POINTER? DIRECT_ABS_DECLARATOR
DIRECT_ABS_DECLARATOR => '(' ABS_DECLARATOR ')'
                             ( '[' CONST_EXPR? ']' | '(' PARAM_TYPES ')')*
TYPEDEF_NAME          => ID
STMT                  => LABELED_STMT | EXPR_STMT | COMPOUND_STMT | SELECTION_STMT
                      |  ITERATION_STMT | JUMP_STMT
LABELED_STMT          => (ID | 'case' CONST_EXPR | 'default') ':' STMT
EXPR_STMT             => EXP? ';'
COMPOUND_STMT         => '{' DECL* STMT* '}'
SELECTION_STMT        => 'if' '(' EXP ')' STMT ('else' STMT)?
                      |  'switch' '(' EXP ')' STMT
ITERATION_STMT        => 'while' '(' EXP ')' STMT
                      |  'do' STMT 'while' '(' EXP ')' ';'
                      |  'for' '(' EXP? ';' EXP? ';' EXP? ')' STMT
JUMP_STMT             => 'goto' ID ';' | 'continue' ';' | 'break' ';' | 'return' EXP? ';'
EXP                   => EXP1 (',' EXP1)*
EXP1                  => (EXP14 ASSIGNOP)* EXP2
EXP2                  => EXP3 ('?' EXP ':' EXP2)?
EXP3                  => EXP4 ('||' EXP4)*
EXP4                  => EXP5 ('&&' EXP5)*
EXP5                  => EXP6 ('|' EXP6)*
EXP6                  => EXP7 ('^' EXP7)*
EXP7                  => EXP8 ('&' EXP8)*
EXP8                  => EXP9 (('==' | '!=') EXP9)*
EXP9                  => EXP10 (('<' | '>' | '<=' | '>=') EXP10)*
EXP10                 => EXP11 (('<<' | '>>') EXP11)*
EXP11                 => EXP12 ([+-] EXP12)*
EXP12                 => EXP13 ([*/%] EXP13)*
EXP13                 => ('(' TYPE_NAME ')')* EXP14
EXP14                 => ('++' | '--' | 'sizeof')*
                         (EXP15 | UNARYOP EXP13 | 'sizeof' '(' TYPE_NAME ')')
EXP15                 => EXP16 ('++' | '--' | '.' ID | '->' ID | '[' EXP ']'
                             | '(' ARGS? ')')*
EXP16                 => ID | LITERAL | '(' EXP ')'
ARGS                  => EXP1 (',' EXP1)*
ASSIGNOP              => '=' | '*=' | '/=' | '%=' | '+=' | '-='
                      |  '<<=' | '>>=' | '&=' | '^=' | '|='
UNARYOP               => '&' | '*' | '+' | '-' | '~' | '!'

SSD for JSON

JSON is a data interchange format, designed by Douglas Crockford. The SSD for JSON is only six lines long. This means that JSON is spectacular, not that SSD is.

STRING -> "([^"\\\p{Cc}]|\\["\/bfnrt]|\\u[0-9A-Fa-f]{4})*"
NUMBER -> -?(0|[1-9]\d*)(\.\d+)?([Ee][+-]?\d+)?
SKIP   -> \x20\x09\x0A\x0D
VALUE  => OBJECT | ARRAY | STRING | NUMBER | 'true' | 'false' | 'null'
OBJECT => '{' (STRING ':' VALUE (',' STRING ':' VALUE)*)? '}'
ARRAY  => '[' (VALUE (',' VALUE)*)? ']'

SSD for JavaScript Good Parts Subset

Coming Soon

SSD for SSD

Here's a description of the description, using its own formalism.

NAME      -> <L> [<L><Nd>_]*
KEYWORD   -> 'SKIP'
HEX       -> [0-9A-Fa-f]
CODEPOINT -> '#' HEX^2 | '##' HEX^4 | '###' HEX^8
CATEGORY  -> '<' [A-Z] [a-z]? '>'
STRING    -> ['] [^'<Cc>]* [']
SET       -> '[' [^<Cc>]+ ']'
SKIP      -> [<Zs>#09#0A#0D] | '//' [^#0A#0D]* [#0A#0D]
DOC       => MICRORULE+ SKIPRULE? MACRORULE+
MICRORULE => NAME '->' MICROEXP
SKIPRULE  => 'SKIP' '->' MICROEXP
MICROEXP  => MICROEXP1 ('|' MICROEXP1 )*
MICROEXP1 => MICROEXP2+
MICROEXP2 => MICROEXP3 [?*+]? | MICROEXP3 '^' <Nd>+ (',' <Nd>+)?
MICROEXP3 => PRIMARY | '(' MICROEXP ')'
MACRORULE => NAME '=>' MACROEXP
MACROEXP  => MACROEXP1 ('|' MACROEXP1)*
MACROEXP1 => (MACROEXP2 ASTNODE?)+
MACROEXP2 => MACROEXP3 [?*+]? | MACROEXP3 '^' <Nd>+ (',' <Nd>+)?
MACROEXP3 => TAG? PRIMARY | '(' MACROEXP ')'
TAG       => NAME [:=]
ASTNODE   => TAG? '{' NAME+ '}'
PRIMARY   => STRING | SET | CATEGORY | CODEPOINT | NAME