A parse tree (or syntax tree) comes from formal grammars in Computer Science. It’s a tree representation showing how an input string is derived from a grammar—basically, how the string is structured according to rules.
What is a parse tree?
A parse tree:
• Starts from a start symbol (like Expr)
• Expands using grammar rules
• Ends in terminals (actual characters/tokens like 1, +, *)
Example grammar (simple math)
Expr → Term ((+|-) Term)*
Term → Factor ((*|/) Factor)*
Factor → Number | '(' Expr ')'
Number → [0-9]+
Why parse trees matter?
• Compilers (e.g. Roslyn)
• Interpreters
• Expression evaluators
• Query languages
• Config parsers
Implementing a simple parser in C#
We’ll build:
• A tokenizer
• A recursive descent parser
• A parse tree
1. Define node types
abstract class Node { }
class NumberNode : Node
{
public int Value;
public NumberNode(int value) => Value = value;
}
class BinaryOpNode : Node
{
public string Op;
public Node Left, Right;
public BinaryOpNode(string op, Node left, Node right)
{
Op = op;
Left = left;
Right = right;
}
}
2. Tokenizer
enum TokenType { Number, Plus, Minus, Mul, Div, LParen, RParen, EOF }
class Token
{
public TokenType Type;
public string Value;
public Token(TokenType type, string value = "")
{
Type = type;
Value = value;
}
}
class Lexer
{
private readonly string _text;
private int _pos;
public Lexer(string text) => _text = text;
public Token GetNextToken()
{
while (_pos < _text.Length)
{
char c = _text[_pos];
if (char.IsWhiteSpace(c)) { _pos++; continue; }
if (char.IsDigit(c))
{
int start = _pos;
while (_pos < _text.Length && char.IsDigit(_text[_pos]))
_pos++;
return new Token(TokenType.Number, _text[start.._pos]);
}
_pos++;
return c switch
{
'+' => new Token(TokenType.Plus),
'-' => new Token(TokenType.Minus),
'*' => new Token(TokenType.Mul),
'/' => new Token(TokenType.Div),
'(' => new Token(TokenType.LParen),
')' => new Token(TokenType.RParen),
_ => throw new Exception($"Invalid char: {c}")
};
}
return new Token(TokenType.EOF);
}
}
3. Recursive descent parser
class Parser
{
private readonly Lexer _lexer;
private Token _current;
public Parser(Lexer lexer)
{
_lexer = lexer;
_current = _lexer.GetNextToken();
}
private void Eat(TokenType type)
{
if (_current.Type == type)
_current = _lexer.GetNextToken();
else
throw new Exception("Unexpected token");
}
public Node Parse() => Expr();
private Node Expr()
{
var node = Term();
while (_current.Type == TokenType.Plus || _current.Type == TokenType.Minus)
{
var op = _current;
Eat(op.Type);
node = new BinaryOpNode(op.Type.ToString(), node, Term());
}
return node;
}
private Node Term()
{
var node = Factor();
while (_current.Type == TokenType.Mul || _current.Type == TokenType.Div)
{
var op = _current;
Eat(op.Type);
node = new BinaryOpNode(op.Type.ToString(), node, Factor());
}
return node;
}
private Node Factor()
{
if (_current.Type == TokenType.Number)
{
var value = int.Parse(_current.Value);
Eat(TokenType.Number);
return new NumberNode(value);
}
if (_current.Type == TokenType.LParen)
{
Eat(TokenType.LParen);
var node = Expr();
Eat(TokenType.RParen);
return node;
}
throw new Exception("Invalid syntax");
}
}
4. Using it
var parser = new Parser(new Lexer("2 + 3 * 4"));
Node tree = parser.Parse();
5. Evaluate the tree
int Evaluate(Node node)
{
return node switch
{
NumberNode n => n.Value,
BinaryOpNode b => b.Op switch
{
"Plus" => Evaluate(b.Left) + Evaluate(b.Right),
"Minus" => Evaluate(b.Left) - Evaluate(b.Right),
"Mul" => Evaluate(b.Left) * Evaluate(b.Right),
"Div" => Evaluate(b.Left) / Evaluate(b.Right),
_ => throw new Exception()
},
_ => throw new Exception()
};
}
Key takeaway
• Grammar defines rules
• Parser applies rules
• Parse tree shows structure
• Structure determines meaning (like operator precedence)
Formal definition
A parse tree for a grammar G is a tree where
• the root is the start symbol for G
• the interior nodes are the nonterminals of G
• the leaf nodes are the terminal symbols of G.
• the children of a node T (from left to right) correspond to the symbols on the right hand side of some production for T in G.
Another Parse Tree Implementation
According to this definition, let's create a parse tree in C# programming language.
We will first, define a terminal interface.
public interface ITerminal
{
}
Secondly, we will define a class for interior nodes that are nonterminals of G.
public class Nonterminal : ITerminal // Equation, interior node
{
public string Equation { get; set; }
public ITerminal LeftChild { get; set; }
public ITerminal RightChild { get; set; }
public Nonterminal(string equation, ITerminal left, ITerminal right)
{
Equation = equation;
LeftChild = left;
RightChild = right;
}
public override string ToString()
{
return String.Format("{0} {1} {2}", LeftChild, Equation, RightChild);
}
}
Lastly, we will define a class for leaf nodes that are terminals of G.
public class Terminal : ITerminal // Value, leaf node
{
public double Value { get; set; }
public Terminal(double val)
{
Value = val;
}
public override string ToString()
{
return String.Format("{0}", Value);
}
}
Using these classes, we can create a parse tree for any given string.
C# Parse Tree Example
Example: Given the following grammar, create a parse tree for the string "1 + 2 * 3":
var node2 = new Terminal(2);
var node3 = new Terminal(3);
var equation23 = new Nonterminal("*", node2, node3);
var node1 = new Terminal(1);
var root = new Nonterminal("+", node1, equation23);
Console.WriteLine(root);
Then, the output will be:
1 + 2 * 3