C# Parse Tree

C# Parse Tree

A parse tree (or syntax tree) comes from formal grammars in Computer Science. It’s a tree representation showing how an input string is derived from a grammar—basically, how the string is structured according to rules.

What is a parse tree?

A parse tree:

• Starts from a start symbol (like Expr)
• Expands using grammar rules
• Ends in terminals (actual characters/tokens like 1, +, *)

Example grammar (simple math)

Expr   → Term ((+|-) Term)*
Term   → Factor ((*|/) Factor)*
Factor → Number | '(' Expr ')'
Number → [0-9]+

Why parse trees matter?

• Compilers (e.g. Roslyn)
• Interpreters
• Expression evaluators
• Query languages
• Config parsers

Implementing a simple parser in C#

We’ll build:

• A tokenizer
• A recursive descent parser
• A parse tree

1. Define node types

abstract class Node { }

class NumberNode : Node
{
    public int Value;
    public NumberNode(int value) => Value = value;
}

class BinaryOpNode : Node
{
    public string Op;
    public Node Left, Right;

    public BinaryOpNode(string op, Node left, Node right)
    {
        Op = op;
        Left = left;
        Right = right;
    }
}

2. Tokenizer

enum TokenType { Number, Plus, Minus, Mul, Div, LParen, RParen, EOF }

class Token
{
    public TokenType Type;
    public string Value;

    public Token(TokenType type, string value = "")
    {
        Type = type;
        Value = value;
    }
}

class Lexer
{
    private readonly string _text;
    private int _pos;

    public Lexer(string text) => _text = text;

    public Token GetNextToken()
    {
        while (_pos < _text.Length)
        {
            char c = _text[_pos];

            if (char.IsWhiteSpace(c)) { _pos++; continue; }

            if (char.IsDigit(c))
            {
                int start = _pos;
                while (_pos < _text.Length && char.IsDigit(_text[_pos]))
                    _pos++;
                return new Token(TokenType.Number, _text[start.._pos]);
            }

            _pos++;
            return c switch
            {
                '+' => new Token(TokenType.Plus),
                '-' => new Token(TokenType.Minus),
                '*' => new Token(TokenType.Mul),
                '/' => new Token(TokenType.Div),
                '(' => new Token(TokenType.LParen),
                ')' => new Token(TokenType.RParen),
                _ => throw new Exception($"Invalid char: {c}")
            };
        }

        return new Token(TokenType.EOF);
    }
}

3. Recursive descent parser

class Parser
{
    private readonly Lexer _lexer;
    private Token _current;

    public Parser(Lexer lexer)
    {
        _lexer = lexer;
        _current = _lexer.GetNextToken();
    }

    private void Eat(TokenType type)
    {
        if (_current.Type == type)
            _current = _lexer.GetNextToken();
        else
            throw new Exception("Unexpected token");
    }

    public Node Parse() => Expr();

    private Node Expr()
    {
        var node = Term();

        while (_current.Type == TokenType.Plus || _current.Type == TokenType.Minus)
        {
            var op = _current;
            Eat(op.Type);
            node = new BinaryOpNode(op.Type.ToString(), node, Term());
        }

        return node;
    }

    private Node Term()
    {
        var node = Factor();

        while (_current.Type == TokenType.Mul || _current.Type == TokenType.Div)
        {
            var op = _current;
            Eat(op.Type);
            node = new BinaryOpNode(op.Type.ToString(), node, Factor());
        }

        return node;
    }

    private Node Factor()
    {
        if (_current.Type == TokenType.Number)
        {
            var value = int.Parse(_current.Value);
            Eat(TokenType.Number);
            return new NumberNode(value);
        }

        if (_current.Type == TokenType.LParen)
        {
            Eat(TokenType.LParen);
            var node = Expr();
            Eat(TokenType.RParen);
            return node;
        }

        throw new Exception("Invalid syntax");
    }
}

4. Using it

var parser = new Parser(new Lexer("2 + 3 * 4"));
Node tree = parser.Parse();

5. Evaluate the tree

int Evaluate(Node node)
{
    return node switch
    {
        NumberNode n => n.Value,
        BinaryOpNode b => b.Op switch
        {
            "Plus" => Evaluate(b.Left) + Evaluate(b.Right),
            "Minus" => Evaluate(b.Left) - Evaluate(b.Right),
            "Mul" => Evaluate(b.Left) * Evaluate(b.Right),
            "Div" => Evaluate(b.Left) / Evaluate(b.Right),
            _ => throw new Exception()
        },
        _ => throw new Exception()
    };
}

Key takeaway

• Grammar defines rules
• Parser applies rules
• Parse tree shows structure
• Structure determines meaning (like operator precedence)

Formal definition

A parse tree for a grammar G is a tree where

• the root is the start symbol for G
• the interior nodes are the nonterminals of G
• the leaf nodes are the terminal symbols of G.
• the children of a node T (from left to right) correspond to the symbols on the right hand side of some production for T in G.

Another Parse Tree Implementation

According to this definition, let's create a parse tree in C# programming language.

We will first, define a terminal interface.

public interface ITerminal
{
}

Secondly, we will define a class for interior nodes that are nonterminals of G.

public class Nonterminal : ITerminal // Equation, interior node
{
    public string Equation { get; set; }
    public ITerminal LeftChild { get; set; }
    public ITerminal RightChild { get; set; }

    public Nonterminal(string equation, ITerminal left, ITerminal right)
    {
        Equation = equation;
        LeftChild = left;
        RightChild = right;
    }

    public override string ToString()
    {
        return String.Format("{0} {1} {2}", LeftChild, Equation, RightChild);
    }
}

Lastly, we will define a class for leaf nodes that are terminals of G.

public class Terminal : ITerminal // Value, leaf node
{
    public double Value { get; set; }

    public Terminal(double val)
    {
        Value = val;
    }

    public override string ToString()
    {
        return String.Format("{0}", Value);
    }
}

Using these classes, we can create a parse tree for any given string.

C# Parse Tree Example

Example: Given the following grammar, create a parse tree for the string "1 + 2 * 3":

var node2 = new Terminal(2);
var node3 = new Terminal(3);

var equation23 = new Nonterminal("*", node2, node3);
var node1 = new Terminal(1);

var root = new Nonterminal("+", node1, equation23);
Console.WriteLine(root);

Then, the output will be:

1 + 2 * 3

Contents related to 'C# Parse Tree'

C# Equation and Formula Parser
C# Equation and Formula Parser
Break Parallel Foreach Earlier
Break Parallel Foreach Earlier
How to implement Alpha–beta pruning in C#?
How to implement Alpha–beta pruning in C#?