Efficient Configuration Storage with Sum-Product Trees
Overview
When dealing with combinatorial problems, the number of valid configurations can grow exponentially with problem size. The SumProductTree data structure provides a memory-efficient solution for storing and sampling from these large configuration spaces.
The Sum-Product Tree Approach
A SumProductTree is a specialized data structure that:
- Uses polynomial memory to store an exponential number of configurations
- Represents configurations as a sum-product expression tree
- Enables lazy evaluation through depth-first search
- Supports efficient directed sampling from the configuration space
This approach is particularly valuable when working with large graphs where storing all configurations explicitly would be prohibitively expensive.
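To build intuition for how this works, here is a minimal conceptual sketch in Julia. It is not the library's implementation: the node types and the sample_config function below are hypothetical, and real SumProductTree nodes work over full-length bit vectors rather than concatenated blocks. The sketch only illustrates how cached counts let a configuration be drawn by a single root-to-leaf descent, without expanding the stored set:
# A toy sum-product tree: a sum node unions the configuration sets of its
# children, a product node combines one configuration from each child, and
# every internal node caches how many configurations live below it.
abstract type Node end

struct Leaf <: Node
    bits::Vector{Int}          # a single partial configuration
end

struct SumNode <: Node
    children::Vector{Node}
    count::Float64             # sum of the children's counts
end

struct ProdNode <: Node
    children::Vector{Node}
    count::Float64             # product of the children's counts
end

nconfigs(::Leaf) = 1.0
nconfigs(n::SumNode) = n.count
nconfigs(n::ProdNode) = n.count

# Directed sampling: descend from the root, choosing a child of each sum node
# with probability proportional to its count; at a product node, sample every
# child and concatenate the pieces.
sample_config(n::Leaf) = copy(n.bits)

function sample_config(n::SumNode)
    r = rand() * n.count
    for c in n.children
        r -= nconfigs(c)
        r <= 0 && return sample_config(c)
    end
    return sample_config(last(n.children))
end

sample_config(n::ProdNode) = reduce(vcat, (sample_config(c) for c in n.children))
Because every choice is weighted by the cached counts, each stored configuration is returned with equal probability, which is what makes the sampling unbiased.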
Example: Independent Sets in Large Graphs
Let's examine how to use a SumProductTree for a large random regular graph:
using GenericTensorNetworks, Graphs  # Graphs.jl provides random_regular_graph
graph = random_regular_graph(70, 3)  # a random 3-regular graph on 70 vertices
# Build the independent set problem as a tensor network, using the TreeSA contraction-order optimizer
problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA())
# Enumerate all independent sets, stored compactly as a SumProductTree
tree = solve(problem, ConfigsAll(; tree_storage=true))[]
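Before (or instead of) enumerating the configurations, it is often useful to know how many there are. The CountingAll property computes this total without storing any configurations; for the instance in the REPL session below it is roughly 1.7 × 10^13:
# Count all independent sets of the graph; this matches the count reported
# at the root of the SumProductTree printed below.
num_configs = solve(problem, CountingAll())[]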
For this 70-vertex graph, storing all independent sets explicitly would require approximately 256 TB of storage! However, the SumProductTree representation requires only a tiny fraction of this memory.
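The 256 TB figure is a back-of-envelope estimate, assuming each configuration is stored as a 70-bit StaticBitVector packed into two 64-bit words (16 bytes) and using the configuration count from the REPL session below:
num_configs = 1.7013023062945e13       # root count of the tree printed below
bytes_per_config = 2 * 8               # a 70-bit StaticBitVector occupies two UInt64 words
num_configs * bytes_per_config / 2^40  # ≈ 248 TiB, i.e. on the order of 256 TB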
Sampling from the Configuration Space
One of the most powerful features of the SumProductTree is its ability to generate unbiased random samples from the configuration space:
samples = generate_samples(tree, 1000)
This generates 1000 random independent set configurations from our graph, allowing us to analyze statistical properties without enumerating the entire solution space.
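For example, the samples give a quick Monte Carlo estimate of the typical independent set size. This sketch assumes that iterating a sampled StaticBitVector yields its 0/1 entries:
# Estimate the average independent-set size from the 1000 samples
# (assumes iterating a StaticBitVector yields its 0/1 entries).
set_sizes = [count(isone, s) for s in samples]
mean_size = sum(set_sizes) / length(set_sizes)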
Statistical Analysis: Hamming Distance Distribution
With these samples, we can compute useful properties such as the Hamming distance distribution between configurations. The Hamming distance measures how many bit positions differ between two configurations.
using CairoMakie
dist = hamming_distribution(samples, samples)
# Create a bar plot of the distribution
fig = Figure()
ax = Axis(fig[1, 1]; xlabel="Hamming Distance", ylabel="Frequency")
barplot!(ax, 0:length(dist)-1, dist)
fig
This visualization reveals the structure of the solution space by showing how similar or dissimilar the independent set configurations tend to be to each other.
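The distribution can also be summarized numerically. Since hamming_distribution returns counts indexed by distances 0, 1, 2, … (the same convention used for the x-axis above), the mean Hamming distance is a simple weighted sum:
# Mean Hamming distance implied by the sampled distribution
weights = dist ./ sum(dist)                        # normalize counts to probabilities
mean_distance = sum((0:length(dist)-1) .* weights)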
Applications
The SumProductTree approach is particularly valuable for:
- Analyzing very large problem instances
- Estimating statistical properties of solution spaces
- Performing Monte Carlo sampling for approximation algorithms
- Studying the structure of configuration spaces without exhaustive enumeration
By combining compact representation with efficient sampling, the SumProductTree enables analysis of problem instances that would otherwise be computationally intractable.
The following REPL session walks through the same example end to end, showing the printed tree structure and the sampled configurations.
julia> using GenericTensorNetworks

julia> graph = random_regular_graph(70, 3)
{70, 105} undirected simple Int64 graph

julia> problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA());

julia> tree = solve(problem, ConfigsAll(; tree_storage=true))[]
+ (count = 1.7013023062945e13)
├─ + (count = 1.2927158771414e13)
│  ⋮
│
└─ * (count = 4.085864291531e12)
   ⋮
Storing these configurations explicitly would require a hard disk of roughly 256 TB! The sum-product binary tree structure, however, supports efficient and unbiased direct sampling without ever expanding them.
julia> samples = generate_samples(tree, 1000)
1000-element Vector{StaticBitVector{70, 2}}:
 0000001000000000000010111000000000011111100000000100010000000010001100
 0010001000000000000000110001000000011001110000001110000000110010000100
 0001000000000000000000110011000100010001000001000000110000100110010100
 ⋮
With these samples in hand, the Hamming distance distribution can be computed and visualized exactly as shown above, using hamming_distribution and CairoMakie.