Efficient Configuration Storage with Sum-Product Trees

Overview

When dealing with combinatorial problems, the number of valid configurations can grow exponentially with problem size. The SumProductTree data structure provides a memory-efficient solution for storing and sampling from these large configuration spaces.

The Sum-Product Tree Approach

A SumProductTree is a specialized data structure that:

  • Uses polynomial memory to store an exponential number of configurations
  • Represents configurations as a sum-product expression tree
  • Enables lazy evaluation through depth-first search
  • Supports efficient directed sampling from the configuration space

This approach is particularly valuable when working with large graphs where storing all configurations explicitly would be prohibitively expensive.

Example: Independent Sets in Large Graphs

Let's examine how to use a SumProductTree for a large random regular graph:

using GenericTensorNetworks
graph = random_regular_graph(70, 3)
problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA())
tree = solve(problem, ConfigsAll(; tree_storage=true))[]

For this 70-vertex graph, storing all independent sets explicitly would require approximately 256 TB of storage! However, the SumProductTree representation requires only a fraction of this memory.

Sampling from the Configuration Space

One of the most powerful features of the SumProductTree is its ability to generate unbiased random samples from the configuration space:

samples = generate_samples(tree, 1000)

This generates 1000 random independent set configurations from our graph, allowing us to analyze statistical properties without enumerating the entire solution space.

Statistical Analysis: Hamming Distance Distribution

With these samples, we can compute useful properties such as the Hamming distance distribution between configurations. The Hamming distance measures how many bit positions differ between two configurations.

using CairoMakie
dist = hamming_distribution(samples, samples)

# Create a bar plot of the distribution
fig = Figure()
ax = Axis(fig[1, 1]; xlabel="Hamming Distance", ylabel="Frequency")
barplot!(ax, 0:length(dist)-1, dist)
fig

This visualization reveals the structure of the solution space by showing how similar or dissimilar the independent set configurations tend to be to each other.

Applications

The SumProductTree approach is particularly valuable for:

  • Analyzing very large problem instances
  • Estimating statistical properties of solution spaces
  • Performing Monte Carlo sampling for approximation algorithms
  • Studying the structure of configuration spaces without exhaustive enumeration

By combining compact representation with efficient sampling, SumProductTree enables analysis of problem instances that would otherwise be computationally intractable.

julia> using GenericTensorNetworks
julia> graph = random_regular_graph(70, 3){70, 105} undirected simple Int64 graph
julia> problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA());
julia> tree = solve(problem, ConfigsAll(; tree_storage=true))[]+ (count = 1.7013023062945e13) ├─ + (count = 1.2927158771414e13) │ ├─ + (count = 8.786513589163e12) │ │ ├─ * (count = 2.734834451016e12) │ │ │ ├─ + (count = 2.734834451016e12) │ │ │ │ ├─ * (count = 8.5124841493e11) │ │ │ │ │ ⋮ │ │ │ │ │ │ │ │ │ └─ + (count = 1.883586036086e12) │ │ │ │ ⋮ │ │ │ │ │ │ │ └─ * (count = 1.0) │ │ │ ├─ OnehotVec{70, 2}(24, 1) │ │ │ └─ OnehotVec{70, 2}(24, 1) │ │ └─ + (count = 6.051679138147e12) │ │ ├─ * (count = 1.911033955896e12) │ │ │ ├─ * (count = 1.0) │ │ │ │ ⋮ │ │ │ │ │ │ │ └─ + (count = 1.911033955896e12) │ │ │ ⋮ │ │ │ │ │ └─ + (count = 4.140645182251e12) │ │ ├─ * (count = 3.93063450134e11) │ │ │ ⋮ │ │ │ │ │ └─ + (count = 3.747581732117e12) │ │ ⋮ │ │ │ └─ * (count = 4.140645182251e12) │ ├─ OnehotVec{70, 2}(8, 1) │ └─ * (count = 4.140645182251e12) │ ├─ * (count = 4.140645182251e12) │ │ ├─ OnehotVec{70, 2}(8, 1) │ │ └─ + (count = 4.140645182251e12) │ │ ⋮ │ │ │ └─ * (count = 1.0) │ ├─ OnehotVec{70, 2}(8, 1) │ └─ OnehotVec{70, 2}(8, 1) └─ * (count = 4.085864291531e12) ├─ * (count = 1.0) │ ├─ OnehotVec{70, 2}(48, 1) │ └─ OnehotVec{70, 2}(48, 1) └─ + (count = 4.085864291531e12) ├─ * (count = 1.303378308615e12) │ ├─ + (count = 1.303378308615e12) │ │ ├─ * (count = 4.20468046361e11) │ │ │ ⋮ │ │ │ │ │ └─ * (count = 8.82910262254e11) │ │ ⋮ │ │ │ └─ * (count = 1.0) │ ├─ OnehotVec{70, 2}(24, 1) │ └─ OnehotVec{70, 2}(24, 1) └─ + (count = 2.782485982916e12) ├─ * (count = 9.12855360321e11) │ ├─ * (count = 1.0) │ │ ⋮ │ │ │ └─ * (count = 9.12855360321e11) │ ⋮ │ └─ * (count = 1.869630622595e12) ├─ + (count = 1.869630622595e12) │ ⋮ │ └─ * (count = 1.0) ⋮

If one wants to store these configurations, he will need a hard disk of size 256 TB! However, this sum-product binary tree structure supports efficient and unbiased direct sampling.

julia> samples = generate_samples(tree, 1000)1000-element Vector{StaticBitVector{70, 2}}:
 0000001000000000000010111000000000011111100000000100010000000010001100
 0010001000000000000000110001000000011001110000001110000000110010000100
 0001000000000000000000110011000100010001000001000000110000100110010100
 0000001000000011110000110001100010110111000000000000000000010110100100
 0000000011000000100001010010000000111001000010000101001010100010000100
 0100110011000000001001010010000000011100000000000011000000000010000100
 0001110000010001000000010001000100110001110000000000000000000111010100
 0000001000000100000011110000101100011101001001001000000000000010000010
 1100000000010110000101110000000100010000101000001000110100000110000010
 1101000000000010011011111000000000010000001010000000110100000010000010
 ⋮
 0110000001001000110100000000000010000000000001010000010010000100100000
 0000100001010011010000000000001010000100010101010001000000000100101001
 0010100010110001000000000110010000100000010000010010000000000000000000
 0100000000000100000000000000010000000000000010010001001010010000010000
 0000000000001000010000000010100100000000001000010010100000100000100000
 1000011010000000001000000000000010001000001110010010000100000000010001
 0111110000111010011000000010010010000000011000011000000000000001100000
 1000000000000010000100001000010000000000100010110000101100000000010001
 1000100001000100000100000101001000000111110000010000001100000000001010

With these samples, one can already compute useful properties like Hamming distance (see hamming_distribution) distribution. The following code visualizes this distribution with CairoMakie.

using CairoMakie
dist = hamming_distribution(samples, samples)
# bar plot
fig = Figure()
ax = Axis(fig[1, 1]; xlabel="Hamming distance", ylabel="Frequency")
barplot!(ax, 0:length(dist)-1, dist)
fig
Example block output