Efficient Configuration Storage with Sum-Product Trees
Overview
When dealing with combinatorial problems, the number of valid configurations can grow exponentially with problem size. The SumProductTree data structure provides a memory-efficient solution for storing and sampling from these large configuration spaces.
The Sum-Product Tree Approach
A SumProductTree is a specialized data structure that:
- Uses polynomial memory to store an exponential number of configurations
- Represents configurations as a sum-product expression tree
- Enables lazy evaluation through depth-first search
- Supports efficient directed sampling from the configuration space
This approach is particularly valuable when working with large graphs where storing all configurations explicitly would be prohibitively expensive.
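To build intuition for how this works, here is a minimal conceptual sketch in Julia. It is not the library's implementation: the node types and the sample_config function below are hypothetical, and real SumProductTree nodes work over full-length bit vectors rather than concatenated blocks. The sketch only illustrates how cached counts let a configuration be drawn by a single root-to-leaf descent, without expanding the stored set:
# A toy sum-product tree: a sum node unions the configuration sets of its
# children, a product node combines one configuration from each child, and
# every internal node caches how many configurations live below it.
abstract type Node end

struct Leaf <: Node
    bits::Vector{Int}          # a single partial configuration
end

struct SumNode <: Node
    children::Vector{Node}
    count::Float64             # sum of the children's counts
end

struct ProdNode <: Node
    children::Vector{Node}
    count::Float64             # product of the children's counts
end

nconfigs(::Leaf) = 1.0
nconfigs(n::SumNode) = n.count
nconfigs(n::ProdNode) = n.count

# Directed sampling: descend from the root, choosing a child of each sum node
# with probability proportional to its count; at a product node, sample every
# child and concatenate the pieces.
sample_config(n::Leaf) = copy(n.bits)

function sample_config(n::SumNode)
    r = rand() * n.count
    for c in n.children
        r -= nconfigs(c)
        r <= 0 && return sample_config(c)
    end
    return sample_config(last(n.children))
end

sample_config(n::ProdNode) = reduce(vcat, (sample_config(c) for c in n.children))
Because every choice is weighted by the cached counts, each stored configuration is returned with equal probability, which is what makes the sampling unbiased.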
Example: Independent Sets in Large Graphs
Let's examine how to use a SumProductTree for a large random regular graph:
using GenericTensorNetworks, Graphs  # Graphs.jl provides random_regular_graph
graph = random_regular_graph(70, 3)  # a random 3-regular graph on 70 vertices
# Build the independent set problem as a tensor network, using the TreeSA contraction-order optimizer
problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA())
# Enumerate all independent sets, stored compactly as a SumProductTree
tree = solve(problem, ConfigsAll(; tree_storage=true))[]
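Before (or instead of) enumerating the configurations, it is often useful to know how many there are. The CountingAll property computes this total without storing any configurations; for the instance in the REPL session below it is roughly 1.7 × 10^13:
# Count all independent sets of the graph; this matches the count reported
# at the root of the SumProductTree printed below.
num_configs = solve(problem, CountingAll())[]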
For this 70-vertex graph, storing all independent sets explicitly would require approximately 256 TB of storage! However, the SumProductTree representation requires only a tiny fraction of this memory.
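The 256 TB figure is a back-of-envelope estimate, assuming each configuration is stored as a 70-bit StaticBitVector packed into two 64-bit words (16 bytes) and using the configuration count from the REPL session below:
num_configs = 1.7013023062945e13       # root count of the tree printed below
bytes_per_config = 2 * 8               # a 70-bit StaticBitVector occupies two UInt64 words
num_configs * bytes_per_config / 2^40  # ≈ 248 TiB, i.e. on the order of 256 TB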
Sampling from the Configuration Space
One of the most powerful features of the SumProductTree is its ability to generate unbiased random samples from the configuration space:
samples = generate_samples(tree, 1000)
This generates 1000 random independent set configurations from our graph, allowing us to analyze statistical properties without enumerating the entire solution space.
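For example, the samples give a quick Monte Carlo estimate of the typical independent set size. This sketch assumes that iterating a sampled StaticBitVector yields its 0/1 entries:
# Estimate the average independent-set size from the 1000 samples
# (assumes iterating a StaticBitVector yields its 0/1 entries).
set_sizes = [count(isone, s) for s in samples]
mean_size = sum(set_sizes) / length(set_sizes)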
Statistical Analysis: Hamming Distance Distribution
With these samples, we can compute useful properties such as the Hamming distance distribution between configurations. The Hamming distance measures how many bit positions differ between two configurations.
using CairoMakie
dist = hamming_distribution(samples, samples)
# Create a bar plot of the distribution
fig = Figure()
ax = Axis(fig[1, 1]; xlabel="Hamming Distance", ylabel="Frequency")
barplot!(ax, 0:length(dist)-1, dist)
fig
This visualization reveals the structure of the solution space by showing how similar or dissimilar the independent set configurations tend to be to each other.
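The distribution can also be summarized numerically. Since hamming_distribution returns counts indexed by distances 0, 1, 2, … (the same convention used for the x-axis above), the mean Hamming distance is a simple weighted sum:
# Mean Hamming distance implied by the sampled distribution
weights = dist ./ sum(dist)                        # normalize counts to probabilities
mean_distance = sum((0:length(dist)-1) .* weights)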
Applications
The SumProductTree approach is particularly valuable for:
- Analyzing very large problem instances
- Estimating statistical properties of solution spaces
- Performing Monte Carlo sampling for approximation algorithms
- Studying the structure of configuration spaces without exhaustive enumeration
By combining compact representation with efficient sampling, the SumProductTree enables analysis of problem instances that would otherwise be computationally intractable.
The following REPL session walks through the same example end to end, showing the printed tree structure and the sampled configurations.
julia> using GenericTensorNetworks

julia> graph = random_regular_graph(70, 3)
{70, 105} undirected simple Int64 graph

julia> problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA());

julia> tree = solve(problem, ConfigsAll(; tree_storage=true))[]
+ (count = 1.7013023062945e13)
├─ + (count = 1.2927158771414e13)
│  ⋮
│
└─ * (count = 4.085864291531e12)
   ⋮
Storing these configurations explicitly would require a hard disk of roughly 256 TB! The sum-product binary tree structure, however, supports efficient and unbiased direct sampling without ever expanding them.
julia> samples = generate_samples(tree, 1000)
1000-element Vector{StaticBitVector{70, 2}}:
 0000001000000000000010111000000000011111100000000100010000000010001100
 0010001000000000000000110001000000011001110000001110000000110010000100
 0001000000000000000000110011000100010001000001000000110000100110010100
 ⋮
With these samples in hand, the Hamming distance distribution can be computed and visualized exactly as shown above, using hamming_distribution and CairoMakie.