Efficient Configuration Storage with Sum-Product Trees
Overview
When dealing with combinatorial problems, the number of valid configurations can grow exponentially with problem size. The SumProductTree data structure provides a memory-efficient solution for storing and sampling from these large configuration spaces.
The Sum-Product Tree Approach
A SumProductTree is a specialized data structure that:
- Uses polynomial memory to store an exponential number of configurations
- Represents configurations as a sum-product expression tree
- Enables lazy evaluation through depth-first search
- Supports efficient directed sampling from the configuration space
This approach is particularly valuable when working with large graphs where storing all configurations explicitly would be prohibitively expensive.
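To make these properties concrete, here is a minimal sketch of the sum-product idea in plain Julia. The types and functions below (Leaf, Sum, Prod, count_configs, sample_config) are hypothetical illustrations of the mechanism, not the library's internal representation:

```julia
# A `+` node represents a disjoint union of configuration sets;
# a `*` node represents a Cartesian product of independent
# sub-configurations. Leaves hold partial configurations.
abstract type Node end
struct Leaf <: Node
    bits::Vector{Int}       # a partial 0/1 configuration
end
struct Sum <: Node
    children::Vector{Node}
end
struct Prod <: Node
    children::Vector{Node}
end

# Counting traverses the tree once: polynomial in the tree size,
# even when the number of stored configurations is exponential.
count_configs(n::Leaf) = 1
count_configs(n::Sum) = sum(count_configs, n.children)
count_configs(n::Prod) = prod(count_configs, n.children)

# Unbiased sampling: at a `+` node, choose a branch with probability
# proportional to its configuration count; at a `*` node, sample every
# branch and concatenate the results.
sample_config(n::Leaf) = copy(n.bits)
function sample_config(n::Sum)
    counts = map(count_configs, n.children)
    r = rand() * sum(counts)
    acc = 0
    for (c, child) in zip(counts, n.children)
        acc += c
        r <= acc && return sample_config(child)
    end
    return sample_config(n.children[end])
end
sample_config(n::Prod) = reduce(vcat, map(sample_config, n.children))
```

Because a `*` node multiplies the counts of its children while a `+` node adds them, a tree of polynomial size can encode exponentially many configurations, and every stored configuration is sampled with equal probability.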
Example: Independent Sets in Large Graphs
Let's examine how to use a SumProductTree for a large random regular graph:
using GenericTensorNetworks
graph = random_regular_graph(70, 3)
problem = GenericTensorNetwork(IndependentSet(graph); optimizer=TreeSA())
tree = solve(problem, ConfigsAll(; tree_storage=true))[]
For this 70-vertex graph, storing all independent sets explicitly would require approximately 256 TB of storage! However, the SumProductTree representation requires only a small fraction of this memory.
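As a rough sanity check on that figure (a back-of-envelope estimate, assuming each 70-bit configuration is padded to two 64-bit machine words, i.e. 16 bytes, and using the configuration count reported at the root of the tree):

```julia
n_configs = 1.7253411893466e13   # independent-set count at the tree root
bytes_per_config = 16            # 70 bits padded to two 64-bit words
total_bytes = n_configs * bytes_per_config
total_bytes / 2^40               # ≈ 251 TiB, on the order of 256 TB
```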
Sampling from the Configuration Space
One of the most powerful features of the SumProductTree is its ability to generate unbiased random samples from the configuration space:
samples = generate_samples(tree, 1000)
This generates 1000 random independent set configurations from our graph, allowing us to analyze statistical properties without enumerating the entire solution space.
Statistical Analysis: Hamming Distance Distribution
With these samples, we can compute useful properties such as the Hamming distance distribution between configurations. The Hamming distance measures how many bit positions differ between two configurations.
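For two equal-length 0/1 configurations, the Hamming distance can be written as a one-liner (a generic illustration of the definition, independent of the library's hamming_distribution helper):

```julia
# Number of positions where two configurations differ.
hamming(a, b) = count(t -> t[1] != t[2], zip(a, b))

hamming([0, 1, 1, 0], [1, 1, 0, 0])  # == 2
```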
using CairoMakie
dist = hamming_distribution(samples, samples)
# Create a bar plot of the distribution
fig = Figure()
ax = Axis(fig[1, 1]; xlabel="Hamming Distance", ylabel="Frequency")
barplot!(ax, 0:length(dist)-1, dist)
fig
This visualization reveals the structure of the solution space by showing how similar or dissimilar the independent set configurations tend to be to each other.
Applications
The SumProductTree approach is particularly valuable for:
- Analyzing very large problem instances
- Estimating statistical properties of solution spaces
- Performing Monte Carlo sampling for approximation algorithms
- Studying the structure of configuration spaces without exhaustive enumeration
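As one concrete instance of Monte Carlo estimation from the list above: the fraction of configurations that select at least k vertices can be estimated directly from unbiased samples, with no enumeration. This is a sketch assuming each sample iterates over its 0/1 entries, as the bit-vector samples returned by generate_samples do:

```julia
# Monte Carlo estimate of the fraction of configurations that
# select at least `k` vertices, computed from unbiased samples.
frac_at_least(samples, k) = count(s -> sum(s) >= k, samples) / length(samples)
```

Applied to the 1000 samples drawn earlier, `frac_at_least(samples, 20)` would estimate the probability that a uniformly random independent set of the graph has at least 20 vertices.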
By combining compact representation with efficient sampling, the SumProductTree enables analysis of problem instances that would otherwise be computationally intractable.
