Graph-In-Fracture Transformer with a Kolmogorov-Arnold Superposition Theorem Layer
A graph foundation model for discrete fracture networks in porous media.
GIFT-KASTL combines a graph transformer backbone for graph-wide representation learning with a structured downstream refinement layer inspired by the Kolmogorov-Arnold Superposition Theorem. The goal is to model complex fracture-network data in a way that is expressive, scientifically meaningful, and architecturally disciplined.
GIFT-KASTL combines a graph transformer backbone for global graph representation learning with a Kolmogorov-Arnold Superposition Theorem Layer (KASTL) applied at the output stage for structured nonlinear refinement. This design aims to produce scientifically meaningful predictions on graph-structured fracture data while preserving a clear mathematical interpretation of the downstream refinement mechanism.
Discrete fracture networks in porous media naturally form graph-structured scientific data, where prediction depends on both local relationships and long-range graph connectivity. Standard pipelines often struggle to capture that structure cleanly. GIFT-KASTL is built to address this challenge with a two-stage design: a graph transformer learns graph-wide representations, and a KASTL refinement layer introduces structured nonlinear composition at the output stage.
The project targets porous media and fracture-network data, where geometry, connectivity, and interaction structure matter simultaneously.
The graph transformer acts as the global representation learner, aggregating information across the fracture graph before prediction refinement.
KASTL gives the model a distinctive downstream refinement mechanism inspired by the Kolmogorov-Arnold Superposition Theorem.
GIFT-KASTL begins with graph fracture data and preprocessing, then uses a graph transformer backbone to learn global graph representations. The final prediction is refined through a Kolmogorov-Arnold-inspired downstream layer applied at the output stage.
This view highlights the internal flow of the proposed foundation model. Graph-structured fracture data is first encoded into node and edge representations, processed through a message-passing and graph transformer backbone, and then refined by the KASTL module to produce the final scientific prediction.
Proposed foundation model architecture. The pipeline begins with graph fracture data, builds node and edge representations, applies graph transformer message passing and training, and uses KASTL as the final structured refinement layer before prediction. This figure emphasizes the two-part system design: graph-wide representation learning first, theorem-inspired nonlinear refinement second.
GIFT-KASTL uses a graph transformer as its first-stage representation learner. Starting from the node-feature matrix \(X^{(\ell)}\) at layer \(\ell\), each transformer layer computes attention-weighted interactions through projected query, key, and value maps, following the standard attention mechanism introduced by Vaswani et al.
For graph-structured data, this attention mechanism is adapted so that feature propagation reflects graph connectivity and structural information rather than treating the input as an unstructured sequence. In this way, the graph transformer aggregates global relational structure across the fracture network before the downstream KASTL layer performs structured nonlinear refinement.
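A minimal sketch of this adaptation, assuming single-head attention and a dense boolean adjacency mask; the class name `GraphAttentionBlock` and the mask convention are illustrative, not the project's exact backbone:

```python
import torch
import torch.nn as nn

class GraphAttentionBlock(nn.Module):
    """Illustrative single-head graph attention: scaled dot-product
    attention restricted to the fracture graph's edges via an
    adjacency mask, instead of treating nodes as an unordered sequence."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; adj: (N, N) boolean adjacency with self-loops
        scores = (self.q(x) @ self.k(x).T) * self.scale
        scores = scores.masked_fill(~adj, float("-inf"))  # non-edges get zero weight
        return torch.softmax(scores, dim=-1) @ self.v(x)
```

Masking before the softmax means each node aggregates only over its graph neighbors, so feature propagation follows fracture connectivity rather than all-pairs interactions.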
The final stage of GIFT-KASTL introduces a structured refinement layer inspired by the Kolmogorov–Arnold Superposition Theorem. Rather than relying only on the raw transformer output, the model applies a second stage of nonlinear functional composition, giving the system a mathematically structured way to refine graph-based predictions.
For a continuous multivariate function \( f : [0,1]^n \to \mathbb{R} \), the Kolmogorov superposition principle gives the representation
\[
f(x_1, \ldots, x_n) \;=\; \sum_{j=0}^{2n} \mathcal{O}_j\!\left( \sum_{i=1}^{n} \mathcal{I}_{ij}(x_i) \right),
\]
where \( \mathcal{O}_j \) are one-dimensional outer functions and \( \mathcal{I}_{ij} \) are one-dimensional inner functions. In GIFT-KASTL, the inner and outer functions are chosen as
\[
\mathcal{I}(x) \;=\; \arctan(\sinh x),
\qquad
\mathcal{O}(x) \;=\; \sum_{k=0}^{\infty} \frac{|E_{2k}|}{(2k+1)!}\, x^{2k+1}
\;=\; x + \frac{x^3}{6} + \frac{x^5}{24} + \frac{61\,x^7}{5040} + \cdots,
\]
where \(E_{2k}\) denote the Euler secant numbers. The inner map is the Gudermannian function \(\mathrm{gd}(x)\) and the outer map is the series expansion of its inverse \(\mathrm{gd}^{-1}\); together they define the nonlinear composition used in the KASTL refinement stage. We apply the superposition to a single-variable input so that all predicted values can be tuned effectively and simply, avoiding unnecessary functional complexity. The pairing of inner and outer functions is critical: because the outer series inverts the inner map, an inappropriate choice of either function would mistune the forecast values and produce large errors.
Inner function. The inner function introduces bounded nonlinear shaping before aggregation, transforming latent graph features into a structured intermediate form.
Outer function. The outer function composes the transformed representation and governs the final output geometry of the refinement stage.
Inner-function heatmap. A visual summary of how the inner-stage transformation behaves across its input domain.
Outer-function heatmap. A complementary view of the output-stage composition used by the KASTL refinement layer.
KASTL architecture. Multiple KASTL units are composed and aggregated through a structured summation-and-adjustment stage to refine the final output of the graph transformer backbone.
Following the global aggregation performed by the graph transformer, GIFT-KASTL applies a structured nonlinear refinement inspired by the Kolmogorov–Arnold superposition theorem. Instead of relying purely on learned multilayer compositions, KASTL decomposes transformations into structured inner and outer functions, introducing an explicit functional prior into the learning process.
This formulation allows KASTL to act as a refinement operator over learned representations, enabling structured nonlinear interactions while maintaining interpretability and functional control. In contrast to generic feedforward layers, this approach introduces mathematically grounded inductive bias into the output stage of the model.
Standard pipelines rely on generic feedforward layers for nonlinear transformation. KASTL replaces this with a structured composition of inner and outer functions, providing a disciplined way to refine predictions after the graph transformer stage.
The KASTL layer is motivated by the Kolmogorov–Arnold Superposition Theorem, which shows that multivariate functions can be represented through compositions of one-dimensional functions. This gives the architecture a principled mathematical foundation.
Graph-based scientific data often exhibits structured interactions rather than arbitrary nonlinear behavior. KASTL introduces a functional composition bias that aligns naturally with physical and spatial processes in fracture networks.
Instead of learning arbitrary nonlinear mappings, GIFT-KASTL introduces a structured functional decomposition that aligns with both mathematical theory and scientific modeling needs.
KASTL is not introduced as a decorative add-on. It is a structured refinement stage designed to bring theorem-inspired nonlinear composition into the output of the graph transformer. In this project, the layer is motivated by both mathematical form and practical modeling needs in scientific graph data.
The figure highlights six reasons for adopting the KASTL layer: adaptability, ease of use, high predictive accuracy, stability, probabilistic interpretability, and finite approximation structure. Together, these properties make KASTL a compelling downstream refinement mechanism for graph-based scientific machine learning.
To understand the intrinsic structure of fracture-network data, we analyze the singular value decomposition of node feature matrices across the DFN dataset. This provides a direct view of how spatial and geometric attributes contribute to the learned representation space.
Across the dataset, the singular spectrum remains balanced rather than collapsing into a dominant low-rank mode. This indicates that DFN node features do not admit trivial dimensional reduction: the geometry is distributed across multiple active feature directions rather than being dominated by a single coordinate or attribute.
Interactive singular-vector structure. High-weight regions indicate strong feature–mode alignment, while low-weight regions indicate weak or orthogonal contribution. Mixed-sign regions show that dominant modes are formed by coupled combinations of features rather than axis-aligned coordinates.
Mean singular spectrum. The gradual decay of normalized singular values shows that no single mode dominates the feature geometry. Instead, spectral mass is distributed across all principal directions.
Cumulative energy. The cumulative spectrum requires all components to capture most of the variance, reinforcing that DFN node features are consistently full-rank rather than strongly compressible.
Effective rank distribution. Across the dataset, effective rank remains concentrated at the full feature dimension, indicating a stable, dataset-wide pattern rather than an isolated example.
Representative example. A typical graph exhibits the same balanced spectral profile seen in the aggregate statistics, confirming that the dataset-level behavior is reflected at the level of individual realizations.
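The spectral statistics described above can be reproduced per graph with a short computation. This sketch assumes an entropy-based (exponential of Shannon entropy) definition of effective rank, which may differ from the exact definition used in the figures:

```python
import numpy as np

def spectral_summary(X: np.ndarray):
    """Per-graph spectral statistics for a node-feature matrix X (nodes x features):
    normalized singular spectrum, cumulative energy, and an entropy-based
    effective rank (an assumption, not necessarily the figures' definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    s_norm = s / s.sum()                       # normalized singular spectrum
    energy = np.cumsum(s_norm)                 # cumulative spectral energy
    p = s_norm[s_norm > 0]
    eff_rank = float(np.exp(-(p * np.log(p)).sum()))
    return s_norm, energy, eff_rank

# A full-rank random feature matrix gives a balanced spectrum: effective rank
# close to the feature dimension, mirroring the behavior reported for DFN features.
rng = np.random.default_rng(0)
s_norm, energy, eff_rank = spectral_summary(rng.standard_normal((200, 4)))
assert abs(energy[-1] - 1.0) < 1e-9 and 3.5 < eff_rank <= 4.0
```

A strongly compressible matrix would instead concentrate spectral mass in one mode and push the effective rank toward 1.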
GIFT-KASTL is designed as a two-stage scientific graph learning pipeline. A graph transformer backbone first learns graph-wide representations from discrete fracture network data, and the KASTL layer then refines the final prediction through a structured nonlinear composition inspired by the Kolmogorov-Arnold Superposition Theorem. The current project page focuses on the architectural and mathematical identity of the system, while broader experiment panels and ablations can be layered in as the project evolves.
import torch
import torch.nn as nn


class KASTLLayer(nn.Module):
    """
    Kolmogorov-Arnold Superposition Theorem Layer,
    used as a structured refinement stage after the graph transformer.
    """

    def __init__(self, scale: float = 1.0):
        super().__init__()
        self.scale = scale

    def inner_function(self, x: torch.Tensor) -> torch.Tensor:
        # Gudermannian function gd(x) = arctan(sinh(x)), bounded in (-pi/2, pi/2)
        return torch.atan(torch.sinh(x))

    def outer_function(self, x: torch.Tensor) -> torch.Tensor:
        # Truncated outer expansion inspired by the theorem-driven construction
        return (
            x
            + x**3 / 6
            + x**5 / 24
            + 61 * x**7 / 5040
            + 277 * x**9 / 72576
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.inner_function(x)
        y = self.outer_function(z)
        return self.scale * y
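As a sanity check on this construction, a functional restatement (the helper `kastl_refine` is illustrative, not part of the project's API) shows two properties of the inner/outer pairing: near the origin the truncated outer series inverts the bounded inner map, and large inputs remain finite:

```python
import torch

def kastl_refine(x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Functional restatement of the KASTL refinement (same inner/outer maps)."""
    z = torch.atan(torch.sinh(x))  # inner map, bounded in (-pi/2, pi/2)
    y = z + z**3 / 6 + z**5 / 24 + 61 * z**7 / 5040 + 277 * z**9 / 72576
    return scale * y

# Near the origin the truncated outer series inverts the inner map,
# so the refinement is approximately the identity for small inputs ...
h = torch.linspace(-0.5, 0.5, steps=5)
assert torch.allclose(kastl_refine(h), h, atol=1e-3)

# ... while the bounded inner map keeps the output finite for large inputs.
assert torch.isfinite(kastl_refine(torch.tensor([100.0]))).all()
```

This near-identity behavior around zero means the layer acts as a gentle, structured correction on already-reasonable transformer outputs rather than an arbitrary nonlinear distortion.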
The poster provides a broader view of the system design, mathematical formulation, and current experimental material for GIFT-KASTL. The project page acts as the polished entry point; the poster can carry the denser technical details until additional experiments and benchmarks are added here.
These interactive plots summarize the footprint of the fracture-network datasets and the scaling behavior of the pipeline with graph size and connectivity.
These figures collectively illustrate a central principle: the complexity of learning on fracture networks is governed not only by graph size, but by structural and dynamical properties of the graph itself. While graph transformers provide expressive global representations, their computational and predictive behavior is shaped by topology, sparsity, and horizon-dependent error propagation. The KASTL refinement layer is designed precisely to stabilize these effects.
Nodes vs edges. Each point is a graph instance, colored by wall time.
The near-linear growth between nodes and edges indicates that DFN graphs preserve a stable connectivity regime as system size increases. This suggests that the underlying physical generation process induces structured, non-random graph topology rather than arbitrary densification. Importantly, wall-time variation across similar graph sizes reveals that topology—not just size—controls computational cost, highlighting the role of connectivity patterns in downstream learning complexity.
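One way to quantify the near-linear growth claim is a log-log power-law fit of edge count against node count. The helper below is a sketch run on synthetic stand-in data, not the project's dataset:

```python
import numpy as np

def scaling_exponent(nodes: np.ndarray, edges: np.ndarray) -> float:
    """Least-squares power-law fit m ~ c * n**alpha in log-log space.
    alpha near 1 indicates a stable, near-linear connectivity regime."""
    alpha, _ = np.polyfit(np.log(nodes), np.log(edges), deg=1)
    return float(alpha)

# Synthetic stand-in with exactly linear edge growth (illustrative values):
n = np.array([100.0, 200.0, 400.0, 800.0])
assert abs(scaling_exponent(n, 3 * n) - 1.0) < 1e-8
```

An exponent drifting above 1 on real data would signal densification with system size; an exponent near 1 supports the structured, non-random topology described above.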
Wall time vs nodes. Interactive scaling with graph size. Most graphs remain in a controlled compute regime, with a small number of structurally harder outliers.
Although graph transformers are often associated with quadratic complexity in the number of nodes, the observed wall-time growth remains controlled across most of the dataset. This suggests that the DFN graphs occupy a structured regime in which sparsity, batching, and regularity mitigate worst-case scaling behavior. The visible outliers indicate that graph size alone does not determine compute cost; certain node configurations induce disproportionately expensive processing.
Wall time vs edges. Scaling with graph connectivity. Connectivity structure contributes to runtime variability and acts as a secondary driver of compute.
Edge count introduces an additional source of computational variability beyond node count alone. Since edges govern relational information flow and message passing structure, denser connectivity can amplify the cost of representation learning even for graphs of similar size. This indicates that the effective complexity of DFN learning depends not only on graph scale, but also on how connectivity is distributed across the network.
Prediction error by horizon. Distribution of prediction error across targets. Error spreads as horizon increases, motivating structured refinement for stable downstream prediction.
The widening error distribution across prediction horizons reflects the accumulation of approximation error as predictions are extended further from the observed regime. This is consistent with operator-learning and iterative forecasting settings, where small representation inaccuracies compound over time. In this perspective, the KASTL refinement layer is not merely an architectural addition, but a mechanism aimed at stabilizing downstream prediction under increasing horizon depth.
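A toy model of this compounding effect, with illustrative per-step error and amplification values (not fitted to the experiments):

```python
import numpy as np

def rollout_error(eps: float, gain: float, horizons: int) -> np.ndarray:
    """Toy error-accumulation model for iterative forecasting: each rollout
    step amplifies the accumulated error by `gain` and adds a fresh one-step
    error `eps`. Both parameter values below are illustrative assumptions."""
    errs = [eps]
    for _ in range(horizons - 1):
        errs.append(errs[-1] * gain + eps)
    return np.array(errs)

# With any gain >= 1, the error envelope widens monotonically with horizon,
# mirroring the spreading error distributions discussed above.
e = rollout_error(eps=0.01, gain=1.1, horizons=10)
assert np.all(np.diff(e) > 0)
```

In this picture, a refinement stage that contracts small errors at each step (effectively reducing the gain) slows the widening of the error envelope, which is the stabilizing role attributed to KASTL.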
@misc{giftkastl,
  title  = {GIFT-KASTL: Graph-In-Fracture Transformer tuned by Kolmogorov-Arnold Superposition Theorem Layer, a novel graph foundation model for scientific porous media data},
  author = {Himanshu Singh},
  note   = {Project website and technical materials},
  year   = {2026},
  url    = {https://himanshuvnm.github.io/gift-kastl/}
}