Graph Foundation Model for Scientific Porous Media Data

GIFT-KASTL

Graph-In-Fracture Transformer with a Kolmogorov-Arnold Superposition Theorem Layer

A graph foundation model for discrete fracture networks in porous media.

GIFT-KASTL combines a graph transformer backbone for graph-wide representation learning with a structured downstream refinement layer inspired by the Kolmogorov-Arnold Superposition Theorem. The goal is to model complex fracture-network data in a way that is expressive, scientifically meaningful, and architecturally disciplined.

GIFT-KASTL system overview
Project abstract

A graph foundation model with structured nonlinear refinement.

GIFT-KASTL is a graph foundation model for discrete fracture networks in porous media. The system combines a graph transformer backbone for global graph representation learning with a Kolmogorov-Arnold Superposition Theorem Layer (KASTL) applied at the output stage for structured nonlinear refinement. This design aims to produce scientifically meaningful predictions on graph-structured fracture data while preserving a clear mathematical interpretation of the downstream refinement mechanism.

Why this matters

Scientific fracture systems are graph-structured, global, and hard to model well.

Discrete fracture networks in porous media naturally form graph-structured scientific data, where prediction depends on both local relationships and long-range graph connectivity. Standard pipelines often struggle to capture that structure cleanly. GIFT-KASTL is built to address this challenge with a two-stage design: a graph transformer learns graph-wide representations, and a KASTL refinement layer introduces structured nonlinear composition at the output stage.

Scientific setting

The project targets porous media and fracture-network data, where geometry, connectivity, and interaction structure matter simultaneously.

System design

The graph transformer acts as the global representation learner, aggregating information across the fracture graph before prediction refinement.

Mathematical identity

KASTL gives the model a distinctive downstream refinement mechanism inspired by the Kolmogorov-Arnold Superposition Theorem.

Architecture

Two stages: graph-wide learning first, structured refinement second.

GIFT-KASTL begins with graph fracture data and preprocessing, then uses a graph transformer backbone to learn global graph representations. The final prediction is refined through a Kolmogorov-Arnold-inspired downstream layer applied at the output stage.

Proposed architecture

A closer view of the graph foundation model pipeline.

This view highlights the internal flow of the proposed foundation model. Graph-structured fracture data is first encoded into node and edge representations, processed through a message-passing and graph transformer backbone, and then refined by the KASTL module to produce the final scientific prediction.

Proposed foundation model architecture for GIFT-KASTL

Proposed foundation model architecture. The pipeline begins with graph fracture data, builds node and edge representations, applies graph transformer message passing and training, and uses KASTL as the final structured refinement layer before prediction. This figure emphasizes the two-part system design: graph-wide representation learning first, theorem-inspired nonlinear refinement second.
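
As a minimal sketch of this two-stage composition (the mean-pool readout, prediction head, and module interfaces here are illustrative assumptions rather than the exact implementation), the pipeline can be written as a wrapper that runs the backbone first and the refinement layer last:

import torch
import torch.nn as nn

class GIFTKASTLPipeline(nn.Module):
    """Minimal sketch of the two-stage design: backbone first, KASTL refinement last."""

    def __init__(self, backbone: nn.Module, head: nn.Module, kastl: nn.Module):
        super().__init__()
        self.backbone = backbone  # graph transformer over node/edge representations
        self.head = head          # maps the pooled graph representation to a raw prediction
        self.kastl = kastl        # structured nonlinear refinement at the output stage

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.backbone(node_feats, adj)   # graph-wide representation learning
        g = h.mean(dim=0)                    # mean-pool readout (placeholder choice)
        raw = self.head(g)                   # unrefined prediction
        return self.kastl(raw)               # theorem-inspired output refinement

The point of the sketch is the ordering: all graph-wide aggregation happens in the backbone, and KASTL only touches the final prediction.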

Graph Transformer Backbone

Attention-based graph-wide representation learning before KASTL refinement, with the graph transformer serving as the first-stage global representation learner.

GIFT-KASTL uses a graph transformer as its first-stage representation learner. Starting from the layer-\(\ell\) node-feature matrix \(X^{(\ell)}\), each layer computes attention-weighted interactions through projected query, key, and value maps, following the standard transformer mechanism introduced by Vaswani et al.

\[ \mathrm{Attention}\!\left(X^{(\ell)}\right) := \mathrm{softmax}\!\left(d_K^{-1/2} QK^{\top}\right)V, \] \[ Q := X^{(\ell)}W_Q,\qquad K := X^{(\ell)}W_K,\qquad V := X^{(\ell)}W_V. \]

For graph-structured data, this attention mechanism is adapted so that feature propagation reflects graph connectivity and structural information rather than treating the input as an unstructured sequence. In this way, the graph transformer aggregates global relational structure across the fracture network before the downstream KASTL layer performs structured nonlinear refinement.

In GIFT-KASTL, the graph transformer serves as the global representation learner, while KASTL acts afterward as a theorem-inspired output-stage refinement mechanism.
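
A minimal sketch of this graph-restricted attention, assuming a dense binary adjacency mask with self-loops and a single head (the actual backbone may use multi-head attention, positional or structural encodings, and edge features), is:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    """Single-head attention in which scores are masked by graph connectivity."""

    def __init__(self, d_model: int, d_k: int):
        super().__init__()
        self.W_Q = nn.Linear(d_model, d_k, bias=False)
        self.W_K = nn.Linear(d_model, d_k, bias=False)
        self.W_V = nn.Linear(d_model, d_model, bias=False)
        self.d_k = d_k

    def forward(self, X: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # X: (num_nodes, d_model) node features X^(l); adj: (num_nodes, num_nodes)
        # binary adjacency, assumed to include self-loops so every row attends somewhere.
        Q, K, V = self.W_Q(X), self.W_K(X), self.W_V(X)
        scores = (Q @ K.transpose(-2, -1)) / self.d_k ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))  # restrict attention to edges
        return F.softmax(scores, dim=-1) @ V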

Selected references

  • Vaswani et al., Attention Is All You Need, 2017. arXiv
  • Dwivedi and Bresson, A Generalization of Transformer Networks to Graphs, 2020. arXiv
  • Ying et al., Do Transformers Really Perform Bad for Graph Representation?, 2021. arXiv

KASTL

Kolmogorov–Arnold Superposition Theorem Layer

The final stage of GIFT-KASTL introduces a structured refinement layer inspired by the Kolmogorov–Arnold Superposition Theorem. Rather than relying only on the raw transformer output, the model applies a second stage of nonlinear functional composition, giving the system a mathematically structured way to refine graph-based predictions.

Mathematical Formulation

Kolmogorov-style superposition as a refinement principle

For a continuous multivariate function \( f : [0,1]^n \to \mathbb{R} \), the Kolmogorov superposition principle gives the representation

\[ f(x_1,\ldots,x_n) = \sum_{j=0}^{2n} \mathcal{O}_j \left( \sum_{i=1}^{n}\mathcal{I}_{ij}(x_i) \right), \]

where \( \mathcal{O}_j \) are one-dimensional outer functions and \( \mathcal{I}_{ij} \) are one-dimensional inner functions. In GIFT-KASTL, the inner and outer functions are chosen as follows:

\[ \mathcal{I}(x)=\arctan(\sinh x), \] \[ \mathcal{O}(x)=\sum_{k=0}^{N}\frac{|E_k|}{(k+1)!}\,x^{k+1}, \qquad N<\infty, \]

where \(E_k\) denotes the Euler (secant) numbers and \(N\) is a finite truncation order. These choices define the nonlinear composition mechanism used in the KASTL refinement stage. The superposition is applied in its single-variable form, so the refinement acts elementwise on each predicted value and remains simple to tune, avoiding the complexity of the full multivariate construction. The inner and outer functions are chosen as a matched pair: the inner map is the Gudermannian function and the outer series is a truncated expansion of its inverse, so an ill-matched choice of either function can distort the refined forecasts and introduce large errors.
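
For concreteness, truncating at \(N = 8\) and using the nonzero Euler numbers \(|E_0| = 1\), \(|E_2| = 1\), \(|E_4| = 5\), \(|E_6| = 61\), \(|E_8| = 1385\) (the odd-index terms vanish) gives

\[ \mathcal{O}(x) \approx x + \frac{x^{3}}{6} + \frac{x^{5}}{24} + \frac{61\,x^{7}}{5040} + \frac{277\,x^{9}}{72576}, \]

which is exactly the truncated polynomial used in the KASTL implementation shown later on this page.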

Inner function used in the KASTL layer

Inner function. The inner function introduces bounded nonlinear shaping before aggregation, transforming latent graph features into a structured intermediate form.

Outer function used in the KASTL layer

Outer function. The outer function composes the transformed representation and governs the final output geometry of the refinement stage.

Heatmap of inner function behavior

Inner-function heatmap. A visual summary of how the inner-stage transformation behaves across its input domain.

Heatmap of outer function behavior

Outer-function heatmap. A complementary view of the output-stage composition used by the KASTL refinement layer.

Architecture diagram of the KASTL layer

KASTL architecture. Multiple KASTL units are composed and aggregated through a structured summation-and-adjustment stage to refine the final output of the graph transformer backbone.

KASTL Refinement Layer

Kolmogorov–Arnold structured transformation as a post-attention refinement

Following the global aggregation performed by the graph transformer, GIFT-KASTL applies a structured nonlinear refinement inspired by the Kolmogorov–Arnold superposition theorem. Instead of relying purely on learned multilayer compositions, KASTL decomposes transformations into structured inner and outer functions, introducing an explicit functional prior into the learning process.

\[ f(x_1,\ldots,x_n) = \sum_{j=0}^{2n} \mathcal{O}_j\!\left( \sum_{i=1}^{n} \mathcal{I}_{ij}(x_i) \right) \]

This formulation allows KASTL to act as a refinement operator over learned representations, enabling structured nonlinear interactions while maintaining interpretability and functional control. In contrast to generic feedforward layers, this approach introduces mathematically grounded inductive bias into the output stage of the model.

Conceptually, GIFT-KASTL separates representation learning and functional refinement: the transformer learns global structure, while KASTL imposes structured nonlinear composition.
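
As an illustration of how the superposition form above could be realized over an \(n\)-dimensional latent representation (the per-term weights and output scales below are hypothetical parameterizations introduced only for this sketch, not the project's exact layer), one possible module is:

import torch
import torch.nn as nn

class KASuperpositionSketch(nn.Module):
    """Sketch of f(x) = sum_j O_j( sum_i I_ij(x_i) ) with j = 0, ..., 2n.
    Inner/outer maps follow the KASTL choices; the learnable weights are illustrative."""

    def __init__(self, n_inputs: int):
        super().__init__()
        self.n_terms = 2 * n_inputs + 1                                         # j = 0, ..., 2n
        self.inner_weights = nn.Parameter(torch.ones(self.n_terms, n_inputs))   # hypothetical
        self.outer_scales = nn.Parameter(torch.ones(self.n_terms))              # hypothetical

    @staticmethod
    def inner(x: torch.Tensor) -> torch.Tensor:
        return torch.atan(torch.sinh(x))  # I(x) = arctan(sinh x)

    @staticmethod
    def outer(x: torch.Tensor) -> torch.Tensor:
        # Truncated outer series with Euler-number coefficients, as in KASTL
        return x + x**3 / 6 + x**5 / 24 + 61 * x**7 / 5040 + 277 * x**9 / 72576

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_inputs) latent features -> (batch,) refined scalar output
        z = self.inner(x).unsqueeze(1) * self.inner_weights    # (batch, 2n+1, n_inputs)
        s = z.sum(dim=-1)                                      # inner sums, one per term j
        return (self.outer_scales * self.outer(s)).sum(dim=-1) # outer sum over j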

Conceptual references

  • Kolmogorov, A. N., On the representation of continuous functions, 1957.
  • Arnold, V. I., On functions of three variables, 1957.
  • Sprecher, D., On the structure of continuous functions, 1965.
  • Recent neural interpretations of Kolmogorov–Arnold representations in function approximation and neural networks.

Why KASTL

A structured alternative to arbitrary nonlinear layers.

Structured nonlinear refinement

Standard pipelines rely on generic feedforward layers for nonlinear transformation. KASTL replaces this with a structured composition of inner and outer functions, providing a disciplined way to refine predictions after the graph transformer stage.

Theorem-inspired design

The KASTL layer is motivated by the Kolmogorov–Arnold Superposition Theorem, which shows that multivariate functions can be represented through compositions of one-dimensional functions. This gives the architecture a principled mathematical foundation.

Scientific inductive bias

Graph-based scientific data often exhibits structured interactions rather than arbitrary nonlinear behavior. KASTL introduces a functional composition bias that aligns naturally with physical and spatial processes in fracture networks.

Instead of learning arbitrary nonlinear mappings, GIFT-KASTL introduces a structured functional decomposition that aligns with both mathematical theory and scientific modeling needs.

Why KASTL

Why use KASTL instead of a generic nonlinear refinement layer?

KASTL is not introduced as a decorative add-on. It is a structured refinement stage designed to bring theorem-inspired nonlinear composition into the output of the graph transformer. In this project, the layer is motivated by both mathematical form and practical modeling needs in scientific graph data.

Why KASTL figure showing adaptability, ease-of-use, high-end accuracies, stability, probabilistic analysis, and finite approximations

The figure highlights six reasons for adopting the KASTL layer: adaptability, ease of use, high predictive accuracy, stability, probabilistic interpretability, and finite approximation structure. Together, these properties make KASTL a compelling downstream refinement mechanism for graph-based scientific machine learning.

1. Adaptability
2. Ease of use
3. High predictive accuracy
4. Stability
5. Probabilistic analysis
6. Finite approximations

Spectral structure

Singular-value structure of DFN node features

To understand the intrinsic structure of fracture-network data, we analyze the singular value decomposition of node feature matrices across the DFN dataset. This provides a direct view of how spatial and geometric attributes contribute to the learned representation space.

Across the dataset, the singular spectrum remains balanced rather than collapsing into a dominant low-rank mode. This indicates that DFN node features do not admit trivial dimensional reduction: the geometry is distributed across multiple active feature directions rather than being dominated by a single coordinate or attribute.

Interactive singular-vector structure. High-weight regions indicate strong feature–mode alignment, while low-weight regions indicate weak or orthogonal contribution. Mixed-sign regions show that dominant modes are formed by coupled combinations of features rather than axis-aligned coordinates.

Mean singular spectrum across DFN graphs

Mean singular spectrum. The gradual decay of normalized singular values shows that no single mode dominates the feature geometry. Instead, spectral mass is distributed across all principal directions.

Cumulative spectral energy across DFN graphs

Cumulative energy. The cumulative spectrum requires all components to capture most of the variance, reinforcing that DFN node features are consistently full-rank rather than strongly compressible.

Effective rank distribution across DFN graphs

Effective rank distribution. Across the dataset, effective rank remains concentrated at the full feature dimension, indicating a stable, dataset-wide pattern rather than an isolated example.

Representative singular spectrum of one DFN graph

Representative example. A typical graph exhibits the same balanced spectral profile seen in the aggregate statistics, confirming that the dataset-level behavior is reflected at the level of individual realizations.

Why this matters for GIFT-KASTL. These observations suggest that DFN node features are intrinsically coupled and cannot be reduced to independent low-dimensional coordinates. This supports the use of graph transformers for expressive global representation learning and motivates KASTL as a structured refinement stage that models nonlinear interactions across all active feature dimensions, rather than relying on simplified or low-rank assumptions.
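
A minimal sketch of these spectral diagnostics (the entropy-based effective-rank definition and the feature-matrix shape are assumptions about the analysis, not its exact code) is:

import torch

def spectral_summary(X: torch.Tensor):
    """Singular spectrum, cumulative energy, and effective rank of a node-feature matrix."""
    s = torch.linalg.svdvals(X)                     # singular values, descending
    p = s / s.sum()                                 # normalized singular spectrum
    energy = torch.cumsum(s**2, 0) / (s**2).sum()   # cumulative variance captured per component
    # Effective rank as the exponential of the spectrum's Shannon entropy (one common definition)
    eff_rank = torch.exp(-(p * torch.log(p + 1e-12)).sum())
    return p, energy, eff_rank

# Example with a random stand-in for one graph's node features (illustrative shapes only)
X = torch.randn(500, 6)   # 500 nodes, 6 geometric/spatial attributes (hypothetical sizes)
spectrum, energy, eff_rank = spectral_summary(X)
print(eff_rank)           # close to 6 when spectral mass is spread across all directions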

Results

A clean research system with room for stronger empirical additions.

GIFT-KASTL is designed as a two-stage scientific graph learning pipeline. A graph transformer backbone first learns graph-wide representations from discrete fracture network data, and the KASTL layer then refines the final prediction through a structured nonlinear composition inspired by the Kolmogorov-Arnold Superposition Theorem. The current project page focuses on the architectural and mathematical identity of the system, while broader experiment panels and ablations can be layered in as the project evolves.

Core implementation · KASTL layer
import torch
import torch.nn as nn

class KASTLLayer(nn.Module):
    """
    Kolmogorov-Arnold Superposition Theorem Layer
    used as a structured refinement stage after the graph transformer.
    """

    def __init__(self, scale: float = 1.0):
        super().__init__()
        self.scale = scale

    def inner_function(self, x: torch.Tensor) -> torch.Tensor:
        # Inner map I(x) = arctan(sinh(x)): a bounded, smooth shaping of the input,
        # as defined in the KASTL mathematical formulation.
        return torch.atan(torch.sinh(x))

    def outer_function(self, x: torch.Tensor) -> torch.Tensor:
        # Truncated outer series O(x) = sum_k |E_k| / (k+1)! * x^(k+1), up to k = 8,
        # with Euler (secant) number coefficients |E_0| = 1, |E_2| = 1, |E_4| = 5,
        # |E_6| = 61, |E_8| = 1385 (1385 / 9! = 277 / 72576); odd-index terms vanish.
        return (
            x
            + x**3 / 6
            + x**5 / 24
            + 61 * x**7 / 5040
            + 277 * x**9 / 72576
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.inner_function(x)
        y = self.outer_function(z)
        return self.scale * y
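
A short usage sketch follows; the raw prediction values are placeholders rather than real model outputs.

# Usage sketch: refine raw backbone outputs with the KASTL layer (placeholder values).
raw = torch.tensor([0.12, -0.40, 0.85])
layer = KASTLLayer(scale=1.0)
refined = layer(raw)
print(refined.shape)  # torch.Size([3]): the refinement acts elementwise on each prediction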

Poster and project assets

Use the poster as the deeper technical companion.

The poster provides a broader view of the system design, mathematical formulation, and current experimental material for GIFT-KASTL. The project page acts as the polished entry point; the poster can carry the denser technical details until additional experiments and benchmarks are added here.

Data and scaling

Interactive views of graph size, wall time, and prediction behavior.

These interactive plots summarize the footprint of the fracture-network datasets and the scaling behavior of the pipeline with graph size and connectivity.

These figures collectively illustrate a central principle: the complexity of learning on fracture networks is governed not only by graph size, but by structural and dynamical properties of the graph itself. While graph transformers provide expressive global representations, their computational and predictive behavior is shaped by topology, sparsity, and horizon-dependent error propagation. The KASTL refinement layer is designed precisely to stabilize these effects.

Nodes vs edges. Each point is a graph instance, colored by wall time.

Structural scaling of fracture networks

The near-linear growth between nodes and edges indicates that DFN graphs preserve a stable connectivity regime as system size increases. This suggests that the underlying physical generation process induces structured, non-random graph topology rather than arbitrary densification. Importantly, wall-time variation across similar graph sizes reveals that topology—not just size—controls computational cost, highlighting the role of connectivity patterns in downstream learning complexity.

Wall time vs nodes. Interactive scaling with graph size. Most graphs remain in a controlled compute regime, with a small number of structurally harder outliers.

Sub-quadratic empirical scaling with node count

Although graph transformers are often associated with quadratic complexity in the number of nodes, the observed wall-time growth remains controlled across most of the dataset. This suggests that the DFN graphs occupy a structured regime in which sparsity, batching, and regularity mitigate worst-case scaling behavior. The visible outliers indicate that graph size alone does not determine compute cost; certain node configurations induce disproportionately expensive processing.
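
One simple way to quantify this observation, assuming per-graph arrays of node counts and measured wall times (the function below is an illustrative sketch, not part of the pipeline), is a least-squares fit of the power-law exponent in log-log space:

import torch

def scaling_exponent(num_nodes: torch.Tensor, wall_time: torch.Tensor) -> float:
    """Least-squares slope of log(wall_time) vs log(num_nodes): ~1 linear, ~2 quadratic."""
    x = torch.log(num_nodes.float())
    y = torch.log(wall_time.float())
    x_c, y_c = x - x.mean(), y - y.mean()
    return ((x_c * y_c).sum() / (x_c * x_c).sum()).item()

# Example: exponent = scaling_exponent(nodes_per_graph, wall_time_per_graph)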

Wall time vs edges. Scaling with graph connectivity. Connectivity structure contributes to runtime variability and acts as a secondary driver of compute.

Connectivity density acts as a latent complexity parameter

Edge count introduces an additional source of computational variability beyond node count alone. Since edges govern relational information flow and message passing structure, denser connectivity can amplify the cost of representation learning even for graphs of similar size. This indicates that the effective complexity of DFN learning depends not only on graph scale, but also on how connectivity is distributed across the network.

Prediction error by horizon. Distribution of prediction error across targets. Error spreads as horizon increases, motivating structured refinement for stable downstream prediction.

Prediction error accumulates across horizons

The widening error distribution across prediction horizons reflects the accumulation of approximation error as predictions are extended further from the observed regime. This is consistent with operator-learning and iterative forecasting settings, where small representation inaccuracies compound over time. In this perspective, the KASTL refinement layer is not merely an architectural addition, but a mechanism aimed at stabilizing downstream prediction under increasing horizon depth.
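
A small sketch of how such a horizon-wise error summary could be computed (the prediction and target tensors are placeholders with shape (num_samples, num_horizons)):

import torch

def error_by_horizon(pred: torch.Tensor, target: torch.Tensor):
    """Mean and spread of absolute prediction error at each horizon."""
    err = (pred - target).abs()                 # (num_samples, num_horizons)
    return err.mean(dim=0), err.std(dim=0)      # per-horizon mean and standard deviation

# Example: horizon_mean, horizon_std = error_by_horizon(preds, targets)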

Reference

Project reference

@misc{giftkastl,
  title  = {GIFT-KASTL: Graph-In-Fracture Transformer tuned by Kolmogorov-Arnold Superposition Theorem Layer, a novel graph foundation model for scientific porous media data},
  author = {Himanshu Singh},
  note   = {Project website and technical materials},
  year   = {2026},
  url    = {https://himanshuvnm.github.io/gift-kastl/}
}