The Principles Behind Data Federation

When we talk about “federation” in technology, most people think about data centers, distributed systems, or large-scale infrastructures.
But at its core, federation is not just about hardware or networks—it is about how we organize operations.

Every organization faces the same dilemma: should we centralize everything in a single place, or should we let each team, system, or node run independently?
Pure centralization offers efficiency but quickly becomes a bottleneck. Pure decentralization offers flexibility but risks fragmentation and chaos.

Federation is the middle path.
It is an advanced way of organizing operations where multiple autonomous nodes work independently, yet remain interconnected and coordinated as part of a common system.

Each node in a federation:

Has the capacity to run its own processes, handle data, and support local operations.
Is connected to other nodes, sharing information and workloads when needed.
Operates autonomously, but can be coordinated as part of the wider mesh without losing independence.

This balance is what makes federation powerful. On the one hand, every node is free to adapt to its local environment. On the other, a federated layer makes it possible to observe, manage, and optimize the whole system as one coherent operation.

The result is a network where every node becomes a micro-hub of operations: self-sufficient, but also a meaningful part of a larger whole.

Federation is not just a technical pattern; it is an operational philosophy.
It reduces bottlenecks, improves resilience, and fosters local autonomy, without sacrificing the scale, consistency, and robustness that organizations need.

This principle is the foundation of data federation and, ultimately, the vision of Kubling as the operational database.

1. Data Autonomy

Each system has its own data structures and semantics. A federation must respect that autonomy: it should not force every system to adopt the same schema, but instead allow each source to keep its native representation.

2. Unified Access

Working with many systems should not mean learning a new language for each one. A federation provides a single, consistent way to query and operate across different data sources.

3. Relationships Across Sources

Data only becomes valuable when it can be related. Federation must make it possible to join, correlate, and reason across systems, even if their native structures were never designed to interact.
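As a rough illustration of principles 2 and 3, the sketch below uses SQLite's `ATTACH DATABASE` as a stand-in for a federation engine: two independent in-memory databases keep their own schemas, yet one SQL statement joins across them. This is an analogy only, not how Kubling is implemented; the `crm` and `monitoring` sources, their tables, and the sample rows are all invented for the example.

```python
import sqlite3

# One connection plays the role of the single, consistent access point.
conn = sqlite3.connect(":memory:")

# Two independent "sources" (hypothetical), each keeping its own schema.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS monitoring")

conn.execute("CREATE TABLE crm.customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE monitoring.service_metric "
    "(svc TEXT, owner_customer INTEGER, error_rate REAL)"
)

conn.executemany(
    "INSERT INTO crm.customer VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO monitoring.service_metric VALUES (?, ?, ?)",
    [("checkout", 1, 0.02), ("search", 2, 0.10), ("billing", 1, 0.07)],
)


def noisy_services(threshold):
    """Join metrics from one source with customer data from the other."""
    query = """
        SELECT c.name, m.svc, m.error_rate
        FROM monitoring.service_metric AS m
        JOIN crm.customer AS c ON c.customer_id = m.owner_customer
        WHERE m.error_rate > ?
        ORDER BY m.error_rate DESC
    """
    return conn.execute(query, (threshold,)).fetchall()


rows = noisy_services(0.05)
```

Neither source table was designed with the other in mind; the relationship exists only in the query that spans both.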

4. Real-Time Federation

A federated system must expose data as it is, where it is, so decisions and actions can happen in real time.

5. Extensibility Through Models

No federation can anticipate every system in advance. The ability to extend the federation by defining new models, adapters, or scripts is essential.

6. Enrichment Without Intrusion

A federation should not only expose existing schemas, but also allow extending and enriching them without modifying the original systems.

(In Kubling, when defining a remote data source you can either import its schema as-is, or define a custom schema in Kubling that maps remote fields while adding new “generated fields.” These fields can be constants, computed values, or results of functions—creating an overlay that enriches the model without touching the source.)
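The overlay idea can be sketched in a few lines of Python. This is a conceptual model, not Kubling's mechanism: `overlay`, the field names, and the `/` separator used to build the identifier are assumptions made for illustration. Generated fields are either constants or functions of the source row, and the source rows themselves are never modified.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def overlay(rows: Iterable[Row], generated: Dict[str, Any]) -> Iterator[Row]:
    """Yield enriched copies of rows; the original rows stay untouched."""
    for row in rows:
        enriched = dict(row)  # shallow copy, so the source row is not mutated
        for field_name, spec in generated.items():
            # A callable is a computed field; anything else is a constant.
            enriched[field_name] = spec(row) if callable(spec) else spec
        yield enriched


# Rows as a remote source might return them (sample data, invented).
remote_rows = [
    {"metadata__name": "default", "status__phase": "Active"},
    {"metadata__name": "legacy", "status__phase": "Terminating"},
]

# Hypothetical generated fields: one constant and two computed values.
generated_fields = {
    "clusterName": "k8s-eu-1",
    "identifier": lambda r: "k8s-eu-1/" + r["metadata__name"],
    "isActive": lambda r: r["status__phase"] == "Active",
}

enriched_rows = list(overlay(remote_rows, generated_fields))
```

The enriched rows carry the extra fields, while `remote_rows` still holds only what the source exposed.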

A Graph View of Federation

So far we have discussed the principles of data federation in abstract terms.
But to really understand how a federation behaves, it is useful to picture it as a graph.

In this view, each node in the graph represents a system that becomes part of the federation through a Kubling instance.
These nodes are not isolated: they are connected to one another, forming a mesh that can be navigated at different depths.

From the perspective of a user, an application, or an AI agent, this means you can:

Connect directly to a Kubling Instance (KI), which exposes a single system.
Connect to a Kubling Aggregation Instance (KAI), which provides a higher-level view of a cluster of systems.
Connect to the Kubling Global Instance (KGI), which has visibility over the entire mesh and enables complex joins across clusters.

In the graph, you will also notice that some nodes are grouped within dotted lines. These represent clusters: subsets of systems that are related (for example, a data center, a functional domain, or a regional grouping).
Each cluster can be queried through its KAI, giving you aggregated visibility into everything inside the dotted boundary.
At the top, the KGI sits above all clusters and serves as the “global view” of the federation, making it possible to perform joins that cross boundaries—for example, combining metrics from one cluster with customer data from another.

It is also important to note that the arrows in the graph are not bidirectional.
A KI should not query its parent KAI. Instead, queries are delegated downward and results flow back up:

When a KAI receives a query, it delegates down to its KIs.
When the KGI receives a query, it delegates down to the KAIs, which in turn query their KIs.

This makes the federation a layered system of delegation rather than a fully symmetric mesh, ensuring clarity in how data is accessed and aggregated.
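A minimal sketch of this layered delegation, assuming a simplified model in which every instance exposes one `query` method: the `KI` and `Aggregator` classes, instance names, and sample rows are hypothetical. An aggregator (a KAI or the KGI) holds references only to its children, so a child has no way to query its parent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class KI:
    """Leaf instance: answers from its own system and tags rows with their origin."""
    name: str
    rows: List[Row]

    def query(self, predicate: Predicate) -> List[Row]:
        return [dict(r, source=self.name) for r in self.rows if predicate(r)]


@dataclass
class Aggregator:
    """KAI or KGI in this sketch: holds no data, only delegates downward."""
    name: str
    children: List[Union["KI", "Aggregator"]] = field(default_factory=list)

    def query(self, predicate: Predicate) -> List[Row]:
        results: List[Row] = []
        for child in self.children:  # delegation always goes down the tree
            results.extend(child.query(predicate))
        return results


kai_eu = Aggregator("kai-eu", [
    KI("ki-eu-1", [{"pod": "api", "ready": True}]),
    KI("ki-eu-2", [{"pod": "db", "ready": False}]),
])
kai_us = Aggregator("kai-us", [KI("ki-us-1", [{"pod": "web", "ready": False}])])
kgi = Aggregator("kgi", [kai_eu, kai_us])

# The same predicate works at every entry point; only the scope changes.
not_ready_global = kgi.query(lambda r: not r["ready"])
not_ready_eu = kai_eu.query(lambda r: not r["ready"])
```

Querying `kgi` returns rows from both clusters, while querying `kai_eu` returns only rows from its own KIs.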

The key insight is that federation is not flat.
Depending on the entry point, you can interact with a single system (KI), with a cluster of systems (KAI), or with the entire federated network (KGI).
This flexibility is what makes federation powerful: it allows you to reason locally when you need precision, or globally when you need context—all through the same consistent query interface.

What This Means in Practice

With Kubling, you can model and query at different layers while preserving identity and enabling cross-source relationships.
Here are some practical ways this shows up:

At a KI (single system) with self-identifying fields
Each foreign table can include fields that identify its origin, so queries at higher levels still know where each row comes from.

```
CREATE FOREIGN TABLE NAMESPACE
(
  clusterName string OPTIONS(val_constant '{{ cluster_name }}'),
  clusterUrl string OPTIONS(val_constant '{{ schema.properties.kubernetes_api_url }}'),
  identifier string OPTIONS(val_pk 'clusterName+metadata__name'),
  schema string OPTIONS(val_constant '{{ schema.name }}'),
  metadata__name string,
  metadata__labels json OPTIONS(parser_format 'asJsonPretty'),
  status__phase string,
  PRIMARY KEY(identifier),
  UNIQUE(clusterName, metadata__name)
)
OPTIONS(updatable true,
        supports_idempotency false,
        tags 'kubernetes;{{ schema.properties.cluster_name }};namespace');
```

Routing to aggregate homogeneous schemas with the same tables
Routing exposes a single logical entry point that decides which schema should receive each command based on defined rules. It is ideal when you have many schemas with the same tables (multi-tenant setups, sharding by region, per-cluster schemas, etc.).
The client queries one logical schema, and the engine routes internally to the appropriate physical schema.
```
SELECT clusterName, namespace, name, status
FROM k8s.DEPLOYMENT
WHERE clusterName IN ('k8s-eu-1', 'k8s-eu-2');
```
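Conceptually, a router for a query like the one above has to turn the `clusterName` filter into a list of physical schemas. The sketch below models only that decision; the `ROUTES` table, the physical schema names, and the generated statements are hypothetical and do not reflect Kubling's actual routing rules or syntax.

```python
from typing import Dict, Iterable, List

# Hypothetical routing rules: logical cluster name -> physical schema.
ROUTES: Dict[str, str] = {
    "k8s-eu-1": "k8s_eu_1",
    "k8s-eu-2": "k8s_eu_2",
    "k8s-us-1": "k8s_us_1",
}


def resolve_targets(cluster_names: Iterable[str]) -> List[str]:
    """Map the values of a clusterName filter to physical schemas, in order."""
    targets: List[str] = []
    for name in cluster_names:
        if name not in ROUTES:
            raise KeyError(f"no route for cluster {name!r}")
        schema = ROUTES[name]
        if schema not in targets:  # avoid hitting the same schema twice
            targets.append(schema)
    return targets


def plan(table: str, cluster_names: Iterable[str]) -> List[str]:
    """Return one per-schema statement for each routed target."""
    return [f"SELECT * FROM {schema}.{table}" for schema in resolve_targets(cluster_names)]


statements = plan("DEPLOYMENT", ["k8s-eu-1", "k8s-eu-2"])
```

The client still sees one logical schema; the fan-out to `k8s_eu_1` and `k8s_eu_2` is an internal detail of the router.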

Applying Inserts Across the Federation
Federation is not only about querying data; it also enables creating resources consistently across multiple clusters using transactions. With Routing, inserts can be dispatched to the right cluster based on the clusterName field, allowing you to propagate changes with a single statement.

```
-- Create the 'observability' namespace in two clusters
INSERT INTO k8s.NAMESPACE (clusterName, metadata__name, metadata__labels)
VALUES 
  ('k8s-eu-1', 'observability', '{"team":"platform","env":"prod"}'),
  ('k8s-eu-2', 'observability', '{"team":"platform","env":"prod"}');

-- Deploy 'prometheus' into the 'observability' namespace across two clusters
INSERT INTO k8s.DEPLOYMENT (
  clusterName,
  metadata__namespace,
  metadata__name,
  metadata__labels,
  spec__selector__matchLabels,
  spec__template__metadata__labels,
  spec__template__spec__containers
)
VALUES
  (
    'k8s-eu-1',
    'observability',
    'prometheus',
    '{"app":"prometheus","tier":"monitoring"}',
    '{"app":"prometheus"}',
    '{"app":"prometheus"}',
    '[
      { "name": "prometheus",
        "image": "prom/prometheus:v2.53.0",
        "ports": [ { "containerPort": 9090 } ]
      }
    ]'
  ),
  (
    'k8s-eu-2',
    'observability',
    'prometheus',
    '{"app":"prometheus","tier":"monitoring"}',
    '{"app":"prometheus"}',
    '{"app":"prometheus"}',
    '[
      { "name": "prometheus",
        "image": "prom/prometheus:v2.53.0",
        "ports": [ { "containerPort": 9090 } ]
      }
    ]'
  );
```

This makes it possible to “broadcast” resources to multiple clusters while still preserving the ability to inspect or modify them individually at the KI level.

Note:
When connecting through the PostgreSQL protocol (e.g. psql), transaction control statements such as BEGIN, COMMIT, and ROLLBACK are fully supported.
When using JDBC clients, transactions are managed through the connection API.
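To make the idea of connection-managed transactions concrete, here is a small analogy using Python's `sqlite3` module, where the connection object plays the role the note assigns to the connection API. Two attached in-memory databases stand in for two clusters; the aliases, table, and helper functions are invented for the example, and nothing here describes Kubling's own transactional guarantees.

```python
import sqlite3

TARGETS = ("k8s_eu_1", "k8s_eu_2")  # trusted constants, safe to interpolate

conn = sqlite3.connect(":memory:")
for alias in TARGETS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {alias}")
for alias in TARGETS:
    conn.execute(f"CREATE TABLE {alias}.namespace (metadata__name TEXT PRIMARY KEY)")
conn.commit()


def create_everywhere(name):
    """Insert the namespace into every target, or into none of them."""
    try:
        with conn:  # commits on success, rolls back if any insert raises
            for alias in TARGETS:
                conn.execute(f"INSERT INTO {alias}.namespace VALUES (?)", (name,))
        return True
    except sqlite3.Error:
        return False


def names(alias):
    """List the namespaces currently stored in one target."""
    cursor = conn.execute(f"SELECT metadata__name FROM {alias}.namespace ORDER BY 1")
    return [row[0] for row in cursor]


ok_first = create_everywhere("observability")  # succeeds in both targets

# Simulate a conflict: 'logging' already exists in the second target only.
conn.execute("INSERT INTO k8s_eu_2.namespace VALUES ('logging')")
conn.commit()
ok_second = create_everywhere("logging")  # fails on the second target
```

After the failed call, the first target contains no `logging` row: the insert that had already succeeded there was rolled back together with the failing one.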

Semantic Schema: Understanding the Federation

Federation connects systems into a common operational mesh.
But connection alone is not enough: each system still brings its own schema, naming conventions, and semantics.
For humans and for agents, this heterogeneity can make the federation hard to operate on effectively.

The Semantic Schema addresses this challenge by adding meaning directly into the schema definitions.
Through annotations, tables and fields are enriched with concise descriptions and domain-specific hints, while relationships between objects are explicitly declared.
The result is a simplified semantic layer that captures just enough information for an LLM (or any agent) to reason effectively about the data.

In practice, this means:

Autonomous systems remain independent, yet are described through a common vocabulary.
Agents can interpret schemas using semantic labels rather than raw technical names.
Natural language questions can be translated into unambiguous SQL queries, because the semantic schema provides the missing context.
Relationships across systems are not only modeled as joins, but also semantically described (e.g. Deployment affected_by Event, HPA affects Deployment), enabling powerful multi-hop reasoning.
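To show why labeled relationships help with multi-hop reasoning, the sketch below stores a few relations as directed, labeled edges and searches for a chain between two object types. The first two relations come from the example above; `runs_in` and the `find_path` helper are assumptions added for illustration, and this is not Kubling's annotation format.

```python
from collections import deque

# (subject, relation, object) triples describing the schema semantically.
RELATIONS = [
    ("HPA", "affects", "Deployment"),
    ("Deployment", "affected_by", "Event"),
    ("Deployment", "runs_in", "Namespace"),  # hypothetical extra relation
]


def find_path(start, goal):
    """Breadth-first search for the shortest chain of relations from start to goal."""
    graph = {}
    for subject, label, obj in RELATIONS:
        graph.setdefault(subject, []).append((label, obj))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None  # no chain of relations connects the two types


path = find_path("HPA", "Event")
```

An agent asked which events relate to an autoscaler can follow this chain and translate each hop into a join, instead of guessing from raw table names.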

This combination of federated architecture and semantic schema transforms the federation from a graph of systems into a graph of meaning.
It ensures that both humans and agents can navigate the federation with clarity, making operations more intuitive and resilient.

Closing Thoughts

Data federation is not just about accessing multiple systems. It is about respecting autonomy, unifying access, enabling relationships, operating in real time, extending through models, and enriching without intrusion.

These principles are not abstract theory; they are embedded in Kubling’s design.
By combining them with practical features such as Composite schemas, Aggregators, and generated keys, Kubling turns federation into something concrete: a way to interact with diverse systems through data itself, consistently and without ambiguity.

This is why we call it the operational database: it lets operations teams reason in terms of data, not silos.