The Importance of Having a Unique, Global Primary Key Per Row

When you start federating multiple systems, a subtle but fundamental problem appears:
how do you uniquely identify a row across the entire federation?

In a single database, the answer is obvious: use the table’s primary key.
But in a federation, you are combining many systems, each with its own conventions, identifiers, and constraints. Some sources may have reliable primary keys, others may not. Some may use surrogate IDs (like UUIDs), others composite keys, and some—especially APIs—may not expose any primary key at all.

Without a way to establish a global identifier, the federation risks ambiguity:

Two different systems may produce rows with the same local key.
A join across heterogeneous sources may create duplicates or inconsistencies.
Updates and deletes may affect the wrong entity if the federation cannot guarantee uniqueness.

This is why a global primary key per row is not just a detail—it is the backbone of a consistent federation.
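
The first risk in the list above can be reproduced in a few lines. The following Python sketch uses two made-up sources (`billing_rows` and `catalog_rows` are hypothetical, not part of any real system) that both use a local integer `id`:

```python
# Hypothetical rows from two independent sources that both use a local integer id.
billing_rows = [{"id": 1, "name": "invoice-service"}]
catalog_rows = [{"id": 1, "name": "product-search"}]

# Indexing by the local key alone: the second row silently overwrites the first.
by_local_key = {}
for row in billing_rows + catalog_rows:
    by_local_key[row["id"]] = row

# Qualifying the key with its source keeps both rows addressable.
by_global_key = {}
for source, rows in (("billing", billing_rows), ("catalog", catalog_rows)):
    for row in rows:
        by_global_key[(source, row["id"])] = row

print(len(by_local_key))   # 1
print(len(by_global_key))  # 2
```

No error is raised when the collision happens, which is what makes this class of problem hard to spot in a federation.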

Why Having a Global Identity Is Crucial in a Federation

In a single system, identity is straightforward: the database enforces a primary key, and every row can be uniquely addressed.
But once you enter a federated environment, this guarantee vanishes. You are no longer dealing with one schema, but with many—each with its own rules, conventions, and sometimes no primary keys at all.

Without a global identity mechanism, the federation becomes fragile:

Two different systems may produce rows with the same local key.
Relationships across sources become unreliable or ambiguous.
Updates and deletes can target the wrong entities.

Global identity is not just a design detail.
It is the mechanism that keeps the federation coherent, safe to operate on, and understandable to both humans and agents.

How Kubling Solves It: `val_pk`

Kubling introduces the concept of calculated primary keys through the val_pk option.
This allows you to define a global identifier based on a combination of fields, without altering the original source.

For example:

CREATE FOREIGN TABLE CODE_REPO (
    identifier string OPTIONS (val_pk 'org+name'),
    org        string NOT NULL,
    name       string NOT NULL,
    id         integer,
    owner__login string
);

Here, the identifier is defined as a calculated primary key.
Instead of storing a value in the source, the engine computes a Base62-encoded value derived from the listed fields, together with a CRC32 checksum. The field values provide the uniqueness; the checksum lets the engine verify that an identifier it receives is intact.
This is the value returned to the client, making comparisons and joins faster at the federation level.
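
To make the idea concrete, here is a minimal Python sketch of one way such an identifier could be built and checked. The byte layout is an assumption made for illustration (a 4-byte CRC32 followed by a JSON array of the field values, all Base62-encoded); it is not Kubling's actual format, and `encode_key` and `decode_key` are hypothetical names.

```python
import json
import zlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(data: bytes) -> str:
    # A leading 0x01 sentinel byte keeps leading zero bytes from being lost.
    number = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def base62_decode(text: str) -> bytes:
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte

def encode_key(values: list[str]) -> str:
    # Assumed layout: CRC32 of the payload, then the payload itself.
    payload = json.dumps(values, separators=(",", ":")).encode("utf-8")
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return base62_encode(checksum + payload)

def decode_key(token: str) -> list[str]:
    raw = base62_decode(token)
    checksum, payload = raw[:4], raw[4:]
    # Reject the identifier before using any of its contents.
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        raise ValueError("identifier failed integrity check")
    return json.loads(payload)

token = encode_key(["my-org", "checkout-service"])
print(decode_key(token))  # ['my-org', 'checkout-service']
```

Serializing the values as a JSON array rather than joining them with a separator means that `["a+b", "c"]` and `["a", "b+c"]` produce different identifiers, which a plain concatenation would not guarantee.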

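When filtering by this field, the query planner automatically rewrites the condition into the original field set, but only after first validating the value’s integrity. In the example below the identifier is written in an illustrative, decoded form, with {crc32} standing in for the checksum: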

SELECT * FROM CODE_REPO WHERE identifier = '[{crc32}+my-org+checkout-service]';

becomes internally:

SELECT * FROM CODE_REPO 
WHERE org = 'my-org' AND name = 'checkout-service';

Note: The federation never relies on raw string concatenation for uniqueness.
The val_pk mechanism ensures that equality checks are efficient and consistent, even across large federations.
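
The rewrite step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the planner's real code: `rewrite_filter` is a hypothetical helper, and it assumes the identifier has already been decoded and integrity-checked into a list of values.

```python
def rewrite_filter(val_pk_spec: str, values: list[str]) -> tuple[str, list[str]]:
    """Turn an equality on the calculated key into predicates on the source fields."""
    fields = val_pk_spec.split("+")
    if len(fields) != len(values):
        raise ValueError("identifier does not match the val_pk definition")
    # Use placeholders so the values are bound as parameters, not spliced into SQL.
    where = " AND ".join(f"{field} = ?" for field in fields)
    return where, list(values)

where, params = rewrite_filter("org+name", ["my-org", "checkout-service"])
print(where)   # org = ? AND name = ?
print(params)  # ['my-org', 'checkout-service']
```

Because the source only ever sees conditions on its own columns, it can use whatever indexes or API filters it already supports.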

Benefits of a Federated PK

Global consistency: Every row in the federation can be referenced without ambiguity.
Schema enrichment: Even sources without native PKs can be given one.
Cross-source joins: Relationships across systems become reliable and efficient.
Safer operations: Updates and deletes target exactly the intended rows.
Agent compatibility: Agents can work with a single, predictable field (identifier) rather than reasoning about different combinations of fields per system.

Practical Example: Kubernetes Deployments

Consider a federation of Kubernetes clusters.
Each cluster has a DEPLOYMENT table, where the natural key is the combination of cluster name, namespace, and deployment name (the clusterName, metadata__namespace, and metadata__name columns).
With val_pk, you can make this explicit:

CREATE FOREIGN TABLE DEPLOYMENT (
    clusterName string,
    metadata__namespace string,
    metadata__name string,
    ...
    identifier string OPTIONS (val_pk 'clusterName+metadata__namespace+metadata__name'),
    PRIMARY KEY(identifier)
);

Now, when querying across clusters, you can safely refer to deployments by their global identifier without worrying about collisions.
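
Before settling on a field combination for val_pk, it can help to check that the combination really is unique across the federated rows. The Python sketch below does this on made-up sample data; `duplicate_keys` and the cluster names are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, fields):
    """Return every key tuple that appears more than once for the given fields."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# The same deployment name in the same namespace, running in two clusters.
deployments = [
    {"clusterName": "prod-eu", "metadata__namespace": "shop", "metadata__name": "checkout"},
    {"clusterName": "prod-us", "metadata__namespace": "shop", "metadata__name": "checkout"},
]

print(duplicate_keys(deployments, ["metadata__namespace", "metadata__name"]))
# [('shop', 'checkout')]
print(duplicate_keys(deployments, ["clusterName", "metadata__namespace", "metadata__name"]))
# []
```

Namespace and name alone collide as soon as a second cluster joins the federation; adding clusterName removes the ambiguity.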

Closing Thoughts

In data federation, identity is everything.
A federation without a global primary key is fragile: queries become ambiguous, joins unreliable, and operations unsafe.

By introducing calculated primary keys (val_pk), Kubling provides a practical way to establish global uniqueness while respecting each source’s autonomy.
This not only improves reliability and safety, but also enables higher-level reasoning, allowing both humans and agents to interact with the federation in a clear, unambiguous way.

In short: without a global identity, a federation remains fragile and difficult to scale.