Kartograph v0.13.0

Extraction → Graph Mutations

This guide shows how the Extraction bounded context produces mutation operations that the Graph bounded context consumes. You’ll learn how to generate IDs, construct operations, and produce JSONL files that drive graph updates.

Here’s what a complete extraction workflow looks like:

import json

from shared_kernel.graph_primitives import EntityIdGenerator

# GraphExtractionReadOnlyRepository and graph_client are assumed to be
# imported and configured elsewhere in your extraction code.

# 1. Create a repository scoped to your data source
repo = GraphExtractionReadOnlyRepository(
    client=graph_client,
    data_source_id="github-repo-123",
)

# 2. Generate deterministic IDs
alice_id = EntityIdGenerator.generate("Person", "alice-smith")
# Returns: "person:1a2b3c4d5e6f7890"
bob_id = EntityIdGenerator.generate("Person", "bob-jones")
# Returns: "person:abcdef0123456789"
relationship_id = EntityIdGenerator.generate_edge_id("KNOWS", alice_id, bob_id)
# Returns: "knows:9f8e7d6c5b4a3210"

# 3. Check what already exists
existing_alice = repo.find_nodes_by_slug("alice-smith", "Person")

# 4. Produce mutation operations
mutations = [
    {
        "op": "CREATE",
        "type": "node",
        "id": alice_id,
        "label": "Person",
        "set_properties": {
            "data_source_id": "github-repo-123",
            "source_path": "MAINTAINERS.md",
            "slug": "alice-smith",
            "name": "Alice Smith",
            "email": "alice@example.com",
        },
    },
    {
        "op": "CREATE",
        "type": "edge",
        "id": relationship_id,
        "label": "KNOWS",
        "start_id": alice_id,
        "end_id": bob_id,
        "set_properties": {
            "data_source_id": "github-repo-123",
            "source_path": "MAINTAINERS.md",
            "since": 2020,
            "context": "colleagues",
        },
    },
]

# 5. Write to JSONL (one JSON object per line)
with open("mutations.jsonl", "w") as f:
    for mutation in mutations:
        f.write(json.dumps(mutation) + "\n")
  • Use lowercase for entity types in IDs: "Person" → "person:..."
  • Use consistent slugs: "alice-smith" not "Alice Smith"
  • Use the node label and slug to check whether an entity already exists
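To make the idea of deterministic IDs concrete, here is a minimal sketch of how such IDs could be derived. It is not the real EntityIdGenerator: the function names, the SHA-256 hash, and the 16-character truncation are assumptions chosen only to match the shape of the IDs shown above.

```python
import hashlib


def sketch_entity_id(entity_type: str, slug: str) -> str:
    """Hypothetical stand-in for EntityIdGenerator.generate (assumed scheme)."""
    digest = hashlib.sha256(f"{entity_type}:{slug}".encode("utf-8")).hexdigest()
    # Lowercased type as prefix, then the first 16 hex characters of the hash.
    return f"{entity_type.lower()}:{digest[:16]}"


def sketch_edge_id(label: str, start_id: str, end_id: str) -> str:
    """Hypothetical stand-in for EntityIdGenerator.generate_edge_id (assumed scheme)."""
    digest = hashlib.sha256(f"{label}|{start_id}|{end_id}".encode("utf-8")).hexdigest()
    return f"{label.lower()}:{digest[:16]}"


alice_id = sketch_entity_id("Person", "alice-smith")
bob_id = sketch_entity_id("Person", "bob-jones")
knows_id = sketch_edge_id("KNOWS", alice_id, bob_id)
collab_id = sketch_edge_id("COLLABORATES", alice_id, bob_id)
```

Because the ID depends only on the type and slug, re-running extraction on the same source yields the same IDs, which is why consistent slugs matter.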

DEFINE is required for every type you use. It creates a self-documenting ontology that helps agents understand:

  • What this type represents
  • When to use it
  • Where it’s typically found
  • What properties are required vs optional
{
  "op": "DEFINE",
  "type": "node",
  "label": "Person",
  "description": "A person entity representing an individual contributor, maintainer, or team member. Extracted from MAINTAINERS.md, git commit authors, @-mentions in pull requests, and people/ directory markdown files.",
  "example_file_path": "people/alice-smith.md",
  "example_in_file_path": "---\nname: Alice Smith\nemail: alice@example.com\ngithub: asmith\nrole: Senior Engineer\n---\n\n# Alice Smith\n\nAlice is a senior engineer focusing on backend systems.",
  "required_properties": ["email", "name"]
}

Required fields:

  • label - The graph label, i.e. Entity Type/Relationship Type (PascalCase: "Person", "KNOWS")
  • description - What this type is and when to use it
  • example_file_path - Where this type is typically found
  • example_in_file_path - An actual example of this type as it appears in the file
  • required_properties - Array of property names that MUST be present. This is in addition to any globally-required properties (such as slug and data_source_id)
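A small pre-flight check can catch incomplete DEFINE operations before they are written to JSONL. The sketch below only tests for the presence of the fields listed above; the helper name `missing_define_fields` is hypothetical, and the Graph bounded context may apply stricter validation.

```python
DEFINE_REQUIRED_FIELDS = (
    "label",
    "description",
    "example_file_path",
    "example_in_file_path",
    "required_properties",
)


def missing_define_fields(op: dict) -> list:
    """Return the required DEFINE fields that are absent from `op`."""
    return [field for field in DEFINE_REQUIRED_FIELDS if field not in op]


complete = {
    "op": "DEFINE",
    "type": "node",
    "label": "Person",
    "description": "A person entity.",
    "example_file_path": "people/alice-smith.md",
    "example_in_file_path": "---\nname: Alice Smith\n---",
    "required_properties": ["email", "name"],
}
incomplete = {"op": "DEFINE", "type": "node", "label": "Person"}

complete_missing = missing_define_fields(complete)
incomplete_missing = missing_define_fields(incomplete)
```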

CREATE is idempotent - you can run it multiple times safely. It uses MERGE under the hood.

{
  "op": "CREATE",
  "type": "node",
  "id": "person:1a2b3c4d5e6f7890",
  "label": "Person",
  "set_properties": {
    "slug": "alice-smith",
    "name": "Alice Smith",
    "github_username": "asmith",
    "data_source_id": "github-repo-123",
    "source_path": "MAINTAINERS.md"
  }
}

Required fields:

  • label - Graph label (PascalCase: "Person", "Repository")
  • set_properties must include:
    • data_source_id - Your data source identifier
    • source_path - Which file this entity came from

Additionally required for edges:

  • start_id - ID of source node
  • end_id - ID of target node
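The CREATE requirements above can be checked mechanically before writing the file. This is a minimal sketch covering only the fields listed in this section; `create_problems` is a hypothetical helper, not part of Kartograph.

```python
REQUIRED_PROPERTIES = ("data_source_id", "source_path")


def create_problems(op: dict) -> list:
    """Return human-readable problems found in a CREATE operation."""
    problems = []
    if "label" not in op:
        problems.append("missing label")
    props = op.get("set_properties", {})
    for name in REQUIRED_PROPERTIES:
        if name not in props:
            problems.append(f"missing set_properties.{name}")
    # Edges must also name both endpoints.
    if op.get("type") == "edge":
        for name in ("start_id", "end_id"):
            if name not in op:
                problems.append(f"missing {name}")
    return problems


node_op = {
    "op": "CREATE",
    "type": "node",
    "id": "person:1a2b3c4d5e6f7890",
    "label": "Person",
    "set_properties": {
        "slug": "alice-smith",
        "data_source_id": "github-repo-123",
        "source_path": "MAINTAINERS.md",
    },
}
edge_op = {
    "op": "CREATE",
    "type": "edge",
    "id": "knows:9f8e7d6c5b4a3210",
    "label": "KNOWS",
    "start_id": "person:1a2b3c4d5e6f7890",
    "set_properties": {"data_source_id": "github-repo-123"},
}

node_problems = create_problems(node_op)
edge_problems = create_problems(edge_op)
```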

UPDATE changes specific properties without affecting others.

{
  "op": "UPDATE",
  "type": "node",
  "id": "person:1a2b3c4d5e6f7890",
  "set_properties": {
    "name": "Alice Smith-Jones",
    "email": "alice.jones@example.com"
  }
}
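The effect of UPDATE on stored properties can be pictured as a dictionary merge: keys in set_properties overwrite or add values, and everything else is left alone. The sketch below is an in-memory illustration only; `apply_update` is a hypothetical helper and says nothing about how the Graph bounded context implements the operation.

```python
def apply_update(stored: dict, op: dict) -> dict:
    """Return the stored properties with the UPDATE's set_properties merged in."""
    return {**stored, **op.get("set_properties", {})}


stored = {
    "slug": "alice-smith",
    "name": "Alice Smith",
    "github_username": "asmith",
}
update_op = {
    "op": "UPDATE",
    "type": "node",
    "id": "person:1a2b3c4d5e6f7890",
    "set_properties": {
        "name": "Alice Smith-Jones",
        "email": "alice.jones@example.com",
    },
}

updated = apply_update(stored, update_op)
```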

DELETE automatically removes connected edges (uses DETACH DELETE).

{
  "op": "DELETE",
  "type": "node",
  "id": "person:obsolete123456"
}

When to use:

  • File was deleted from source
  • Entity no longer exists in external system
  • Cleanup during re-extraction
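For the "file was deleted from source" case, DELETE operations can be derived by comparing what the graph already holds against the files present in the current run. The sketch below assumes each existing node is available as a dict with `id` and `source_path` keys; that shape, and the helper name, are illustrative assumptions.

```python
def delete_ops_for_removed_files(existing_nodes: list, current_paths: set) -> list:
    """Emit a DELETE for every node whose source file is no longer present."""
    return [
        {"op": "DELETE", "type": "node", "id": node["id"]}
        for node in existing_nodes
        if node["source_path"] not in current_paths
    ]


existing_nodes = [
    {"id": "person:1a2b3c4d5e6f7890", "source_path": "people/alice.md"},
    {"id": "person:obsolete123456", "source_path": "people/carol.md"},
]
current_paths = {"people/alice.md", "people/bob.md"}

delete_ops = delete_ops_for_removed_files(existing_nodes, current_paths)
```

Only node deletions are emitted here, since deleting a node also removes its connected edges.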

Operations do not need to be ordered in the JSONL data. The Graph bounded context will execute operations in the following order:

  1. DEFINE
  2. DELETE <edge>
  3. DELETE <node>
  4. CREATE <node>
  5. CREATE <edge>
  6. UPDATE <node>
  7. UPDATE <edge>
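Although you do not need to order operations yourself, sorting them the same way can make a JSONL file easier to review. The sketch below encodes the order listed above as a sort key; `execution_rank` is a hypothetical helper, not the Graph bounded context's own code.

```python
# Rank for each (op, type) pair, following the execution order listed above.
_RANKS = {
    ("DELETE", "edge"): 1,
    ("DELETE", "node"): 2,
    ("CREATE", "node"): 3,
    ("CREATE", "edge"): 4,
    ("UPDATE", "node"): 5,
    ("UPDATE", "edge"): 6,
}


def execution_rank(op: dict) -> int:
    """Return the position of `op` in the execution order (DEFINE runs first)."""
    if op["op"] == "DEFINE":
        return 0
    return _RANKS[(op["op"], op["type"])]


ops = [
    {"op": "UPDATE", "type": "node", "id": "person:a"},
    {"op": "CREATE", "type": "edge", "id": "knows:x"},
    {"op": "DEFINE", "type": "node", "label": "Person"},
    {"op": "CREATE", "type": "node", "id": "person:a"},
    {"op": "DELETE", "type": "node", "id": "person:old"},
]

# sorted() is stable, so operations with the same rank keep their file order.
ordered = sorted(ops, key=execution_rank)
ordered_kinds = [(op["op"], op["type"]) for op in ordered]
```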

The extraction process produces a JSONL file (one JSON object per line), which might look something like:

{"op": "DEFINE","type": "node","label": "Person","description": "A person entity representing an individual contributor, maintainer, or team member. Extracted from MAINTAINERS.md, git commit authors, @-mentions in pull requests, and people/ directory markdown files.","example_file_path": "people/alice-smith.md","example_in_file_path": "---\nname: Alice Smith\nemail: alice@example.com\ngithub: asmith\nrole: Senior Engineer\n---\n\n# Alice Smith\n\nAlice is a senior engineer focusing on backend systems.","required_properties": ["name"]}
{"op": "DEFINE","type": "edge","label": "KNOWS","description": "Represents a professional relationship or acquaintance between two people, typically colleagues or collaborators. Extracted from co-authorship on pull requests, shared repository maintainership, or explicit mentions in people profiles.","example_file_path": "people/alice-smith.md","example_in_file_path": "## Colleagues\n\n- [@bob-jones](../people/bob-jones.md) - worked together since 2020\n- [@charlie-wilson](../people/charlie-wilson.md) - collaborated on Project X","required_properties": ["since"]}
{"op": "CREATE","type": "node","id": "person:1a2b3c4d5e6f7890","label": "Person","set_properties": {"slug": "alice-smith","name": "Alice Smith","data_source_id": "ds-123","source_path": "people/alice.md"}}
{"op": "CREATE","type": "node","id": "person:abcdef0123456789","label": "Person","set_properties": {"slug": "bob-jones","name": "Bob Jones","data_source_id": "ds-123","source_path": "people/bob.md"}}
{"op": "CREATE","type": "edge","id": "knows:9f8e7d6c5b4a3210","label": "KNOWS","start_id": "person:1a2b3c4d5e6f7890","end_id": "person:abcdef0123456789","set_properties": {"since": 2020,"data_source_id": "ds-123","source_path": "people/alice.md"}}
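Reading a file like this back is a useful sanity check before handing it to the Graph bounded context. The following sketch parses JSONL text line by line and reports the line number of any invalid JSON; `parse_jsonl` is a hypothetical helper, and the sample uses trimmed-down operations rather than the full ones above.

```python
import json
from collections import Counter


def parse_jsonl(text: str) -> list:
    """Parse JSONL text into a list of operations, skipping blank lines."""
    ops = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            ops.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
    return ops


sample = "\n".join(
    [
        json.dumps({"op": "DEFINE", "type": "node", "label": "Person"}),
        json.dumps({"op": "CREATE", "type": "node", "id": "person:1a2b3c4d5e6f7890", "label": "Person"}),
        "",
    ]
)

ops = parse_jsonl(sample)
op_counts = Counter(op["op"] for op in ops)
```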
Can I create multiple edges between the same nodes?

Yes! Different edge labels generate different IDs:

from shared_kernel.graph_primitives import EntityIdGenerator
alice_id = EntityIdGenerator.generate("Person", "alice-smith")
bob_id = EntityIdGenerator.generate("Person", "bob-jones")
# Different edge labels → different IDs
knows_id = EntityIdGenerator.generate_edge_id("KNOWS", alice_id, bob_id)
collaborates_id = EntityIdGenerator.generate_edge_id("COLLABORATES", alice_id, bob_id)
# knows_id ≠ collaborates_id (edge label is part of the hash)
What if I don’t know if an entity exists?

Use CREATE. It is idempotent: if the entity already exists, its properties are updated instead of a duplicate being created.

Can I batch operations?

Yes! All operations in a JSONL file execute in a single transaction.

What happens if one operation fails?

The entire batch rolls back. Fix the error and re-run.
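The all-or-nothing behaviour can be illustrated with an in-memory model: changes are staged on a copy and only kept if every operation succeeds. This is a sketch under stated assumptions, not Kartograph's transaction code. It handles CREATE only, and the validation rule it uses to trigger a failure (required `data_source_id` and `source_path`) is taken from the CREATE section of this guide.

```python
import copy

REQUIRED_PROPERTIES = ("data_source_id", "source_path")


def apply_batch(graph: dict, ops: list) -> dict:
    """Return a new graph with every CREATE applied, or raise and leave `graph` as is."""
    working = copy.deepcopy(graph)  # stage changes on a copy
    for op in ops:
        if op["op"] != "CREATE":
            raise ValueError(f"unsupported op in this sketch: {op['op']}")
        props = op["set_properties"]
        missing = [name for name in REQUIRED_PROPERTIES if name not in props]
        if missing:
            raise ValueError(f"{op['id']}: missing {missing}")
        # Merge-style write: create the entry if absent, otherwise update it.
        working.setdefault(op["id"], {}).update(props)
    return working


good = {
    "op": "CREATE",
    "type": "node",
    "id": "person:a",
    "set_properties": {"slug": "alice-smith", "data_source_id": "ds-123", "source_path": "people/alice.md"},
}
bad = {"op": "CREATE", "type": "node", "id": "person:b", "set_properties": {"slug": "bob-jones"}}

graph = {}
try:
    graph = apply_batch(graph, [good, bad])
except ValueError:
    pass  # the failed batch left no partial changes behind
graph_after_failure = dict(graph)

graph = apply_batch(graph, [good])
graph = apply_batch(graph, [good])  # re-running the fixed batch is safe
```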

Do I need to DEFINE types every time I run extraction?

No! DEFINE once when you first introduce a type. You can skip DEFINE in subsequent runs unless adding new types or updating definitions.