Extraction → Graph Mutations
Overview
This guide shows how the Extraction bounded context produces mutation operations that the Graph bounded context consumes. You’ll learn how to generate IDs, construct operations, and produce JSONL files that drive graph updates.
Quick Example
Here’s what a complete extraction workflow looks like:
import json

from shared_kernel.graph_primitives import EntityIdGenerator

# GraphExtractionReadOnlyRepository and graph_client come from your extraction
# setup; their import path is not shown in this guide.

# 1. Create a repository scoped to your data source
repo = GraphExtractionReadOnlyRepository(
    client=graph_client,
    data_source_id="github-repo-123",
)

# 2. Generate deterministic IDs
alice_id = EntityIdGenerator.generate("Person", "alice-smith")
# Returns: "person:1a2b3c4d5e6f7890"

bob_id = EntityIdGenerator.generate("Person", "bob-jones")
# Returns: "person:abcdef0123456789"

relationship_id = EntityIdGenerator.generate_edge_id("KNOWS", alice_id, bob_id)
# Returns: "knows:9f8e7d6c5b4a3210"

# 3. Check what already exists
existing_alice = repo.find_nodes_by_slug("alice-smith", "Person")

# 4. Produce mutation operations
mutations = [
    {
        "op": "CREATE",
        "type": "node",
        "id": alice_id,
        "label": "Person",
        "set_properties": {
            "data_source_id": "github-repo-123",
            "source_path": "MAINTAINERS.md",
            "slug": "alice-smith",
            "name": "Alice Smith",
            "email": "alice@example.com",
        },
    },
    {
        "op": "CREATE",
        "type": "edge",
        "id": relationship_id,
        "label": "KNOWS",
        "start_id": alice_id,
        "end_id": bob_id,
        "set_properties": {
            "data_source_id": "github-repo-123",
            "source_path": "MAINTAINERS.md",
            "since": 2020,
            "context": "colleagues",
        },
    },
]

# 5. Write to JSONL (one JSON object per line)
with open("mutations.jsonl", "w") as f:
    for mutation in mutations:
        f.write(json.dumps(mutation) + "\n")

ID Generation
ID Format Rules
Do:
- Use lowercase for entity types: "Person" → "person:..."
- Use consistent slugs: "alice-smith", not "Alice Smith"
- Use the node label and slug to check whether an entity already exists
Don't:
- Use random values in slugs (breaks determinism)
- Mix casing: "Alice-Smith" vs "alice-smith"
Mutation Operations
0. DEFINE (Schema Declaration)
DEFINE is required for every type you use. It creates a self-documenting ontology that helps agents understand:
- What this type represents
- When to use it
- Where it’s typically found
- What properties are required vs optional
{
  "op": "DEFINE",
  "type": "node",
  "label": "Person",
  "description": "A person entity representing an individual contributor, maintainer, or team member. Extracted from MAINTAINERS.md, git commit authors, @-mentions in pull requests, and people/ directory markdown files.",
  "example_file_path": "people/alice-smith.md",
  "example_in_file_path": "---\nname: Alice Smith\nemail: alice@example.com\ngithub: asmith\nrole: Senior Engineer\n---\n\n# Alice Smith\n\nAlice is a senior engineer focusing on backend systems.",
  "required_properties": ["email", "name"]
}

{
  "op": "DEFINE",
  "type": "edge",
  "label": "KNOWS",
  "description": "Represents a professional relationship or acquaintance between two people, typically colleagues or collaborators. Extracted from co-authorship on pull requests, shared repository maintainership, or explicit mentions in people profiles.",
  "example_file_path": "people/alice-smith.md",
  "example_in_file_path": "## Colleagues\n\n- [@bob-jones](../people/bob-jones.md) - worked together since 2020\n- [@charlie-wilson](../people/charlie-wilson.md) - collaborated on Project X",
  "required_properties": ["since"]
}

Required fields:
- label - The graph label, i.e. the entity type or relationship type (PascalCase for node labels such as "Person"; the edge labels in this guide are uppercase, such as "KNOWS")
- description - What this type is and when to use it
- example_file_path - Where this type is typically found
- example_in_file_path - An actual example of this type as it appears in the file
- required_properties - Array of property names that MUST be present, in addition to any globally required properties (such as slug and data_source_id)
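A DEFINE operation can be checked against the field list above before it is written out. The following is a minimal sketch, not part of the Extraction or Graph APIs: `missing_define_fields` is a hypothetical helper, and it only tests for the presence of each field, not its content.

```python
# Field names taken from the "Required fields" list above.
DEFINE_REQUIRED_FIELDS = (
    "label",
    "description",
    "example_file_path",
    "example_in_file_path",
    "required_properties",
)


def missing_define_fields(operation: dict) -> list[str]:
    """Return the required DEFINE fields that are absent from `operation`."""
    return [name for name in DEFINE_REQUIRED_FIELDS if name not in operation]


complete = {
    "op": "DEFINE",
    "type": "node",
    "label": "Person",
    "description": "A person entity.",
    "example_file_path": "people/alice-smith.md",
    "example_in_file_path": "name: Alice Smith",
    "required_properties": ["email", "name"],
}
incomplete = {"op": "DEFINE", "type": "edge", "label": "KNOWS"}

print(missing_define_fields(complete))    # []
print(missing_define_fields(incomplete))  # the four fields that are absent
```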
1. CREATE (Idempotent)
CREATE is idempotent: you can run it multiple times safely. It uses MERGE under the hood.
{
  "op": "CREATE",
  "type": "node",
  "id": "person:1a2b3c4d5e6f7890",
  "label": "Person",
  "set_properties": {
    "slug": "alice-smith",
    "name": "Alice Smith",
    "github_username": "asmith",
    "data_source_id": "github-repo-123",
    "source_path": "MAINTAINERS.md"
  }
}

{
  "op": "CREATE",
  "type": "edge",
  "id": "knows:9f8e7d6c5b4a3210",
  "label": "KNOWS",
  "start_id": "person:1a2b3c4d5e6f7890",
  "end_id": "person:abcdef0123456789",
  "set_properties": {
    "since": 2020,
    "confidence": 0.95,
    "data_source_id": "github-repo-123",
    "source_path": "MAINTAINERS.md"
  }
}

Required fields:
- label - Graph label (PascalCase: "Person", "Repository")
- set_properties must include:
  - data_source_id - Your data source identifier
  - source_path - Which file this entity came from

Additionally required for edges:
- start_id - ID of the source node
- end_id - ID of the target node
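The CREATE requirements listed above can be expressed as a small check. This is a sketch under stated assumptions: `create_errors` is a hypothetical helper, and it covers only the fields named in this section, so the Graph bounded context may enforce further rules.

```python
def create_errors(operation: dict) -> list[str]:
    """List the documented CREATE requirements that `operation` violates."""
    errors = []
    if "label" not in operation:
        errors.append("missing label")

    # Provenance properties are required for both nodes and edges.
    properties = operation.get("set_properties", {})
    for name in ("data_source_id", "source_path"):
        if name not in properties:
            errors.append(f"set_properties is missing {name}")

    # Edges additionally need both endpoints.
    if operation.get("type") == "edge":
        for name in ("start_id", "end_id"):
            if name not in operation:
                errors.append(f"missing {name}")
    return errors


edge = {
    "op": "CREATE",
    "type": "edge",
    "id": "knows:9f8e7d6c5b4a3210",
    "label": "KNOWS",
    "start_id": "person:1a2b3c4d5e6f7890",
    "set_properties": {"since": 2020, "data_source_id": "github-repo-123"},
}

edge_problems = create_errors(edge)
print(edge_problems)
# ['set_properties is missing source_path', 'missing end_id']
```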
2. UPDATE (Partial)
UPDATE changes specific properties without affecting others.
{
  "op": "UPDATE",
  "type": "node",
  "id": "person:1a2b3c4d5e6f7890",
  "set_properties": {
    "name": "Alice Smith-Jones",
    "email": "alice.jones@example.com"
  }
}

{
  "op": "UPDATE",
  "type": "node",
  "id": "person:1a2b3c4d5e6f7890",
  "remove_properties": ["old_email", "temp_field"]
}

{
  "op": "UPDATE",
  "type": "node",
  "id": "person:1a2b3c4d5e6f7890",
  "set_properties": {
    "name": "Alice Smith-Jones"
  },
  "remove_properties": ["maiden_name"]
}

3. DELETE (Cascade)
DELETE automatically removes connected edges (uses DETACH DELETE).
{
  "op": "DELETE",
  "type": "node",
  "id": "person:obsolete123456"
}

When to use:
- File was deleted from source
- Entity no longer exists in external system
- Cleanup during re-extraction
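For cleanup during re-extraction, one possible approach is to compare the node IDs seen in the previous run with those produced by the current run and emit a DELETE for each ID that disappeared. The sketch below assumes you already have both sets; how the previous IDs are obtained is outside this guide, and `delete_ops_for_missing` is a hypothetical helper.

```python
def delete_ops_for_missing(previous_ids: set[str], current_ids: set[str]) -> list[dict]:
    """Build DELETE node operations for IDs seen before but absent now."""
    return [
        {"op": "DELETE", "type": "node", "id": node_id}
        for node_id in sorted(previous_ids - current_ids)
    ]


# Hard-coded sample data for illustration.
previous_ids = {"person:1a2b3c4d5e6f7890", "person:obsolete123456"}
current_ids = {"person:1a2b3c4d5e6f7890", "person:abcdef0123456789"}

stale = delete_ops_for_missing(previous_ids, current_ids)
print(stale)
# [{'op': 'DELETE', 'type': 'node', 'id': 'person:obsolete123456'}]
```

Because DELETE removes connected edges automatically, the sketch emits node deletions only.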
Operation Ordering
Operations do not need to be ordered in the JSONL data. The Graph bounded context will execute operations in the following order:
1. DEFINE
2. DELETE <edge>
3. DELETE <node>
4. CREATE <node>
5. CREATE <edge>
6. UPDATE <node>
7. UPDATE <edge>
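To preview how a batch would be sequenced, the order listed above can be reproduced with a sort key. This is a local illustration only: the Graph bounded context does the real ordering, and `EXECUTION_ORDER` and `execution_rank` are hypothetical names.

```python
# Execution order as listed above: (op, type) pairs, DEFINE first.
EXECUTION_ORDER = [
    ("DEFINE", None),
    ("DELETE", "edge"),
    ("DELETE", "node"),
    ("CREATE", "node"),
    ("CREATE", "edge"),
    ("UPDATE", "node"),
    ("UPDATE", "edge"),
]


def execution_rank(operation: dict) -> int:
    """Position of an operation in the documented execution order."""
    op = operation["op"]
    # DEFINE is ranked first regardless of whether it declares a node or edge.
    kind = None if op == "DEFINE" else operation["type"]
    return EXECUTION_ORDER.index((op, kind))


operations = [
    {"op": "CREATE", "type": "edge", "id": "knows:9f8e7d6c5b4a3210"},
    {"op": "UPDATE", "type": "node", "id": "person:1a2b3c4d5e6f7890"},
    {"op": "DEFINE", "type": "node", "label": "Person"},
    {"op": "CREATE", "type": "node", "id": "person:1a2b3c4d5e6f7890"},
    {"op": "DELETE", "type": "node", "id": "person:obsolete123456"},
]

# In this sketch, sorted() is stable, so equal-ranked operations keep file order.
ordered = sorted(operations, key=execution_rank)
summary = [(o["op"], o["type"]) for o in ordered]
print(summary)
# [('DEFINE', 'node'), ('DELETE', 'node'), ('CREATE', 'node'),
#  ('CREATE', 'edge'), ('UPDATE', 'node')]
```

Creating nodes before edges means an edge's endpoints exist by the time the edge is created, which is why the file itself can list operations in any order.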
JSONL Output Format
The extraction process produces a JSONL file (one JSON object per line), which might look like this:
{"op": "DEFINE", "type": "node", "label": "Person", "description": "A person entity representing an individual contributor, maintainer, or team member. Extracted from MAINTAINERS.md, git commit authors, @-mentions in pull requests, and people/ directory markdown files.", "example_file_path": "people/alice-smith.md", "example_in_file_path": "---\nname: Alice Smith\nemail: alice@example.com\ngithub: asmith\nrole: Senior Engineer\n---\n\n# Alice Smith\n\nAlice is a senior engineer focusing on backend systems.", "required_properties": ["name"]}
{"op": "DEFINE", "type": "edge", "label": "KNOWS", "description": "Represents a professional relationship or acquaintance between two people, typically colleagues or collaborators. Extracted from co-authorship on pull requests, shared repository maintainership, or explicit mentions in people profiles.", "example_file_path": "people/alice-smith.md", "example_in_file_path": "## Colleagues\n\n- [@bob-jones](../people/bob-jones.md) - worked together since 2020\n- [@charlie-wilson](../people/charlie-wilson.md) - collaborated on Project X", "required_properties": ["since"]}
{"op": "CREATE", "type": "node", "id": "person:1a2b3c4d5e6f7890", "label": "Person", "set_properties": {"slug": "alice-smith", "name": "Alice Smith", "data_source_id": "ds-123", "source_path": "people/alice.md"}}
{"op": "CREATE", "type": "node", "id": "person:abcdef0123456789", "label": "Person", "set_properties": {"slug": "bob-jones", "name": "Bob Jones", "data_source_id": "ds-123", "source_path": "people/bob.md"}}
{"op": "CREATE", "type": "edge", "id": "knows:9f8e7d6c5b4a3210", "label": "KNOWS", "start_id": "person:1a2b3c4d5e6f7890", "end_id": "person:abcdef0123456789", "set_properties": {"since": 2020, "data_source_id": "ds-123", "source_path": "people/alice.md"}}

Can I create multiple edges between the same nodes?
Yes! Different edge labels generate different IDs:
from shared_kernel.graph_primitives import EntityIdGenerator

alice_id = EntityIdGenerator.generate("Person", "alice-smith")
bob_id = EntityIdGenerator.generate("Person", "bob-jones")

# Different edge labels → different IDs
knows_id = EntityIdGenerator.generate_edge_id("KNOWS", alice_id, bob_id)
collaborates_id = EntityIdGenerator.generate_edge_id("COLLABORATES", alice_id, bob_id)

# knows_id ≠ collaborates_id (edge label is part of the hash)

What if I don’t know if an entity exists?
Use CREATE. It is idempotent and will update the entity if it already exists.
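The idea can be illustrated with a toy in-memory model. This is not how the Graph bounded context is implemented (the guide only states that CREATE uses MERGE under the hood); `apply_create` and the dictionary store are hypothetical, and the sketch only shows that replaying the same CREATE leaves the state unchanged.

```python
def apply_create(store: dict, operation: dict) -> None:
    """Toy merge: add the record if absent, otherwise update its properties."""
    record = store.setdefault(
        operation["id"], {"label": operation["label"], "properties": {}}
    )
    record["properties"].update(operation["set_properties"])


def snapshot(store: dict) -> dict:
    """Copy the properties of every record so states can be compared."""
    return {key: dict(value["properties"]) for key, value in store.items()}


create_alice = {
    "op": "CREATE",
    "type": "node",
    "id": "person:1a2b3c4d5e6f7890",
    "label": "Person",
    "set_properties": {"slug": "alice-smith", "name": "Alice Smith"},
}

store = {}
apply_create(store, create_alice)
after_first = snapshot(store)

# Replaying the same operation does not add a second record or change anything.
apply_create(store, create_alice)
after_second = snapshot(store)

print(after_first == after_second)  # True
print(len(store))                   # 1
```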
Can I batch operations?
Yes! All operations in a JSONL file execute in a single transaction.
What happens if one operation fails?
The entire batch rolls back. Fix the error and re-run.
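Since a single bad operation rolls back the whole batch, it can help to catch obvious problems before submitting the file. Below is a minimal pre-flight sketch: `preflight` is a hypothetical helper, and it checks only that each line parses as a JSON object with an op and type that appear in this guide, not the full operation schema.

```python
import json

# Operation names and types that appear in this guide.
KNOWN_OPS = {"DEFINE", "CREATE", "UPDATE", "DELETE"}
KNOWN_TYPES = {"node", "edge"}


def preflight(lines: list[str]) -> list[str]:
    """Return one message per problem found; an empty list means none found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            operation = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(operation, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        if operation.get("op") not in KNOWN_OPS:
            problems.append(f"line {number}: unknown op {operation.get('op')!r}")
        if operation.get("type") not in KNOWN_TYPES:
            problems.append(f"line {number}: type must be 'node' or 'edge'")
    return problems


sample = [
    '{"op": "DELETE", "type": "node", "id": "person:obsolete123456"}',
    '{"op": "CREATE", "type": "node", "id": "person:1a2b3c4d5e6f7890"',  # missing }
    '{"op": "UPSERT", "type": "node", "id": "person:abcdef0123456789"}',
]

problems = preflight(sample)
for message in problems:
    print(message)
```

In practice the lines would come from the JSONL file itself, for example by iterating over the open file handle.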
Do I need to DEFINE types every time I run extraction?
No! DEFINE once when you first introduce a type. You can skip DEFINE in subsequent runs unless adding new types or updating definitions.
Next Steps
- Review the Mutation Operation Schema
- Read about Secure Enclave ID Design
- Explore the Architecture patterns