Data Discovery
Finding, querying, and exporting user data across components
The Challenge
In a system with isolated components, how do you answer: "What data exists for this user?" This matters for:
- GDPR exports — User requests all their data
- Account deletion — Find and remove everything
- Debugging — Understand the full user state
- Reporting — Aggregate data across components
Components don't know about each other, so there's no central "get all data" query. Instead, PSP provides discovery mechanisms.
Discovery Flow
The Catalogue tells you what components exist; each component provides its own query endpoints.
The Catalogue as Schema Registry
The Catalogue is the source of truth for what data exists:
    // Response
    {
      "components": [
        {
          "name": "todos",
          "schemas": ["Task", "TaskContainer", "Tag"],
          "query_endpoint": "/v1/todos"
        },
        {
          "name": "budget",
          "schemas": ["BudgetPlan", "LedgerEntry", "TokenType"],
          "query_endpoint": "/v1/budget"
        }
      ]
    }
Each component registers its schemas, describing what data it stores and how to query it.
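As a rough sketch of how a client might consume this response, the snippet below parses the catalogue payload shown above and builds a map from schema name to the owning component's query endpoint. The `index_schemas` helper is hypothetical and not part of PSP; only the response shape is taken from the example.

```python
import json

# Catalogue response, copied from the example above.
CATALOGUE_RESPONSE = """
{
  "components": [
    {"name": "todos",
     "schemas": ["Task", "TaskContainer", "Tag"],
     "query_endpoint": "/v1/todos"},
    {"name": "budget",
     "schemas": ["BudgetPlan", "LedgerEntry", "TokenType"],
     "query_endpoint": "/v1/budget"}
  ]
}
"""


def index_schemas(catalogue: dict) -> dict[str, str]:
    """Map each schema name to its component's query endpoint (hypothetical helper)."""
    index = {}
    for component in catalogue["components"]:
        for schema in component["schemas"]:
            index[schema] = component["query_endpoint"]
    return index


schema_index = index_schemas(json.loads(CATALOGUE_RESPONSE))
```

With an index like this, a caller that only knows a schema name can find where to query it without knowing which component owns it.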
QuerySpec: Uniform Querying
Every repository supports QuerySpec—a standard way to filter, sort, and paginate:
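    from dataclasses import dataclass
    from enum import Enum
    from typing import Any

    class FilterOp(Enum):
        EQ = "eq"  # string values are illustrative
        NE = "ne"
        GT = "gt"
        LT = "lt"
        GTE = "gte"
        LTE = "lte"
        IN = "in"
        CONTAINS = "contains"

    @dataclass
    class Filter:
        field: str
        op: FilterOp
        value: Any

    @dataclass
    class QuerySpec:
        filters: list[Filter]
        sort_by: str | None = None
        sort_order: str = "asc"
        limit: int = 100
        offset: int = 0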
Usage across components is consistent:
    # Same pattern for every component
    spec = QuerySpec(filters=[
        Filter("owner_id", FilterOp.EQ, user_id)
    ])

    tasks = task_repo.query(spec)
    budget_plans = budget_repo.query(spec)
    ledger_entries = ledger_repo.query(spec)
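To make the contract concrete, here is a minimal sketch of how a repository might evaluate a QuerySpec: filter, then sort, then paginate. `InMemoryRepo` and the `_OPS` table are hypothetical, the `FilterOp` string values are assumed, and a real repository would more likely translate the spec into a database query.

```python
import operator
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class FilterOp(Enum):
    EQ = "eq"
    NE = "ne"
    GT = "gt"
    LT = "lt"
    GTE = "gte"
    LTE = "lte"
    IN = "in"
    CONTAINS = "contains"


@dataclass
class Filter:
    field: str
    op: FilterOp
    value: Any


@dataclass
class QuerySpec:
    filters: list[Filter]
    sort_by: Optional[str] = None
    sort_order: str = "asc"
    limit: int = 100
    offset: int = 0


# How each operator compares a stored value (left) with the filter value (right).
_OPS = {
    FilterOp.EQ: operator.eq,
    FilterOp.NE: operator.ne,
    FilterOp.GT: operator.gt,
    FilterOp.LT: operator.lt,
    FilterOp.GTE: operator.ge,
    FilterOp.LTE: operator.le,
    FilterOp.IN: lambda stored, allowed: stored in allowed,
    FilterOp.CONTAINS: lambda stored, item: item in stored,
}


class InMemoryRepo:
    """Hypothetical repository over a list of dict rows."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def query(self, spec: QuerySpec) -> list[dict]:
        # 1. Keep rows that satisfy every filter.
        matched = [
            row for row in self.rows
            if all(_OPS[f.op](row[f.field], f.value) for f in spec.filters)
        ]
        # 2. Sort if requested.
        if spec.sort_by is not None:
            matched.sort(
                key=lambda row: row[spec.sort_by],
                reverse=(spec.sort_order == "desc"),
            )
        # 3. Apply offset and limit last.
        return matched[spec.offset:spec.offset + spec.limit]


repo = InMemoryRepo([
    {"id": 1, "owner_id": "u1", "title": "Pay rent"},
    {"id": 2, "owner_id": "u2", "title": "Buy milk"},
    {"id": 3, "owner_id": "u1", "title": "Call bank"},
])
spec = QuerySpec(filters=[Filter("owner_id", FilterOp.EQ, "u1")], sort_by="title")
owned = repo.query(spec)
```

Here `owned` holds the two rows belonging to `u1`, ordered by title.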
Facet Discovery
Facets (preferences, settings) are stored separately and are discoverable by owner:
    from uuid import UUID

    def export_facets(owner_id: UUID) -> dict:
        """Get all facets for a user."""
        return {
            facet.facet_name: facet.payload
            for facet in facet_store.list_by_owner("person", owner_id)
        }

    # Returns:
    # {
    #   "todos.preferences": {"show_completed": true, ...},
    #   "budget.settings": {"weekly_reset_day": "monday", ...}
    # }
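For illustration, a facet store satisfying `list_by_owner` could look like the sketch below. The `Facet` record and `InMemoryFacetStore` are assumptions; only the `list_by_owner(owner_type, owner_id)` call and the `facet_name` / `payload` attributes come from the example above.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Facet:
    # Hypothetical record shape; the real facet model may carry more fields.
    owner_type: str
    owner_id: UUID
    facet_name: str
    payload: dict


class InMemoryFacetStore:
    """Hypothetical facet store backed by a plain list."""

    def __init__(self):
        self._facets: list[Facet] = []

    def put(self, facet: Facet) -> None:
        self._facets.append(facet)

    def list_by_owner(self, owner_type: str, owner_id: UUID) -> list[Facet]:
        return [
            f for f in self._facets
            if f.owner_type == owner_type and f.owner_id == owner_id
        ]


facet_store = InMemoryFacetStore()
alice, bob = uuid4(), uuid4()
facet_store.put(Facet("person", alice, "todos.preferences", {"show_completed": True}))
facet_store.put(Facet("person", alice, "budget.settings", {"weekly_reset_day": "monday"}))
facet_store.put(Facet("person", bob, "todos.preferences", {"show_completed": False}))

exported = {
    f.facet_name: f.payload
    for f in facet_store.list_by_owner("person", alice)
}
```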
Activity History
The ActivityLogRepo provides a timeline of all actions:
    from uuid import UUID

    def export_activity(owner_id: UUID) -> list:
        """Get all activity for a user."""
        spec = QuerySpec(
            filters=[Filter("actor_id", FilterOp.EQ, owner_id)],
            sort_by="timestamp",
            sort_order="desc",
        )
        return activity_repo.query(spec)

    # Returns entries like:
    # {
    #   "entity_type": "Task",
    #   "entity_id": "...",
    #   "action": "completed",
    #   "timestamp": "2024-01-15T10:30:00Z",
    #   "changes": {"status": ["pending", "completed"]}
    # }
Complete GDPR Export
Putting it together—a complete data export:
    from uuid import UUID

    class GDPRExport:
        def __init__(self, catalogue, facet_store, activity_repo, **repos):
            self.catalogue = catalogue
            self.facet_store = facet_store
            self.activity_repo = activity_repo
            # Repositories are passed by keyword (task=..., container=..., budget=...)
            # so that execute() can look them up by name.
            self.repos = repos

        def execute(self, owner_id: UUID) -> dict:
            spec = QuerySpec(filters=[
                Filter("owner_id", FilterOp.EQ, owner_id)
            ])

            return {
                # Entity data from each component
                "tasks": self.repos["task"].query(spec),
                "containers": self.repos["container"].query(spec),
                "budget_plans": self.repos["budget"].query(spec),

                # Preferences and settings
                "facets": {
                    f.facet_name: f.payload
                    for f in self.facet_store.list_by_owner("person", owner_id)
                },

                # Activity history
                "activity": self.activity_repo.query(
                    QuerySpec(filters=[Filter("actor_id", FilterOp.EQ, owner_id)])
                ),
            }
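One caveat: QuerySpec defaults to `limit=100`, so if repositories honor that default, a single `query` call returns at most 100 rows and an export could be silently truncated. A possible way to page through everything is sketched below. `query_all` and `StubRepo` are hypothetical, the `QuerySpec` here is a trimmed copy of the one defined earlier, and the sketch assumes repositories return lists and respect `limit` and `offset`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class QuerySpec:
    # Trimmed copy of the QuerySpec defined earlier; filters are left untyped here.
    filters: list
    sort_by: Optional[str] = None
    sort_order: str = "asc"
    limit: int = 100
    offset: int = 0


def query_all(repo, spec: QuerySpec, page_size: int = 100) -> list:
    """Fetch every matching row by walking offset-based pages (hypothetical helper)."""
    results = []
    offset = 0
    while True:
        page = repo.query(replace(spec, limit=page_size, offset=offset))
        results.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return results
        offset += page_size


class StubRepo:
    """Stand-in repository that ignores filters and only applies limit/offset."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, spec: QuerySpec) -> list:
        return self.rows[spec.offset:spec.offset + spec.limit]


repo = StubRepo(list(range(250)))
everything = query_all(repo, QuerySpec(filters=[]))
```

Offset paging can skip or repeat rows if data changes between pages, so setting a stable `sort_by` on the spec is advisable when exporting from a live system.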
Data Deletion
The same discovery pattern enables complete deletion:
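    from uuid import UUID

    class DeleteAccount:
        def __init__(self, repos, facet_store, activity_repo, person_repo):
            self.repos = repos  # iterable of component repositories
            self.facet_store = facet_store
            self.activity_repo = activity_repo
            self.person_repo = person_repo

        def execute(self, owner_id: UUID):
            # Delete all entities owned by this user
            for repo in self.repos:
                repo.delete_by_owner(owner_id)

            # Delete all facets
            self.facet_store.delete_by_owner("person", owner_id)

            # Activity log: anonymize, don't delete (audit requirements)
            self.activity_repo.anonymize_actor(owner_id)

            # Finally, delete identity
            self.person_repo.delete(owner_id)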
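One way to check that deletion really reached every component is to rerun the `owner_id` discovery query against each repository afterwards. The sketch below does that with a hypothetical `find_leftovers` helper and a stand-in `DictRepo`; the `Filter`, `FilterOp`, and `QuerySpec` definitions are trimmed copies of the ones shown earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class FilterOp(Enum):
    EQ = "eq"  # only the operator this sketch needs


@dataclass
class Filter:
    field: str
    op: FilterOp
    value: Any


@dataclass
class QuerySpec:
    filters: list[Filter]
    limit: int = 100
    offset: int = 0


class DictRepo:
    """Hypothetical in-memory repository; supports a single owner_id EQ filter."""

    def __init__(self, rows):
        self.rows = list(rows)

    def query(self, spec: QuerySpec) -> list:
        owner = spec.filters[0].value
        owned = [r for r in self.rows if r["owner_id"] == owner]
        return owned[spec.offset:spec.offset + spec.limit]

    def delete_by_owner(self, owner_id) -> None:
        self.rows = [r for r in self.rows if r["owner_id"] != owner_id]


def find_leftovers(repos: dict, owner_id) -> list[str]:
    """Names of repositories that still return rows for owner_id."""
    # limit=1 is enough: we only need to know whether anything remains.
    spec = QuerySpec(filters=[Filter("owner_id", FilterOp.EQ, owner_id)], limit=1)
    return [name for name, repo in repos.items() if repo.query(spec)]


repos = {
    "task": DictRepo([{"owner_id": "u1"}, {"owner_id": "u2"}]),
    "budget": DictRepo([{"owner_id": "u1"}]),
}
before = find_leftovers(repos, "u1")
for repo in repos.values():
    repo.delete_by_owner("u1")
after = find_leftovers(repos, "u1")
```

An empty result after deletion indicates that no registered repository still holds rows for that owner. It says nothing about repositories that were never registered.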
Data discovery works because every entity has owner_id. This single convention makes cross-component queries possible without cross-component coupling.
- All repositories MUST support query(QuerySpec)
- All repositories MUST support filtering by owner_id
- Components MUST register schemas in the Catalogue
- Deletion MUST cascade through all component data
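The repository-level requirements could be checked at component registration time with a structural type. The sketch below uses `typing.Protocol` with `runtime_checkable`. The `OwnedRepository` name and the example classes are hypothetical, and `delete_by_owner` is assumed as the method name from the deletion example above. Note that a runtime `isinstance` check against a protocol only verifies that the methods exist, not their signatures or behavior.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OwnedRepository(Protocol):
    """Hypothetical contract: what discovery and deletion need from a repository."""

    def query(self, spec: Any) -> list: ...

    def delete_by_owner(self, owner_id: Any) -> None: ...


class TaskRepo:
    """Example repository that provides both required methods."""

    def query(self, spec: Any) -> list:
        return []

    def delete_by_owner(self, owner_id: Any) -> None:
        return None


class BrokenRepo:
    """Example repository that forgot delete_by_owner."""

    def query(self, spec: Any) -> list:
        return []


conforming = isinstance(TaskRepo(), OwnedRepository)
non_conforming = isinstance(BrokenRepo(), OwnedRepository)
```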