The Challenge

In a system with isolated components, how do you answer: "What data exists for this user?" This matters for:

  • GDPR exports — User requests all their data
  • Account deletion — Find and remove everything
  • Debugging — Understand the full user state
  • Reporting — Aggregate data across components

Components don't know about each other, so there's no central "get all data" query. Instead, PSP provides discovery mechanisms.

Discovery Flow

Data Discovery Process
%%{init: {"sequence": {"useMaxWidth": false}}}%%
sequenceDiagram
    participant Client
    participant Catalogue
    participant Components
    Client->>Catalogue: List components
    Catalogue-->>Client: [todos, budget, ...]
    loop Each component
        Client->>Components: Query by owner_id
        Components-->>Client: Component data
    end
    Client->>Client: Aggregate results

The Catalogue tells you what components exist; each component provides its own query endpoints.
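
The flow above can be sketched as a small client loop. This is a minimal illustration rather than the PSP client: `fetch` is a hypothetical callable standing in for an HTTP GET, and passing `owner_id` as a query parameter is an assumption (the source only says that components are queried by owner_id).

```python
from typing import Any, Callable

# Hypothetical transport: takes a path and query params, returns parsed JSON.
Fetch = Callable[[str, dict[str, str]], Any]


def discover_user_data(fetch: Fetch, owner_id: str) -> dict[str, Any]:
    """Ask the Catalogue which components exist, then query each by owner."""
    catalogue = fetch("/v1/catalogue", {})
    results: dict[str, Any] = {}
    for component in catalogue["components"]:
        # Assumption: each query endpoint accepts an owner_id parameter.
        results[component["name"]] = fetch(
            component["query_endpoint"], {"owner_id": owner_id}
        )
    return results


def fake_fetch(path: str, params: dict[str, str]) -> Any:
    """In-memory stand-in for the HTTP layer, for illustration only."""
    if path == "/v1/catalogue":
        return {"components": [
            {"name": "todos", "query_endpoint": "/v1/todos"},
            {"name": "budget", "query_endpoint": "/v1/budget"},
        ]}
    return [{"source": path, "owner_id": params["owner_id"]}]


data = discover_user_data(fake_fetch, "u1")
```

The client never hardcodes component names, so a newly registered component is picked up without changing this loop.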

The Catalogue as Schema Registry

The Catalogue is the source of truth for what data exists:

GET /v1/catalogue
// Response
{
  "components": [
    {
      "name": "todos",
      "schemas": ["Task", "TaskContainer", "Tag"],
      "query_endpoint": "/v1/todos"
    },
    {
      "name": "budget",
      "schemas": ["BudgetPlan", "LedgerEntry", "TokenType"],
      "query_endpoint": "/v1/budget"
    }
  ]
}

Each component registers its schemas, describing what data it stores and how to query it.
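
One way to use this response is to index it by schema name, so a caller can find which component owns a given entity type. The sketch below assumes the response shape shown above; `schema_index` is a hypothetical helper, not part of PSP.

```python
def schema_index(catalogue: dict) -> dict[str, str]:
    """Map each registered schema name to its component's query endpoint."""
    index: dict[str, str] = {}
    for component in catalogue["components"]:
        for schema in component["schemas"]:
            index[schema] = component["query_endpoint"]
    return index


catalogue = {
    "components": [
        {"name": "todos", "schemas": ["Task", "TaskContainer", "Tag"],
         "query_endpoint": "/v1/todos"},
        {"name": "budget", "schemas": ["BudgetPlan", "LedgerEntry", "TokenType"],
         "query_endpoint": "/v1/budget"},
    ]
}
index = schema_index(catalogue)
print(index["LedgerEntry"])  # /v1/budget
```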

QuerySpec: Uniform Querying

Every repository supports QuerySpec—a standard way to filter, sort, and paginate:

platform/query/spec.py
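from __future__ import annotations  # lets QuerySpec reference Filter before it is defined

from dataclasses import dataclass
from typing import Any

# FilterOp (not shown) enumerates the supported operators listed on Filter.op.

@dataclass
class QuerySpec:
    filters: list[Filter]
    sort_by: str | None = None
    sort_order: str = "asc"  # "asc" or "desc"
    limit: int = 100         # page size
    offset: int = 0          # rows to skip before the first result

@dataclass
class Filter:
    field: str
    op: FilterOp  # eq, ne, gt, lt, gte, lte, in, contains
    value: Any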
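
To make the semantics concrete, here is a minimal sketch of how a repository might evaluate a QuerySpec against in-memory rows. Several things are assumptions, not taken from PSP: the `FilterOp` enum values, the `apply_spec` helper, AND-combining of multiple filters, and the reading of `contains` as "the filter value is inside the row value".

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class FilterOp(Enum):  # assumed definition, mirroring the operators listed above
    EQ = "eq"
    NE = "ne"
    GT = "gt"
    LT = "lt"
    GTE = "gte"
    LTE = "lte"
    IN = "in"
    CONTAINS = "contains"


@dataclass
class Filter:
    field: str
    op: FilterOp
    value: Any


@dataclass
class QuerySpec:
    filters: list[Filter]
    sort_by: str | None = None
    sort_order: str = "asc"
    limit: int = 100
    offset: int = 0


# Each operator maps to a predicate taking (row value, filter value).
_PREDICATES: dict[FilterOp, Callable[[Any, Any], bool]] = {
    FilterOp.EQ: operator.eq,
    FilterOp.NE: operator.ne,
    FilterOp.GT: operator.gt,
    FilterOp.LT: operator.lt,
    FilterOp.GTE: operator.ge,
    FilterOp.LTE: operator.le,
    FilterOp.IN: lambda actual, allowed: actual in allowed,
    FilterOp.CONTAINS: lambda actual, needle: needle in actual,
}


def apply_spec(rows: list[dict], spec: QuerySpec) -> list[dict]:
    """Filter (all filters must match), then sort, then paginate."""
    matched = [
        row for row in rows
        if all(_PREDICATES[f.op](row[f.field], f.value) for f in spec.filters)
    ]
    if spec.sort_by is not None:
        matched.sort(key=lambda row: row[spec.sort_by],
                     reverse=spec.sort_order == "desc")
    return matched[spec.offset:spec.offset + spec.limit]


rows = [
    {"owner_id": "u1", "title": "Buy milk", "priority": 2},
    {"owner_id": "u2", "title": "Pay rent", "priority": 1},
    {"owner_id": "u1", "title": "Call mum", "priority": 1},
]
spec = QuerySpec(filters=[Filter("owner_id", FilterOp.EQ, "u1")], sort_by="priority")
result = apply_spec(rows, spec)
```

A real repository would more likely translate the spec into a database query than filter in memory, but the order of operations (filter, sort, paginate) would be the same.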

Usage across components is consistent:

export.py
# Same pattern for every component
spec = QuerySpec(filters=[
    Filter("owner_id", FilterOp.EQ, user_id)
])

tasks = task_repo.query(spec)
budget_plans = budget_repo.query(spec)
ledger_entries = ledger_repo.query(spec)

Facet Discovery

Facets (preferences, settings) are stored separately and are discoverable by owner:

export.py
def export_facets(owner_id: UUID) -> dict:
    """Get all facets for a user."""
    return {
        facet.facet_name: facet.payload
        for facet in facet_store.list_by_owner("person", owner_id)
    }

# Returns:
# {
#   "todos.preferences": {"show_completed": true, ...},
#   "budget.settings": {"weekly_reset_day": "monday", ...}
# }
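
For readers who want to see the shape of the data involved, the following is an illustrative in-memory stand-in for `facet_store`. The `Facet` fields are inferred from the calls above (`list_by_owner("person", owner_id)`, `facet_name`, `payload`); the `put` method, the class name, and the use of plain strings for IDs are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Facet:
    owner_type: str          # e.g. "person"
    owner_id: str            # a UUID in the real system; a string here
    facet_name: str          # e.g. "todos.preferences"
    payload: dict[str, Any]


class InMemoryFacetStore:
    """Illustrative stand-in for facet_store; not the PSP implementation."""

    def __init__(self) -> None:
        self._facets: list[Facet] = []

    def put(self, facet: Facet) -> None:
        self._facets.append(facet)

    def list_by_owner(self, owner_type: str, owner_id: str) -> list[Facet]:
        return [
            f for f in self._facets
            if f.owner_type == owner_type and f.owner_id == owner_id
        ]

    def delete_by_owner(self, owner_type: str, owner_id: str) -> int:
        """Remove the owner's facets and return how many were removed."""
        before = len(self._facets)
        self._facets = [
            f for f in self._facets
            if not (f.owner_type == owner_type and f.owner_id == owner_id)
        ]
        return before - len(self._facets)


store = InMemoryFacetStore()
store.put(Facet("person", "u1", "todos.preferences", {"show_completed": True}))
store.put(Facet("person", "u1", "budget.settings", {"weekly_reset_day": "monday"}))
store.put(Facet("person", "u2", "todos.preferences", {"show_completed": False}))

exported = {f.facet_name: f.payload for f in store.list_by_owner("person", "u1")}
```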

Activity History

The ActivityLogRepo provides a timeline of all actions:

export.py
def export_activity(owner_id: UUID) -> list:
    """Get all activity for a user."""
    spec = QuerySpec(filters=[
        Filter("actor_id", FilterOp.EQ, owner_id)
    ], sort_by="timestamp", sort_order="desc")
    return activity_repo.query(spec)

# Returns entries like:
# {
#   "entity_type": "Task",
#   "entity_id": "...",
#   "action": "completed",
#   "timestamp": "2024-01-15T10:30:00Z",
#   "changes": {"status": ["pending", "completed"]}
# }
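
Because QuerySpec defaults to `limit=100`, a single `query` call may return only the first page of a long history. An export that needs every entry would have to page through the results. The sketch below shows one way to do that, assuming repositories honor `limit` and `offset` and return lists; `query_all` and `FakeActivityRepo` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any, Iterator


@dataclass
class QuerySpec:
    # Trimmed copy of the QuerySpec above; filters left untyped for brevity.
    filters: list[Any] = field(default_factory=list)
    sort_by: str | None = None
    sort_order: str = "asc"
    limit: int = 100
    offset: int = 0


def query_all(repo: Any, spec: QuerySpec) -> Iterator[Any]:
    """Yield every match by advancing offset until a short page comes back."""
    offset = spec.offset
    while True:
        page = repo.query(replace(spec, offset=offset))
        yield from page
        if len(page) < spec.limit:
            return
        offset += spec.limit


class FakeActivityRepo:
    """Hypothetical repo holding 250 entries, used only to exercise paging."""

    def __init__(self) -> None:
        self.entries = [{"seq": i} for i in range(250)]

    def query(self, spec: QuerySpec) -> list[dict]:
        return self.entries[spec.offset:spec.offset + spec.limit]


entries = list(query_all(FakeActivityRepo(), QuerySpec()))
print(len(entries))  # 250
```

Offset-based paging is only reliable when the sort order is stable between calls, so sorting by a field such as `timestamp` is advisable when paging through a log that is still being written.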

Complete GDPR Export

Putting it together—a complete data export:

gdpr_export.py
class GDPRExport:
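    def __init__(self, catalogue, facet_store, activity_repo, **repos):
        self.catalogue = catalogue
        self.facet_store = facet_store
        self.activity_repo = activity_repo
        # Repositories are passed by keyword (task=..., container=..., budget=...)
        # so that execute() can look them up by name.
        self.repos = repos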

    def execute(self, owner_id: UUID) -> dict:
        spec = QuerySpec(filters=[
            Filter("owner_id", FilterOp.EQ, owner_id)
        ])

        return {
            # Entity data from each component
            "tasks": self.repos["task"].query(spec),
            "containers": self.repos["container"].query(spec),
            "budget_plans": self.repos["budget"].query(spec),

            # Preferences and settings
            "facets": {
                f.facet_name: f.payload
                for f in self.facet_store.list_by_owner("person", owner_id)
            },

            # Activity history
            "activity": self.activity_repo.query(
                QuerySpec(filters=[Filter("actor_id", FilterOp.EQ, owner_id)])
            ),
        }

Data Deletion

The same discovery pattern enables complete deletion:

account_deletion.py
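class DeleteAccount:
    # Assumed wiring: the caller injects every dependency used by execute().
    def __init__(self, repos, facet_store, activity_repo, person_repo):
        self.repos = repos  # iterable of every component repository
        self.facet_store = facet_store
        self.activity_repo = activity_repo
        self.person_repo = person_repo

    def execute(self, owner_id: UUID):
        # Delete all entities owned by this user
        for repo in self.repos:
            repo.delete_by_owner(owner_id)

        # Delete all facets
        self.facet_store.delete_by_owner("person", owner_id)

        # Activity log: anonymize, don't delete (audit requirements)
        self.activity_repo.anonymize_actor(owner_id)

        # Finally, delete identity
        self.person_repo.delete(owner_id)

Key Insight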

Data discovery works because every entity has owner_id. This single convention makes cross-component queries possible without cross-component coupling.

Rules
  • All repositories MUST support query(QuerySpec)
  • All repositories MUST support filtering by owner_id
  • Components MUST register schemas in the Catalogue
  • Deletion MUST cascade through all component data
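
These rules can be checked mechanically at startup. The sketch below uses a `typing.Protocol` to flag repositories that lack a required method. `OwnedRepository` and `non_conforming` are hypothetical names, and the method set (`query`, `delete_by_owner`) is inferred from the examples on this page rather than from a published PSP interface.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OwnedRepository(Protocol):
    """Hypothetical contract implied by the rules above."""

    def query(self, spec: Any) -> list[Any]: ...
    def delete_by_owner(self, owner_id: Any) -> None: ...


def non_conforming(repos: dict[str, object]) -> list[str]:
    """Return the names of repositories missing a required method."""
    # isinstance() against a runtime_checkable Protocol only checks that the
    # methods exist; it does not verify signatures or behaviour.
    return [
        name for name, repo in repos.items()
        if not isinstance(repo, OwnedRepository)
    ]


class TaskRepo:
    def query(self, spec):
        return []

    def delete_by_owner(self, owner_id):
        return None


class LegacyRepo:
    def query(self, spec):
        return []


missing = non_conforming({"task": TaskRepo(), "legacy": LegacyRepo()})
print(missing)  # ['legacy']
```

A structural check like this catches a missing method early, but whether a repository actually filters by owner_id or cascades deletion still needs behavioural tests.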