Design Permission System for Endpoint Access Control #2

Closed

opened 2025-04-22 21:58:16 +00:00 by abed.alrahman · 5 comments

abed.alrahman commented

2025-04-22 21:58:16 +00:00

Member

Ref epic: #13 (https://git.cleverthis.com/clevermicro/user-management/issues/13)

Goal: Design a system that allows individual backend services to define which users or groups (from Keycloak) can access their specific REST endpoints, and have this access enforced centrally before requests reach the service.

Background:
We need finer control over who can access specific API endpoints. The requirement is that the service owning the endpoint should ultimately define the access rules (e.g., "Only users in the 'admins' group can call POST /users"). This system needs to integrate with Keycloak for user/group information and likely with Traefik (potentially via the auth-service) for enforcement.

What needs to be done:

Evaluate Architectures: Research and compare different ways to achieve this, considering:
    Central Policy Store: Services register rules (e.g., Endpoint=/abc, Method=GET, Group=readers) centrally. An auth service checks requests against this store.
    Backend Service Callback: An auth service receives the request, gets user info, then calls a specific endpoint on the backend service itself (e.g., /_check_permission) asking "Can this user access this original request?". The backend service contains the logic and responds yes/no.
    Other approaches: Explore alternatives if applicable.
Analyze Options: Compare the evaluated architectures based on:
    How well they meet the "backend service decides" requirement.
    Latency impact on requests.
    Complexity of implementation (for auth service, backend services, and potential central stores).
    Scalability and maintainability.
    Security implications.
Select and Specify: Choose the most suitable architecture and document the design in detail:
    Data Formats: Define how user identity (username, groups, roles from Keycloak) and permission rules/requests will be represented (e.g., JSON schemas).
    Protocols: Specify the communication flow (e.g., REST calls, headers used) between Traefik, the auth-service, Keycloak, and potentially the backend services or a policy store.
    Workflow: Create sequence diagrams showing the end-to-end process for an incoming request being allowed or denied.
    Backend Service Integration: Clearly define how a backend service will provide its access rules (e.g., API for registration, expected format for a check endpoint).
Develop PoC: Create a small Proof of Concept (PoC) to demonstrate the core mechanics of the chosen design. This could involve mock services simulating Traefik, auth-service, Keycloak, and a backend service.
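
To make the first two architectures above concrete, here is a minimal in-process sketch rather than a proposed implementation. The names `Rule`, `PolicyStore`, and `backend_check_permission` are hypothetical, and the request shape passed to the callback is an assumption; only the example rule (Endpoint=/abc, Method=GET, Group=readers), the "admins may call POST /users" rule, and the `/_check_permission` idea come from the issue text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One registered rule: which group may call which method on which endpoint."""
    endpoint: str
    method: str
    group: str


class PolicyStore:
    """Option 1 (central policy store): services register rules, the auth service looks them up."""

    def __init__(self):
        self._rules = set()

    def register(self, rule):
        self._rules.add(rule)

    def is_allowed(self, endpoint, method, user_groups):
        # Allowed if any registered rule matches the request and one of the user's groups.
        return any(
            r.endpoint == endpoint and r.method == method and r.group in user_groups
            for r in self._rules
        )


def backend_check_permission(request):
    """Option 2 (backend callback): stand-in for a backend's /_check_permission handler.

    The backend owns the decision logic. In this hypothetical service,
    only members of 'admins' may call POST /users; everything else is denied.
    """
    if request["method"] == "POST" and request["path"] == "/users":
        return "admins" in request["groups"]
    return False


store = PolicyStore()
store.register(Rule(endpoint="/abc", method="GET", group="readers"))

print(store.is_allowed("/abc", "GET", {"readers"}))
print(store.is_allowed("/abc", "POST", {"readers"}))
print(backend_check_permission({"method": "POST", "path": "/users", "groups": ["admins"]}))
```

In the first option the auth service needs no network hop to the backend but must hold a copy of every rule; in the second the backend keeps full control at the cost of an extra call per request, which is the latency trade-off listed under "Analyze Options".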

Deliverables:

A detailed Design Document containing:
    Comparison of evaluated architectures.
    Specification of the chosen architecture, including data formats, protocols, and workflow diagrams.
    Guidelines for backend service integration.
PoC Code and demonstration results.


abed.alrahman added the

Points

Priority

Medium

labels

2025-04-22 21:58:16 +00:00

abed.alrahman referenced this issue

2025-04-22 22:11:56 +00:00

Implement Access Control in auth-service via Traefik Forward Auth #3

stanislav.hejny commented

2025-04-23 09:32:22 +00:00

Member

It is important for this feature that we do the model design and schema implementation first, before hooking it up to the Traefik API Gateway or the AMQ Adapter. Things that need to be designed:

We need to model a resource, privilege, role, and group. The design must define a representation of all four entities.

Resource is something we control access to. If it is an endpoint, it can be either fully specified or given as a regex. If we allow regexes, then a privilege attached to a fully specified resource takes precedence over a privilege attached to a resource specified as a regex.
Privilege is something a user is granted. Do we allow grants of individual privileges? How do we map Resource to Privilege?
Role. How do we model it? Is a role limited to a single service (application), or can one role grant a user privileges in multiple services (applications)? Is it hierarchical inheritance, as in OO programming, so that a role is modelled as something the user IS, where e.g. super-admin is an extension of the admin role, so a super-admin can do everything an admin can do plus something extra? In this model, a user is allowed only a single role in each service. Or is a role just a label that identifies what the user can DO? For example, a 'job editor' role meaning that the user can manually change job status? In this model, a user can have multiple roles in a single service/application.
Group. Is a group an aggregation of privileges? Or is it an aggregation of resources, with privilege(s) assigned to the entire group? Do we allow resource aggregation across multiple services (e.g. membership in a group grants a user privileges in multiple services)? Do we allow supergroups (aggregations of groups, where membership in a group automatically implies membership in ALL sub-groups)?
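
The precedence rule for resources described above (a fully specified resource wins over a regex) could look roughly like the following. This is a sketch under assumptions: the function `resolve_privilege`, the example paths, and the privilege names are all hypothetical, and it assumes one privilege per resource for brevity.

```python
import re


def resolve_privilege(path, exact_rules, regex_rules):
    """Return the privilege required for `path`, or None if no rule matches.

    exact_rules: dict mapping a fully specified path to a privilege.
    regex_rules: list of (pattern, privilege) pairs, checked in order.
    """
    # A fully specified resource always takes precedence over regex rules.
    if path in exact_rules:
        return exact_rules[path]
    for pattern, privilege in regex_rules:
        if re.fullmatch(pattern, path):
            return privilege
    return None


# Hypothetical rules for illustration only.
exact_rules = {"/jobs/health": "public"}
regex_rules = [(r"/jobs/.*", "job-reader")]

print(resolve_privilege("/jobs/health", exact_rules, regex_rules))  # exact rule wins
print(resolve_privilege("/jobs/42", exact_rules, regex_rules))      # falls back to the regex
print(resolve_privilege("/users", exact_rules, regex_rules))        # no rule matches
```

One open point this sketch does not settle is what happens when several regexes match the same path; here the first one in the list wins, which is only one possible choice.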

When all the above is agreed and documented, we need to design a schema that will allow efficient queries:

Given a username and resource name, is there an intersection between the privileges granted to the user via role(s), group membership, or direct grants and the privileges assigned to the resource? (In other words: is the user allowed to access it?)
Given a username and resource+privilege, what grants does the user need in order to access the resource with that privilege? This will be a very basic management question: "I need to do X in service S; what role or group do I need to be in to be allowed to do X?"
Who has privilege P on resource R? Useful if something goes wrong, or just for regular housekeeping.
What are the privileges of a user? ("What are MY privileges?", to be displayed in the user profile.)
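
As a starting point for discussion, the four queries above can be expressed against a small relational schema. The sketch below uses Python's built-in `sqlite3` with an in-memory database; every table, view, column, and sample value is a hypothetical placeholder, and roles and groups are deliberately collapsed into a single "holder" concept because the modelling questions above are still open.

```python
import sqlite3

SCHEMA = """
CREATE TABLE resource_privilege (resource TEXT, privilege TEXT);
CREATE TABLE holder_privilege (holder TEXT, privilege TEXT);  -- holder: a role or group name
CREATE TABLE membership (username TEXT, holder TEXT);
CREATE TABLE direct_grant (username TEXT, privilege TEXT);
-- Effective privileges of each user, whatever the source of the grant.
CREATE VIEW user_privilege AS
    SELECT m.username AS username, hp.privilege AS privilege, m.holder AS via
    FROM membership m JOIN holder_privilege hp ON hp.holder = m.holder
    UNION
    SELECT username, privilege, 'direct' AS via FROM direct_grant;
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO resource_privilege VALUES (?, ?)",
               [("POST /users", "user.create"), ("GET /jobs", "job.read")])
db.executemany("INSERT INTO holder_privilege VALUES (?, ?)",
               [("admins", "user.create"), ("job-editor", "job.read")])
db.executemany("INSERT INTO membership VALUES (?, ?)", [("alice", "admins")])
db.executemany("INSERT INTO direct_grant VALUES (?, ?)", [("bob", "job.read")])


def is_allowed(username, resource):
    """Query 1: do the user's privileges intersect those assigned to the resource?"""
    row = db.execute(
        "SELECT 1 FROM resource_privilege rp "
        "JOIN user_privilege up ON up.privilege = rp.privilege "
        "WHERE up.username = ? AND rp.resource = ? LIMIT 1",
        (username, resource)).fetchone()
    return row is not None


def grants_needed(resource):
    """Query 2: which roles or groups would give access to the resource?"""
    rows = db.execute(
        "SELECT DISTINCT hp.holder FROM resource_privilege rp "
        "JOIN holder_privilege hp ON hp.privilege = rp.privilege "
        "WHERE rp.resource = ? ORDER BY hp.holder", (resource,))
    return [r[0] for r in rows]


def who_has(privilege, resource):
    """Query 3: which users hold privilege P on resource R?"""
    rows = db.execute(
        "SELECT DISTINCT up.username FROM user_privilege up "
        "JOIN resource_privilege rp ON rp.privilege = up.privilege "
        "WHERE up.privilege = ? AND rp.resource = ? ORDER BY up.username",
        (privilege, resource))
    return [r[0] for r in rows]


def privileges_of(username):
    """Query 4: all privileges of one user, e.g. for the profile page."""
    rows = db.execute(
        "SELECT DISTINCT privilege FROM user_privilege "
        "WHERE username = ? ORDER BY privilege", (username,))
    return [r[0] for r in rows]


print(is_allowed("alice", "POST /users"), grants_needed("POST /users"))
print(who_has("job.read", "GET /jobs"), privileges_of("alice"))
```

The `user_privilege` view keeps the `via` column so that the same data can later explain why a user has a privilege, not just whether they do.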

Next we need to determine how the privilege matrix, or access control list (ACL), is maintained. Who is responsible for determining the privileges? Is it the service developers? Management? How do we represent the ACL both in a human-readable form suitable for review and manual amendment, and in a machine-readable form, so that we can feed the entire ACL to the security schema and synchronize the new ACL with the existing one? Implement the sync function.
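
One possible shape for the sync function is a plain set difference between the reviewed ACL file and the stored ACL. The sketch below assumes, purely for illustration, that the ACL is kept as a JSON list of group/resource/privilege entries; the format, the field names, and the functions `parse_acl` and `sync_acl` are hypothetical.

```python
import json

# Hypothetical ACL file content: readable enough for review, parseable by machine.
ACL_TEXT = """
[
  {"group": "admins",  "resource": "POST /users", "privilege": "user.create"},
  {"group": "readers", "resource": "GET /abc",    "privilege": "abc.read"}
]
"""


def parse_acl(text):
    """Turn the JSON ACL into a set of (group, resource, privilege) triples."""
    return {(e["group"], e["resource"], e["privilege"]) for e in json.loads(text)}


def sync_acl(existing, desired):
    """Bring `existing` in line with `desired`, modifying `existing` in place.

    Returns (added, removed) so the change can be logged or reviewed.
    """
    to_add = desired - existing
    to_remove = existing - desired
    existing -= to_remove
    existing |= to_add
    return to_add, to_remove


# Stand-in for the ACL currently stored in the security schema.
existing = {
    ("admins", "POST /users", "user.create"),
    ("guests", "GET /abc", "abc.read"),
}
desired = parse_acl(ACL_TEXT)
added, removed = sync_acl(existing, desired)
print("added:", sorted(added))
print("removed:", sorted(removed))
```

Returning the added and removed entries separately also gives reviewers a diff to approve before the change is applied, which ties into the question of who is responsible for the ACL.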
What is the user privilege granting process? Who approves a grant request? Can some requests be auto-approved? Do we implement time-limited privilege escalation (e.g. a user can request a sudo privilege for 1 hour to perform a release or some other privileged action)? The privilege is then removed automatically after the period elapses, or the user can give it up earlier.
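
If time-limited escalation is adopted, the core bookkeeping is small. The following sketch shows one way to track expiring grants; the class `TemporaryGrants` and its methods are hypothetical, the approval step is left out entirely, and the current time is passed in explicitly so the behaviour is easy to test.

```python
from datetime import datetime, timedelta, timezone


class TemporaryGrants:
    """Tracks privileges that expire on their own or can be given up early."""

    def __init__(self):
        self._expiry = {}  # (username, privilege) -> expiry time

    def grant(self, username, privilege, duration, now):
        self._expiry[(username, privilege)] = now + duration

    def revoke(self, username, privilege):
        # The user gives up the privilege before it expires.
        self._expiry.pop((username, privilege), None)

    def has(self, username, privilege, now):
        expiry = self._expiry.get((username, privilege))
        if expiry is None:
            return False
        if now >= expiry:
            # Expired: remove the grant lazily on first check after the deadline.
            del self._expiry[(username, privilege)]
            return False
        return True


grants = TemporaryGrants()
t0 = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grants.grant("alice", "sudo", timedelta(hours=1), now=t0)

print(grants.has("alice", "sudo", now=t0 + timedelta(minutes=30)))  # within the hour
print(grants.has("alice", "sudo", now=t0 + timedelta(minutes=61)))  # after expiry
```

Expiry here is checked lazily at lookup time; a real deployment might prefer a scheduled cleanup so that expired grants also disappear from "who has privilege P" reports.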


stanislav.hejny added the

Type

Documentation

label

2025-04-29 10:30:21 +00:00

stanislav.hejny added the

Rows
Columns