Design Permission System for Endpoint Access Control #2

Closed
opened 2025-04-22 21:58:16 +00:00 by abed.alrahman · 5 comments
Member

Ref epic: #13

Goal: Design a system that allows individual backend services to define which users or groups (from Keycloak) can access their specific REST endpoints, and have this access enforced centrally before requests reach the service.

Background:
We need finer control over who can access specific API endpoints. The requirement is that the service owning the endpoint should ultimately define the access rules (e.g., "Only users in the 'admins' group can call POST /users"). This system needs to integrate with Keycloak for user/group information and likely with Traefik (potentially via the auth-service) for enforcement.

What needs to be done:

Evaluate Architectures: Research and compare different ways to achieve this, considering:
    Central Policy Store: Services register rules (e.g., Endpoint=/abc, Method=GET, Group=readers) centrally. An auth service checks requests against this store.
    Backend Service Callback: An auth service receives the request, gets user info, then calls a specific endpoint on the backend service itself (e.g., /_check_permission) asking "Can this user access this original request?". The backend service contains the logic and responds yes/no.
    Other approaches: Explore alternatives if applicable.
Analyze Options: Compare the evaluated architectures based on:
    How well they meet the "backend service decides" requirement.
    Latency impact on requests.
    Complexity of implementation (for auth service, backend services, and potential central stores).
    Scalability and maintainability.
    Security implications.
Select and Specify: Choose the most suitable architecture and document the design in detail:
    Data Formats: Define how user identity (username, groups, roles from Keycloak) and permission rules/requests will be represented (e.g., JSON schemas).
    Protocols: Specify the communication flow (e.g., REST calls, headers used) between Traefik, the auth-service, Keycloak, and potentially the backend services or a policy store.
    Workflow: Create sequence diagrams showing the end-to-end process for an incoming request being allowed or denied.
    Backend Service Integration: Clearly define how a backend service will provide its access rules (e.g., API for registration, expected format for a check endpoint).
Develop PoC: Create a small Proof of Concept (PoC) to demonstrate the core mechanics of the chosen design. This could involve mock services simulating Traefik, auth-service, Keycloak, and a backend service.
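
To make the comparison concrete, here is a minimal sketch of the two candidate architectures side by side. It is not a proposed implementation: the names `User`, `Rule`, `CentralPolicyStore`, and `users_service_check` are hypothetical, rules are matched by exact endpoint and method only, and both options deny by default.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class User:
    username: str
    groups: frozenset[str]  # group names as they would come from Keycloak


@dataclass(frozen=True)
class Rule:
    endpoint: str
    method: str
    group: str


class CentralPolicyStore:
    """Option 1: services register rules, the auth service evaluates them."""

    def __init__(self) -> None:
        self._rules: set[Rule] = set()

    def register(self, rules: Iterable[Rule]) -> None:
        self._rules.update(rules)

    def is_allowed(self, user: User, method: str, endpoint: str) -> bool:
        # Deny unless some registered rule matches the request and the user's groups.
        return any(
            r.endpoint == endpoint and r.method == method and r.group in user.groups
            for r in self._rules
        )


# Option 2: the auth service delegates the decision to the backend service.
# In a real deployment this would be an HTTP call to something like
# /_check_permission; here it is modelled as a plain callable.
CheckPermission = Callable[[User, str, str], bool]


def callback_is_allowed(check: CheckPermission, user: User, method: str, endpoint: str) -> bool:
    return check(user, method, endpoint)


def users_service_check(user: User, method: str, endpoint: str) -> bool:
    """Hypothetical logic living inside the backend service itself."""
    if method == "POST" and endpoint == "/users":
        return "admins" in user.groups
    return False
```

With option 1 the auth service can answer from its own store without an extra hop, while option 2 keeps the logic in the service at the cost of one additional call per request.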

Deliverables:

A detailed Design Document containing:
    Comparison of evaluated architectures.
    Specification of the chosen architecture, including data formats, protocols, and workflow diagrams.
    Guidelines for backend service integration.
PoC Code and demonstration results.

It is important for this feature that we do the model design and schema implementation first, before hooking it up to the Traefik API Gateway or the AMQ Adapter. Things that need to be designed:

  1. We need to model resource, privilege, role, and group. The design must define a representation of all four entities.
  • Resource is something we control access to. If it is an endpoint, it can be either fully specified or given as a regex. If we allow regexes, then a privilege attached to a fully specified resource takes precedence over a privilege attached to a resource specified as a regex.
  • Privilege is something a user is granted. Do we allow grants of individual privileges? How do we map Resource to Privilege?
  • Role. How do we model it? Is a role limited to a single service (application), or can one role grant a user privileges in multiple services (applications)? Is it hierarchical inheritance, as in OO programming, so that a role is modelled as something the user IS, where e.g. super-admin extends the admin role and can do everything admin can do plus something extra? In this model, a user is allowed only a single role in each service. Or is a role just a label that identifies what the user can DO? For example, a 'job editor' role meaning that the user can manually change job status? In this model, a user can have multiple roles in a single service/application.
  • Group. Is a group an aggregation of privileges? Or is it an aggregation of resources, with privilege(s) assigned to the entire group? Do we allow resource aggregation across multiple services (e.g. membership in a group grants the user privileges in multiple services)? Do we allow supergroups (aggregations of groups, where membership in a group automatically implies membership in ALL of its sub-groups)?
  2. When all of the above is agreed and documented, we need to design a schema that allows efficient queries:
  • Given a username and a resource name, is there an intersection between the privileges granted to the user via role(s), group membership, or direct grants and the privileges assigned to the resource? (In other words: is the user allowed to access it?)
  • Given a username and a resource plus privilege, which grants does the user need in order to access the resource with that privilege? This will be a very basic management question: "I need to do X in service S; which role or group do I need to be in to be allowed to do X?"
  • Who has privilege P on resource R? Needed if something goes wrong, or for regular housekeeping.
  • What are the privileges of a user? ("What are MY privileges", to be displayed in the user profile.)
  3. Next we need to determine how the privilege matrix, or access control list (ACL), is maintained. Who is responsible for determining the privileges? Service developers? Management? How do we represent the ACL both in human-readable form, suitable for review and manual amendment, and in machine-readable form, so we can feed the entire ACL to the security schema and synchronize the new ACL with the existing ACL? Implement the sync function.

  4. What is the user privilege granting process? Who approves a grant request? Can some requests be auto-approved? Do we implement time-limited privilege escalation (e.g. a user can request a sudo privilege for 1 hour to perform a release or some other privileged action)? The privilege is then removed automatically after the period elapses, or the user can give it up earlier.
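
To ground the modelling and query questions in items 1 and 2, here is a small in-memory sketch. It assumes one possible answer to the open questions, not an agreed design: privileges are plain strings, roles and groups are both treated as "grant sources" that map to privilege sets, and an exact resource match shadows any regex match. All names (`Resource`, `Acl`, `sources_granting`, ...) are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    pattern: str            # exact path, or a regex when is_regex is True
    is_regex: bool = False

    def matches(self, path: str) -> bool:
        if self.is_regex:
            return re.fullmatch(self.pattern, path) is not None
        return self.pattern == path


@dataclass
class Acl:
    # resource -> privileges that allow access to it
    resource_privileges: dict[Resource, set[str]] = field(default_factory=dict)
    # role or group name -> privileges it carries
    source_privileges: dict[str, set[str]] = field(default_factory=dict)
    # username -> roles/groups the user belongs to
    memberships: dict[str, set[str]] = field(default_factory=dict)
    # username -> privileges granted directly
    direct_grants: dict[str, set[str]] = field(default_factory=dict)

    def privileges_for(self, path: str) -> set[str]:
        """Privileges attached to a path; exact resources shadow regex ones."""
        exact: set[str] = set()
        by_regex: set[str] = set()
        exact_hit = False
        for resource, privileges in self.resource_privileges.items():
            if not resource.matches(path):
                continue
            if resource.is_regex:
                by_regex |= privileges
            else:
                exact_hit = True
                exact |= privileges
        return exact if exact_hit else by_regex

    def user_privileges(self, user: str) -> set[str]:
        """What are MY privileges: direct grants plus everything from roles/groups."""
        result = set(self.direct_grants.get(user, set()))
        for source in self.memberships.get(user, set()):
            result |= self.source_privileges.get(source, set())
        return result

    def is_allowed(self, user: str, path: str) -> bool:
        """Is there an intersection between user and resource privileges?"""
        return bool(self.user_privileges(user) & self.privileges_for(path))

    def who_has(self, privilege: str) -> set[str]:
        """Which users hold a given privilege, through any route?"""
        users = set(self.memberships) | set(self.direct_grants)
        return {u for u in users if privilege in self.user_privileges(u)}

    def sources_granting(self, path: str) -> set[str]:
        """Which roles or groups would give access to this path?"""
        needed = self.privileges_for(path)
        return {s for s, p in self.source_privileges.items() if p & needed}
```

A relational schema would answer the same four questions with joins over equivalent tables; the sketch only shows that each query reduces to set operations once the model is fixed.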

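For the ACL maintenance question (item 3), one way to think about the sync function is as a set difference between the reviewed ACL file and what is currently stored. The sketch below assumes a JSON list of `resource` / `privilege` / `grantee` entries as the machine-readable form; that format and the function names are illustrative assumptions only.

```python
import json


def parse_acl(text: str) -> set[tuple[str, str, str]]:
    """Parse a JSON ACL into a set of (resource, privilege, grantee) triples."""
    entries = json.loads(text)
    return {(e["resource"], e["privilege"], e["grantee"]) for e in entries}


def plan_sync(desired: set, existing: set) -> tuple[list, list]:
    """Return (entries to add, entries to remove) so that existing becomes desired."""
    return sorted(desired - existing), sorted(existing - desired)


def apply_sync(existing: set, desired: set) -> tuple[list, list]:
    """Apply the plan in place and return it, e.g. for an audit log."""
    to_add, to_remove = plan_sync(desired, existing)
    existing.difference_update(to_remove)
    existing.update(to_add)
    return to_add, to_remove
```

Keeping `plan_sync` separate from `apply_sync` lets a reviewer inspect the planned additions and removals before anything is changed.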

Following the discussion of the questions raised above, I have created the following design proposal:

https://docs.cleverthis.com/en/architecture/microservices/feature-discussion/user-privilege-design

@CoreRasurae
@freemo
please feel free to review/comment

aleenaumair added this to the V.01 milestone 2025-05-05 10:49:58 +00:00
Author
Member

Here is the design document for Access control using Keycloak capabilities:
https://docs.cleverthis.com/en/architecture/microservices/feature-discussion/Endpoint-Access-Control-using-Keycloak

It shows how the auth-service endpoint will act as a middleware between Traefik and Keycloak to authenticate and authorize users' requests.
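
As a rough illustration of that middleware role, the sketch below shows the decision an auth-service could make when Traefik calls it through the ForwardAuth middleware. Traefik passes the original method and URI in the `X-Forwarded-Method` and `X-Forwarded-Uri` headers and lets the request through only on a 2xx response. The `verify_token` and `is_authorized` callables are hypothetical stand-ins for token validation against Keycloak and for the permission check; header names are assumed to be already normalized to the casing used here.

```python
from typing import Callable, Mapping, Optional

Claims = Mapping[str, object]


def forward_auth_decision(
    headers: Mapping[str, str],
    verify_token: Callable[[str], Optional[Claims]],
    is_authorized: Callable[[Claims, str, str], bool],
) -> int:
    """Return the HTTP status the auth-service would send back to Traefik."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401  # no credentials presented

    claims = verify_token(token)
    if claims is None:
        return 401  # token invalid or expired

    method = headers.get("X-Forwarded-Method", "")
    # Strip the query string; only the path takes part in the permission check.
    path = headers.get("X-Forwarded-Uri", "").split("?", 1)[0]

    return 200 if is_authorized(claims, method, path) else 403
```

Returning 401 for missing or invalid credentials and 403 for an authenticated but unauthorized user keeps authentication and authorization failures distinguishable for clients.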

Author
Member

Here is a new version of the ACL design using Keycloak. This version has two main additions:

  1. Use the ALL keyword to represent any method (GET, POST, etc).
  2. Use STAR to mean "anything beyond this point" (like /*).

Here is the link to document v1.1:
https://docs.cleverthis.com/en/architecture/microservices/feature-discussion/endpoint-keycloakv2
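
To illustrate how the two keywords could be evaluated, here is a small matching sketch. The rule representation is an assumption for illustration (a method plus a path whose last segment may be `STAR`); the linked document defines the actual format. The sketch also assumes that `STAR` matches the bare prefix as well as anything below it.

```python
def rule_matches(rule_method: str, rule_path: str, method: str, path: str) -> bool:
    """Check one rule against a request, honouring the ALL and STAR keywords."""
    # ALL stands for any HTTP method; otherwise compare case-insensitively.
    if rule_method != "ALL" and rule_method.upper() != method.upper():
        return False

    # A trailing STAR segment means "this prefix and anything beyond it".
    if rule_path.endswith("/STAR"):
        prefix = rule_path[: -len("/STAR")]
        return path == prefix or path.startswith(prefix + "/")

    return rule_path == path
```

Matching on `prefix + "/"` rather than on the raw prefix avoids `/users/STAR` accidentally covering a sibling path such as `/usersX`.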
Author
Member

Here is a quick guideline document.
This guide explains how to configure your service in Keycloak so its API endpoints can be protected by our central auth-service. The auth-service checks if a user has the correct "Client Role" in Keycloak before allowing access to an endpoint.

https://docs.cleverthis.com/en/user_pages/abed_alrahman/permission_keycloak
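
As a hedged illustration of the Client Role check, the sketch below reads client roles from already-decoded and already-verified token claims. Keycloak access tokens carry client roles under `resource_access.<client-id>.roles`; the helper names and the example client ID are hypothetical, and how the required role is derived from the endpoint is left to the guide.

```python
from typing import Any, Mapping


def client_roles(claims: Mapping[str, Any], client_id: str) -> set[str]:
    """Collect the roles a token grants for one Keycloak client."""
    access = claims.get("resource_access") or {}
    return set((access.get(client_id) or {}).get("roles") or [])


def has_client_role(claims: Mapping[str, Any], client_id: str, role: str) -> bool:
    """True when the token carries the given Client Role for the client."""
    return role in client_roles(claims, client_id)
```

Missing or empty `resource_access` entries simply yield an empty role set, so a token without roles for the client is denied rather than raising an error.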

2 participants
Reference: clevermicro/user-management#2