Ensure that when scaling down, the Service does not invalidate the route in the Router if other replicas are still running #93
Reference
clevermicro/amq-adapter-python#93
why: When testing the scalability of a service, after a successful test that verified the replica count had been reduced, no further interaction with the service is possible: all requests fail with HTTP 503 (Service Unavailable), even though at least one replica of the service is still up and running. It appears that the REMOVE_ROUTE event, which the Service emits on shutdown, invalidates the route even if there are still active replicas.
what: The REMOVE_ROUTE event is a feature that allows proper client notification when the service exits because it has run out of its allocated time. It is meant to support time-limited services, as envisioned for the CleverData service, where a specific stack would be started based on the job description and would exist only for a limited time, after which the route to it should be removed.
RCA: The Route Database does not contain a tag identifying the instance associated with a route. As a result, a REMOVE_ROUTE event invalidates the route for all replicas, not only for the one that exited.
Solution: Add SWARM_TASK_ID (a global instance ID) as a tag on the route in the Route Database, and upon a REMOVE_ROUTE event invalidate only the route for that particular replica.
Implemented in AMQ Library v0.2.71 as part of issue #81 (ensure that CI/CD pipeline operates)
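A minimal sketch of the idea described in the Solution, not the actual AMQ Library code: the `RouteTable` class, its method names, and the in-memory storage are hypothetical and stand in for the Route Database. It only illustrates that tagging each route entry with the replica's SWARM_TASK_ID lets a REMOVE_ROUTE event drop a single replica while the route stays valid for the others.

```python
from collections import defaultdict


class RouteTable:
    """Hypothetical in-memory stand-in for the Route Database."""

    def __init__(self):
        # route key -> set of replica IDs (SWARM_TASK_ID values) serving it
        self._replicas = defaultdict(set)

    def add_route(self, route, task_id):
        """Register one replica (identified by its task ID) for a route."""
        self._replicas[route].add(task_id)

    def remove_route(self, route, task_id):
        """Handle REMOVE_ROUTE for one replica.

        Returns True if no replica is left and the route was invalidated,
        False if other replicas still serve it.
        """
        replicas = self._replicas.get(route)
        if replicas is None:
            return True  # route was already gone
        replicas.discard(task_id)
        if not replicas:
            del self._replicas[route]
            return True
        return False

    def is_routable(self, route):
        """A route is valid while at least one replica is registered."""
        return route in self._replicas


# Two replicas register the same route; one of them shuts down.
table = RouteTable()
table.add_route("/svc", "task-a")
table.add_route("/svc", "task-b")
invalidated = table.remove_route("/svc", "task-a")
```

In this sketch `invalidated` is False after the first replica exits, so requests to `/svc` would still be routed; only when the last replica sends REMOVE_ROUTE does the route disappear.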