YARN-11809. Support application backoff mechanism for CapacityScheduler #7589
base: trunk
64260fd to 322fe34
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAppBackoff.java:170
- [nitpick] Consider increasing the sleep duration slightly above 100ms to ensure the backoff interval reliably expires, reducing potential flakiness in tests.
Thread.sleep(100L);
1fbb000 to ff85582
🎊 +1 overall
This message was automatically generated.
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAppBackoff.java:171
- [nitpick] Using Thread.sleep for waiting on the backoff interval might lead to flaky tests under variable system load. Consider using a polling mechanism with a timeout to check for backoff expiration for increased reliability.
Thread.sleep(appBackoffIntervalMs + 10L);
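The polling approach suggested in the nitpick above could look roughly like the following. This is a minimal, self-contained sketch and not the code in this PR: `PollingWait` and `waitFor` are hypothetical names introduced here for illustration, and the condition passed in stands in for whatever check the test uses to detect that the backoff has expired.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition instead of sleeping a fixed time. */
final class PollingWait {
  private PollingWait() {
  }

  /**
   * Checks the condition every pollIntervalMs until it holds or timeoutMs
   * has elapsed. Returns true if the condition held before the timeout.
   */
  static boolean waitFor(BooleanSupplier condition, long pollIntervalMs,
      long timeoutMs) throws InterruptedException {
    // Use the monotonic clock so wall-clock adjustments do not affect the wait.
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(pollIntervalMs);
    }
  }
}
```

A test would then assert on the returned boolean with a generous timeout, so a slow machine only makes the test slower rather than making it fail. Hadoop's test utilities already include a helper of this kind (`GenericTestUtils.waitFor`), which is probably the better choice inside the Hadoop code base.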
ff85582 to f3068f4
f3068f4 to 79f2503
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
} else if (assignment.getSkippedType()
    == CSAssignment.SkippedType.QUEUE_LIMIT) {
  ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
      parent.getQueuePath(), getQueuePath(), ActivityState.REJECTED,
      () -> ActivityDiagnosticConstant.QUEUE_DO_NOT_HAVE_ENOUGH_HEADROOM
          + " from " + application.getApplicationId());
  // Add missed scheduling opportunities for the application
  application.addAppMissedSchedulingOpportunities();
  return assignment;
} else{
  // If we don't allocate anything, and it is not skipped by application,
Is application.addAppMissedSchedulingOpportunities(); needed here?
Thanks @zeekling for the review.
This check was intended to quickly skip redundant checks on every node once the queue limit is reached, but that case is already handled by YARN-11798. It is therefore unnecessary in the latest code, and I'll remove this line later. Thanks for the reminder.
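For readers following the thread, the general idea behind counting missed scheduling opportunities can be sketched as below. This is only an illustration under stated assumptions, not the implementation in this PR: the class `AppBackoffSketch`, the threshold parameter, and the explicit `nowMs` arguments are all hypothetical; only the notions of a missed-opportunity counter and a backoff interval come from the discussion above.

```java
/** Hypothetical sketch of a per-application backoff tracker. */
final class AppBackoffSketch {
  private final int missedThreshold;     // assumed: misses before backing off
  private final long backoffIntervalMs;  // assumed: how long the app is skipped
  private int missed;
  private long backoffStartMs = -1L;     // -1 means "not in backoff"

  AppBackoffSketch(int missedThreshold, long backoffIntervalMs) {
    this.missedThreshold = missedThreshold;
    this.backoffIntervalMs = backoffIntervalMs;
  }

  /** Records one missed opportunity; enters backoff at the threshold. */
  void addMissedSchedulingOpportunity(long nowMs) {
    missed++;
    if (missed >= missedThreshold) {
      backoffStartMs = nowMs;
      missed = 0;
    }
  }

  /** Clears the counter, for example after a successful allocation. */
  void resetMissedSchedulingOpportunities() {
    missed = 0;
  }

  /** Returns true while the backoff interval has not yet expired. */
  boolean isInBackoff(long nowMs) {
    if (backoffStartMs < 0) {
      return false;
    }
    if (nowMs - backoffStartMs >= backoffIntervalMs) {
      backoffStartMs = -1L;  // interval expired, app is eligible again
      return false;
    }
    return true;
  }
}
```

Passing the current time in explicitly keeps the sketch deterministic, which also sidesteps the sleep-based timing concerns raised earlier in the review.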
… limit is reached.
🎊 +1 overall
This message was automatically generated.
Description of PR
Please refer to YARN-11809 for details.
How was this patch tested?
UT
For code changes: LICENSE, LICENSE-binary, NOTICE-binary files?