server: make HikariCP leak detection configurable#13407
Open
andrijapanicsb wants to merge 1 commit into
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## 4.22 #13407 +/- ##
============================================
- Coverage 17.68% 17.67% -0.01%
+ Complexity 15793 15791 -2
============================================
Files 5922 5922
Lines 533123 533182 +59
Branches 65201 65210 +9
============================================
- Hits 94268 94251 -17
- Misses 428212 428284 +72
- Partials 10643 10647 +4
@blueorangutan package kvm
@andrijapanicsb a [SL] Jenkins job has been kicked to build packages. It will be bundled with kvm SystemVM template(s). I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18233
@blueorangutan help
Description
This PR makes the HikariCP leak-detection threshold and JMX MBean registration configurable per database pool via `db.properties`, instead of relying on HikariCP defaults that cannot be changed without a code change.

CloudStack already maps a subset of `db.properties` values onto `HikariConfig` in `framework/db/src/main/java/com/cloud/utils/db/TransactionLegacy.java` (e.g. `maxActive`, `maxIdle`, `maxWait`, `minIdleConnections`, `connectionTimeout`, `keepAliveTime`). This PR adds two more, following the exact same parsing/threading pattern:

| Property | HikariConfig setter | Default |
|---|---|---|
| `db.<pool>.leakDetectionThreshold` | `HikariConfig#setLeakDetectionThreshold(long)` | `0` (disabled) |
| `db.<pool>.registerMbeans` | `HikariConfig#setRegisterMbeans(boolean)` | `false` (disabled) |

Supported for all three pools that use the shared datasource factory: `cloud`, `usage`, `simulator`.

Behaviour:
- `leakDetectionThreshold` absent or `0` → leak detection disabled (unchanged default behaviour). Only applied when set to a value `> 0`. (HikariCP itself ignores values below 2000 ms with a warning.)
- `registerMbeans` absent or `false` → MBeans disabled (unchanged default). `true` → Hikari JMX MBeans registered for live pool-counter observation.

Motivation / context: in production we saw the management server become unstable, and eventually crash, on clusters exercising Host-HA. Watching MySQL with `SHOW PROCESSLIST` during the incident showed the number of sessions owned by the `cloud` DB user climbing steadily over a couple of hours, all of them in the `Sleep` state, until the HikariCP pool (`db.cloud.maxActive`, default `250`) was exhausted and the server could no longer borrow a connection. That signature (monotonically growing, never reaped, all idle, all owned by the `cloud` user) is a classic DB connection leak in a periodic code path (suspected Host-HA host checks) that borrows a pooled connection and never returns it.

The problem is that these symptoms tell you that connections leak, not where. HikariCP already has the exact tool for that, `leakDetectionThreshold`, but CloudStack hard-wires it off with no way to turn it on. This PR exposes it (and `registerMbeans`) through `db.properties` so an operator can enable leak detection on a live server; HikariCP then logs an `Apparent connection leak detected` stack trace identifying the precise code path that borrowed the connection and failed to return it, and the MBeans give live pool-counter visibility. The actual leak fix is a separate change; this PR is the diagnostic enabler.

Everything is disabled by default, so there is no behavioural change for existing deployments that don't set the new properties.
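To make the diagnostic idea concrete, here is a minimal sketch of the general technique behind a leak-detection threshold: record a stack trace at borrow time, schedule a report for `threshold` milliseconds later, and cancel the report if the resource is returned in time. This is a toy illustration using only the JDK; the `Lease` and `borrow` names are hypothetical, and HikariCP's own implementation may differ in its details.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a toy borrow tracker, not HikariCP code. */
public class LeakDetectionSketch {
    static final AtomicInteger reportedLeaks = new AtomicInteger();

    /** Hypothetical handle; close() plays the role of returning the connection. */
    static final class Lease implements AutoCloseable {
        private final ScheduledFuture<?> leakTask;
        Lease(ScheduledFuture<?> leakTask) { this.leakTask = leakTask; }
        @Override public void close() { leakTask.cancel(false); } // returned in time: no report
    }

    static Lease borrow(ScheduledExecutorService timer, long thresholdMs) {
        // Capture the stack now, so a later report points at the borrowing code path.
        Exception borrowedAt = new Exception("Apparent connection leak detected");
        ScheduledFuture<?> task = timer.schedule(() -> {
            reportedLeaks.incrementAndGet();
            borrowedAt.printStackTrace();
        }, thresholdMs, TimeUnit.MILLISECONDS);
        return new Lease(task);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try (Lease returned = borrow(timer, 200)) {
            // Well-behaved path: try-with-resources returns the lease immediately.
        }
        Lease leaked = borrow(timer, 200); // never closed, so the report fires
        Thread.sleep(600);
        timer.shutdownNow();
        System.out.println("reported leaks: " + reportedLeaks.get());
    }
}
```

Only the lease that is never closed produces a report, and the printed stack trace leads back to the `borrow` call site, which is the property that makes the threshold useful for locating a leak.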
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
N/A
How Has This Been Tested?
Build: compiled the affected module and its dependencies off tag `4.22.1.0`.

Result: BUILD SUCCESS (checkstyle passed). The change is confined to property parsing and threading through the existing `createDataSource` → `createHikaricpDataSource` chain, reusing the existing `parseNumber(...)` helper.

Unit tests: the apply-logic is factored into a package-private `applyHikariDebugSettings(HikariConfig, Long, Boolean, String)` and covered by 4 new `TransactionLegacyTest` cases: defaults-disabled, `0`-keeps-disabled, leak-detection-enabled (`60000`), and register-MBeans-enabled.

Result: Tests run: 4, Failures: 0, Errors: 0 (BUILD SUCCESS).
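The apply-logic described above can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `PoolSettings` is a hypothetical stand-in for `HikariConfig` so the example runs on the JDK alone, the parsing helpers are invented for the sketch (the PR reuses `parseNumber(...)`), and the fourth `String` parameter of the real method is omitted.

```java
import java.util.Properties;

/** Illustrative sketch; PoolSettings is a hypothetical stand-in for HikariConfig. */
public class HikariDebugSettingsSketch {

    /** Stand-in holding the two settings; field defaults mean "disabled". */
    static final class PoolSettings {
        long leakDetectionThreshold = 0L;
        boolean registerMbeans = false;
    }

    /** Reads db.<pool>.leakDetectionThreshold; null when absent or blank. */
    static Long parseThreshold(Properties props, String pool) {
        String raw = props.getProperty("db." + pool + ".leakDetectionThreshold");
        // Long.valueOf throws NumberFormatException on non-numeric input.
        return (raw == null || raw.trim().isEmpty()) ? null : Long.valueOf(raw.trim());
    }

    /** Reads db.<pool>.registerMbeans; null when absent. */
    static Boolean parseRegisterMbeans(Properties props, String pool) {
        String raw = props.getProperty("db." + pool + ".registerMbeans");
        return raw == null ? null : Boolean.valueOf(raw.trim());
    }

    /** Touches a setting only when the value asks for non-default behaviour. */
    static void apply(PoolSettings cfg, Long threshold, Boolean registerMbeans) {
        if (threshold != null && threshold > 0) {
            cfg.leakDetectionThreshold = threshold;
        }
        if (Boolean.TRUE.equals(registerMbeans)) {
            cfg.registerMbeans = true;
        }
    }

    static PoolSettings fromProperties(Properties props, String pool) {
        PoolSettings cfg = new PoolSettings();
        apply(cfg, parseThreshold(props, pool), parseRegisterMbeans(props, pool));
        return cfg;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("db.cloud.leakDetectionThreshold", "60000");
        props.setProperty("db.cloud.registerMbeans", "true");
        PoolSettings cloud = fromProperties(props, "cloud");
        PoolSettings usage = fromProperties(props, "usage"); // nothing set for this pool
        System.out.println(cloud.leakDetectionThreshold + " " + cloud.registerMbeans); // 60000 true
        System.out.println(usage.leakDetectionThreshold + " " + usage.registerMbeans); // 0 false
    }
}
```

The four cases listed above map onto this shape directly: absent values and `0` leave the stand-in untouched, while `60000` and `true` are the only inputs that change it.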
Runtime validation plan (on a patched management server):
1. Set the new properties in `/etc/cloudstack/management/db.properties`.
2. Run `systemctl restart cloudstack-management`.
3. Expect `java.lang.Exception: Apparent connection leak detected` in the log, with a stack trace through `com.zaxxer.hikari.HikariDataSource.getConnection(...)` → `com.cloud.utils.db.TransactionLegacy...` identifying the borrowing path.
4. With `registerMbeans=true`, the `com.zaxxer.hikari:type=Pool (cloud)` MBean is visible via JMX for live pool counters.

How did you try to break this feature and the system with this change?
Edge cases considered:
- `registerMbeans=false` (existing behaviour preserved).
- `leakDetectionThreshold=0` → not applied (disabled).
- `leakDetectionThreshold` between 1–1999 ms → passed to Hikari, which warns and ignores it (documented Hikari behaviour; noted in the code comment and the sample config).
- `registerMbeans=false` explicitly → MBeans off.
- `getDefaultHikaricpDataSource` → untouched.

These cases (defaults, `0`, enabled threshold, enabled MBeans) are locked down by the new `applyHikariDebugSettings` unit tests.
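For the JMX part of the runtime validation plan, the pool MBean could be read with the standard `javax.management` API along these lines. This is a sketch under assumptions: a stand-in MBean (`FakePool`, hypothetical, with made-up values) is registered under the object name quoted above so the example runs without HikariCP, and the attribute names `ActiveConnections` and `IdleConnections` are assumed to match what the real pool MBean exposes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustrative only: reads pool counters over JMX from a stand-in MBean. */
public class PoolMBeanSketch {

    /** Hypothetical stand-in interface; a standard MBean interface is named <Impl>MBean. */
    public interface FakePoolMBean {
        int getActiveConnections();
        int getIdleConnections();
    }

    /** Stand-in pool with fixed, made-up counters. */
    public static class FakePool implements FakePoolMBean {
        @Override public int getActiveConnections() { return 7; }
        @Override public int getIdleConnections() { return 3; }
    }

    /** Reads one integer attribute from the MBean registered under the given name. */
    static int readIntAttribute(MBeanServer server, ObjectName name, String attr) throws Exception {
        return (Integer) server.getAttribute(name, attr);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.zaxxer.hikari:type=Pool (cloud)");
        if (!server.isRegistered(name)) {
            // In a real server HikariCP would register its own MBean; this is the stand-in.
            server.registerMBean(new FakePool(), name);
        }
        System.out.println("active=" + readIntAttribute(server, name, "ActiveConnections")
                + " idle=" + readIntAttribute(server, name, "IdleConnections")); // active=7 idle=3
        server.unregisterMBean(name);
    }
}
```

Against a live management server the same lookup would go through a remote JMX connection rather than the in-process platform MBean server; a steadily rising active count with a flat idle count would match the leak signature described in the motivation.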