Skip to content
Snippets Groups Projects
Unverified Commit 9a43165a authored by github-automation-metabase's avatar github-automation-metabase Committed by GitHub
Browse files

Enable debugUnreturnedConnectionStackTraces in...

Enable debugUnreturnedConnectionStackTraces in data-warehouse-connection-pool-properties (#49456) (#49807)

* Add blurb from C3P0 docs to comment for unreturnedConnectionTimeout

* Enable debugUnreturnedConnectionStackTraces in data-warehouse-connection-pool-properties

Closes #47981

* Add defsetting for jdbc-data-warehouse-debug-unreturned-connection-stack-traces

And only enable debugUnreturnedConnectionStackTraces in the C3P0 config if the setting is enabled. There is some
memory cost to enabling the setting, and since maxPoolSize is user-configurable, it seems safer to disable
debugUnreturnedConnectionStackTraces by default.

* Add metabase.logger/level-enabled?

* Log a warning if C3P0 logs are not configured correctly to log stack traces

Log a warning if the user enables the jdbc-data-warehouse-debug-unreturned-connection-stack-traces but does not raise
the log level for the com.mchange package to INFO or higher.

* Add docs for MB_JDBC_DATA_WAREHOUSE_DEBUG_UNRETURNED_CONNECTION_STACK_TRACES

* Improve markdown docs for MB_JDBC_DATA_WAREHOUSE_DEBUG_UNRETURNED_CONNECTION_STACK_TRACES



---------

Co-authored-by: default avatarappleby <86076+appleby@users.noreply.github.com>
Co-authored-by: default avatarJeff Bruemmer <jeff.bruemmer@gmail.com>
parent 5dee9733
Branches
Tags
No related merge requests found
......@@ -1883,6 +1883,20 @@ This variable affects connections that are severed and undetected by Metabase (t
Unless set otherwise, the default production value for `metabase.query-processor.query-timeout-ms` is used which is 1,200,000 ms (i.e. 1,200 seconds or 20 minutes).
### `MB_JDBC_DATA_WAREHOUSE_DEBUG_UNRETURNED_CONNECTION_STACK_TRACES`
Type: boolean<br>
Default: `false`<br>
Since: v51.3
If `true`, log a stack trace for any connections killed due to exceeding the timeout specified in [MB_JDBC_DATA_WAREHOUSE_UNRETURNED_CONNECTION_TIMEOUT_SECONDS](#mb_jdbc_data_warehouse_unreturned_connection_timeout_seconds).
Note: In addtion to enabling this variable, you need to update the com.mchange log level to INFO or higher via a custom log4j configuration in order to see the stack traces in the logs.
See the [Metabase log configuration](./log-configuration.md) documentation for how to configure log levels.
See [MB_JDBC_DATA_WAREHOUSE_UNRETURNED_CONNECTION_TIMEOUT_SECONDS](#mb_jdbc_data_warehouse_unreturned_connection_timeout_seconds) for setting the timeout after which connections will be killed.
### `MB_JETTY_ASYNC_RESPONSE_TIMEOUT`
Type: integer<br>
......
......@@ -221,6 +221,18 @@ This variable affects connections that are severed and undetected by Metabase (t
Unless set otherwise, the default production value for `metabase.query-processor.query-timeout-ms` is used which is 1,200,000 ms (i.e. 1,200 seconds or 20 minutes).
### `MB_JDBC_DATA_WAREHOUSE_DEBUG_UNRETURNED_CONNECTION_STACK_TRACES`
Type: boolean<br>
Default: `false`<br>
Since: v51.3
If `true`, log a stack trace for any connections killed due to exceeding the timeout specified in [MB_JDBC_DATA_WAREHOUSE_UNRETURNED_CONNECTION_TIMEOUT_SECONDS](#mb_jdbc_data_warehouse_unreturned_connection_timeout_seconds).
In order to see the stack traces in the logs, you'll also need to update the com.mchange log level to "INFO" or higher via a custom log4j configuration. For configuring log levels, see [Metabase log configuration](./log-configuration.md).
To set a timeout for how long Metabase should wait before it kills unreturned connections, see [MB_JDBC_DATA_WAREHOUSE_UNRETURNED_CONNECTION_TIMEOUT_SECONDS](#mb_jdbc_data_warehouse_unreturned_connection_timeout_seconds).
### `MB_JETTY_ASYNC_RESPONSE_TIMEOUT`
Type: integer<br>
......
......@@ -9,6 +9,7 @@
[metabase.driver.util :as driver.u]
[metabase.lib.metadata :as lib.metadata]
[metabase.lib.metadata.jvm :as lib.metadata.jvm]
[metabase.logger :as logger]
[metabase.models.interface :as mi]
[metabase.models.setting :as setting]
[metabase.query-processor.pipeline :as qp.pipeline]
......@@ -22,7 +23,8 @@
[toucan2.core :as t2])
(:import
(com.mchange.v2.c3p0 DataSources)
(javax.sql DataSource)))
(javax.sql DataSource)
(org.apache.logging.log4j Level)))
(set! *warn-on-reflection* true)
......@@ -103,6 +105,19 @@ For setting the maximum, see [MB_APPLICATION_DB_MAX_CONNECTION_POOL_SIZE](#mb_ap
(long (/ qp.pipeline/*query-timeout-ms* 1000))))
:setter :none)
(setting/defsetting jdbc-data-warehouse-debug-unreturned-connection-stack-traces
"Tell c3p0 to log a stack trace for any connections killed due to exceeding the timeout specified in
[[jdbc-data-warehouse-unreturned-connection-timeout-seconds]].
Note: You also need to update the com.mchange log level to INFO or higher in the log4j configs in order to see the
stack traces in the logs."
:visibility :internal
:type :boolean
:default false
:export? false
:setter :none
:doc false) ; This setting is documented in other-env-vars.md.
(defmethod data-warehouse-connection-pool-properties :default
[driver database]
{;; only fetch one new connection at a time, rather than batching fetches (default = 3 at a time). This is done in
......@@ -137,10 +152,36 @@ For setting the maximum, see [MB_APPLICATION_DB_MAX_CONNECTION_POOL_SIZE](#mb_ap
;;
;; Kill idle connections above the minPoolSize after 5 minutes.
"maxIdleTimeExcessConnections" (* 5 60)
;; kill connections after this amount of time if they haven't been returned -- this should be the same as the query
;; timeout. This theoretically shouldn't happen since the QP should kill things after a certain timeout but it's
;; better to be safe than sorry -- it seems like in practice some connections disappear into the ether
;; [From dox] Seconds. If set, if an application checks out but then fails to check-in [i.e. close()] a Connection
;; within the specified period of time, the pool will unceremoniously destroy() the Connection. This permits
;; applications with occasional Connection leaks to survive, rather than eventually exhausting the Connection
;; pool. And that's a shame. Zero means no timeout, applications are expected to close() their own
;; Connections. Obviously, if a non-zero value is set, it should be to a value longer than any Connection should
;; reasonably be checked-out. Otherwise, the pool will occasionally kill Connections in active use, which is bad.
;;
;; This should be the same as the query timeout. This theoretically shouldn't happen since the QP should kill
;; things after a certain timeout but it's better to be safe than sorry -- it seems like in practice some
;; connections disappear into the ether
"unreturnedConnectionTimeout" (jdbc-data-warehouse-unreturned-connection-timeout-seconds)
;; [From dox] If true, and if unreturnedConnectionTimeout is set to a positive value, then the pool will capture
;; the stack trace (via an Exception) of all Connection checkouts, and the stack traces will be printed when
;; unreturned checked-out Connections timeout. This is intended to debug applications with Connection leaks, that
;; is applications that occasionally fail to return Connections, leading to pool growth, and eventually
;; exhaustion (when the pool hits maxPoolSize with all Connections checked-out and lost). This parameter should
;; only be set while debugging, as capturing the stack trace will slow down every Connection check-out.
;;
;; As noted in the C3P0 docs, this does add some overhead to create the Exception at Connection checkout.
;; criterium/quick-bench indicates this is ~600ns of overhead per Exception created on my laptop, which is small
;; compared to the overhead added by testConnectionCheckout, above. The memory usage will depend on the size of the
;; stack trace, but clj-memory-meter reports ~800 bytes for a fresh Exception created at the REPL (which presumably
;; has a smaller-than-average stack).
"debugUnreturnedConnectionStackTraces" (u/prog1 (jdbc-data-warehouse-debug-unreturned-connection-stack-traces)
(when (and <> (not (logger/level-enabled? 'com.mchange Level/INFO)))
(log/warn "jdbc-data-warehouse-debug-unreturned-connection-stack-traces"
"is enabled, but INFO logging is not enabled for the"
"com.mchange namespace. You must raise the log level for"
"com.mchange to INFO via a custom log4j config in order to"
"see stacktraces in the logs.")))
;; Set the data source name so that the c3p0 JMX bean has a useful identifier, which incorporates the DB ID, driver,
;; and name from the details
"dataSourceName" (format "db-%d-%s-%s"
......
......@@ -82,6 +82,14 @@
(name (ns-name a-namespace))
(name a-namespace)))
(defn level-enabled?
"Is logging at `level` enabled for `a-namespace`?"
(^Boolean [level]
(level-enabled? *ns* level))
(^Boolean [a-namespace level]
(let [^Logger logger (log.impl/get-logger log/*logger-factory* a-namespace)]
(.isEnabled logger level))))
(defn effective-ns-logger
"Get the logger that will be used for the namespace named by `a-namespace`."
^LoggerConfig [a-namespace]
......
......@@ -240,6 +240,42 @@
(is (= 20
(sql-jdbc.conn/jdbc-data-warehouse-unreturned-connection-timeout-seconds))))))
(deftest ^:parallel include-debug-unreturned-connection-stack-traces-test
(testing "We should be setting debugUnreturnedConnectionStackTraces (#47981)"
(is (=? {"debugUnreturnedConnectionStackTraces" boolean?}
(sql-jdbc.conn/data-warehouse-connection-pool-properties :h2 (mt/db))))))
(deftest debug-unreturned-connection-stack-traces-test
(testing "We should be able to set jdbc-data-warehouse-debug-unreturned-connection-stack-traces via env var (#47981)"
(doseq [setting [true false]]
(mt/with-temp-env-var-value! [mb-jdbc-data-warehouse-debug-unreturned-connection-stack-traces (str setting)]
(is (= setting
(sql-jdbc.conn/jdbc-data-warehouse-debug-unreturned-connection-stack-traces))
(str "setting=" setting))))))
(deftest debug-unreturned-connection-stack-traces-misconfigured-c3p0-log-warning-test
(testing "We should log a warning if debug stack traces are enabled but c3p0 INFO logs are not (#47981)\n"
;; kondo thinks the c3p0-log-level binding is unused
#_{:clj-kondo/ignore [:unused-binding]}
(letfn [(warning-found? [warnings]
(boolean (some #(str/includes?
(:message %)
"You must raise the log level for com.mchange to INFO")
warnings)))
(warnings-logged? [c3p0-log-level setting warning-expected?]
(mt/with-temp-env-var-value! [mb-jdbc-data-warehouse-debug-unreturned-connection-stack-traces setting]
(mt/with-log-level [com.mchange c3p0-log-level]
(mt/with-log-messages-for-level [warnings :warn]
(and (= setting
(get (sql-jdbc.conn/data-warehouse-connection-pool-properties :h2 (mt/db))
"debugUnreturnedConnectionStackTraces"))
(= warning-expected? (warning-found? (warnings))))))))]
(are [c3p0-log-level setting warning-expected?] (warnings-logged? c3p0-log-level setting warning-expected?)
:error true true
:error false false
:info true false
:info false false))))
(defn- init-h2-tcp-server [port]
(let [args ["-tcp" "-tcpPort", (str port), "-tcpAllowOthers" "-tcpDaemon"]
server (Server/createTcpServer (into-array args))]
......
......@@ -9,6 +9,7 @@
[metabase.logger :as logger]
[metabase.test :as mt])
(:import
(org.apache.logging.log4j Level)
(org.apache.logging.log4j.core Logger)))
(set! *warn-on-reflection* true)
......@@ -94,3 +95,37 @@
(finally
(when (.exists f)
(io/delete-file f)))))))
(deftest level-enabled?-test
(are [set-level check-level expected-value] (= expected-value
(mt/with-log-level [metabase.logger-test set-level]
(logger/level-enabled? 'metabase.logger-test check-level)))
:error Level/ERROR true
:error Level/WARN false
:error Level/INFO false
:error Level/DEBUG false
:error Level/TRACE false
:warn Level/ERROR true
:warn Level/WARN true
:warn Level/INFO false
:warn Level/DEBUG false
:warn Level/TRACE false
:info Level/ERROR true
:info Level/WARN true
:info Level/INFO true
:info Level/DEBUG false
:info Level/TRACE false
:debug Level/ERROR true
:debug Level/WARN true
:debug Level/INFO true
:debug Level/DEBUG true
:debug Level/TRACE false
:trace Level/ERROR true
:trace Level/WARN true
:trace Level/INFO true
:trace Level/DEBUG true
:trace Level/TRACE true))
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment