Skip to content
Snippets Groups Projects
  • dpsutton's avatar
    df165ba1
    Fix in-memory logger (#24616) · df165ba1
    dpsutton authored
    * Fix in-memory logger
    
    Our Admin > Troubleshooting > Logs page broke, just showing a spinner
    and never showing the logs.
    
    Don't quite understand why this fixes it. Javadocs are
    https://logging.apache.org/log4j/2.x/log4j-api/apidocs/org/apache/logging/log4j/LogManager.html#getContext-boolean-
    
    ```clojure
    logger=> (log/warn "test")
    nil
    logger=> (count @messages*)
    0
    ;; no in-memory logs so page is empty
    ;; change `(LogManager/getContext true)` in the momoized ns-logger fn
    ;; and then observe:
    logger=> (log/warn "test")
    nil
    logger=> (count @messages*)
    4
    ```
    
    Some explorations that might shine some light:
    
    ```clojure
    logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext false) (str *ns*))))
    {}
    logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext true) (str *ns*))))
    {"metabase-appender" #object[metabase.logger.proxy$org.apache.logging.log4j.core.appender.AbstractAppender$ff19274a
                                 "0x4d680247"
                                 "metabase-appender"]}
    ```
    
    So something is not hooked up quite right.
    
    * Add tests for metabase.logger
    
    * Ensure `api/util/logs` returns logs
    
    * tests
    
    * Check for presence of `MetabaseLoggerFactory` rather than whole str
    
    When in a namespace with a record, `Foo resolves to ns.Foo. But outside
    it resolves to ns/Foo. When running tests from the command line *ns* is
    user so it gets more complicated.
    
    * Kinda playing whackamole™ with `(LogManager/getContext true)`
    
    * Remove custom memoizing logger
    
    History:
    
    39.2 we set `Multi-Release: true` in our manifest file and query speed
    drops like a stone. Jeff was able to track this to our logger calls in
    tight loops.
    
    We revert the multi-release and keep seeing the warning on startup
    
    > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    
    Benchmarking on 39.2
    (bench (log/trace "hi")) -> 15383 ns
    
    So we freaked out and set up a memoizing logger factory
    
    (bench (log/trace "hi")) -> 141 ns
    
    What a win.
    
    But we never noticed that the default *logger-factory* being picked up
    was slf4j ( `(log.impl/name log/*logger-factory*)` -> "org.slf4j" ). On
    39.2 if you set the factory to the log4j2 version you get back to a
    great number: `(bench (log/trace "hi"))` -> 25 ns
    
    And thus ensuring that our logger is the default log4j2 version is even
    faster than our memoizing logger-factory.
    
    Memoizing factory: 141 ns
    slf4j factory: 2269 ns
    log4j2 factory: 31 ns
    
    What does `(LogManager/getContext false)` mean versus using `true`? We
    only need and want a single context. But log4j2 by default uses a
    context selector called `ClassLoaderContextSelector`. We could put all
    of this behind us if we used a context selector type
    `BasicContextSelector` but this is surprisingly hard to do: you have to
    set a system property. And since all of this stuff gets initialized in
    static initializers, we don't have an opportunity to do this
    programmatically. The way around this is to require people to pass this
    system property on startup which is not acceptable.
    
    So getContext true checks for a threadlocal context in a specific static
    variable and falls back to a Default context. getContext false looks at
    classloaders and ends up at a different context. BUT: the log.impl
    version uses a closure over getContext false instead of getting it each
    time. And I suspect that when it does this there is only one so it is
    the default and continues to use this one. In our LoggerFactory
    implementation we were looking up the context each time. This still
    seems to work and everything is playing nice in our application
    classloader but its totally possible that our drivers are not hitting
    this. I'll have to investigate this.
    
    Verification:
    - build the uberjar locally (`bin/build`)
    - copy to some temp directory and also copy criterium.jar
    
    ```shell
    MB_JETTY_PORT=4000 java "$(socket-repl 4001)" -cp locally-built.jar:criterium.jar metabase.core
    ```
    
    ```clojure
    /tmp/locally-built via :coffee: v17.30 on :cloud:  metabase-query
    ❯ nc localhost 4001
    user=> (doto 'metabase.logger require in-ns)
    metabase.logger
    metabase.logger=> (require '[criterium.core :refer [bench]])
    nil
    metabase.logger=> (bench (log/trace "hi"))
    Evaluation count : 1686535500 in 60 samples of 28108925 calls.
                 Execution time mean : 22.487972 ns
        Execution time std-deviation : 0.101004 ns
       Execution time lower quantile : 22.326806 ns ( 2.5%)
       Execution time upper quantile : 22.648368 ns (97.5%)
                       Overhead used : 6.924761 ns
    nil
    metabase.logger=> (count (messages))
    358
    metabase.logger=>
    ```
    
    Verifies that we are on the order of 22 nanoseconds and the in-memory
    logger has messages in it.
    
    * Appease our namespace linters
    
    * I'll unplug you ns linter
    
    * Better tests and ns docstring
    
    * Bootstrap to set system properties
    
    New entrypoint for the application: metabase.bootstrap
    
    sets two properties for logging (context selector, log4j2 factory) and
    ensures those properties are set before any logging code is loaded.
    
    * docstrings and clean ns
    
    * metabase.logger ns docstring cleanup
    
    * docstring
    
    * rename a test now that there's no memoization
    
    * add logger properties to :dev profile
    
    * Revert "add logger properties to :dev profile"
    
    This reverts commit 4f09fa3b631f882a3c5edcab4508769ffb20d4fa.
    
    * deps
    Fix in-memory logger (#24616)
    dpsutton authored
    * Fix in-memory logger
    
    Our Admin > Troubleshooting > Logs page broke, just showing a spinner
    and never showing the logs.
    
    Don't quite understand why this fixes it. Javadocs are
    https://logging.apache.org/log4j/2.x/log4j-api/apidocs/org/apache/logging/log4j/LogManager.html#getContext-boolean-
    
    ```clojure
    logger=> (log/warn "test")
    nil
    logger=> (count @messages*)
    0
    ;; no in-memory logs so page is empty
    ;; change `(LogManager/getContext true)` in the momoized ns-logger fn
    ;; and then observe:
    logger=> (log/warn "test")
    nil
    logger=> (count @messages*)
    4
    ```
    
    Some explorations that might shine some light:
    
    ```clojure
    logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext false) (str *ns*))))
    {}
    logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext true) (str *ns*))))
    {"metabase-appender" #object[metabase.logger.proxy$org.apache.logging.log4j.core.appender.AbstractAppender$ff19274a
                                 "0x4d680247"
                                 "metabase-appender"]}
    ```
    
    So something is not hooked up quite right.
    
    * Add tests for metabase.logger
    
    * Ensure `api/util/logs` returns logs
    
    * tests
    
    * Check for presence of `MetabaseLoggerFactory` rather than whole str
    
    When in a namespace with a record, `Foo resolves to ns.Foo. But outside
    it resolves to ns/Foo. When running tests from the command line *ns* is
    user so it gets more complicated.
    
    * Kinda playing whackamole™ with `(LogManager/getContext true)`
    
    * Remove custom memoizing logger
    
    History:
    
    39.2 we set `Multi-Release: true` in our manifest file and query speed
    drops like a stone. Jeff was able to track this to our logger calls in
    tight loops.
    
    We revert the multi-release and keep seeing the warning on startup
    
    > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    
    Benchmarking on 39.2
    (bench (log/trace "hi")) -> 15383 ns
    
    So we freaked out and set up a memoizing logger factory
    
    (bench (log/trace "hi")) -> 141 ns
    
    What a win.
    
    But we never noticed that the default *logger-factory* being picked up
    was slf4j ( `(log.impl/name log/*logger-factory*)` -> "org.slf4j" ). On
    39.2 if you set the factory to the log4j2 version you get back to a
    great number: `(bench (log/trace "hi"))` -> 25 ns
    
    And thus ensuring that our logger is the default log4j2 version is even
    faster than our memoizing logger-factory.
    
    Memoizing factory: 141 ns
    slf4j factory: 2269 ns
    log4j2 factory: 31 ns
    
    What does `(LogManager/getContext false)` mean versus using `true`? We
    only need and want a single context. But log4j2 by default uses a
    context selector called `ClassLoaderContextSelector`. We could put all
    of this behind us if we used a context selector type
    `BasicContextSelector` but this is surprisingly hard to do: you have to
    set a system property. And since all of this stuff gets initialized in
    static initializers, we don't have an opportunity to do this
    programmatically. The way around this is to require people to pass this
    system property on startup which is not acceptable.
    
    So getContext true checks for a threadlocal context in a specific static
    variable and falls back to a Default context. getContext false looks at
    classloaders and ends up at a different context. BUT: the log.impl
    version uses a closure over getContext false instead of getting it each
    time. And I suspect that when it does this there is only one so it is
    the default and continues to use this one. In our LoggerFactory
    implementation we were looking up the context each time. This still
    seems to work and everything is playing nice in our application
    classloader but its totally possible that our drivers are not hitting
    this. I'll have to investigate this.
    
    Verification:
    - build the uberjar locally (`bin/build`)
    - copy to some temp directory and also copy criterium.jar
    
    ```shell
    MB_JETTY_PORT=4000 java "$(socket-repl 4001)" -cp locally-built.jar:criterium.jar metabase.core
    ```
    
    ```clojure
    /tmp/locally-built via :coffee: v17.30 on :cloud:  metabase-query
    ❯ nc localhost 4001
    user=> (doto 'metabase.logger require in-ns)
    metabase.logger
    metabase.logger=> (require '[criterium.core :refer [bench]])
    nil
    metabase.logger=> (bench (log/trace "hi"))
    Evaluation count : 1686535500 in 60 samples of 28108925 calls.
                 Execution time mean : 22.487972 ns
        Execution time std-deviation : 0.101004 ns
       Execution time lower quantile : 22.326806 ns ( 2.5%)
       Execution time upper quantile : 22.648368 ns (97.5%)
                       Overhead used : 6.924761 ns
    nil
    metabase.logger=> (count (messages))
    358
    metabase.logger=>
    ```
    
    Verifies that we are on the order of 22 nanoseconds and the in-memory
    logger has messages in it.
    
    * Appease our namespace linters
    
    * I'll unplug you ns linter
    
    * Better tests and ns docstring
    
    * Bootstrap to set system properties
    
    New entrypoint for the application: metabase.bootstrap
    
    sets two properties for logging (context selector, log4j2 factory) and
    ensures those properties are set before any logging code is loaded.
    
    * docstrings and clean ns
    
    * metabase.logger ns docstring cleanup
    
    * docstring
    
    * rename a test now that there's no memoization
    
    * add logger properties to :dev profile
    
    * Revert "add logger properties to :dev profile"
    
    This reverts commit 4f09fa3b631f882a3c5edcab4508769ffb20d4fa.
    
    * deps
Code owners
Assign users and groups as approvers for specific file changes. Learn more.