Skip to content
Snippets Groups Projects
This project is mirrored from https://github.com/metabase/metabase. Pull mirroring updated .
  1. Aug 05, 2022
    • dpsutton's avatar
      Fix in-memory logger (#24616) · df165ba1
      dpsutton authored
      * Fix in-memory logger
      
      Our Admin > Troubleshooting > Logs page broke, just showing a spinner
      and never showing the logs.
      
      Don't quite understand why this fixes it. Javadocs are
      https://logging.apache.org/log4j/2.x/log4j-api/apidocs/org/apache/logging/log4j/LogManager.html#getContext-boolean-
      
      ```clojure
      logger=> (log/warn "test")
      nil
      logger=> (count @messages*)
      0
      ;; no in-memory logs so page is empty
      ;; change `(LogManager/getContext true)` in the momoized ns-logger fn
      ;; and then observe:
      logger=> (log/warn "test")
      nil
      logger=> (count @messages*)
      4
      ```
      
      Some explorations that might shine some light:
      
      ```clojure
      logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext false) (str *ns*))))
      {}
      logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext true) (str *ns*))))
      {"metabase-appender" #object[metabase.logger.proxy$org.apache.logging.log4j.core.appender.AbstractAppender$ff19274a
                                   "0x4d680247"
                                   "metabase-appender"]}
      ```
      
      So something is not hooked up quite right.
      
      * Add tests for metabase.logger
      
      * Ensure `api/util/logs` returns logs
      
      * tests
      
      * Check for presence of `MetabaseLoggerFactory` rather than whole str
      
      When in a namespace with a record, `Foo resolves to ns.Foo. But outside
      it resolves to ns/Foo. When running tests from the command line *ns* is
      user so it gets more complicated.
      
      * Kinda playing whackamole™ with `(LogManager/getContext true)`
      
      * Remove custom memoizing logger
      
      History:
      
      39.2 we set `Multi-Release: true` in our manifest file and query speed
      drops like a stone. Jeff was able to track this to our logger calls in
      tight loops.
      
      We revert the multi-release and keep seeing the warning on startup
      
      > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
      
      Benchmarking on 39.2
      (bench (log/trace "hi")) -> 15383 ns
      
      So we freaked out and set up a memoizing logger factory
      
      (bench (log/trace "hi")) -> 141 ns
      
      What a win.
      
      But we never noticed that the default *logger-factory* being picked up
      was slf4j ( `(log.impl/name log/*logger-factory*)` -> "org.slf4j" ). On
      39.2 if you set the factory to the log4j2 version you get back to a
      great number: `(bench (log/trace "hi"))` -> 25 ns
      
      And thus ensuring that our logger is the default log4j2 version is even
      faster than our memoizing logger-factory.
      
      Memoizing factory: 141 ns
      slf4j factory: 2269 ns
      log4j2 factory: 31 ns
      
      What does `(LogManager/getContext false)` mean versus using `true`? We
      only need and want a single context. But log4j2 by default uses a
      context selector called `ClassLoaderContextSelector`. We could put all
      of this behind us if we used a context selector type
      `BasicContextSelector` but this is surprisingly hard to do: you have to
      set a system property. And since all of this stuff gets initialized in
      static initializers, we don't have an opportunity to do this
      programmatically. The way around this is to require people to pass this
      system property on startup which is not acceptable.
      
      So getContext true checks for a threadlocal context in a specific static
      variable and falls back to a Default context. getContext false looks at
      classloaders and ends up at a different context. BUT: the log.impl
      version uses a closure over getContext false instead of getting it each
      time. And I suspect that when it does this there is only one so it is
      the default and continues to use this one. In our LoggerFactory
      implementation we were looking up the context each time. This still
      seems to work and everything is playing nice in our application
      classloader but its totally possible that our drivers are not hitting
      this. I'll have to investigate this.
      
      Verification:
      - build the uberjar locally (`bin/build`)
      - copy to some temp directory and also copy criterium.jar
      
      ```shell
      MB_JETTY_PORT=4000 java "$(socket-repl 4001)" -cp locally-built.jar:criterium.jar metabase.core
      ```
      
      ```clojure
      /tmp/locally-built via :coffee: v17.30 on :cloud:  metabase-query
      ❯ nc localhost 4001
      user=> (doto 'metabase.logger require in-ns)
      metabase.logger
      metabase.logger=> (require '[criterium.core :refer [bench]])
      nil
      metabase.logger=> (bench (log/trace "hi"))
      Evaluation count : 1686535500 in 60 samples of 28108925 calls.
                   Execution time mean : 22.487972 ns
          Execution time std-deviation : 0.101004 ns
         Execution time lower quantile : 22.326806 ns ( 2.5%)
         Execution time upper quantile : 22.648368 ns (97.5%)
                         Overhead used : 6.924761 ns
      nil
      metabase.logger=> (count (messages))
      358
      metabase.logger=>
      ```
      
      Verifies that we are on the order of 22 nanoseconds and the in-memory
      logger has messages in it.
      
      * Appease our namespace linters
      
      * I'll unplug you ns linter
      
      * Better tests and ns docstring
      
      * Bootstrap to set system properties
      
      New entrypoint for the application: metabase.bootstrap
      
      sets two properties for logging (context selector, log4j2 factory) and
      ensures those properties are set before any logging code is loaded.
      
      * docstrings and clean ns
      
      * metabase.logger ns docstring cleanup
      
      * docstring
      
      * rename a test now that there's no memoization
      
      * add logger properties to :dev profile
      
      * Revert "add logger properties to :dev profile"
      
      This reverts commit 4f09fa3b631f882a3c5edcab4508769ffb20d4fa.
      
      * deps
      Unverified
      df165ba1
  2. Jul 28, 2022
    • dpsutton's avatar
      Multi release jar again (#24366) · 1f4bf25f
      dpsutton authored
      * Try multi-release true again in our manifest
      
      Problem statement:
      Luiz packs our partner jars (exasol, starburst, etc.) into our jar so
      they can be "first class" and in cloud. But with the 44 cycle we've run
      into some issues:
      
      ```shell
      /tmp/j via :coffee: v17.30 on :cloud:  metabase-query
      ❯ jar uf 0.44.0-RC1.jar modules/*.jar
      
      ❯ java --version
      openjdk 11.0.14.1 2022-02-08
      OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
      OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
      
      /tmp/j via :coffee: v11.0.14.1 on :cloud:  metabase-query
      ❯ jar uf 0.44.0-RC1.jar modules/*.jar
      java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
      	at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091)
      	at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195)
      	at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147)
      	at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553)
      	at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083)
      	at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017)
      	at jdk.jartool/sun.tools.jar.Main.run(Main.java:366)
      	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680)
      
      ```
      
      Diogo tracked this down with some great sleuthing to an upgrade in our
      graal/js engine from “22.0.0.2” -> “22.1.0". This brought along the
      transitive truffle jar (which is the actual engine powering the js
      engine). The 22.0.0.2 was technically a multi-release jar but it only
      included java 11 sources. The 22.1.0 added java 17 sources in addition
      to the java 11.
      
      And this proves fatal to using the `jar` command. When `"Multi-Release"`
      is set to true, it knows to only look at versions it will need. Lacking
      this, it looks at all of the classes and the class version for 17 is
      61.0 is higher than it knows how to understand and it breaks.
      
      Obvious Solution:
      Set Multi-Release to true. We have done this in the past. On startup we
      have a message logged:
      
      > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
      > will impact performance.
      
      And setting multi-release can remove this. But when we did that we ended
      up with:
      - https://github.com/metabase/metabase/issues/16380
      - https://github.com/metabase/metabase/pull/17027
      
      That issue describes slowdowns of queries on the order of 0.6 seconds ->
      1.3 seconds. Almost doubling. People reported dashboards timing
      out. Jeff tracked this down to
      
      > Profiling revealed that the calls to Log4jLoggerFactory.getLogger
      > became much slower between the two versions. See attached screenshots.
      
      And this is a pernicious problem that we cannot easily test for.
      
      Lets try again:
      I've set multi-release to true and built a jar with `bin/build`. I
      immediately ran into problems:
      
      ```shell
      ❯ MB_DB_CONNECTION_URI="postgres://user:pass@localhost:5432/compare
      
      " MB_JETTY_PORT=3007 java "$(socket-repl 6007)" -jar multi-release-local.jar
      Warning: protocol #'java-time.core/Amount is overwriting function abs
      WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs
      WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs
      Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
      Exception in thread "main" java.lang.Error: Circular loading of installed providers detected
      	at java.base/java.nio.file.spi.FileSystemProvider.installedProviders(FileSystemProvider.java:198)
      	at java.base/java.nio.file.Path.of(Path.java:206)
      	at java.base/java.nio.file.Paths.get(Paths.java:98)
      	at org.apache.logging.log4j.core.util.Source.toFile(Source.java:55)
      	at org.apache.logging.log4j.core.util.Source.<init>(Source.java:142)
      	at
              org.apache.logging.log4j.core.config.ConfigurationSource.<init>(ConfigurationSource.java:139)
      ```
      
      So hazarded a guess that a bump in the log4j would solve this. And it
      does solve it.
      
      Then profiling some queries against bigquery (just viewing the table) in
      the RC2 and the locally built version with the multi-release:
      
      ```shell
      -- multi-release
      2022-07-27 12:28:00,659 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.1 s
      2022-07-27 12:28:02,609 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 897.9 ms
      2022-07-27 12:28:03,950 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 778.1 ms
      
      -- RC non-multi-release
      2022-07-27 12:28:57,633 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.0 s
      2022-07-27 12:28:59,343 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 912.9 ms
      2022-07-27 12:29:02,328 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 808.6 ms
      ```
      So times seem very similar.
      
      ============
      
      Proper benching:
      
      using criterium
      
      ```shell
      MB_JETTY_PORT=3008 java "$(socket-repl 6008)" -cp "/Users/dan/.m2/repository/criterium/criterium/0.4.6/criterium-0.4.6.jar":0.39.2.jar metabase.core
      ```
      
      `(bench (log/warn "benching"))`
      
      Summary:
      39.2:          21.109470 µs
      RC2:           4.975204 µs
      multi-release: 7.673965 µs
      
      These flood the consoles with logs
      
      ```
      Older release: 39.2
      
      user=> (bench (log/warn "benching"))
      Evaluation count : 2886240 in 60 samples of 48104 calls.
                   Execution time mean : 21.109470 µs
          Execution time std-deviation : 567.271917 ns
         Execution time lower quantile : 20.171870 µs ( 2.5%)
         Execution time upper quantile : 22.429557 µs (97.5%)
                         Overhead used : 6.835913 ns
      
      Found 5 outliers in 60 samples (8.3333 %)
      	low-severe	 4 (6.6667 %)
      	low-mild	 1 (1.6667 %)
       Variance from outliers : 14.1886 % Variance is moderately inflated by outliers
      
      =============================================
      
      RC2:
      
      user=> (bench (log/warn "benching"))Evaluation count : 12396420 in 60 samples of 206607 calls.
                   Execution time mean : 4.975204 µs
          Execution time std-deviation : 521.769687 ns
         Execution time lower quantile : 4.711607 µs ( 2.5%)
         Execution time upper quantile : 6.404317 µs (97.5%)
                         Overhead used : 6.837290 ns
      
      Found 5 outliers in 60 samples (8.3333 %)
      	low-severe	 2 (3.3333 %)
      	low-mild	 3 (5.0000 %)
       Variance from outliers : 72.0600 % Variance is severely inflated by outliers
      
      =============================================
      
      Proposed Multi-Release
      
      user=> (bench (log/warn "benching"))
      Evaluation count : 7551000 in 60 samples of 125850 calls.
                   Execution time mean : 7.673965 µs
          Execution time std-deviation : 201.155749 ns
         Execution time lower quantile : 7.414837 µs ( 2.5%)
         Execution time upper quantile : 8.138010 µs (97.5%)
                         Overhead used : 6.843981 ns
      
      Found 1 outliers in 60 samples (1.6667 %)
      	low-severe	 1 (1.6667 %)
       Variance from outliers : 14.1472 % Variance is moderately inflated by outliers
      
      ```
      
      `(bench (log/info "benching info"))`
      
      This does not hit a console so is a no-op.
      
      Summary:
      39.2:          11.534614 µs
      RC2:           98.408357 ns
      multi-release: 2.236756 µs
      
      ```
      =============================================
      39.2:
      
      user=> (bench (log/info "benching info"))
      Evaluation count : 5223480 in 60 samples of 87058 calls.
                   Execution time mean : 11.534614 µs
          Execution time std-deviation : 57.756163 ns
         Execution time lower quantile : 11.461502 µs ( 2.5%)
         Execution time upper quantile : 11.657644 µs (97.5%)
                         Overhead used : 6.835913 ns
      
      Found 3 outliers in 60 samples (5.0000 %)
      	low-severe	 2 (3.3333 %)
      	low-mild	 1 (1.6667 %)
       Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
      
      =============================================
      
      RC2:
      
      user=> (bench (log/info "benching info"))Evaluation count : 574427220 in 60 samples of 9573787 calls.
                   Execution time mean : 98.408357 ns
          pExecution time std-deviation : 1.792214 ns
         Execution time lower quantile : 96.891477 ns ( 2.5%)
         Execution time upper quantile : 103.394664 ns (97.5%)
                         Overhead used : 6.837290 ns
      
      Found 8 outliers in 60 samples (13.3333 %)
      	low-severe	 3 (5.0000 %)
      	low-mild	 5 (8.3333 %)
       Variance from outliers : 7.7881 % Variance is slightly inflated by outliers
      
      =============================================
      
      Multi-release:
      
      user=> (bench (log/info "benching info"))Evaluation count : 26477700 in 60 samples of 441295 calls.
                   Execution time mean : 2.236756 µs
          Execution time std-deviation : 15.412356 ns
         Execution time lower quantile : 2.212301 µs ( 2.5%)
         Execution time upper quantile : 2.275434 µs (97.5%)
                         Overhead used : 6.843981 ns
      
      Found 3 outliers in 60 samples (5.0000 %)
      	low-severe	 3 (5.0000 %)
       Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
       ```
      
      * bump graal/js
      
      * Custom MB log factory (#24369)
      
      * Custom MB log factory
      
      * Write stupid code to appease stupid Eastwood
      
      * `ns-name` already calls `the-ns` on its argument.
      
      * More code cleanup
      
      * Improved code
      
      * Remove NOCOMMIT
      
      * empty commit to trigger CI
      
      Co-authored-by: default avatarCam Saul <1455846+camsaul@users.noreply.github.com>
      Unverified
      1f4bf25f
  3. Apr 19, 2022
    • Braden Shepherdson's avatar
      Make namespace aliasing consistent everywhere; enforce with clj-kondo (#21738) · 19beda53
      Braden Shepherdson authored
      * Make namespace aliasing consistent everywhere; enforce with clj-kondo
      
      See the table of aliases in .clj-kondo/config.edn
      
      Notable patterns:
      - `[metabase.api.foo :as api.foo]`
      - `[metabase.models.foo :as foo]`
      - `[metabase.query-processor.foo :as qp.foo]`
      - `[metabase.server.middleware.foo :as mw.foo]`
      - `[metabase.util.foo :as u.foo]`
      - `[clj-http.client :as http]` and `[metabase.http-client :as client]`
      
      Fixes #19930.
      Unverified
      19beda53
  4. Mar 14, 2022
    • Cam Saul's avatar
      Upgrade Liquibase to latest version; remove final Java source file and need... · aaf1b601
      Cam Saul authored
      Upgrade Liquibase to latest version; remove final Java source file and need for `clojure -X:deps prep` (#20611)
      
      * Upgrade Liquibase to latest version
      
      * Try adjusting log
      
      * Fix checksums for the TWO migrations with ID = 32
      
      * FINALLY get Liquibase to use Log4j2
      
      * Set Liquibase ConsoleUIService OutputStream to null OutputStream
      
      * Manually define a package for our H2 proxy class so Java 8 works
      
      * Fix package-name determination code
      
      * Update migrations file spec
      
      * `databasechangelog` shouldn't be upper-case
      
      * Lower-case quartz table names
      
      * More MySQL fixes :wrench:
      
      * Properties for all the Quartz tables :cry:
      
      * Formatting tweaks [ci skip]
      
      * Revert a few more busted changes
      
      * Fix more busted changes
      
      * Bump Liquibase version to 4.8.0 to fix MySQL defaultValueBoolean bug
      
      * OMG I think I finally fixed MySQL
      
      * Remove Java source file and prep-deps code
      
      * Remove two more references to bin/prep.sh
      
      * Minor cleanup
      
      * Revert unneeded changes
      
      * Fix busted indentation
      
      * Don't search inside java/ anymore since it's G-O-N-E
      
      * Appease the namespace linter
      
      * Update src/metabase/db/liquibase/h2.clj
      Unverified
      aaf1b601
  5. Jul 30, 2021
Loading