This project is mirrored from https://github.com/metabase/metabase.
  1. Jul 28, 2022
      Multi release jar again (#24366) · 1f4bf25f
      dpsutton authored
      * Try multi-release true again in our manifest
      
      Problem statement:
      Luiz packs our partner jars (exasol, starburst, etc.) into our jar so
      they can be "first class" and in cloud. But with the 44 cycle we've run
      into some issues:
      
      ```shell
      /tmp/j via :coffee: v17.30 on :cloud:  metabase-query
      ❯ jar uf 0.44.0-RC1.jar modules/*.jar
      
      ❯ java --version
      openjdk 11.0.14.1 2022-02-08
      OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
      OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
      
      /tmp/j via :coffee: v11.0.14.1 on :cloud:  metabase-query
      ❯ jar uf 0.44.0-RC1.jar modules/*.jar
      java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
      	at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091)
      	at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195)
      	at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147)
      	at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553)
      	at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083)
      	at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017)
      	at jdk.jartool/sun.tools.jar.Main.run(Main.java:366)
      	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680)
      
      ```
      
Diogo tracked this down with some great sleuthing to an upgrade of our
graal/js engine from 22.0.0.2 to 22.1.0. This brought along the
transitive truffle jar (the actual engine powering the js engine). The
22.0.0.2 truffle was technically a multi-release jar, but it only
included Java 11 classes. The 22.1.0 added Java 17 classes in addition
to the Java 11 ones.
      
And this proves fatal to the `jar` command. When `Multi-Release` is set
to true, the tool knows to look only at the versions it will need.
Lacking this, it looks at all of the classes, and the Java 17 class
version (61.0) is higher than a Java 11 `jar` knows how to understand,
so it breaks.
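For context on the 61.0: a class file begins with the magic number `0xCAFEBABE` followed by minor and major version fields, where major 55 corresponds to Java 11 and 61 to Java 17. A minimal sketch of reading that header (the `ClassVersion` helper is illustrative, not part of any tool):

```java
public class ClassVersion {
    // A class file starts with: magic (4 bytes), minor_version (2 bytes),
    // major_version (2 bytes). Major 55 = Java 11, 61 = Java 17.
    public static int majorVersion(byte[] classFile) {
        int magic = ((classFile[0] & 0xff) << 24) | ((classFile[1] & 0xff) << 16)
                  | ((classFile[2] & 0xff) << 8)  |  (classFile[3] & 0xff);
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        // bytes 4-5 are the minor version; bytes 6-7 are the major version
        return ((classFile[6] & 0xff) << 8) | (classFile[7] & 0xff);
    }
}
```

A Java 11 `jar` refuses any entry whose major version exceeds 55, which is exactly the `Unsupported major.minor version 61.0` failure above.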
      
Obvious Solution:
Set `Multi-Release` to true. We have done this in the past. On startup a
message is logged:
      
      > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
      > will impact performance.
      
Setting Multi-Release to true can remove this warning. But when we did
that, we ended up with:
      - https://github.com/metabase/metabase/issues/16380
      - https://github.com/metabase/metabase/pull/17027
      
That issue describes query slowdowns on the order of 0.6 seconds ->
1.3 seconds, almost doubling. People reported dashboards timing
out. Jeff tracked this down to:
      
      > Profiling revealed that the calls to Log4jLoggerFactory.getLogger
      > became much slower between the two versions. See attached screenshots.
      
      And this is a pernicious problem that we cannot easily test for.
      
Let's try again:
I've set Multi-Release to true and built a jar with `bin/build`. I
immediately ran into problems:
      
      ```shell
❯ MB_DB_CONNECTION_URI="postgres://user:pass@localhost:5432/compare" MB_JETTY_PORT=3007 java "$(socket-repl 6007)" -jar multi-release-local.jar
      Warning: protocol #'java-time.core/Amount is overwriting function abs
      WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs
      WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs
      Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
      Exception in thread "main" java.lang.Error: Circular loading of installed providers detected
      	at java.base/java.nio.file.spi.FileSystemProvider.installedProviders(FileSystemProvider.java:198)
      	at java.base/java.nio.file.Path.of(Path.java:206)
      	at java.base/java.nio.file.Paths.get(Paths.java:98)
      	at org.apache.logging.log4j.core.util.Source.toFile(Source.java:55)
      	at org.apache.logging.log4j.core.util.Source.<init>(Source.java:142)
	at org.apache.logging.log4j.core.config.ConfigurationSource.<init>(ConfigurationSource.java:139)
      ```
      
So I hazarded a guess that bumping log4j would solve this, and it does.

Then I profiled some queries against BigQuery (just viewing the table)
in RC2 and in the locally built multi-release version:
      
      ```shell
      -- multi-release
      2022-07-27 12:28:00,659 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.1 s
      2022-07-27 12:28:02,609 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 897.9 ms
      2022-07-27 12:28:03,950 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 778.1 ms
      
      -- RC non-multi-release
      2022-07-27 12:28:57,633 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.0 s
      2022-07-27 12:28:59,343 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 912.9 ms
      2022-07-27 12:29:02,328 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 808.6 ms
      ```
      So times seem very similar.
      
      ============
      
Proper benching, using criterium:
      
      ```shell
      MB_JETTY_PORT=3008 java "$(socket-repl 6008)" -cp "/Users/dan/.m2/repository/criterium/criterium/0.4.6/criterium-0.4.6.jar":0.39.2.jar metabase.core
      ```
      
      `(bench (log/warn "benching"))`
      
      Summary:
      39.2:          21.109470 µs
      RC2:           4.975204 µs
      multi-release: 7.673965 µs
      
These flood the console with logs:
      
      ```
      Older release: 39.2
      
      user=> (bench (log/warn "benching"))
      Evaluation count : 2886240 in 60 samples of 48104 calls.
                   Execution time mean : 21.109470 µs
          Execution time std-deviation : 567.271917 ns
         Execution time lower quantile : 20.171870 µs ( 2.5%)
         Execution time upper quantile : 22.429557 µs (97.5%)
                         Overhead used : 6.835913 ns
      
      Found 5 outliers in 60 samples (8.3333 %)
      	low-severe	 4 (6.6667 %)
      	low-mild	 1 (1.6667 %)
       Variance from outliers : 14.1886 % Variance is moderately inflated by outliers
      
      =============================================
      
      RC2:
      
user=> (bench (log/warn "benching"))
Evaluation count : 12396420 in 60 samples of 206607 calls.
                   Execution time mean : 4.975204 µs
          Execution time std-deviation : 521.769687 ns
         Execution time lower quantile : 4.711607 µs ( 2.5%)
         Execution time upper quantile : 6.404317 µs (97.5%)
                         Overhead used : 6.837290 ns
      
      Found 5 outliers in 60 samples (8.3333 %)
      	low-severe	 2 (3.3333 %)
      	low-mild	 3 (5.0000 %)
       Variance from outliers : 72.0600 % Variance is severely inflated by outliers
      
      =============================================
      
      Proposed Multi-Release
      
      user=> (bench (log/warn "benching"))
      Evaluation count : 7551000 in 60 samples of 125850 calls.
                   Execution time mean : 7.673965 µs
          Execution time std-deviation : 201.155749 ns
         Execution time lower quantile : 7.414837 µs ( 2.5%)
         Execution time upper quantile : 8.138010 µs (97.5%)
                         Overhead used : 6.843981 ns
      
      Found 1 outliers in 60 samples (1.6667 %)
      	low-severe	 1 (1.6667 %)
       Variance from outliers : 14.1472 % Variance is moderately inflated by outliers
      
      ```
      
      `(bench (log/info "benching info"))`
      
This does not hit the console, so it is effectively a no-op.
      
      Summary:
      39.2:          11.534614 µs
      RC2:           98.408357 ns
      multi-release: 2.236756 µs
      
      ```
      =============================================
      39.2:
      
      user=> (bench (log/info "benching info"))
      Evaluation count : 5223480 in 60 samples of 87058 calls.
                   Execution time mean : 11.534614 µs
          Execution time std-deviation : 57.756163 ns
         Execution time lower quantile : 11.461502 µs ( 2.5%)
         Execution time upper quantile : 11.657644 µs (97.5%)
                         Overhead used : 6.835913 ns
      
      Found 3 outliers in 60 samples (5.0000 %)
      	low-severe	 2 (3.3333 %)
      	low-mild	 1 (1.6667 %)
       Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
      
      =============================================
      
      RC2:
      
user=> (bench (log/info "benching info"))
Evaluation count : 574427220 in 60 samples of 9573787 calls.
             Execution time mean : 98.408357 ns
    Execution time std-deviation : 1.792214 ns
         Execution time lower quantile : 96.891477 ns ( 2.5%)
         Execution time upper quantile : 103.394664 ns (97.5%)
                         Overhead used : 6.837290 ns
      
      Found 8 outliers in 60 samples (13.3333 %)
      	low-severe	 3 (5.0000 %)
      	low-mild	 5 (8.3333 %)
       Variance from outliers : 7.7881 % Variance is slightly inflated by outliers
      
      =============================================
      
      Multi-release:
      
user=> (bench (log/info "benching info"))
Evaluation count : 26477700 in 60 samples of 441295 calls.
                   Execution time mean : 2.236756 µs
          Execution time std-deviation : 15.412356 ns
         Execution time lower quantile : 2.212301 µs ( 2.5%)
         Execution time upper quantile : 2.275434 µs (97.5%)
                         Overhead used : 6.843981 ns
      
      Found 3 outliers in 60 samples (5.0000 %)
      	low-severe	 3 (5.0000 %)
       Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
       ```
      
      * bump graal/js
      
      * Custom MB log factory (#24369)
      
      * Custom MB log factory
      
      * Write stupid code to appease stupid Eastwood
      
      * `ns-name` already calls `the-ns` on its argument.
      
      * More code cleanup
      
      * Improved code
      
      * Remove NOCOMMIT
      
      * empty commit to trigger CI
      
Co-authored-by: Cam Saul <1455846+camsaul@users.noreply.github.com>
      [E2E] Optimize looping logic in tests for dashboard number filters (#24411) · 8a9ea952
      Nemanja Glumac authored
      * Optimize looping logic for GUI and SQL dashboard number filters
      
      - This will connect all filter types at once
      - We're reducing the number of checks for the default filter value to just one
      [CI] Run backend checks in GHA conditionally (#24397) · d2315351
      Nemanja Glumac authored
      * [CI] Conditionally run backend checks in GHA
      
      Ignore documentation, markdown files and frontend tests.
      
      * Fix indentation
      Cutoff instead of null out nested field column set if there are too many (#24336) · 07aaf1fa
      Howon Lee authored
Pursuant to #23635. Previous behavior was to blank out the nested field columns entirely if there were more than our arbitrary limit of 100. This silent behavior was very user-unfriendly, and people were perplexed that the feature just seemed to turn off out of nowhere. The new, better behavior is to log that there are too many and cut the set off at 100, so users still get the first 100 instead of none.
      [E2E] Optimize looping logic in tests for dashboard date filters (#24262) · 154ff811
      Nemanja Glumac authored
      - This will connect all filter types at once
      - We're reducing the number of checks for the default filter value to just one
      don't use list icon for fields (#24379) · fa03bc71
      Ryan Laurie authored
      docs - notes on block permission (#24277) · b10fda24
      Jeff Bruemmer authored
      add dummy workflow clones to make originals required (#24361) · e9a4b6b7
      Aleksandr Lesnenko authored
      * add dummy workflow clones to make originals required
      
      * review
  2. Jul 27, 2022
      Update default text filter operator (#24328) · a6ac1553
      Ryan Laurie authored
      * only default to contains operator for long text fields
      change segment select text (#24368) · feb643be
      Ryan Laurie authored
      fix calculating labels for stacked charts (#24324) · 7b76a94b
      Aleksandr Lesnenko authored
      * fix calculating labels for stacked charts
      
      * fix related 17205
      
      * update waterfall spec
      Downgrade graal to prevent multi-release issues (#24357) · b40e90d7
      dpsutton authored
      The jar worked fine except when trying to add partner jars (exasol,
      starburst, etc)
      
      ```shell
      ❯ java --version
      openjdk 11.0.14.1 2022-02-08
      OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
      OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
      
      /tmp/j via :coffee: v11.0.14.1 on :cloud:  metabase-query
      ❯ jar uf 0.44.0-RC1.jar modules/*.jar
      java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
      	at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091)
      	at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195)
      	at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147)
      	at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553)
      	at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083)
      	at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017)
      	at jdk.jartool/sun.tools.jar.Main.run(Main.java:366)
      	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680)
      ```
      
The 22.1.0 graal/js requires a similarly versioned graal/truffle, which
is multi-release but previously included only versions/11 class
files. The upgraded one includes versions/17, and since our uberjar is
not multi-release, `jar` running on Java 11 rejects the class version
61.0 (Java 17) files. If the uberjar were multi-release it would know to
select the versions it wanted.
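To make the versioned-entry selection concrete, here is a small standalone sketch (not Metabase code; the class and entry names are made up) that builds a tiny multi-release jar and opens it "runtime-versioned". The `JarFile` constructor taking a `Runtime.Version` is the Java 9+ API that performs the per-version entry selection:

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class MultiReleaseDemo {
    // Builds a jar with "Multi-Release: true", a base entry, and a Java 17 override.
    public static Path buildJar() throws Exception {
        Path jar = Files.createTempFile("mr-demo", ".jar");
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MULTI_RELEASE, "true");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, mf)) {
            jos.putNextEntry(new ZipEntry("greeting.txt"));
            jos.write("base".getBytes());
            jos.putNextEntry(new ZipEntry("META-INF/versions/17/greeting.txt"));
            jos.write("java17".getBytes());
        }
        return jar;
    }

    // Opens the jar versioned for the current runtime: the JDK resolves
    // "greeting.txt" to the highest applicable versioned entry.
    public static String realName(Path jar) throws Exception {
        try (JarFile jf = new JarFile(jar.toFile(), true, ZipFile.OPEN_READ,
                                      Runtime.version())) {
            JarEntry e = jf.getJarEntry("greeting.txt");
            return e.getRealName();
        }
    }
}
```

On a Java 17+ runtime `getJarEntry("greeting.txt")` resolves to the `META-INF/versions/17` copy; on Java 11 it resolves to the base entry. Tools that ignore the `Multi-Release` attribute instead scan every entry, which is how the Java 11 `jar` trips over the versions/17 classes.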
      update API docs (#24348) · a897ec32
      Jeff Bruemmer authored
      Reverts param in snippets (#24298) · 995cc80b
      Ngoc Khuat authored
      
      * revert #23658
      
      * keep the migration to add native_query_snippet.template_tag, add a new migration to drop it
      
      * Remove snippet parameter support in fully parametrized check
      
Co-authored-by: Tamás Benkő <tamas@metabase.com>
      Ensure uploaded secrets are stable (#24325) · 73599275
      dpsutton authored
      * Ensure uploaded secrets are stable
      
      Fixes: https://github.com/metabase/metabase/issues/23034
      
      Background:
      Uploaded secrets are stored as bytes in our application db since cloud
      doesn't have a filesystem. To make db connections we stuff them into
      temporary files and use those files.
      
      We also are constantly watching for db detail changes so we can
      recompose the connection pool. Each time you call
      `db->pooled-connection-spec` we check if the hash of the connection spec
      has changed and recompose the pool if it has.
      
      Problem:
These uploaded secrets are written to temporary files, and we make a new
temp file on each call to `db->pooled-connection-spec`. So the hashes
always appear different:
      
      ```clojure
      connection=> (= x y)
      true
      connection=> (take 2
                         (clojure.data/diff (connection-details->spec :postgres (:details x))
                                            (connection-details->spec :postgres (:details y))))
      ({:sslkey
        #object[java.io.File 0x141b0f09 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_1388256635324085910.tmp"],
        :sslrootcert
        #object[java.io.File 0x6f443fac "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_9248342447139746747.tmp"],
        :sslcert
        #object[java.io.File 0xbb13300 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_17076432929457451876.tmp"]}
       {:sslkey
        #object[java.io.File 0x6fbb3b7b "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_18336254363340056265.tmp"],
        :sslrootcert
        #object[java.io.File 0x6ba4c390 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_11775804023700307206.tmp"],
        :sslcert
        #object[java.io.File 0x320184a0
        "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_10098480793225259237.tmp"]})
      ```
      
      And this is quite a problem: each time we get a db connection we are
      making a new file, putting the contents of the secret in it, and then
      considering the pool stale, recomposing it, starting our query. And if
      you are on a dashboard, each card will kill the pool of the previously
      running cards.
      
      This behavior does not happen with the local-file path because the
      secret is actually the filepath and we load that. So the file returned
      is always the same. It's only for the uploaded bits that we dump into a
      temp file (each time).
      
      Solution:
Let's memoize the temp file created for the secret. We cannot use the
secret itself as the key, though, because the secret can (always?)
include a byte array:
      
      ```clojure
      connection-test=> (hash {:x (.getBytes "hi")})
      1771366777
      connection-test=> (hash {:x (.getBytes "hi")})
      -709002180
      ```
      
      So we need to come up with a stable key. I'm using `value->string` here,
      falling back to `(gensym)` because `value->string` doesn't always return
      a value due to its cond.
      
      ```clojure
      (defn value->string
        "Returns the value of the given `secret` as a String.  `secret` can be a Secret model object, or a
        secret-map (i.e. return value from `db-details-prop->secret-map`)."
        {:added "0.42.0"}
        ^String [{:keys [value] :as _secret}]
        (cond (string? value)
              value
              (bytes? value)
              (String. ^bytes value StandardCharsets/UTF_8)))
      ```
      
      Why did this bug come up recently?
      [pull/21604](https://github.com/metabase/metabase/pull/21604) gives some
      light. That changed
      `(hash details)` -> `(hash (connection-details->spec driver details))`
      
      with the message
      
      > also made some tweaks so the SQL JDBC driver connection pool cache is
      > invalidated when the (unpooled) JDBC spec returned by
      > connection-details->spec changes, rather than when the details map
      > itself changes. This means that changes to other things outside of
      > connection details that affect the JDBC connection parameters, for
      > example the report-timezone or start-of-week Settings will now properly
      > result in the connection pool cache being flushed
      
      So we want to continue to hash the db spec but ensure that the spec is
      stable.
      
      * typehint the memoized var with ^java.io.File
      
      * Switch memoization key from String to vector of bytes
      
      Copying comment from Github:
      
      When you upload a sequence of bytes as a secret, we want to put them in
      a file once and only once and always reuse that temporary file. We will
      eventually hash the whole connection spec but I don't care about
      collisions there. It's only given the same sequence of bytes, you should
      always get back the exact same temporary file that has been created.
      
So I'm making a function `f: Secret -> file` that, given the same
Secret, always returns the exact same file. This was not the case before
this change: each uploaded secret would return a new temporary file with
the contents of the secret each time you got its value, so you would end
up with 35 temporary files, each with the same key in it.
      
      An easy way to get this guarantee is to memoize the function. But the
      secret itself isn't a good key to memoize against because it contains a
      byte array.
      
      If the memoization key is the byte-array itself, this will fail because
      arrays have reference identity:
      
      ```clojure
      user=> (= (.getBytes "hi") (.getBytes "hi"))
      false
      ```
      
      So each time we load the same secret from the database we get a new byte
      array, ask for its temp file and get a different temp file each time.
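The underlying JVM behavior can be seen directly in Java, where arrays inherit `Object`'s reference-based `equals` and `hashCode`; a small illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayKeys {
    public static void main(String[] args) {
        byte[] a = "hi".getBytes();
        byte[] b = "hi".getBytes();

        // Arrays compare by reference identity, not contents...
        System.out.println(a.equals(b));          // false
        System.out.println(Arrays.equals(a, b));  // true: contents do match

        // ...so a map keyed by byte[] misses on an equal-content array.
        Map<byte[], String> cache = new HashMap<>();
        cache.put(a, "cached-file");
        System.out.println(cache.get(b));         // null
    }
}
```

Clojure's `=` on Java arrays falls through to this same reference comparison, which is why each freshly loaded secret looked like a brand-new key.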
      
This means that memoization cannot be driven off of the byte array. But
one way to get value semantics back quickly is just to stuff those bytes
into a String, because Strings compare by value, not identity. This is
what is currently in the PR (before this change). I was banking on the
assumption that Strings are just opaque sequences of bytes that compare
byte by byte, regardless of whether those bytes make sense.
      
But you've made a good point that maybe that is flirting with undefined
behavior. If instead we used the hash of the contents of the byte array
as the memoization key, with `(java.util.Arrays/hashCode array)`, we
would open ourselves up to the (albeit rare) case that two distinct
secret values hash to the same value. That sounds really bad: two
distinct secrets (think two ssh keys) would both map to a single file
containing one ssh key.
      
An easy way to get the value semantics we want for the memoization is
just to call `(vec array)` on the byte array and use that sequence of
bytes as the memoization key. Clojure vectors compare by value, not
reference, so two secrets return the same file if and only if their byte
sequences are identical, in which case we would expect the files to be
identical anyway. This gives me the guarantee I wanted from the String
approach without entwining it with charsets, UTF-8, etc.
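A Java analog of the `(vec array)` trick (the `SecretFiles` helper below is hypothetical, not the Metabase implementation): wrap a defensive copy of the bytes in a `ByteBuffer`, whose `equals`/`hashCode` are content-based, and memoize on that. Equal contents always map to the same cached file, and distinct contents can never collide the way a raw content hash could:

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SecretFiles {
    private static final Map<ByteBuffer, String> CACHE = new ConcurrentHashMap<>();
    private static final AtomicInteger CREATED = new AtomicInteger();

    // Returns the same (stand-in) temp-file name for equal byte contents.
    // ByteBuffer compares and hashes by the wrapped bytes' values; we clone
    // the array so later mutation by the caller cannot corrupt the key.
    public static String tempFileFor(byte[] secret) {
        return CACHE.computeIfAbsent(ByteBuffer.wrap(secret.clone()),
                key -> "secret-file-" + CREATED.incrementAndGet());
    }
}
```

The file is created once per distinct byte sequence, so repeated pool-spec computations see a stable path and the connection-spec hash stops churning.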
      adjust modal padding and width (#24322) · 9a9c70ab
      Ryan Laurie authored
      Hot fix for the some enterprise api docs (#24295) · 6e1eda9b
      Ngoc Khuat authored
      * hot fix for the wrong api-documentation
      
      * resolve the reflection warning
      docs - okta (#24279) · aede76c1
      Natalie authored
      automerge backported prs (#24219) · 0a36cbdd
      Aleksandr Lesnenko authored
      Mysql filters for JSON columns which are heterogeneous (#24214) (#24268) · 1de4c4a4
      Howon Lee authored
Pursuant to #24214. Previously, MySQL JSON spun-out fields didn't work when the individual fields had heterogeneous contents, because type information was missing (unlike Postgres JSON fields, where it was provided). Now it is provided, along with a refactor away from `reify` (which the Postgres JSON fields used), because the type operator was otherwise really annoying to add (the MySQL type cast is a function).
  3. Jul 26, 2022