Commits · 55f7b5476634d5cc192bf13c96bc3a8056055e97 · Engineering Digital Service / Metabase

This project is mirrored from https://github.com/metabase/metabase. Pull mirroring updated 32 seconds ago.

Aug 02, 2022

Enforce bigint for all int fields derived from JSON instead of doing more... · 55f7b547

Howon Lee authored 2 years ago

Enforce bigint for all int fields derived from JSON instead of doing more complete refactor (#24494)

Pursuant to #22732. Should basically eliminate it with prejudice by saying that any int should fit bigints in our point of view in postgres.

Unverified

55f7b547

Aug 01, 2022
- Properly format date-time property for snowplow events (#24436) · fb9db3dc
  Ngoc Khuat authored 2 years ago
  
  * correct format datetime property for snowplow * some docs and comments
  Unverified
  
  fb9db3dc
Jul 29, 2022
- Add pluralization support to backend i18n system (#24248) · 8a9f40d8
  Noah Moss authored 2 years ago
  
  Unverified
  
  8a9f40d8
Jul 28, 2022

Substitute locked params on embedded dashboards into text cards on the backend (#24373) · d0e12c4e
Noah Moss authored 2 years ago

Unverified

d0e12c4e

Multi release jar again (#24366) · 1f4bf25f

dpsutton authored 2 years ago

* Try multi-release true again in our manifest

Problem statement:
Luiz packs our partner jars (exasol, starburst, etc.) into our jar so
they can be "first class" and in cloud. But with the 44 cycle we've run
into some issues:

```shell
/tmp/j via  v17.30 on   metabase-query
❯ jar uf 0.44.0-RC1.jar modules/*.jar

❯ java --version
openjdk 11.0.14.1 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)

/tmp/j via  v11.0.14.1 on   metabase-query
❯ jar uf 0.44.0-RC1.jar modules/*.jar
java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
	at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091)
	at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195)
	at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147)
	at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553)
	at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083)
	at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017)
	at jdk.jartool/sun.tools.jar.Main.run(Main.java:366)
	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680)

```

Diogo tracked this down with some great sleuthing to an upgrade in our
graal/js engine from “22.0.0.2” -> “22.1.0". This brought along the
transitive truffle jar (which is the actual engine powering the js
engine). The 22.0.0.2 was technically a multi-release jar but it only
included java 11 sources. The 22.1.0 added java 17 sources in addition
to the java 11.

And this proves fatal to using the `jar` command. When `"Multi-Release"`
is set to true, it knows to only look at versions it will need. Lacking
this, it looks at all of the classes and the class version for 17 is
61.0 is higher than it knows how to understand and it breaks.

Obvious Solution:
Set Multi-Release to true. We have done this in the past. On startup we
have a message logged:

> WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
> will impact performance.

And setting multi-release can remove this. But when we did that we ended
up with:
- https://github.com/metabase/metabase/issues/16380
- https://github.com/metabase/metabase/pull/17027

That issue describes slowdowns of queries on the order of 0.6 seconds ->
1.3 seconds. Almost doubling. People reported dashboards timing
out. Jeff tracked this down to

> Profiling revealed that the calls to Log4jLoggerFactory.getLogger
> became much slower between the two versions. See attached screenshots.

And this is a pernicious problem that we cannot easily test for.

Lets try again:
I've set multi-release to true and built a jar with `bin/build`. I
immediately ran into problems:

```shell
❯ MB_DB_CONNECTION_URI="postgres://user:pass@localhost:5432/compare

" MB_JETTY_PORT=3007 java "$(socket-repl 6007)" -jar multi-release-local.jar
Warning: protocol #'java-time.core/Amount is overwriting function abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs
Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
Exception in thread "main" java.lang.Error: Circular loading of installed providers detected
	at java.base/java.nio.file.spi.FileSystemProvider.installedProviders(FileSystemProvider.java:198)
	at java.base/java.nio.file.Path.of(Path.java:206)
	at java.base/java.nio.file.Paths.get(Paths.java:98)
	at org.apache.logging.log4j.core.util.Source.toFile(Source.java:55)
	at org.apache.logging.log4j.core.util.Source.<init>(Source.java:142)
	at
        org.apache.logging.log4j.core.config.ConfigurationSource.<init>(ConfigurationSource.java:139)
```

So hazarded a guess that a bump in the log4j would solve this. And it
does solve it.

Then profiling some queries against bigquery (just viewing the table) in
the RC2 and the locally built version with the multi-release:

```shell
-- multi-release
2022-07-27 12:28:00,659 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.1 s
2022-07-27 12:28:02,609 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 897.9 ms
2022-07-27 12:28:03,950 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 778.1 ms

-- RC non-multi-release
2022-07-27 12:28:57,633 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.0 s
2022-07-27 12:28:59,343 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 912.9 ms
2022-07-27 12:29:02,328 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 808.6 ms
```
So times seem very similar.

============

Proper benching:

using criterium

```shell
MB_JETTY_PORT=3008 java "$(socket-repl 6008)" -cp "/Users/dan/.m2/repository/criterium/criterium/0.4.6/criterium-0.4.6.jar":0.39.2.jar metabase.core
```

`(bench (log/warn "benching"))`

Summary:
39.2:          21.109470 µs
RC2:           4.975204 µs
multi-release: 7.673965 µs

These flood the consoles with logs

```
Older release: 39.2

user=> (bench (log/warn "benching"))
Evaluation count : 2886240 in 60 samples of 48104 calls.
             Execution time mean : 21.109470 µs
    Execution time std-deviation : 567.271917 ns
   Execution time lower quantile : 20.171870 µs ( 2.5%)
   Execution time upper quantile : 22.429557 µs (97.5%)
                   Overhead used : 6.835913 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 4 (6.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 14.1886 % Variance is moderately inflated by outliers

=============================================

RC2:

user=> (bench (log/warn "benching"))Evaluation count : 12396420 in 60 samples of 206607 calls.
             Execution time mean : 4.975204 µs
    Execution time std-deviation : 521.769687 ns
   Execution time lower quantile : 4.711607 µs ( 2.5%)
   Execution time upper quantile : 6.404317 µs (97.5%)
                   Overhead used : 6.837290 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 2 (3.3333 %)
	low-mild	 3 (5.0000 %)
 Variance from outliers : 72.0600 % Variance is severely inflated by outliers

=============================================

Proposed Multi-Release

user=> (bench (log/warn "benching"))
Evaluation count : 7551000 in 60 samples of 125850 calls.
             Execution time mean : 7.673965 µs
    Execution time std-deviation : 201.155749 ns
   Execution time lower quantile : 7.414837 µs ( 2.5%)
   Execution time upper quantile : 8.138010 µs (97.5%)
                   Overhead used : 6.843981 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 14.1472 % Variance is moderately inflated by outliers

```

`(bench (log/info "benching info"))`

This does not hit a console so is a no-op.

Summary:
39.2:          11.534614 µs
RC2:           98.408357 ns
multi-release: 2.236756 µs

```
=============================================
39.2:

user=> (bench (log/info "benching info"))
Evaluation count : 5223480 in 60 samples of 87058 calls.
             Execution time mean : 11.534614 µs
    Execution time std-deviation : 57.756163 ns
   Execution time lower quantile : 11.461502 µs ( 2.5%)
   Execution time upper quantile : 11.657644 µs (97.5%)
                   Overhead used : 6.835913 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 2 (3.3333 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

=============================================

RC2:

user=> (bench (log/info "benching info"))Evaluation count : 574427220 in 60 samples of 9573787 calls.
             Execution time mean : 98.408357 ns
    pExecution time std-deviation : 1.792214 ns
   Execution time lower quantile : 96.891477 ns ( 2.5%)
   Execution time upper quantile : 103.394664 ns (97.5%)
                   Overhead used : 6.837290 ns

Found 8 outliers in 60 samples (13.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 5 (8.3333 %)
 Variance from outliers : 7.7881 % Variance is slightly inflated by outliers

=============================================

Multi-release:

user=> (bench (log/info "benching info"))Evaluation count : 26477700 in 60 samples of 441295 calls.
             Execution time mean : 2.236756 µs
    Execution time std-deviation : 15.412356 ns
   Execution time lower quantile : 2.212301 µs ( 2.5%)
   Execution time upper quantile : 2.275434 µs (97.5%)
                   Overhead used : 6.843981 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
 ```

* bump graal/js

* Custom MB log factory (#24369)

* Custom MB log factory

* Write stupid code to appease stupid Eastwood

* `ns-name` already calls `the-ns` on its argument.

* More code cleanup

* Improved code

* Remove NOCOMMIT

* empty commit to trigger CI

Co-authored-by: Cam Saul <1455846+camsaul@users.noreply.github.com>

Unverified

1f4bf25f

Cutoff instead of null out nested field column set if there are too many (#24336) · 07aaf1fa

Howon Lee authored 2 years ago

Pursuant to #23635. Previous behavior was to blank out the nested field columns if there were too many for our arbitrary limit (of 100). However, this silent behavior was very user-unfriendly, and people were perplexed that the feature just seemed to turn off out of nowhere. New, better behavior is to log that there are too many and cut it off at 100, but still have some if there are 100.

Unverified

07aaf1fa

Jul 27, 2022

Reverts param in snippets (#24298) · 995cc80b

Ngoc Khuat authored 2 years ago


* revert #23658

* keep the migration to add native_query_snippet.template_tag, add a new migration to drop it

* Remove snippet parameter support in fully parametrized check

Co-authored-by: Tamás Benkő <tamas@metabase.com>

Unverified

995cc80b

Ensure uploaded secrets are stable (#24325) · 73599275

dpsutton authored 2 years ago

* Ensure uploaded secrets are stable

Fixes: https://github.com/metabase/metabase/issues/23034

Background:
Uploaded secrets are stored as bytes in our application db since cloud
doesn't have a filesystem. To make db connections we stuff them into
temporary files and use those files.

We also are constantly watching for db detail changes so we can
recompose the connection pool. Each time you call
`db->pooled-connection-spec` we check if the hash of the connection spec
has changed and recompose the pool if it has.

Problem:
These uploaded files have temporary files and we make new temp files
each time we call `db->pooled-connection-spec`. So the hashes always
appear different:

```clojure
connection=> (= x y)
true
connection=> (take 2
                   (clojure.data/diff (connection-details->spec :postgres (:details x))
                                      (connection-details->spec :postgres (:details y))))
({:sslkey
  #object[java.io.File 0x141b0f09 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_1388256635324085910.tmp"],
  :sslrootcert
  #object[java.io.File 0x6f443fac "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_9248342447139746747.tmp"],
  :sslcert
  #object[java.io.File 0xbb13300 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_17076432929457451876.tmp"]}
 {:sslkey
  #object[java.io.File 0x6fbb3b7b "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_18336254363340056265.tmp"],
  :sslrootcert
  #object[java.io.File 0x6ba4c390 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_11775804023700307206.tmp"],
  :sslcert
  #object[java.io.File 0x320184a0
  "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_10098480793225259237.tmp"]})
```

And this is quite a problem: each time we get a db connection we are
making a new file, putting the contents of the secret in it, and then
considering the pool stale, recomposing it, starting our query. And if
you are on a dashboard, each card will kill the pool of the previously
running cards.

This behavior does not happen with the local-file path because the
secret is actually the filepath and we load that. So the file returned
is always the same. It's only for the uploaded bits that we dump into a
temp file (each time).

Solution:
Let's memoize the temp file created by the secret. We cannot use the
secret as the key though because the secret can (always?) includes a
byte array:

```clojure
connection-test=> (hash {:x (.getBytes "hi")})
1771366777
connection-test=> (hash {:x (.getBytes "hi")})
-709002180
```

So we need to come up with a stable key. I'm using `value->string` here,
falling back to `(gensym)` because `value->string` doesn't always return
a value due to its cond.

```clojure
(defn value->string
  "Returns the value of the given `secret` as a String.  `secret` can be a Secret model object, or a
  secret-map (i.e. return value from `db-details-prop->secret-map`)."
  {:added "0.42.0"}
  ^String [{:keys [value] :as _secret}]
  (cond (string? value)
        value
        (bytes? value)
        (String. ^bytes value StandardCharsets/UTF_8)))
```

Why did this bug come up recently?
[pull/21604](https://github.com/metabase/metabase/pull/21604) gives some
light. That changed
`(hash details)` -> `(hash (connection-details->spec driver details))`

with the message

> also made some tweaks so the SQL JDBC driver connection pool cache is
> invalidated when the (unpooled) JDBC spec returned by
> connection-details->spec changes, rather than when the details map
> itself changes. This means that changes to other things outside of
> connection details that affect the JDBC connection parameters, for
> example the report-timezone or start-of-week Settings will now properly
> result in the connection pool cache being flushed

So we want to continue to hash the db spec but ensure that the spec is
stable.

* typehint the memoized var with ^java.io.File

* Switch memoization key from String to vector of bytes

Copying comment from Github:

When you upload a sequence of bytes as a secret, we want to put them in
a file once and only once and always reuse that temporary file. We will
eventually hash the whole connection spec but I don't care about
collisions there. It's only given the same sequence of bytes, you should
always get back the exact same temporary file that has been created.

So i'm making a function `f: Secret -> file` that given the same Secret
always returns the exact same file. This was not the case before
this. Each uploaded secret would return a new temporary file with the
contents of the secret each time you got its value. So you would end up
with 35 temporary files each with the same key in it.

An easy way to get this guarantee is to memoize the function. But the
secret itself isn't a good key to memoize against because it contains a
byte array.

If the memoization key is the byte-array itself, this will fail because
arrays have reference identity:

```clojure
user=> (= (.getBytes "hi") (.getBytes "hi"))
false
```

So each time we load the same secret from the database we get a new byte
array, ask for its temp file and get a different temp file each time.

This means that memoization cannot be driven off of the byte array. But
one way to gain this back quickly is just stuff those bytes into a
string, because strings compare on value not identity. This is what is
currently in the PR (before this change). I was banking on the
assumption that Strings are just opaque sequences of bytes that will
compare byte by byte, regardless of whether those bytes make sense.

But you've pointed out a good point that maybe that is flirting with
undefined behavior. If we use the hash of the contents of the byte array
as the memoization key with (j.u.Arrays/hashCode array), then we open
ourselves up the (albeit rare) case that two distinct secret values hash
to the same value. This sounds really bad. Two distinct secrets (think
two ssh keys) but both would map to only one file containing a single
ssh key.

An easy way to have the value semantics we want for the memoization is
just to call (vec array) on the byte array and use this sequence of
bytes as the memoization key. Clojure vectors compare by value not
reference. So two secrets would return the same file if and only if the
sequence of bytes are identical, in which case we would expect the files
to be identical. This gives me the same guarantee that I was wanting
from the String behavior I used initially without entwining this with
charsets, utf8, etc.

Unverified

73599275

Mysql filters for JSON columns which are heterogeneous (#24214) (#24268) · 1de4c4a4

Howon Lee authored 2 years ago

Pursuant to #24214. Previously MySQL JSON spunout fields didn't work when the individual fields had heterogeneous contents because type information was missing, unlike where type information was provided in Postgres JSON fields. Now they are provided, along with a refactor to not use reify which was used in Postgres JSON fields because the type operator was really annoying otherwise to add (MySQL type cast is a function qua function).

Unverified

1de4c4a4

Jul 26, 2022
- Preserve false as default value in case clauses (#24274) · 5cf8bf83
  metamben authored 2 years ago
  
  This change makes sure that false as the default value in a case statement is not ignored.
  Unverified
  
  5cf8bf83
- Deflake the Serdes v2 e2e test by ensuring generated inputs are unique (#24320) · 5ca09b8f
  Braden Shepherdson authored 2 years ago
  
  Unverified
  
  5ca09b8f
- Remove zombie `Dependency` model code (#24287) · e734526e
  Cam Saul authored 2 years ago
  
  * Remove the `Dependency` model code * Appease the namespace linter again.
  Unverified
  
  e734526e
Jul 25, 2022

Add a check to PUT /user/:id to disallow name edits if an SSO user (#23752) · 3795b56c

adam-james authored 2 years ago

* Add a check to PUT /user/:id to disallow name edits if an SSO user

* Clean up After SAML SSO tests

The `:sso_source` key is set for the Rasta user in some SAML tests, but is expeted to be nil in subsequent tests, so
we clean up in the SAML test ns.

* Add a test to ensure SSO user names can't be changed via API

* Missed a change I had made while adjusting tests

* valid-name-update? take 1 name, to allow better error messages

Let's the user know that first or last name is the cause of a problem, rather than just 'names'.

* Remove unneeded thread macro

* Use partial=

* slight change to local fn for reusability

Unverified

3795b56c

Ignore fields in joins when hoisting joined fields (#24167) · 0dc38ac3

dpsutton authored 2 years ago

* Ignore fields in joins when hoisting joined fields

Need to understand two sides of mbql to understand this.

```clojure
(defn nest-expressions
  "Pushes the `:source-table`/`:source-query`, `:expressions`, and `:joins` in the top-level of the query into a
  `:source-query` and updates `:expression` references and `:field` clauses with `:join-alias`es accordingly. See
  tests for examples. This is used by the SQL QP to make sure expressions happen in a subselect."
  ...)
```

Make a query against the orders table joined to the people table and
select order id, order total, and the person's email:

```sql
 SELECT "PUBLIC"."orders"."id"    AS "ID",
       "PUBLIC"."orders"."total" AS "TOTAL",
       "People - User"."email"   AS "People - User__EMAIL"
FROM   "PUBLIC"."orders"
       LEFT JOIN "PUBLIC"."people" "People - User"
              ON "PUBLIC"."orders"."user_id" = "People - User"."id"
LIMIT  1048575
```

Now add a custom column called adjective, which is 'expensive' when
total > 100 else 'cheap'
```sql
 SELECT "source"."id"                   AS "ID",
       "source"."total"                AS "TOTAL",
       "source"."adjective"            AS "adjective",
       "source"."people - user__email" AS "People - User__EMAIL"
FROM   (SELECT "PUBLIC"."orders"."id"         AS "ID",
               "PUBLIC"."orders"."user_id"    AS "USER_ID",
               "PUBLIC"."orders"."product_id" AS "PRODUCT_ID",
               "PUBLIC"."orders"."subtotal"   AS "SUBTOTAL",
               "PUBLIC"."orders"."tax"        AS "TAX",
               "PUBLIC"."orders"."total"      AS "TOTAL",
               "PUBLIC"."orders"."discount"   AS "DISCOUNT",
               "PUBLIC"."orders"."created_at" AS "CREATED_AT",
               "PUBLIC"."orders"."quantity"   AS "QUANTITY",
               CASE
                 WHEN "PUBLIC"."orders"."total" > 50 THEN 'expensive'
                 ELSE 'cheap'
               end                            AS "adjective",
               "People - User"."email"        AS "People - User__EMAIL",
               "People - User"."id"           AS "People - User__ID"
        FROM   "PUBLIC"."orders"
               LEFT JOIN "PUBLIC"."people" "People - User"
                      ON "PUBLIC"."orders"."user_id" = "People - User"."id")
       "source"
LIMIT  1048575
```

We put the "work" in a nested query and then just update the outer
select to select `source.total` instead of `orders.total`. But the way
this figured out which joins to hoist up was a bit broken. It walked all
over the inner query, finding fields it was interested in. However, it
also got fields it shouldn't have, by descending into join information
that should be opaque at this level.

In the repro for this, we make a card Q1, selecting product_id and
count, but with an implicit join to products and filtering where
category = doohickey.

```clojure
;; card Q1
{:type :query,
 :query {:source-table 4,     #_ reviews
         :filter [:= [:field 26 {:source-field 33}] "Doohickey"], #_ products.category
         :aggregation [[:count]],
         :breakout [[:field 33 nil]],  #_ reviews.product_id
         :limit 2},
 :database 1}
 ```

A second question Q2 queries the Orders table and joins to Q1 on
orders.product_id = q1.product_id, and adds a custom column `+ 1 1`.
```
;; card Q2, based on Q1
{:source-table 2,
 :expressions {"CC" [:+ 1 1]},
 :fields [[:field 9 nil]
          [:field 3 nil]
          [:field 5 nil]
          [:field 6 nil]
          [:field 8 nil]
          [:field 7 nil]
          [:field 1 nil]
          [:field 4 {:temporal-unit :default}]
          [:field 2 nil]
          [:expression
           "CC"
           {:metabase.query-processor.util.add-alias-info/desired-alias "CC",
            :metabase.query-processor.util.add-alias-info/position 9}]
          [:field 33 nil]
          [:field "count" {:base-type :type/BigInteger}]],
 :joins [{:alias "Question 4918",
          :strategy :left-join,
          :fields [[:field 33 nil]
                   [:field "count" {:base-type :type/BigInteger}]],
          :condition [:=
                      [:field 5 nil]
                      [:field 33 nil]],
          :source-card-id 4918,
          }]}
```

and this should yield the sql:

```sql
SELECT "source"."ID" AS "ID",
       "source"."PRODUCT_ID" AS "PRODUCT_ID",
       "source"."TOTAL" AS "TOTAL",
       "source"."CC" AS "CC",
       "source"."Question 4918__PRODUCT_ID" AS "Question 4918__PRODUCT_ID",
       "source"."Question 4918__count" AS "Question 4918__count"
FROM (
  SELECT "PUBLIC"."ORDERS"."ID" AS "ID",
         "PUBLIC"."ORDERS"."USER_ID" AS "USER_ID",
         "PUBLIC"."ORDERS"."PRODUCT_ID" AS "PRODUCT_ID",
         "PUBLIC"."ORDERS"."SUBTOTAL" AS "SUBTOTAL",
         "PUBLIC"."ORDERS"."TAX" AS "TAX",
         "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL",
         "PUBLIC"."ORDERS"."DISCOUNT" AS "DISCOUNT",
         "PUBLIC"."ORDERS"."CREATED_AT" AS "CREATED_AT",
         "PUBLIC"."ORDERS"."QUANTITY" AS "QUANTITY",
         (1 + 1) AS "CC",
         "Question 4918"."PRODUCT_ID" AS "Question 4918__PRODUCT_ID",
         "Question 4918"."count" AS "Question 4918__count"
  FROM "PUBLIC"."ORDERS"
  LEFT JOIN (
    SELECT "PUBLIC"."REVIEWS"."PRODUCT_ID" AS "PRODUCT_ID",
           count(*) AS "count"
    FROM "PUBLIC"."REVIEWS"
    LEFT JOIN "PUBLIC"."PRODUCTS" "PRODUCTS__via__PRODUCT_ID"
      ON "PUBLIC"."REVIEWS"."PRODUCT_ID" = "PRODUCTS__via__PRODUCT_ID"."ID"
    WHERE "PRODUCTS__via__PRODUCT_ID"."CATEGORY" = 'Doohickey'
    GROUP BY "PUBLIC"."REVIEWS"."PRODUCT_ID"
    ORDER BY "PUBLIC"."REVIEWS"."PRODUCT_ID" ASC) "Question 4918"
  ON "PUBLIC"."ORDERS"."PRODUCT_ID" = "Question 4918"."PRODUCT_ID") "source"
LIMIT 1048575
```

But when it was looking through to see which fields to hoist top level,
it was also finding `"PRODUCTS__via__PRODUCT_ID"."ID"` and
`"PRODUCTS__via__PRODUCT_ID"."CATEGORY"` as fields it should hoist up
because it searched the whole tree underneath it which includes join
conditions and such.

But of course that doesn't matter, the only thing is really analyzing
what fields come out of a query, and those fields do not come out of the
nested query

```sql
    SELECT "PUBLIC"."REVIEWS"."PRODUCT_ID" AS "PRODUCT_ID",
           count(*) AS "count"
    FROM "PUBLIC"."REVIEWS"
    LEFT JOIN "PUBLIC"."PRODUCTS" "PRODUCTS__via__PRODUCT_ID"
      ON "PUBLIC"."REVIEWS"."PRODUCT_ID" = "PRODUCTS__via__PRODUCT_ID"."ID"
    WHERE "PRODUCTS__via__PRODUCT_ID"."CATEGORY" = 'Doohickey'
    GROUP BY "PUBLIC"."REVIEWS"."PRODUCT_ID"
    ORDER BY "PUBLIC"."REVIEWS"."PRODUCT_ID" ASC) "Question 4918"
```

that only returns product_id and count.

And this matches the error it was (helpfully) throwing:

```clojure
investigation=> (qp/compile nested-query)
Execution error (ExceptionInfo) at metabase.query-processor.util.add-alias-info/field-source-table-alias (add_alias_info.clj:182).
Cannot determine the source table or query for Field clause
[:field
 26 #_ products.category
 {:source-field 33, #_reviews.product_id
  :join-alias "PRODUCTS__via__PRODUCT_ID",
  :metabase.query-processor.util.add-alias-info/source-table "PRODUCTS__via__PRODUCT_ID",
  :metabase.query-processor.util.add-alias-info/source-alias "CATEGORY"}]
```

And here's the query it was analyzing when determining which fields it
needed to hoist (looking for ones with `:join-alias` (which is also in
the nest_query_test.clj file):

```clojure
;; removed some cruft and extra field information for brevity
{:source-table 2,
 :expressions  {"CC" [:+ 1 1]},
 :fields       [[:field 33 {:join-alias "Question 4918",}]
                [:field "count" {:join-alias "Question 4918"}]]
 :joins        [{:alias           "Question 4918",
                 :strategy        :left-join,
                 :fields          [[:field 33 {:join-alias "Question 4918"}]
                                   [:field
                                    "count"
                                    {:join-alias "Question 4918"}]]
                 :condition       [:=
                                   [:field 5 nil]
                                   [:field 33 {:join-alias "Question 4918",}]],
                 :source-card-id  4918,
                 :source-query    {:source-table 4,
                                   ;; nested query has filter values with join-alias that should not
                                   ;; be selected
                                   :filter       [:=
                                                  [:field 26 {:join-alias "PRODUCTS__via__PRODUCT_ID"}]
                                                  [:value "Doohickey" {}]],
                                   :aggregation  [[:aggregation-options
                                                   [:count]
                                                   {:name "count"}]],
                                   :breakout     [[:field 33 nil]],
                                   :limit        2,
                                   :order-by     [[:asc
                                                   [:field 33 nil]]],
                                   ;; nested query has an implicit join with conditions that should
                                   ;; not be selected
                                   :joins        [{:alias        "PRODUCTS__via__PRODUCT_ID",
                                                   :strategy     :left-join,
                                                   :condition    [:=
                                                                  [:field 33 nil]
                                                                  [:field
                                                                   30
                                                                   {:join-alias "PRODUCTS__via__PRODUCT_ID"}]]
                                                   :source-table 1,
                                                   :fk-field-id  33}]},
                 :source-metadata [{:field_ref [:field 33 nil]}
                                   {:field_ref [:aggregation 0]}]}]}
```

* Add round-trip test through qp

This test is essentially the repro from the ticket
https://github.com/metabase/metabase/issues/20809

```clojure
debug-qp=> (to-mbql-shorthand (:dataset_query (metabase.models/Card 4919)))
(mt/mbql-query
 orders
 {:joins [{:source-table "card__4918",
           :alias "Question 4918",
           :condition [:=
                       $product_id
                       [:field
                        %reviews.product_id
                        {:join-alias "Question 4918"}]],
           :fields :all}],
  :expressions {"CC" [:+ 1 1]}})
debug-qp=> (to-mbql-shorthand (:dataset_query (metabase.models/Card 4918)))
(mt/mbql-query
 reviews
 {:breakout [$product_id],
  :aggregation [[:count]],
  :filter [:= $product_id->products.category "Doohickey"]})
```

Thanks to @cam for providing such a lovely tool

Unverified

0dc38ac3

Jul 23, 2022
- Enable private key authentication for Snowflake (#24157) · 5a4fdcca
  metamben authored 2 years ago
  
  Unverified
  
  5a4fdcca
Jul 22, 2022
- Handle snippets when determining if a query is fully parametrized (#24069) · cb423767
  metamben authored 2 years ago
  
  Unverified
  
  cb423767
- Serdes v2 for Pulse, PulseCard, PulseChannel (#24200) · 451517e4
  Braden Shepherdson authored 2 years ago
  
  `PulseChannelRecipients` are a `:recipients` field on `PulseChannels`. This requires some careful handling when inserting or updating a `PulseChannel` to upsert the recipients.
  Unverified
  
  451517e4
Jul 21, 2022

revert card param validation changes (#24190) · 33adb2af
Noah Moss authored 2 years ago

Unverified

33adb2af
Add entity_id columns to PulseCard, PulseChannel (#24176) · 8656a74f
Braden Shepherdson authored 2 years ago

Unverified

8656a74f

Fix for #24132: Search returns irrelevant matches for non-native questions (#24133) · 80e3f47e

Cal Herries authored 2 years ago

* Update search query to not match search string tokens with dataset_query column if the question is not native

* Add test

* Rewrite test

* Rename query-card -> mbql-card

* Use the correct dataset_query map

Unverified

80e3f47e

Fix incorrect scoring of search results (#23984) · dd614a18

Cal Herries authored 2 years ago

* Correct `score-and-result` to count the weight of the text score when there is no text match

* Replace nil-punning of scores with explicit checking for zero scores at each step

* Update scoring functions that can return nil to return zero instead

* Address archived search with no search string

* Update e2e test

* Update e2e test with new ordering of results

* Fix dataset search with blank query string

Unverified

dd614a18

Serdes v2 for Segments (#24146) · 82d451e9
Braden Shepherdson authored 2 years ago

Unverified

82d451e9

Jul 20, 2022

Include markdown parameters in dashboard subscriptions (#24070) · b30e2e53

Noah Moss authored 2 years ago

* basic logic for including params in dashboard subscriptions

* BE date formatting

* redo approach to date formatting locale

* refactor and use the same formatting code for parameters listed in subscription header

* fix tests and lint

* fix more tests

* fix final (?) test and fix namespace lint error

* a couple extra test cases

* fix tests

* new cypress test

* address braden and ngoc's comments

* fix tests

* guard against nil text in substitute_tags

Unverified

b30e2e53

Add has_more_values to chain filter endpoints (#23985) · e5cd10fc

Ngoc Khuat authored 2 years ago


* fixing tests and make chain-filter dashboard endpoints returns the flag

* remove using anonymous function in update

* Update dashboard endpoints and `FieldValuesWidget` to handle `has_more_values` scenario (#24022)

* update dashboard action/reducer

* Update FieldValuesWidget

* Update isSearchable to accept map of args

* Add unit tests for various FVW utils

* Split isSearchable logic a little

Co-authored-by: Dalton <daltojohnso@users.noreply.github.com>

Unverified

e5cd10fc

Serdes v2 for Timeline and TimelineEvent (#24122) · d721eada
Braden Shepherdson authored 2 years ago

Unverified

d721eada
Remove `u/key-by` (#24131) · 36107b6e
Cam Saul authored 2 years ago
```
* Remove `u/key-by`

* Fix typo

* SORT THE NAMESPACES
```
Unverified

36107b6e

Jul 18, 2022

Style cleanup and refactoring of serdes v2 (#23978) · 122d0267
Braden Shepherdson authored 2 years ago
```
Cam suggested most of these changes in another PR, and I'm tackling them
here.
```
Unverified

122d0267

Move ddl namespaces under driver package (#23974) · d565fbf5

Case Nelson authored 2 years ago

Fixes #23938

metabase.driver.ddl.postgres should be named metabase.driver.postgres.ddl
metabase.driver.ddl.mysql    should be named metabase.driver.mysql.ddl
metabase.driver.ddl.sql      should be named metabase.driver.sql.ddl

Renamed aliases to match new structure

Unverified

d565fbf5

fix xlsx month bucketing regression (#24026) · 443d4616
Noah Moss authored 2 years ago

Unverified

443d4616

Add snowplow tracking for new records on TaskHistory (#23893) · 8c66c473

Ngoc Khuat authored 2 years ago

* add snowplow tracking for new records on TaskHistory

* more tests and have a better way to figure out db_id, db_engine

Unverified

8c66c473

Jul 15, 2022

Retain source alias for aggregates (#23771) · 6858dcef
metamben authored 2 years ago

Unverified

6858dcef

Check email test errors for javax.mail.AuthenticationFailedException (#23921) · 52336b97

dpsutton authored 2 years ago

* Check email test errors for javax.mail.AuthenticationFailedException

Authing with Office365 led to a less-than-great user experience. If you
type in the wrong username or password, we would display an unhelpful
message:

> Sorry, something went wrong. Please try again. Error::Exception
reading response.Read timed out.

This is an error from an inner nested exception which looks like:
```clojure
(javax.mail.AuthenticationFailedException.
 "" ;; Office365 returns auth exception with no message so we only saw "Read timed out" prior
 (javax.mail.MessagingException.
  "Exception reading response"
  (java.net.SocketTimeoutException. "Read timed out")))
```

We didn't get any message from the AuthenticationFailedException so our
humanization strategy only went on the underlying "Read timed out". Now
we can check against the error class and use that information.

* Little bit of clarity

better function name (`check` -> `match-error`), better arg name (
`regex|exception-class` -> `regex-or-exception-class`), full argument
names `m` -> `message` and `es` -> `exceptions`

Unverified

52336b97

Handle email password obfuscation (#23925) · 5e49508d

dpsutton authored 2 years ago

The frontend receives obfuscated values for settings marked sensitive:

```clojure
(defsetting email-smtp-password
  (deferred-tru "SMTP password.")
  :sensitive? true)
```

and over the wire:
```javascript
    {
        "key": "email-smtp-password",
        "value": "**********ig",
        "is_env_setting": false,
        "env_name": "MB_EMAIL_SMTP_PASSWORD",
        "description": "SMTP password.",
        "default": null
    },
```

So the frontend form never has a valid password in it. When you edit
email settings, we check if those are valid, and if so, commit the
settings. But with the wrong password email settings are almost always
wrong. You have to re-type the email password just to change other
settings, like the friendly name, reply-to email address, etc.

So we recognize that we've received the obfuscated value, swap in the
real value for the email connection test, and then obfuscate the
password in the response. If we do not receive an obfuscated value, just
leave it alone and return the value anyways (they typed it in, so seems
safe to send it back).

Note that there is a function in models.setting called
`obfuscated-value?` that checks strings against a regex of
`#"^\*{10}.{2}$"`. This is used to never set settings to an obfuscated
value. But its a bit less sensitive than our purposes need. We have a
real value and an obfuscated value so we can check if the obfuscated
value is based on the real value. (obfuscate-value "password") ->
"**********rd" . Whereas we might recognize "**********AA" as the
obfuscated value if we reused that helper function.

NOTE this could be weird if anyone's password changes from "password" to
"**********rd" but the chances of that happening are miniscule.

Had thought to have the FE not send a value at all if it is unchanged
but this had some far-reaching implications that we don't want to tackle
so close to the release. There's an associated issue for LDAP settings
that is largely the same as this:
https://github.com/metabase/metabase/issues/16708 But it is not clear to
me if these are the only two places or if there could be others. Until
that research is done we can just accept this patch as is and come up
with a systematic approach in the future.

But it does seem only three settings are sensitive:
```clojure
setting=> (->> (vals @registered-settings)
               (filter :sensitive?)
               (map :name))
(:saml-keystore-password :email-smtp-password :ldap-password)
```

The direct setting apis won't write setting values which appear as
obfuscated, but we have other endpoints that take collections of
settings as a unit (for instance, the endpoint /api/email that takes all
of the email settings here to test if they work together).

Unverified

5e49508d

Async card metadata (#23672) · 21aa0053

dpsutton authored 2 years ago


* Lets use the main thread

* Strip out channel stuff and rename

* 202 -> 200 response

When returning a channel we return a 202. A map is just a 200. Since we
no longer need to have the main stuff async (as opposed to the metadata
stuff) we can just return the map with a 200 instead of this long
running channel stuff and a 202.

* Last test

* renames, logging, ensure query is the same before saving metadata

* Sandbox test 202 -> 200

* Another 202 -> 200

* Put timeout on async metadata saving

timeout of 15 minutes before we give up on the async metadata saving. It
is possible this cuts things off but hard to tell if work is still being
done at that point.

* outdated comment

* Return json error message, not text/plain

this is a subtle one that I'm not happy about. Our error handling will
return text/plain if you throw an `(ex-info "something" {:status-code
400})`.

```shell
❯ http post localhost:3000/api/timeline-event/ name=some-name description=Bob timestamp=2022 timezone=America/Central time_matters:=false timeline_id:=1629  Cookie:$COOKIE
HTTP/1.1 404 Not Found
Content-Type: text/plain

Timeline with id 1,629 not found
```

But if you add extra information to the map to the ex-info, you get
json!

```clojure
(defmethod api-exception-response Throwable
  [^Throwable e]
  (let [{:keys [status-code], :as info} (ex-data e)
        other-info                      (dissoc info :status-code :schema :type)
        body                            (cond
                                          (and status-code (empty? other-info))
                                          ;; If status code was specified but other data wasn't, it's something like a
                                          ;; 404. Return message as the (plain-text) body.
                                          (.getMessage e)

                                          ;; if the response includes `:errors`, (e.g., it's something like a generic
                                          ;; parameter validation exception), just return the `other-info` from the
                                          ;; ex-data.
                                          (and status-code (:errors other-info))
                                          other-info

                                          ;; Otherwise return the full `Throwable->map` representation with Stacktrace
                                          ;; and ex-data
                                          :else
                                          (merge
                                           (Throwable->map e)
                                           {:message (.getMessage e)}
                                           other-info))]
    {:status  (or status-code 500)
     :headers (mw.security/security-headers)
     :body    body}))
```

So this fix is a _very_ subtle way to get to what we want, although it
does add a bunch of extra junk to our response.

```javascript
{
 ...
 "message": "Invalid Field Filter: Field 26 \"PRODUCTS\".\"CATEGORY\"
belongs to Database 1 \"Sample Database\", but the query is against
Database 990 \"copy of sample dataset\"",
 "data": {
   "status-code": 400,
   "query-database": 990,
   "field-filter-database": 1}
 ...
}
```

Reminder of what we want: the frontend is saving a card. We added a
field filter on a database and then changed the source database of the
query. So the backend needs to reject the field filter as being on the
wrong db. The FE expects a json response with a message and will then
show that message in the save/edit modal.

Why did this come up now: these endpoints for saving and editing a card
used to always return 202 streaming responses. This means they would
have to tuck any errors inside of an already open 202 response. Which is
why you get a message as a json response. But now that they are sync,
the api can just return a proper 400 with a reason. But we still want
that to be a json response for the FE.

* error layout fix

* Several Cleanups

- make sure numbers have units at the end (-ms)
- use (u/minutes->ms 15) rather than a more opaque (* 15 60 1000)
- move the scheduled metadata saving into its own function

This last bit is actually a bit more than that. I was previously
throwing everything in a thread submitted to the pooled executor. I'm
now using a `future` but with a caveat: All of the waiting for the
timeout, checking if we got metadata is done in a simple `a/go` block
now, and the thread does just the last bit of IO. We have to select the
card as it is now to ensure the query is the same and then save.

Refresher on why we have to check if the query is the same. We have up
to 15 minutes to wait for the query metadata to come back. Completely
possible for them to have edited the query and then our metadata is
useless.

Co-authored-by: Aleksandr Lesnenko <alxnddr@gmail.com>

Unverified

21aa0053

Fix constantly checking token on error (#23815) · e7b4dc62

dpsutton authored 2 years ago

* Fix constantly checking token on error

We were catching `clojure.lang.ExceptionInfo` which only catches a small
subset of errors and not any related to what might be thrown in network
calls.

The reason this affected us is that the cache will cache this value but
keep trying what is throwing the error. So each time we look at token
features it attempts again and throws an error.

If we instead do what the code _meant_ to do and we return a value
indicating there was an error, we are all good.

Example:

```clojure
premium-features=> (defn is-odd? [x]
                     (println "computing for " x)
                     (if (odd? x) true (throw (ex-info "boom" {}))))
premium-features=> (def modd (memoize/ttl is-odd? :ttl/threshold 30000))
premium-features=> (modd 3)
computing for  3
true
premium-features=> (modd 5)
computing for  5
true
premium-features=> (modd 3)
true
premium-features=> (modd 2)
computing for  2
Execution error (ExceptionInfo) at metabase.public-settings.premium-features/is-odd? (REPL:289).
boom
premium-features=> (modd 2)
computing for  2
Execution error (ExceptionInfo) at metabase.public-settings.premium-features/is-odd? (REPL:289).
boom
```

Note that `(modd 2)` keeps attempting to compute for 2 since it throws
an error rather than returning a value indicating an error.

When the token check throws an error:

```clojure
premium-features=> (fetch-token-status (apply str (repeat 64 "b")))
Execution error (UnknownHostException) at java.net.Inet6AddressImpl/lookupAllHostAddr (Inet6AddressImpl.java:-2).
store.metabase.com: nodename nor servname provided, or not known
```

When the token check returns a value indicating an error:

```clojure
premium-features=> (fetch-token-status (apply str (repeat 64 "b")))
{:valid false,
 :status "Unable to validate token",
 :error-details "store.metabase.com: nodename nor servname provided, or not known"}
```

(both instances were with wifi turned off locally)

* add test

* Assert we only call the api once when it throws an error

before this PR we would make a call for each setting (perhaps
more). Each time trying to hit the endpoint. Now it catches those errors
and reports not a valid token

Unverified

e7b4dc62

Store linked-filter fieldvalues (#23699) · e8d98d95

Ngoc Khuat authored 2 years ago

* first pass storing linked-filter fieldvalues

* limit linked-filter to not explode

* no check perms for fields when getting params values on dashboard

Unverified

e8d98d95

Jul 14, 2022

Fix for #20716: "Our analytics" can be confusing when user does not have... · d06729e1

Cal Herries authored 2 years ago

Fix for #20716: "Our analytics" can be confusing when user does not have access to that collection (#23887)

* Change collection endpoints to not return root collection if user doesn't have Root permission

* hide our analytics collection in saved question picker when users do not have access

* Fix tests

Co-authored-by: Aleksandr Lesnenko <alxnddr@gmail.com>

Unverified

d06729e1

Update email-to test with from from-name setting (#23944) · 72885b78
Cal Herries authored 2 years ago

Unverified

72885b78
Serdes v2 for Native Query Snippets (#23961) · 457e172d
Braden Shepherdson authored 2 years ago
```
Also standardizes date/time output to `ZonedDateTime`, rather than whatever the JDBC happens to return.
```
View commits for tag rm rm Unverified

457e172d

Jul 13, 2022

Session activity timeout (#23349) · 0defe7f8

Cal Herries authored 2 years ago


* logout when session expires, login when session appears

* add setting UI

* Add last_activity column to session table

* Start implementing session middleware to check for expired sessions

* Change last_activity field to include timezone offset

* Update session middleware to check user activity timeout

* Update last_activity after checking the timeout, or not at all if the setting is nil

* Move session-timeout settings to server.middleware.session

* Handcode timeout for testing

* Fix migrations validation error

* Fix whitespace

* Change session timeout to use metabase.TIMEOUT cookie with expiry

* Remove migration for last_activity column on session table

* Revert changes to logout endpoint

* Revert change to Session model pre-update

* Remove tap>

* Fix tests to include cookie value

* Fix timeout when user is logged out. Timeout loop should only start when a user is logged in

* Update comment and date format

* Store the session-timeout setting as json and convert it to seconds on the fly

* Set zoned date time to use GMT instead of default time zone

* Refactor for testing

* refactor session listener (#23686)

* remove old session listener

* Clear the timeout cookie when user signs out

* Clear session cookie if the timeout cookie expires

* fe tweaks

* Update expires attribute for session and timeout cookies together

* Reapply minimum limit on session-timeout

* Rename functions and fix lint warnings

* Fix resetting session-timeout

* Fix sign out

* Fix tests

* Whitespace

* Get full-app-embeds working

* Add test for embedded session

* session timeout ui tweaks

* fix security issue

* Fix test

* Fix tests

* Do not redirect to "/" if there isn't any redirect URL

* Add test for session-cookies setting

* Fix bug when toggling off timeout and adjust tests

Co-authored-by: Aleksandr Lesnenko <alxnddr@gmail.com>
Co-authored-by: Aleksandr Lesnenko <alxnddr@users.noreply.github.com>

Unverified

0defe7f8