This project is mirrored from https://github.com/metabase/metabase.
Pull mirroring updated.
- Aug 05, 2022
dpsutton authored
* Fix in-memory logger

  Our Admin > Troubleshooting > Logs page broke, just showing a spinner and never showing the logs. I don't quite understand why this fixes it. Javadocs are https://logging.apache.org/log4j/2.x/log4j-api/apidocs/org/apache/logging/log4j/LogManager.html#getContext-boolean-

  ```clojure
  logger=> (log/warn "test")
  nil
  logger=> (count @messages*)
  0 ;; no in-memory logs so page is empty

  ;; change `(LogManager/getContext true)` in the memoized ns-logger fn
  ;; and then observe:
  logger=> (log/warn "test")
  nil
  logger=> (count @messages*)
  4
  ```

  Some explorations that might shine some light:

  ```clojure
  logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext false) (str *ns*))))
  {}
  logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext true) (str *ns*))))
  {"metabase-appender" #object[metabase.logger.proxy$org.apache.logging.log4j.core.appender.AbstractAppender$ff19274a "0x4d680247" "metabase-appender"]}
  ```

  So something is not hooked up quite right.

* Add tests for metabase.logger
* Ensure `api/util/logs` returns logs
* tests
* Check for presence of `MetabaseLoggerFactory` rather than the whole string

  When in a namespace with a record, `Foo` resolves to `ns.Foo`. But outside it, it resolves to `ns/Foo`. When running tests from the command line, `*ns*` is `user`, so it gets more complicated.

* Kinda playing whack-a-mole™ with `(LogManager/getContext true)`
* Remove custom memoizing logger

  History: in 39.2 we set `Multi-Release: true` in our manifest file and query speed dropped like a stone. Jeff was able to track this down to our logger calls in tight loops. We reverted the multi-release setting and kept seeing this warning on startup:

  > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.

  Benchmarking on 39.2: `(bench (log/trace "hi"))` -> 15383 ns. So we freaked out and set up a memoizing logger factory: `(bench (log/trace "hi"))` -> 141 ns. What a win. But we never noticed that the default `*logger-factory*` being picked up was slf4j (`(log.impl/name log/*logger-factory*)` -> "org.slf4j"). On 39.2, if you set the factory to the log4j2 version you get back to a great number: `(bench (log/trace "hi"))` -> 25 ns. And thus ensuring that our logger is the default log4j2 version is even faster than our memoizing logger factory:

  - Memoizing factory: 141 ns
  - slf4j factory: 2269 ns
  - log4j2 factory: 31 ns

  What does `(LogManager/getContext false)` mean versus using `true`? We only need and want a single context. But log4j2 by default uses a context selector called `ClassLoaderContextSelector`. We could put all of this behind us if we used a context selector of type `BasicContextSelector`, but this is surprisingly hard to do: you have to set a system property, and since all of this stuff gets initialized in static initializers, we don't have an opportunity to do it programmatically. The way around that is to require people to pass the system property on startup, which is not acceptable. So `getContext true` checks for a thread-local context in a specific static variable and falls back to a default context, while `getContext false` looks at classloaders and ends up at a different context. BUT: the log.impl version closes over `getContext false` instead of looking the context up each time, and I suspect that when it does this there is only one, so it is the default and it continues to use that one. In our LoggerFactory implementation we were looking up the context each time.

  This still seems to work, and everything is playing nice in our application classloader, but it's totally possible that our drivers are not hitting this. I'll have to investigate.

  Verification:
  - build the uberjar locally (`bin/build`)
  - copy to some temp directory and also copy criterium.jar

  ```shell
  MB_JETTY_PORT=4000 java "$(socket-repl 4001)" -cp locally-built.jar:criterium.jar metabase.core
  ```

  ```clojure
  /tmp/locally-built via v17.30 on metabase-query
  ❯ nc localhost 4001
  user=> (doto 'metabase.logger require in-ns)
  metabase.logger
  metabase.logger=> (require '[criterium.core :refer [bench]])
  nil
  metabase.logger=> (bench (log/trace "hi"))
  Evaluation count : 1686535500 in 60 samples of 28108925 calls.
               Execution time mean : 22.487972 ns
      Execution time std-deviation : 0.101004 ns
     Execution time lower quantile : 22.326806 ns ( 2.5%)
     Execution time upper quantile : 22.648368 ns (97.5%)
                     Overhead used : 6.924761 ns
  nil
  metabase.logger=> (count (messages))
  358
  ```

  This verifies that we are on the order of 22 nanoseconds and that the in-memory logger has messages in it.

* Appease our namespace linters
* I'll unplug you, ns linter
* Better tests and ns docstring
* Bootstrap to set system properties (sketched below)

  New entrypoint for the application: metabase.bootstrap sets two properties for logging (context selector, log4j2 factory) and ensures those properties are set before any logging code is loaded.

* docstrings and clean ns
* metabase.logger ns docstring cleanup
* docstring
* rename a test now that there's no memoization
* add logger properties to :dev profile
* Revert "add logger properties to :dev profile"

  This reverts commit 4f09fa3b631f882a3c5edcab4508769ffb20d4fa.

* deps
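A minimal sketch of what such a bootstrap entrypoint can look like. This is hedged: the property names below are the standard log4j2 and clojure.tools.logging ones, which match the PR's description but may not be its exact code (older log4j2 versions spell the selector property `Log4jContextSelector`).

```clojure
(ns metabase.bootstrap-sketch
  "Hedged sketch: set logging-related system properties before any namespace
  that touches clojure.tools.logging or log4j2 gets loaded."
  (:gen-class))

(defn -main [& args]
  ;; BasicContextSelector => a single global LoggerContext, so the
  ;; getContext true/false distinction stops mattering
  (System/setProperty "log4j2.contextSelector"
                      "org.apache.logging.log4j.core.selector.BasicContextSelector")
  ;; force clojure.tools.logging to use the fast log4j2 factory, not slf4j
  (System/setProperty "clojure.tools.logging.factory"
                      "clojure.tools.logging.impl/log4j2-factory")
  ;; only now load the real entrypoint, after the properties are in place
  (require 'metabase.core)
  (apply (resolve 'metabase.core/-main) args))
```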
-
- Aug 04, 2022
metamben authored
-
metamben authored
-
Ryan Laurie authored
* Dashboard Card String Filters are now case-insensitive

  This PR is a draft because while it solves the problem of string filters being case-sensitive, it doesn't necessarily do it in the best way. This isn't necessarily a bug, but it seems that there is no way for the frontend to set the :case-sensitive true/false option anyway.

  For the purposes of this initial attempt at a solution, I have modified the endpoint `POST "/:dashboard-id/dashcard/:dashcard-id/card/:card-id/query"` to automatically include an option map containing :case-sensitive false. The machinery to take this option into consideration already exists with the default :sql ->honeysql function in the `metabase.driver.sql.query-processor` namespace; see the `like-clause` private function in particular.

  Since the query processor is a bit opaque to me at present, I was unable to figure out whether there is a proper way for the frontend to send an options map (or key-value pair) all the way through the qp middleware to the query-building stage. I discovered that if you conj a map `{:case-sensitive false}` onto the output of the `to-clause` function in `metabase.driver.common.parameters.operators`, you get the desired case-insensitive behavior. So, I modified the to-clause function in this PR to conj an options map on if one exists.

  What I'd like to know:
  - is there a super-simple way to pass an option in already that I just missed? (e.g. I thought perhaps in `[:field 13 nil]` the `nil` could be an options map, but I couldn't get that to work for me)
  - is there a middleware approach that I should consider?
  - any other options to appropriately handle this?

* Revert the endpoint. If the frontend sends an options map on an options key inside the parameter, this endpoint will pass that on, so no change is needed.
* include parameter options in datasetQuery

Co-authored-by: Adam James <adam.vermeer2@gmail.com>
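For context, a hedged sketch of where such an options map sits on an MBQL string-filter clause (the field ID 13 echoes the example above; the exact clause the QP builds may differ):

```clojure
;; without options: the SQL QP's like-clause defaults to a case-sensitive LIKE
[:contains [:field 13 nil] "widget"]

;; with a trailing options map conj'd on: like-clause sees
;; {:case-sensitive false} and compares both sides lowercased
[:contains [:field 13 nil] "widget" {:case-sensitive false}]
```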
-
Cam Saul authored
* Fix SSH tunnel leaks when testing connections
* Appease kondo =(
* Add some more NOTES
-
Braden Shepherdson authored
These fields hold JSON, which can contain field IDs in a few places. Particularly nasty is the `:column_settings` subfield, which is a map whose *keys* are JSON strings with field IDs.
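A hedged illustration of why that is nasty. The field ID 42 and the settings map are hypothetical, but the shape matches the description: the map key is itself a JSON string embedding a field reference, so a naive walk of the parsed JSON never sees the ID.

```clojure
;; parsed :visualization_settings fragment (illustrative values)
{:column_settings
 {"[\"ref\",[\"field\",42,null]]"   ; <- the *key* is a JSON string containing field ID 42
  {:column_title "Total"}}}
```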
-
dpsutton authored
If you click "view sql" on a cached model you see the substituted query rather than the "real" query.

The query based on the cache:

```sql
SELECT "source"."login" AS "login", "source"."count" AS "count"
FROM (select "login", "count" from "metabase_cache_e449f_19"."model_4112_issue_assi") "source"
LIMIT 1048575
```

The real underlying query:

```sql
SELECT "source"."login" AS "login", "source"."count" AS "count"
FROM (SELECT "github_raw"."issue_events__issue__assignees"."login" AS "login", count(*) AS "count"
      FROM "github_raw"."issue_events__issue__assignees"
      GROUP BY "github_raw"."issue_events__issue__assignees"."login"
      ORDER BY "count" DESC, "github_raw"."issue_events__issue__assignees"."login" ASC) "source"
LIMIT 1048575
```

If you click "convert to sql question" you end up with a sql question based not on the original question but on the Metabase-managed cache table.
-
Ngoc Khuat authored
-
- Aug 03, 2022
Noah Moss authored
* set CookieSpecs/STANDARD on Apache HttpClient for Snowplow
* add a comment in the code pointing to the PR
* set cookie spec via RequestConfig instead (sketched below)
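A hedged sketch of that RequestConfig approach using the Apache HttpClient 4.x API (how the client is handed to the Snowplow tracker is omitted here):

```clojure
(ns snowplow-client-sketch
  (:import (org.apache.http.client.config CookieSpecs RequestConfig)
           (org.apache.http.impl.client HttpClients)))

;; RFC 6265-compliant cookie handling, instead of the obsolete default spec
(def request-config
  (.. (RequestConfig/custom)
      (setCookieSpec CookieSpecs/STANDARD)
      (build)))

(def http-client
  (.. (HttpClients/custom)
      (setDefaultRequestConfig request-config)
      (build)))
```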
-
Howon Lee authored
* rewrite here...
* no bq
* thinking
* conjunction of alias thing...
* make sure we dont wanna see it
* nit
* multi-arity func
* fiddly bit on test
* add bit to see order by works right
* no default breakout true
-
Braden Shepherdson authored
This PR handles the other JSON-encoded MBQL snippets I was able to find. These snippets contain `Table` and `Field` IDs, and so are not portable. These fields are expanded during serialization, with the IDs replaced by portable references, then converted back in deserialization.

Note that the referenced field must already be loaded before it has a valid ID. `serdes-dependencies` defines this order; therefore each entity depends on the tables and fields referenced in its MBQL fragments.

The complete set of fields I found to convert:
- `Metric.definition`
- `Segment.definition`
- `DashboardCard.parameter_mappings`
- `Card.parameter_mappings`
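A hedged sketch of the export-side rewrite this describes. `field-id->portable` is a hypothetical lookup from a numeric ID to a portable reference (e.g. a fully-qualified name path); the real serdes code differs in detail.

```clojure
(require '[clojure.walk :as walk])

;; Walk an MBQL fragment and swap numeric field IDs for portable references.
;; Deserialization does the inverse, which is why the referenced fields must
;; already be loaded (and have real IDs) first.
(defn export-mbql [field-id->portable mbql]
  (walk/postwalk
   (fn [form]
     (if (and (vector? form)
              (= :field (first form))
              (integer? (second form)))
       (assoc form 1 (field-id->portable (second form)))
       form))
   mbql))
```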
-
dpsutton authored
* Remove deprecated friend library

  - friend has two functions we used: bcrypt and bcrypt-verify. Easy to lift them into our own namespace with attribution
  - uses simple interop on org.mindrot.jbcrypt.BCrypt to achieve these (see the sketch below)
  - friend also brings in other stuff we don't need:

  ```
  com.cemerick/friend 0.2.3
    X org.mindrot/jbcrypt 0.3m :use-top              <- all we care about
    X org.clojure/core.cache 0.6.3 :superseded
    X org.clojure/data.priority-map 0.0.2 :parent-omitted
    . org.openid4java/openid4java-nodeps 0.9.6
      X commons-logging/commons-logging 1.1.1 :older-version
      . net.jcip/jcip-annotations 1.0
      . com.google.inject/guice 2.0
        . aopalliance/aopalliance 1.0
  ```

  And we already declare a dependency on 0.4 of this lib:

  ```
  org.mindrot/jbcrypt 0.4
  ```

  This means we can remove openid4java, google.inject/guice, aopalliance, etc. and just keep using the same `BCrypt` java class we have been using this whole time. Behavior and classfiles are identical, so this is very low risk.

  I want to call out one use of:

  ```clojure
  (when-not api/*is-superuser?*
    (api/checkp (u.password/bcrypt-verify (str (:password_salt user) old_password) (:password user))
      "old_password"
      (tru "Invalid password")))
  ```

  This has the same signature as an existing function, `u.password/verify-password`:

  ```clojure
  (defn verify-password
    "Verify if a given unhashed password + salt matches the supplied hashed-password. Returns `true` if matched,
    `false` otherwise."
    ^Boolean [password salt hashed-password]
    ;; we wrap the friend/bcrypt-verify with this function specifically to avoid unintended exceptions getting out
    (boolean (u/ignore-exceptions
               (bcrypt-verify (str salt password) hashed-password))))
  ```

  I did not replace it in this PR so that the diff is essentially `creds/<fn>` -> `u.password/<fn>` and it is very easy to structurally see what is going on. But it totally makes sense to clean up the usages of these in another pass.

* sort ns
* simple tests
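Roughly what lifting those two functions looks like as plain interop. A hedged sketch, not the exact code (friend's versions also accepted an optional work factor, omitted here):

```clojure
(ns password-sketch
  "Hedged sketch of the two friend functions we keep, as plain BCrypt interop."
  (:import org.mindrot.jbcrypt.BCrypt))

(defn hash-bcrypt
  "Hash `password` with a freshly generated salt."
  ^String [^String password]
  (BCrypt/hashpw password (BCrypt/gensalt)))

(defn bcrypt-verify
  "True if `password` matches the bcrypt `hashed` value."
  [^String password ^String hashed]
  (BCrypt/checkpw password hashed))
```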
-
- Aug 02, 2022
Braden Shepherdson authored
These snippets contain raw `Database`, `Table`, and `Field` IDs, and so are not portable. This PR proves out the concept for `Card.dataset_query`, paving the way for the others in a follow-up PR.
-
Howon Lee authored
Enforce bigint for all int fields derived from JSON instead of doing a more complete refactor (#24494)

Pursuant to #22732. Should basically eliminate it with prejudice: from our point of view, any int derived from JSON in postgres should fit in a bigint.
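A hedged, purely illustrative sketch of the policy (hypothetical helper, not the PR's code):

```clojure
;; When deriving a database type for a JSON-sourced column, never guess a
;; 32-bit INTEGER; always use BIGINT so larger values still fit.
(defn json-value->db-type [v]
  (cond
    (integer? v) "BIGINT"            ; previously small samples could imply INTEGER
    (float? v)   "DOUBLE PRECISION"
    (string? v)  "TEXT"))
```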
-
- Aug 01, 2022
Ngoc Khuat authored
* correct format datetime property for snowplow
* some docs and comments
-
- Jul 29, 2022
Noah Moss authored
-
- Jul 28, 2022
Noah Moss authored
-
dpsutton authored
* Try multi-release true again in our manifest

  Problem statement: Luiz packs our partner jars (exasol, starburst, etc.) into our jar so they can be "first class" and in cloud. But with the 44 cycle we've run into some issues:

  ```shell
  /tmp/j via v17.30 on metabase-query
  ❯ jar uf 0.44.0-RC1.jar modules/*.jar

  ❯ java --version
  openjdk 11.0.14.1 2022-02-08
  OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
  OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)

  /tmp/j via v11.0.14.1 on metabase-query
  ❯ jar uf 0.44.0-RC1.jar modules/*.jar
  java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
      at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091)
      at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195)
      at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147)
      at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553)
      at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083)
      at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017)
      at jdk.jartool/sun.tools.jar.Main.run(Main.java:366)
      at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680)
  ```

  Diogo tracked this down with some great sleuthing to an upgrade in our graal/js engine from "22.0.0.2" -> "22.1.0". This brought along the transitive truffle jar (which is the actual engine powering the js engine). 22.0.0.2 was technically a multi-release jar, but it only included java 11 sources. 22.1.0 added java 17 sources in addition to the java 11 ones, and this proves fatal to using the `jar` command. When `"Multi-Release"` is set to true, the tool knows to look only at the versions it will need. Lacking it, it looks at all of the classes, and the class version for java 17 (61.0) is higher than it knows how to understand, so it breaks.

  Obvious solution: set Multi-Release to true. We have done this in the past. On startup we have a message logged:

  > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.

  Setting multi-release can remove this. But when we did that we ended up with:
  - https://github.com/metabase/metabase/issues/16380
  - https://github.com/metabase/metabase/pull/17027

  That issue describes query slowdowns on the order of 0.6 seconds -> 1.3 seconds, almost doubling. People reported dashboards timing out. Jeff tracked this down to:

  > Profiling revealed that the calls to Log4jLoggerFactory.getLogger became much slower between the two versions. See attached screenshots.

  And this is a pernicious problem that we cannot easily test for.

  Let's try again: I've set multi-release to true and built a jar with `bin/build`. I immediately ran into problems:

  ```shell
  ❯ MB_DB_CONNECTION_URI="postgres://user:pass@localhost:5432/compare " MB_JETTY_PORT=3007 java "$(socket-repl 6007)" -jar multi-release-local.jar
  Warning: protocol #'java-time.core/Amount is overwriting function abs
  WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs
  WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs
  Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
  Exception in thread "main" java.lang.Error: Circular loading of installed providers detected
      at java.base/java.nio.file.spi.FileSystemProvider.installedProviders(FileSystemProvider.java:198)
      at java.base/java.nio.file.Path.of(Path.java:206)
      at java.base/java.nio.file.Paths.get(Paths.java:98)
      at org.apache.logging.log4j.core.util.Source.toFile(Source.java:55)
      at org.apache.logging.log4j.core.util.Source.<init>(Source.java:142)
      at org.apache.logging.log4j.core.config.ConfigurationSource.<init>(ConfigurationSource.java:139)
  ```

  So I hazarded a guess that a bump of log4j would solve this, and it does. Then I profiled some queries against bigquery (just viewing the table) in the RC2 and in the locally built version with multi-release:

  ```shell
  -- multi-release
  2022-07-27 12:28:00,659 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.1 s
  2022-07-27 12:28:02,609 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 897.9 ms
  2022-07-27 12:28:03,950 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 778.1 ms

  -- RC non-multi-release
  2022-07-27 12:28:57,633 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.0 s
  2022-07-27 12:28:59,343 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 912.9 ms
  2022-07-27 12:29:02,328 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 808.6 ms
  ```

  So times seem very similar.

  Proper benching, using criterium:

  ```shell
  MB_JETTY_PORT=3008 java "$(socket-repl 6008)" -cp "/Users/dan/.m2/repository/criterium/criterium/0.4.6/criterium-0.4.6.jar":0.39.2.jar metabase.core
  ```

  `(bench (log/warn "benching"))`, which floods the console with logs. Summary:

  - 39.2: 21.109470 µs
  - RC2: 4.975204 µs
  - multi-release: 7.673965 µs

  ```
  Older release: 39.2
  user=> (bench (log/warn "benching"))
  Evaluation count : 2886240 in 60 samples of 48104 calls.
               Execution time mean : 21.109470 µs
      Execution time std-deviation : 567.271917 ns
     Execution time lower quantile : 20.171870 µs ( 2.5%)
     Execution time upper quantile : 22.429557 µs (97.5%)
                     Overhead used : 6.835913 ns

  Found 5 outliers in 60 samples (8.3333 %)
      low-severe 4 (6.6667 %)
      low-mild   1 (1.6667 %)
   Variance from outliers : 14.1886 % Variance is moderately inflated by outliers

  =============================================
  RC2:
  user=> (bench (log/warn "benching"))
  Evaluation count : 12396420 in 60 samples of 206607 calls.
               Execution time mean : 4.975204 µs
      Execution time std-deviation : 521.769687 ns
     Execution time lower quantile : 4.711607 µs ( 2.5%)
     Execution time upper quantile : 6.404317 µs (97.5%)
                     Overhead used : 6.837290 ns

  Found 5 outliers in 60 samples (8.3333 %)
      low-severe 2 (3.3333 %)
      low-mild   3 (5.0000 %)
   Variance from outliers : 72.0600 % Variance is severely inflated by outliers

  =============================================
  Proposed Multi-Release:
  user=> (bench (log/warn "benching"))
  Evaluation count : 7551000 in 60 samples of 125850 calls.
               Execution time mean : 7.673965 µs
      Execution time std-deviation : 201.155749 ns
     Execution time lower quantile : 7.414837 µs ( 2.5%)
     Execution time upper quantile : 8.138010 µs (97.5%)
                     Overhead used : 6.843981 ns

  Found 1 outliers in 60 samples (1.6667 %)
      low-severe 1 (1.6667 %)
   Variance from outliers : 14.1472 % Variance is moderately inflated by outliers
  ```

  `(bench (log/info "benching info"))`, which does not hit a console and so is a no-op. Summary:

  - 39.2: 11.534614 µs
  - RC2: 98.408357 ns
  - multi-release: 2.236756 µs

  ```
  =============================================
  39.2:
  user=> (bench (log/info "benching info"))
  Evaluation count : 5223480 in 60 samples of 87058 calls.
               Execution time mean : 11.534614 µs
      Execution time std-deviation : 57.756163 ns
     Execution time lower quantile : 11.461502 µs ( 2.5%)
     Execution time upper quantile : 11.657644 µs (97.5%)
                     Overhead used : 6.835913 ns

  Found 3 outliers in 60 samples (5.0000 %)
      low-severe 2 (3.3333 %)
      low-mild   1 (1.6667 %)
   Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

  =============================================
  RC2:
  user=> (bench (log/info "benching info"))
  Evaluation count : 574427220 in 60 samples of 9573787 calls.
               Execution time mean : 98.408357 ns
      Execution time std-deviation : 1.792214 ns
     Execution time lower quantile : 96.891477 ns ( 2.5%)
     Execution time upper quantile : 103.394664 ns (97.5%)
                     Overhead used : 6.837290 ns

  Found 8 outliers in 60 samples (13.3333 %)
      low-severe 3 (5.0000 %)
      low-mild   5 (8.3333 %)
   Variance from outliers : 7.7881 % Variance is slightly inflated by outliers

  =============================================
  Multi-release:
  user=> (bench (log/info "benching info"))
  Evaluation count : 26477700 in 60 samples of 441295 calls.
               Execution time mean : 2.236756 µs
      Execution time std-deviation : 15.412356 ns
     Execution time lower quantile : 2.212301 µs ( 2.5%)
     Execution time upper quantile : 2.275434 µs (97.5%)
                     Overhead used : 6.843981 ns

  Found 3 outliers in 60 samples (5.0000 %)
      low-severe 3 (5.0000 %)
   Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
  ```

* bump graal/js
* Custom MB log factory (#24369)
  * Custom MB log factory
  * Write stupid code to appease stupid Eastwood
  * `ns-name` already calls `the-ns` on its argument.
  * More code cleanup
  * Improved code
  * Remove NOCOMMIT
  * empty commit to trigger CI

Co-authored-by: Cam Saul <1455846+camsaul@users.noreply.github.com>
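Since the whole incident hinges on whether a given jar declares the attribute, here is a hedged sketch (plain JDK interop, illustrative names) for checking it:

```clojure
(ns manifest-sketch
  (:import java.util.jar.JarFile))

(defn multi-release?
  "True when the jar at `path` declares `Multi-Release: true` in its main
  manifest attributes, the flag the `jar` tool consults before scanning
  versioned class files."
  [^String path]
  (with-open [jar (JarFile. path)]
    (some-> (.getManifest jar)
            (.getMainAttributes)
            (.getValue "Multi-Release")
            (Boolean/parseBoolean))))
```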
-
Howon Lee authored
Pursuant to #23635. The previous behavior was to blank out the nested field columns entirely if there were too many for our arbitrary limit (of 100). However, this silent behavior was very user-unfriendly, and people were perplexed that the feature just seemed to turn off out of nowhere. The new, better behavior is to log that there are too many and cut the list off at 100, so users still get the first 100 rather than none.
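A hedged sketch of the shape of that behavior (names and the plain `println` are illustrative only, not the driver's actual code):

```clojure
(def max-nested-field-columns 100)

;; warn-and-truncate instead of the old silent blank-out
(defn limit-nested-field-columns [columns]
  (when (> (count columns) max-nested-field-columns)
    (println "WARN: more than" max-nested-field-columns
             "nested field columns detected; only syncing the first"
             max-nested-field-columns))
  (take max-nested-field-columns columns))
```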
-
- Jul 27, 2022
Ngoc Khuat authored
* revert #23658
* keep the migration to add native_query_snippet.template_tag, add a new migration to drop it
* Remove snippet parameter support in fully parametrized check

Co-authored-by: Tamás Benkő <tamas@metabase.com>
-
dpsutton authored
* Ensure uploaded secrets are stable

  Fixes: https://github.com/metabase/metabase/issues/23034

  Background: Uploaded secrets are stored as bytes in our application db, since cloud doesn't have a filesystem. To make db connections we stuff them into temporary files and use those files. We are also constantly watching for db detail changes so we can recompose the connection pool: each time you call `db->pooled-connection-spec` we check whether the hash of the connection spec has changed, and recompose the pool if it has.

  Problem: these uploaded secrets are backed by temporary files, and we made a new temp file on each call to `db->pooled-connection-spec`. So the hashes always appear different:

  ```clojure
  connection=> (= x y)
  true
  connection=> (take 2 (clojure.data/diff (connection-details->spec :postgres (:details x))
                                          (connection-details->spec :postgres (:details y))))
  ({:sslkey #object[java.io.File 0x141b0f09 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_1388256635324085910.tmp"],
    :sslrootcert #object[java.io.File 0x6f443fac "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_9248342447139746747.tmp"],
    :sslcert #object[java.io.File 0xbb13300 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_17076432929457451876.tmp"]}
   {:sslkey #object[java.io.File 0x6fbb3b7b "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_18336254363340056265.tmp"],
    :sslrootcert #object[java.io.File 0x6ba4c390 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_11775804023700307206.tmp"],
    :sslcert #object[java.io.File 0x320184a0 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_10098480793225259237.tmp"]})
  ```

  And this is quite a problem: each time we get a db connection we make a new file, put the contents of the secret in it, consider the pool stale, recompose it, and then start our query. And if you are on a dashboard, each card will kill the pool of the previously running cards. This behavior does not happen with the local-file path, because there the secret is actually the filepath and we load that, so the file returned is always the same. It's only the uploaded bits that we dump into a new temp file each time.

  Solution: memoize the temp file created for the secret. We cannot use the secret itself as the key, though, because the secret can (always?) include a byte array:

  ```clojure
  connection-test=> (hash {:x (.getBytes "hi")})
  1771366777
  connection-test=> (hash {:x (.getBytes "hi")})
  -709002180
  ```

  So we need to come up with a stable key. I'm using `value->string` here, falling back to `(gensym)` because `value->string` doesn't always return a value due to its cond:

  ```clojure
  (defn value->string
    "Returns the value of the given `secret` as a String. `secret` can be a Secret model object, or a
    secret-map (i.e. return value from `db-details-prop->secret-map`)."
    {:added "0.42.0"}
    ^String [{:keys [value] :as _secret}]
    (cond
      (string? value) value
      (bytes? value)  (String. ^bytes value StandardCharsets/UTF_8)))
  ```

  Why did this bug come up recently? [pull/21604](https://github.com/metabase/metabase/pull/21604) sheds some light. That changed `(hash details)` -> `(hash (connection-details->spec driver details))` with the message:

  > also made some tweaks so the SQL JDBC driver connection pool cache is invalidated when the (unpooled) JDBC spec returned by connection-details->spec changes, rather than when the details map itself changes. This means that changes to other things outside of connection details that affect the JDBC connection parameters, for example the report-timezone or start-of-week Settings, will now properly result in the connection pool cache being flushed

  So we want to continue hashing the db spec, but we must ensure that the spec is stable.

* typehint the memoized var with ^java.io.File
* Switch memoization key from String to vector of bytes

  Copying my comment from Github: when you upload a sequence of bytes as a secret, we want to put them in a file once and only once, and always reuse that temporary file. (We will eventually hash the whole connection spec, and I don't care about collisions there.) Given the same sequence of bytes, you should always get back the exact same temporary file. So I'm making a function `f: Secret -> file` that, given the same Secret, always returns the exact same file. This was not the case before this change: each uploaded secret returned a new temporary file with the contents of the secret each time you got its value, so you would end up with 35 temporary files each containing the same key.

  An easy way to get this guarantee is to memoize the function. But the secret itself isn't a good memoization key because it contains a byte array, and arrays have reference identity:

  ```clojure
  user=> (= (.getBytes "hi") (.getBytes "hi"))
  false
  ```

  So each time we load the same secret from the database we get a new byte array, ask for its temp file, and get a different temp file each time. This means that memoization cannot be driven off of the byte array itself. One way to get value semantics back quickly is to stuff those bytes into a String, because Strings compare on value, not identity. That is what was in the PR before this change: I was banking on the assumption that Strings are opaque sequences of bytes that compare byte by byte, regardless of whether those bytes make sense. But you've pointed out that this may be flirting with undefined behavior. If we instead used the hash of the contents of the byte array as the memoization key, with `(j.u.Arrays/hashCode array)`, we would open ourselves up to the (albeit rare) case that two distinct secret values hash to the same value. That sounds really bad: two distinct secrets (think two ssh keys), both mapping to a single file containing only one of them.

  An easy way to get the value semantics we want is just to call `(vec array)` on the byte array and use that sequence of bytes as the memoization key. Clojure vectors compare by value, not reference, so two secrets return the same file if and only if their byte sequences are identical, in which case we would expect the files to be identical anyway. This gives me the same guarantee I wanted from the String approach, without entwining it with charsets, utf8, etc.
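For illustration, a minimal sketch of that final approach. The names are hypothetical, not the PR's exact code; the memoization key is `(vec bytes)` for value semantics without hash collisions:

```clojure
;; same byte sequence in => the exact same temp file out, every call
(def ^:private bytes->temp-file*
  (memoize
   (fn [byte-vec]
     (let [file (java.io.File/createTempFile "metabase-secret_" ".tmp")]
       (.deleteOnExit file)
       (with-open [out (java.io.FileOutputStream. file)]
         (.write out (byte-array byte-vec)))
       file))))

(defn secret->file
  "Return the (stable) temp file holding this secret's bytes."
  ^java.io.File [{:keys [value] :as _secret}]
  (bytes->temp-file* (vec value)))
```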
-
Ngoc Khuat authored
* hot fix for the wrong api-documentation
* resolve the reflection warning
-
Howon Lee authored
Pursuant to #24214. Previously, MySQL JSON spun-out fields didn't work when the individual fields had heterogeneous contents, because type information was missing, unlike Postgres JSON fields, where type information was provided. Now it is provided for MySQL too, along with a refactor away from `reify`, which had been used for the Postgres JSON fields because the type operator was otherwise really annoying to add (the MySQL type cast is an ordinary function).
-
- Jul 26, 2022
- Jul 25, 2022
adam-james authored
* Add a check to PUT /user/:id to disallow name edits if an SSO user
* Clean up after SAML SSO tests

  The `:sso_source` key is set for the Rasta user in some SAML tests, but is expected to be nil in subsequent tests, so we clean up in the SAML test ns.

* Add a test to ensure SSO user names can't be changed via API
* Missed a change I had made while adjusting tests
* valid-name-update? takes 1 name, to allow better error messages

  Lets the user know that the first or last name is the cause of a problem, rather than just 'names'.

* Remove unneeded thread macro
* Use partial=
* slight change to local fn for reusability
-
dpsutton authored
* Ignore fields in joins when hoisting joined fields

  You need to understand two sides of mbql to understand this.

  ```clojure
  (defn nest-expressions
    "Pushes the `:source-table`/`:source-query`, `:expressions`, and `:joins` in the top-level of the query into a
    `:source-query` and updates `:expression` references and `:field` clauses with `:join-alias`es accordingly. See
    tests for examples. This is used by the SQL QP to make sure expressions happen in a subselect."
    ...)
  ```

  Make a query against the orders table joined to the people table and select order id, order total, and the person's email:

  ```sql
  SELECT "PUBLIC"."orders"."id" AS "ID",
         "PUBLIC"."orders"."total" AS "TOTAL",
         "People - User"."email" AS "People - User__EMAIL"
  FROM "PUBLIC"."orders"
  LEFT JOIN "PUBLIC"."people" "People - User"
         ON "PUBLIC"."orders"."user_id" = "People - User"."id"
  LIMIT 1048575
  ```

  Now add a custom column called adjective, which is 'expensive' when total > 100 else 'cheap':

  ```sql
  SELECT "source"."id" AS "ID",
         "source"."total" AS "TOTAL",
         "source"."adjective" AS "adjective",
         "source"."people - user__email" AS "People - User__EMAIL"
  FROM (SELECT "PUBLIC"."orders"."id" AS "ID",
               "PUBLIC"."orders"."user_id" AS "USER_ID",
               "PUBLIC"."orders"."product_id" AS "PRODUCT_ID",
               "PUBLIC"."orders"."subtotal" AS "SUBTOTAL",
               "PUBLIC"."orders"."tax" AS "TAX",
               "PUBLIC"."orders"."total" AS "TOTAL",
               "PUBLIC"."orders"."discount" AS "DISCOUNT",
               "PUBLIC"."orders"."created_at" AS "CREATED_AT",
               "PUBLIC"."orders"."quantity" AS "QUANTITY",
               CASE WHEN "PUBLIC"."orders"."total" > 50 THEN 'expensive' ELSE 'cheap' END AS "adjective",
               "People - User"."email" AS "People - User__EMAIL",
               "People - User"."id" AS "People - User__ID"
        FROM "PUBLIC"."orders"
        LEFT JOIN "PUBLIC"."people" "People - User"
               ON "PUBLIC"."orders"."user_id" = "People - User"."id") "source"
  LIMIT 1048575
  ```

  We put the "work" in a nested query and then just update the outer select to select `source.total` instead of `orders.total`. But the way this figured out which joins to hoist up was a bit broken. It walked all over the inner query, finding fields it was interested in. However, it also picked up fields it shouldn't have, by descending into join information that should be opaque at this level.

  In the repro for this, we make a card Q1, selecting product_id and count, but with an implicit join to products and filtering where category = doohickey:

  ```clojure
  ;; card Q1
  {:type     :query,
   :query    {:source-table 4, #_ reviews
              :filter       [:= [:field 26 {:source-field 33}] "Doohickey"], #_ products.category
              :aggregation  [[:count]],
              :breakout     [[:field 33 nil]], #_ reviews.product_id
              :limit        2},
   :database 1}
  ```

  A second question Q2 queries the Orders table and joins to Q1 on orders.product_id = q1.product_id, and adds a custom column `+ 1 1`:

  ```clojure
  ;; card Q2, based on Q1
  {:source-table 2,
   :expressions  {"CC" [:+ 1 1]},
   :fields       [[:field 9 nil] [:field 3 nil] [:field 5 nil] [:field 6 nil]
                  [:field 8 nil] [:field 7 nil] [:field 1 nil]
                  [:field 4 {:temporal-unit :default}] [:field 2 nil]
                  [:expression "CC" {:metabase.query-processor.util.add-alias-info/desired-alias "CC",
                                     :metabase.query-processor.util.add-alias-info/position 9}]
                  [:field 33 nil]
                  [:field "count" {:base-type :type/BigInteger}]],
   :joins        [{:alias          "Question 4918",
                   :strategy       :left-join,
                   :fields         [[:field 33 nil]
                                    [:field "count" {:base-type :type/BigInteger}]],
                   :condition      [:= [:field 5 nil] [:field 33 nil]],
                   :source-card-id 4918}]}
  ```

  and this should yield the sql:

  ```sql
  SELECT "source"."ID" AS "ID",
         "source"."PRODUCT_ID" AS "PRODUCT_ID",
         "source"."TOTAL" AS "TOTAL",
         "source"."CC" AS "CC",
         "source"."Question 4918__PRODUCT_ID" AS "Question 4918__PRODUCT_ID",
         "source"."Question 4918__count" AS "Question 4918__count"
  FROM (
    SELECT "PUBLIC"."ORDERS"."ID" AS "ID",
           "PUBLIC"."ORDERS"."USER_ID" AS "USER_ID",
           "PUBLIC"."ORDERS"."PRODUCT_ID" AS "PRODUCT_ID",
           "PUBLIC"."ORDERS"."SUBTOTAL" AS "SUBTOTAL",
           "PUBLIC"."ORDERS"."TAX" AS "TAX",
           "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL",
           "PUBLIC"."ORDERS"."DISCOUNT" AS "DISCOUNT",
           "PUBLIC"."ORDERS"."CREATED_AT" AS "CREATED_AT",
           "PUBLIC"."ORDERS"."QUANTITY" AS "QUANTITY",
           (1 + 1) AS "CC",
           "Question 4918"."PRODUCT_ID" AS "Question 4918__PRODUCT_ID",
           "Question 4918"."count" AS "Question 4918__count"
    FROM "PUBLIC"."ORDERS"
    LEFT JOIN (
      SELECT "PUBLIC"."REVIEWS"."PRODUCT_ID" AS "PRODUCT_ID", count(*) AS "count"
      FROM "PUBLIC"."REVIEWS"
      LEFT JOIN "PUBLIC"."PRODUCTS" "PRODUCTS__via__PRODUCT_ID"
             ON "PUBLIC"."REVIEWS"."PRODUCT_ID" = "PRODUCTS__via__PRODUCT_ID"."ID"
      WHERE "PRODUCTS__via__PRODUCT_ID"."CATEGORY" = 'Doohickey'
      GROUP BY "PUBLIC"."REVIEWS"."PRODUCT_ID"
      ORDER BY "PUBLIC"."REVIEWS"."PRODUCT_ID" ASC) "Question 4918"
           ON "PUBLIC"."ORDERS"."PRODUCT_ID" = "Question 4918"."PRODUCT_ID") "source"
  LIMIT 1048575
  ```

  But when it was looking through to see which fields to hoist to the top level, it was also finding `"PRODUCTS__via__PRODUCT_ID"."ID"` and `"PRODUCTS__via__PRODUCT_ID"."CATEGORY"` as fields it should hoist up, because it searched the whole tree underneath it, which includes join conditions and such. But of course those don't matter: the only thing that really matters is what fields come out of a query, and those fields do not come out of the nested query

  ```sql
  (SELECT "PUBLIC"."REVIEWS"."PRODUCT_ID" AS "PRODUCT_ID", count(*) AS "count"
   FROM "PUBLIC"."REVIEWS"
   LEFT JOIN "PUBLIC"."PRODUCTS" "PRODUCTS__via__PRODUCT_ID"
          ON "PUBLIC"."REVIEWS"."PRODUCT_ID" = "PRODUCTS__via__PRODUCT_ID"."ID"
   WHERE "PRODUCTS__via__PRODUCT_ID"."CATEGORY" = 'Doohickey'
   GROUP BY "PUBLIC"."REVIEWS"."PRODUCT_ID"
   ORDER BY "PUBLIC"."REVIEWS"."PRODUCT_ID" ASC) "Question 4918"
  ```

  which only returns product_id and count. And this matches the error it was (helpfully) throwing:

  ```clojure
  investigation=> (qp/compile nested-query)
  Execution error (ExceptionInfo) at metabase.query-processor.util.add-alias-info/field-source-table-alias (add_alias_info.clj:182).
  Cannot determine the source table or query for Field clause
  [:field 26 #_ products.category
   {:source-field 33, #_ reviews.product_id
    :join-alias "PRODUCTS__via__PRODUCT_ID",
    :metabase.query-processor.util.add-alias-info/source-table "PRODUCTS__via__PRODUCT_ID",
    :metabase.query-processor.util.add-alias-info/source-alias "CATEGORY"}]
  ```

  And here's the query it was analyzing when determining which fields it needed to hoist (looking for ones with `:join-alias`); this is also in the nest_query_test.clj file:

  ```clojure
  ;; removed some cruft and extra field information for brevity
  {:source-table 2,
   :expressions  {"CC" [:+ 1 1]},
   :fields       [[:field 33 {:join-alias "Question 4918"}]
                  [:field "count" {:join-alias "Question 4918"}]],
   :joins        [{:alias           "Question 4918",
                   :strategy        :left-join,
                   :fields          [[:field 33 {:join-alias "Question 4918"}]
                                     [:field "count" {:join-alias "Question 4918"}]],
                   :condition       [:= [:field 5 nil] [:field 33 {:join-alias "Question 4918"}]],
                   :source-card-id  4918,
                   :source-query    {:source-table 4,
                                     ;; nested query has filter values with a join-alias that should
                                     ;; not be selected
                                     :filter      [:= [:field 26 {:join-alias "PRODUCTS__via__PRODUCT_ID"}]
                                                      [:value "Doohickey" {}]],
                                     :aggregation [[:aggregation-options [:count] {:name "count"}]],
                                     :breakout    [[:field 33 nil]],
                                     :limit       2,
                                     :order-by    [[:asc [:field 33 nil]]],
                                     ;; nested query has an implicit join with conditions that should
                                     ;; not be selected
                                     :joins       [{:alias        "PRODUCTS__via__PRODUCT_ID",
                                                    :strategy     :left-join,
                                                    :condition    [:= [:field 33 nil]
                                                                      [:field 30 {:join-alias "PRODUCTS__via__PRODUCT_ID"}]],
                                                    :source-table 1,
                                                    :fk-field-id  33}]},
                   :source-metadata [{:field_ref [:field 33 nil]}
                                     {:field_ref [:aggregation 0]}]}]}
  ```

* Add round-trip test through qp

  This test is essentially the repro from the ticket https://github.com/metabase/metabase/issues/20809

  ```clojure
  debug-qp=> (to-mbql-shorthand (:dataset_query (metabase.models/Card 4919)))
  (mt/mbql-query orders
    {:joins       [{:source-table "card__4918",
                    :alias        "Question 4918",
                    :condition    [:= $product_id [:field %reviews.product_id {:join-alias "Question 4918"}]],
                    :fields       :all}],
     :expressions {"CC" [:+ 1 1]}})

  debug-qp=> (to-mbql-shorthand (:dataset_query (metabase.models/Card 4918)))
  (mt/mbql-query reviews
    {:breakout    [$product_id],
     :aggregation [[:count]],
     :filter      [:= $product_id->products.category "Doohickey"]})
  ```

  Thanks to @cam for providing such a lovely tool
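A hedged sketch of the principle behind the fix, illustrative rather than the actual QP code: only the fields a join exposes (its `:fields`) are candidates for hoisting; its `:condition` and `:source-query` internals are opaque at this level.

```clojure
;; fields visible at this level of the query: its own :fields plus
;; whatever each join explicitly exposes
(defn fields-exposed-at-this-level [{:keys [fields joins]}]
  (concat fields (mapcat :fields joins)))

(defn joined-field? [clause]
  (and (vector? clause)
       (= :field (first clause))
       (map? (last clause))
       (contains? (last clause) :join-alias)))

;; hoist only the joined fields actually visible here; never walk into a
;; join's :condition or :source-query looking for more
(defn fields-to-hoist [query]
  (filter joined-field? (fields-exposed-at-this-level query)))
```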
-
Braden Shepherdson authored
* Serdes v2: Add `--v2 true` flag to the `dump` and `load` commands

  This is the last piece for end-to-end serialization and deserialization. The results should be similar to v1, though probably not identical.
-
- Jul 23, 2022
metamben authored
-
- Jul 22, 2022
metamben authored
-
Braden Shepherdson authored
`PulseChannelRecipients` are a `:recipients` field on `PulseChannels`. This requires some careful handling when inserting or updating a `PulseChannel` to upsert the recipients.
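A hedged sketch of the diff-style upsert this implies (illustrative names, not the serdes code):

```clojure
;; diff stored recipient user IDs against the incoming ones, then delete and
;; insert only the difference instead of rewriting every row
(defn recipient-changes [existing-ids incoming-ids]
  {:to-delete (vec (remove (set incoming-ids) existing-ids))
   :to-insert (vec (remove (set existing-ids) incoming-ids))})

;; (recipient-changes #{1 2 3} #{2 3 4}) => {:to-delete [1], :to-insert [4]}
```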
-
- Jul 21, 2022
Noah Moss authored
-
Braden Shepherdson authored
-
Cal Herries authored
* Update search query to not match search string tokens against the dataset_query column if the question is not native
* Add test
* Rewrite test
* Rename query-card -> mbql-card
* Use the correct dataset_query map
-
Cal Herries authored
* Correct `score-and-result` to count the weight of the text score when there is no text match
* Replace nil-punning of scores with explicit checks for zero scores at each step
* Update scoring functions that can return nil to return zero instead
* Address archived search with no search string
* Update e2e test
* Update e2e test with new ordering of results
* Fix dataset search with blank query string
-
Braden Shepherdson authored
-
- Jul 20, 2022
Noah Moss authored
* basic logic for including params in dashboard subscriptions
* BE date formatting
* redo approach to date formatting locale
* refactor and use the same formatting code for parameters listed in the subscription header
* fix tests and lint
* fix more tests
* fix final (?) test and fix namespace lint error
* a couple extra test cases
* fix tests
* new cypress test
* address braden and ngoc's comments
* fix tests
* guard against nil text in substitute_tags
-
Ngoc Khuat authored
* fixing tests and make chain-filter dashboard endpoints return the flag
* remove using anonymous function in update
* Update dashboard endpoints and `FieldValuesWidget` to handle `has_more_values` scenario (#24022)
  * update dashboard action/reducer
  * Update FieldValuesWidget
  * Update isSearchable to accept map of args
  * Add unit tests for various FVW utils
  * Split isSearchable logic a little

Co-authored-by: Dalton <daltojohnso@users.noreply.github.com>
-
Braden Shepherdson authored
-
metamben authored
This parameter gives the impression that Metabase can deal with encrypted private keys, but this is not the case. The internal keystore password can just as well be randomly generated in the code.
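A hedged sketch of the suggested alternative (illustrative, not the PR's code):

```clojure
(ns keystore-password-sketch
  "Hedged sketch: since the internal keystore never leaves the JVM, its
  password can be a throwaway random value instead of a user-supplied parameter."
  (:import java.security.SecureRandom
           java.util.Base64))

(defn random-keystore-password
  "Return a fresh random password as a char array, as keystore APIs expect."
  ^chars []
  (let [buf (byte-array 32)]
    (.nextBytes (SecureRandom.) buf)
    (.toCharArray (.encodeToString (Base64/getEncoder) buf))))
```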
-