From 73599275ee0c255beba4cd0095f39587e95fce6f Mon Sep 17 00:00:00 2001 From: dpsutton <dan@dpsutton.com> Date: Wed, 27 Jul 2022 10:21:50 -0500 Subject: [PATCH] Ensure uploaded secrets are stable (#24325) * Ensure uploaded secrets are stable Fixes: https://github.com/metabase/metabase/issues/23034 Background: Uploaded secrets are stored as bytes in our application db since cloud doesn't have a filesystem. To make db connections we stuff them into temporary files and use those files. We also are constantly watching for db detail changes so we can recompose the connection pool. Each time you call `db->pooled-connection-spec` we check if the hash of the connection spec has changed and recompose the pool if it has. Problem: These uploaded files have temporary files and we make new temp files each time we call `db->pooled-connection-spec`. So the hashes always appear different: ```clojure connection=> (= x y) true connection=> (take 2 (clojure.data/diff (connection-details->spec :postgres (:details x)) (connection-details->spec :postgres (:details y)))) ({:sslkey #object[java.io.File 0x141b0f09 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_1388256635324085910.tmp"], :sslrootcert #object[java.io.File 0x6f443fac "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_9248342447139746747.tmp"], :sslcert #object[java.io.File 0xbb13300 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_17076432929457451876.tmp"]} {:sslkey #object[java.io.File 0x6fbb3b7b "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_18336254363340056265.tmp"], :sslrootcert #object[java.io.File 0x6ba4c390 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_11775804023700307206.tmp"], :sslcert #object[java.io.File 0x320184a0 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_10098480793225259237.tmp"]}) ``` And this is quite a problem: each time we get a db connection we are making a new file, putting the contents of the secret in it, and then considering the pool stale, recomposing it, starting our query. And if you are on a dashboard, each card will kill the pool of the previously running cards. This behavior does not happen with the local-file path because the secret is actually the filepath and we load that. So the file returned is always the same. It's only for the uploaded bits that we dump into a temp file (each time). Solution: Let's memoize the temp file created by the secret. We cannot use the secret as the key though because the secret can (always?) includes a byte array: ```clojure connection-test=> (hash {:x (.getBytes "hi")}) 1771366777 connection-test=> (hash {:x (.getBytes "hi")}) -709002180 ``` So we need to come up with a stable key. I'm using `value->string` here, falling back to `(gensym)` because `value->string` doesn't always return a value due to its cond. ```clojure (defn value->string "Returns the value of the given `secret` as a String. `secret` can be a Secret model object, or a secret-map (i.e. return value from `db-details-prop->secret-map`)." {:added "0.42.0"} ^String [{:keys [value] :as _secret}] (cond (string? value) value (bytes? value) (String. ^bytes value StandardCharsets/UTF_8))) ``` Why did this bug come up recently? [pull/21604](https://github.com/metabase/metabase/pull/21604) gives some light. That changed `(hash details)` -> `(hash (connection-details->spec driver details))` with the message > also made some tweaks so the SQL JDBC driver connection pool cache is > invalidated when the (unpooled) JDBC spec returned by > connection-details->spec changes, rather than when the details map > itself changes. This means that changes to other things outside of > connection details that affect the JDBC connection parameters, for > example the report-timezone or start-of-week Settings will now properly > result in the connection pool cache being flushed So we want to continue to hash the db spec but ensure that the spec is stable. * typehint the memoized var with ^java.io.File * Switch memoization key from String to vector of bytes Copying comment from Github: When you upload a sequence of bytes as a secret, we want to put them in a file once and only once and always reuse that temporary file. We will eventually hash the whole connection spec but I don't care about collisions there. It's only given the same sequence of bytes, you should always get back the exact same temporary file that has been created. So i'm making a function `f: Secret -> file` that given the same Secret always returns the exact same file. This was not the case before this. Each uploaded secret would return a new temporary file with the contents of the secret each time you got its value. So you would end up with 35 temporary files each with the same key in it. An easy way to get this guarantee is to memoize the function. But the secret itself isn't a good key to memoize against because it contains a byte array. If the memoization key is the byte-array itself, this will fail because arrays have reference identity: ```clojure user=> (= (.getBytes "hi") (.getBytes "hi")) false ``` So each time we load the same secret from the database we get a new byte array, ask for its temp file and get a different temp file each time. This means that memoization cannot be driven off of the byte array. But one way to gain this back quickly is just stuff those bytes into a string, because strings compare on value not identity. This is what is currently in the PR (before this change). I was banking on the assumption that Strings are just opaque sequences of bytes that will compare byte by byte, regardless of whether those bytes make sense. But you've pointed out a good point that maybe that is flirting with undefined behavior. If we use the hash of the contents of the byte array as the memoization key with (j.u.Arrays/hashCode array), then we open ourselves up the (albeit rare) case that two distinct secret values hash to the same value. This sounds really bad. Two distinct secrets (think two ssh keys) but both would map to only one file containing a single ssh key. An easy way to have the value semantics we want for the memoization is just to call (vec array) on the byte array and use this sequence of bytes as the memoization key. Clojure vectors compare by value not reference. So two secrets would return the same file if and only if the sequence of bytes are identical, in which case we would expect the files to be identical. This gives me the same guarantee that I was wanting from the String behavior I used initially without entwining this with charsets, utf8, etc. --- src/metabase/models/secret.clj | 21 +++++++++++++++- .../driver/sql_jdbc/connection_test.clj | 24 ++++++++++++++++-- test/metabase/models/secret_test.clj | 25 ++++++++++++++++++- 3 files changed, 66 insertions(+), 4 deletions(-) diff --git a/src/metabase/models/secret.clj b/src/metabase/models/secret.clj index bda70deb584..b75121da4d2 100644 --- a/src/metabase/models/secret.clj +++ b/src/metabase/models/secret.clj @@ -1,5 +1,6 @@ (ns metabase.models.secret (:require [cheshire.generate :refer [add-encoder encode-map]] + [clojure.core.memoize :as memoize] [clojure.java.io :as io] [clojure.string :as str] [clojure.tools.logging :as log] @@ -54,7 +55,7 @@ (->> (filter #(= :secret (keyword (:type %))) conn-props) (reduce (fn [acc prop] (assoc acc (:name prop) prop)) {}))) -(defn value->file! +(defn value->file!* "Returns the value of the given `secret` instance in the form of a file. If the given instance has a `:file-path` as its source, a `File` referring to that is returned. Otherwise, the `:value` is written to a temporary file, which is then returned. @@ -100,6 +101,24 @@ (.write out v))) tmp-file))) +(def + ^java.io.File + ^{:arglists '([{:keys [connection-property-name id value] :as secret} driver?])} + value->file! + "Returns the value of the given `secret` instance in the form of a file. If the given instance has a `:file-path` as + its source, a `File` referring to that is returned. Otherwise, the `:value` is written to a temporary file, which is + then returned. + + `driver?` is an optional argument that is only used if an ostensibly existing file value (i.e. `:file-path`) can't be + resolved, in order to render a more user-friendly error message (by looking up the display names of the connection + properties involved)." + (memoize/memo + (with-meta value->file!* + {::memoize/args-fn (fn [[secret _driver?]] + ;; not clear if value->string could return nil due to the cond so we'll just cache on a key + ;; that is unique + [(vec (:value secret))])}))) + (defn get-sub-props "Return a map of secret subproperties for the property `connection-property-name`." [connection-property-name] diff --git a/test/metabase/driver/sql_jdbc/connection_test.clj b/test/metabase/driver/sql_jdbc/connection_test.clj index 678783f67df..56a25a4b827 100644 --- a/test/metabase/driver/sql_jdbc/connection_test.clj +++ b/test/metabase/driver/sql_jdbc/connection_test.clj @@ -6,7 +6,7 @@ [metabase.driver.sql-jdbc.connection :as sql-jdbc.conn] [metabase.driver.sql-jdbc.test-util :as sql-jdbc.tu] [metabase.driver.util :as driver.u] - [metabase.models.database :refer [Database]] + [metabase.models :refer [Database Secret]] [metabase.sync :as sync] [metabase.test :as mt] [metabase.test.data :as data] @@ -153,4 +153,24 @@ (is (not= db-hash-1 db-hash-2))))))) (finally ;; restore the original test DB details, no matter what just happened - (db/update! Database (mt/id) :details (:details db)))))))) + (db/update! Database (mt/id) :details (:details db))))))) + (testing "postgres secrets are stable (#23034)" + (mt/with-temp* [Secret [secret {:name "file based secret" + :kind :perm-cert + :source nil + :value (.getBytes "super secret")}]] + (let [db {:engine :postgres + :details {:ssl true + :ssl-mode "verify-ca" + :ssl-root-cert-options "uploaded" + :ssl-root-cert-creator-id (mt/user->id :crowberto) + :ssl-root-cert-source nil + :ssl-root-cert-id (:id secret) + :ssl-root-cert-created-at "2022-07-25T15:57:51.556-05:00"}}] + (is (instance? java.io.File + (:sslrootcert (#'sql-jdbc.conn/connection-details->spec :postgres + (:details db)))) + "Secrets not loaded for db connections") + (is (= (#'sql-jdbc.conn/jdbc-spec-hash db) + (#'sql-jdbc.conn/jdbc-spec-hash db)) + "Same db produced different hashes due to secrets"))))) diff --git a/test/metabase/models/secret_test.clj b/test/metabase/models/secret_test.clj index 49a04bbb62a..43f3e802902 100644 --- a/test/metabase/models/secret_test.clj +++ b/test/metabase/models/secret_test.clj @@ -120,7 +120,30 @@ (let [result (byte-array (.length val-file))] (with-open [in (DataInputStream. (io/input-stream val-file))] (.readFully in result)) - result)))))))))) + result))))))))) + (testing "value->file! returns the same file for secrets" + (testing "for file paths" + (let [file-secret-val "dingbat" + ^File tmp-file (doto (File/createTempFile "value-to-file-test_" ".txt") + (.deleteOnExit))] + (spit tmp-file file-secret-val) + (mt/with-temp* [Secret [secret {:name "file based secret" + :kind :perm-cert + :source "file-path" + :value (.getAbsolutePath tmp-file)}]] + (is (instance? java.io.File (secret/value->file! secret nil))) + (is (= (secret/value->file! secret nil) + (secret/value->file! secret nil)) + "Secret did not return the same file")))) + (testing "for upload files (#23034)" + (mt/with-temp* [Secret [secret {:name "file based secret" + :kind :perm-cert + :source nil + :value (.getBytes "super secret")}]] + (is (instance? java.io.File (secret/value->file! secret nil))) + (is (= (secret/value->file! secret nil) + (secret/value->file! secret nil)) + "Secret did not return the same file"))))) (defn- decode-ssl-db-property [content mime-type property] (let [value-key (keyword (str property "-value")) -- GitLab