-
- Downloads
Ensure uploaded secrets are stable (#24325)
* Ensure uploaded secrets are stable Fixes: https://github.com/metabase/metabase/issues/23034 Background: Uploaded secrets are stored as bytes in our application db since cloud doesn't have a filesystem. To make db connections we stuff them into temporary files and use those files. We also are constantly watching for db detail changes so we can recompose the connection pool. Each time you call `db->pooled-connection-spec` we check if the hash of the connection spec has changed and recompose the pool if it has. Problem: These uploaded files have temporary files and we make new temp files each time we call `db->pooled-connection-spec`. So the hashes always appear different: ```clojure connection=> (= x y) true connection=> (take 2 (clojure.data/diff (connection-details->spec :postgres (:details x)) (connection-details->spec :postgres (:details y)))) ({:sslkey #object[java.io.File 0x141b0f09 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_1388256635324085910.tmp"], :sslrootcert #object[java.io.File 0x6f443fac "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_9248342447139746747.tmp"], :sslcert #object[java.io.File 0xbb13300 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_17076432929457451876.tmp"]} {:sslkey #object[java.io.File 0x6fbb3b7b "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_18336254363340056265.tmp"], :sslrootcert #object[java.io.File 0x6ba4c390 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_11775804023700307206.tmp"], :sslcert #object[java.io.File 0x320184a0 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_10098480793225259237.tmp"]}) ``` And this is quite a problem: each time we get a db connection we are making a new file, putting the contents of the secret in it, and then considering the pool stale, recomposing it, starting our query. And if you are on a dashboard, each card will kill the pool of the previously running cards. This behavior does not happen with the local-file path because the secret is actually the filepath and we load that. So the file returned is always the same. It's only for the uploaded bits that we dump into a temp file (each time). Solution: Let's memoize the temp file created by the secret. We cannot use the secret as the key though because the secret can (always?) includes a byte array: ```clojure connection-test=> (hash {:x (.getBytes "hi")}) 1771366777 connection-test=> (hash {:x (.getBytes "hi")}) -709002180 ``` So we need to come up with a stable key. I'm using `value->string` here, falling back to `(gensym)` because `value->string` doesn't always return a value due to its cond. ```clojure (defn value->string "Returns the value of the given `secret` as a String. `secret` can be a Secret model object, or a secret-map (i.e. return value from `db-details-prop->secret-map`)." {:added "0.42.0"} ^String [{:keys [value] :as _secret}] (cond (string? value) value (bytes? value) (String. ^bytes value StandardCharsets/UTF_8))) ``` Why did this bug come up recently? [pull/21604](https://github.com/metabase/metabase/pull/21604) gives some light. That changed `(hash details)` -> `(hash (connection-details->spec driver details))` with the message > also made some tweaks so the SQL JDBC driver connection pool cache is > invalidated when the (unpooled) JDBC spec returned by > connection-details->spec changes, rather than when the details map > itself changes. This means that changes to other things outside of > connection details that affect the JDBC connection parameters, for > example the report-timezone or start-of-week Settings will now properly > result in the connection pool cache being flushed So we want to continue to hash the db spec but ensure that the spec is stable. * typehint the memoized var with ^java.io.File * Switch memoization key from String to vector of bytes Copying comment from Github: When you upload a sequence of bytes as a secret, we want to put them in a file once and only once and always reuse that temporary file. We will eventually hash the whole connection spec but I don't care about collisions there. It's only given the same sequence of bytes, you should always get back the exact same temporary file that has been created. So i'm making a function `f: Secret -> file` that given the same Secret always returns the exact same file. This was not the case before this. Each uploaded secret would return a new temporary file with the contents of the secret each time you got its value. So you would end up with 35 temporary files each with the same key in it. An easy way to get this guarantee is to memoize the function. But the secret itself isn't a good key to memoize against because it contains a byte array. If the memoization key is the byte-array itself, this will fail because arrays have reference identity: ```clojure user=> (= (.getBytes "hi") (.getBytes "hi")) false ``` So each time we load the same secret from the database we get a new byte array, ask for its temp file and get a different temp file each time. This means that memoization cannot be driven off of the byte array. But one way to gain this back quickly is just stuff those bytes into a string, because strings compare on value not identity. This is what is currently in the PR (before this change). I was banking on the assumption that Strings are just opaque sequences of bytes that will compare byte by byte, regardless of whether those bytes make sense. But you've pointed out a good point that maybe that is flirting with undefined behavior. If we use the hash of the contents of the byte array as the memoization key with (j.u.Arrays/hashCode array), then we open ourselves up the (albeit rare) case that two distinct secret values hash to the same value. This sounds really bad. Two distinct secrets (think two ssh keys) but both would map to only one file containing a single ssh key. An easy way to have the value semantics we want for the memoization is just to call (vec array) on the byte array and use this sequence of bytes as the memoization key. Clojure vectors compare by value not reference. So two secrets would return the same file if and only if the sequence of bytes are identical, in which case we would expect the files to be identical. This gives me the same guarantee that I was wanting from the String behavior I used initially without entwining this with charsets, utf8, etc.
Showing
- src/metabase/models/secret.clj 20 additions, 1 deletionsrc/metabase/models/secret.clj
- test/metabase/driver/sql_jdbc/connection_test.clj 22 additions, 2 deletionstest/metabase/driver/sql_jdbc/connection_test.clj
- test/metabase/models/secret_test.clj 24 additions, 1 deletiontest/metabase/models/secret_test.clj
Loading
Please register or sign in to comment