Skip to content
Snippets Groups Projects
Unverified Commit 6ca66e6e authored by dpsutton's avatar dpsutton Committed by GitHub
Browse files

Handle `nil` cache during initial cache population (#22779)

* Handle `nil` cache during initial cache population

this bug only happens soon after startup. The general idea of the cache
is that we repopulate if needed, and other threads can use the stale
version while repopulating. This is valid except at startup before the
cache has finished populating. The `(setting.cache/cache)` value will
still be nil to other threads while the first thread populates it.

this is a race but sounds like an unlikely one to hit in practice. But
we have a special case: `site-url`. We have middleware that checks the
site-url on every api request and sets it if it is nil. So if an
instance gets two requests before the cache has been populated, it will
set it to whatever value is in the headers, overriding whatever has been
set in the UI.

The following snippet was how i diagnosed this. But to simulate startup
you can just `(require 'metabase.models.setting.cache :reload)` to get
back to an "empty" cache and let the threads race each other. I also put
high contention on reloading by dropping the millisecond cache update
interval to 7 milliseconds. But this refresh interval does not matter:
we just fall back to the old version of the cache. It is only the
initial cache population that is using `nil` as a current cache.

```clojure
public-settings=> (let [mismatch1 (atom [])
                        mismatch2 (atom [])
                        iterations 100000
                        latch (java.util.concurrent.CountDownLatch. 2)
                        nil-value (atom [])]
                    (future
                      (dotimes [_ iterations]
                        (let [value (site-url)
                              cache (metabase.models.setting.cache/cache)]
                          (when (not= value "http://localhost:3000")
                            (swap! mismatch1 conj value))
                          (when (not= (get cache "site-url") "http://localhost:3000")
                            (swap! nil-value conj cache))))
                      (.countDown latch))
                    (future
                      (dotimes [_ iterations]
                        (let [value (site-url)
                              cache (metabase.models.setting.cache/cache)]
                          (when (not= value "http://localhost:3000")
                            (swap! mismatch2 conj value))
                          (when (not= (get cache "site-url") "http://localhost:3000")
                            (swap! nil-value conj cache))))
                      (.countDown latch))
                    (.await latch)
                    (def nil-value nil-value)
                    [(count @mismatch1) (take 10 @mismatch1) (count @mismatch2) (take 10 @mismatch2)])
[0 () 1616 (nil nil nil nil nil nil nil nil nil nil)]
```

* Don't attempt to get setting values from db/cache before db ready

* We check `db-is-setup?` above this level

db check has to happen higher up at `db-or-cache-value` as neither the
db will work, nor the cache based on selecting from the db, if the db is
not set up.

* Remove some of the nested whens (thanks howon)

Lots of nesting due to requiring and checking for nil of the required
var. This can never really be nil but just to overly cautiously guard,
put in an or with the `(constantly false)`. Combine the two predicates
into a single `and` and then swap our homegrown `(when (seq x) x)` for
the equivalent `(not-empty x)`.
parent 5ae75e8c
No related branches found
No related tags found
No related merge requests found
......@@ -383,15 +383,24 @@
(defn- db-or-cache-value
"Get the value, if any, of `setting-definition-or-name` from the DB (using / restoring the cache as needed)."
^String [setting-definition-or-name]
(let [setting (resolve-setting setting-definition-or-name)]
(when (allows-site-wide-values? setting)
(let [setting (resolve-setting setting-definition-or-name)
db-is-set-up? (or (requiring-resolve 'metabase.db/db-is-set-up?)
;; this should never be hit. it is just overly cautious against a NPE here. But no way this
;; cannot resolve
(constantly false))]
;; cannot use db (and cache populated from db) if db is not set up
(when (and (db-is-set-up?) (allows-site-wide-values? setting))
(let [v (if *disable-cache*
(db/select-one-field :value Setting :key (setting-name setting-definition-or-name))
(do
(setting.cache/restore-cache-if-needed!)
(clojure.core/get (setting.cache/cache) (setting-name setting-definition-or-name))))]
(when (seq v)
v)))))
(let [cache (setting.cache/cache)]
(if (nil? cache)
;; If another thread is populating the cache for the first time, we will have a nil value for
;; the cache and must hit the db while the cache populates
(db/select-one-field :value Setting :key (setting-name setting-definition-or-name))
(clojure.core/get cache (setting-name setting-definition-or-name))))))]
(not-empty v)))))
(defn default-value
"Get the `:default` value of `setting-definition-or-name` if one was specified."
......
......@@ -157,11 +157,8 @@
;; attempt to acquire the lock. Returns immediately if lock is is already held.
(when (.tryLock restore-cache-lock)
(try
;; don't try to restore the cache before the application DB is ready, it's not going to work...
(when-let [db-is-set-up? (resolve 'metabase.db/db-is-set-up?)]
(when (db-is-set-up?)
(reset! last-update-check (System/currentTimeMillis))
(when (cache-out-of-date?)
(restore-cache!))))
(reset! last-update-check (System/currentTimeMillis))
(when (cache-out-of-date?)
(restore-cache!))
(finally
(.unlock restore-cache-lock)))))))
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment