Commit be52e4fc authored by dpsutton, committed by GitHub
Delete old persistence schemas in redshift (#30013)

* Delete old persistence schemas in redshift

Reminder that recently a change occurred that populates a cache_info
table in each persisted schema:

```sql
-- postgres/redshift
test-data=# select * from metabase_cache_424a9_379.cache_info ;
       key        |                value
------------------+--------------------------------------
 settings-version | 1
 created-at       | 2023-03-29T14:16:24.849866Z
 instance-uuid    | 407e4ba8-2bab-470f-aeb5-9fc63fd18c4e
 instance-name    | Metabase Test
(4 rows)
```

So for each cache schema, we can classify it as:
- old style (created by a commit predating this change; increasingly infrequent)
- new style, recent
- new style, expired

And we can delete them accordingly in startup:
```
2023-04-11 20:09:03,402 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_359
2023-04-11 20:09:04,733 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_70
2023-04-11 20:09:05,557 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_71
2023-04-11 20:09:06,284 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_90
...
2023-04-11 20:20:33,271 INFO data.redshift :: Dropping expired cache schema: metabase_cache_fe4a7_90
2023-04-11 20:20:34,284 INFO data.redshift :: Dropping old cache schema without `cache_info` table: metabase_cache_8f4b8_358
2023-04-11 20:20:35,076 INFO data.redshift :: Dropping old cache schema without `cache_info` table: metabase_cache_8f4b8_69
...
```
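The classification rule above can be sketched as follows. This is a hypothetical helper, not the commit's actual code: it assumes `created-at` is a `java.time.Instant` parsed from the schema's `cache_info` table, or `nil` when the table (old style) or the row is missing, and it collapses the old-style and missing-row cases that the real implementation distinguishes.

```clojure
;; Hypothetical sketch of the classification decision.
;; `created-at` is nil when cache_info (or its created-at row) is absent.
(defn- classify-schema [created-at threshold]
  (cond
    (nil? created-at)                :old-style-cache
    (t/before? created-at threshold) :expired
    :else                            :recent))
```

Everything except `:recent` (and schemas that error out during classification) is safe to drop on startup.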

It's possible this will at first cause a few flakes if we are unlucky enough to drop a cache schema without `cache_info` for an instance that is running tests at that point. But the `cache_info` table has been backported, so the chances of that shrink each day.

I've let a week elapse since that change before committing this, so hopefully it is not an issue in practice.

Number of queries: this makes a single query to get all schemas, then one query per schema to classify it. This could be unified into a single query with some shenanigans like:

```clojure
(sql/format {:select [:schema :created-at]
             :from   {:union-all
                      (for [schema schemas]
                        {:select [[[:inline schema] :schema]
                                  [{:select [:value]
                                    :from   [(keyword schema "cache_info")]
                                    :where  [:= :key [:inline "created-at"]]}
                                   :created-at]]})}}
            {:dialect :ansi})
```

But I found that this query is extremely slow and offers no benefit over the simpler, repeated queries. And since we run this cleanup on each commit now, the number of leftover schemas will be far lower, on the order of 5-10 schemas (and therefore queries), so it is not an issue.
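The simpler per-schema lookup that was kept instead can be sketched roughly like this. This is a hedged sketch, not the commit's exact code: the helper name `created-at-of` is hypothetical, and interpolating the schema name into the SQL is acceptable only because these names come from our own test harness.

```clojure
;; Sketch of the repeated approach: one small query per cache schema.
;; Returns the schema's created-at Instant, or nil if the row is missing.
(defn- created-at-of [^java.sql.Connection conn schema]
  (with-open [stmt (.createStatement conn)]
    (let [rset (.executeQuery stmt (format "select value from %s.cache_info where key = 'created-at'"
                                           schema))]
      (when (.next rset)
        (java.time.Instant/parse (.getString rset "value"))))))
```

Each such query hits a tiny single-row-per-key table, so even repeated per schema it stays fast.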

* Ngoc's suggestions

- docstring for `delete-old-schemas!`
- combine nested `doseq`
- use java-time over interop with java.time
parent 232472de
```clojure
(ns metabase.test.data.redshift
  (:require
   [clojure.java.jdbc :as jdbc]
   [clojure.string :as str]
   [java-time :as t]
   [metabase.driver.sql-jdbc.connection :as sql-jdbc.conn]
   [metabase.driver.sql-jdbc.sync :as sql-jdbc.sync]
   [metabase.driver.sql.test-util.unique-prefix :as sql.tu.unique-prefix]
   ...
```

@@ -71,15 +73,89 @@

```clojure
;;; Create + destroy the schema used for this test session

;; removed by this commit:
(defn- delete-old-schemas! [^java.sql.Connection conn]
  (with-open [rset (.. conn getMetaData getSchemas)
              stmt (.createStatement conn)]
    (while (.next rset)
      (let [schema (.getString rset "TABLE_SCHEM")
            sql    (format "DROP SCHEMA IF EXISTS \"%s\" CASCADE;" schema)]
        (when (sql.tu.unique-prefix/old-dataset-name? schema)
          (log/info (u/format-color 'blue "[redshift] %s" sql))
          (.execute stmt sql))))))

;; added by this commit:
(defn- reducible-result-set [^java.sql.ResultSet rset]
  (reify clojure.lang.IReduceInit
    (reduce [_ rf init]
      (with-open [rset rset]
        (loop [res init]
          (if (.next rset)
            (recur (rf res rset))
            res))))))

(defn- fetch-schemas [^java.sql.Connection conn]
  (reify clojure.lang.IReduceInit
    (reduce [_ rf init]
      (reduce ((map (fn [^java.sql.ResultSet rset]
                      (.getString rset "TABLE_SCHEM"))) rf)
              init
              (reducible-result-set (.. conn getMetaData getSchemas))))))

(def ^Long HOURS-BEFORE-EXPIRED-THRESHOLD
  "Number of hours that elapse before a persisted schema is considered expired."
  6)

(defn- classify-cache-schemas
  "Classifies the persistence cache schemas. Returns a map where each value is a (possibly empty) sequence of
  schemas:

  {:old-style-cache    schemas without a `cache_info` table
   :recent             schemas with a `cache_info` table that were created recently
   :expired            schemas with a `cache_info` table created more than [[HOURS-BEFORE-EXPIRED-THRESHOLD]] hours ago
   :lacking-created-at should never happen: schemas whose `cache_info` lacks an entry for `created-at`
   :unknown-error      if an error was thrown while classifying the schema}"
  [^java.sql.Connection conn schemas]
  (let [threshold (t/minus (t/instant) (t/hours HOURS-BEFORE-EXPIRED-THRESHOLD))]
    (with-open [stmt (.createStatement conn)]
      (let [classify! (fn [schema-name]
                        (try (let [sql  (format "select value from %s.cache_info where key = 'created-at'"
                                                schema-name)
                                   rset (.executeQuery stmt sql)]
                               (if (.next rset)
                                 (let [date-string (.getString rset "value")
                                       created-at  (java.time.Instant/parse date-string)]
                                   (if (t/before? created-at threshold)
                                     :expired
                                     :recent))
                                 :lacking-created-at))
                             (catch com.amazon.redshift.util.RedshiftException e
                               (if (re-find #"relation .* does not exist" (or (ex-message e) ""))
                                 :old-style-cache
                                 (do (log/error "Error classifying cache schema" e)
                                     :unknown-error)))
                             (catch Exception e
                               (log/error "Error classifying cache schema" e)
                               :unknown-error)))]
        (group-by classify! schemas)))))

(defn- delete-old-schemas!
  "Remove unneeded schemas from redshift. Local databases are thrown away after a test run. Shared cloud instances do
  not have this luxury. Test runs can create schemas where models are persisted and nothing cleans these up, leading
  to redshift clusters hitting the max number of tables allowed."
  [^java.sql.Connection conn]
  (let [{old-convention   :old
         caches-with-info :cache} (reduce (fn [acc s]
                                            (cond (sql.tu.unique-prefix/old-dataset-name? s)
                                                  (update acc :old conj s)

                                                  (str/starts-with? s "metabase_cache_")
                                                  (update acc :cache conj s)

                                                  :else acc))
                                          {:old [] :cache []}
                                          (fetch-schemas conn))
        {:keys [expired
                old-style-cache
                lacking-created-at]} (classify-cache-schemas conn caches-with-info)
        drop-sql (fn [schema-name] (format "DROP SCHEMA IF EXISTS \"%s\" CASCADE;"
                                           schema-name))]
    ;; don't delete unknown-error and recent.
    (with-open [stmt (.createStatement conn)]
      (doseq [[collection fmt-str] [[old-convention "Dropping old data schema: %s"]
                                    [expired "Dropping expired cache schema: %s"]
                                    [lacking-created-at "Dropping cache without created-at info: %s"]
                                    [old-style-cache "Dropping old cache schema without `cache_info` table: %s"]]
              schema collection]
        (log/infof fmt-str schema)
        (.execute stmt (drop-sql schema))))))

(defn- create-session-schema! [^java.sql.Connection conn]
  (with-open [stmt (.createStatement conn)]
    ...
```