Delete old persistence schemas in redshift (#30013)
* Delete old persistence schemas in redshift

Reminder that recently a change occurred that populates a cache_info table in each persisted schema:

```sql
-- postgres/redshift
test-data=# select * from metabase_cache_424a9_379.cache_info ;
       key        |                value
------------------+--------------------------------------
 settings-version | 1
 created-at       | 2023-03-29T14:16:24.849866Z
 instance-uuid    | 407e4ba8-2bab-470f-aeb5-9fc63fd18c4e
 instance-name    | Metabase Test
(4 rows)
```

So for each cache schema, we can classify it as

- old style (a commit before this change. more and more infrequent)
- new style recent
- new style expired

And we can delete them accordingly in startup:

```
2023-04-11 20:09:03,402 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_359
2023-04-11 20:09:04,733 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_70
2023-04-11 20:09:05,557 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_71
2023-04-11 20:09:06,284 INFO data.redshift :: Dropping expired cache schema: metabase_cache_0149c_90
...
2023-04-11 20:20:33,271 INFO data.redshift :: Dropping expired cache schema: metabase_cache_fe4a7_90
2023-04-11 20:20:34,284 INFO data.redshift :: Dropping old cache schema without `cache_info` table: metabase_cache_8f4b8_358
2023-04-11 20:20:35,076 INFO data.redshift :: Dropping old cache schema without `cache_info` table: metabase_cache_8f4b8_69
...
```

It's possible this will at first cause a few flakes if we are unlucky enough to drop a cache schema without `cache_info` for an instance that is running tests at that point. But the `cache_info` table has been backported so the chances of that become smaller each day. I've let a week elapse from that change before committing this so hopefully it is not an issue in practice.

Number of queries: Makes a single query to get all schemas, then for each schema makes a query to classify it.
This can be unified into a single query with some shenanigans like

```clojure
(sql/format
 {:select [:schema :created-at]
  :from   {:union-all
           (for [schema schemas]
             {:select [[[:inline schema] :schema]
                       [{:select [:value]
                         :from   [(keyword schema "cache_info")]
                         :where  [:= :key [:inline "created-at"]]}
                        :created-at]]})}}
 {:dialect :ansi})
```

But I found that this query is extremely slow and does not offer any benefit over the simpler, repeated queries. And since we now run this on each commit, the number of schemas will be far lower, on the order of 5-10 schemas (and therefore queries), so this is not an issue.

* Ngoc's suggestions

- docstring for `delete-old-schemas!`
- combine nested `doseq`
- use java-time over interop with java.time
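
For illustration only, the three-way classification described above could be sketched roughly as follows. This is not the code from the commit: the names `classify-schema`, `schemas-to-drop` and `max-age`, the one-hour threshold, and the use of plain `java.time` interop (chosen so the snippet needs no extra dependencies) are all assumptions. The sketch takes the `created-at` value already read from each schema's `cache_info` table, with `nil` standing for a schema that has no such table.

```clojure
(ns cache-schema-sketch
  (:import (java.time Duration Instant)))

;; Assumption: schemas older than this count as expired. The commit message
;; does not say what the real threshold is.
(def max-age (Duration/ofHours 1))

(defn classify-schema
  "Hypothetical helper. `created-at` is the ISO-8601 string stored under the
  `created-at` key of `cache_info`, or nil when the schema has no such table."
  [created-at ^Instant now]
  (cond
    ;; No cache_info table: schema predates the cache_info change.
    (nil? created-at)
    :old-style

    ;; Created before (now - max-age): safe to drop.
    (.isBefore (Instant/parse created-at) (.minus now max-age))
    :expired

    ;; Otherwise it may belong to a test run that is still in progress.
    :else
    :recent))

(defn schemas-to-drop
  "Given a map of schema name -> created-at (or nil), return [schema kind]
  pairs for every schema that is not :recent."
  [schema->created-at now]
  (for [[schema created-at] schema->created-at
        :let [kind (classify-schema created-at now)]
        :when (not= kind :recent)]
    [schema kind]))
```

Keeping the classification a pure function of `created-at` and `now` means it can be checked without a database connection; the per-schema query and the `DROP SCHEMA` call stay in the side-effecting caller.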