This project is mirrored from https://github.com/metabase/metabase.
  1. Apr 04, 2023
  2. Apr 03, 2023
    • Manual backport persistent schema cache info table (#29741) · 03efe54d
      dpsutton authored
      * Create `cache_info` table in our persisted schemas
      
      ```sql
      -- postgres/redshift
      test-data=# select * from metabase_cache_424a9_379.cache_info ;
             key        |                value
      ------------------+--------------------------------------
       settings-version | 1
       created-at       | 2023-03-28
       instance-uuid    | 407e4ba8-2bab-470f-aeb5-9fc63fd18c4e
       instance-name    | Metabase Test
      (4 rows)
      ```
      
      ```sql
      --mysql
      mysql> select * from metabase_cache_424a9_435.cache_info;
      +------------------+--------------------------------------+
      | key              | value                                |
      +------------------+--------------------------------------+
      | settings-version | 1                                    |
      | created-at       | 2023-03-28                           |
      | instance-uuid    | 407e4ba8-2bab-470f-aeb5-9fc63fd18c4e |
      | instance-name    | Metabase Test                        |
      +------------------+--------------------------------------+
      ```
      
      Our key/value rows in v1:
      ```clojure
      (defn kv-table-values
        "Version 1 of the values to go in the key/value `cache_info` table."
        []
        [{:key   "settings-version"
          :value "1"}
         {:key   "created-at"
          ;; "2023-03-28"
          :value (.format (LocalDate/now) DateTimeFormatter/ISO_LOCAL_DATE)}
         {:key   "instance-uuid"
          :value (public-settings/site-uuid)}
         {:key   "instance-name"
          :value (public-settings/site-name)}])
      ```
      
      This will enable us to delete cached schemas in shared databases (cloud
      versions like redshift) without risking wiping out a cached schema
      during an active test run.
      
      The code to delete the schemas will be something like:
      
      ```clojure
      (def spec (sql-jdbc.conn/connection-details->spec
                 :redshift
                 {:host "xxx.us-east-1.reshift.amazonaws.com"
                  :db "db"
                  :port 5439
                  :user "user"
                  :password "password"}))
      
      (let [days-prior 1 ;; probably 2 to handle crossing a day boundary; could also adjust to 6 hours, etc.
            threshold (.minus (java.time.LocalDate/now)
                              days-prior
                              java.time.temporal.ChronoUnit/DAYS)]
        (doseq [schema (->> ["select nspname from pg_namespace where nspname like 'metabase_cache%'"]
                            (jdbc/query spec)
                            (map :nspname))]
          (let [[{created :value}] (jdbc/query spec (sql/format {:select [:value]
                                                                 :from [(keyword schema "cache_info")]
                                                                 :where [:= :key "created-at"]}
                                                                {:dialect :ansi}))]
            (when created
              (let [creation-day (java.time.LocalDate/parse created)]
                (when (.isBefore creation-day threshold)
                  (jdbc/execute! spec [(format "drop schema %s cascade" schema)])))))))
      ```
      
      or if we want to do it in a single query:
      
      ```clojure
      schemas=> (let [days-prior 1
                      threshold  (.minus (java.time.LocalDate/now)
                                         days-prior
                                         java.time.temporal.ChronoUnit/DAYS)]
                  (let [schemas (->> ["select nspname
                               from pg_namespace
                               where nspname like 'metabase_cache%'"]
                                     (jdbc/query spec)
                                     (map :nspname))]
                    (jdbc/with-db-connection [conn spec]
                      (jdbc/execute! conn [(format "set search_path= %s;" (str/join ", " schemas))])
                      (let [sql               (sql/format {:select [:schema :created-at]
                                                           :from   {:union-all
                                                                    (for [schema schemas]
                                                                      {:select [[[:inline schema] :schema]
                                                                                [{:select [:value]
                                                                                  :from   [(keyword schema "cache_info")]
                                                                                  :where  [:= :key [:inline "created-at"]]}
                                                                                 :created-at]]})}}
                                                          {:dialect :ansi})
                            old?              (fn [{create-str :created-at}]
                                                (let [created (java.time.LocalDate/parse create-str)]
                                                  (.isBefore created threshold)))
                            schemas-to-delete (->> (jdbc/query spec sql)
                                                   (filter old?)
                                                   (map :schema))]
                        schemas-to-delete))))
      ("metabase_cache_424a9_503") ;; when there are any, string join them and delete them as above
      ```
      
      * Use Instants so we can compare by hours instead of just days
      
      * require toucan as db
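      The switch to Instants mentioned above allows staleness to be measured in hours rather than whole days. A minimal Java sketch of that comparison using `java.time` (the class name, method, and threshold values here are illustrative, not from the commit):

      ```java
      import java.time.Instant;
      import java.time.temporal.ChronoUnit;

      public class CacheAge {
          // Returns true when the cached schema's creation instant is older
          // than maxAgeHours relative to `now`.
          static boolean stale(Instant createdAt, Instant now, long maxAgeHours) {
              Instant threshold = now.minus(maxAgeHours, ChronoUnit.HOURS);
              return createdAt.isBefore(threshold);
          }

          public static void main(String[] args) {
              Instant now = Instant.parse("2023-03-28T12:00:00Z");
              // With a 6-hour threshold: a 7-hour-old schema is stale, a 1-hour-old one is not.
              System.out.println(stale(Instant.parse("2023-03-28T05:00:00Z"), now, 6)); // true
              System.out.println(stale(Instant.parse("2023-03-28T11:00:00Z"), now, 6)); // false
          }
      }
      ```

      A LocalDate-based comparison can only resolve whole days, so a schema created late yesterday and one created 30 hours ago look equally old; comparing Instants avoids that.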
    • Manual backport of 29686 (#29761) · 5307e06b
      Anton Kostenko authored
    • backport 29738 (#29755) · e5aded4e
      Noah Moss authored
  3. Mar 31, 2023
    • Don't log in bigquery results hotpath (#29727) (#29740) · 69f3ccf5
      metabase-bot[bot] authored
      * Don't log in bigquery results hotpath
      
      Right now we log in the parser of bigquery results for unrecognized
      types. The problem is that we log for each _value_ and not each
      column, which causes an _enormous_ amount of logging and a
      performance penalty.
      
      See
      - https://github.com/metabase/metabase/issues/29118 (performance)
      - https://github.com/metabase/metabase/issues/28868 (filling disk space)
      
      This log call was added between v0.45.1 and v0.45.3:
      
      ```diff
      ❯ git diff v0.45.1..v0.45.3 modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/**
      diff --git a/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj b/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj
      index a0d8081c30..f367199b55 100644
      --- a/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj
      +++ b/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj
      @@ -22,7 +22,7 @@
                   [metabase.util :as u]
                   [metabase.util.date-2 :as u.date]
                   [metabase.util.honeysql-extensions :as hx]
      -            [metabase.util.i18n :refer [tru]]
      +            [metabase.util.i18n :refer [trs tru]]
                   [pretty.core :refer [PrettyPrintable]]
                   [schema.core :as s])
         (:import [com.google.cloud.bigquery Field$Mode FieldValue]
      @@ -88,7 +88,8 @@
           (parse-fn v)))
      
       (defmethod parse-result-of-type :default
      -  [_ column-mode _ v]
      +  [column-type column-mode _ v]
      +  (log/warn (trs "Warning: missing type mapping for parsing BigQuery results of type {0}." column-type))
         (parse-value column-mode v identity))
      ```
      
      The result is that selecting 50,000 rows for download to Excel takes:
      
      | version                      | time       |
      |------------------------------|------------|
      | 0.45.1                       | 28 seconds |
      | 0.45.3                       | 52 seconds |
      | 0.45.3 with logging disabled | 30 seconds |
      
      (logging was disabled by adding `<Logger
      name="metabase.driver.bigquery-cloud-sdk.query-processor"
      level="ERROR"/>` to a custom log4j2.xml and passing
      `-Dlog4j2.configurationFile=log4j2.xml` at jar startup)
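
      The override described above might live in a minimal log4j2.xml like the following. Only the `<Logger …/>` element comes from the note; the appender and Root logger are assumptions sketched in for a complete, loadable file:

      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <Configuration>
        <Appenders>
          <Console name="Console" target="SYSTEM_OUT"/>
        </Appenders>
        <Loggers>
          <!-- Silence the per-value WARNs from the BigQuery driver's query processor -->
          <Logger name="metabase.driver.bigquery-cloud-sdk.query-processor" level="ERROR"/>
          <Root level="INFO">
            <AppenderRef ref="Console"/>
          </Root>
        </Loggers>
      </Configuration>
      ```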
      
      For the query (3 string columns, 5 rows):
      
      ```sql
      SELECT game_id, first_name, last_name
      FROM `bigquery-public-data.ncaa_basketball.mbb_players_games_sr`
      LIMIT 5
      ```
      
      BEFORE:
      
      ```
      2023-03-31 17:17:52,146 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,147 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,147 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
      2023-03-31 17:17:52,155 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 795.2 ms (6 DB calls) App DB connections: 0/10 Jetty threads: 4/50 (2 idle, 0 queued) (192 total active threads) Queries in flight: 0 (0 queued)
      ```
      
      Note this is 15 log lines (3 columns x 5 rows)
      
      AFTER:
      
      ```
      2023-03-31 17:19:15,694 WARN driver.bigquery-cloud-sdk :: Warning: missing type mapping for parsing BigQuery results column game_id of type STRING.
      2023-03-31 17:19:15,694 WARN driver.bigquery-cloud-sdk :: Warning: missing type mapping for parsing BigQuery results column first_name of type STRING.
      2023-03-31 17:19:15,694 WARN driver.bigquery-cloud-sdk :: Warning: missing type mapping for parsing BigQuery results column last_name of type STRING.
      2023-03-31 17:19:15,757 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 973.5 ms (6 DB calls) App DB connections: 0/10 Jetty threads: 4/50 (3 idle, 0 queued) (193 total active threads) Queries in flight: 0 (0 queued)
      ```
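
      The shape of the fix visible in the AFTER log (warn once per column, not once per value) can be sketched outside the driver. A minimal Java sketch, with all names hypothetical; a set records which columns have already triggered a warning:

      ```java
      import java.util.Set;
      import java.util.concurrent.ConcurrentHashMap;

      public class WarnOncePerColumn {
          // Columns already warned about; a concurrent set so parsing threads can share it.
          private static final Set<String> warnedColumns = ConcurrentHashMap.newKeySet();

          // Hypothetical parse hook: warns at most once per column, not once per value.
          static String parseValue(String columnName, String columnType, String rawValue) {
              if (warnedColumns.add(columnName)) {
                  System.err.printf(
                      "Warning: missing type mapping for parsing BigQuery results column %s of type %s.%n",
                      columnName, columnType);
              }
              return rawValue; // identity fallback, like the :default parse method
          }

          public static void main(String[] args) {
              // 3 columns x 5 rows: previously 15 warnings, now at most 3.
              String[] columns = {"game_id", "first_name", "last_name"};
              for (int row = 0; row < 5; row++) {
                  for (String col : columns) {
                      parseValue(col, "STRING", "v");
                  }
              }
          }
      }
      ```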
      
      * unused require to appease our overlords
      
      Co-authored-by: dpsutton <dan@dpsutton.com>
  4. Mar 30, 2023
  5. Mar 29, 2023
  6. Mar 28, 2023
  7. Mar 27, 2023
  8. Mar 25, 2023
  9. Mar 24, 2023