Commit a33fa568 authored by dpsutton, committed by GitHub

Ensure we are paginating resultsets (#18477)

* Ensure we are paginating resultsets

Made big tables in both pg and mysql

pg:
```sql
create table large_table
(
    id         serial primary key,
    large_text text
);

insert into large_table (large_text)
select repeat('Z', 4000)
from generate_series(1, 500000)
```

In MySQL, use the REPL:
```clojure
  ;; assuming these namespaces; 5 is the id of the MySQL database in this dev instance
  (require '[clojure.java.jdbc :as jdbc]
           '[metabase.driver.sql-jdbc.connection :as sql-jdbc.conn])

  (jdbc/execute! (sql-jdbc.conn/db->pooled-connection-spec 5)
                 ["CREATE TABLE large_table (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, foo text);"])

  (do
    (jdbc/insert-multi! (sql-jdbc.conn/db->pooled-connection-spec 5)
                        :large_table
                        (repeat 50000 {:foo (apply str (repeat 5000 "Z"))}))
    :done)

  (jdbc/execute! (sql-jdbc.conn/db->pooled-connection-spec 5)
                 ["ALTER TABLE large_table add column properties json default null"])

  (jdbc/execute! (sql-jdbc.conn/db->pooled-connection-spec 5)
                 ["update large_table set properties = '{\"data\":{\"cols\":null,\"native_form\":{\"query\":\"SELECT
                 `large_table`.`id` AS `id`, `large_table`.`foo` AS `foo` FROM `large_table` LIMIT
                 1\",\"params\":null},\"results_timezone\":\"UTC\",\"results_metadata\":{\"checksum\":\"0MnSKb8145UERWn18F5Uiw==\",\"columns\":[{\"semantic_type\":\"type/PK\",\"coercion_strategy\":null,\"name\":\"id\",\"field_ref\":[\"field\",200,null],\"effective_type\":\"type/Integer\",\"id\":200,\"display_name\":\"ID\",\"fingerprint\":null,\"base_type\":\"type/Integer\"},{\"semantic_type\":null,\"coercion_strategy\":null,\"name\":\"foo\",\"field_ref\":[\"field\",201,null],\"effective_type\":\"type/Text\",\"id\":201,\"display_name\":\"Foo\",\"fingerprint\":{\"global\":{\"distinct-count\":1,\"nil%\":0.0},\"type\":{\"type/Text\":{\"percent-json\":0.0,\"percent-url\":0.0,\"percent-email\":0.0,\"percent-state\":0.0,\"average-length\":500.0}}},\"base_type\":\"type/Text\"}]},\"insights\":null,\"count\":1}}'"])

```

and then from the terminal client repeat this until we have 800,000 rows (each run doubles the table, so four runs take the initial 50,000 to 800,000):
```sql
insert into large_table (foo, properties) select foo, properties from large_table;
```

Then exercise it from code with the following:

```clojure
(-> (qp/process-query {:database 5 ; use appropriate db and tables here
                        :query {:source-table 42
                                ;; :limit 1000000
                                },
                        :type :query}
                        ;; don't retain any rows, purely just counting
                        ;; so resultset is what retains too many rows
                       {:rff (fn [metadata]
                               (let [c (volatile! 0)]
                                 (fn count-rff
                                   ([]
                                    {:data metadata})
                                   ([result]
                                    (assoc-in result [:data :count] @c))
                                   ([result _row]
                                    (vswap! c inc)
                                    result))))
                        })
     :data :count)
```

PG was far easier to blow up. MySQL took quite a bit of data.

Then we just set a fetch size on the result set so that we (hopefully)
only have that many rows in memory in the result set at once. The
streaming will write to the download stream as it goes.

PG has one other complication: the fetch size is only honored if
autoCommit is false. The reasoning seems to be that each statement runs
in its own transaction which commits immediately, and to commit it has
to close result sets; therefore it has to realize the entire result set
up front, otherwise you would only get the initial page, if any.
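
As a minimal sketch of those two knobs together (assuming a
`javax.sql.DataSource` named `datasource` pointed at the pg database
above — the real change lives in the sql-jdbc execute code shown in the
diffs below):

```clojure
;; sketch: stream a large result set with a bounded fetch size
(defn count-rows-streaming [^javax.sql.DataSource datasource]
  (with-open [conn (.getConnection datasource)]
    ;; pg only honors fetchSize with autoCommit off; otherwise the driver
    ;; realizes the whole result set before handing back the first row
    (.setAutoCommit conn false)
    (with-open [stmt (.createStatement conn)]
      ;; pull rows from the server in pages of 500 instead of all at once
      (.setFetchSize stmt 500)
      (with-open [rs (.executeQuery stmt "select * from large_table")]
        (loop [n 0]
          (if (.next rs)
            (recur (inc n))
            n))))))
```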

* Set default fetch size to 500

Long queries on gcloud pg (times in ms):

limit 10,000
fetch size | t1   | t2   | t3
-----------|------|------|------
100        | 6030 | 8804 | 5986
500        | 1537 | 1535 | 1494
1000       | 1714 | 1802 | 1611
3000       | 1644 | 1595 | 2044

limit 30,000
fetch size | t1    | t2    | t3
-----------|-------|-------|------
100        | 17341 | 15991 | 16061
500        | 4112  | 4182  | 4851
1000       | 5075  | 4546  | 4284
3000       | 5405  | 5055  | 4745

* Only set fetch size if not default (0)

Setting `:additional-options "defaultRowFetchSize=3000"` in a
database's details sets a default fetch size, and we can easily honor
that. This allows overriding per database without much work on our part.
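
Concretely, the honoring logic is just a guard on the JDBC default
(mirroring the statement setup in the diff further down):

```clojure
;; if the connection's URL set defaultRowFetchSize, the driver reports a
;; non-zero fetch size here and we leave it alone; only the JDBC default
;; of 0 gets replaced with our own sql-jdbc-fetch-size setting (500)
(when (zero? (.getFetchSize stmt))
  (.setFetchSize stmt (sql-jdbc-fetch-size)))
```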

* Remove redshift custom fetch size code

This removes the automatic insertion of defaultRowFetchSize=5000 on
Redshift DBs. Now we always set the fetch size to 500 in the sql-jdbc
statement and prepared statement code, and we also allow custom values
to take precedence over our default of 500.

One additional benefit of removing this is that the old code always
appended the option even if a user added ?defaultRowFetchSize=300
themselves, so this change should actually give more control to our users.
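
For illustration (hypothetical connection details), a Redshift user who
wants the old behavior back can now opt in per database:

```clojure
;; hypothetical details map: the fetch size comes from :additional-options
;; instead of being hard-coded into the subname by us
(sql-jdbc.conn/connection-details->spec :redshift
  {:host               "examplecluster.abc123.us-east-1.redshift.amazonaws.com"
   :port               5439
   :db                 "dev"
   :user               "awsuser"
   :additional-options "defaultRowFetchSize=5000"})
;; => spec whose :subname should end in "?defaultRowFetchSize=5000"
```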

Profiling a quick select of 79,000 rows from Redshift, there was
essentially no difference between a fetch size of 500 (the new default)
and 5000 (the old Redshift default); both took about 12442 ms.

* Remove unused require of settings in redshift tests

* Appease the linter

* Remove unnecessary redshift connection details tests
@@ -207,8 +207,7 @@
   (merge
    {:classname "com.amazon.redshift.jdbc42.Driver"
     :subprotocol "redshift"
-    :subname (str "//" host ":" port "/" db
-                  "?defaultRowFetchSize=" (pubset/redshift-fetch-size))
+    :subname (str "//" host ":" port "/" db)
     :ssl true
     :OpenSourceSubProtocolOverride false}
    (dissoc opts :host :port :db))))
@@ -9,7 +9,6 @@
    [metabase.driver.sql-jdbc.sync.describe-database :as sync.describe-database]
    [metabase.models.database :refer [Database]]
    [metabase.models.field :refer [Field]]
-   [metabase.models.setting :as setting]
    [metabase.models.table :refer [Table]]
    [metabase.plugins.jdbc-proxy :as jdbc-proxy]
    [metabase.public-settings :as pubset]
@@ -22,8 +21,7 @@
    [metabase.test.util :as tu]
    [metabase.util :as u]
    [toucan.db :as db])
-  (:import [java.sql ResultSet ResultSetMetaData]
-           metabase.plugins.jdbc_proxy.ProxyDriver))
+  (:import metabase.plugins.jdbc_proxy.ProxyDriver))
 
 (use-fixtures :once (fixtures/initialize :plugins))
 (use-fixtures :once (fixtures/initialize :db))
@@ -208,35 +206,6 @@
            (partial into {})
            (db/select [Field :name :database_type :base_type] :table_id table-id {:order-by [:name]}))))))))))
 
-(defn- assert-jdbc-url-fetch-size [db fetch-size]
-  (with-open [conn (.getConnection (sql-jdbc.execute/datasource db))]
-    (let [md  (.getMetaData conn)
-          url (.getURL md)]
-      (is (str/includes? url (str "defaultRowFetchSize=" fetch-size))))))
-
-(deftest test-jdbc-fetch-size
-  (testing "Redshift JDBC fetch size is set correctly in PreparedStatement"
-    (mt/test-driver :redshift
-      ;; the default value should always be picked up if nothing is set
-      (assert-jdbc-url-fetch-size (mt/db) (:default (setting/resolve-setting :redshift-fetch-size)))
-      (mt/with-temporary-setting-values [redshift-fetch-size "14"]
-        ;; create a new DB in order to pick up the change to the setting here
-        (mt/with-temp Database [db {:engine :redshift, :details (:details (mt/db))}]
-          (mt/with-db db
-            ;; make sure the JDBC URL has the defaultRowFetchSize parameter set correctly
-            (assert-jdbc-url-fetch-size db 14)
-            ;; now, actually run a query and see if the PreparedStatement has the right fetchSize set
-            (mt/with-everything-store
-              (let [orig-fn sql-jdbc.execute/reducible-rows
-                    new-fn  (fn [driver ^ResultSet rs ^ResultSetMetaData rsmeta canceled-chan]
-                              (is (= 14 (.getFetchSize (.getStatement rs))))
-                              (orig-fn driver rs rsmeta canceled-chan))]
-                (with-redefs [sql-jdbc.execute/reducible-rows new-fn]
-                  (is (= [[1]] (-> {:query "SELECT 1"}
-                                   (mt/native-query)
-                                   (qp/process-query)
-                                   (mt/rows)))))))))))))
-
 (deftest syncable-schemas-test
   (mt/test-driver :redshift
     (testing "Should filter out schemas for which the user has no perms"
@@ -295,31 +264,3 @@
                 (is (contains? (schemas) fake-schema-name))))
             (testing "normally, ::fake-schema should be filtered out (because it does not exist)"
               (is (not (contains? (schemas) fake-schema-name)))))))))))))
-
-(deftest connection-details->spec-test
-  (mt/with-temporary-setting-values [redshift-fetch-size "14"]
-    (testing "Configure connection without additional-options should include defaultRowFetchSize"
-      (is (= {:classname "com.amazon.redshift.jdbc42.Driver"
-              :subprotocol "redshift"
-              :subname "//testhost:5432/testdb?defaultRowFetchSize=14"
-              :OpenSourceSubProtocolOverride false
-              :user "testuser"
-              :ssl true}
-             (sql-jdbc.conn/connection-details->spec :redshift
-               {:host "testhost"
-                :port 5432
-                :db "testdb"
-                :user "testuser"}))))
-    (testing "Configure connection with additional-options should not replace defaultRowFetchSize"
-      (is (= {:classname "com.amazon.redshift.jdbc42.Driver"
-              :subprotocol "redshift"
-              :subname "//testhost:5432/testdb?defaultRowFetchSize=14&TCPKeepAlive=FALSE"
-              :OpenSourceSubProtocolOverride false
-              :user "testuser"
-              :ssl true}
-             (sql-jdbc.conn/connection-details->spec :redshift
-               {:host "testhost"
-                :port 5432
-                :db "testdb"
-                :user "testuser"
-                :additional-options "TCPKeepAlive=FALSE"}))))))
@@ -14,6 +14,7 @@
    [metabase.driver.sql-jdbc.execute.old-impl :as execute.old]
    [metabase.driver.sql-jdbc.sync.interface :as sql-jdbc.sync]
    [metabase.mbql.util :as mbql.u]
+   [metabase.models.setting :refer [defsetting]]
    [metabase.query-processor.context :as context]
    [metabase.query-processor.error-type :as qp.error-type]
    [metabase.query-processor.interface :as qp.i]
@@ -195,6 +196,12 @@
       (.setReadOnly conn true)
       (catch Throwable e
         (log/debug e (trs "Error setting connection to read-only"))))
+    (try
+      ;; set autocommit to false so that pg honors fetchSize. Otherwise it commits the transaction and needs the
+      ;; entire realized result set
+      (.setAutoCommit conn false)
+      (catch Throwable e
+        (log/debug e (trs "Error setting connection to autoCommit false"))))
     (try
       (.setHoldability conn ResultSet/CLOSE_CURSORS_AT_COMMIT)
       (catch Throwable e
@@ -267,6 +274,13 @@
          (set-parameter driver stmt (inc i) param))
        params)))
 
+(defsetting ^:private sql-jdbc-fetch-size
+  "Fetch size for result sets. We want to ensure that the jdbc ResultSet objects are not realizing the entire results
+  in memory."
+  :default 500
+  :type :integer
+  :visibility :internal)
+
 (defmethod prepared-statement :sql-jdbc
   [driver ^Connection conn ^String sql params]
   (let [stmt (.prepareStatement conn
@@ -279,6 +293,11 @@
       (.setFetchDirection stmt ResultSet/FETCH_FORWARD)
       (catch Throwable e
         (log/debug e (trs "Error setting prepared statement fetch direction to FETCH_FORWARD"))))
+    (try
+      (when (zero? (.getFetchSize stmt))
+        (.setFetchSize stmt (sql-jdbc-fetch-size)))
+      (catch Throwable e
+        (log/debug e (trs "Error setting prepared statement fetch size to fetch-size"))))
     (set-parameters! driver stmt params)
     stmt
     (catch Throwable e
@@ -301,6 +320,11 @@
       (.setFetchDirection stmt ResultSet/FETCH_FORWARD)
       (catch Throwable e
         (log/debug e (trs "Error setting statement fetch direction to FETCH_FORWARD"))))
+    (try
+      (when (zero? (.getFetchSize stmt))
+        (.setFetchSize stmt (sql-jdbc-fetch-size)))
+      (catch Throwable e
+        (log/debug e (trs "Error setting statement fetch size to fetch-size"))))
     stmt
     (catch Throwable e
       (.close stmt)
@@ -406,9 +406,3 @@
   :visibility :public
   :type :integer
   :default 180)
-
-(defsetting redshift-fetch-size
-  (deferred-tru "Controls the fetch size used for Redshift queries (in PreparedStatement), via defaultRowFetchSize.")
-  :visibility :public
-  :type :integer
-  :default 5000)