Skip to content
Snippets Groups Projects
Unverified Commit 564e642f authored by dpsutton's avatar dpsutton Committed by GitHub
Browse files

Raise model index limit (#40642)

* Use expression fix on both pk-ref and value-ref

The gist: inside the query expressions are referred to as [:expression
"foo"]. But as a nested query, it's just another field, and the fact
that it was an expression is irrelevant and should not leak from that
stage. So "downstream" should just be [:field "foo" base-type].

Oddly, if you just fix one field or the other, the query would be
"valid", would log an error, but no valid values would be returned.

It seems like the invalid expression reference was just dropped so we
only selected a single column and those were removed (we run `(filter
valid-tuples?)` as we only want valid key/value pairs).

So let's add a `(> (count values) 0)` which was failing previously.

* Fix indexing error

When updating the model index, it diffs the existing values vs the new
values and deletes appropriate and adds the new values. But it was using
the wrong column when updating the model index values. It used `pk_ref`
but that table uses `model_pk`. This led to the following errors on
stats:

```sql
select pk_ref, value_ref, state, error from model_index where state = 'error'

pk_ref,value_ref,state,error
"[""field"",563028,null]","[""field"",563018,null]",error,"ERROR: column ""pk_ref"" does not exist Position: 68"
"[""field"",534030,null]","[""field"",534037,null]",error,"ERROR: column ""pk_ref"" does not exist Position: 68"
"[""field"",545945,null]","[""field"",545948,null]",error,"ERROR: column ""pk_ref"" does not exist Position: 68"
```

Now those errors will be fixed and the model indexes will correctly
track the values

* Bump indexing threshold to 25,000

Had been limited to 5,000 rows of model indexing. This was a made up
number as we went into this endeavor. But it's proving to be far too
small. Bumping to 25,000

* Parititon deletions and inserts

With a higher limit need to ensure that we don't send too large of a
query.

```clojure
model-index=> (t2/insert! ModelIndexValue
                          (map (fn [id]
                                 {:name           (str (gensym "value"))
                                  :model_pk       id
                                  :model_index_id 1})
                               (range 25000)))
Execution error (PSQLException) at org.postgresql.jdbc.PgPreparedStatement/<init> (PgPreparedStatement.java:105).
PreparedStatement can have at most 65,535 parameters. Please consider using arrays, or splitting the query in several ones, or using COPY. Given query has 75,000 parameters
```

* move comment and fix test failure explanation

comment should be next to the number? check. and also need to use
`mc/explain` which returns info rather than `mc/validate` which returns
a bool:

```clojure
;; bad
model-index-test=> (let [values [[1 "foo"] [2 "foo"] ["foo" "foo"]]]
                     (-> (mc/validate [:sequential [:tuple number? string?]] values)
                         (me/humanize)))
nil
;; good
model-index-test=> (let [values [[1 "foo"] [2 "foo"] ["foo" "foo"]]]
                     (-> (mc/explain [:sequential [:tuple number? string?]] values)
                         (me/humanize)))
[nil nil [["should be a number"]]]
```
parent 4543a7fd
No related branches found
No related tags found
No related merge requests found
......@@ -46,7 +46,7 @@
(def max-indexed-values
"Maximum number of values we will index. Actually take one more than this to test if there are more than the
threshold."
5000)
25000)
;;;; indexing functions
......@@ -60,34 +60,35 @@
Expression refs (`[:expression \"full-name\"]`) are how the _query_ refers to a custom column. But nested queries
don't, (and shouldn't) care that those are expressions. They are just another field. The field type is always
`:type/Text` enforced by the endpoint to create model indexes."
[field-ref :- mbql.s/Field]
[field-ref :- mbql.s/Field base-type]
(case (first field-ref)
:field field-ref
:expression (let [[_ expression-name] field-ref]
;; api validated that this is a text field when the model-index was created. When selecting the
;; expression we treat it as a field.
[:field expression-name {:base-type :type/Text}])
(throw (ex-info (trs "Invalid field ref for indexing: {0}" field-ref)
[:field expression-name {:base-type base-type}])
(throw (ex-info (format "Invalid field ref for indexing: %s" field-ref)
{:field-ref field-ref
:valid-clauses [:field :expression]}))))
(defn- fetch-values
[model-index]
(let [model (t2/select-one Card :id (:model_id model-index))
value-ref (-> model-index
:value_ref
mbql.normalize/normalize-field-ref
fix-expression-refs)]
(try [nil (->> (qp/process-query
{:database (:database_id model)
:type :query
:query {:source-table (format "card__%d" (:id model))
:breakout [(:pk_ref model-index) value-ref]
:limit (inc max-indexed-values)}})
:data :rows (filter valid-tuples?))]
(catch Exception e
(log/warn (trs "Error fetching indexed values for model {0}" (:id model)) e)
[(ex-message e) []]))))
fix (fn [field-ref base-type] (-> field-ref mbql.normalize/normalize-field-ref (fix-expression-refs base-type)))
;; :type/Text and :type/Integer are ensured at creation time on the api.
value-ref (-> model-index :value_ref (fix :type/Text))
pk-ref (-> model-index :pk_ref (fix :type/Integer))]
(try
[nil (->> (qp/process-query
{:database (:database_id model)
:type :query
:query {:source-table (format "card__%d" (:id model))
:breakout [pk-ref value-ref]
:limit (inc max-indexed-values)}})
:data :rows (filter valid-tuples?))]
(catch Exception e
(log/warn (trs "Error fetching indexed values for model {0}" (:id model)) e)
[(ex-message e) []]))))
(defn find-changes
"Find additions and deletions in indexed values. `source-values` are from the db, `indexed-values` are what we
......@@ -107,38 +108,43 @@ don't, (and shouldn't) care that those are expressions. They are just another fi
"Add indexed values to the model_index_value table."
[model-index]
(let [[error-message values-to-index] (fetch-values model-index)
current-index-values (into #{}
(map (juxt :model_pk :name))
(t2/select ModelIndexValue
:model_index_id (:id model-index)))]
current-index-values (into #{}
(map (juxt :model_pk :name))
(t2/select ModelIndexValue
:model_index_id (:id model-index)))]
(if-not (str/blank? error-message)
(t2/update! ModelIndex (:id model-index) {:state "error"
:error error-message
(t2/update! ModelIndex (:id model-index) {:state "error"
:error error-message
:indexed_at :%now})
(try
(t2/with-transaction [_conn]
(let [{:keys [additions deletions]} (find-changes {:current-index current-index-values
:source-values values-to-index})]
(when (seq deletions)
(t2/delete! ModelIndexValue
:model_index_id (:id model-index)
:pk_ref [:in (->> deletions (map first))]))
(doseq [deletions-part (partition-all 10000 deletions)]
(t2/delete! ModelIndexValue
:model_index_id (:id model-index)
:model_pk [:in (->> deletions-part (map first))])))
(when (seq additions)
(t2/insert! ModelIndexValue
(map (fn [[id v]]
{:name v
:model_pk id
:model_index_id (:id model-index)})
additions))))
(doseq [additions-part (partition-all 10000 additions)]
(t2/insert! ModelIndexValue
(map (fn [[id v]]
{:name v
:model_pk id
:model_index_id (:id model-index)})
additions-part)))))
(t2/update! ModelIndex (:id model-index)
{:indexed_at :%now
:state (if (> (count values-to-index) max-indexed-values)
"overflow"
"indexed")}))
:state (if (> (count values-to-index) max-indexed-values)
"overflow"
"indexed")}))
(catch Exception e
(log/error (format "Error saving model-index values for model-index: %d, model: %d"
(:id model-index) (:model-id model-index))
e)
(t2/update! ModelIndex (:id model-index)
{:state "error"
:error (ex-message e)
{:state "error"
:error (ex-message e)
:indexed_at :%now}))))))
......
......@@ -40,13 +40,22 @@
(deftest quick-run-through
(with-scheduler-setup
(mt/dataset test-data
(let [query (mt/mbql-query products)
(let [query (mt/mbql-query products
{:filter [:and
[:> $id 0]
[:< $id 10]]})
pk_ref (mt/$ids $products.id)
value_ref (mt/$ids $products.title)]
(t2.with-temp/with-temp [Card model (assoc (mt/card-with-source-metadata-for-query query)
:type :model
:name "model index test")]
(let [model-index (mt/user-http-request :rasta :post 200 "/model-index"
(mt/with-model-cleanup [:model/Card]
;; with-temp doesn't let us update the model to simulate different values returned
(let [response (mt/user-http-request :rasta :post 200 "/card"
(assoc (mt/card-with-source-metadata-for-query query)
:type :model
:name "model index test"
:visualization_settings {}
:display "table"))
model (t2/select-one :model/Card :id (:id response))
model-index (mt/user-http-request :rasta :post 200 "/model-index"
{:model_id (:id model)
:pk_ref pk_ref
:value_ref value_ref})
......@@ -59,10 +68,32 @@
(mt/user-http-request :rasta :get 200 (str "/model-index/" (:id model-index))))))
(testing "We can invoke the task ourself manually"
(model-index/add-values! model-index)
(is (= 200 (count (t2/select ModelIndexValue :model_index_id (:id model-index)))))
(is (= 9 (count (t2/select ModelIndexValue :model_index_id (:id model-index)))))
(is (= (into #{} cat (mt/rows (qp/process-query
(mt/mbql-query products {:fields [$title]}))))
(mt/mbql-query products {:fields [$title]
:filter [:and
[:> $id 0]
[:< $id 10]]}))))
(t2/select-fn-set :name ModelIndexValue :model_index_id (:id model-index)))))
(testing "When the values change the indexed values change"
;; update the filter on the model to simulate different values indexed
(t2/update! :model/Card
(:id model)
{:dataset_query (mt/mbql-query products
{:filter [:and
[:> $id 10]
[:< $id 20]]})})
(model-index/add-values! model-index)
(is (= 9 (count (t2/select ModelIndexValue :model_index_id (:id model-index)))))
(is (= (into #{} cat (mt/rows (qp/process-query
(mt/mbql-query products {:fields [$title]
:filter [:and
[:> $id 10]
[:< $id 20]]}))))
(t2/select-fn-set :name ModelIndexValue :model_index_id (:id model-index))))
(is (=? {:error nil
:state "indexed"}
(t2/select-one :model/ModelIndex :id (:id model-index)))))
(let [index-trigger! #(->> (task/scheduler-info)
:jobs
(by-key "metabase.task.IndexValues.job")
......@@ -146,10 +177,13 @@
(doseq [[scenario query [field-refs]]
(remove nil?
[[:mbql (mt/mbql-query products {:fields [$id $title]})]
[:mbql-custom-column (mt/mbql-query products {:expressions
{"full-name"
[:concat $title "custom"]}})
[(mt/$ids [$products.id [:expression "full-name"]])]]
[:mbql-custom-column (mt/mbql-query products
{:expressions
{"inc-id"
[:+ $id 1]
"full-name"
[:concat $title "custom"]}})
[[[:expression "inc-id"] [:expression "full-name"]]]]
[:native (mt/native-query
(qp.compile/compile
(mt/mbql-query products {:fields [$id $title]})))]
......@@ -180,9 +214,10 @@
:pk_ref pk-ref
:value_ref value-ref})]
(is (nil? error))
(is (> (count values) 0))
;; oracle returns BigDecimal ids so need `number?` rather than `int?`
(is (mc/validate [:sequential [:tuple number? string?]] values)
(-> (mc/validate [:sequential [:tuple int? string?]] values)
(-> (mc/explain [:sequential [:tuple number? string?]] values)
(me/humanize))))))))))
(defn- test-index
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment