.dir-locals.el · 8a2d75baca10f86341f69cd1ad303bae14ff6587 · Engineering Digital Service / Metabase

3 years ago

Handle nested queries which have agg at top level and nested (#19437) · 50818a92

dpsutton authored 3 years ago

* Handle nested queries which have agg at top level and nested

Previously when matching columns in the outer query with columns in the
inner query we had use id, and then recently, field_ref. This is
problematic for aggregations.

Consider https://github.com/metabase/metabase/issues/19403

The mbql for this query is

```clojure
{:type :query
 :query {:aggregation [[:aggregation-options
                        [:count]
                        {:name "count"}]
                       [:aggregation-options
                        [:avg
                         [:field
                          "sum"
                          {:base-type :type/Float}]]
                        {:name "avg"}]]
         :limit 10
         :source-card-id 1960
         :source-query {:source-table 1
                        :aggregation [[:aggregation-options
                                       [:sum
                                        [:field 23 nil]]
                                       {:name "sum"}]
                                      [:aggregation-options
                                       [:max
                                        [:field 28 nil]]
                                       {:name "max"}]]
                        :breakout [[:field 26 nil]]
                        :order-by [[:asc
                                    [:field 26 nil]]]}}
 :database 1}
```

The aggregations in the top level select will be type checked as :name
"count" :field_ref [:aggregation 0]. The aggregations in the nested
query will be turned into :name "sum" :field_ref [:aggregation 0]! This
is because aggregations are numbered "on their level" and not
globally. So when the fields on the top level look at the metadata for
the nested query and merge it, it unifies the two [:aggregation 0]
fields but this is INCORRECT. These aggregations are not the same, they
just happen to be the first aggregations at each level.

Its illustrative to see what a (select * from (query with aggregations))
looks like:

```clojure
{:database 1
 :query {:source-card-id 1960
         :source-metadata [{:description "The type of product, valid values include: Doohicky, Gadget, Gizmo and Widget"
                            :semantic_type :type/Category
                            :coercion_strategy nil
                            :name "CATEGORY"
                            :field_ref [:field 26 nil]
                            :effective_type :type/Text
                            :id 26
                            :display_name "Category"
                            :fingerprint {:global {:distinct-count 4
                                                   :nil% 0}
                                          :type {:type/Text {:percent-json 0
                                                             :percent-url 0
                                                             :percent-email 0
                                                             :percent-state 0
                                                             :average-length 6.375}}}
                            :base_type :type/Text}
                           {:name "sum"
                            :display_name "Sum of Price"
                            :base_type :type/Float
                            :effective_type :type/Float
                            :semantic_type nil
                            :field_ref [:aggregation 0]}
                           {:name "max"
                            :display_name "Max of Rating"
                            :base_type :type/Float
                            :effective_type :type/Float
                            :semantic_type :type/Score
                            :field_ref [:aggregation 1]}]
         :fields ([:field 26 nil]
                  [:field
                   "sum"
                   {:base-type :type/Float}]
                  [:field
                   "max"
                   {:base-type :type/Float}])
         :source-query {:source-table 1
                        :aggregation [[:aggregation-options
                                       [:sum
                                        [:field 23 nil]]
                                       {:name "sum"}]
                                      [:aggregation-options
                                       [:max
                                        [:field 28 nil]]
                                       {:name "max"}]]
                        :breakout [[:field 26 nil]]
                        :order-by [[:asc
                                    [:field 26 nil]]]}}
 :type :query
 :middleware {:js-int-to-string? true
              :add-default-userland-constraints? true}
 :info {:executed-by 1
        :context :ad-hoc
        :card-id 1960
        :nested? true
        :query-hash #object["[B" 0x10227bf4 "[B@10227bf4"]}
 :constraints {:max-results 10000
               :max-results-bare-rows 2000}}
```

The important bits are that it understands the nested query's metadata
to be

```clojure
{:name "sum"
 :display_name "Sum of Price"
 :field_ref [:aggregation 0]}
{:name "max"
 :display_name "Max of Rating"
 :field_ref [:aggregation 1]}
```

And the fields on the outer query to be:
```clojure
([:field
  "sum"
  {:base-type :type/Float}]
 [:field
  "max"
  {:base-type :type/Float}])
```

So there's the mismatch: the field_ref on the outside is [:field "sum"]
but on the inside the field_ref is [:aggregation 0]. So the best way to
match up when "looking down" into sub queries is by id and then by name.

* Some drivers return 4.0 instead of 4 so make them all ints

* Handle dataset metadata in a special way

rather than trying to set confusing merge rules, just special case
metadata from datasets.

Previously, was trying to merge the "preserved keys" on top of the
normal merge order. This caused lots of issues. They were
trivial. Native display names are very close to the column name, whereas
mbql names go through some humanization. So when you select price
from (select PRICE ...) its an mbql with a nested native query. The
merge order meant that the display name went from "Price"
previously (the mbql nice name for the outer select) to "PRICE", the
underlying native column name. Now we don't special case the display
name (and other fields) of regular source-metadata.

Also, there were issues with mbql on top of an mbql dataset. Since it is
all mbql, everything is pulled from the database. So if there were
overrides in the nested mbql dataset, like description, display name,
etc, the outer field select already had the display name, etc. from the
database rather than allowing the edits to override from the nested
query.

Also, using a long `:source-query/dataset?` keyword so it is far easier
to find where this is set. With things called just `:dataset` it can be
quite hard to find where these keys are used. When using the namespaced
keyword, greping and finding usages is trivial. And the namespace gives
more context

Unverified

50818a92

History

Handle nested queries which have agg at top level and nested (#19437)

dpsutton authored 3 years ago

* Handle nested queries which have agg at top level and nested

Previously when matching columns in the outer query with columns in the
inner query we had use id, and then recently, field_ref. This is
problematic for aggregations.

Consider https://github.com/metabase/metabase/issues/19403

The mbql for this query is

```clojure
{:type :query
 :query {:aggregation [[:aggregation-options
                        [:count]
                        {:name "count"}]
                       [:aggregation-options
                        [:avg
                         [:field
                          "sum"
                          {:base-type :type/Float}]]
                        {:name "avg"}]]
         :limit 10
         :source-card-id 1960
         :source-query {:source-table 1
                        :aggregation [[:aggregation-options
                                       [:sum
                                        [:field 23 nil]]
                                       {:name "sum"}]
                                      [:aggregation-options
                                       [:max
                                        [:field 28 nil]]
                                       {:name "max"}]]
                        :breakout [[:field 26 nil]]
                        :order-by [[:asc
                                    [:field 26 nil]]]}}
 :database 1}
```

The aggregations in the top level select will be type checked as :name
"count" :field_ref [:aggregation 0]. The aggregations in the nested
query will be turned into :name "sum" :field_ref [:aggregation 0]! This
is because aggregations are numbered "on their level" and not
globally. So when the fields on the top level look at the metadata for
the nested query and merge it, it unifies the two [:aggregation 0]
fields but this is INCORRECT. These aggregations are not the same, they
just happen to be the first aggregations at each level.

Its illustrative to see what a (select * from (query with aggregations))
looks like:

```clojure
{:database 1
 :query {:source-card-id 1960
         :source-metadata [{:description "The type of product, valid values include: Doohicky, Gadget, Gizmo and Widget"
                            :semantic_type :type/Category
                            :coercion_strategy nil
                            :name "CATEGORY"
                            :field_ref [:field 26 nil]
                            :effective_type :type/Text
                            :id 26
                            :display_name "Category"
                            :fingerprint {:global {:distinct-count 4
                                                   :nil% 0}
                                          :type {:type/Text {:percent-json 0
                                                             :percent-url 0
                                                             :percent-email 0
                                                             :percent-state 0
                                                             :average-length 6.375}}}
                            :base_type :type/Text}
                           {:name "sum"
                            :display_name "Sum of Price"
                            :base_type :type/Float
                            :effective_type :type/Float
                            :semantic_type nil
                            :field_ref [:aggregation 0]}
                           {:name "max"
                            :display_name "Max of Rating"
                            :base_type :type/Float
                            :effective_type :type/Float
                            :semantic_type :type/Score
                            :field_ref [:aggregation 1]}]
         :fields ([:field 26 nil]
                  [:field
                   "sum"
                   {:base-type :type/Float}]
                  [:field
                   "max"
                   {:base-type :type/Float}])
         :source-query {:source-table 1
                        :aggregation [[:aggregation-options
                                       [:sum
                                        [:field 23 nil]]
                                       {:name "sum"}]
                                      [:aggregation-options
                                       [:max
                                        [:field 28 nil]]
                                       {:name "max"}]]
                        :breakout [[:field 26 nil]]
                        :order-by [[:asc
                                    [:field 26 nil]]]}}
 :type :query
 :middleware {:js-int-to-string? true
              :add-default-userland-constraints? true}
 :info {:executed-by 1
        :context :ad-hoc
        :card-id 1960
        :nested? true
        :query-hash #object["[B" 0x10227bf4 "[B@10227bf4"]}
 :constraints {:max-results 10000
               :max-results-bare-rows 2000}}
```

The important bits are that it understands the nested query's metadata
to be

```clojure
{:name "sum"
 :display_name "Sum of Price"
 :field_ref [:aggregation 0]}
{:name "max"
 :display_name "Max of Rating"
 :field_ref [:aggregation 1]}
```

And the fields on the outer query to be:
```clojure
([:field
  "sum"
  {:base-type :type/Float}]
 [:field
  "max"
  {:base-type :type/Float}])
```

So there's the mismatch: the field_ref on the outside is [:field "sum"]
but on the inside the field_ref is [:aggregation 0]. So the best way to
match up when "looking down" into sub queries is by id and then by name.

* Some drivers return 4.0 instead of 4 so make them all ints

* Handle dataset metadata in a special way

rather than trying to set confusing merge rules, just special case
metadata from datasets.

Previously, was trying to merge the "preserved keys" on top of the
normal merge order. This caused lots of issues. They were
trivial. Native display names are very close to the column name, whereas
mbql names go through some humanization. So when you select price
from (select PRICE ...) its an mbql with a nested native query. The
merge order meant that the display name went from "Price"
previously (the mbql nice name for the outer select) to "PRICE", the
underlying native column name. Now we don't special case the display
name (and other fields) of regular source-metadata.

Also, there were issues with mbql on top of an mbql dataset. Since it is
all mbql, everything is pulled from the database. So if there were
overrides in the nested mbql dataset, like description, display name,
etc, the outer field select already had the display name, etc. from the
database rather than allowing the edits to override from the nested
query.

Also, using a long `:source-query/dataset?` keyword so it is far easier
to find where this is set. With things called just `:dataset` it can be
quite hard to find where these keys are used. When using the namespaced
keyword, greping and finding usages is trivial. And the namespace gives
more context

Code owners

Assign users and groups as approvers for specific file changes. Learn more.