Skip to content
Snippets Groups Projects
  • dpsutton's avatar
    50818a92
    Handle nested queries which have agg at top level and nested (#19437) · 50818a92
    dpsutton authored
    * Handle nested queries which have agg at top level and nested
    
    Previously when matching columns in the outer query with columns in the
    inner query we had use id, and then recently, field_ref. This is
    problematic for aggregations.
    
    Consider https://github.com/metabase/metabase/issues/19403
    
    The mbql for this query is
    
    ```clojure
    {:type :query
     :query {:aggregation [[:aggregation-options
                            [:count]
                            {:name "count"}]
                           [:aggregation-options
                            [:avg
                             [:field
                              "sum"
                              {:base-type :type/Float}]]
                            {:name "avg"}]]
             :limit 10
             :source-card-id 1960
             :source-query {:source-table 1
                            :aggregation [[:aggregation-options
                                           [:sum
                                            [:field 23 nil]]
                                           {:name "sum"}]
                                          [:aggregation-options
                                           [:max
                                            [:field 28 nil]]
                                           {:name "max"}]]
                            :breakout [[:field 26 nil]]
                            :order-by [[:asc
                                        [:field 26 nil]]]}}
     :database 1}
    ```
    
    The aggregations in the top level select will be type checked as :name
    "count" :field_ref [:aggregation 0]. The aggregations in the nested
    query will be turned into :name "sum" :field_ref [:aggregation 0]! This
    is because aggregations are numbered "on their level" and not
    globally. So when the fields on the top level look at the metadata for
    the nested query and merge it, it unifies the two [:aggregation 0]
    fields but this is INCORRECT. These aggregations are not the same, they
    just happen to be the first aggregations at each level.
    
    Its illustrative to see what a (select * from (query with aggregations))
    looks like:
    
    ```clojure
    {:database 1
     :query {:source-card-id 1960
             :source-metadata [{:description "The type of product, valid values include: Doohicky, Gadget, Gizmo and Widget"
                                :semantic_type :type/Category
                                :coercion_strategy nil
                                :name "CATEGORY"
                                :field_ref [:field 26 nil]
                                :effective_type :type/Text
                                :id 26
                                :display_name "Category"
                                :fingerprint {:global {:distinct-count 4
                                                       :nil% 0}
                                              :type {:type/Text {:percent-json 0
                                                                 :percent-url 0
                                                                 :percent-email 0
                                                                 :percent-state 0
                                                                 :average-length 6.375}}}
                                :base_type :type/Text}
                               {:name "sum"
                                :display_name "Sum of Price"
                                :base_type :type/Float
                                :effective_type :type/Float
                                :semantic_type nil
                                :field_ref [:aggregation 0]}
                               {:name "max"
                                :display_name "Max of Rating"
                                :base_type :type/Float
                                :effective_type :type/Float
                                :semantic_type :type/Score
                                :field_ref [:aggregation 1]}]
             :fields ([:field 26 nil]
                      [:field
                       "sum"
                       {:base-type :type/Float}]
                      [:field
                       "max"
                       {:base-type :type/Float}])
             :source-query {:source-table 1
                            :aggregation [[:aggregation-options
                                           [:sum
                                            [:field 23 nil]]
                                           {:name "sum"}]
                                          [:aggregation-options
                                           [:max
                                            [:field 28 nil]]
                                           {:name "max"}]]
                            :breakout [[:field 26 nil]]
                            :order-by [[:asc
                                        [:field 26 nil]]]}}
     :type :query
     :middleware {:js-int-to-string? true
                  :add-default-userland-constraints? true}
     :info {:executed-by 1
            :context :ad-hoc
            :card-id 1960
            :nested? true
            :query-hash #object["[B" 0x10227bf4 "[B@10227bf4"]}
     :constraints {:max-results 10000
                   :max-results-bare-rows 2000}}
    ```
    
    The important bits are that it understands the nested query's metadata
    to be
    
    ```clojure
    {:name "sum"
     :display_name "Sum of Price"
     :field_ref [:aggregation 0]}
    {:name "max"
     :display_name "Max of Rating"
     :field_ref [:aggregation 1]}
    ```
    
    And the fields on the outer query to be:
    ```clojure
    ([:field
      "sum"
      {:base-type :type/Float}]
     [:field
      "max"
      {:base-type :type/Float}])
    ```
    
    So there's the mismatch: the field_ref on the outside is [:field "sum"]
    but on the inside the field_ref is [:aggregation 0]. So the best way to
    match up when "looking down" into sub queries is by id and then by name.
    
    * Some drivers return 4.0 instead of 4 so make them all ints
    
    * Handle dataset metadata in a special way
    
    rather than trying to set confusing merge rules, just special case
    metadata from datasets.
    
    Previously, was trying to merge the "preserved keys" on top of the
    normal merge order. This caused lots of issues. They were
    trivial. Native display names are very close to the column name, whereas
    mbql names go through some humanization. So when you select price
    from (select PRICE ...) its an mbql with a nested native query. The
    merge order meant that the display name went from "Price"
    previously (the mbql nice name for the outer select) to "PRICE", the
    underlying native column name. Now we don't special case the display
    name (and other fields) of regular source-metadata.
    
    Also, there were issues with mbql on top of an mbql dataset. Since it is
    all mbql, everything is pulled from the database. So if there were
    overrides in the nested mbql dataset, like description, display name,
    etc, the outer field select already had the display name, etc. from the
    database rather than allowing the edits to override from the nested
    query.
    
    Also, using a long `:source-query/dataset?` keyword so it is far easier
    to find where this is set. With things called just `:dataset` it can be
    quite hard to find where these keys are used. When using the namespaced
    keyword, greping and finding usages is trivial. And the namespace gives
    more context
    Handle nested queries which have agg at top level and nested (#19437)
    dpsutton authored
    * Handle nested queries which have agg at top level and nested
    
    Previously when matching columns in the outer query with columns in the
    inner query we had use id, and then recently, field_ref. This is
    problematic for aggregations.
    
    Consider https://github.com/metabase/metabase/issues/19403
    
    The mbql for this query is
    
    ```clojure
    {:type :query
     :query {:aggregation [[:aggregation-options
                            [:count]
                            {:name "count"}]
                           [:aggregation-options
                            [:avg
                             [:field
                              "sum"
                              {:base-type :type/Float}]]
                            {:name "avg"}]]
             :limit 10
             :source-card-id 1960
             :source-query {:source-table 1
                            :aggregation [[:aggregation-options
                                           [:sum
                                            [:field 23 nil]]
                                           {:name "sum"}]
                                          [:aggregation-options
                                           [:max
                                            [:field 28 nil]]
                                           {:name "max"}]]
                            :breakout [[:field 26 nil]]
                            :order-by [[:asc
                                        [:field 26 nil]]]}}
     :database 1}
    ```
    
    The aggregations in the top level select will be type checked as :name
    "count" :field_ref [:aggregation 0]. The aggregations in the nested
    query will be turned into :name "sum" :field_ref [:aggregation 0]! This
    is because aggregations are numbered "on their level" and not
    globally. So when the fields on the top level look at the metadata for
    the nested query and merge it, it unifies the two [:aggregation 0]
    fields but this is INCORRECT. These aggregations are not the same, they
    just happen to be the first aggregations at each level.
    
    Its illustrative to see what a (select * from (query with aggregations))
    looks like:
    
    ```clojure
    {:database 1
     :query {:source-card-id 1960
             :source-metadata [{:description "The type of product, valid values include: Doohicky, Gadget, Gizmo and Widget"
                                :semantic_type :type/Category
                                :coercion_strategy nil
                                :name "CATEGORY"
                                :field_ref [:field 26 nil]
                                :effective_type :type/Text
                                :id 26
                                :display_name "Category"
                                :fingerprint {:global {:distinct-count 4
                                                       :nil% 0}
                                              :type {:type/Text {:percent-json 0
                                                                 :percent-url 0
                                                                 :percent-email 0
                                                                 :percent-state 0
                                                                 :average-length 6.375}}}
                                :base_type :type/Text}
                               {:name "sum"
                                :display_name "Sum of Price"
                                :base_type :type/Float
                                :effective_type :type/Float
                                :semantic_type nil
                                :field_ref [:aggregation 0]}
                               {:name "max"
                                :display_name "Max of Rating"
                                :base_type :type/Float
                                :effective_type :type/Float
                                :semantic_type :type/Score
                                :field_ref [:aggregation 1]}]
             :fields ([:field 26 nil]
                      [:field
                       "sum"
                       {:base-type :type/Float}]
                      [:field
                       "max"
                       {:base-type :type/Float}])
             :source-query {:source-table 1
                            :aggregation [[:aggregation-options
                                           [:sum
                                            [:field 23 nil]]
                                           {:name "sum"}]
                                          [:aggregation-options
                                           [:max
                                            [:field 28 nil]]
                                           {:name "max"}]]
                            :breakout [[:field 26 nil]]
                            :order-by [[:asc
                                        [:field 26 nil]]]}}
     :type :query
     :middleware {:js-int-to-string? true
                  :add-default-userland-constraints? true}
     :info {:executed-by 1
            :context :ad-hoc
            :card-id 1960
            :nested? true
            :query-hash #object["[B" 0x10227bf4 "[B@10227bf4"]}
     :constraints {:max-results 10000
                   :max-results-bare-rows 2000}}
    ```
    
    The important bits are that it understands the nested query's metadata
    to be
    
    ```clojure
    {:name "sum"
     :display_name "Sum of Price"
     :field_ref [:aggregation 0]}
    {:name "max"
     :display_name "Max of Rating"
     :field_ref [:aggregation 1]}
    ```
    
    And the fields on the outer query to be:
    ```clojure
    ([:field
      "sum"
      {:base-type :type/Float}]
     [:field
      "max"
      {:base-type :type/Float}])
    ```
    
    So there's the mismatch: the field_ref on the outside is [:field "sum"]
    but on the inside the field_ref is [:aggregation 0]. So the best way to
    match up when "looking down" into sub queries is by id and then by name.
    
    * Some drivers return 4.0 instead of 4 so make them all ints
    
    * Handle dataset metadata in a special way
    
    rather than trying to set confusing merge rules, just special case
    metadata from datasets.
    
    Previously, was trying to merge the "preserved keys" on top of the
    normal merge order. This caused lots of issues. They were
    trivial. Native display names are very close to the column name, whereas
    mbql names go through some humanization. So when you select price
    from (select PRICE ...) its an mbql with a nested native query. The
    merge order meant that the display name went from "Price"
    previously (the mbql nice name for the outer select) to "PRICE", the
    underlying native column name. Now we don't special case the display
    name (and other fields) of regular source-metadata.
    
    Also, there were issues with mbql on top of an mbql dataset. Since it is
    all mbql, everything is pulled from the database. So if there were
    overrides in the nested mbql dataset, like description, display name,
    etc, the outer field select already had the display name, etc. from the
    database rather than allowing the edits to override from the nested
    query.
    
    Also, using a long `:source-query/dataset?` keyword so it is far easier
    to find where this is set. With things called just `:dataset` it can be
    quite hard to find where these keys are used. When using the namespaced
    keyword, greping and finding usages is trivial. And the namespace gives
    more context
Code owners
Assign users and groups as approvers for specific file changes. Learn more.