Handle nested queries which have agg at top level and nested (#19437)
* Handle nested queries which have agg at top level and nested Previously when matching columns in the outer query with columns in the inner query we had use id, and then recently, field_ref. This is problematic for aggregations. Consider https://github.com/metabase/metabase/issues/19403 The mbql for this query is ```clojure {:type :query :query {:aggregation [[:aggregation-options [:count] {:name "count"}] [:aggregation-options [:avg [:field "sum" {:base-type :type/Float}]] {:name "avg"}]] :limit 10 :source-card-id 1960 :source-query {:source-table 1 :aggregation [[:aggregation-options [:sum [:field 23 nil]] {:name "sum"}] [:aggregation-options [:max [:field 28 nil]] {:name "max"}]] :breakout [[:field 26 nil]] :order-by [[:asc [:field 26 nil]]]}} :database 1} ``` The aggregations in the top level select will be type checked as :name "count" :field_ref [:aggregation 0]. The aggregations in the nested query will be turned into :name "sum" :field_ref [:aggregation 0]! This is because aggregations are numbered "on their level" and not globally. So when the fields on the top level look at the metadata for the nested query and merge it, it unifies the two [:aggregation 0] fields but this is INCORRECT. These aggregations are not the same, they just happen to be the first aggregations at each level. Its illustrative to see what a (select * from (query with aggregations)) looks like: ```clojure {:database 1 :query {:source-card-id 1960 :source-metadata [{:description "The type of product, valid values include: Doohicky, Gadget, Gizmo and Widget" :semantic_type :type/Category :coercion_strategy nil :name "CATEGORY" :field_ref [:field 26 nil] :effective_type :type/Text :id 26 :display_name "Category" :fingerprint {:global {:distinct-count 4 :nil% 0} :type {:type/Text {:percent-json 0 :percent-url 0 :percent-email 0 :percent-state 0 :average-length 6.375}}} :base_type :type/Text} {:name "sum" :display_name "Sum of Price" :base_type :type/Float :effective_type :type/Float :semantic_type nil :field_ref [:aggregation 0]} {:name "max" :display_name "Max of Rating" :base_type :type/Float :effective_type :type/Float :semantic_type :type/Score :field_ref [:aggregation 1]}] :fields ([:field 26 nil] [:field "sum" {:base-type :type/Float}] [:field "max" {:base-type :type/Float}]) :source-query {:source-table 1 :aggregation [[:aggregation-options [:sum [:field 23 nil]] {:name "sum"}] [:aggregation-options [:max [:field 28 nil]] {:name "max"}]] :breakout [[:field 26 nil]] :order-by [[:asc [:field 26 nil]]]}} :type :query :middleware {:js-int-to-string? true :add-default-userland-constraints? true} :info {:executed-by 1 :context :ad-hoc :card-id 1960 :nested? true :query-hash #object["[B" 0x10227bf4 "[B@10227bf4"]} :constraints {:max-results 10000 :max-results-bare-rows 2000}} ``` The important bits are that it understands the nested query's metadata to be ```clojure {:name "sum" :display_name "Sum of Price" :field_ref [:aggregation 0]} {:name "max" :display_name "Max of Rating" :field_ref [:aggregation 1]} ``` And the fields on the outer query to be: ```clojure ([:field "sum" {:base-type :type/Float}] [:field "max" {:base-type :type/Float}]) ``` So there's the mismatch: the field_ref on the outside is [:field "sum"] but on the inside the field_ref is [:aggregation 0]. So the best way to match up when "looking down" into sub queries is by id and then by name. * Some drivers return 4.0 instead of 4 so make them all ints * Handle dataset metadata in a special way rather than trying to set confusing merge rules, just special case metadata from datasets. Previously, was trying to merge the "preserved keys" on top of the normal merge order. This caused lots of issues. They were trivial. Native display names are very close to the column name, whereas mbql names go through some humanization. So when you select price from (select PRICE ...) its an mbql with a nested native query. The merge order meant that the display name went from "Price" previously (the mbql nice name for the outer select) to "PRICE", the underlying native column name. Now we don't special case the display name (and other fields) of regular source-metadata. Also, there were issues with mbql on top of an mbql dataset. Since it is all mbql, everything is pulled from the database. So if there were overrides in the nested mbql dataset, like description, display name, etc, the outer field select already had the display name, etc. from the database rather than allowing the edits to override from the nested query. Also, using a long `:source-query/dataset?` keyword so it is far easier to find where this is set. With things called just `:dataset` it can be quite hard to find where these keys are used. When using the namespaced keyword, greping and finding usages is trivial. And the namespace gives more context
Showing
- .dir-locals.el 1 addition, 0 deletions.dir-locals.el
- src/metabase/query_processor/middleware/annotate.clj 10 additions, 11 deletionssrc/metabase/query_processor/middleware/annotate.clj
- src/metabase/query_processor/middleware/fetch_source_query.clj 11 additions, 6 deletions...etabase/query_processor/middleware/fetch_source_query.clj
- test/metabase/query_processor/middleware/annotate_test.clj 2 additions, 1 deletiontest/metabase/query_processor/middleware/annotate_test.clj
- test/metabase/query_processor_test/nested_queries_test.clj 40 additions, 3 deletionstest/metabase/query_processor_test/nested_queries_test.clj
Loading
Please register or sign in to comment