This project is mirrored from https://github.com/metabase/metabase. Pull mirroring updated .
  1. Dec 20, 2021
  2. Dec 19, 2021
  3. Dec 18, 2021
  4. Dec 17, 2021
  5. Dec 16, 2021
  6. Dec 15, 2021
    • Howon Lee's avatar
      Fix combos in tables of column cardinality bigger than the columns being used... · 46c7180d
      Howon Lee authored
      Fix combos in tables of column cardinality bigger than the columns being used in combo being broken (#19368)
      
      Pursuant to #18676. Whacks a regression from the second fix of the combo viz
    • dpsutton's avatar
      Datasets preserve metadata (#19158) · 5f8bc305
      dpsutton authored
      * Preserve metadata and surface metadata of datasets
      
      Need to handle two cases:
      1. querying the dataset itself
      2. a question that is a nested question of the dataset
      
      1. Querying the dataset itself
      This is a bit of a shift of how metadata works. Previously it was just
      thrown away and saved on each query run. This kind of needs to be this
      way because when you edit a question, we do not ensure that the metadata
      stays in sync! There's some checksum operation that ensures that the
      metadata hasn't been tampered with, but it doesn't ensure that it
      actually matches the query any longer.
      
      So imagine you add a new column to a query. The metadata is not changed,
      but its checksum matches the original query's metadata and the backend
      happily saves this. Then on a subsequent run of the query (or if you hit
      visualize before saving) the metadata is tossed and updated.
      
      So to handle this carelessness, we have to allow the metadata that can
      be edited to persist across just running the dataset query. So when
      hitting the card api, stick the original metadata in the middleware (and
      update the normalize ns not to mangle field_ref -> field-ref among
      others). Once we have this smuggled in, when computing the metadata in
      annotate, we need a way to index the columns. The old and bad way was
      the following:
      
      ```clojure
      ;; old
      (let [field-id->metadata (u/key-by :id source-metadata)] ...)
      ;; new and better
      (let [ref->metadata (u/key-by (comp u/field-ref->key :field_ref) source-metadata)] ...)
      ```
      
      This change is important because ids exist only for fields that map to
      actual database columns. Computed columns, case, manipulations, and all
      native fields will lack them. But we can make field references.
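      
      A minimal sketch of why keying by ref works where keying by id does not.
      The `key-by` and `field-ref->key` functions below are hypothetical
      stand-ins for `u/key-by` and `u/field-ref->key` (the real helpers may
      behave differently), and the sample columns are made up:
      
      ```clojure
      ;; Hypothetical stand-in for u/key-by: index a collection by (f item).
      (defn key-by [f coll]
        (into {} (map (juxt f identity)) coll))
      
      ;; Hypothetical stand-in for u/field-ref->key: drop the options map so
      ;; refs compare on clause type and id/name only.
      (defn field-ref->key [[tag id-or-name _opts]]
        [tag id-or-name])
      
      (def source-metadata
        [{:id 1 :name "ID" :field_ref [:field 1 nil]}
         ;; a computed column has no :id, only a name-based ref
         {:name "total" :field_ref [:field "total" {:base-type :type/Float}]}])
      
      ;; Keyed by :id, the computed column lands under the nil key.
      (def by-id (key-by :id source-metadata))
      
      ;; Keyed by field ref, every column gets its own usable key.
      (def by-ref (key-by (comp field-ref->key :field_ref) source-metadata))
      ```
      
      With more than one id-less column, the `by-id` index would silently keep
      only the last one under `nil`, while `by-ref` keeps them all apart.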
      
      Then for each field in the newly computed metadata, allow the non-type
      information to persist. We do not want to override type information as
      this can break a query, but things like description, display name,
      semantic type can survive.
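      
      Roughly, the merge could look like the sketch below. The key list and
      the `merge-stored-metadata` name are assumptions for illustration, not
      the code in this commit:
      
      ```clojure
      ;; Assumption: these are the user-editable keys allowed to persist.
      (def preserved-keys [:description :display_name :semantic_type])
      
      (defn merge-stored-metadata
        "Overlay the preserved keys from the stored column onto the freshly
        computed one. Type information such as :base_type always comes from
        the fresh column."
        [fresh-col stored-col]
        (merge fresh-col (select-keys stored-col preserved-keys)))
      
      (def merged
        (merge-stored-metadata
         {:name "total" :base_type :type/Float :display_name "total"}
         {:name "total" :base_type :type/Text :display_name "Order total"
          :description "Sum of line items"}))
      ;; merged keeps :base_type :type/Float from the fresh column and takes
      ;; :display_name and :description from the stored one.
      ```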
      
      This metadata is then saved in the db as always so we can continue with
      the bit of careless metadata saving that we do.
      
      2. a question that is a nested question of the dataset
      This was a simpler change to grab the source-metadata and ensure that it
      is blended into the result metadata in the same way.
      
      Things I haven't looked at yet: column renaming, whether we need to allow
      conversions to carry through or whether those must necessarily be opaque
      (i.e., once a value has been cast, forget that it was originally a
      different type so we don't try to cast the already-cast value), and I'm
      sure some other things. But it has been quite a pain to figure all of this
      out, especially the divide between native and mbql, since native requires
      the first row of values back before it can detect some types.
      
      * Add in base-type specially
      
      Best to use field_refs to combine metadata from datasets. This means
      that we add this ref before the base-type is known. So we have to update
      this base-type later once they are known from sampling the results
      
      * Allow column information through
      
      I'm not sure how this base-type is set for
      annotate-native-cols. Presumably we don't have it and we get it from the
      results, but this is not true. I guess we do some analysis on count
      types. I'm not sure why they failed though.
      
      * Correctly infer this stuff
      
      This was annoying. I like :field_ref over :name for indexing, as it has
      a guaranteed unique name. But datasets will have unique names due to a
      restriction*. The problem was that annotating the native results before
      we had type information gave us refs like `[:field "foo" {:base-type
      :type/*}]`, but then this ruined the merge strategy at the end and
      prevented a proper ref being merged on top. Quite annoying. This stuff
      is very whack-a-mole in that you fix one bit and another breaks
      somewhere else**.
      
      * cannot have identical names for a subselect:
          select id from (select 1 as id, 2 as id)
      
      ** in fact, another test broke on this commit
      
      * Revert "Correctly infer this stuff"
      
      This reverts commit 1ffe44e90076b024efd231f84ea8062a281e69ab.
      
      * Annotate but de-annotate in a way
      
      To combine metadata from the db, we really, really want to make sure the
      columns actually match up. We cannot use name, as this could collide when
      there are two IDs in the same query. Combining metadata on that gets nasty
      real quick.
      
      For mbql and native, it's best to use field_refs. Field_refs offer the
      best of both worlds: if by id, we are golden and it's by id. If by name,
      they have been uniquified already. So this will run into issues if you
      reorder a query or add a new column with the same name, but I think
      that's the theoretical best we can do.
      
      BUT, we have to do a little cleanup for this stuff. When native adds the
      field_ref, it needs to include some type information. But this isn't
      known until after the query runs for native, since it's just an opaque
      query until we run it. So annotating will add a `[:field name
      {:base_type :type/*}]` and then our merging doesn't clobber that
      later. So it's best to add the field_refs, match up with any db metadata,
      and then remove the field_refs.
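      
      The add, match, remove sequence can be sketched as follows. This is a
      simplified illustration, not the code in annotate.clj: the function
      names are hypothetical, the temporary ref carries no options map, and
      only two keys are carried over.
      
      ```clojure
      (defn add-temp-ref
        "Attach a name-based ref that is used only for matching."
        [col]
        (assoc col :field_ref [:field (:name col) nil]))
      
      (defn combine-native-metadata
        "Match freshly annotated native columns against stored metadata by
        ref, copy over the editable keys, then drop the temporary ref."
        [fresh-cols stored-cols]
        (let [ref->stored (into {} (map (juxt :field_ref identity)) stored-cols)]
          (mapv (fn [col]
                  (let [with-ref (add-temp-ref col)
                        stored   (get ref->stored (:field_ref with-ref))]
                    (-> (merge with-ref
                               (select-keys stored [:description :display_name]))
                        ;; remove the placeholder so a ref with real type
                        ;; information can be added once results are sampled
                        (dissoc :field_ref))))
                fresh-cols)))
      
      (def combined
        (combine-native-metadata
         [{:name "foo"} {:name "bar"}]
         [{:name "foo" :field_ref [:field "foo" nil] :display_name "Foo!"}]))
      ```
      
      Columns with no stored counterpart, like "bar" here, pass through
      unchanged.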
      
      * Test that metadata flows through
      
      * Test mbql datasets and questions based on datasets
      
      * Test mbql/native queries and nested queries
      
      * Recognize that native query bubbles into nested
      
      When using a nested query based on a native query, the metadata from the
      underlying dataset is used. Previously we would clobber this with the
      metadata from the expected cols of the wrapping mbql query. This would
      process the display name with `humanization/name->human-readable-name`
      whereas for native it goes through `u/qualified-name`.
      
      I originally piped the native's name through the humanization, but that
      leads to lots of test failures, and perhaps correct failures. For
      instance, a csv test asserts the column title is "COUNT(*)" but the
      change would emit "Count(*)". A humanization of count(*) isn't
      necessarily an improvement, or even correct.
      
      It is possible that we could change this in the future but I'd want it
      to be a deliberate change. It should be mechanical, just adjusting
      `annotate-native-cols` in annotate.clj to return a humanized display
      name and then fixing tests.
      
      * Allow computed display name on top of source metadata name
      
      If we have a join, we want the "pretty" name to land on top of the
      underlying table's name. "alias → B Column" vs "B Column".
      
      * Put dataset metadata in info, not middleware
      
      * Move metadata back under dataset key in info
      
      We want to ensure that dataset information is propagated, but card
      information should be computed fresh each time. Including the card
      information each time leads to errors as it erroneously thinks the
      existing card info should shadow the dataset information. This is
      actually a tricky case: figuring out when to care about information at
      arbitrary points in the query processor.
      
      * Update metadata to :info not :middleware in tests
      
      * Make var private and comment about info metadata
    • Cam Saul's avatar
      Fix logging test utils & re-enable tests (#19334) · f4073a15
      Cam Saul authored
      * Fix logging utils
      
      * Test fixes :wrench:
    • Jeff Evans's avatar
      Fix MongoDB path collision error with nested columns (#19252) · 7de81ef7
      Jeff Evans authored
      Remove any parent fields from the field list when building projections, which should have the same effect as before (only child fields selected) while being compatible with MongoDB 4.4+
      
      Add test to confirm the parent fields are removed
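      
      As an illustration only, assuming fields are represented as dotted path
      strings (the driver's actual data structures are not shown in this
      message, and `remove-parent-paths` is a hypothetical name), removing
      parents could look like:
      
      ```clojure
      (require '[clojure.string :as str])
      
      (defn remove-parent-paths
        "Drop every path that is a strict prefix, on a segment boundary, of
        another path in the list."
        [paths]
        (let [parent? (fn [p] (some #(str/starts-with? % (str p ".")) paths))]
          (vec (remove parent? paths))))
      
      (def projected
        (remove-parent-paths ["source" "source.username" "url"]))
      ;; "source" is dropped because "source.username" is also present.
      ```
      
      Appending "." before the prefix check keeps a sibling such as
      "sourceId" from being treated as a child of "source".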
    • Cam Saul's avatar
      Fix X-Rays with filters (#19370) · 4361d6f7
      Cam Saul authored
  7. Dec 14, 2021