Skip to content
Snippets Groups Projects
  • dpsutton's avatar
    6b8ddc84
    Semantic types 2 effective type (#15022) · 6b8ddc84
    dpsutton authored
    
    * First pass using coercions
    
    * Coercions
    
    * Handle effective_type coercion_strategy in test data sets
    
    * special-type -> semantic type in sample db
    
    ```clojure
    user> (def config (metabase.db.spec/h2 {:db (str "/Users/dan/projects/clojure/metabase/resources/sample-dataset.db"
                                                     ";UNDO_LOG=0;CACHE_SIZE=131072;QUERY_CACHE_SIZE=128;COMPRESS=TRUE;"
                                                     "MULTI_THREADED=TRUE;MVCC=TRUE;DEFRAG_ALWAYS=TRUE;MAX_COMPACT_TIME=5000;"
                                                     "ANALYZE_AUTO=100")}))
    user> (jdbc/execute! config ["UPDATE _metabase_metadata
                            SET keypath = 'PEOPLE.ZIP.semantic_type'
                            WHERE keypath = 'PEOPLE.ZIP.special_type'" ])
    [1]
    user> (jdbc/execute! config ["UPDATE _metabase_metadata
                            SET keypath = 'REVIEWS.BODY.semantic_type'
                            WHERE keypath = 'REVIEWS.BODY.special_type'" ])
    [1]
    ```
    
    * Correct mismatch in validation preventing sync
    
    * fixing up alternative date tests
    
    * More passing tests
    
    * Tests for values, nested queries, fetch metadata
    
    * tests
    
    * tests passing
    
    * Fixup mongo qp for coercions
    
    locally i have some failing tests that are off by 1 errors:
    
    Fail in compile-time-interval-test
    
    [36m:mongo[0m Make sure time-intervals work the way they're supposed to. [:time-interval $date -4 :month] should give us something like Oct 01 2020 - Feb 01 2021 if today is Feb 17 2021
    
    expected: [{$match {$and [{:$expr {$gte [$date {:$dateFromString {:dateString 2020-10-01T00:00Z}}]}} {:$expr {$lt [$date {:$dateFromString {:dateString 2021-02-01T00:00Z}}]}}]}} {$group {_id {date~~~day {:$let {:vars {:parts {:$dateToParts {:date $date}}}, :in {:$dateFromParts {:year $$parts.year, :month $$parts.month, :day $$parts.day}}}}}}} {$sort {_id 1}} {$project {_id false, date~~~day $_id.date~~~day}} {$sort {date~~~day 1}} {$limit 1048576}]
    
      actual: [{"$match"
                {"$and"
                 [{:$expr {"$gte" ["$date" {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
                  {:$expr {"$lt" ["$date" {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}
               {"$group"
                {"_id"
                 {"date~~~day"
                  {:$let
                   {:vars {:parts {:$dateToParts {:date "$date"}}},
                    :in {:$dateFromParts {:year "$$parts.year", :month "$$parts.month", :day "$$parts.day"}}}}}}}
               {"$sort" {"_id" 1}}
               {"$project" {"_id" false, "date~~~day" "$_id.date~~~day"}}
               {"$sort" {"date~~~day" 1}}
               {"$limit" 1048576}]
        diff: - [{"$match"
                  {"$and"
                   [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-10-01T00:00Z"}}]}}
                    {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-02-01T00:00Z"}}]}}]}}]
              + [{"$match"
                  {"$and"
                   [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
                    {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}]
    
    * ee fixes
    
    * UI to set coercion type
    
    * Don't need to populate effective-type here
    
    it actually has knock on effects:
    - does more work now as almost every field has an update to do in
    `add-extra-metadata`
    - we have databases that have state that we don't create. druid for
    example has stuff to mimic the dataset in tqpt/with-flattened-dbdef on
    checkins but we don't actually create this. And our dbdef has a field
    called "date" that is not present in the druid db, so if we attempt to
    add metadata it fails and kills the rest of the metadata that we add.
    - tests need this metadata to be present and the error causes field
    visibilities (for example) to not be set
    
    * Docstrings on shared lib
    
    * Add effective and coercion to redshift expectations
    
    * Fixup google analytics
    
    * Derecordize instead of recordize the expectation
    
    object details didn't work out well here. they added way more stuff
    from the db than what is flowing through here.
    
    ```clojure
      actual: {:field
               {:name "DATE",
                :parent_id nil,
                :table_id 69,
                :base_type :type/Date,
                :effective_type :type/Date,
                :coercion_strategy nil,
                :semantic_type nil},
               :value {:type :date/all-options, :value "past5days"}}
        diff: - {:field
                 {:description nil,
                  :database_type "VARCHAR",
                  :fingerprint_version 0,
                  :has_field_values nil,
                  :settings nil,
                  :caveats nil,
                  :fk_target_field_id nil,
                  :custom_position 0,
                  :active true,
                  :last_analyzed nil,
                  :position 1,
                  :visibility_type :normal,
                  :preview_display true,
                  :database_position 0,
                  :fingerprint nil,
                  :points_of_interest nil}}
    ```
    
    Object defaults adds quite a bit of stuff such that we'd be dissoc'ing
    more than we are currently adding in
    
    Co-authored-by: default avatarCam Saul <1455846+camsaul@users.noreply.github.com>
    Semantic types 2 effective type (#15022)
    dpsutton authored
    
    * First pass using coercions
    
    * Coercions
    
    * Handle effective_type coercion_strategy in test data sets
    
    * special-type -> semantic type in sample db
    
    ```clojure
    user> (def config (metabase.db.spec/h2 {:db (str "/Users/dan/projects/clojure/metabase/resources/sample-dataset.db"
                                                     ";UNDO_LOG=0;CACHE_SIZE=131072;QUERY_CACHE_SIZE=128;COMPRESS=TRUE;"
                                                     "MULTI_THREADED=TRUE;MVCC=TRUE;DEFRAG_ALWAYS=TRUE;MAX_COMPACT_TIME=5000;"
                                                     "ANALYZE_AUTO=100")}))
    user> (jdbc/execute! config ["UPDATE _metabase_metadata
                            SET keypath = 'PEOPLE.ZIP.semantic_type'
                            WHERE keypath = 'PEOPLE.ZIP.special_type'" ])
    [1]
    user> (jdbc/execute! config ["UPDATE _metabase_metadata
                            SET keypath = 'REVIEWS.BODY.semantic_type'
                            WHERE keypath = 'REVIEWS.BODY.special_type'" ])
    [1]
    ```
    
    * Correct mismatch in validation preventing sync
    
    * fixing up alternative date tests
    
    * More passing tests
    
    * Tests for values, nested queries, fetch metadata
    
    * tests
    
    * tests passing
    
    * Fixup mongo qp for coercions
    
    locally i have some failing tests that are off by 1 errors:
    
    Fail in compile-time-interval-test
    
    [36m:mongo[0m Make sure time-intervals work the way they're supposed to. [:time-interval $date -4 :month] should give us something like Oct 01 2020 - Feb 01 2021 if today is Feb 17 2021
    
    expected: [{$match {$and [{:$expr {$gte [$date {:$dateFromString {:dateString 2020-10-01T00:00Z}}]}} {:$expr {$lt [$date {:$dateFromString {:dateString 2021-02-01T00:00Z}}]}}]}} {$group {_id {date~~~day {:$let {:vars {:parts {:$dateToParts {:date $date}}}, :in {:$dateFromParts {:year $$parts.year, :month $$parts.month, :day $$parts.day}}}}}}} {$sort {_id 1}} {$project {_id false, date~~~day $_id.date~~~day}} {$sort {date~~~day 1}} {$limit 1048576}]
    
      actual: [{"$match"
                {"$and"
                 [{:$expr {"$gte" ["$date" {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
                  {:$expr {"$lt" ["$date" {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}
               {"$group"
                {"_id"
                 {"date~~~day"
                  {:$let
                   {:vars {:parts {:$dateToParts {:date "$date"}}},
                    :in {:$dateFromParts {:year "$$parts.year", :month "$$parts.month", :day "$$parts.day"}}}}}}}
               {"$sort" {"_id" 1}}
               {"$project" {"_id" false, "date~~~day" "$_id.date~~~day"}}
               {"$sort" {"date~~~day" 1}}
               {"$limit" 1048576}]
        diff: - [{"$match"
                  {"$and"
                   [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-10-01T00:00Z"}}]}}
                    {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-02-01T00:00Z"}}]}}]}}]
              + [{"$match"
                  {"$and"
                   [{:$expr {"$gte" [nil {:$dateFromString {:dateString "2020-11-01T00:00Z"}}]}}
                    {:$expr {"$lt" [nil {:$dateFromString {:dateString "2021-03-01T00:00Z"}}]}}]}}]
    
    * ee fixes
    
    * UI to set coercion type
    
    * Don't need to populate effective-type here
    
    it actually has knock on effects:
    - does more work now as almost every field has an update to do in
    `add-extra-metadata`
    - we have databases that have state that we don't create. druid for
    example has stuff to mimic the dataset in tqpt/with-flattened-dbdef on
    checkins but we don't actually create this. And our dbdef has a field
    called "date" that is not present in the druid db, so if we attempt to
    add metadata it fails and kills the rest of the metadata that we add.
    - tests need this metadata to be present and the error causes field
    visibilities (for example) to not be set
    
    * Docstrings on shared lib
    
    * Add effective and coercion to redshift expectations
    
    * Fixup google analytics
    
    * Derecordize instead of recordize the expectation
    
    object details didn't work out well here. they added way more stuff
    from the db than what is flowing through here.
    
    ```clojure
      actual: {:field
               {:name "DATE",
                :parent_id nil,
                :table_id 69,
                :base_type :type/Date,
                :effective_type :type/Date,
                :coercion_strategy nil,
                :semantic_type nil},
               :value {:type :date/all-options, :value "past5days"}}
        diff: - {:field
                 {:description nil,
                  :database_type "VARCHAR",
                  :fingerprint_version 0,
                  :has_field_values nil,
                  :settings nil,
                  :caveats nil,
                  :fk_target_field_id nil,
                  :custom_position 0,
                  :active true,
                  :last_analyzed nil,
                  :position 1,
                  :visibility_type :normal,
                  :preview_display true,
                  :database_position 0,
                  :fingerprint nil,
                  :points_of_interest nil}}
    ```
    
    Object defaults adds quite a bit of stuff such that we'd be dissoc'ing
    more than we are currently adding in
    
    Co-authored-by: default avatarCam Saul <1455846+camsaul@users.noreply.github.com>
Code owners
Assign users and groups as approvers for specific file changes. Learn more.