This project is mirrored from https://github.com/metabase/metabase.
  1. Nov 17, 2021
    • Secrets :closed_lock_with_key: PR 6 - Update Oracle properties (#18320) · 0480c2cc
      Jeff Evans authored
      
      * Add SSL keystore and truststore secret properties to Oracle driver YAML files
      
      
      Update Oracle driver to set keystore and truststore connection options from secret values
      
      Adding new `select-keys-sequentially` helper function for dealing with assertions on transformed conn props
      
      Add a new function to the secret namespace to return a lighter-weight "secret map", and remove that functionality from `handle-db-details-secret-prop!` so that it can be reused from connection testing
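
      A minimal sketch of the kind of helper described, with assumed field names (illustrative only, not the actual secret namespace code):

      ```clojure
      ;; hypothetical sketch: boil a Secret down to just the fields that
      ;; connection code needs, so the same map can be reused from connection
      ;; testing as well as handle-db-details-secret-prop!
      (defn secret-value-map
        [secret]
        (when secret
          {:id     (:id secret)      ; assumed keys, for illustration only
           :source (:source secret)
           :value  (:value secret)}))
      ```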
      
      Modifying CircleCI to set the corresponding truststore secret conn prop vars instead of JVM-level truststore settings, as the test had been doing previously
      
      Expand Oracle test connection details to incorporate all SSL related properties, and also make it a function instead of delayed def (so changes can be picked up later)
      
      Misc. fixes in Oracle driver YAML files
      
      Update `exception-classes-not-to-retry` to include SSLHandshakeException
      
      Fix misc issues in secret and database model code
      
      Update CircleCI config to use correct secret based env var keys and values
      
      Add *database-name-override* dynamic var in `metabase.test.data.interface` to allow for DB name overriding
      
      Fix up ssl connectivity test so all parts pass successfully
      
      Get rid of the crazy with-redefs to swap in the existing test-data's schema, and instead just drop the DB if it was created under a different schema (similar to what H2 does)
      
      Co-authored-by: Gustavo Saiani <gustavo@poe.ma>
  2. Nov 08, 2021
    • Update new BigQuery driver to only use project-id in qualified names if it differs from credentials (#18843) · 3ef50788
      Jeff Evans authored
      Update new BigQuery driver to only use project-id in qualified names if it differs from credentials (#18843)
      
      * Update new BigQuery driver to only use project-id in qualified names if it differs from credentials
      
      Add a new key to db-details for :bigquery-cloud-sdk, called :project-id-from-credentials, and extract the project-id from the service account creds when the database is saved or updated
      
      Updating logic in the bigquery-cloud-sdk QP to only include the project-id when qualifying an Identifier if the project-id (override) value is set and differs from the :project-id-from-credentials value mentioned above
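
      A rough sketch of that rule (the function name and key names here are assumptions for illustration, not the actual QP code):

      ```clojure
      ;; only qualify identifiers with the project-id override when it is set
      ;; and differs from the project-id extracted from the credentials
      (defn- project-id-for-qualification
        [{:keys [project-id project-id-from-credentials]}]
        (when (and (seq project-id)
                   (not= project-id project-id-from-credentials))
          project-id))

      ;; (project-id-for-qualification {:project-id "other-project"
      ;;                                :project-id-from-credentials "main-project"})
      ;; ;=> "other-project"
      ```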
      
      Adding a test to confirm that a card query continues to work after changing the associated DB's driver from :bigquery to :bigquery-cloud-sdk
      
      Change all syncs in the test to {:scan :schema} to speed them up (deeper level not needed)
  3. Nov 04, 2021
  4. Oct 27, 2021
    • Make the checkins_interval transient datasets more uniquely named in BigQuery test runs (#18600) · 5a76ef70
      Jeff Evans authored
      * Make the checkins_interval transient datasets more uniquely named
      
      Append a timestamp suffix (initialized when the namespace was loaded) to transient dataset names
      
      Add verbose comments explaining what we're doing
      
      Add `transient-dataset?` functions for both BQ drivers to put the logic in one place (currently it just checks whether the dataset is named `checkins_interval_*`)
      
      Ditch the "legacydriver_" prefix from the legacy BigQuery driver when creating new datasets; it's no longer needed now that transient datasets will have a much more unique suffix
      
      Update the logic for the transient dataset suffix so that it can be more easily detected/parsed later (for cleanup)
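
      A hedged sketch of the naming scheme and predicate described above (the exact suffix format is an assumption):

      ```clojure
      ;; suffix is initialized once, when the namespace loads, so it is shared
      ;; across a test run and can be detected/parsed later for cleanup
      (defonce ^:private transient-suffix
        (str "__transient_" (.getEpochSecond (java.time.Instant/now))))

      (defn transient-dataset-name [base-name]
        (str base-name transient-suffix))

      (defn transient-dataset? [dataset-name]
        ;; currently just: is this one of the checkins_interval_* datasets?
        (boolean (re-find #"^checkins_interval_" dataset-name)))
      ```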
      
      Remove old (more than 2 hours) transient datasets when test extensions namespaces load
      
      Add various docstrings
  5. Oct 26, 2021
  6. Oct 19, 2021
    • Prevent duplicate connection properties (#18359) · 22d23a98
      Jeff Evans authored
      Removing duplicate property declaration from presto-jdbc driver YAML
      
      Add test that executes against all drivers to confirm that no duplicate names come out of connection-properties
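
      The check itself amounts to something like the following sketch (a hypothetical helper, not the actual test):

      ```clojure
      ;; return any :name that appears more than once in a driver's
      ;; connection-properties; an empty result means no duplicates
      (defn duplicate-property-names [conn-props]
        (->> conn-props
             (map :name)
             frequencies
             (keep (fn [[prop-name n]] (when (> n 1) prop-name)))))

      ;; usage idea:
      ;; (is (empty? (duplicate-property-names
      ;;               (driver/connection-properties :presto-jdbc))))
      ```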
      
      Change the way the test runs to avoid needing to initialize test data namespace (to make googleanalytics happy)
      
      Unskip repro Cypress test
  7. Oct 18, 2021
    • Mongo custexp fixes: group by and filters (#18403) · 5c819b89
      Howon Lee authored
      Previously Mongo custom expressions were just columns: you couldn't group by them and you couldn't filter on them. This allows both; those cases weren't really tested before. Also enabled Nemanja's tests for them.
    • Ensure we are paginating resultsets (#18477) · a33fa568
      dpsutton authored
      * Ensure we are paginating resultsets
      
      Made big tables in both pg and mysql
      
      pg:
      ```sql
      create table large_table
      (
          id         serial primary key,
          large_text text
      );
      
      insert into large_table (large_text)
      select repeat('Z', 4000)
      from generate_series(1, 500000)
      ```
      
      In mysql use the repl:
      ```clojure
      
        (jdbc/execute! (sql-jdbc.conn/db->pooled-connection-spec 5)
                       ["CREATE TABLE large_table (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, foo text);"])
      
        (do
          (jdbc/insert-multi! (sql-jdbc.conn/db->pooled-connection-spec 5)
                              :large_table
                              (repeat 50000 {:foo (apply str (repeat 5000 "Z"))}))
          :done)
      
        (jdbc/execute! (sql-jdbc.conn/db->pooled-connection-spec 5)
                       ["ALTER TABLE large_table add column properties json default null"])
      
        (jdbc/execute! (sql-jdbc.conn/db->pooled-connection-spec 5)
                       ["update large_table set properties = '{\"data\":{\"cols\":null,\"native_form\":{\"query\":\"SELECT
                       `large_table`.`id` AS `id`, `large_table`.`foo` AS `foo` FROM `large_table` LIMIT
                       1\",\"params\":null},\"results_timezone\":\"UTC\",\"results_metadata\":{\"checksum\":\"0MnSKb8145UERWn18F5Uiw==\",\"columns\":[{\"semantic_type\":\"type/PK\",\"coercion_strategy\":null,\"name\":\"id\",\"field_ref\":[\"field\",200,null],\"effective_type\":\"type/Integer\",\"id\":200,\"display_name\":\"ID\",\"fingerprint\":null,\"base_type\":\"type/Integer\"},{\"semantic_type\":null,\"coercion_strategy\":null,\"name\":\"foo\",\"field_ref\":[\"field\",201,null],\"effective_type\":\"type/Text\",\"id\":201,\"display_name\":\"Foo\",\"fingerprint\":{\"global\":{\"distinct-count\":1,\"nil%\":0.0},\"type\":{\"type/Text\":{\"percent-json\":0.0,\"percent-url\":0.0,\"percent-email\":0.0,\"percent-state\":0.0,\"average-length\":500.0}}},\"base_type\":\"type/Text\"}]},\"insights\":null,\"count\":1}}'"])
      
      ```
      
      and then from the terminal client repeat this until we have 800,000 rows:
      ```sql
      insert into large_table (foo, properties) select foo, properties from large_table;
      ```
      
      Then can exercise from code with the following:
      
      ```clojure
      (-> (qp/process-query {:database 5 ; use appropriate db and tables here
                              :query {:source-table 42
                                      ;; :limit 1000000
                                      },
                              :type :query}
                              ;; don't retain any rows, purely just counting
                              ;; so resultset is what retains too many rows
                             {:rff (fn [metadata]
                                     (let [c (volatile! 0)]
                                       (fn count-rff
                                         ([]
                                          {:data metadata})
                                         ([result]
                                          (assoc-in result [:data :count] @c))
                                         ([result _row]
                                          (vswap! c inc)
                                          result))))
                              })
           :data :count)
      ```
      
      PG was far easier to blow up. Mysql took quite a bit of data.
      
      Then we just set a fetch size on the result set so that we (hopefully)
      only have that many rows in memory in the resultset at once. The
      streaming will write to the download stream as it goes.
      
      PG has one other complication in that the fetch size can only be honored
      if autoCommit is false. The reasoning seems to be that each statement
      runs in its own transaction and commits; to commit, it has to close its
      resultsets, and therefore it has to realize the entire resultset,
      otherwise you would only get the initial page, if any.
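
      As a hedged illustration of that interaction (plain JDBC from Clojure, assuming a `datasource` is in scope; not the Metabase implementation):

      ```clojure
      (with-open [conn (.getConnection datasource)]
        ;; PG only honors the fetch size when the connection is not auto-committing
        (.setAutoCommit conn false)
        (with-open [stmt (.createStatement conn)]
          (.setFetchSize stmt 500)   ; rows are pulled from the server 500 at a time
          (with-open [rs (.executeQuery stmt "SELECT * FROM large_table")]
            ;; count rows without retaining them, as in the rff example above
            (loop [n 0]
              (if (.next rs)
                (recur (inc n))
                n)))))
      ```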
      
      * Set default fetch size to 500
      
      ;; Long queries on gcloud pg
      ;; limit 10,000
      ;; fetch size | t1   | t2   | t3
      ;; -------------------------------
      ;; 100        | 6030 | 8804 | 5986
      ;; 500        | 1537 | 1535 | 1494
      ;; 1000       | 1714 | 1802 | 1611
      ;; 3000       | 1644 | 1595 | 2044
      
      ;; limit 30,000
      ;; fetch size | t1    | t2    | t3
      ;; -------------------------------
      ;; 100        | 17341 | 15991 | 16061
      ;; 500        | 4112  | 4182  | 4851
      ;; 1000       | 5075  | 4546  | 4284
      ;; 3000       | 5405  | 5055  | 4745
      
      * Only set fetch size if not default (0)
      
      Putting `defaultRowFetchSize=3000` in a database's `:additional-options`
      can set a default fetch size, and we can easily honor that. This allows
      overriding it per DB without much work on our part.
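
      A minimal sketch of honoring that override (illustrative parsing, not the actual code):

      ```clojure
      ;; prefer a user-supplied defaultRowFetchSize from :additional-options,
      ;; falling back to the default of 500
      (defn effective-fetch-size [additional-options]
        (or (some->> additional-options
                     (re-find #"defaultRowFetchSize=(\d+)")
                     second
                     Long/parseLong)
            500))

      ;; (effective-fetch-size "defaultRowFetchSize=3000") ;=> 3000
      ;; (effective-fetch-size nil)                          ;=> 500
      ```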
      
      * Remove redshift custom fetch size code
      
      This removes the automatic insertion of a defaultRowFetchSize=5000 on
      Redshift DBs. Now we always set this to 500 in the sql-jdbc statement
      and prepared-statement code, and we also allow custom values to take
      precedence over our default of 500.
      
      One additional benefit of removing this is that the old behavior always
      included the option even if a user added ?defaultRowFetchSize=300
      themselves, so this should actually give more control to our users.
      
      Profiling quickly on selecting 79,000 rows from Redshift, there was
      essentially no difference between a fetch size of 500 (the new default)
      and 5000 (the old Redshift default); both took about 12442 ms.
      
      * unused require of settings in redshift tests
      
      * Appease the linter
      
      * Unnecessary redshift connection details tests
  8. Oct 05, 2021
  9. Oct 03, 2021
  10. Sep 30, 2021
    • Fix errors from disallowed characters in BigQuery custom expression names (#18055) · cb05523b
      Jeff Evans authored
      Add `escape-alias` multimethod to `sql.qp` to handle translating an alias (whether for a field or expression) into something supported by the driver in question
      
      Add default impl that is essentially identity
      
      Marking `sql.qp/field->alias` as deprecated, and changing its default implementation to invoke the new `sql.qp/escape-alias` on the field's `:name` (with the intention of removing it in a future release, since individual drivers won't need to override this so much as they'll need to override `escape-alias` itself).
      
      Override `sql.qp/escape-alias` method for both BigQuery drivers to run through the same `->valid-field-identifier` function already defined for this purpose
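
      For the BigQuery drivers the override is conceptually a one-liner; a sketch, assuming `escape-alias` takes the driver and the alias string:

      ```clojure
      ;; assumes something like:
      ;; (require '[metabase.driver.sql.query-processor :as sql.qp])
      (defmethod sql.qp/escape-alias :bigquery-cloud-sdk
        [_driver alias-name]
        ;; run aliases through the same function already used for field identifiers
        (->valid-field-identifier alias-name))
      ```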
      
      Add test for a custom expression with weird characters
  11. Sep 28, 2021
    • Fix loading sample dataset in Redshift (#18089) · a31a7d08
      Pawit Pornkitprasan authored
      - `TEXT` in Redshift is equivalent to `VARCHAR(256)`, use
        `VARCHAR(1024)` instead
      - Unskip all tests skipped on Redshift due to sample dataset issue
      - Add explicit `:order-by` to `join-source-queries-with-joins-test`
        to make it pass in Redshift as Redshift does not preserve order
        after join (like a lot of other distributed query engines)
      - Add `VACUUM` and `ANALYZE` call after INSERT to improve performance
    • Fix BigQuery Percentile and Median expressions not using correct backtick quoting (#17979) · bca22d97
      Jeff Evans authored
       Fix BigQuery Percentile and Median expressions not using correct backtick quoting
      
      Ensure the identifier components are quoted by reifying `ToSql` for the `APPROX_QUANTILES` expression
      
      Add test to ensure query is created properly
      
      Mark `:percentile-aggregations` as being supported in the new driver
  12. Sep 27, 2021
    • Add test for ordering when nils are present (#17914) · 25aac414
      Jeff Evans authored
      Add test for ordering when nils are present
      
      Adding test to ensure that we can handle a column containing nils in an ORDER BY clause
      
      Adding `format-name` override for `bigquery-cloud-sdk` to convert dash to underscore in test table names
      
      Adding comment showing IDs of each row in the daily-bird-counts.edn file
      
      Format last column as int to make Oracle happy
      
      Add impls for `presto` and `presto-jdbc` `sorts-nil-first?` as false
      
      Make the `sorts-nil-first?` multimethod accept a base type in order to return an answer
      
      Vertica, at least, has a different answer for different types; see:
      https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/Optimizations/NULLPlacementByAnalyticFunctions.htm#2
      
      Updating Vertica implementation to consider the `base-type` as per its documentation
      
      Updating all invocations of `sorts-nil-first?` in tests to pass the relevant base type
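
      Illustratively, the new shape of the multimethod looks something like this (the rule shown is a placeholder assumption, not Vertica's actual NULL placement; see the linked docs for that):

      ```clojure
      ;; the test-extension multimethod now receives the column's base type as
      ;; well, so drivers can answer per type
      (defmethod sorts-nil-first? :vertica
        [_driver base-type]
        ;; placeholder rule for illustration only
        (isa? base-type :type/Number))
      ```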
    • Support query cancellation in new BigQuery driver (#17959) · 3aeac641
      Jeff Evans authored
      Support query cancellation in new BigQuery driver
      
      Remove now unused 60 second timeout
      
      Remove a bunch of unneeded code (`with-finished-response`) from the legacy driver
      
      Add support for canceling query, which stops pagination of results
      
      Add test to confirm pagination stops upon cancellation
      
      Add test to confirm that max rows enforcement via middleware prevents unnecessary pages from being fetched
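
      A hedged sketch of the cancellation idea (the paging methods follow the google-cloud-bigquery `TableResult` API; the channel plumbing and names are assumptions):

      ```clojure
      (require '[clojure.core.async :as a])

      ;; page through a TableResult, stopping as soon as a value appears on the
      ;; cancel channel, so cancellation stops pagination
      (defn- respond-to-pages [first-page cancel-chan respond-to-row]
        (loop [page first-page]
          (doseq [row (.getValues page)]
            (respond-to-row row))
          (when (and (.hasNextPage page)
                     (nil? (a/poll! cancel-chan)))
            (recur (.getNextPage page)))))
      ```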
  13. Sep 20, 2021
  14. Sep 16, 2021
  15. Sep 15, 2021
  16. Sep 13, 2021
  17. Sep 10, 2021
    • BQ expression fixes (#17347) · 9774e08d
      Howon Lee authored
      This was a miserable grind in the old BQ syntax but a happy fun time in the new BQ syntax, basically just turning it on and squashing a bunch of test bugs.
      
      One test bug to note especially is that the underscore-transformed set of bird scarcity data for the bird scarcity tests is actually necessary, because BQ will hit you with a complaint about underscores even if you successfully and correctly quote the table names.
  18. Sep 01, 2021
  19. Aug 31, 2021
  20. Aug 30, 2021
    • Isolate BigQuery legacy and new driver test runs (#17626) · 41773668
      Jeff Evans authored
      * Randomize BigQuery test view names
      
      Use tu/random-name instead of a name based on the output of gensym
      
      * Add extra prefix for legacy BigQuery driver created databases
      
      * Update legacy BigQuery driver to have unique name for transient datasets
      
      For those datasets created and destroyed by the legacy BigQuery driver QP tests, add an additional prefix so that it doesn't conflict with the new BigQuery driver tests
  21. Aug 25, 2021
    • Update Presto JDBC lib version (#17591) · 0023987f
      Jeff Evans authored
      0.254 -> 0.260
    • Effective type in result cols (#17533) · 32a0d6cb
      dpsutton authored
      * Include the results base type as the cols effective type
      
      Previously we merged in the effective type from the col, which caused
      issues with dates when selecting a particular temporal unit, i.e. month.

      Queries like this return months as integers, so the base and effective
      types are :type/Integer, and something somewhere else is responsible
      for the unit conversion of `1` to January (or Feb; months get weird, I
      think)
      
      ```clojure
      
      ;; after
      {:description "The date and time an order was submitted.",
       :unit :month-of-year,
       :name "CREATED_AT",
       :field_ref [:field 4 {:temporal-unit :month-of-year}],
       :effective_type :type/Integer, ;; we have the proper effective type
       :display_name "Created At",
       :base_type :type/Integer}
      
      ;; before:
      {:description "The date and time an order was submitted.",
       :unit :month-of-year,
       :name "CREATED_AT",
       :field_ref [:field 4 {:temporal-unit :month-of-year}],
       :effective_type :type/DateTime, ;; included effective type from db
       :display_name "Created At",
       :base_type :type/Integer}
      ```
      
      * tests
      
      * Driver tests
      
      * Ignore effective/base types in breakout tests
      
      SparkSQL is weird and it's totally unrelated to the file under test
      
      * Use correct options for datetimes that are projected to month, etc
      
      When aggregating by a datetime column, you can choose a unit: month,
      hour-of-day, week, etc. You often (always?) end up with an
      integer. Previously the base_type was (inexplicably) :type/Integer and
      the effective_type was :type/DateTime, which really just didn't make
      sense. The ideal solution is :type/DateTime as the base_type and
      :type/Integer as the effective_type. But this breaks the frontend,
      which expects to treat these columns in a special way and expects them
      to report as date columns.
      
      I've changed it only here. I thought about making isDate understand
      the `column.unit` attribute and recognize that these are in fact
      dates, but that seems wrong. These are integers that are special
      cases. It seems that the contexts that need to talk about dates should
      understand integers in this special way, rather than all date code
      needing to be aware that integers might flow through.
      
      This might mean extra work but ultimately feels better as the correct
      solution.
      
      * unit is not default
  22. Aug 24, 2021
    • Add Kerberos support to Presto JDBC Driver (#16307) · f5cf4748
      Jeff Evans authored
      * Add Kerberos support to Presto JDBC Driver
      
      Adding Kerberos properties as DB details properties
      
      Refactoring utility methods for dealing with additional options
      
      Adding test for Kerb properties -> connection string
      
      Adding GitHub action to run integration test
      
      Start to move logic for capturing necessary files to new script, and calling that from mba run
      
      Adjust settings in `metabase.test.data.presto-jdbc` to capture test parameters
    • Support counting grouped on an aggregated field in BigQuery (#17536) · ee73902a
      Jeff Evans authored
      Fix counting on grouped aggregated field
      
      Adding QP test
      
      Reworking the BigQuery version of the :breakout to more closely match the structure of the standard sql.qp version
  23. Aug 23, 2021
    • Mongo custom expressions (#17117) · eb25bc71
      Howon Lee authored
      Mongo custom expressions are now turned on for Mongo 5.0 and later only. One idiosyncrasy is that we don't really support the BSON ID format at all in our type system despite having it in there; both of my attempts to get BSON IDs to work with the native Mongo pipeline commands bounced because they really aren't strings, they're BSON IDs. We rely on the pipeline commands heavily, which is the reason for the version requirement.
  24. Aug 20, 2021
    • New BigQuery Driver (#16746) · a980e085
      Jeff Evans authored
      New BigQuery Driver
      
      Create a new :bigquery-cloud-sdk driver that uses the google-cloud-bigquery library instead, with source adapted from the :bigquery driver
      
      https://cloud.google.com/bigquery/docs/reference/libraries
      
      Marking existing :bigquery driver as deprecated, and superseded-by the new one (bigquery-cloud-sdk)
      
      Update new driver and query processor code to use newer Google SDK
      
      Switch test data loading over to use new API
      
      Add project-id connection property to override the value from the service account JSON, and use it as part of qualified names in the query processor if set
      
      Updating google driver so its libraries are compatible with the newer ones used in BigQuery
      
      Update date bucketing tests to skip :bigquery-cloud-sdk (new driver) where :bigquery is skipped
      
      Update `with-bigquery-fks` to take in the driver, since there are now multiple ones
      
      Adding test to confirm that overriding project-id for a public BQ project works (sync and query)
      
      Fixing a bunch of whitespace alignment errors in tests
  25. Aug 16, 2021
  26. Aug 10, 2021
  27. Aug 05, 2021
  28. Aug 04, 2021
    • Fix Presto JDBC connection error (#17293) · 7285e3d4
      Jeff Evans authored
      Remove the `driver/can-connect?` implementation and move the `select-keys` logic instead to the existing place in `sql-jdbc.conn/connection-details->spec`
      
      Update test to also include the `:let-user-control-scheduling` details entry
  29. Aug 02, 2021
    • Fixing Presto JDBC issues (#17265) · f4050ff1
      Jeff Evans authored
      Updating schema connection property to have proper display name
      
      Updating display name for database to be "Schema"
      
      Fixing `driver/can-connect?` impl in `presto_jdbc.clj` to actually use the correct driver, and dissoc `:engine` before calling the `sql-jdbc.conn` method
      
      Adding test for the `test-database-connection` method, which is what led to the error from the UI
  30. Jul 30, 2021
  31. Jul 29, 2021
  32. Jul 23, 2021
    • Implement JDBC based Presto driver (#16194) · 80b46b1f
      Jeff Evans authored
      Implement JDBC based Presto driver
      
      Adding new Presto JDBC driver using the PrestoDB JDBC driver from `https://github.com/prestodb/presto`
      
      Marking the old Presto driver as being `superseded-by` the new one
      
      Pulling out common Presto code into a new presto-common driver (modeled after the relationship between, e.g., `googleanalytics` and `google`)
      
      Putting common QP/HoneySQL logic into the new (abstract) :presto-common driver
      
      Updating :presto driver to extend from the new common driver and only adding HTTP/REST API related methods
      
      Adding implementation of Presto JDBC driver, named :presto-jdbc, extending from :presto-common and :sql-jdbc
      
      Using com.facebook.presto/presto-jdbc as underlying JDBC driver dependency (since this is explicitly for Presto clusters, as opposed to Trino)
      
      Adapting code from the existing Presto driver where appropriate
      
      Adding a new dependency-satisfied? implementation for :env-var, to allow a plugin to require that an env var be set, and making the Presto JDBC driver depend on that (specifically: `mb-enable-presto-jdbc-driver`)
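
      A hedged sketch of what such an `:env-var` dependency check could look like (the multimethod name comes from the message above; the argument list and body are assumptions):

      ```clojure
      ;; the plugin only initializes when the named env var is truthy,
      ;; e.g. MB_ENABLE_PRESTO_JDBC_DRIVER=true
      (defmethod dependency-satisfied? :env-var
        [_info {:keys [env-var]}]
        (Boolean/parseBoolean (or (System/getenv env-var) "false")))
      ```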
      
      Adding CircleCI configuration to run against the newer Presto (0.254) Docker image
      
      Adding explicit ordering in a few tests where it was missing
      
      Fixing presto-type->base-type (timestamps were being synced as :type/Time because the regex pattern was wrong)
      
      Add tx/format-name test implementation for :presto-jdbc to lowercase table name
      
      Make modified test Oracle friendly
      
      Fixing bug parsing the `[:zone :time]` case within `metabase.util.date-2.parse/parse-with-formatter`; the offset is nil so it can't be passed directly in this case, so use the `standard-offset` fn (which was moved from `date-2` to `common`) to get a standard offset for that zone
      
      Fixing more test failures by adding explicit ordering
      
      Changing sync to check whether the driver supports foreign keys before attempting to sync those (since drivers might throw an exception if attempting to check)
      
      Moving some common test dataset functionality between :presto and :presto-jdbc to a new test.data ns for :presto-common
      
      Adding HoneySQL form for :count-where, since we have to explicitly give a higher precision Decimal in order for Presto to not reduce the precision in results
      
      Put limit within subquery for `expression-using-aggregation-test` (since ordering from subquery is not guaranteed in Presto)
      
      Adding impls for ->prepared-substitution to handle substitutions for native query params
      
      Adding HoneySQL impls for the `mod` function (rendered as `mod(x,y)`) and for `timestamp` (since it's a function invoked without parens)
      
      Adding various `sql.qp/date` impls that use the `AT TIME ZONE` operator to account for report tz, and make bucketing tests happy