This project is mirrored from https://github.com/metabase/metabase. Pull mirroring updated.
  1. Dec 17, 2024
  2. Dec 16, 2024
  3. Dec 10, 2024
• :robot: backported "test: syncing views against all drivers" (#50897) · 68a464e4
      github-automation-metabase authored
      
      * test: see which drivers pass these tests
      
      * Fix build
      
      * Change test to use standard dataset, add initial implementation for bigquery
      
      * Use fully qualified names for views
      
      * Don't use transactions for databricks
      
      * Make drop table only if exists, fix describe-index in mongo for views
      
      * don't sync system.views
      
      * Fix oracle drop-views, disable non-syncing views
      
* Use metabase-instance to find the table; this is important because redshift and oracle munge the table name
      
      * Fix oracle tests
      
      * Fix oracle views
      
      * Fix formatting
      
      * Fix snowflake qualified components
      
      * Change tests to opt out
      
* Fix sqlserver
      
      * Fix postgres test
      
      * Disable h2 describe views test
      
      * Address review
      
Co-authored-by: Case Nelson <case@metabase.com>
  4. Nov 28, 2024
  5. Nov 27, 2024
  6. Nov 13, 2024
  7. Nov 11, 2024
  8. Nov 08, 2024
  9. Nov 07, 2024
  10. Nov 01, 2024
  11. Oct 31, 2024
  12. Oct 30, 2024
  13. Oct 29, 2024
• Mongo objects should download as JSON, not EDN (#49255) · d23d7a3a
      adam-james authored
      * Mongo objects should download as JSON, not EDN
      
      Fixes #48198
      
Prior to this change, object columns (base or effective type of :type/Dictionary) were simply formatted with `(str value)`, which results in a CSV or JSON download containing EDN-formatted objects.
      
This is a bug because we present object column values as JSON in the app, so the formatting of the download should match.
      
The formatter function now takes this type into account. Additionally, since this kind of formatting should always be applied (even when format_rows is false), the function is modified to unconditionally apply json/encode formatting to dictionary types when they are encountered.
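The behavior described above can be sketched roughly as follows (hypothetical `format_cell` helper; the real formatter lives in Metabase's Clojure codebase):

```python
import json

def format_cell(value, col_type):
    """Rough sketch: object (dictionary-typed) columns are JSON-encoded
    unconditionally, even when row formatting is disabled, so downloads
    match what the app displays; other values fall back to str."""
    if col_type == "type/Dictionary":
        return json.dumps(value)  # JSON, not an EDN-ish str(value)
    return str(value)
```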
      
      * add a test
      
      * add proper condition to test
      
      * card-download should be public
      
      * uncomment json encoding formatter
      
      * set-cell! should keep encoded json string for Objects
      
      I think this is the correct change; I don't really understand the reason for wrapping, encoding, decoding, and then
      string-ing that value. Maybe I'm missing something.
      
      * Adjusted xlsx Object set-cell! implementation
      
      * forgot the not... inverted
  14. Oct 21, 2024
  15. Oct 18, 2024
  16. Oct 17, 2024
  17. Oct 14, 2024
  18. Oct 09, 2024
• Lean on DB queries for describe-table for Mongo (#46598) · 6324f948
      Cal Herries authored
      
This PR reimplements driver/describe-table for MongoDB. Before, we would query a sample of documents from a collection and analyse them in Clojure. Now we execute a query that does a similar aggregation, but most of the calculation is done in the Mongo database.
      
Based on a few tests, performance is slightly slower when the collection contains small or deeply nested documents, but much faster for large ones. The main difference, though, is memory usage: this approach uses very little memory in the Metabase instance because all of the aggregation is done in the database.
      
      
Nested fields are a naturally recursive problem, but here we unroll the potential recursion into a `max-depth` number of queries that look for nesting at each depth level.
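The unrolling idea can be illustrated with a small sketch (hypothetical helper, not the Mongo driver's implementation): nesting is explored one level per pass, down to a fixed maximum depth, rather than by unbounded recursion:

```python
def field_paths_by_depth(doc, max_depth):
    """Sketch: discover nested field paths one depth level at a time,
    up to max_depth, mirroring the 'one query per nesting level' idea."""
    levels, frontier = [], [((), doc)]
    for _ in range(max_depth):
        if not frontier:
            break
        level, next_frontier = [], []
        for prefix, node in frontier:
            for key, value in node.items():
                path = prefix + (key,)
                level.append(".".join(path))
                if isinstance(value, dict):
                    next_frontier.append((path, value))
        levels.append(level)
        frontier = next_frontier
    return levels
```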
      
      * ~ use DB to describe the table
      
      * ~ optimize root query
      
      * ~ nested-level-query works and gets objects too
      
      * + root query gets objects too
      
      * + driver/describe-table :mongo works
      
      * ~ remove old implementation
      
      * Various fixes for faster sync
      
      Upgraded driver to 5.2.0
      Updated data load to insert many rather than 1 row at a time.
      Dropped max-depth to 7, see comment.
      
      ---------
      
Co-authored-by: Case Nelson <case@metabase.com>
  19. Oct 08, 2024
• [Databricks] Address initial remarks (#48377) · 72495873
      lbrdnk authored
      * Address initial remarks
      
      * Extract hive-like to separate module and set it as dependency
      
      * Remove hive-like also from spark
• :race_car::rocket::race_car::rocket: :race_car::rocket: SHAVE 7 MINUTES OFF OF NON-CORE DRIVER TEST RUNS IN CI :race_car::rocket::race_car::rocket: :race_car::rocket: (#47681) · cd4d7646
      Cam Saul authored
      * Parallel driver tests PoC
      
      * Set fail-fast to false for now
      
      * Try splitting up non-driver tests to see how broken tests are
      
      * Whoops fix plain BE tests
      
      * Ok nvm I'll test this in another branch
      
      * Fix fail-fast
      
      * Experiment with the improved Hawk split logic
      
      * Fix some broken/flaky tests
      
      * Experiment: try splitting MySQL 8 tests into FOUR jobs
      
      * Divide other Postgres and MySQL tests up and use num-partitions = 2
      
      * Another test fix :wrench:
      
      * Flaky test fix :wrench:
      
      * Try making more stuff fast
      
      * Make athena fast??
      
      * Fix a few more things
      
      * Test fixes? :wrench:
      
      * Fix configs
      
      * Fix Mongo job syntax
      
      * Fix busted test from #46942
      
      * Fix Mongo config again
      
      * wait-for-port needs to specify shell I guess
      
      * More cleanup
      
      * await-port can't have a timeout-minutes I guess
      
      * Let's only parallelize MySQL for now.
      
      * Cleanup action
      
      * Cleanup wait-for-port action
      
      * Fix another flaky test
      
      * NOW driver tests will be FAST
      
      * Need to mark driver tests too
      
      * Fix wrong tag
      
      * Use Hawk 1.0.5
      
      * Fix busted metabase.public-settings-test/landing-page-setting-test
      
      * Fix busted `metabase.api.database-test/get-database-test` etc. (hopefully)
      
      * Fix busted `metabase.sync.sync-metadata.fields-test/sync-fks-and-fields-test` for Oracle
      
      * Maybe this fixed `metabase.query-processor.middleware.permissions-test/e2e-ignore-user-supplied-perms-test` maybe not
      
* Fix busted metabase.api.dashboard-test/dependent-metadata-test because endpoint had different sort order than test
      
      * Ok my test fix did not work
      
      * Fix metabase.sync.sync-metadata.fields-test/sync-fks-and-fields-test for Redshift
      
      * Better test name
      
      * More test fixes :wrench:
      
      * Schema fix
      
      * PR feedback
      
      * Split off test partitioning into separate PR
      
      * Fix failing Oracle tests
      
      * Another round of test fixes, hopefully
      
      * Fix failing Redshift tests
      
      * Maybe the last round of test fixes
      
      * Fix Oracle
      
      * Fix stray line
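The num-partitions splitting mentioned above (e.g. "use num-partitions = 2") can be sketched as a deterministic partitioner (hypothetical helper; the real split logic lives in the Hawk test runner):

```python
def partition_tests(test_names, num_partitions):
    """Sketch: split a test suite into num_partitions roughly equal,
    deterministic chunks so each CI job can run one slice."""
    ordered = sorted(test_names)  # stable order, identical across jobs
    return [ordered[i::num_partitions] for i in range(num_partitions)]
```

Determinism matters here: every CI job must compute the same split independently, so the partitioner sorts first rather than relying on discovery order.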
  20. Oct 02, 2024
• fix: bigquery more resilient querying (#48175) · 3fd17081
      Case Nelson authored
      * fix: bigquery more resilient querying
      
      Inline some function calls to make it easier to track what's happening.
      
Make sure that cancellation during the initial query and subsequent page
fetches is handled properly. Explicitly throw when cancelled.
      
      Only retry queries if bigquery says they are retryable.
      
      Try to cancel the BigQuery job if an exception or cancellation occurs.
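The retry-and-cancel flow described above might look roughly like this (hypothetical names, not the driver's actual API):

```python
def run_with_retries(execute, is_retryable, cancel_job, max_attempts=3):
    """Sketch: retry only errors that BigQuery marks retryable, and try
    to cancel the remote job when an exception finally escapes."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except Exception as exc:
            if is_retryable(exc) and attempt < max_attempts:
                continue  # error is flagged as safe to retry
            cancel_job()  # best-effort cleanup of the server-side job
            raise
```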
      
      * Add comment explaining execution flow
      
      * Bump bigquery deps
      
* Bump bigquery dependencies
      
      * Fix tests
      
      * Fix formatting
  21. Sep 27, 2024
  22. Sep 26, 2024
• Databricks JDBC driver (#42263) · c04928d5
      lbrdnk authored
      * Databricks JDBC driver base
      
      * Add databricks CI job
      
      * WIP data loading -- it works, further cleanup needed
      
      * Cleanup
      
      * Implement ->honeysql to enable data loading
      
      * Hardcode catalog job var
      
      * Implement driver methods and update tests
      
      * Derive hive instead of sql-jdbc
      
      * Cleanup leftovers after deriving hive
      
      * Run databricks tests on push
      
* Cleanup and enable set-timezone
      
      * Disable database creation by tests
      
      * Add Databricks to broken drivers for timezone tests
      
      * Exclude Databricks from test
      
      * Enable have-select-privilege?-test
      
      * Restore sql-jdbc-drivers-using-default-describe-table-or-fields-impl post rebase
      
      * Restore joined-date-filter-test
      
      * Adjust to work with dataset definition tests
      
      * Adjust alternative date tests
      
* Remove leftover reflection warning set
      
      * Update test exts
      
      * cljfmt vscode
      
      * Add databricks to kondo drivers
      
      * Update metabase-plugin.yaml
      
      * Update databricks_jdbc.clj
      
      * Rework test extensions
      
      * Update general data loading code to work with Databricks
      
      * Reset tests to orig
      
      * Use DateTimeWithLocalTZ for TIMESTAMP database type
      
      * Convert to LocalDateTime for set-parameter
      
* Update test extensions field-base-type->sql-type
      
      * Update database-type->base-type
      
      * Enable creation of time columns in test data even though not supported
      
      * Fix typo
      
      * Update tests
      
* Update tests
      
      * Update drivers.yml
      
      * Disable dynamic dataset loading tests
      
      * Adjust the iso-8601-text-fields-should-be-queryable-date-test
      
      * Update load-data/row-xform
      
      * Add time type exception to test
      
      * Update test data loading and enable test
      
      * Whitespace
      
      * Enable all driver jobs
      
      * Update comment
      
      * Make catalog mandatory
      
      * Remove comment
      
      * Remove log level from spec generation
      
      * Update sql.qp/datetime-diff
      
      * Update read-column-thunk
      
      * Remove comment
      
      * Simplify date-time->results-local-date-time
      
      * Update comment
      
      * Move definitions
      
      * Update test extension types mapping
      
      * Remove now obsolete ddl/insert-rows-honeysql-form implementation
      
      * Update sql-jdbc.conn/connection-details->spec for perturb-db-details
      
      * Update load-data/do-insert!
      
      * Remove ssh tunnel from driver as tests do not work with it
      
      * Update test
      
      * Promote ::dynamic-dataset-loading to :test/dynamic-dataset-loading and modify corresponding tests
      
      * Adjust to broken TIMESTAMP_NTZ sync
      
      * Update read-column-thunk to return timestamps always in Z
      
      * Comment
      
      * Disable tests for dynamic datasets
      
      * Return spark jobs into drivers.yml
      
      * Update Databricks CI catalog
      
      * Remove vscode cljfmt tweak
      
      * Update iso-8601-text-fields-expected-rows
      
      * Update datetime-diff
      
      * Formatting
      
      * cljfmt
      
      * Add placeholder test
      
      * Remove comment
      
      * cljfmt
      
      * Use EnableArrow=0 connection param
      
      * Remove comment
      
      * Comment
      
      * Update tests
      
      * cljfmt
      
      * Update driver's deps.edn
      
      * Update tests
      
      * Implement alternative `describe-table`
      
      * WIP Workaround for timestamp_ntz sync, will be thrown away probably
      
      * Update metabase-plugin.yaml with schema filters
      
* Update driver to use schema filters and remove now redundant sync implementations
      
      * Update tests
      
      * Update tests extensions
      
      * Update test
      
      * Add feature flags for fast sync
      
      * Implement describe-fields
      
      * Implement describe-fks-sql
      
      * Enable fast sync features
      
      * Use full_data_type
      
      * Comment
      
      * Add exception for timestamp_ntz columns to new sync code
      
      * Implement db-default-timezone
      
      * Add timestamp_ntz ignored test
      
      * Add db-default-timezone-test
      
      * Fix typo
      
      * Update setReadOnly
      
      * Add comment on setAutoCommit
      
      * Update chunk-size
      
      * Add timezone-in-set-and-read-functions-test
      
      * Drop Athena from driver exceptions
      
      * Use set/intersection instead of a filter
      
      * Add explicit fast-sync tests
      
      * Update describe-fields-sql and add comment
      
      * Add preprocess-additional-options
      
      * Add leading semicolon test
      
      * Disable dataset creation and update comment
      
      * Rename driver to `databricks`
      
      * Use old secret names
      
      * Fix wrongly copied hsql list
      
      * Temporarily allow database creation
      
      * Add *allow-database-deletion*
      
      * Temporarily allow database creation
      
      * Disable database creation
      
      * cljfmt
      
      * cljfmt
  23. Sep 24, 2024
  24. Sep 16, 2024
• fix: snowflake compile day trunc to timestamp_ltz (#47874) · 85857aeb
      Case Nelson authored
      Fixes #47426
      
      Date operations return values based on the timestamp offset rather than
      the session/db timezone. Converting to `timestamp_ltz` first ensures
      that we get predictable results.
      
In the test, it is important that the database timezone matches
report_timezone so that post-processing gives proper results.
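The compiled SQL shape this fix implies might look like the following sketch (assumed form, not taken from the driver; `to_timestamp_ltz` and `date_trunc` are real Snowflake functions):

```python
def compile_day_trunc(expr):
    """Sketch: cast to timestamp_ltz before truncating so the result
    follows the session/db timezone instead of the value's offset."""
    return f"date_trunc('day', to_timestamp_ltz({expr}))"
```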
  25. Sep 09, 2024
• fix: bigquery add null checks to result processing (#47590) · 07dd373f
      Case Nelson authored
      * fix: bigquery add null checks to result processing
      
      Fixes #47339
      
      On the related issue there are different stacktraces indicating
      likely sources of null pointer exceptions.
      
      1. `.getNextPage` is likely returning a nil value. I was unable to reproduce this but one thing I did notice is that `hasNextPage` is recommended over checking `.getNextPageToken`. Added nil handling around `page` possibly being nil.
      
2. `cancel-chan` may be triggered before processing begins, in which case `execute-bigquery` would pass nil as a TableResult to the initial reducer. Testing whether cancel-chan fires at just the right moment would be too flaky for CI, but I was able to reproduce this locally, and it is fixed by the nil handling added.
      
      3. `cancel-chan` may be triggered during query processing. This is covered by a test now.
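The null-safe pagination loop described in the three points above can be sketched like this (`FakePage` is a stand-in for the BigQuery client's TableResult; all names here are hypothetical):

```python
class FakePage:
    """Stand-in for a TableResult-like page of query results."""
    def __init__(self, rows, next_page=None):
        self.rows = rows
        self._next = next_page

    @property
    def has_next_page(self):
        return self._next is not None

    def next_page(self):
        return self._next

def iter_rows(first_page, cancelled=lambda: False):
    """Sketch: tolerate a nil first page, prefer has_next_page over
    peeking at a next-page token, and throw explicitly on cancellation."""
    page = first_page
    while page is not None:
        if cancelled():
            raise RuntimeError("query cancelled")
        yield from page.rows
        page = page.next_page() if page.has_next_page else None
```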
      
* Check hasNextPage in test
      
      * Add test for null getNextPage
      
      * Fix cljfmt
  26. Sep 04, 2024
• fix: sqlserver handle uniqueidentifier uuids (#47544) · b46a6592
      Case Nelson authored
      * fix: sqlserver handle uniqueidentifier uuids
      
      Fixes #46148
      
      Include sqlserver in `uuid-type` handling as its `uniqueidentifier` type
      stores uuids.
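The "cast to text" seam added for drivers (see the bullets below) might take a shape like this sketch (assumed SQL forms, not the driver's actual output):

```python
def cast_to_text(driver, expr):
    """Sketch of a per-driver cast-to-text seam: sqlserver, whose
    uniqueidentifier type stores UUIDs, casts through VARCHAR(MAX)
    rather than a precisely sized varchar; others use TEXT."""
    if driver == "sqlserver":
        return f"CAST({expr} AS VARCHAR(MAX))"
    return f"CAST({expr} AS TEXT)"
```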
      
      * Don't be so precise with varchar size
      
      * Add seam for drivers to cast to text type
      
      * Fix arg order
  27. Aug 30, 2024
• fix: bigquery handle local date params as date (#47423) · f06643c2
      Case Nelson authored
      Fixes #30602
      
We were overriding localdate in the bigquery driver to produce offsetdatetimes for no apparent reason. This caused a number of problems when dealing with date-type columns.
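The intent of the fix can be sketched as a simple parameter-type mapping (hypothetical helper, not the driver's code): a plain local date binds as a DATE parameter instead of being widened to an offset datetime:

```python
from datetime import date, datetime

def bigquery_param_type(value):
    """Sketch: map a bound parameter value to a BigQuery parameter type,
    keeping local dates as DATE."""
    if isinstance(value, datetime):  # datetime subclasses date: check first
        return "TIMESTAMP"
    if isinstance(value, date):
        return "DATE"
    return "STRING"
```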
• Bigquery verify bignumeric fingerprint and bin (#47381) · 0931ccec
      Case Nelson authored
      * Bigquery verify bignumeric fingerprint and bin
      
      Fixes #28573
      
      Field values must be scanned.
      
      In 45.3, these fields were not being fingerprinted properly. (tag `v1.45.3`)
      In a 45 point release that appears to have been fixed. (branch `release-x.45.x`)
      Confirmed fixed in master, adding test
      
      * cljfmt
  28. Aug 29, 2024
  29. Aug 28, 2024
  30. Aug 27, 2024
  31. Aug 26, 2024
  32. Aug 23, 2024