Commits · f948ea6fa4b4079ed52f20323594d06153c5e799 · Engineering Digital Service / Metabase

This project is mirrored from https://github.com/metabase/metabase.

May 06, 2022

Fix `:postgres` `:day-of-week` extracts coming back as 0..6 instead of 1..7 (#22167) · dcc0b779

Cam Saul authored 2 years ago

* Fix `:postgres` `:day-of-week` extracts coming back as 0..6 instead of 1..7

* Test Tuesday-Saturday per @dpsutton's suggestion and fix issues with those as well

* Fix MongoDB for Tuesday

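The off-by-one comes from day numbering. Postgres `EXTRACT(DOW FROM ...)` numbers days 0 (Sunday) through 6 (Saturday), while Metabase reports `:day-of-week` as 1 through 7. The Python sketch below shows that shift. The `start_of_week` parameter is a hypothetical stand-in for a configurable first day of the week, which is one plausible reading of the Tuesday-Saturday tests. None of these names come from Metabase.

```python
from datetime import date

# Day names in Postgres DOW order; used by the hypothetical `start_of_week` parameter.
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]


def postgres_dow(d: date) -> int:
    """Mimic Postgres EXTRACT(DOW): 0 = Sunday ... 6 = Saturday.

    Python's isoweekday() is 1 = Monday ... 7 = Sunday, so `% 7` maps Sunday to 0.
    """
    return d.isoweekday() % 7


def day_of_week(d: date, start_of_week: str = "sunday") -> int:
    """Return 1..7, where 1 is the (assumed configurable) first day of the week."""
    offset = DAYS.index(start_of_week)
    # Shift so the first day becomes 0, wrap around the week, then make it 1-based.
    return (postgres_dow(d) - offset) % 7 + 1


if __name__ == "__main__":
    sunday = date(2022, 5, 1)
    print(postgres_dow(sunday), day_of_week(sunday))  # 0 1
```

With the default Sunday start, the fix amounts to adding 1. A non-Sunday start additionally needs the modular shift. Without that shift, days before the start of the week would wrap incorrectly.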

May 02, 2022

Persisted models schema (#21109) · c504a12e

dpsutton authored 2 years ago

* dir locals for api/let-404

* Driver supports persisted model

* PersistedInfo model

far easier to develop this model with the following sql:

```sql
create table persisted_info (
id serial primary key not null
,db_id int references metabase_database(id) not null
,card_id int references report_card(id) not null
,question_slug text not null
,query_hash text not null
,table_name text not null
,active bool not null
,state text not null
,UNIQUE (db_id, card_id)
)

```
and I'll make the migration later. It's way easier to just drop the table,
`\i persist.sql` and keep developing without worrying about the migration
having changed so it can't roll back, SHAs, etc.

* Persisting api (not making/deleting tables yet)

`http POST "localhost:3000/api/card/4075/persist" Cookie:$COOKIE -pb`

`http DELETE "localhost:3000/api/card/4075/persist" Cookie:$COOKIE -pb`

Useful from the command line (this is httpie).

* Pull format-name into ddl.i

* Postgres ddl

* Hook up endpoints

* move schema-name into interface

* better jdbc connection management

* Hotswap persisted tables into qp

* clj-kondo fixes

* docstrings

* bad alias in test infra

* goodbye testing format-name function

Leftover: everything uses ddl.i/format-name and this rump was left behind.

* keep columns in persisted info

These are the columns that are in the persisted query. I thought about a tuple of
[col-name type] instead of just the col-name. I didn't do this
because I want to ensure that we compute the db-type in ONLY ONE WAY
ever and I wasn't ready to commit to that yet. I'm not sure it will be
necessary in the future, so it stays out for now.

Context: we hot-swap the persisted table in for the original
query, matching on the query hash remaining the same. It continues to use
the metadata from the original query and just does `select cols from table`.

* Add migration for persisted_info table

This also removes the db_id. I don't know why I was thinking that was
necessary. It also means we don't need another unique constraint on (db_id,
card_id), since we can just mark card_id as unique. No idea what I
was thinking.

* fix ns in a sad manner :(

far better to just have no alias to indicate it is required for side
effects.

* Dont hardcode a card-id :(:(:( my B

* copy the PersistedInfo

* ns cleanup, wrong alias, reflection warning

* Check that state of persisted_info is persisted

* api to enable persistence on a db

I'm not wild about POST /api/database/:id/persist and POST
/api/database/:id/unpersist, but carrying on. I left a note about it.

So now you can enable persistence on a db, enable persistence on a model
by posting to api/card/:id/persist and everything works.

What does not work yet is the unpersisting or re-persisting of models
when using the db toggle.

* Add refresh_begin and refresh_end to persisted_info

This information helps us with two things:
- when we need to chunk refreshing models, this lets us order by
staleness so we can refresh a few models and pick up later
- if we desire, we can look at the previous elapsed time of refreshes
and try to gauge amount of work we want. This gives us a bit of
look-ahead. We can of course track our progress as we go but there's no
way to know if the next refresh might take an hour. This gives us a bit
of insight.
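Both uses of `refresh_begin` and `refresh_end` can be sketched together. Order rows by staleness, then take rows while the previous refresh durations still fit in a per-run time budget. The `PersistedRow` record, the `budget` parameter, and the use of naive UTC datetimes are all assumptions for illustration. This is not Metabase's refresh code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PersistedRow:
    # Hypothetical in-memory mirror of the relevant persisted_info columns.
    card_id: int
    refresh_begin: Optional[datetime] = None
    refresh_end: Optional[datetime] = None

    def last_duration(self) -> timedelta:
        """Elapsed time of the previous refresh, or zero if it never finished."""
        if self.refresh_begin is None or self.refresh_end is None:
            return timedelta(0)
        return self.refresh_end - self.refresh_begin


def pick_refresh_batch(rows: List[PersistedRow], budget: timedelta) -> List[int]:
    """Choose the stalest models whose previous refresh times fit in `budget`."""
    # Never-refreshed rows first, then the oldest refresh_begin first.
    ordered = sorted(
        rows,
        key=lambda r: (r.refresh_begin is not None, r.refresh_begin or datetime.min),
    )
    chosen: List[int] = []
    spent = timedelta(0)
    for row in ordered:
        estimate = row.last_duration()
        # Always take at least one row so a single slow model cannot stall progress.
        if chosen and spent + estimate > budget:
            break
        chosen.append(row.card_id)
        spent += estimate
    return chosen
```

The estimate is only as good as the last run. This is the limitation the note above describes: past elapsed time gives some look-ahead, but no guarantee.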

* Refresh tables every 8 hours ("0 0 0/8 * * ? *")

Tables are refreshed every 8 hours. There is one single job doing this
named "metabase.task.PersistenceRefresh.job" but it has 0 triggers by
default. Each database with persisted models will add a trigger to this
to refresh those models every 8 hours.

When you unpersist a model, it will immediately remove the table and
then delete the persisted_info record.

When you mark a database as persist false, it will immediately mark all
persisted_info rows as inactive and deleteable, and unschedule its
trigger. A background thread will then start removing the tables.
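The Quartz cron string `"0 0 0/8 * * ? *"` has fields for seconds, minutes, hours, day-of-month, month, day-of-week, and year. `0/8` in the hours field means "starting at hour 0, every 8 hours", so the trigger fires at 00:00, 08:00 and 16:00. The sketch below computes the next fire time under that reading. It assumes a single fixed time zone, ignores DST, and is not Quartz's own implementation.

```python
from datetime import datetime, timedelta


def next_fire_time(now: datetime, every_hours: int = 8) -> datetime:
    """Next fire time of a Quartz trigger shaped like "0 0 0/N * * ? *"."""
    fire_hours = range(0, 24, every_hours)  # e.g. 0, 8, 16 for N = 8
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    for hour in fire_hours:
        candidate = top_of_hour.replace(hour=hour)
        if candidate > now:
            return candidate
    # Past the last slot today: the schedule restarts at midnight tomorrow.
    return top_of_hour.replace(hour=0) + timedelta(days=1)
```

For example, a refresh that began just after midnight has a next fire time of 08:00 the same day.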

* Schedule refreshing on startup, watching for already scheduled

does not allow for schedule changes but that's a future endeavor

* appease our linter overlords

* Dynamic var to inhibit persistence when refreshing

Also, it checked the state against "active" instead of "persisted", which
is really freaky. How has this worked in the past if that's the case?

* api docstrings on card persist

* docstring

* Don't sync the persisted schemas

* Fix bad sql when no deleteable rows

getting error with bad sql when there were no ids

* TaskHistory for refreshing

* Add created_at to persist_info table

helpful if this ever ends up in the audit section

* works on redshift

Hooked up the hierarchy, and Redshift is close enough that it just works.

* Remove persist_info record after deleting "deleteable"

* Better way to check that something exists

* POST /api/<card-id>/refresh

api to refresh a model's persisted record

* return a 204 from refreshing

* Add buttons to persist/unpersist a database and a model for PoC (#21344)

* Redshift and postgres report true for persist-models

There are separate notions of "persistence is possible" vs. "persistence is
enabled". Seems like we're just going to check details for enabled and rely
on the driver multimethod for whether it is possible.

* feature for enabled, hydrate card with persisted

two features: :persist-models for which dbs support it, and
:persist-models-enabled for when that option is enabled.

POST to api/<card-id>/unpersist

hydrate persisted on cards so FE can display persist/unpersist for
models

* adjust migration number

* remove deferred-tru

* conditionally hydrate persisted on models only

* Look in right spot for persist-models-enabled

* Move persist enabled into options not details

changing details recomposes the pool, which is especially bad now that
we have refresh tasks going on reusing the same connection

* outdated comment

* Clean up source queries from persisted models

Their metadata might have had [:field 19 nil] field_refs, and we should
substitute just [:field "the-name" {:base-type :type/Whatever-type}]
since it will be a select from a native query.

Otherwise you get the following:

```
2022-03-31 15:52:11,579 INFO api.dataset :: Source query for this query is Card 4,088
2022-03-31 15:52:11,595 WARN middleware.fix-bad-references :: Bad :field clause [:field 4070 nil] for field "category.catid" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,596 WARN middleware.fix-bad-references :: Bad :field clause [:field 4068 nil] for field "category.catgroup" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,596 WARN middleware.fix-bad-references :: Bad :field clause [:field 4071 nil] for field "category.catname" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,596 WARN middleware.fix-bad-references :: Bad :field clause [:field 4069 nil] for field "category.catdesc" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,611 WARN middleware.fix-bad-references :: Bad :field clause [:field 4070 nil] for field "category.catid" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,611 WARN middleware.fix-bad-references :: Bad :field clause [:field 4068 nil] for field "category.catgroup" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,611 WARN middleware.fix-bad-references :: Bad :field clause [:field 4071 nil] for field "category.catname" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,611 WARN middleware.fix-bad-references :: Bad :field clause [:field 4069 nil] for field "category.catdesc" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,622 WARN middleware.fix-bad-references :: Bad :field clause [:field 4070 nil] for field "category.catid" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,622 WARN middleware.fix-bad-references :: Bad :field clause [:field 4068 nil] for field "category.catgroup" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,622 WARN middleware.fix-bad-references :: Bad :field clause [:field 4071 nil] for field "category.catname" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
2022-03-31 15:52:11,623 WARN middleware.fix-bad-references :: Bad :field clause [:field 4069 nil] for field "category.catdesc" at [:fields]: clause should have a :join-alias. Unable to infer an appropriate join. Query may not work as expected.
```
I think it's complaining that that table is not joined in the query and
giving up.

While doing this, I see we are hitting the database a lot:

```
2022-03-31 22:52:18,838 INFO api.dataset :: Source query for this query is Card 4,111
2022-03-31 22:52:18,887 INFO middleware.fetch-source-query :: Substituting cached query for card 4,111 from metabase_cache_1e483_229.model_4111_redshift_c
2022-03-31 22:52:18,918 INFO middleware.fetch-source-query :: Substituting cached query for card 4,111 from metabase_cache_1e483_229.model_4111_redshift_c
2022-03-31 22:52:18,930 INFO middleware.fetch-source-query :: Substituting cached query for card 4,111 from metabase_cache_1e483_229.model_4111_redshift_c
```

I tried to track down why we are doing this so much but couldn't get
there.

I think I need to ensure that we are using the query store annoyingly :(
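The field-ref cleanup described at the start of this item can be sketched with plain Python data standing in for MBQL. Here `["field", 4070, None]` mirrors `[:field 4070 nil]`. The column keys (`name`, `base_type`, `field_ref`) are assumptions about the shape of result metadata, not a documented contract.

```python
from typing import Any, Dict, List

Column = Dict[str, Any]


def name_based_ref(col: Column) -> list:
    """Build a ref that points at a column of the cached table by name, not Field ID."""
    return ["field", col["name"], {"base-type": col["base_type"]}]


def rewrite_field_refs(metadata: List[Column]) -> List[Column]:
    """Swap ID-based refs for name-based ones; copies each column, never mutates input."""
    return [{**col, "field_ref": name_based_ref(col)} for col in metadata]
```

Name-based refs make sense here because the persisted table is queried like a native query. The original Field IDs no longer correspond to a joinable table, which is what the warnings above complain about.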

* Handle native queries

I didn't nest the vector in the `or` clause correctly. The check was meant to be truthy
only when the `mbql-query` local was truthy, but you can't put the vector
`[false mbql-query]` there and rely on that behavior: a vector is always truthy.

* handle datetimetz in persisting

* Errors saved into persisted_info

* Reorder migrations to put v43.00-047 before 048

* correct arity mismatch in tests

* comment in refresh task

* GET localhost:3000/api/persist

Returns persisting information:
- most information from the `persist_info` table. Excludes a few
columns (query_hash, question_slug, created_at)
- adds database name and card name
- adds next fire time from quartz scheduling

```shell
❯ http GET "localhost:3000/api/persist" Cookie:$COOKIE -pb
[
  {
    "active": false,
    "card_name": "hooking reviews to events",
    "columns": [
      "issue__number",
      "actor__login",
      "user__login",
      "submitted_at",
      "state"
    ],
    "database_id": 19,
    "database_name": "pg-testing",
    "error": "No method in multimethod 'field-base-type->sql-type' for dispatch value: [:postgres :type/DateTimeWithLocalTZ]",
    "id": 4,
    "next-fire-time": "2022-04-06T08:00:00.000Z",
    "refresh_begin": "2022-04-05T20:16:54.654283Z",
    "refresh_end": "2022-04-05T20:16:54.687377Z",
    "schema_name": "metabase_cache_1e483_19",
    "state": "error",
    "table_name": "model_4077_hooking_re"
  },
  {
    "active": true,
    "card_name": "redshift Categories",
    "columns": [
      "catid",
      "catgroup",
      "catname",
      "catdesc"
    ],
    "database_id": 229,
    "database_name": "redshift",
    "error": null,
    "id": 3,
    "next-fire-time": "2022-04-06T08:00:00.000Z",
    "refresh_begin": "2022-04-06T00:00:01.242505Z",
    "refresh_end": "2022-04-06T00:00:01.825512Z",
    "schema_name": "metabase_cache_1e483_229",
    "state": "persisted",
    "table_name": "model_4088_redshift_c"
  }
]

```

* include card_id in /api/persist

* drop table if exists

* Handle rescheduling refresh intervals

There is a single global value for the refresh interval. The API
requires it to be 1<=value<=23. There is no validation if someone
changes the value in the db or with an env variable. Setting this to a
nonsensical value could cause enormous load on the db so they shouldn't
do that.

On startup, unschedule all tasks and then reschedule them to make sure
that they have the latest value.

One thing to note: there is a single global value, but I'm making a task
for each database. Per-database intervals seem like an obvious future enhancement, so I don't
want to deal with migrations later. This gives us the current spec
behavior of a trigger for each db with the same value and lets us
get more interesting using the `:options` on the database in the
future.
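The scheduling rules above come down to two steps. First, validate the global interval to 1..23 hours and turn it into a Quartz cron string. Then, on startup, rebuild one trigger per database from that single value. In the sketch, a dict of database ID to cron string stands in for real Quartz triggers, and the function names are hypothetical.

```python
from typing import Dict, Iterable

MIN_HOURS, MAX_HOURS = 1, 23


def refresh_cron(hours: int) -> str:
    """Quartz cron for "every `hours` hours", with the API's 1..23 validation."""
    if not MIN_HOURS <= hours <= MAX_HOURS:
        raise ValueError(f"refresh interval must be {MIN_HOURS}..{MAX_HOURS} hours, got {hours}")
    return f"0 0 0/{hours} * * ? *"


def reschedule_all(database_ids: Iterable[int], hours: int) -> Dict[int, str]:
    """Startup: discard old triggers and rebuild one per database from the global value."""
    cron = refresh_cron(hours)
    return {db_id: cron for db_id in database_ids}
```

Keying by database now means a later per-database interval only changes where `hours` comes from, not the shape of the schedule.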

* Mark as admin not internal

Lets it show up in `api/setting/`. I'm torn on how special this value
is. Is it the setting code's responsibility to reschedule the
refresh triggers, or should that be on the setting itself?

It feels "special" and can do a lot of work from just setting an
integer. There's a special endpoint to set it which is aware of this, so it
would be a bit of an error to set this setting through the more
traditional setting endpoint.

* Allow for "once a day" refresh interval

* Global setting to enable/disable

post api/persist/enable
post api/persist/disable

enable allows for other scheduling operations (enabling on a db, and
then on a model).

Disable will
- update each enabled database and disable in options
- update each persisted_info record and set it inactive and state
deleteable
- unschedule triggers to refresh
- schedule task to unpersist each model (deleting table and associated
persisted_info row)

* offset and limits on persisted info list

```shell
http get "localhost:3000/api/persist?limit=1&offset=1" Cookie:$COOKIE -pb
{
  "data": [
    {
      "active": true,
      "card_id": 4114,
      "card_name": "Categories from redshift",
      "columns": [
        "catid",
        "catgroup",
        "catname",
        "catdesc"
      ],
      "database_id": 229,
      "database_name": "redshift",
      "error": null,
      "id": 12,
      "next-fire-time": "2022-04-08T00:00:00.000Z",
      "refresh_begin": "2022-04-07T22:12:49.209997Z",
      "refresh_end": "2022-04-07T22:12:49.720232Z",
      "schema_name": "metabase_cache_1e483_229",
      "state": "persisted",
      "table_name": "model_4114_categories"
    }
  ],
  "limit": 1,
  "offset": 1,
  "total": 2
}
```
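The paged response above wraps one slice of rows with `limit`, `offset` and a `total` counted before slicing. A minimal Python sketch of that shape follows. Echoing `limit` and `offset` back as given (`None` when absent) is an assumption, not documented behavior.

```python
from typing import Any, Dict, List, Optional


def paginate(
    rows: List[Dict[str, Any]],
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> Dict[str, Any]:
    """Return one page of rows plus paging metadata; `total` counts all rows."""
    start = offset or 0
    page = rows[start:] if limit is None else rows[start:start + limit]
    return {"data": page, "limit": limit, "offset": offset, "total": len(rows)}
```

Reporting `total` alongside the page lets a client compute the page count without fetching every row.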

* Include collection id, name, and authority level

* Include creator on persisted-info records

* Add settings to manage model persistence globally (#21546)

* Common machinery for running steps

* Add model cache refreshes monitoring page (#21551)

* don't do shenanigans

* Refresh persisted and error persisted_info rows

* Remarks on migration column

* Lint nits (sorted-ns and docstrings)

* Clean up unused function, docstring

* Use `onChanged` prop to call extra endpoints (#21593)

* Tests for persist-refresh

* Reorder requires

* Use quartz for individual refreshing for safety

switch to using one-off jobs to refresh individual tables. Required
adding some job context so we know which type to run.

Also, cleaned up the interface between ddl.interface and the
implementations. The common behaviors of advancing persisted-info state,
setting active, duration, etc are in a public `persist!` function which
then calls to the multimethod `persist!*` function for just the
individual action on the cached table.

Still more work to be done:
- do we want creating and deleting to be put into this type of system?
Quite possible
- we still don't know if a query is running against the cached table
that can prevent dropping the table. Perhaps using some delay to give
time for any running query to finish. I don't think we can easily solve
this in general because another instance in the cluster could be
querying against it and we don't have any quick pub/sub type of
information sharing. DB writes would be quite heavy.
- clean up the ddl.i/unpersist method in the same way we did refresh and
persist. Not quite clear what to do about errors, return values, etc.

* Update tests with more job-info in context

* Fix URL type conflicts

* Whoops get rid of our Thread/sleep test :)

* Some tests for the new job-data, clean up task history saving

* Fix database model persistence button states (#21636)

* Use plain database instance on form

* Fix DB model persistence toggle button state

* Add common `getSetting` selector

* Don't show caching button when turned off globally

* Fix text issue

* Move button from "Danger zone"

* Fix unit test

* Skip default setting update request for model persistence settings (#21669)

* Add a way to skip default setting update request

* Skip default setting update for persistence

* Add changes for front end persistence

- Order by refresh_begin descending
- Add endpoint /persist/:persisted-info-id for fetching a single entry.

* Move PersistInfo creation into interface function

* Hide model cache monitoring page when caching is turned off (#21729)

* Add persistence setting keys to `SettingName` type

* Conditionally hide "Tools" from admin navigation

* Conditionally hide caching Tools tab

* Add route guard for Tools

* Handle missing settings during init

* Add route for fetching persistence by card-id

* Wrangling persisted-info states

Make quartz jobs handle any changes to database.
Routes mark persisted-info state and potentially trigger jobs.
Jobs read persisted-info state.

Jobs

- Prune
-- deletes PersistedInfo `deleteable`
-- deletes cache table

- Refresh
-- ignores `deletable`
-- update PersistedInfo `refreshing`
-- drop/create/populate cache table

Routes

card/x/persist
- creates the PersistedInfo `creating`
- trigger individual refresh

card/x/unpersist
- marks the PersistedInfo `deletable`

database/x/unpersist
- marks the PersistedInfos `deletable`
- stops refresh job

database/x/persist
- starts refresh job

/persist/enable
- starts prune job

/persist/disable
- stops prune job
- stops refresh jobs
- trigger prune once
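The division of labor above works as a small state machine. Routes only write states such as `creating` and `deletable`. Jobs read those states and do the actual table work. The Python sketch models the card-level routes and the two jobs as pure functions over dicts. `populate` and `drop_table` are hypothetical callbacks standing in for the DDL, and the `error` state follows the earlier note that errors are saved into persisted_info.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]  # hypothetical stand-in for a PersistedInfo record


def persist_card(card_id: int) -> Row:
    """Route card/x/persist: create the record as `creating` (a refresh is then triggered)."""
    return {"card_id": card_id, "state": "creating", "error": None}


def unpersist_card(row: Row) -> Row:
    """Route card/x/unpersist: only mark the record; the prune job does the deleting."""
    return {**row, "state": "deletable"}


def refresh_job(row: Row, populate: Callable[[object], None]) -> Row:
    """Refresh job: skip `deletable`, mark `refreshing`, then rebuild the cache table."""
    if row["state"] == "deletable":
        return row
    row = {**row, "state": "refreshing"}
    try:
        populate(row["card_id"])  # stands in for drop/create/populate of the cache table
    except Exception as e:
        return {**row, "state": "error", "error": str(e)}
    return {**row, "state": "persisted", "error": None}


def prune_job(rows: List[Row], drop_table: Callable[[object], None]) -> List[Row]:
    """Prune job: drop the cache table of every `deletable` record and forget the record."""
    kept = []
    for row in rows:
        if row["state"] == "deletable":
            drop_table(row["card_id"])
        else:
            kept.append(row)
    return kept
```

Routes never touch tables directly, so an API call can't race a running refresh on the same cache table. Only the jobs do DDL.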

* Save the definition on persist info

This removes the columns and query_hash columns in favor of definition.

This means that if the persisted understanding of the model is
different from the actual model during fetch source query, we won't
substitute.

This makes sure we keep columns and datatypes in line.

* Remove columns from api call

* Add a cache section to model details sidebar (#21771)

* Extract `ModelCacheRefreshJob` type

* Add model cache section to sidebar

* Use `ModelCacheRefreshStatus` type name

* Add endpoint to fetch persistence info by model ID

* Use new endpoint at QB

* Use `CardId` from `metabase-types/api`

* Remove console.log

* Fix `getPersistedModelInfoByModelId` selector

* Use `t` instead of `jt`

* Provide seam for prune testing

- Fix spelling of deletable

* Include query hash on persisted_info

we thought we could get away with just checking the definition but that
is schema shaped. So if you changed a where clause we should invalidate
but the definition would be the same (same table name, columns with
types).
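The point above is that two checks are needed. The definition catches shape changes (table name, columns, types). A query hash catches changes that keep the shape but alter the rows, such as a different `where` clause. The sketch hashes a canonical JSON rendering with SHA-256. Both choices are assumptions, and Metabase's real hash function may differ.

```python
import hashlib
import json
from typing import Any, Dict


def query_hash(query: Dict[str, Any]) -> str:
    """Hash a canonical JSON rendering of the query, so key order does not matter."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_substitute(persisted: Dict[str, Any], query: Dict[str, Any], definition: Dict[str, Any]) -> bool:
    """Use the cached table only if both its shape and the query itself are unchanged."""
    return persisted["definition"] == definition and persisted["query_hash"] == query_hash(query)
```

Changing only a filter value leaves `definition` equal but changes the hash. That is the case the definition check alone would miss.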

* Put random hash in PersistedInfo test defaults

* Fixing linters

* Use new endpoint for model cache refresh modal (#21742)

* Use new endpoint for cache status modal

* Update refresh timestamps on refresh

* Move migration to 44

* Dispatch on initialized driver

* Side effects get bangs!

* batch hydrate :persisted on cards

* bang on `models.persisted-info/make-ready!`

* Clean up a doc string

* Random fixes: docstrings, make private, etc

* Bangs on side effects

* Rename global setting to `persisted-models-enabled`

The old name (enabled-persisted-models) felt awkward, so I renamed it to be a bit
more natural. If you are developing, you need to set the new value to
true, and then your state will stay the same.

* Rename parameter for site-uuid-str for clarity

* Lint cleanups

interesting that the compojure one is needed for clj-kondo. But i guess
it makes sense since there is a raw `GET` in `defendpoint`.

* Docstring help

* Unify type :type/DateTimeWithTZ and :type/DateTimeWithLocalTZ

Both are "TIMESTAMP WITH TIME ZONE". I got an error and saw that the
type was timestamptz, so I used that. They are synonyms, although it might
require an extension.
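A missing entry in the base-type to SQL-type mapping is what produced the `No method in multimethod 'field-base-type->sql-type' ... :type/DateTimeWithLocalTZ` error in the `GET /api/persist` output. The sketch below mirrors the unified mapping as a lookup table. Only the two entries from this note are included, and the function name is a hypothetical Python analogue of the multimethod.

```python
# Both base types render as Postgres "TIMESTAMP WITH TIME ZONE" (a.k.a. timestamptz).
# Only these two entries come from the commit note; a real mapping has many more.
POSTGRES_SQL_TYPES = {
    "type/DateTimeWithTZ": "TIMESTAMP WITH TIME ZONE",
    "type/DateTimeWithLocalTZ": "TIMESTAMP WITH TIME ZONE",
}


def field_base_type_to_sql_type(base_type: str) -> str:
    """Look up the column type used when creating the cache table."""
    try:
        return POSTGRES_SQL_TYPES[base_type]
    except KeyError:
        # Analogous to the "No method in multimethod" failure when an entry is missing.
        raise LookupError(f"No SQL type for {base_type}") from None
```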

* Make our old ns linter happy

Co-authored-by: Alexander Polyankin <alexander.polyankin@metabase.com>
Co-authored-by: Anton Kulyk <kuliks.anton@gmail.com>
Co-authored-by: Case Nelson <case@metabase.com>


Apr 29, 2022
- Don't hard-code timeout in Druid driver (#22161) · 2ba9546a
  Braden Shepherdson authored 2 years ago
  
  It should come from the context, like all the other drivers.
Apr 27, 2022

Validate datasets are found when checking bigquery (#22144) · f4e49dbd

Case Nelson authored 2 years ago

* Validate datasets are found when checking bigquery

Fixes #19709

* Address PR feedback

Made a general mechanism to pass expected messages to users in
api/database via ex-info. This allows us to suppress logging for
"unexceptional" exceptions that one can expect to hit while setting up
drivers.

* Only validate when filters are set

Also removed the dataset list from the exception as it's not surfaced to
users.

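The "unexceptional exceptions" mechanism described above can be sketched with a dedicated exception type. The caller shows its message to the user and skips logging, while anything else is still logged. In this sketch a Python exception class stands in for Clojure's `ex-info`. `ExpectedError`, `check_database`, the `valid`/`message` keys, and the error text are hypothetical. The dataset check only fires when filters are set, matching the last bullet.

```python
from typing import Any, Callable, Dict, List


class ExpectedError(Exception):
    """An error users can expect to hit during setup: show the message, don't log it."""


def check_database(check: Callable[[], None], log: Callable[[str], None]) -> Dict[str, Any]:
    try:
        check()
        return {"valid": True}
    except ExpectedError as e:
        # "Unexceptional" exception: report it to the user without logging.
        return {"valid": False, "message": str(e)}
    except Exception as e:
        log(f"Unexpected error checking database: {e!r}")
        return {"valid": False, "message": str(e)}


def validate_datasets(found: List[str], filters_set: bool) -> None:
    """Only complain about an empty dataset list when the user actually set filters."""
    if filters_set and not found:
        raise ExpectedError("No datasets matched the configured filters.")
```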

Fix errors when downgrading then upgrading to bigquery driver (#22121) · 2ada1fc4

dpsutton authored 2 years ago

This issue has a simple fix but a convoluted story. The new bigquery
driver handles multiple schemas and puts that schema (dataset-id) in the
normal spot on a table in our database. The old driver handled only a
single schema by having that dataset-id hardcoded in the database
details and leaving the schema slot nil on the table row.

```clojure
;; new driver describe database:
[{:name "table-1" :schema "a"}
 {:name "table-2" :schema "b"}]

;; old driver describe database (with dataset-id "a" on the db):
[{:name "table-1" :schema nil}]
```

So if you started on the new driver and then downgraded for some reason,
the table sync would see you had tables with schemas, but when it
enumerated the tables in the database on the next sync, would see tables
without schemas. It did not unify these two together, nor did it archive
the tables with a schema. You ended up with both copies in the
database, all active.

```clojure
[{:name "table-1" :schema "a"}
 {:name "table-2" :schema "b"}
 {:name "table-1" :schema nil}]
```

If you then tried to migrate back to the newer driver, we migrated the tables
as normal: since the old driver only dealt with one schema but left it
nil, the migration put that dataset-id on all of the tables connected to this
connection.

But since the new driver and then the old driver created copies of the
same tables, you would end up with a constraint violation: tables with
the same name and, now after the migration, the same schema. If we ignore this
error, the sync in more recent versions will correctly inactivate the
old tables with no schema.

```clojure
[{:name "table-1" :schema "a"}  <-|
 {:name "table-2" :schema "b"}    | constraint violation
 {:name "table-1" :schema "a"}] <-|

;; preferable:
[{:name "table-1" :schema "a"}
 {:name "table-2" :schema "b"}
 {:name "table-1" :schema nil :active false}]
```

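The "preferable" end state in the last example can be expressed as a rule for backfilling the nil schema. Migrate a nil-schema row onto the dataset-id only if that would not duplicate an existing `(name, schema)` pair. Otherwise, mark the old copy inactive. This Python sketch illustrates the invariant. It is not the actual fix in this commit, whose code isn't shown here, and the dict-based table rows are an assumption.

```python
from typing import Any, Dict, List

Table = Dict[str, Any]


def backfill_schema(tables: List[Table], dataset_id: str) -> List[Table]:
    """Move old-driver rows (schema None) onto `dataset_id` without creating duplicates."""
    existing = {(t["name"], t["schema"]) for t in tables if t["schema"] is not None}
    result = []
    for t in tables:
        if t["schema"] is not None:
            result.append(t)
        elif (t["name"], dataset_id) in existing:
            # A new-driver copy already exists: keep the old row, but deactivate it.
            result.append({**t, "active": False})
        else:
            result.append({**t, "schema": dataset_id})
    return result
```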

Apr 26, 2022

Bump bigquery version to first version that supports SNAPSHOT tables (#22049) · 54d064fc

dpsutton authored 2 years ago

Fixes #19860

SNAPSHOT tables in BigQuery hold diffs from an underlying table:
https://cloud.google.com/bigquery/docs/table-snapshots-intro. But
support in the SDK only arrived in 1.135.0:
https://github.com/googleapis/java-bigquery/blob/main/CHANGELOG.md#11350-2021-06-28

I picked the most recent 1.135 version.

Running

```shell
clj -A:dev:ee:ee-dev:drivers:drivers-dev -Stree
```

Shows conflicts on

```
X google-http-client-jackson2 1.39.2 :older-version
; using 1.39.2-sp.1 from google analytics

X com.fasterxml.jackson.core/jackson-core 2.12.3 :older-version
; from cheshire we have 2.12.4

X com.google.http-client/google-http-client 1.39.2 :superseded
; using 1.39.2-sp.1 from google-http-client-jackson2 (1.39.2-sp1)

X commons-codec/commons-codec 1.15 :use-top
; pinned to this version at top level

X com.google.guava/guava 30.1.1-android :use-top
; pinned to 31.0.1-jre top level
```

So I think this change is quite safe. After the release we should
investigate the breaking changes that come in the 2.0.0 release and look
into getting onto 2.10.10. This version worked locally for me but I
don't want to introduce that into the release just yet.


Apr 22, 2022

Handle March 31st + 3 months (June 31st?) for Oracle (#21841) · b9cedccc

Cam Saul authored 2 years ago

* Handle March 31st + 3 months for Oracle (#10072)

* Optimize out some casting in Oracle

* rx util support varargs inside `opt`

* hx/ math operators like + and - should propagate type information

* Some dox tweaks

* Fix SQLite busted behavior

* Avoid unneeded casting in Vertica when adding temporal intervals

* Lint error fixes

* BigQuery fix for #21969

* Add testing context for tests for #21968 and #21971

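The title's example has no literal answer: March 31st + 3 months would be June 31st, and June has 30 days. One common resolution clamps the day to the last valid day of the target month. The Python sketch below shows that rule. It is not necessarily the SQL the Oracle driver generates. Oracle's own `ADD_MONTHS` also differs slightly: it maps any last-day-of-month input to the last day of the result month.

```python
import calendar
from datetime import date


def add_months_clamped(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


if __name__ == "__main__":
    print(add_months_clamped(date(2022, 3, 31), 3))  # 2022-06-30
```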

Group-by fix for JSON columns. (#21741) · bad129cd

Howon Lee authored 2 years ago

Group-bys didn't work because they need two instances of the field, and the Postgres backend didn't consider those two instances to be the same. Ported the BigQuery fix over so it applies to JSON columns as well.


Apr 19, 2022

Make namespace aliasing consistent everywhere; enforce with clj-kondo (#21738) · 19beda53

Braden Shepherdson authored 2 years ago

* Make namespace aliasing consistent everywhere; enforce with clj-kondo

See the table of aliases in .clj-kondo/config.edn

Notable patterns:
- `[metabase.api.foo :as api.foo]`
- `[metabase.models.foo :as foo]`
- `[metabase.query-processor.foo :as qp.foo]`
- `[metabase.server.middleware.foo :as mw.foo]`
- `[metabase.util.foo :as u.foo]`
- `[clj-http.client :as http]` and `[metabase.http-client :as client]`

Fixes #19930.


Apr 13, 2022
- Fix Snowflake start of week (#21604) · d13da893
  Cam Saul authored 2 years ago
  
Apr 07, 2022
- Disallow FDW connections in SQLite (#21525) · e0c3812a
  Cam Saul authored 2 years ago
  
Apr 05, 2022

Merge `:google` driver into `:googleanalytics` (#21002) · d3357c95

Cam Saul authored 2 years ago

* Remove :google driver

* Remove unneeded stuff

* Remove another reference to :google driver

* Remove another reference to :google


Apr 01, 2022
- QP middleware for download perms (#21021) · 04473fc5
  Noah Moss authored 2 years ago
  
Mar 24, 2022
- Fix `:postgres` filtering by current quarter; remove a few unneeded casts in PG SQL output (#21204) · 9fdb1798
  Cam Saul authored 3 years ago
  
  * Fix Postgres filter by quarter
  * Fix test for MongoDB to work around #5419
  * Spark SQL doesn't allow quarters in intervals either
Mar 23, 2022

Replace remaining references to `qp/query->preprocessed` with `qp/preprocess` (#21047) · 38ea154a
Cam Saul authored 3 years ago

* Remove references to `qp/query->preprocessed`

* Fix Oracle

Postgres JSON nested field columns (#21007) · 10a527db

Howon Lee authored 3 years ago

This is the first vertical slice of #708. There is still a material way to go, including the MySQL implementation, plinking away at the data model stuff, and the frontend. There are also big putative sync speed gains I think I should chip away at.


Mar 16, 2022

Google Analytics: use service accounts for auth (#21004) · 6ee13aa2

Cam Saul authored 3 years ago


* Use Service Accounts for Google Analytics auth going forward

* Update GA dox

* Dox tweak

* Fix e2e test

* Remove dead googleanalytics form code

Co-authored-by: Dalton Johnson <daltojohnso@users.noreply.github.com>


Fix noisy backend tests (#20910) · f9f1bf41

Cam Saul authored 3 years ago


* Change u/profile from println to log/info

* Change most test printlns to log/info

* Make u/profile message a fn rather than a delay

* Apparently clj-http errors aren't wrapped in :object anymore

* Add running commentary

* Log the amount of time it took to find and run tests

* Sort namespaces

* Update test

Co-authored-by: Diogo Mendes <diogo@metabase.com>


Mar 10, 2022

Fix BigQuery incorrectly quoting datetime-truncated field literal forms (#20907) · cc48450c

Cam Saul authored 3 years ago

* Improved error messages when query execution fails

* Pretty-print hx/identifier in a different way if it contains extra keys

* Fix #20806

* Rename key used to record that we should not qualify

* Fix line breaks

* Test fix


Feb 24, 2022

Linting improvements for modules/drivers: zero errors (#20626) · f40a16b5

Michiel Borkent authored 3 years ago

* Refer all

* allow refer all in tests

* Add toucan test macro config

* wip

* Improve hooks

* Fix formatting issue

* Improve

* presto

* oracle

* mongo

* druid

* zero errors

* undo whitespace change

* ns linter satisfaction


Feb 16, 2022
- Add lots of exclusions to Spark SQL JDBC driver (#20563) · 224cbd05
  Cam Saul authored 3 years ago
  
Feb 15, 2022

Upgrade Hive JDBC driver version from 1.2.2 -> 3.1.2; Bump Spark SQL from 2.1.1 to 3.2.1 (#20353) · 4dc16403

Cam Saul authored 3 years ago

* Replace AOT Spark SQL deps with a `proxy` and a `DataSource`

* Support `connection-details->spec` returning a `DataSource`

* Remove target/classes

* Don't need to deps prep drivers anymore

* Fix duplicate `this` params in `proxy` methods; add `:test` alias for Eastwood to make it be a little quieter

* Make sure to call `pool/map->properties`

* Upgrade Hive JDBC driver version from 1.2.2 -> 3.1.2; Bump Spark SQL from 2.1.1 to 3.2.1

* Clean the namespaces

* Don't need to register a proxy JDBC driver since we're not even using it anymore

* Fix Spark SQL :schema sync for newer versions

* Remove unneeded override

* Fix day-of-week extract for new :sparksql

* Hive/Spark SQL needs to escape question marks inside QUOTED identifiers now 

* Some minor SQL generation improvements to avoid duplicate casts

* Revert change to debug test


Feb 14, 2022

Add logic to truncate and uniquely-suffix column alias identifiers (#19659) · f94d5149

Cam Saul authored 3 years ago

* Add failing test for #15978

* Improved test

* Add new metabase.driver.query-processor.escape-join-aliases QP middleware

* Test fix 

* Add reference to #20307

* Add some extra dox

* Test fixes for BigQuery drivers

* revert unneeded change

* Fix :bigquery and :bigquery-cloud-sdk mixup

* Test fixes 

* Test fix 

* Remove comment I meant to remove

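The title describes two steps: truncate column aliases that are too long for the database, and suffix them so they stay unique. The Python sketch below shows one way to do both. It assumes ASCII aliases, a 63-character limit (Postgres's default identifier length), an 8-character SHA-256 suffix for truncated names, and `_2`, `_3`, ... for exact repeats. None of these details are confirmed to match Metabase's middleware.

```python
import hashlib
from typing import Set

SUFFIX_LEN = 8  # hex characters of the hash kept in a truncated alias


def truncate_alias(alias: str, max_len: int = 63) -> str:
    """Fit `alias` into `max_len` characters, keeping it distinct via a hash suffix."""
    if len(alias) <= max_len:
        return alias
    digest = hashlib.sha256(alias.encode("utf-8")).hexdigest()[:SUFFIX_LEN]
    return alias[: max_len - SUFFIX_LEN - 1] + "_" + digest


def unique_alias(alias: str, used: Set[str], max_len: int = 63) -> str:
    """Truncate, then append _2, _3, ... until unused; records the result in `used`."""
    candidate = truncate_alias(alias, max_len)
    n = 1
    while candidate in used:
        n += 1
        suffix = f"_{n}"
        # Leave room for the numeric suffix so the result still fits in max_len.
        candidate = truncate_alias(alias, max_len - len(suffix)) + suffix
    used.add(candidate)
    return candidate
```

Hashing the full alias matters. Two long aliases that share their first 54 characters would otherwise truncate to the same identifier.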

Feb 11, 2022
- Support running Oracle tests with local Docker image (#20358) · cd7a4e46
  Cam Saul authored 3 years ago
  
- BigQuery should qualify columns from source query with `source` in the `SELECT` clause (#20434) · dad00648
  Cam Saul authored 3 years ago
  
  * Fix BigQuery not qualifying source columns
  * Don't fix on 42 for the deprecated BigQuery driver
  * You know what, I'll fix it after all. Why not.
  * Oops, I updated the wrong part of the test
Feb 08, 2022

Remove :bigquery driver (#20142) · 22ebe102

Jeff Evans authored 3 years ago


* Remove :bigquery driver

Add "migration" to convert existing Database instances from :bigquery to :bigquery-cloud-sdk (with error log if using outdated OAuth mechanisms), by way of `normalize-db-details`

Removing references to :bigquery from various places in the code

* Remove from modules/drivers/deps.edn

* Remove stuff from CircleCI that is running the old driver

Remove `driver-switch-test` since there is no practical way to run this anymore (since we can't initialize the old driver)

Remove buggy redef of `isa?` from `semantic-type-migration-tests`

Co-authored-by: Cam Saul <github@camsaul.com>

Unverified

22ebe102
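
The engine "migration" described above amounts to rewriting the engine key and logging when legacy credentials are present. The sketch below is hypothetical: the dictionary shape, the `LEGACY_OAUTH_KEYS` names, and the log message are placeholders, since the commit message does not list the actual detail keys that `normalize-db-details` checks.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical names for legacy OAuth detail keys; the real keys are not given in the commit.
LEGACY_OAUTH_KEYS = ("client-id", "client-secret", "auth-code")


def migrate_bigquery_database(database: dict) -> dict:
    """Hypothetical sketch: switch :bigquery databases to :bigquery-cloud-sdk."""
    if database.get("engine") != "bigquery":
        return database  # other engines are left untouched
    details = database.get("details", {})
    if any(key in details for key in LEGACY_OAUTH_KEYS):
        # Mirrors the commit's "error log if using outdated OAuth mechanisms".
        logger.error("Database %s uses an outdated OAuth mechanism", database.get("id"))
    return {**database, "engine": "bigquery-cloud-sdk"}
```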

Feb 03, 2022

Bump backend dependencies (Feb 2022) (#19827) · cf247b68

Cam Saul authored 3 years ago

* Bump backend dependencies (Jan 2022)

* Revert java-time version upgrade for now until https://github.com/dm3/clojure.java-time/issues/77 is fixed

* Add license overrides

* Bump a few more deps (again). Revert Google/BigQuery and Vertica version bumps

* Revert MariaDB and Redshift version changes

Unverified

cf247b68

Feb 02, 2022
- 43 qp middleware overhaul part 6: convert post-processing middleware to new pattern (#20096) · b302a35b
  Cam Saul authored 3 years ago
  
  Unverified
  
  b302a35b
- Fix "source cannot be null" error in MongoDB (#20145) · 93cb6667
  Jeff Evans authored 3 years ago
  
  Break out a separate util fn to return the given `authdb` or the canonical default (`"admin"`).
  
  Add an invocation of the new util fn from the `mcred/create` call.
  
  Unverified
  
  93cb6667
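
The MongoDB helper from #20145 is small enough to sketch directly. This Python version is illustrative only: `auth_db_or_default` is a hypothetical name, and treating a blank string like a missing value is an assumption.

```python
from typing import Optional

DEFAULT_AUTH_DB = "admin"  # the canonical default named in the commit message


def auth_db_or_default(authdb: Optional[str]) -> str:
    """Return the configured authentication database, or "admin" when none is given."""
    # Assumption: blank strings are treated the same as a missing value.
    if authdb is None or not authdb.strip():
        return DEFAULT_AUTH_DB
    return authdb
```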
Jan 27, 2022
- Activation fixes for 0.42 (#19947) · 3e73e5d7
  Alexander Polyankin authored 3 years ago
  
  Unverified
  
  3e73e5d7
Jan 26, 2022

Support overriding ROWCOUNT for SQL Server (#19267) · 802cc236

Jeff Evans authored 3 years ago

* Support overriding ROWCOUNT for SQL Server

Add new "ROWCOUNT Override" connection property for `:sqlserver`, which will provide a DB-level mechanism to override the `ROWCOUNT` session level setting as needed for specific DBs

Change `max-results-bare-rows` from a hardcoded constant to a setting definition instead, which permits a DB level override, and move the former constant default to a new def instead (`default-max-results-bare-rows`)

For `:sqlserver`, set the DB-level setting override (if the connection property is set), via the `driver/normalize-db-details` impl

Add test to confirm the original scenario from #9940 works using this new override (set to `0`)

Move the common function that computes the overall row limit to the `metabase.query-processor.middleware.limit` namespace as `determine-query-max-rows`, and invoke it from execute now

Add new clause to the `determine-query-max-rows` function that preferentially takes the value from `row-limit-override` (if defined)

Unverified

802cc236
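
A sketch of the precedence that `determine-query-max-rows` gains in #19267, written in Python with hypothetical parameter names. Only the first rule (use `row-limit-override` when it is defined) comes from the commit message. Falling back to the smaller of the query's own limit and `max-results-bare-rows` is an assumption about the pre-existing behavior.

```python
from typing import Optional


def determine_query_max_rows(
    query_limit: Optional[int],
    max_results_bare_rows: int,
    row_limit_override: Optional[int] = None,
) -> int:
    """Hypothetical sketch of the row-limit precedence described in #19267."""
    # From the commit message: a defined override wins, including 0.
    if row_limit_override is not None:
        return row_limit_override
    # Assumption: otherwise take the smaller of the query's limit and the configured cap.
    if query_limit is None:
        return max_results_bare_rows
    return min(query_limit, max_results_bare_rows)
```

Checking `is not None` rather than truthiness matters here: an override of `0` must win. In T-SQL, `SET ROWCOUNT 0` turns the row cap off, and `0` is the value the #9940 test uses.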

Jan 25, 2022

Apply schema inclusion/exclusion filtering to sql-jdbc drivers (#19651) · b6d542f8

Jeff Evans authored 3 years ago

* Apply schema inclusion/exclusion filtering to sql-jdbc drivers

Update `sql-jdbc` namespaces to handle schema inclusion/exclusion patterns when filtering schemas

Add new generic schema inclusion/exclusion test for sql-jdbc drivers that define the property

Update Snowflake and Redshift driver manifests to include schema filtering property

Create `db-details->schema-filter-patterns` util fn to turn DB details into the inclusion/exclusion patterns

Move schema inclusion/exclusion filtering code to new namespace (since it's not strictly used by `:sql-jdbc` derived drivers)

Move existing tests accordingly

Add schema inclusion/exclusion check to the new `filtered-syncable-schemas` multimethod (and update its docstring)

Change `:redshift` impl of `filtered-syncable-schemas` to call the `:sql-jdbc` version instead

Use new multimethod instead for `filtered-syncable-schemas`, and have default impl of `syncable-schemas` call that

Mark `syncable-schemas` as deprecated and include notes on the new method (and update driver markdown file accordingly)

Unverified

b6d542f8
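
The inclusion/exclusion check from #19651 can be sketched as a pattern match over schema names. The Python below is hypothetical: it assumes comma-separated patterns where `*` matches any run of characters and matching is case-insensitive, and `include_schema` is not the name of the real `db-details->schema-filter-patterns` helper or the `filtered-syncable-schemas` multimethod.

```python
import re


def patterns_to_regexes(patterns: str) -> list:
    """Compile a comma-separated pattern string; "*" matches any run of characters."""
    regexes = []
    for raw in patterns.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        # Escape everything except the "*" wildcard, which becomes ".*".
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        regexes.append(re.compile(regex, re.IGNORECASE))
    return regexes


def include_schema(schema: str, filter_type: str, patterns: str) -> bool:
    """Decide whether a schema should be synced under an inclusion/exclusion filter."""
    regexes = patterns_to_regexes(patterns)
    if not regexes:
        return True  # no patterns configured: keep every schema
    matched = any(r.fullmatch(schema) for r in regexes)
    if filter_type == "inclusion":
        return matched
    if filter_type == "exclusion":
        return not matched
    return True  # any other filter type is treated as "no filter"
```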

Jan 20, 2022

Add drive scope to BigQuery client (#19629) · d4606c47

Jeff Evans authored 3 years ago

* Add drive scope to BigQuery client

Adding new drive auth scope in order to be able to query Google Drive files created in BigQuery as external tables

Adding new test to confirm such a table can be queried successfully

Unverified

d4606c47

Add test for #7487 (#19810) · 3822d2cd

Cam Saul authored 3 years ago

* Add test for #7487

* Fix error message

* Update test for new error message

Unverified

3822d2cd

Rename sample dataset to "sample database" (#19682) · 491dbdc0

Noah Moss authored 3 years ago

Unverified

491dbdc0

Jan 19, 2022

Add test for #15538 (#19779) · 84f64cb7

Cam Saul authored 3 years ago

* Add test for #15538

* Add extra validation for native source queries

* Add test for sql-source-query validation

Unverified

84f64cb7

Remove hardcoded project-id from bigquery-cloud-sdk tests (#19780) · 2214e205

Jeff Evans authored 3 years ago

* Remove hardcoded project-id from `:bigquery-cloud-sdk` tests

Pull out common helper fn to extract project-id from credentials

Update `bigquery_cloud_sdk_test.clj` to have a var that loads the service-account-json from the env var, then uses the aforementioned function to determine the project-id

Change sensitive-data-redacted-test to use an arbitrary/fake project-id to remove any sort of confusion

Unverified

2214e205
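
Extracting the project ID from a Google service-account key, as the helper in #19780 does, reduces to reading the key's `project_id` field. This is a hypothetical Python sketch: the function name is illustrative, and loading the JSON from an environment variable (as the test namespace does) is left out because the variable's name is not given here.

```python
import json


def project_id_from_credentials(service_account_json: str) -> str:
    """Return the "project_id" field of a service-account key file's JSON contents."""
    credentials = json.loads(service_account_json)
    project_id = credentials.get("project_id")
    if not project_id:
        raise ValueError("service-account JSON has no project_id")
    return project_id
```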

Add test for #13932 (#19766) · a566bd32

Cam Saul authored 3 years ago

Unverified

a566bd32

Jan 18, 2022
- Nested Queries Overhaul 2022: Split logic for determining appropriate table &... · b9b70cd9
  Cam Saul authored 3 years ago
  
  Nested Queries Overhaul 2022: Split logic for determining appropriate table & column aliases out of SQL QP (#19384)
  Unverified
  
  b9b70cd9
Jan 14, 2022

Fix bigquery-cloud-sdk dataset-id normalization logic (#19663) · e48b21db

Jeff Evans authored 3 years ago

Move bulk of dataset-id normalization logic into new private helper fn

Perform app DB update for db-details after dataset-id has been turned into an inclusion filter

Add test to confirm that normalization only happens once

Unverified

e48b21db
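
The normalization in #19663 should be idempotent, which is what the "only happens once" test checks. A hypothetical Python sketch follows. The key names `dataset-id`, `dataset-filters-type`, and `dataset-filters-patterns` are assumptions about the shape of the BigQuery connection details.

```python
def normalize_dataset_id(details: dict) -> dict:
    """Hypothetical sketch: convert a legacy single dataset-id into an inclusion filter."""
    if "dataset-id" not in details:
        return details  # already normalized (or never had a dataset-id): nothing to do
    normalized = {k: v for k, v in details.items() if k != "dataset-id"}
    normalized["dataset-filters-type"] = "inclusion"
    normalized["dataset-filters-patterns"] = details["dataset-id"]
    return normalized
```

Because already-normalized details pass through unchanged, running the function twice yields the same result. Persisting the result to the app DB, as the commit does, avoids repeating the work on every connection.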