This project is mirrored from https://github.com/metabase/metabase.
- Oct 15, 2024
adam-james authored
* Incremental Pivot Processing for Exports (WIP)

  Fixes pivot exports for CSV and xlsx. The CSV export should use less memory by incrementally building up the data structure and aggregating the necessary row data right away, so the memory overhead becomes only as large as the total pivot result. In cases where the pivot rows/cols combine into many, many columns and rows, this can still be a large set of data, but it should behave much better now in most cases. The Excel export is a little more straightforward: create the export rows in the same fashion, streaming one row at a time, and just post-process the sheet to add the pivot table in one shot at the end.
* WIP adding row totals
* aggregate totals as rows are added

  Row, column, section, and grand totals are all aggregated as each row is added. This means the final step of building pivot output becomes just an exercise in lookups/arrangement; no further aggregation is needed.
* CSV pivot works per-row, export respects formatting

  This is a big step forward; we don't need to hold the entire dataset in memory. Instead, we aggregate each row's data into the pivot data structure (see the sketch after this entry), which only holds onto:
  - unique values for each pivot-row, in a sorted set
  - unique values for each pivot-col, in a sorted set
  - the grand total for each measure: N values, where N is the number of measures, usually 1 or 2
  - row totals for each combination of each pivot-row * N measures
  - col totals for each combination of each pivot-col * N measures
  - totals for each 'section', determined by unique values of the first pivot-row * N measures
  - values for each measure in every 'cell': Row Combos * Col Combos * N Measures

  So there can still be a decent amount of data to store, but it will never hold onto all of the 'raw rows' from the dataset. We can never completely guarantee that Row Combos * Col Combos * N Measures remains small, but two things let us move forward anyway:
  - there's now visible feedback in the app that the download is running (or if it's failed)
  - pivot table utility diminishes rapidly with huge output anyway; users still need to curate/set up their data effectively to improve the table's utility, so we can assume that a slow-to-download pivot table is also slow to use/less effective, and will likely be something the user doesn't want (as often)
* some test fixes
* now, if we export 'raw pivot rows', they don't show pivot-grouping and they also don't include the 'extra' rows for totals/subtotals/grand totals (any row with pivot-grouping > 0). This means that the non-pivot version of a pivot table export will now match what a user sees if they change the viz to a regular table.
* remove old test
* re-incorporate some changes from master
* fix csv for non-pivots due to an oversight in my changes

  This is just a temporary change; I think I should clean up this bit of the code a little. I can probably make it a little more readable and use some cleaner logic regarding whether the rows are 'raw pivot rows' or not.
* start moving format_rows to POST body, add pivot_results too

  There's still wiring work to do, but this starts to add format_rows and pivot_results to the POST body for the various API endpoints. Also modifies tests to improve coverage/consistency across downloads and alerts/subscriptions. The tests will not pass on this commit, but fixes will be incoming.
* native pivot tables in xlsx
* add precondition to pass migration linter
* try to get migrations fixed
* passing pivot-results through api and attachments
* fix tests for format_rows in BODY vs query param
* tests!
* might have the tests all fixed now
* the pivoted export now respects col/row totals settings
* add test coverage for public questions and dashboards
* col and row totals work as expected
* build-pivot refactor for clarity
* docstring change + tiny refactor in helper fn
* see if dashcard download works with format_rows
* csv pivot handles nil values
* pass format_rows and pivot_results in :params not :body
* fix some other tests
* pivot-grouping col filtered out of xlsx
* pivot-grouping-col removed for all rows
* configurable pivot exports and attachments (#47880)
* exports fe
* specs
* ui
* specs
* format/unformatted now works for xlsx
* format test changes for xlsx formatting
* embedding endpoints accept pivot_results
* cljfmt and eslint fix
* empty
* embedding test should have formatting defaulted to true
* embed test fixes
* Use `Chip` for export settings widget
* downloads e2e test fix
* fix public download limit test
* public card download defaults
* fix public download defaults in some tests
* Fix visual test

---------

Co-authored-by: Aleksandr Lesnenko <alxnddr@users.noreply.github.com>
Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>
Co-authored-by: Anton Kulyk <kuliks.anton@gmail.com>
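A minimal sketch of the incremental aggregation described above, in Clojure with hypothetical names (`add-row`, `add-measures`, and the accumulator keys are illustrative, not the actual Metabase code): each result row is folded into sorted key sets and running totals, so raw rows are never retained.

```clojure
(defn- add-measures
  "Element-wise sum of this row's measure values into an existing totals
  vector, or start a new totals vector on first sight."
  [totals measures]
  (if totals (mapv + totals measures) (vec measures)))

(defn add-row
  "Fold one result row into the pivot accumulator. The accumulator only
  holds unique row/col paths plus running totals, never the raw rows."
  [acc {:keys [row-path col-path measures]}]
  (-> acc
      (update :row-paths (fnil conj (sorted-set)) row-path)
      (update :col-paths (fnil conj (sorted-set)) col-path)
      (update-in [:cells [row-path col-path]] add-measures measures)
      (update-in [:row-totals row-path] add-measures measures)
      (update-in [:col-totals col-path] add-measures measures)
      (update :grand-total add-measures measures)))

;; Streaming usage: reduce over rows one at a time, then render the
;; pivot purely from lookups into the accumulator.
(comment
  (reduce add-row {}
          [{:row-path ["US"] :col-path [2024] :measures [10]}
           {:row-path ["US"] :col-path [2025] :measures [5]}
           {:row-path ["CA"] :col-path [2024] :measures [7]}]))
```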
-
Aleksandr Lesnenko authored
* fix iframe dashcards crashing subscriptions
* add a test to ensure iframes are filtered out of subscriptions

---------

Co-authored-by: Adam James <adam.vermeer2@gmail.com>
Co-authored-by: adam-james <21064735+adam-james-v@users.noreply.github.com>
-
Ngoc Khuat authored
-
- Oct 14, 2024
Ngoc Khuat authored
* [Notification] Migrate user invited email (#48215)
* [Notification] Migrate alert create email (#48292)
* [Notification] Migrate slack token error email (#48333)
-
metamben authored
* Implement inactive field removal
-
Cam Saul authored
* Collapse `metabase.shared.*` namespaces
* Fix Kondo warnings
* Does updating the stories-data keys fix the failing tests?
* Appease msgcat
* Appease msgcat
* Fix typo
* Make the build happy
* Appease fslint
-
John Swanson authored
* Remove `MB_API_KEY` env var

  A bit awkwardly, we never set `:deprecated` on the setting before. We can retroactively deprecate this as of v50. I'm keeping the setting purely to emit the warning message on startup.
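A minimal sketch of that startup warning as a plain helper, not Metabase's actual setting machinery (`warn-if-mb-api-key-set!` is a hypothetical name):

```clojure
(require '[clojure.string :as str])

(defn warn-if-mb-api-key-set!
  "Hypothetical startup check: warn if the deprecated MB_API_KEY env var
  is still set; the value itself no longer does anything."
  []
  (when-not (str/blank? (System/getenv "MB_API_KEY"))
    (println "WARNING: MB_API_KEY was deprecated in v50 and is ignored.")))
```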
-
John Swanson authored
* version and channel query params for version info

  https://github.com/metabase/metabase/issues/48615
* omit blanks from query params for version info
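A minimal sketch of the blank-omitting behavior, with hypothetical names (`version-info-query-params` is illustrative, not the PR's code): blank values are dropped before the params are serialized into the URL.

```clojure
(require '[clojure.string :as str])

(defn version-info-query-params
  "Build query params for the version-info request, dropping any
  blank values so they never appear in the query string."
  [{:keys [version channel]}]
  (into {}
        (remove (fn [[_ v]] (str/blank? v)))
        {:version version :channel channel}))

;; (version-info-query-params {:version "v0.50.1" :channel ""})
;; => {:version "v0.50.1"}
```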
-
Cam Saul authored
* Experimental: try splitting MySQL test jobs into 4 partitions instead of 2
* user-http-request should make sure users are initialized
* Fix MySQL deadlocks in tests
* Bump init timeout to 90 seconds
* Fix metabase.api.session-test/logout-test
-
- Oct 11, 2024
Ngoc Khuat authored
-
- Oct 10, 2024
appleby authored
* Relax the arg types to ExpressionArg for concat expressions in the legacy schema

  Relax the arg types to ExpressionArg for concat, since many DBs allow concatenating non-string types. This also aligns with the corresponding MLv2 schema and with the reference docs we publish. Fixes #39439
* Add nested concat schema tests
* Add nested-concat query-processor tests
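A minimal Malli-style sketch of the widening (schema names here are hypothetical; the real definitions live in the legacy MBQL schema): concat args go from strings-only to any expression argument.

```clojure
(require '[malli.core :as mc])

;; Hypothetical stand-in for the legacy schema's ExpressionArg.
(def ExpressionArg
  [:or :string :int :double :boolean])

;; Before the fix, :concat effectively required [:+ :string]; now any
;; ExpressionArg is accepted, matching MLv2 and the published docs.
(def ConcatClause
  [:cat [:= :concat] [:+ ExpressionArg]])

(mc/validate ConcatClause [:concat "total: " 42]) ;; => true
```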
-
dpsutton authored
* Always start up prometheus metrics

  Previously the metrics only started up when a port was provided:

  ```
  MB_PROMETHEUS_SERVER_PORT=9191 java -jar metabase.jar
  ```

  But these counters are useful to include in anonymous stats, so let's start up the collectors; we can then read their values like:

  ```clojure
  prometheus=> (dotimes [_ 500] (inc :metabase-email/messages))
  nil
  ;; prometheus/value is iapetos.core/value here
  prometheus=> (-> system :registry :metabase-email/messages prometheus/value)
  501.0
  ```

* rename `metabase.analytics.prometheus/inc` to `inc!`

  It side-effects a value and now no longer shadows `clojure.core/inc`, so we're all happy.
-
Ngoc Khuat authored
* [Notification] Notification and subscription (#47707)
* [Notification] Notification and subscription (#47707)
* [Notification] Handlers + recipients (#47759)
* [Notification] Channel template table and model (#47782)
* [Notification] Render system event emails (#47859)
* [Notification] Strict type for channel template and notification recipient (#47910)
* [Notification] Event hydration (#47953)
* [Notification] Send asynchronously (#48200)
-
- Oct 08, 2024
dpsutton authored
https://github.com/metabase/metabase/issues/41919#issuecomment-2400542908

This issue goes away in 5.3.1 of Apache POI. It seems like they have a yearly release cadence, so we can ensure this version exists for the time being and then remove this when we can bump everything to 5.3.1.

```diff
index b763a1ffdf..c87992c935 100644
--- a/deps.edn
+++ b/deps.edn
@@ -122,7 +122,7 @@
  {:mvn/version "2.23.1"}  ; allows the slf4j2 API to work with log4j 2
  org.apache.logging.log4j/log4j-layout-template-json {:mvn/version "2.23.1"} ; allows the custom json logging format
- org.apache.poi/poi {:mvn/version "5.2.5"} ; Work with Office documents (e.g. Excel spreadsheets) -- newer version than one specified by Docjure
+ org.apache.poi/poi {:mvn/version "5.3.1"} ; Work with Office documents (e.g. Excel spreadsheets) -- newer version than one specified by Docjure
  org.apache.poi/poi-ooxml {:mvn/version "5.2.5" :exclusions [org.bouncycastle/bcpkix-jdk15on org.bouncycastle/bcprov-jdk15on]}
```
-
Cam Saul authored
* SQUASH!
* Add another sanity check
* Another test fix attempt
* Appease Kondo
-
Chris Truter authored
Co-authored-by: Nick Fitzpatrick <nickfitz.582@gmail.com>
-
Alexander Polyankin authored
* Fix query_metadata not including native queries
* Fix query_metadata not including native queries
* fix tests
-
Alexander Polyankin authored
-
Nicolò Pretto authored
Co-authored-by: Oisin Coveney <oisin@metabase.com>
Co-authored-by: Mahatthana (Kelvin) Nomsawadi <me@bboykelvin.dev>
Co-authored-by: bryan <bryan.maass@gmail.com>
Co-authored-by: Nicolò Pretto <info@npretto.com>
-
Cam Saul authored
* Parallel driver tests PoC
* Set fail-fast to false for now
* Try splitting up non-driver tests to see how broken tests are
* Whoops, fix plain BE tests
* Ok nvm, I'll test this in another branch
* Fix fail-fast
* Experiment with the improved Hawk split logic
* Fix some broken/flaky tests
* Experiment: try splitting MySQL 8 tests into FOUR jobs
* Divide other Postgres and MySQL tests up and use num-partitions = 2
* Another test fix
* Flaky test fix
* Try making more stuff fast
* Make athena fast??
* Fix a few more things
* Test fixes?
* Fix configs
* Fix Mongo job syntax
* Fix busted test from #46942
* Fix Mongo config again
* wait-for-port needs to specify shell I guess
* More cleanup
* await-port can't have a timeout-minutes I guess
* Let's only parallelize MySQL for now.
* Cleanup action
* Cleanup wait-for-port action
* Fix another flaky test
* NOW driver tests will be FAST
* Need to mark driver tests too
* Fix wrong tag
* Use Hawk 1.0.5
* Fix busted metabase.public-settings-test/landing-page-setting-test
* Fix busted `metabase.api.database-test/get-database-test` etc. (hopefully)
* Fix busted `metabase.sync.sync-metadata.fields-test/sync-fks-and-fields-test` for Oracle
* Maybe this fixed `metabase.query-processor.middleware.permissions-test/e2e-ignore-user-supplied-perms-test`, maybe not
* Fix busted metabase.api.dashboard-test/dependent-metadata-test because the endpoint had a different sort order than the test
* Ok, my test fix did not work
* Fix metabase.sync.sync-metadata.fields-test/sync-fks-and-fields-test for Redshift
* Better test name
* More test fixes
* Schema fix
* PR feedback
* Split off test partitioning into separate PR
* Fix failing Oracle tests
* Another round of test fixes, hopefully
* Fix failing Redshift tests
* Maybe the last round of test fixes
* Fix Oracle
* Fix stray line
-
- Oct 07, 2024
metamben authored
Implement better partitioning and sorting in window functions
-
Case Nelson authored
* feat: move auth providers behind ee token

  Fixes #48235. Introduces a new premium feature, `database-auth-providers`. Moves fetch-auth behind defenterprise; OSS will always return an empty map. Adds metabase.util.http to test outbound http requests.
* Fix broken refs
* Drop defmethod as ad-hoc overrides aren't desirable outside ee
* Drop unnecessary require
* Fix token and tests
* Fix tests
* Fix formatting
* Fix var cast exception
* Fix connection test
* Move test to ee namespace
* Move more tests behind enterprise
* Fix checked-section hiding
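A minimal sketch of the defenterprise split described in the first bullet (the EE namespace below is hypothetical and this is not the PR's exact code): the OSS definition returns an empty map, and an EE implementation registered under the named namespace takes over when a token with the `database-auth-providers` feature is present.

```clojure
;; OSS side: resolves to the EE implementation when the premium feature
;; is available, otherwise falls back to this body.
(defenterprise fetch-auth
  "Fetch auth-provider connection details for a database.
  OSS always returns an empty map."
  metabase-enterprise.database-auth-providers.core ; hypothetical EE ns
  [_auth-provider _database-id _db-details]
  {})
```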
-
- Oct 04, 2024
Ryan Laurie authored
* add update channels in product
* support for changing release notes to show beta and nightly info
* don't export setting
* obey the linter and add tests
* export setting
* update e2e tests
* clojure magic
* clojure-foo
* better localization
* sorry mr linter
* add more tests
-
Ngoc Khuat authored
-
- Oct 03, 2024
lbrdnk authored
* Use context for field id computation while hydrating dashboard
* Update docstring
* Fix target
* Update lookup
* Fix param-target usage
* Unskip e2e
* Bind *param-id-context* in public dashboard computation
* Fix context update
* Fix field-ids->param-field-values-ignoring-current-user
* Add tests
* Comments
* cljfmt
* Update field-ids->param-field-values-ignoring-current-user
* Update src/metabase/models/params.clj
* Update src/metabase/models/params.clj
* Use atom instead of volatile!
* Avoid dynamic function var
* Update src/metabase/models/params.clj
* Address remarks

---------

Co-authored-by: Braden Shepherdson <braden@metabase.com>
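A minimal sketch of the context pattern these commits describe, with hypothetical names (`*param-id-context*` comes from the message; `with-param-id-context` and `cached-field-ids` are illustrative): a dynamic var bound to an atom acts as a per-request cache, so field-id computation is shared across a dashboard's cards instead of recomputed for each one.

```clojure
(def ^:dynamic *param-id-context*
  "When bound to an atom, caches field-id computations for the duration
  of a single dashboard hydration."
  nil)

(defmacro with-param-id-context
  "Run body with a fresh field-id cache."
  [& body]
  `(binding [*param-id-context* (atom {})]
     ~@body))

(defn cached-field-ids
  "Compute field ids via f, memoized in the context when one is bound."
  [cache-key f]
  (if *param-id-context*
    (or (get @*param-id-context* cache-key)
        (let [ids (f)]
          (swap! *param-id-context* assoc cache-key ids)
          ids))
    (f)))
```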
-
Chris Truter authored
-
- Oct 02, 2024
appleby authored
* Add BE test for exporting self-joined renamed columns
* Add e2e test for exporting self-joined renamed columns

See also:
- Not renamed fields in a same table join inherit the renamed name in exports (#48046)
- backport: Support both name & field ref-based column keys in viz settings on read and upgrade on write (#48243)
-
- Oct 01, 2024
Chris Truter authored
-
Chris Truter authored
-
Chris Truter authored
-
- Sep 30, 2024
John Swanson authored
* Do not cache all token check failures

  We want to cache token checks to avoid an issue where we repeatedly ask the store "hey, is this token valid?? is this token valid?? is this token valid??" for the same token. However, transient errors can also occur. For example, maybe a network issue causes the HTTP request to fail entirely. In this case, if we cache the result, the user needs to restart Metabase (or wait 5 minutes until the cache is cleared) before they can attempt to validate their token again.

  This PR moves the cache logic deeper into the stack. We want to cache "successful" responses from the store API - cases where the store has told us categorically that the token is or is not valid. We don't want or need to cache other things that might happen. Maybe your token isn't the right length - we can recalculate that, it's ok. Maybe you get a 503 error from the store - we should let you retry. Maybe your network is having issues and you can't contact the store at all - again, we should let you retry.

  The one potential issue I see here is that if the store goes down, we'll massively increase the number of requests we send to the store, potentially making it harder to recover. If this is a concern, I can add a circuit breaker: if we repeatedly get errors back from the store, back off and stop making requests for a while.

* Add a circuit breaker to store API requests

  In the pathological case where the store goes down for more than 5 minutes, the cache will expire and all instances everywhere will start repeatedly making requests for token validation at once. This might make recovering from an outage more difficult.

  This adds a circuit breaker around the API request. If the call repeatedly throws (5XX errors, socket timeouts, etc.), then we'll pause for 1 minute, during which time all calls to token validation will immediately fail without making any request to the API. After one minute, we'll allow one request through to the API. If it succeeds, we'll go back to normal operation. Otherwise, we'll wait another minute.
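A minimal sketch of that open/half-open cycle, with hypothetical names (not the PR's implementation, which may use a library): after five consecutive failures the breaker opens for one minute, and the first call after the window acts as the trial request.

```clojure
(def ^:private failure-threshold 5)
(def ^:private open-ms 60000)

(def ^:private breaker (atom {:failures 0 :open-until 0}))

(defn call-with-circuit-breaker
  "Invoke f, failing fast while the breaker is open."
  [f]
  (let [now (System/currentTimeMillis)]
    (when (< now (:open-until @breaker))
      (throw (ex-info "Circuit breaker open; skipping store request" {})))
    (try
      (let [result (f)]
        ;; Success closes the breaker and resets the failure count.
        (reset! breaker {:failures 0 :open-until 0})
        result)
      (catch Exception e
        ;; Enough consecutive failures open the breaker for one minute;
        ;; a failed trial call after the window re-opens it immediately.
        (swap! breaker
               (fn [{:keys [failures]}]
                 (let [n (inc failures)]
                   {:failures n
                    :open-until (if (>= n failure-threshold)
                                  (+ now open-ms)
                                  0)})))
        (throw e)))))
```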
-
- Sep 27, 2024
Noah Moss authored
-
Braden Shepherdson authored
This was old logic to support certain drivers (e.g. pre-JDBC Druid) and isn't required for most. It's perfectly sound to filter or even break out on a datetime column without bucketing. Adds a new `:temporal/requires-default-unit` driver feature and enables it only for the legacy Druid driver. Fixes #47341
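A minimal sketch of how such a feature flag is typically declared, assuming Metabase's `driver/database-supports?` multimethod (the exact PR code may differ): the multimethod's default answer is false, and the legacy Druid driver opts in.

```clojure
(require '[metabase.driver :as driver])

;; Only the legacy Druid driver reports that it requires a default
;; temporal bucketing unit; all other drivers fall through to false.
(defmethod driver/database-supports? [:druid :temporal/requires-default-unit]
  [_driver _feature _database]
  true)
```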
-
adam-james authored
Fixes #46575

Creating a Pivot Table question based on a model that has at least one column derived from a join failed to display row totals. This is because the pivot-options map was being mis-calculated: not all column indices were correctly found and passed to the :pivot-rows or :pivot-cols keys, causing the pivot query not to compute all the necessary data.

Here, I just modify the :lib/source key of the columns whose source is a card (as determined by the existence of :lib/card-id). The columns being checked will all have :source/breakout, which caused the "NAME" column to be missed in the issue's repro example. If a column instead has :lib/source :source/card, the logic inside `lib/find-matching-column` works.
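A minimal sketch of that column rewrite (the helper name is hypothetical; the real change lives in the pivot-options code): any column carrying a :lib/card-id gets its :lib/source rewritten so `lib/find-matching-column` can match it.

```clojure
(defn- normalize-card-column
  "If a column originated from a saved card (it has a :lib/card-id),
  mark its :lib/source as :source/card so column matching works."
  [col]
  (cond-> col
    (:lib/card-id col) (assoc :lib/source :source/card)))

;; Applied across the returned columns before computing pivot-options:
;; (map normalize-card-column columns)
```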
-
Chris Truter authored
-
Chris Truter authored
-
- Sep 26, 2024
lbrdnk authored
* Databricks JDBC driver base
* Add databricks CI job
* WIP data loading -- it works, further cleanup needed
* Cleanup
* Implement ->honeysql to enable data loading
* Hardcode catalog job var
* Implement driver methods and update tests
* Derive hive instead of sql-jdbc
* Cleanup leftovers after deriving hive
* Run databricks tests on push
* Cleanup and enable set-timezone
* Disable database creation by tests
* Add Databricks to broken drivers for timezone tests
* Exclude Databricks from test
* Enable have-select-privilege?-test
* Restore sql-jdbc-drivers-using-default-describe-table-or-fields-impl post rebase
* Restore joined-date-filter-test
* Adjust to work with dataset definition tests
* Adjust alternative date tests
* Remove leftover reflection warning set
* Update test exts
* cljfmt vscode
* Add databricks to kondo drivers
* Update metabase-plugin.yaml
* Update databricks_jdbc.clj
* Rework test extensions
* Update general data loading code to work with Databricks
* Reset tests to orig
* Use DateTimeWithLocalTZ for TIMESTAMP database type
* Convert to LocalDateTime for set-parameter
* Update test extensions field-base-type->sql-type
* Update database-type->base-type
* Enable creation of time columns in test data even though not supported
* Fix typo
* Update tests
* Update tests
* Update drivers.yml
* Disable dynamic dataset loading tests
* Adjust the iso-8601-text-fields-should-be-queryable-date-test
* Update load-data/row-xform
* Add time type exception to test
* Update test data loading and enable test
* Whitespace
* Enable all driver jobs
* Update comment
* Make catalog mandatory
* Remove comment
* Remove log level from spec generation
* Update sql.qp/datetime-diff
* Update read-column-thunk
* Remove comment
* Simplify date-time->results-local-date-time
* Update comment
* Move definitions
* Update test extension types mapping
* Remove now obsolete ddl/insert-rows-honeysql-form implementation
* Update sql-jdbc.conn/connection-details->spec for perturb-db-details
* Update load-data/do-insert!
* Remove ssh tunnel from driver as tests do not work with it
* Update test
* Promote ::dynamic-dataset-loading to :test/dynamic-dataset-loading and modify corresponding tests
* Adjust to broken TIMESTAMP_NTZ sync
* Update read-column-thunk to return timestamps always in Z
* Comment
* Disable tests for dynamic datasets
* Return spark jobs into drivers.yml
* Update Databricks CI catalog
* Remove vscode cljfmt tweak
* Update iso-8601-text-fields-expected-rows
* Update datetime-diff
* Formatting
* cljfmt
* Add placeholder test
* Remove comment
* cljfmt
* Use EnableArrow=0 connection param
* Remove comment
* Comment
* Update tests
* cljfmt
* Update driver's deps.edn
* Update tests
* Implement alternative `describe-table`
* WIP workaround for timestamp_ntz sync, will be thrown away probably
* Update metabase-plugin.yaml with schema filters
* Update driver to use schema filters and remove now redundant sync implementations
* Update tests
* Update tests extensions
* Update test
* Add feature flags for fast sync
* Implement describe-fields
* Implement describe-fks-sql
* Enable fast sync features
* Use full_data_type
* Comment
* Add exception for timestamp_ntz columns to new sync code
* Implement db-default-timezone
* Add timestamp_ntz ignored test
* Add db-default-timezone-test
* Fix typo
* Update setReadOnly
* Add comment on setAutoCommit
* Update chunk-size
* Add timezone-in-set-and-read-functions-test
* Drop Athena from driver exceptions
* Use set/intersection instead of a filter
* Add explicit fast-sync tests
* Update describe-fields-sql and add comment
* Add preprocess-additional-options
* Add leading semicolon test
* Disable dataset creation and update comment
* Rename driver to `databricks`
* Use old secret names
* Fix wrongly copied hsql list
* Temporarily allow database creation
* Add *allow-database-deletion*
* Temporarily allow database creation
* Disable database creation
* cljfmt
* cljfmt
-
Chris Truter authored
-