Unverified Commit ce5274a5 authored 5 months ago by github-automation-metabase Committed by GitHub 5 months ago
Incremental Pivot Processing for Exports (#46995) (#48740)


* Incremental Pivot Processing for Exports

WIP

Fixes pivot exports for CSV and xlsx.

The CSV export should use less memory by incrementally building up the data structure and aggregating necessary row
data right away, so the memory overhead becomes only as large as the total pivot result.

In cases where the pivot rows/cols combine into a very large number of columns and rows, this can still be a large set
of data, but it should behave much better now in most cases.

The Excel export is a little more straightforward: create the export rows in the same fashion, streaming one row at a
time, and just post-process the sheet to add the pivot table in one shot at the end.

* WIP adding row totals.

* aggregate totals as rows are added

Row, column, section, and grand totals are all aggregated as each row is added.
This means the final step of building pivot output becomes just an exercise of lookups/arrangement, no further
aggregation is needed.

* CSV pivot works per-row, export respects formatting

This is a big step forward: we no longer need to hold the entire dataset in memory. Instead, we aggregate each row's
data into the pivot data structure, which only holds onto:

- unique values for each pivot-row in a sorted set
- unique values for each pivot-col in a sorted set
- a grand total for each measure: N values, where N is the number of measures, usually 1 or 2
- row totals for each combination of each pivot-row * N measures
- col totals for each combination of each pivot-col * N measures
- totals for each 'section', determined by unique values of first pivot-row * N measures
- values for each measure in every 'cell'; Row Combos * Col Combos * N Measures

So, there can still be a decent amount of data to store; but it will never hold onto all of the 'raw rows' from the
dataset.
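
As a rough illustration of the structure listed above, here is a minimal Python sketch (not the actual Clojure
implementation). The class name `PivotAccumulator`, its method names, and the sample column names are hypothetical.
It assumes every measure is aggregated by summing, that there is at least one pivot-row, and it keeps unique
row/col combinations in plain sets that are sorted only when the output is built, rather than one sorted set per
pivot-row/pivot-col.

```python
from collections import defaultdict


class PivotAccumulator:
    """Hypothetical sketch: aggregates one raw row at a time, never storing raw rows."""

    def __init__(self, row_keys, col_keys, measures):
        self.row_keys = list(row_keys)
        self.col_keys = list(col_keys)
        self.measures = list(measures)
        self.row_combos = set()                  # unique pivot-row value combinations
        self.col_combos = set()                  # unique pivot-col value combinations
        self.cells = defaultdict(int)            # (row combo, col combo, measure) -> value
        self.row_totals = defaultdict(int)       # (row combo, measure) -> total
        self.col_totals = defaultdict(int)       # (col combo, measure) -> total
        self.section_totals = defaultdict(int)   # (first pivot-row value, measure) -> total
        self.grand_totals = defaultdict(int)     # measure -> total

    def add_row(self, row):
        """Fold a single raw row (a dict) into every aggregate, then forget it."""
        r = tuple(row[k] for k in self.row_keys)
        c = tuple(row[k] for k in self.col_keys)
        self.row_combos.add(r)
        self.col_combos.add(c)
        for m in self.measures:
            v = row.get(m) or 0  # assumption: a missing/None measure counts as 0
            self.cells[(r, c, m)] += v
            self.row_totals[(r, m)] += v
            self.col_totals[(c, m)] += v
            self.section_totals[(r[0], m)] += v
            self.grand_totals[m] += v

    def build(self):
        """Arrange the output using lookups only; no further aggregation happens here."""
        rows_sorted = sorted(self.row_combos)
        cols_sorted = sorted(self.col_combos)
        header = list(self.row_keys)
        header += [(c, m) for c in cols_sorted for m in self.measures]
        header += [("Row totals", m) for m in self.measures]
        out = [header]
        for r in rows_sorted:
            line = list(r)
            # .get() leaves empty cells as None instead of creating zero entries
            line += [self.cells.get((r, c, m)) for c in cols_sorted for m in self.measures]
            line += [self.row_totals[(r, m)] for m in self.measures]
            out.append(line)
        footer = ["Grand totals"] + [None] * (len(self.row_keys) - 1)
        footer += [self.col_totals[(c, m)] for c in cols_sorted for m in self.measures]
        footer += [self.grand_totals[m] for m in self.measures]
        out.append(footer)
        return out


# Made-up sample data: one pivot-row, one pivot-col, one measure.
acc = PivotAccumulator(["category"], ["year"], ["total"])
for raw in [
    {"category": "A", "year": 2023, "total": 10},
    {"category": "A", "year": 2024, "total": 5},
    {"category": "B", "year": 2023, "total": 7},
    {"category": "A", "year": 2023, "total": 1},
]:
    acc.add_row(raw)
table = acc.build()
```

In this sketch the retained state grows with the number of distinct cells, not with the number of raw rows, which
is the property described above.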

We can never completely guarantee that Row Combos * Col Combos * N Measures remains small, but two things let us move
forward anyway:

- there's now visible feedback in the app that the download is running (or if it's failed)
- Pivot table utility diminishes rapidly with huge output anyway; users still need to curate/set up their data
  effectively to improve the table's utility, so we can assume that a slow-to-download pivot table is also slow to
  use/less effective, and will likely be something the user doesn't want (as often).

* some test fixes

* now, if we export 'raw pivot rows', they don't show pivot-grouping

and they also don't include the 'extra' rows for totals/subtotals/grand totals (any row with pivot-grouping > 0).

This means that now the non-pivot version of a pivot table export will match what a user sees if they change the viz
to a regular table.
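
A minimal Python sketch of that filtering, for illustration only (the real code is Clojure and may differ). The
function name `strip_pivot_grouping` and the sample data are hypothetical; the sketch assumes rows are lists aligned
with a list of column names, and that a pivot-grouping value of 0 marks an ordinary row.

```python
def strip_pivot_grouping(columns, rows, grouping_col="pivot-grouping"):
    """Drop totals/subtotals rows (pivot-grouping > 0) and the pivot-grouping column itself."""
    if grouping_col not in columns:
        # Not a pivot result: pass everything through unchanged.
        return list(columns), [list(row) for row in rows]
    idx = columns.index(grouping_col)
    kept_columns = columns[:idx] + columns[idx + 1:]
    kept_rows = [
        row[:idx] + row[idx + 1:]
        for row in rows
        if row[idx] == 0  # rows with pivot-grouping > 0 are the 'extra' totals rows
    ]
    return kept_columns, kept_rows


# Made-up sample data: the last two rows stand in for a subtotal and a grand total.
columns = ["category", "year", "pivot-grouping", "total"]
rows = [
    ["A", 2023, 0, 11],
    ["B", 2023, 0, 7],
    ["A", None, 2, 11],
    [None, None, 3, 18],
]
export_columns, export_rows = strip_pivot_grouping(columns, rows)
```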

* remove old test

* re-incorporate some changes from master

* fix csv for non-pivots due to oversight in my changes

This is just a temporary change. I think I should clean up this bit of the code a little; I can probably make it
more readable and use cleaner logic for deciding whether the rows are 'raw pivot rows' or not.

* start moving format_rows to POST body, add pivot_results too

There's still wiring work to do, but this starts to add format_rows and pivot_results to the POST body for the
various API endpoints. It also modifies tests to improve coverage/consistency across downloads and
alerts/subscriptions.

The tests will not pass on this commit, but fixes will be incoming.

* native pivot tables in xlsx

* add precondition to pass migration linter

* try to get migrations fixed

* passing pivot-results through api and attachments

* fix tests for format_rows in BODY vs query param

* tests!

* might have the tests all fixed now

* the pivoted export now respects col/row totals settings

* add test coverage for public questions and dashboards

* col and row totals work as expected

* build-pivot refactor for clarity

* docstring change + tiny refactor in helper fn

* see if dashcard download works with format_rows

* csv pivot handles nil values

* pass format_rows and pivot_results in :params not :body

* fix some other tests

* pivot-grouping col filtered out of xlsx

* pivot-grouping-col removed for all rows

* configurable pivot exports and attachments (#47880)

* exports fe

* specs

* ui

* specs

* format/unformatted now works for xlsx

* format test changes for xlsx formatting

* embedding endpoints accept pivot_results

* cljfmt and eslint fix

* empty

* embedding test should have formatting defaulted to true

* embed test fixes

* Use `Chip` for export settings widget

* downloads e2e test fix

* fix public download limit test

* public card download defaults

* fix public download defaults in some tests

* Fix visual test

---------

Co-authored-by: adam-james <21064735+adam-james-v@users.noreply.github.com>
Co-authored-by: Aleksandr Lesnenko <alxnddr@users.noreply.github.com>
Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>
Co-authored-by: Anton Kulyk <kuliks.anton@gmail.com>
parent 29b8b466
Showing with 288 additions and 180 deletions