resources/sample-content.edn · e9bffd66f136cb2b98a39268d0ca056b1e6c650d · Engineering Digital Service / Metabase

9 months ago

Improve the Trash data model (#42845) · 7374e61e

John Swanson authored 9 months ago


* Improve the Trash

Ok, so I had a realization at the PERFECT time, immediately after the RC
cutoff. Great job, brain!

Here's the realization. For the Trash, we need to keep track of two
things:

- where the item actually is located in the hierarchy, and

- what collection we should look at to see what permissions apply to the
item.

For example, a Card might be in the Trash, but we need to look at
Collection 1234 to see that a user has permission to Write that card.

My implementation of this was to add a column,
`trashed_from_collection_id`, so that we could move a Card or a
Dashboard to a new `collection_id`, but keep track of the permissions we
actually needed to check.

So:

- `collection_id` was where the item was located in the collection
hierarchy, and

- `trashed_from_collection_id` was where we needed to look to check
permissions.

Today I had the realization that it's much, much more important to get
PERMISSIONS right than to get collection hierarchy right. Like if we
mess up and show something as in the Trash when it's not in the Trash,
or show something in the wrong Collection - that's not great, sure. But
if we mess up and show a Card when we shouldn't, or show a Dashboard
when we shouldn't, that's Super Duper Bad.

So the problem with my initial implementation was that we needed to
change everywhere that checked permissions, to make sure they checked
BOTH `trashed_from_collection_id` and `collection_id` as appropriate.

So... there's a much better solution. Instead of adding a column to
represent the *permissions* that we should apply to the dashboard or
card, add a column to represent the *location in the hierarchy* that
should apply to the dashboard or the card.

We can simplify further: the *only time* we want to display something in
a different place in the hierarchy than usual is when it was put
directly into the trash. If you trash a dashboard as a part of a
collection, then we should display it in that collection just like
normal.

So, we can do the following:

- add a `trashed_directly` column to Cards and Dashboards, representing
whether they should be displayed in the Trash instead of their actual
parent collection

- use the `collection_id` column of Cards and Dashboards without
modification to represent permissions.

There's one main downside of this approach. If you trash a dashboard,
and then delete the collection that the dashboard was originally in,
what do we do with `dashboard.collection_id`?

- we have to change it, because it's a foreign key

- we can't set it to null, because that represents the root collection

In this initial implementation, I've just cascaded the delete: if you
delete a dashboard and then delete a collection, the dashboard will be
deleted. This is not ideal. I'm not totally sure what we should do in
this situation.

* Rip out all the `trashed_from_collection_id`

* Migration to delete trashed_from_collection_id

* fixes

* don't move collections

And don't allow deleting collections

* only show cards/dashboards with write perms

* Show the correct archived/unarchived branch

* some cleanup

* add a todo for tomorrow

* Fix for yesterday's TODO

* more wip

* refactor

* memoize collection info

* move around memoization a bit

* fix schema migration test

* oops, delete server.middleware.memo

* Use a migration script (postgres only for now)

* Fix some tests

* remove n+1 queries in `collection-can-restore`

* fix test

* fix more tests, and x-db migration script

* fix h2 rollback

* fix mysql/mariadb migration

* lint

* fix some mariadb/mysql tests

* fix h2 rollback

* Fix mysql rollback

* Fix Postgres migration

* "Real" `trash_operation_id` UUIDs from migration

* fix mariadb migration

* Separate MySQL/MariaDB migrations

* trashed directly bit->boolean

* Remove `trashed_from_*` from migrations

* Rename `api/updates-with-trashed-directly`

Previously named `move-on-archive-or-unarchive`, which was no longer
accurate since we're not moving anything!

* Add `can_delete`

* Delete test of deleted code

* Can't move anything to the Trash

The Trash exists as a real collection so that we can list items in and
such without needing special cases. But we don't want anything to
_actually_ get moved to it.

* integrates can_delete flag on FE

* Update src/metabase/models/collection.clj

Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>

* Update src/metabase/search/impl.clj

Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>

* Better name for `fix-collection-id`

Also consolidated it into a single function. It takes the
trash-collection-id as an argument to prevent a circular dependency.

* s/trashed/archived/ on the backend

In the product, we want to move to "trashed" as the descriptor for
things that are currently-known-as-archived.

It'd be nice to switch to that on the backend as well, but that'll
require quite a lot of churn on the frontend as we'd need to change the
API (or have an awkward translation layer, where we call it `trashed` on
the backend, turn it into `archived` when sending data out via the API,
and then the FE turns around and called it "trashed" again).

For now, let's just be consistent on the backend and call it `archived`
everywhere. So:

- `archived_directly` instead of `trashed_directly`, and
- `archive_operation_id` instead of `trash_operation_id`

* Fix up a couple docstrings

* Rename `visible-collection-ids` args

for `collection/permissions-set->visible-collection-ids`:

- `:include-archived` => `:include-archived-items`

- `:include-trash?` => `:include-trash-collection`

* select affected-collection-ids in the tx

* Stop dealing with `:all` visible collection ids

This used to be a possible return value for
`permissions-set->visible-collection-ids` indicating that the user could
view all collections. Now that function always returns a set of
collection IDs (possibly including "root" for the root collection) so we
don't need to deal with that anymore.

* Don't use separate hydration keys for collections

Dispatch on the model of each item (but still keep them in batches for
efficiency.)

* vars for collectable/archived-directly models

* Round up loose `trashed_directly`s

* Fix a test

* Use `clojure.core.memoize` instead of handrolled

It's slightly different (TTL instead of exactly per-request) but that
should be fine.

* FE lint fixes

* 

 e2e test fix

---------

Co-authored-by: Sloan Sparger <sloansparger@gmail.com>
Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>

Unverified

7374e61e

History

Improve the Trash data model (#42845)

John Swanson authored 9 months ago


* Improve the Trash

Ok, so I had a realization at the PERFECT time, immediately after the RC
cutoff. Great job, brain!

Here's the realization. For the Trash, we need to keep track of two
things:

- where the item actually is located in the hierarchy, and

- what collection we should look at to see what permissions apply to the
item.

For example, a Card might be in the Trash, but we need to look at
Collection 1234 to see that a user has permission to Write that card.

My implementation of this was to add a column,
`trashed_from_collection_id`, so that we could move a Card or a
Dashboard to a new `collection_id`, but keep track of the permissions we
actually needed to check.

So:

- `collection_id` was where the item was located in the collection
hierarchy, and

- `trashed_from_collection_id` was where we needed to look to check
permissions.

Today I had the realization that it's much, much more important to get
PERMISSIONS right than to get collection hierarchy right. Like if we
mess up and show something as in the Trash when it's not in the Trash,
or show something in the wrong Collection - that's not great, sure. But
if we mess up and show a Card when we shouldn't, or show a Dashboard
when we shouldn't, that's Super Duper Bad.

So the problem with my initial implementation was that we needed to
change everywhere that checked permissions, to make sure they checked
BOTH `trashed_from_collection_id` and `collection_id` as appropriate.

So... there's a much better solution. Instead of adding a column to
represent the *permissions* that we should apply to the dashboard or
card, add a column to represent the *location in the hierarchy* that
should apply to the dashboard or the card.

We can simplify further: the *only time* we want to display something in
a different place in the hierarchy than usual is when it was put
directly into the trash. If you trash a dashboard as a part of a
collection, then we should display it in that collection just like
normal.

So, we can do the following:

- add a `trashed_directly` column to Cards and Dashboards, representing
whether they should be displayed in the Trash instead of their actual
parent collection

- use the `collection_id` column of Cards and Dashboards without
modification to represent permissions.

There's one main downside of this approach. If you trash a dashboard,
and then delete the collection that the dashboard was originally in,
what do we do with `dashboard.collection_id`?

- we have to change it, because it's a foreign key

- we can't set it to null, because that represents the root collection

In this initial implementation, I've just cascaded the delete: if you
delete a dashboard and then delete a collection, the dashboard will be
deleted. This is not ideal. I'm not totally sure what we should do in
this situation.

* Rip out all the `trashed_from_collection_id`

* Migration to delete trashed_from_collection_id

* fixes

* don't move collections

And don't allow deleting collections

* only show cards/dashboards with write perms

* Show the correct archived/unarchived branch

* some cleanup

* add a todo for tomorrow

* Fix for yesterday's TODO

* more wip

* refactor

* memoize collection info

* move around memoization a bit

* fix schema migration test

* oops, delete server.middleware.memo

* Use a migration script (postgres only for now)

* Fix some tests

* remove n+1 queries in `collection-can-restore`

* fix test

* fix more tests, and x-db migration script

* fix h2 rollback

* fix mysql/mariadb migration

* lint

* fix some mariadb/mysql tests

* fix h2 rollback

* Fix mysql rollback

* Fix Postgres migration

* "Real" `trash_operation_id` UUIDs from migration

* fix mariadb migration

* Separate MySQL/MariaDB migrations

* trashed directly bit->boolean

* Remove `trashed_from_*` from migrations

* Rename `api/updates-with-trashed-directly`

Previously named `move-on-archive-or-unarchive`, which was no longer
accurate since we're not moving anything!

* Add `can_delete`

* Delete test of deleted code

* Can't move anything to the Trash

The Trash exists as a real collection so that we can list items in and
such without needing special cases. But we don't want anything to
_actually_ get moved to it.

* integrates can_delete flag on FE

* Update src/metabase/models/collection.clj

Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>

* Update src/metabase/search/impl.clj

Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>

* Better name for `fix-collection-id`

Also consolidated it into a single function. It takes the
trash-collection-id as an argument to prevent a circular dependency.

* s/trashed/archived/ on the backend

In the product, we want to move to "trashed" as the descriptor for
things that are currently-known-as-archived.

It'd be nice to switch to that on the backend as well, but that'll
require quite a lot of churn on the frontend as we'd need to change the
API (or have an awkward translation layer, where we call it `trashed` on
the backend, turn it into `archived` when sending data out via the API,
and then the FE turns around and called it "trashed" again).

For now, let's just be consistent on the backend and call it `archived`
everywhere. So:

- `archived_directly` instead of `trashed_directly`, and
- `archive_operation_id` instead of `trash_operation_id`

* Fix up a couple docstrings

* Rename `visible-collection-ids` args

for `collection/permissions-set->visible-collection-ids`:

- `:include-archived` => `:include-archived-items`

- `:include-trash?` => `:include-trash-collection`

* select affected-collection-ids in the tx

* Stop dealing with `:all` visible collection ids

This used to be a possible return value for
`permissions-set->visible-collection-ids` indicating that the user could
view all collections. Now that function always returns a set of
collection IDs (possibly including "root" for the root collection) so we
don't need to deal with that anymore.

* Don't use separate hydration keys for collections

Dispatch on the model of each item (but still keep them in batches for
efficiency.)

* vars for collectable/archived-directly models

* Round up loose `trashed_directly`s

* Fix a test

* Use `clojure.core.memoize` instead of handrolled

It's slightly different (TTL instead of exactly per-request) but that
should be fine.

* FE lint fixes

* 

 e2e test fix

---------

Co-authored-by: Sloan Sparger <sloansparger@gmail.com>
Co-authored-by: Noah Moss <32746338+noahmoss@users.noreply.github.com>

Code owners

Assign users and groups as approvers for specific file changes. Learn more.