Skip to content
Snippets Groups Projects
  • John Swanson's avatar
    7374e61e
    Improve the Trash data model (#42845) · 7374e61e
    John Swanson authored
    
    * Improve the Trash
    
    Ok, so I had a realization at the PERFECT time, immediately after the RC
    cutoff. Great job, brain!
    
    Here's the realization. For the Trash, we need to keep track of two
    things:
    
    - where the item actually is located in the hierarchy, and
    
    - what collection we should look at to see what permissions apply to the
    item.
    
    For example, a Card might be in the Trash, but we need to look at
    Collection 1234 to see that a user has permission to Write that card.
    
    My implementation of this was to add a column,
    `trashed_from_collection_id`, so that we could move a Card or a
    Dashboard to a new `collection_id`, but keep track of the permissions we
    actually needed to check.
    
    So:
    
    - `collection_id` was where the item was located in the collection
    hierarchy, and
    
    - `trashed_from_collection_id` was where we needed to look to check
    permissions.
    
    Today I had the realization that it's much, much more important to get
    PERMISSIONS right than to get collection hierarchy right. Like if we
    mess up and show something as in the Trash when it's not in the Trash,
    or show something in the wrong Collection - that's not great, sure. But
    if we mess up and show a Card when we shouldn't, or show a Dashboard
    when we shouldn't, that's Super Duper Bad.
    
    So the problem with my initial implementation was that we needed to
    change everywhere that checked permissions, to make sure they checked
    BOTH `trashed_from_collection_id` and `collection_id` as appropriate.
    
    So... there's a much better solution. Instead of adding a column to
    represent the *permissions* that we should apply to the dashboard or
    card, add a column to represent the *location in the hierarchy* that
    should apply to the dashboard or the card.
    
    We can simplify further: the *only time* we want to display something in
    a different place in the hierarchy than usual is when it was put
    directly into the trash. If you trash a dashboard as a part of a
    collection, then we should display it in that collection just like
    normal.
    
    So, we can do the following:
    
    - add a `trashed_directly` column to Cards and Dashboards, representing
    whether they should be displayed in the Trash instead of their actual
    parent collection
    
    - use the `collection_id` column of Cards and Dashboards without
    modification to represent permissions.
    
    There's one main downside of this approach. If you trash a dashboard,
    and then delete the collection that the dashboard was originally in,
    what do we do with `dashboard.collection_id`?
    
    - we have to change it, because it's a foreign key
    
    - we can't set it to null, because that represents the root collection
    
    In this initial implementation, I've just cascaded the delete: if you
    delete a dashboard and then delete a collection, the dashboard will be
    deleted. This is not ideal. I'm not totally sure what we should do in
    this situation.
    
    * Rip out all the `trashed_from_collection_id`
    
    * Migration to delete trashed_from_collection_id
    
    * fixes
    
    * don't move collections
    
    And don't allow deleting collections
    
    * only show cards/dashboards with write perms
    
    * Show the correct archived/unarchived branch
    
    * some cleanup
    
    * add a todo for tomorrow
    
    * Fix for yesterday's TODO
    
    * more wip
    
    * refactor
    
    * memoize collection info
    
    * move around memoization a bit
    
    * fix schema migration test
    
    * oops, delete server.middleware.memo
    
    * Use a migration script (postgres only for now)
    
    * Fix some tests
    
    * remove n+1 queries in `collection-can-restore`
    
    * fix test
    
    * fix more tests, and x-db migration script
    
    * fix h2 rollback
    
    * fix mysql/mariadb migration
    
    * lint
    
    * fix some mariadb/mysql tests
    
    * fix h2 rollback
    
    * Fix mysql rollback
    
    * Fix Postgres migration
    
    * "Real" `trash_operation_id` UUIDs from migration
    
    * fix mariadb migration
    
    * Separate MySQL/MariaDB migrations
    
    * trashed directly bit->boolean
    
    * Remove `trashed_from_*` from migrations
    
    * Rename `api/updates-with-trashed-directly`
    
    Previously named `move-on-archive-or-unarchive`, which was no longer
    accurate since we're not moving anything!
    
    * Add `can_delete`
    
    * Delete test of deleted code
    
    * Can't move anything to the Trash
    
    The Trash exists as a real collection so that we can list items in and
    such without needing special cases. But we don't want anything to
    _actually_ get moved to it.
    
    * integrates can_delete flag on FE
    
    * Update src/metabase/models/collection.clj
    
    Co-authored-by: default avatarNoah Moss <32746338+noahmoss@users.noreply.github.com>
    
    * Update src/metabase/search/impl.clj
    
    Co-authored-by: default avatarNoah Moss <32746338+noahmoss@users.noreply.github.com>
    
    * Better name for `fix-collection-id`
    
    Also consolidated it into a single function. It takes the
    trash-collection-id as an argument to prevent a circular dependency.
    
    * s/trashed/archived/ on the backend
    
    In the product, we want to move to "trashed" as the descriptor for
    things that are currently-known-as-archived.
    
    It'd be nice to switch to that on the backend as well, but that'll
    require quite a lot of churn on the frontend as we'd need to change the
    API (or have an awkward translation layer, where we call it `trashed` on
    the backend, turn it into `archived` when sending data out via the API,
    and then the FE turns around and called it "trashed" again).
    
    For now, let's just be consistent on the backend and call it `archived`
    everywhere. So:
    
    - `archived_directly` instead of `trashed_directly`, and
    - `archive_operation_id` instead of `trash_operation_id`
    
    * Fix up a couple docstrings
    
    * Rename `visible-collection-ids` args
    
    for `collection/permissions-set->visible-collection-ids`:
    
    - `:include-archived` => `:include-archived-items`
    
    - `:include-trash?` => `:include-trash-collection`
    
    * select affected-collection-ids in the tx
    
    * Stop dealing with `:all` visible collection ids
    
    This used to be a possible return value for
    `permissions-set->visible-collection-ids` indicating that the user could
    view all collections. Now that function always returns a set of
    collection IDs (possibly including "root" for the root collection) so we
    don't need to deal with that anymore.
    
    * Don't use separate hydration keys for collections
    
    Dispatch on the model of each item (but still keep them in batches for
    efficiency.)
    
    * vars for collectable/archived-directly models
    
    * Round up loose `trashed_directly`s
    
    * Fix a test
    
    * Use `clojure.core.memoize` instead of handrolled
    
    It's slightly different (TTL instead of exactly per-request) but that
    should be fine.
    
    * FE lint fixes
    
    * :sob:
    
     e2e test fix
    
    ---------
    
    Co-authored-by: default avatarSloan Sparger <sloansparger@gmail.com>
    Co-authored-by: default avatarNoah Moss <32746338+noahmoss@users.noreply.github.com>
    Improve the Trash data model (#42845)
    John Swanson authored
    
    * Improve the Trash
    
    Ok, so I had a realization at the PERFECT time, immediately after the RC
    cutoff. Great job, brain!
    
    Here's the realization. For the Trash, we need to keep track of two
    things:
    
    - where the item actually is located in the hierarchy, and
    
    - what collection we should look at to see what permissions apply to the
    item.
    
    For example, a Card might be in the Trash, but we need to look at
    Collection 1234 to see that a user has permission to Write that card.
    
    My implementation of this was to add a column,
    `trashed_from_collection_id`, so that we could move a Card or a
    Dashboard to a new `collection_id`, but keep track of the permissions we
    actually needed to check.
    
    So:
    
    - `collection_id` was where the item was located in the collection
    hierarchy, and
    
    - `trashed_from_collection_id` was where we needed to look to check
    permissions.
    
    Today I had the realization that it's much, much more important to get
    PERMISSIONS right than to get collection hierarchy right. Like if we
    mess up and show something as in the Trash when it's not in the Trash,
    or show something in the wrong Collection - that's not great, sure. But
    if we mess up and show a Card when we shouldn't, or show a Dashboard
    when we shouldn't, that's Super Duper Bad.
    
    So the problem with my initial implementation was that we needed to
    change everywhere that checked permissions, to make sure they checked
    BOTH `trashed_from_collection_id` and `collection_id` as appropriate.
    
    So... there's a much better solution. Instead of adding a column to
    represent the *permissions* that we should apply to the dashboard or
    card, add a column to represent the *location in the hierarchy* that
    should apply to the dashboard or the card.
    
    We can simplify further: the *only time* we want to display something in
    a different place in the hierarchy than usual is when it was put
    directly into the trash. If you trash a dashboard as a part of a
    collection, then we should display it in that collection just like
    normal.
    
    So, we can do the following:
    
    - add a `trashed_directly` column to Cards and Dashboards, representing
    whether they should be displayed in the Trash instead of their actual
    parent collection
    
    - use the `collection_id` column of Cards and Dashboards without
    modification to represent permissions.
    
    There's one main downside of this approach. If you trash a dashboard,
    and then delete the collection that the dashboard was originally in,
    what do we do with `dashboard.collection_id`?
    
    - we have to change it, because it's a foreign key
    
    - we can't set it to null, because that represents the root collection
    
    In this initial implementation, I've just cascaded the delete: if you
    delete a dashboard and then delete a collection, the dashboard will be
    deleted. This is not ideal. I'm not totally sure what we should do in
    this situation.
    
    * Rip out all the `trashed_from_collection_id`
    
    * Migration to delete trashed_from_collection_id
    
    * fixes
    
    * don't move collections
    
    And don't allow deleting collections
    
    * only show cards/dashboards with write perms
    
    * Show the correct archived/unarchived branch
    
    * some cleanup
    
    * add a todo for tomorrow
    
    * Fix for yesterday's TODO
    
    * more wip
    
    * refactor
    
    * memoize collection info
    
    * move around memoization a bit
    
    * fix schema migration test
    
    * oops, delete server.middleware.memo
    
    * Use a migration script (postgres only for now)
    
    * Fix some tests
    
    * remove n+1 queries in `collection-can-restore`
    
    * fix test
    
    * fix more tests, and x-db migration script
    
    * fix h2 rollback
    
    * fix mysql/mariadb migration
    
    * lint
    
    * fix some mariadb/mysql tests
    
    * fix h2 rollback
    
    * Fix mysql rollback
    
    * Fix Postgres migration
    
    * "Real" `trash_operation_id` UUIDs from migration
    
    * fix mariadb migration
    
    * Separate MySQL/MariaDB migrations
    
    * trashed directly bit->boolean
    
    * Remove `trashed_from_*` from migrations
    
    * Rename `api/updates-with-trashed-directly`
    
    Previously named `move-on-archive-or-unarchive`, which was no longer
    accurate since we're not moving anything!
    
    * Add `can_delete`
    
    * Delete test of deleted code
    
    * Can't move anything to the Trash
    
    The Trash exists as a real collection so that we can list items in and
    such without needing special cases. But we don't want anything to
    _actually_ get moved to it.
    
    * integrates can_delete flag on FE
    
    * Update src/metabase/models/collection.clj
    
    Co-authored-by: default avatarNoah Moss <32746338+noahmoss@users.noreply.github.com>
    
    * Update src/metabase/search/impl.clj
    
    Co-authored-by: default avatarNoah Moss <32746338+noahmoss@users.noreply.github.com>
    
    * Better name for `fix-collection-id`
    
    Also consolidated it into a single function. It takes the
    trash-collection-id as an argument to prevent a circular dependency.
    
    * s/trashed/archived/ on the backend
    
    In the product, we want to move to "trashed" as the descriptor for
    things that are currently-known-as-archived.
    
    It'd be nice to switch to that on the backend as well, but that'll
    require quite a lot of churn on the frontend as we'd need to change the
    API (or have an awkward translation layer, where we call it `trashed` on
    the backend, turn it into `archived` when sending data out via the API,
    and then the FE turns around and called it "trashed" again).
    
    For now, let's just be consistent on the backend and call it `archived`
    everywhere. So:
    
    - `archived_directly` instead of `trashed_directly`, and
    - `archive_operation_id` instead of `trash_operation_id`
    
    * Fix up a couple docstrings
    
    * Rename `visible-collection-ids` args
    
    for `collection/permissions-set->visible-collection-ids`:
    
    - `:include-archived` => `:include-archived-items`
    
    - `:include-trash?` => `:include-trash-collection`
    
    * select affected-collection-ids in the tx
    
    * Stop dealing with `:all` visible collection ids
    
    This used to be a possible return value for
    `permissions-set->visible-collection-ids` indicating that the user could
    view all collections. Now that function always returns a set of
    collection IDs (possibly including "root" for the root collection) so we
    don't need to deal with that anymore.
    
    * Don't use separate hydration keys for collections
    
    Dispatch on the model of each item (but still keep them in batches for
    efficiency.)
    
    * vars for collectable/archived-directly models
    
    * Round up loose `trashed_directly`s
    
    * Fix a test
    
    * Use `clojure.core.memoize` instead of handrolled
    
    It's slightly different (TTL instead of exactly per-request) but that
    should be fine.
    
    * FE lint fixes
    
    * :sob:
    
     e2e test fix
    
    ---------
    
    Co-authored-by: default avatarSloan Sparger <sloansparger@gmail.com>
    Co-authored-by: default avatarNoah Moss <32746338+noahmoss@users.noreply.github.com>
Code owners
Assign users and groups as approvers for specific file changes. Learn more.