Initial refingerprinting (#13687)
* If the scheduled analyze task is short, refingerprint tables

- make the sync steps return their values rather than logging and returning nil
- add a dynamic var in fingerprint.clj to swap out the query clauses for fields
- update the fingerprint runner so it does not consume the whole list when refingerprinting
- algorithm for choosing fields to refingerprint: shuffle the tables and do up to 1000 fields
- we refingerprint after analyzing in the task if two conditions hold:
  1. the analysis lasted under 5 minutes. We don't want to hog our CPU or connections.
  2. no fields were fingerprinted. The first analysis fingerprints everything and analyzes fields based on that; after that, we seem to almost never fingerprint unless it's a new field.

TODO for the future to make it better:
- manual overrides to prevent this refingerprinting (might be necessary before this goes live). I lean towards actually doing this and making it opt IN so that people can enable it. We verify that it's working frequently enough to be helpful without causing problems. Then in 38 or 39 we flip it and make it opt OUT.
- better strategies for what to refingerprint. Right now it just picks tables at random. We don't have a place to record how frequently tables are used, nor whether the fingerprints are changing substantially (for some notion of substantial). Also, only date and number fingerprints are used by the app at the moment, so we could just bias toward those fields for now.
- Our analysis doesn't override a field if there's already a special_type (or other field properties). We don't capture whether special_type and other aspects of a field were set manually or computed; computed ones would be candidates for updating from ongoing fingerprint results (e.g. state fields based on the percentage of state values). If we start capturing this, and our analysis matures enough to improve insights while knowing it's not clobbering a human override or input, we could just make the initial fingerprint smarter.
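The trigger conditions and the field-selection rule described above can be sketched in Python. This is only an illustration (the real code is Clojure in sync/analyze/fingerprint.clj and task/sync_databases.clj); `Table`, `should_refingerprint`, and `fields_to_refingerprint` are hypothetical names, not the project's API. The 5-minute and 1000-field thresholds come from the commit message; shuffling whole tables (rather than individual fields) follows "picks tables at random".

```python
import random
from dataclasses import dataclass, field
from datetime import timedelta
from itertools import islice
from typing import Iterable, List, Optional, Tuple


# Hypothetical stand-in for a table model; only the name and field names matter here.
@dataclass
class Table:
    name: str
    fields: List[str] = field(default_factory=list)


MAX_ANALYSIS_DURATION = timedelta(minutes=5)  # condition 1: analysis lasted under 5 minutes
MAX_FIELDS_TO_REFINGERPRINT = 1000            # "do up to 1000 fields"


def should_refingerprint(analysis_duration: timedelta, fields_fingerprinted: int) -> bool:
    """Refingerprint only if analysis was short AND no fields were fingerprinted."""
    return analysis_duration < MAX_ANALYSIS_DURATION and fields_fingerprinted == 0


def fields_to_refingerprint(tables: Iterable[Table],
                            limit: int = MAX_FIELDS_TO_REFINGERPRINT,
                            rng: Optional[random.Random] = None) -> List[Tuple[str, str]]:
    """Shuffle the tables, then take (table, field) pairs until `limit` is reached."""
    rng = rng or random.Random()
    shuffled = list(tables)
    rng.shuffle(shuffled)
    # A lazy generator: islice stops pulling once `limit` pairs are taken,
    # so tables past the cutoff are never expanded (no whole-list consumption).
    pairs = ((t.name, f) for t in shuffled for f in t.fields)
    return list(islice(pairs, limit))
```

With ten tables of 300 fields each, `fields_to_refingerprint` returns exactly 1000 pairs: three randomly chosen tables in full plus part of a fourth.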
As it stands, this step runs after the normal fingerprinting so that we don't accidentally do too much work, and because we can't really use the information in the analysis/classify steps yet.

* Docstrings for linter

* Add refingerprint column to Database

It's nullable for now so that people can opt in, and we can migrate to opt-out in the future with the following strategy: if null, set to True, and set the default to True. This lets us respect people who have turned it off, while enabling it for everyone else in a future release once we are sure the performance ramifications are not too severe.

* Add tests for refingerprinting

* Test for refingerprinting being bounded

* Update UI verbiage for refingerprinting
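The opt-in-now, opt-out-later plan for the nullable `refingerprint` column can be sketched over plain Python dicts. This is not the actual migration in resources/migrations/000_migrations.yaml; `is_refingerprint_enabled`, `migrate_to_opt_out`, and `NEW_ROW_DEFAULT` are hypothetical names, and the sketch assumes that while the column is nullable, only an explicit True turns refingerprinting on.

```python
from typing import Dict, List, Optional


def is_refingerprint_enabled(value: Optional[bool]) -> bool:
    """Opt-in phase (assumed): NULL and False both mean 'off'; only True enables it."""
    return value is True


def migrate_to_opt_out(rows: List[Dict]) -> List[Dict]:
    """Future migration: 'if null, set to True'. Explicit True/False choices are kept."""
    return [
        {**row, "refingerprint": True if row["refingerprint"] is None else row["refingerprint"]}
        for row in rows
    ]


# After the migration, "set default to True": newly added databases start opted in.
NEW_ROW_DEFAULT = True
```

Because NULL is distinct from False, the migration can tell "never chose" apart from "turned it off", which is exactly what lets it respect people who disabled refingerprinting.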
- frontend/src/metabase/entities/databases/forms.js: 7 additions, 0 deletions
- resources/migrations/000_migrations.yaml: 19 additions, 0 deletions
- src/metabase/api/database.clj: 3 additions, 1 deletion
- src/metabase/query_processor/middleware/binning.clj: 2 additions, 2 deletions
- src/metabase/sync.clj: 18 additions, 8 deletions
- src/metabase/sync/analyze.clj: 16 additions, 2 deletions
- src/metabase/sync/analyze/fingerprint.clj: 65 additions, 19 deletions
- src/metabase/sync/util.clj: 6 additions, 5 deletions
- src/metabase/task/sync_databases.clj: 24 additions, 1 deletion
- test/metabase/api/table_test.clj: 1 addition, 0 deletions
- test/metabase/sync/analyze/fingerprint_test.clj: 39 additions, 1 deletion
- test/metabase/sync/analyze_test.clj: 11 additions, 0 deletions
- test/metabase/sync_test.clj: 6 additions, 1 deletion
- test/metabase/task/sync_databases_test.clj: 27 additions, 0 deletions