Skip to content
Snippets Groups Projects
Commit 722649aa authored by Cam Saul's avatar Cam Saul
Browse files

Metadata for the Sample Dataset.

New functionality to sync data included in a _metabase_metadata table.
parent 14123646
No related branches found
No related tags found
No related merge requests found
No preview for this file type
(ns metabase.sample-dataset.generate
(:require [clojure.math.numeric-tower :as math]
(:require [clojure.java.io :as io]
[clojure.math.numeric-tower :as math]
[clojure.string :as s]
(faker [address :as address]
[company :as company]
......@@ -73,7 +74,7 @@
(+ min))))
(def ^:private ^:const product-names
{:adjective '[Small, Ergonomic, Rustic, Intelligent, Gorgeous, Incredible, Fantastic, Practical, Sleek, Awesome, Enormous, Mediocre, Synergistic, Heavy Duty, Lightweight, Aerodynamic, Durable]
{:adjective '[Small, Ergonomic, Rustic, Intelligent, Gorgeous, Incredible, Fantastic, Practical, Sleek, Awesome, Enormous, Mediocre, Synergistic, Heavy-Duty, Lightweight, Aerodynamic, Durable]
:material '[Steel, Wooden, Concrete, Plastic, Cotton, Granite, Rubber, Leather, Silk, Wool, Linen, Marble, Iron, Bronze, Copper, Aluminum, Paper]
:product '[Chair, Car, Computer, Gloves, Pants, Shirt, Table, Shoes, Hat, Plate, Knife, Bottle, Coat, Lamp, Keyboard, Bag, Bench, Clock, Watch, Wallet]})
......@@ -323,13 +324,56 @@
:field "PRODUCT_ID"
:dest-table "PRODUCTS"}])
(def ^:private ^:const annotations
{:orders {:description "This is a confirmed order for a product from a user."
:columns {:created_at {:description "The date and time an order was submitted."}
:id {:description "This is a unique ID for the product. It is also called the “Invoice number” or “Confirmation number” in customer facing emails and screens."}
:product_id {:description "The product ID. This is an internal identifier for the product, NOT the SKU."}
:subtotal {:description "The raw, pre-tax cost of the order. Note that this might be different in the future from the product price due to promotions, credits, etc."}
:tax {:description "This is the amount of local and federal taxes that are collected on the purchase. Note that other governmental fees on some products are not included here, but instead are accounted for in the subtotal."}
:total {:description "The total billed amount."}
:user_id {:description "The id of the user who made this order. Note that in some cases where an order was created on behalf of a customer who phoned the order in, this might be the employee who handled the request."}}}
:people {:description "This is a user account. Note that employees and customer support staff will have accounts."
:columns {:address {:description "The street address of the account’s billing address"}
:birth_date {:description "The date of birth of the user"}
:city {:description "The city of the account’s billing address"}
:created_at {:description "The date the user record was created. Also referred to as the user’s \"join date\""}
:email {:description "The contact email for the account."}
:id {:description "A unique identifier given to each user."}
:latitude {:description "This is the latitude of the user on sign-up. It might be updated in the future to the last seen location."}
:longitude {:description "This is the longitude of the user on sign-up. It might be updated in the future to the last seen location."}
:name {:description "The name of the user who owns an account"}
:password {:description "This is the salted password of the user. It should not be visible"
:field_type :sensitive}
:source {:description "The channel through which we acquired this user. Valid values include: Affiliate, Facebook, Google, Organic and Twitter"}
:state {:description "The state or province of the account’s billing address"}
:zip {:description "The postal code of the account’s billing address"
:special_type :zip_code}}}
:products {:description "This is our product catalog. It includes all products ever sold by the Sample Company."
:columns {:category {:description "The type of product, valid values include: Doohicky, Gadget, Gizmo and Widget"}
:created_at {:description "The date the product was added to our catalog."}
:ean {:description "The international article number. A 13 digit number uniquely identifying the product."}
:id {:description "The numerical product number. Only used internally. All external communication should use the title or EAN."}
:price {:description "The list price of the product. Note that this is not always the price the product sold for due to discounts, promotions, etc."}
:rating {:description "The average rating users have given the product. This ranges from 1 - 5"}
:title {:description "The name of the product as it should be displayed to customers."}
:vendor {:description "The source of the product."}}}
:reviews {:description "These are reviews our customers have left on products. Note that these are not tied to orders so it is possible people have reviewed products they did not purchase from us."
:columns {:body {:description "The review the user left. Limited to 2000 characters."
:special_type :desc}
:created_at {:description "The day and time a review was written by a user."}
:id {:description "A unique internal identifier for the review. Should not be used externally."}
:product_id {:description "The product the review was for"}
:rating {:description "The rating (on a scale of 1-5) the user left."}
:reviewer {:description "The user who left the review"}}}})
(defn create-h2-db
([filename]
(create-h2-db filename (create-random-data)))
([filename data]
(println "Deleting existing db...")
(clojure.java.io/delete-file (str filename ".mv.db") :silently)
(clojure.java.io/delete-file (str filename ".trace.db") :silently)
(io/delete-file (str filename ".mv.db") :silently)
(io/delete-file (str filename ".trace.db") :silently)
(println "Creating db...")
(let [db (kdb/h2 {:db (format "file:%s;UNDO_LOG=0;CACHE_SIZE=131072;QUERY_CACHE_SIZE=128;COMPRESS=TRUE;MULTI_THREADED=TRUE;MVCC=TRUE;DEFRAG_ALWAYS=TRUE;MAX_COMPACT_TIME=5000;ANALYZE_AUTO=100"
filename)
......@@ -358,10 +402,24 @@
[(s/upper-case (name k)) v])
(into {})))))))
;; Insert the _metabase_metadata table
(println "Inserting _metabase_metadata...")
(k/exec-raw db (format "CREATE TABLE \"_METABASE_METADATA\" (\"keypath\" VARCHAR(255), \"value\" VARCHAR(255), PRIMARY KEY (\"keypath\"));"))
(-> (k/create-entity "_METABASE_METADATA")
(k/database db)
(k/insert (k/values (reduce concat (for [[table-name {table-description :description, columns :columns}] annotations]
(let [table-name (s/upper-case (name table-name))]
(conj (for [[column-name kvs] columns
[k v] kvs]
{:keypath (format "%s.%s.%s" table-name (s/upper-case (name column-name)) (name k))
:value (name v)})
{:keypath (format "%s.description" table-name)
:value table-description})))))))
;; Create the 'GUEST' user
(println "Preparing database for export...")
(k/exec-raw db "CREATE USER GUEST PASSWORD 'guest';")
(doseq [table (keys data)]
(doseq [table (conj (keys data) "_METABASE_METADATA")]
(k/exec-raw db (format "GRANT SELECT ON %s TO GUEST;" (s/upper-case (name table)))))
(println "Done."))))
......
......@@ -96,6 +96,11 @@
(fetch-chunk 0 field-values-lazy-seq-chunk-size
max-sync-lazy-seq-results)))
(defn- table-rows-seq [_ database table-name]
(k/select (-> (k/create-entity table-name)
(k/database (db->korma-db database)))))
(defn- table-fks [_ table]
(with-jdbc-metadata [^java.sql.DatabaseMetaData md @(:db table)]
(->> (.getImportedKeys md nil nil (:name table))
......@@ -144,7 +149,8 @@
:active-table-names active-table-names
:active-column-names->type active-column-names->type
:table-pks table-pks
:field-values-lazy-seq field-values-lazy-seq})
:field-values-lazy-seq field-values-lazy-seq
:table-rows-seq table-rows-seq})
(def ^:const GenericSQLISyncDriverTableFKsMixin
"Generic SQL implementation of the `ISyncDriverTableFKs` protocol."
......
......@@ -10,7 +10,9 @@
[metabase.driver.query-processor :as qp]
[metabase.driver.generic-sql.interface :as i]))
(defn- db->connection-spec [{{:keys [short-lived?]} :details, :as database}]
(defn- db->connection-spec
"Return a JDBC connection spec for a Metabase `Database`."
[{{:keys [short-lived?]} :details, :as database}]
(let [driver (driver/engine->driver (:engine database))
database->connection-details (partial i/database->connection-details driver)
connection-details->connection-spec (partial i/connection-details->connection-spec driver)]
......
......@@ -56,6 +56,14 @@
"Return a lazy sequence of all values of Field.
This is used to implement `mark-json-field!`, and fallback implentations of `mark-no-preview-display-field!` and `mark-url-field!`
if drivers *don't* implement `ISyncDriverFieldAvgLength` or `ISyncDriverFieldPercentUrls`, respectively.")
(table-rows-seq [this database table-name]
"Return a sequence of all the rows in a table with a given TABLE-NAME.
Currently, this is only used for iterating over the values in a `_metabase_metadata` table. As such, the results are not expected to be returned lazily.
(table-rows-seq driver (Database 2) \"_metabase_metadata\")
-> [{:keypath \"people.description\"
:value \"...\"}
...]")
;; Query Processing
(process-query [this query]
......
......@@ -28,6 +28,7 @@
set-field-display-name-if-needed!
sync-database-active-tables!
sync-field!
sync-metabase-metadata-table!
sync-table-active-fields-and-pks!
sync-table-fks!
sync-table-fields-metadata!
......@@ -72,17 +73,55 @@
(when (seq new-table-names)
(log/debug (u/format-color 'blue "Found new tables: %s" new-table-names))
(doseq [new-table-name new-table-names]
(ins Table :db_id (:id database), :active true, :name new-table-name)))))
;; If it's a _metabase_metadata table then we'll handle later once everything else has been synced
(when-not (= (s/lower-case new-table-name) "_metabase_metadata")
(ins Table :db_id (:id database), :active true, :name new-table-name))))))
;; Now sync the active tables
(->> (sel :many Table :db_id (:id database) :active true)
(map #(assoc % :db (delay database))) ; replace default delays with ones that reuse database (and don't require a DB call)
(sync-database-active-tables! driver))
;; Ok, now if we had a _metabase_metadata table from earlier we can go ahead and sync from it
(sync-metabase-metadata-table! driver database)
(events/publish-event :database-sync-end {:database_id (:id database) :custom_id tracking-hash :running_time (- (System/currentTimeMillis) start-time)})
(log/info (u/format-color 'magenta "Finished syncing %s database %s. (%d ms)" (name (:engine database)) (:name database)
(- (System/currentTimeMillis) start-time))))))))
(defn- sync-metabase-metadata-table!
"Databases may include a table named `_metabase_metadata` (case-insentive) which includes descriptions or other metadata about the `Tables` and `Fields`
it contains. This table is *not* synced normally, i.e. a Metabase `Table` is not created for it. Instead, *this* function is called, which reads the data it
contains and updates the relevant Metabase objects.
The table should have the following schema:
column | type | example
--------+---------+-------------------------------------------------
keypath | varchar | \"products.created_at.description\"
value | varchar | \"The date the product was added to our catalog.\"
`keypath` is of the form `table-name.key` or `table-name.field-name.key`, where `key` is the name of some property of `Table` or `Field`.
This functionality is currently only used by the Sample Dataset."
[driver database]
(doseq [table-name (active-table-names driver database)]
(when (= (s/lower-case table-name) "_metabase_metadata")
(doseq [{:keys [keypath value]} (table-rows-seq driver database table-name)]
(let [[_ table-name field-name k] (re-matches #"^([^.]+)\.(?:([^.]+)\.)?([^.]+)$" keypath)]
(try (when (not= 1 (if field-name
(k/update Field
(k/where {:name field-name, :table_id (k/subselect Table
(k/fields :id)
(k/where {:db_id (:id database), :name table-name}))})
(k/set-fields {(keyword k) value}))
(k/update Table
(k/where {:name table-name, :db_id (:id database)})
(k/set-fields {(keyword k) value}))))
(log/error (u/format-color "Error syncing _metabase_metadata: no matching keypath: %s" keypath)))
(catch Throwable e
(log/error (u/format-color 'red "Error in _metabase_metadata: %s" (.getMessage e))))))))))
(defn sync-table!
"Sync a *single* TABLE by running all the sync steps for it.
This is used *instead* of `sync-database!` when syncing just one Table is desirable."
......@@ -501,7 +540,7 @@
[#"^active$" bool-or-int :category]
[#"^city$" text :city]
[#"^country$" text :country]
[#"^countrycode$" text :country]
[#"^countryCode$" text :country]
[#"^currency$" int-or-text :category]
[#"^first_name$" text :name]
[#"^full_name$" text :name]
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment