Commit 38658568 authored by Sameer Al-Sakran

Merge pull request #313 from metabase/retarget_readme

retarget the readme from being targetted to contributors to being target...
parents aefb040d dd521cc5
......@@ -12,7 +12,7 @@ pom.xml.asc
/.lein-repl-history
/.nrepl-port
.idea/
/docs
/docs/uberdoc.html
profiles.clj
/*.h2.db
/*.mv.db
......
[![Circle CI](https://circleci.com/gh/metabase/metabase-init.svg?style=svg&circle-token=3ccf0aa841028af027f2ac9e8df17ce603e90ef9)](https://circleci.com/gh/metabase/metabase-init)
## Install Prerequisites
# Overview
1. Oracle JDK 8 (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
2. Node.js for npm (http://nodejs.org/)
3. Leiningen (http://leiningen.org/)
Metabase Report server is an easy way to generate charts and dashboards, ask simple ad hoc queries without using SQL, and see detailed information about rows in your Database. You can set it up in under 5 minutes, and then give yourself and others a place to ask simple questions and understand the data your application is generating. It is not tied to any specific framework and can be used out of the box with minimal configuration.
With a bit of tagging and annotation of what the tables and fields in your database mean, it can be used to provide a rich, humanized analytics server and administration interface.
## Build
# What it isn't
Install clojure + npm/bower requirements with
The Report Server does not deal with getting data into a database or data warehouse or with transforming your data into a representation that lets you answer specific questions. Most sophisticated installations will have separate Ingestion processes that get data from third parties, event collectors or database snapshots into a Data Warehouse, as well as Transformation Processes that join, denormalize, enrich or otherwise get your data into a shape that is more convenient for use in analytics.
lein deps
lein npm
The report server does not collect web page views or mobile events, though it can help you understand conversion funnels, cohort retention and use behavior in general once you have collected these events into a database.
Build the application JS and CSS with
See the [Data Warehouse Guide](docs/DATAWAREHOUSING.md) for more information and advice.
lein gulp
# Security Disclosure
When developing the frontend client, you'll want to watch for changes,
so run the default gulp task.
Security is very important to us. If you discover any issue regarding security, please disclose the information responsibly by sending an email to security@metabase.com and not by creating a GitHub issue.
./node_modules/gulp/bin/gulp.js
# Installation
To run the Report server you will need to have a Java Runtime installed. As a quick check to see if your system already has one, try
## Usage
java -version
Then run the HTTP server with
If you see something like
lein ring server
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
you are good to go. Otherwise, download the Java Runtime Environment at http://java.com/
## Unit Tests / Linting
To install the Query Server, go to the [Metabase Download Page](http://www.metabase.com/download) and download the current build. Place the downloaded jar into a newly created directory (as it will create some files when it is run), and run it on the command line:
Check that the project can compile successfully with
java -jar metabase.jar
lein uberjar
On the first run of the Report Server, the command line invocation will output a line like
Run the linters with
http://localhost:3000/setup/init/XXXXX
lein eastwood # Clojure linters
lein bikeshed --max-line-length 240
./lint_js.sh # JavaScript linter
where XXXXX is a randomly generated token that can only be used to set up your first account for that particular installation. Once you have created that account, the token (and that URL) will no longer work.
Run unit tests with
On logging in, you will be asked a set of questions that will set up a user account, and then you can add a database connection. For this to work you will need to get some information about which database you want to connect to, such as the Host Name and Port that it is running on, the Database Name and the User and Password that you will be using.
lein test
Once you have added this connection, you will be taken into the app and you'll be ready to ask your first question.
By default, the tests only run against the `generic-sql` dataset (an H2 test database).
You can specify which datasets/drivers to run tests against with the env var `MB_TEST_DATASETS`:
For more information or troubleshooting, check out the [Installation Guide](docs/INSTALLATION.md)
MB_TEST_DATASETS=generic-sql,mongo lein test
# Getting Started
At the time of this writing, the valid datasets are `generic-sql` and `mongo`.
Follow our [Getting Started](docs/GETTINGSTARTED.md) guide to learn how to use the Report Server.
# Contributing
## Documentation
To get started with a development installation of the Query Server and learn more about contributing, please follow the instructions at our [Developers Guide](docs/DEVELOPERS.md).
#### Instant Cheatsheet
# Extending and Deep Integrations
Start up an instant cheatsheet for the project + dependencies by running
Metabase also allows you to hit our Query API directly from Javascript to integrate the simple analytics we provide with your own application or third party services to do things like:
lein instant-cheatsheet
* Build moderation interfaces
* Export subsets of your users to third party marketing automation software
* Provide a specialized customer lookup application for the people in your company
#### Marginalia
Available at http://metabase.github.io/metabase-init/.
# License
You can generate and view documentation with
Unless otherwise noted, all Metabase Report Server source files are made available under the terms of the GNU Affero General Public License (AGPL).
lein marg
open ./docs/uberdoc.html
See individual files for details.
You can update the GitHub pages documentation using
make dox
You should be on the `master` branch without any uncommitted local changes before doing so. Also, make sure you've fetched the branch `gh-pages` and can push it back to `origin`.
## Migration Summary
lein migration-summary
Will give you a list of all tables + fields in the Metabase DB.
## Bootstrapping (for Development)
To quickly get your dev environment set up, use the `bootstrap` function to create a new User and Organization.
Open a REPL in Emacs or with `lein repl` and enter the following:
```clojure
(use 'metabase.db)
(setup-db)
(use 'metabase.bootstrap)
(bootstrap)
```
You'll be walked through the steps to get started.
## API Client (for Development)
You can make API calls from the REPL using `metabase.http-client`:
```clojure
(use 'metabase.http-client)
(defn cl [& args]
  (-> (apply client {:email "crowberto@metabase.com", :password "blackjet"} args)
      clojure.pprint/pprint))
(cl :get "user/current")
;; -> {:email "crowberto@metabase.com",
;;     :first_name "Crowberto",
;;     :last_login #inst "2015-03-13T22:55:05.390000000-00:00",
;;     ...}
```
## Developing with Emacs
`.dir-locals.el` contains some Emacs Lisp that tells `clojure-mode` how to indent Metabase macros and which arguments are docstrings. Whenever this file is updated,
Emacs will ask you if the code is safe to load. You can answer `!` to save it as safe.
By default, Emacs will insert this code as a customization at the bottom of your `init.el`.
You'll probably want to tell Emacs to store customizations in a different file. Add the following to your `init.el`:
```emacs-lisp
(setq custom-file (concat user-emacs-directory ".custom.el")) ; tell Customize to save customizations to ~/.emacs.d/.custom.el
(ignore-errors ; load customizations from ~/.emacs.d/.custom.el
(load-file custom-file))
```
## Checking for Out-of-Date Dependencies
lein ancient # list all out-of-date dependencies
lein ancient latest lein-ring # list latest version of artifact lein-ring
Will give you a list of out-of-date dependencies.
Once this repo is made public, this Clojars badge will work and show the status as well:
[![Dependencies Status](http://jarkeeper.com/metabase/metabase-init/status.png)](http://jarkeeper.com/metabase/metabase-init)
## License
Copyright © 2015 Metabase, Inc.
Copyright © 2015 Metabase, Inc
# Overview
Metabase allows you to optionally annotate the data in your database or data warehouse. These annotations provide Metabase with an understanding of what the data actually means and allow it to process and display the data more intelligently. We currently allow you to annotate tables and columns.
All of these settings are editable via the metadata editing page.
# Types of Metadata
## Tables
### Table type
A table can be marked as one of the below types.
* Business Entity Table
* Rollup or Metrics Table
* System Table - this is something that is only used
* Intermediate Table
Typically, only Business Entity and Metrics tables are displayed in the list, and they are colored differently so that you can quickly find the table of interest.
## Fields
A field is a representation of either a column (when using a SQL-based database like PostgreSQL) or a field in a document (when using a document or JSON-based database like MongoDB).
### Name
Clicking on the name of the field allows you to change how the field name is displayed. For example, if your ORM produces table names like “auth.user”, you can replace this with “User” to make it more readable.
### Description
This is a human readable description of what the field is and how it is meant to be used. Any caveats about interpretation can go here as well.
### Visibility
Fields are always displayed in “long form” spots like the detail pages for a specific row. By default, any column with an average length of longer than 50 characters is clipped. If you wish to toggle this, click on the checkbox next to a field name.
### Position
A field has a default position, which is used whenever a row is displayed. Some views allow you to rearrange the order of columns. You might want to use this if you have a clear primary identifier for a table that for whatever reason is not the first column, or to move variable length columns to the end to make it easier to scan a table.
### Database Representation
This refers to the basic representation of the field in the database. It is not editable, as it represents how things are stored. It is useful for seeing whether, say, “1” refers to a number or a string in the underlying database.
### Basic Types
* Metric - A metric is a number that you expect to plot, sum, take averages of, etc. Basically anything that would end up being plotted on the Y-Axis of a graph.
* Dimension - This is any field that you expect to use as an X-Axis of a graph or as part of a pivot table.
* Information - This is any other field that is not expected to be used in any kind of aggregate metric but still carries information. Examples include descriptions, names, and emails.
### Semantic Types
A field’s semantic type is used to determine how to display it, and it tells users of the data about the underlying meaning. For example, by marking fields in a table as Latitude and Longitude, you allow the table to be used to power pin and heat maps. Similarly, marking a field as a URL allows users to click on it and go to that URL.
Semantic types include
* Avatar Image URL
* Category
* City
* Country
* Description
* Foreign Key
* Entity Key
* Image URL
* Field containing JSON
* Latitude
* Longitude
* Entity Name
* Number
* State
* URL
* Zip Code
\ No newline at end of file
It is rare that your application's database will have all the data you need and be structured in a way that lets you ask all of the questions you are interested in. Typically an application database will have a schema optimized for small reads and updates, while most analytics queries touch a large fraction of a table.
# Ingestion
## From other databases
If your database is small enough, then it is generally easy enough to dump the whole database and then ingest it into your data warehouse.
### Postgres
### MySQL
### Heroku
## Events
## Third party data
# Transformation
## Uniques
## Event Enrichment
## Denormalization
## Working backwards from Metrics Example
[![Circle CI](https://circleci.com/gh/metabase/metabase-init.svg?style=svg&circle-token=3ccf0aa841028af027f2ac9e8df17ce603e90ef9)](https://circleci.com/gh/metabase/metabase-init)
## Install Prerequisites
1. Oracle JDK 8 (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
2. Node.js for npm (http://nodejs.org/)
3. Leiningen (http://leiningen.org/)
## Build
Install clojure + npm/bower requirements with
lein deps
lein npm
Build the application JS and CSS with
lein gulp
When developing the frontend client, you'll want to watch for changes,
so run the default gulp task.
./node_modules/gulp/bin/gulp.js
## Usage
Then run the HTTP server with
lein ring server
## Unit Tests / Linting
Check that the project can compile successfully with
lein uberjar
Run the linters with
lein eastwood # Clojure linters
lein bikeshed --max-line-length 240
./lint_js.sh # JavaScript linter
Run unit tests with
lein test
By default, the tests only run against the `generic-sql` dataset (an H2 test database).
You can specify which datasets/drivers to run tests against with the env var `MB_TEST_DATASETS`:
MB_TEST_DATASETS=generic-sql,mongo lein test
At the time of this writing, the valid datasets are `generic-sql` and `mongo`.
## Documentation
#### Instant Cheatsheet
Start up an instant cheatsheet for the project + dependencies by running
lein instant-cheatsheet
#### Marginalia
Available at http://metabase.github.io/metabase-init/.
You can generate and view documentation with
lein marg
open ./docs/uberdoc.html
You can update the GitHub pages documentation using
make dox
You should be on the `master` branch without any uncommitted local changes before doing so. Also, make sure you've fetched the branch `gh-pages` and can push it back to `origin`.
## Migration Summary
lein migration-summary
Will give you a list of all tables + fields in the Metabase DB.
## Bootstrapping (for Development)
To quickly get your dev environment set up, use the `bootstrap` function to create a new User and Organization.
Open a REPL in Emacs or with `lein repl` and enter the following:
```clojure
(use 'metabase.db)
(setup-db)
(use 'metabase.bootstrap)
(bootstrap)
```
You'll be walked through the steps to get started.
## API Client (for Development)
You can make API calls from the REPL using `metabase.http-client`:
```clojure
(use 'metabase.http-client)
(defn cl [& args]
  (-> (apply client {:email "crowberto@metabase.com", :password "squawk"} args)
      clojure.pprint/pprint))
(cl :get "user/current")
;; -> {:email "crowberto@metabase.com",
;;     :first_name "Crowberto",
;;     :last_login #inst "2015-03-13T22:55:05.390000000-00:00",
;;     ...}
```
## Developing with Emacs
`.dir-locals.el` contains some Emacs Lisp that tells `clojure-mode` how to indent Metabase macros and which arguments are docstrings. Whenever this file is updated,
Emacs will ask you if the code is safe to load. You can answer `!` to save it as safe.
By default, Emacs will insert this code as a customization at the bottom of your `init.el`.
You'll probably want to tell Emacs to store customizations in a different file. Add the following to your `init.el`:
```emacs-lisp
(setq custom-file (concat user-emacs-directory ".custom.el")) ; tell Customize to save customizations to ~/.emacs.d/.custom.el
(ignore-errors ; load customizations from ~/.emacs.d/.custom.el
(load-file custom-file))
```
## Checking for Out-of-Date Dependencies
lein ancient # list all out-of-date dependencies
lein ancient latest lein-ring # list latest version of artifact lein-ring
Will give you a list of out-of-date dependencies.
Once this repo is made public, this Clojars badge will work and show the status as well:
[![Dependencies Status](http://jarkeeper.com/metabase/metabase-init/status.png)](http://jarkeeper.com/metabase/metabase-init)
# Contributing
In general, we like to have an open issue for every pull request as a place to discuss the nature of any bug or proposed improvement. Each pull request should address a single issue and contain the fix, a description of how the pull request addresses the issue, and tests that validate that the PR fixes the issue in question.
For significant feature additions, it is expected that discussion will have taken place in the attached issue. Any feature that requires a major decision to be reached will need to have an explicit design document written. The goals of this document are to make explicit the assumptions, constraints and tradeoffs any given feature implementation will contain. The point is not to generate documentation but to allow discussion to reference a specific proposed design and to allow others to consider the implications of a given design.
We don't like getting sued, so for every commit we require a Linux Kernel style developer certificate. If you agree to the below terms (from http://developercertificate.org/)
```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```
Then you just add a line to every git commit message:
Signed-off-by: Helpful Contributor <helpful.contributor@email.com>
All contributions need to be signed with your real name.
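As a convenience, git can append the sign-off line for you: `git commit -s` (long form `--signoff`) adds a `Signed-off-by:` trailer built from your configured `user.name` and `user.email`. The following is a minimal sketch that uses a throwaway repository so it can be run anywhere; the scratch directory, file name and commit message are illustrative assumptions, and in practice you would simply run `git commit -s` in your own checkout.

```shell
# Scratch repository (assumption for illustration only).
repo="$(mktemp -d)"
git -C "$repo" init -q

# The sign-off trailer is built from these two settings.
git -C "$repo" config user.name "Helpful Contributor"
git -C "$repo" config user.email "helpful.contributor@email.com"

# Stage a placeholder change.
echo "example" > "$repo/notes.txt"
git -C "$repo" add notes.txt

# -s / --signoff appends the Signed-off-by line to the commit message.
git -C "$repo" commit -q -s -m "Add notes"

# Show the full message of the last commit, including the trailer.
last_message="$(git -C "$repo" log -1 --format=%B)"
printf '%s\n' "$last_message"
```

The printed message should end with the same `Signed-off-by:` line shown above, so it does not have to be typed by hand for each commit.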
## License
Copyright © 2015 Metabase, Inc
Distributed under the terms of the GNU Affero General Public License (AGPL) except as otherwise noted. See individual files for details.
# Before you start
This guide assumes you have access to a database and that it is set up correctly. If not, please follow the instructions in the [Installation Guide](docs/INSTALLATION.md).
# Understanding what data you have
Initially, let's see what data you have available. The Explore section of the app allows you to see which tables you have available, look at all the rows in a given table, and drill down to individual rows.
* Click `explore`
* Note that all of your tables are there
* Click on one
* Note the pagination
* Try getting the next page
* Note that you can filter these pages
* Try to filter by a column
* if it’s a date
* if it’s a category
* Note that any IDs or Foreign keys are clickable
* Click on one
* Note that all fields are present
* We can click on any FKs
* Any urls are clickable
* Note the `Linked Entites` on the bottom
* Click on one of these and note that below are a bunch of that entity's linked objects
# Asking a Question
When you have a specific question you are trying to answer, you can use the Card section of the application. Here you can ask a specific question of a given table of data you have. We'll start with the simplest possible question you can ask, "How many X are there?".
* Click `Cards`
* Click `Create New`
* Select a database
* if you only have a single database, this step happens automatically
* Select a table
* See the bare rows
* click run
* note that this allows you to see all of the rows in a table
* Select `total count`
# Saving a Question to a Dashboard
Assuming this is something you'll want to keep tabs on regularly, or share regularly, you can add it to a dashboard. Dashboards are collections of questions you have saved that you expect to look at as a group or that everyone in your organization can look at.
* Save
* Add it to a dashboard
* Give it a name
* Go to your newly created dashboard
* click `Dashboards`
* click your new dashboard
* Note that your card is there
\ No newline at end of file
# Application database
By default, Metabase uses an embedded database ([H2](http://www.h2database.com/)). If you want to use another database (for ease of administration, backup, or any other reason) you can inject the alternative database via environment variables. For example,
export MB_DB_TYPE=postgres
export MB_DB_DBNAME=metabase
export MB_DB_PORT=5432
export MB_DB_USER=username
export MB_DB_PASS=password
export MB_DB_HOST=localhost
java -jar metabase.jar
would run the application using a local postgres server instead of the default embedded database.
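Because the configuration is spread over several variables, it can help to check that none of them was forgotten before starting the server. The helper below is a hypothetical sketch, not part of Metabase: `missing_mb_db_vars` is an invented name, and the assumption that all six variables from the example must be set together is ours rather than something the application is documented to enforce.

```shell
# Print the name of every MB_DB_* variable from the example that is unset or empty.
# Hypothetical helper; Metabase itself may accept a subset of these variables.
missing_mb_db_vars() {
  for name in MB_DB_TYPE MB_DB_DBNAME MB_DB_PORT MB_DB_USER MB_DB_PASS MB_DB_HOST; do
    # Indirect lookup of the variable called $name, defaulting to empty.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Possible usage (commented out so the sketch does not start the server):
#   missing="$(missing_mb_db_vars)"
#   if [ -z "$missing" ]; then java -jar metabase.jar; else echo "Unset: $missing" >&2; fi
```

An empty result means every variable in the list has a value; otherwise the output names the ones still to be exported.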
# Backing up
The application will create a file named "metabase.db.h2.db" in the directory it is run in. You can back it up by stopping the application server and copying this file. Alternatively, to back up the application data while it is running, follow the methods described in the relevant [H2 documentation](http://www.h2database.com/html/tutorial.html#upgrade_backup_restore).
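The stop-and-copy approach can be sketched as a small shell function. This is a hedged illustration only: `backup_metabase_db` and the `backups` directory are made-up names, and the function assumes the application server has already been stopped and that it is run from the directory containing the database file.

```shell
# Copy the embedded database file to a timestamped backup.
# Assumes the application server is stopped; names are illustrative.
backup_metabase_db() {
  db_file="${1:-metabase.db.h2.db}"   # file name taken from the text above
  backup_dir="${2:-backups}"          # hypothetical backup location

  # Refuse to continue if there is nothing to back up.
  [ -f "$db_file" ] || { echo "no database file at $db_file" >&2; return 1; }

  mkdir -p "$backup_dir" || return 1
  target="$backup_dir/$(basename "$db_file").$(date +%Y%m%d-%H%M%S)"

  # -p preserves timestamps and permissions; print the backup path on success.
  cp -p "$db_file" "$target" && echo "$target"
}
```

Calling `backup_metabase_db` with no arguments copies `metabase.db.h2.db` from the current directory into `backups/` and prints the path of the copy.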
# Database connection strings
If you need to access connections over SSL, you should set the environment variable `MB_POSTGRES_SSL` to true in the environment that you use to run the application, e.g.
MB_POSTGRES_SSL=true java -jar ./metabase.jar
# Scaling
Typically, you'll want to evaluate the application on any database you have access to. If you want to expose the application to other users, you should carefully consider how you access your database. In addition, as data sizes grow, there will be a number of options for how you set up your overall analytics infrastructure.
## Starting out
It is typical to point this to a production database of a small application (or a large application with a small number of users). This typically works for periods before launch or when the database is either static or has a small number of users (like internal applications or low volume but high value paid applications). Eventually, as usage of the Query Server grows and the load on the production database increases, a couple of things happen:
* Expensive queries can slow down the database for production users
* The occasional scans (like on first installation) the Query Server runs to keep its internal representations of your database sync'd might add significant load
* Any recurring queries you run might start to add significant load
* You might need to import third party data for analysis, which typically should not live on your main database
At some point, you should separate out your main application database and your analytics database. There are a number of ways to do this.
## Read Replica
Assuming you do not need to do a lot of transformation or ingest lots of third party data sources, this can be a good stopgap to setting up a complete data/analytics infrastructure. For MySQL or Postgres, just set up a read replica and make sure to not let production application servers hit it for normal queries.
## Dedicated analytics database
Typically, once enough data is in the system and/or the transformation needs are complex enough, a dedicated analytics database is used. There are many options, ranging from a normal general purpose database (MySQL, Postgres, SQL Server, etc.), to a dedicated analytics database (Vertica, Redshift, Greenplum, Teradata, etc.), the new generation of SQL on Hadoop databases (Spark, Presto) or NoSQL databases (Druid, Cassandra, etc.).
Typically, once there is a dedicated analytics database or a data warehouse, ETL processes become important. Learn more in the [Data Warehouse Guide](docs/DATAWAREHOUSING.md).
# Database Drivers
Metabase currently has drivers for
* H2
* MySQL
* PostgreSQL
On our roadmap are
* [Druid](www.github.com/metabase/metabase-init/issues/X)
* [MongoDB](www.github.com/metabase/metabase-init/issues/X)
* [Presto](www.github.com/metabase/metabase-init/issues/X)
If you are interested in the status of any of these drivers, click through to the issues to see what work is being done. If you are interested in a driver to another database, please open an issue!
# Annotating Data
[Data Annotations](docs/ANNOTATIONS.md)
\ No newline at end of file