Commit 74208f7c authored by Natalie, committed by GitHub

docs - countif (#29104)

parent 27ceae1d
......@@ -11,7 +11,7 @@ For an introduction to expressions, check out [Writing expressions in the notebo
- [Aggregations](#aggregations)
- [Average](#average)
- [Count](#count)
- [CountIf](#countif)
- [CountIf](./expressions/countif.md)
- [CumulativeCount](#cumulativecount)
- [CumulativeSum](#cumulativesum)
- [Distinct](#distinct)
......@@ -86,7 +86,7 @@ Syntax: `Count`
Example: `Count` If a table or result returns 10 rows, `Count` will return `10`.
### CountIf
### [CountIf](./expressions/countif.md)
Only counts rows where the condition is true.
......
---
title: CountIf
---
# CountIf
`CountIf` counts the total number of rows in a table that match a condition. `CountIf` counts every row, not just unique rows.
Syntax: `CountIf(condition)`.
Example: in the table below, `CountIf([Plan] = "Basic")` would return 3.
| ID | Plan |
|-----|-------------|
| 1 | Basic |
| 2 | Basic |
| 3 | Basic |
| 4 | Business |
| 5 | Premium |
> [Aggregations](../expressions-list.md#aggregations) like `CountIf` should be added to the query builder's [**Summarize** menu](../../query-builder/introduction.md#summarizing-and-grouping-by) > **Custom Expression** (scroll down in the menu if needed).
## Parameters
`CountIf` accepts a [function](../expressions-list.md#functions) or [conditional statement](../expressions.md#conditional-operators) that returns a boolean value (`true` or `false`).
## Multiple conditions
We'll use the following sample data to show you `CountIf` with [required](#required-conditions), [optional](#optional-conditions), and [mixed](#some-required-and-some-optional-conditions) conditions.
| ID | Plan | Active Subscription |
|-----|-------------| --------------------|
| 1 | Basic | true |
| 2 | Basic | true |
| 3 | Basic | false |
| 4 | Business | false |
| 5 | Premium | true |
### Required conditions
To count the total number of rows in a table that match multiple required conditions, combine the conditions using the `AND` operator:
```
CountIf(([Plan] = "Basic" AND [Active Subscription] = true))
```
This expression will return 2 on the sample data above (the total number of Basic plans that have an active subscription).
### Optional conditions
To count the total rows in a table that match multiple optional conditions, combine the conditions using the `OR` operator:
```
CountIf(([Plan] = "Basic" OR [Active Subscription] = true))
```
Returns 4 on the sample data: there are three Basic plans, plus one Premium plan with an active subscription.
### Some required and some optional conditions
To combine required and optional conditions, group the conditions using parentheses:
```
CountIf(([Plan] = "Basic" OR [Plan] = "Business") AND [Active Subscription] = false)
```
Returns 2 on the sample data: there are only two Basic or Business plans that lack an active subscription.
> Tip: make it a habit to put parentheses around your `AND` and `OR` groups to avoid making required conditions optional (or vice versa).
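The three cases above can be checked against the sample data with a short plain-Python sketch. This is only an illustration of the counting logic, not how Metabase evaluates expressions; `rows` and `count_if` are hypothetical names introduced here.

```python
# Sample data from the table above (field names are illustrative).
rows = [
    {"id": 1, "plan": "Basic", "active": True},
    {"id": 2, "plan": "Basic", "active": True},
    {"id": 3, "plan": "Basic", "active": False},
    {"id": 4, "plan": "Business", "active": False},
    {"id": 5, "plan": "Premium", "active": True},
]


def count_if(rows, condition):
    """Count every row (not just unique rows) where condition(row) is true."""
    return sum(1 for row in rows if condition(row))


# Required conditions (AND): Basic plans with an active subscription.
required = count_if(rows, lambda r: r["plan"] == "Basic" and r["active"])

# Optional conditions (OR): Basic plans, or any active subscription.
optional = count_if(rows, lambda r: r["plan"] == "Basic" or r["active"])

# Mixed: (Basic OR Business) AND inactive.
mixed = count_if(
    rows,
    lambda r: (r["plan"] == "Basic" or r["plan"] == "Business") and not r["active"],
)

# Same conditions without the parentheses. In Python, `and` binds tighter
# than `or`, so this reads as: Basic OR (Business AND inactive).
ungrouped = count_if(
    rows,
    lambda r: r["plan"] == "Basic" or r["plan"] == "Business" and not r["active"],
)

print(required, optional, mixed, ungrouped)
```

The `ungrouped` count differs from `mixed`, which is the kind of slip the parentheses habit guards against. The precedence shown is Python's; this sketch makes no claim about precedence in Metabase expressions.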
## Conditional counts by group
In general, to get a conditional count for a category or group, such as the number of inactive subscriptions per plan, you'll:
1. Write a `CountIf` expression with your conditions.
2. Add a [**Group by**](../../query-builder/introduction.md#summarizing-and-grouping-by) column in the query builder.
Using the sample data:
| ID | Plan | Active Subscription |
|-----|-------------| --------------------|
| 1 | Basic | true |
| 2 | Basic | true |
| 3 | Basic | false |
| 4 | Business | false |
| 5 | Premium | true |
Count the total number of inactive subscriptions per plan:
```
CountIf([Active Subscription] = false)
```
Alternatively, if your **Active Subscription** column contains `null` (empty) values that represent inactive plans, you could use:
```
CountIf([Active Subscription] != true)
```
> The "not equal" operator should be written as `!=`.
To view your conditional counts by plan, set the **Group by** column to "Plan".
| Plan | Total Inactive Subscriptions |
|-----------|------------------------------|
| Basic | 1 |
| Business | 1 |
| Premium | 0 |
> Tip: when sharing your work with other people, it's helpful to spell out the included categories with an `OR` filter (for example, `[Plan] = "Business" OR [Plan] = "Premium"`), even though a `!=` filter (such as `[Plan] != "Basic"`) is shorter. The inclusive `OR` filter makes it easier to understand which categories (e.g., plans) are included in your conditional count.
## Accepted data types
| [Data type](https://www.metabase.com/learn/databases/data-types-overview#examples-of-data-types) | Works with `CountIf` |
| ------------------------------------------------------------------------------------------------ | ------------------------- |
| String | ❌ |
| Number | ❌ |
| Timestamp | ❌ |
| Boolean | ✅ |
| JSON | ❌ |
`CountIf` accepts a [function](../expressions-list.md#functions) or [conditional statement](../expressions.md#conditional-operators) that returns a boolean value (`true` or `false`).
## Related functions
Different ways to do the same thing, because it's fun to try new things.
**Metabase**
- [case](#case)
- [CumulativeCount](#cumulativecount)
**Other tools**
- [SQL](#sql)
- [Spreadsheets](#spreadsheets)
- [Python](#python)
### case
You can combine [`Count`](../expressions-list.md#count) with [`case`](./case.md):
```
Count(case([Plan] = "Basic", [ID]))
```
to do the same thing as `CountIf`:
```
CountIf([Plan] = "Basic")
```
The `case` version lets you count a different column when the condition isn't met. For example, if you've got data from different sources:
| ID: Source A | Plan: Source A | ID: Source B | Plan: Source B |
|---------------|----------------|---------------| ---------------------|
| 1 | Basic | | |
| | | B | basic |
| | | C | basic |
| 4 | Business | D | business |
| 5 | Premium | E | premium |
To count the total number of Basic plans across both sources, you could create a `case` expression to:
- Count the rows in "ID: Source A" where "Plan: Source A" = "Basic"
- Count the rows in "ID: Source B" where "Plan: Source B" = "basic"
```
Count(case([Plan: Source A] = "Basic", [ID: Source A],
case([Plan: Source B] = "basic", [ID: Source B])))
```
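As a rough cross-check, the nested `case` can be mimicked in plain Python on the two-source table. This sketch assumes that `Count` over a `case` result skips empty values, the way SQL's `COUNT(expression)` does; `rows` and `pick_id` are hypothetical names used only for illustration.

```python
# Two-source sample data from the table above; None marks an empty cell.
rows = [
    {"id_a": 1, "plan_a": "Basic", "id_b": None, "plan_b": None},
    {"id_a": None, "plan_a": None, "id_b": "B", "plan_b": "basic"},
    {"id_a": None, "plan_a": None, "id_b": "C", "plan_b": "basic"},
    {"id_a": 4, "plan_a": "Business", "id_b": "D", "plan_b": "business"},
    {"id_a": 5, "plan_a": "Premium", "id_b": "E", "plan_b": "premium"},
]


def pick_id(row):
    """Mirror the nested case: try Source A first, then fall back to Source B."""
    if row["plan_a"] == "Basic":
        return row["id_a"]
    if row["plan_b"] == "basic":
        return row["id_b"]
    return None  # neither condition met, so nothing to count


# Count only the rows where the case produced a value.
total_basic = sum(1 for row in rows if pick_id(row) is not None)
print(total_basic)
```

On this data, one row matches through Source A and two through Source B.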
### CumulativeCount
`CountIf` doesn't do running counts. You'll need to combine [CumulativeCount](../expressions-list.md#cumulativecount) with [`case`](./case.md).
If our sample data is a time series:
| ID | Plan | Active Subscription | Created Date |
|-----|-------------| --------------------|------------------|
| 1 | Basic | true | October 1, 2020 |
| 2 | Basic | true | October 1, 2020 |
| 3 | Basic | false | October 1, 2020 |
| 4 | Business | false | November 1, 2020 |
| 5 | Premium | true | November 1, 2020 |
And we want to get the running count of active plans like this:
| Created Date: Month | Total Active Plans to Date |
|---------------------|----------------------------|
| October 2020 | 2 |
| November 2020 | 3 |
Create an aggregation from **Summarize** > **Custom expression**:
```
CumulativeCount(case([Active Subscription] = true, [ID]))
```
You'll also need to set the **Group by** column to "Created Date: Month".
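The running count in the table above can be reproduced with a small plain-Python sketch: count the active rows per month, then accumulate. The month keys and variable names are assumptions made for this illustration, not something Metabase exposes.

```python
from itertools import accumulate

# Time series sample data; "created" is the month of Created Date.
rows = [
    {"id": 1, "plan": "Basic", "active": True, "created": "2020-10"},
    {"id": 2, "plan": "Basic", "active": True, "created": "2020-10"},
    {"id": 3, "plan": "Basic", "active": False, "created": "2020-10"},
    {"id": 4, "plan": "Business", "active": False, "created": "2020-11"},
    {"id": 5, "plan": "Premium", "active": True, "created": "2020-11"},
]

# Group by month, in chronological order.
months = sorted({row["created"] for row in rows})

# Conditional count within each month (active subscriptions only).
active_per_month = [
    sum(1 for row in rows if row["created"] == month and row["active"])
    for month in months
]

# Running total across months.
running_active = dict(zip(months, accumulate(active_per_month)))
print(running_active)
```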
### SQL
When you run a question using the [query builder](https://www.metabase.com/glossary/query_builder), Metabase will convert your query builder settings (filters, summaries, etc.) into a SQL query, and run that query against your database to get your results.
If our [sample data](#multiple-conditions) is stored in a PostgreSQL database, the SQL query:
```sql
SELECT COUNT(CASE WHEN plan = 'Basic' THEN id END) AS total_basic_plans
FROM accounts
```
is equivalent to the Metabase expression:
```
CountIf([Plan] = "Basic")
```
To get [conditional counts broken out by group](#conditional-counts-by-group), use the SQL query:
```sql
SELECT
plan,
COUNT(CASE WHEN active_subscription = false THEN id END) AS total_inactive_subscriptions
FROM accounts
GROUP BY
plan
```
The `SELECT` part of the SQL query matches the Metabase expression:
```
CountIf([Active Subscription] = false)
```
The `GROUP BY` part of the SQL query matches a Metabase [**Group by**](../../query-builder/introduction.md#summarizing-and-grouping-by) set to the "Plan" column.
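To try the grouped query without a PostgreSQL server, here is a sketch that runs the same `COUNT(CASE WHEN ...)` pattern with Python's built-in `sqlite3` module. The in-memory SQLite database is a stand-in chosen for this example; booleans are stored as `0`/`1`, and an `ORDER BY` is added so the output order is predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER, plan TEXT, active_subscription INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, "Basic", 1),
        (2, "Basic", 1),
        (3, "Basic", 0),
        (4, "Business", 0),
        (5, "Premium", 1),
    ],
)

# COUNT skips the NULLs produced when the CASE condition is not met,
# so each group only counts its inactive subscriptions.
inactive_per_plan = conn.execute(
    """
    SELECT
        plan,
        COUNT(CASE WHEN active_subscription = 0 THEN id END)
            AS total_inactive_subscriptions
    FROM accounts
    GROUP BY plan
    ORDER BY plan
    """
).fetchall()
conn.close()

print(inactive_per_plan)
```

Because the condition lives inside `COUNT` rather than in a `WHERE` clause, plans with no inactive subscriptions still appear in the result with a count of zero.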
### Spreadsheets
If our [sample data](#multiple-conditions) is in a spreadsheet where "ID" is in column A and "Plan" is in column B, the spreadsheet formula:
```
=CountIf(B:B, "Basic")
```
produces the same result as the Metabase expression:
```
CountIf([Plan] = "Basic")
```
### Python
If our [sample data](#multiple-conditions) is in a `pandas` dataframe called `df`, the Python code:
```python
len(df[df['Plan'] == "Basic"])
```
uses the same logic as the Metabase expression:
```
CountIf([Plan] = "Basic")
```
To get a [conditional count with a grouping column](#conditional-counts-by-group):
```python
# Add your conditions
df_filtered = df[df['Active Subscription'] == False]
# Group by a column, and count the rows within each group
df_filtered.groupby('Plan').size()
```
The Python code above will produce the same result as the Metabase `CountIf` expression (with the [**Group by**](../../query-builder/introduction.md#summarizing-and-grouping-by) column set to "Plan"):
```
CountIf([Active Subscription] = false)
```
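One caveat with filtering before grouping in `pandas`: groups with no matching rows (Premium, in the sample data) are dropped rather than shown with a count of 0. A minimal sketch that keeps those groups, assuming the sample data is rebuilt here as a dataframe with the column names from the table:

```python
import pandas as pd

# Sample data rebuilt as a dataframe (column names follow the table above).
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Plan": ["Basic", "Basic", "Basic", "Business", "Premium"],
        "Active Subscription": [True, True, False, False, True],
    }
)

# Boolean Series marking the rows that meet the condition.
inactive = ~df["Active Subscription"]

# Summing booleans within each group counts the True values,
# and groups with no matches come out as 0 instead of disappearing.
inactive_per_plan = inactive.groupby(df["Plan"]).sum()
print(inactive_per_plan)
```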
## Further reading
- [Custom expressions documentation](../expressions.md)
- [Custom expressions tutorial](https://www.metabase.com/learn/questions/custom-expressions)
......@@ -22,11 +22,13 @@ Example: in the table below, `SumIf([Payment], [Plan] = "Basic")` would return 2
## Parameters
- `column` can be the name of a numeric column, or an expression that returns a numeric column.
- `condition` is an expression that returns a boolean value (`true` or `false`), like the expression `[Payment] > 100`.
- `column` can be the name of a numeric column, or a [function](../expressions-list.md#functions) that returns a numeric column.
- `condition` is a [function](../expressions-list.md#functions) or [conditional statement](../expressions.md#conditional-operators) that returns a boolean value (`true` or `false`), like the conditional statement `[Payment] > 100`.
## Multiple conditions
We'll use the following sample data to show you `SumIf` with [required](#required-conditions), [optional](#optional-conditions), and [mixed](#some-required-and-some-optional-conditions) conditions.
| Payment | Plan | Date Received |
|----------|-------------| ------------------|
| 100 | Basic | October 1, 2020 |
......@@ -35,15 +37,19 @@ Example: in the table below, `SumIf([Payment], [Plan] = "Basic")` would return 2
| 200 | Business | November 1, 2020 |
| 400 | Premium | November 1, 2020 |
To sum a column based on multiple _mandatory_ conditions, combine the conditions using the `AND` operator:
### Required conditions
To sum a column based on multiple required conditions, combine the conditions using the `AND` operator:
```
SumIf([Payment], ([Plan] = "Basic" AND month([Date Received]) = 10))
```
This expression would return 200 on the sample data above, as it sums all of the payments received for Basic Plans in October.
This expression would return 200 on the sample data above: the sum of all of the payments received for Basic Plans in October.
To sum a column with multiple _optional_ conditions, combine the conditions using the `OR` operator:
### Optional conditions
To sum a column with multiple optional conditions, combine the conditions using the `OR` operator:
```
SumIf([Payment], ([Plan] = "Basic" OR [Plan] = "Business"))
......@@ -51,7 +57,9 @@ SumIf([Payment], ([Plan] = "Basic" OR [Plan] = "Business"))
Returns 600 on the sample data.
To combine mandatory and optional conditions, group the conditions using parentheses:
### Some required and some optional conditions
To combine required and optional conditions, group the conditions using parentheses:
```
SumIf([Payment], ([Plan] = "Basic" OR [Plan] = "Business") AND month([Date Received]) = 10)
......@@ -59,7 +67,7 @@ SumIf([Payment], ([Plan] = "Basic" OR [Plan] = "Business") AND month([Date Recei
Returns 400 on the sample data.
> Tip: make it a habit to put parentheses around your `AND` and `OR` groups to avoid making mandatory conditions optional (or vice versa).
> Tip: make it a habit to put parentheses around your `AND` and `OR` groups to avoid making required conditions optional (or vice versa).
## Conditional subtotals by group
......@@ -85,10 +93,10 @@ SumIf([Payment], [Plan] = "Business" OR [Plan] = "Premium")
Or, sum payments for all plans that aren't "Basic":
```
{% raw %}SumIf([Payment], [Plan] != "Basic"){% endraw %}
SumIf([Payment], [Plan] != "Basic")
```
> The "not equal" operator `!=` should be written as "!=".
> The "not equal" operator `!=` should be written as !=.
To view those payments by month, set the **Group by** column to "Date Received: Month".
......@@ -106,11 +114,15 @@ To view those payments by month, set the **Group by** column to "Date Received:
| String | ❌ |
| Number | ✅ |
| Timestamp | ❌ |
| Boolean | |
| Boolean | |
| JSON | ❌ |
See [parameters](#parameters).
## Related functions
Different ways to do the same thing, because CSV files still make up 40% of the world's data.
**Metabase**
- [case](#case)
- [CumulativeSum](#cumulativesum)
......@@ -122,13 +134,13 @@ To view those payments by month, set the **Group by** column to "Date Received:
### case
You can combine the `Sum` and [`case`](./case.md) formulas
You can combine [`Sum`](../expressions-list.md#sum) and [`case`](./case.md):
```
Sum(case([Plan] = "Basic", [Payment]))
```
to do the same thing as the `SumIf` formula:
to do the same thing as `SumIf`:
```
SumIf([Payment], [Plan] = "Basic")
......@@ -164,9 +176,9 @@ Don't forget to set the **Group by** column to "Date Received: Month".
### SQL
When you run a question using the [query builder](https://www.metabase.com/glossary/query_builder), Metabase will convert your graphical query settings (filters, summaries, etc.) into a query, and run that query against your database to get your results.
When you run a question using the [query builder](https://www.metabase.com/glossary/query_builder), Metabase will convert your query builder settings (filters, summaries, etc.) into a SQL query, and run that query against your database to get your results.
If our [payment sample data](#sumif) is stored in a PostgreSQL database:
If our [payment sample data](#sumif) is stored in a PostgreSQL database, the SQL query:
```sql
SELECT
......@@ -174,13 +186,13 @@ SELECT
FROM invoices
```
is equivalent to the Metabase `SumIf` expression:
is equivalent to the Metabase expression:
```
SumIf([Payment], [Plan] = "Basic")
```
To add [multiple conditions with a grouping column](#conditional-subtotals-by-group):
To add [multiple conditions with a grouping column](#conditional-subtotals-by-group), use the SQL query:
```sql
SELECT
......@@ -191,23 +203,23 @@ GROUP BY
DATE_TRUNC('month', date_received)
```
The SQL `SELECT` statement matches the Metabase `SumIf` expression:
The `SELECT` part of the SQL query matches the Metabase `SumIf` expression:
```
SumIf([Payment], [Plan] = "Business" OR [Plan] = "Premium")
```
The SQL `GROUP BY` statement maps to a Metabase [**Group by**](../../query-builder/introduction.md#summarizing-and-grouping-by) column set to "Date Received: Month".
The `GROUP BY` part of the SQL query maps to a Metabase [**Group by**](../../query-builder/introduction.md#summarizing-and-grouping-by) column set to "Date Received: Month".
### Spreadsheets
If our [payment sample data](#sumif) is in a spreadsheet where "Payment" is in column A and "Date Received" is in column B:
If our [payment sample data](#sumif) is in a spreadsheet where "Payment" is in column A and "Plan" is in column B, the spreadsheet formula:
```
=SUMIF(B:B, "Basic", A:A)
```
produces the same result as:
produces the same result as the Metabase expression:
```
SumIf([Payment], [Plan] = "Basic")
......@@ -217,13 +229,13 @@ To add additional conditions, you'll need to switch to a spreadsheet **array for
### Python
If our [payment sample data](#sumif) is in a `pandas` dataframe column called `df`:
If our [payment sample data](#sumif) is in a `pandas` dataframe called `df`, the Python code:
```python
df.loc[df['Plan'] == "Basic", 'Payment'].sum()
```
is equivalent to
is equivalent to the Metabase expression:
```
SumIf([Payment], [Plan] = "Basic")
......