# 4. Filtering Data
---[SIDEBAR]---
"type":"objectives",
"title":"Learning Objectives",
"text":"After completing this lesson you can:
---[SIDEBAR]---
## Using the WHERE Clause
All the the example queries you have encountered to this point in this resource have retrieved all of the rows from a database table. Usually, you will not need all of the rows from a table to meet your data needs. It is possible to create a SELECT statement that retrieves specific rows from a database table. These statements require the WHERE clause.
---[SIDEBAR]---
"type":"term",
"title":"WHERE",
"text":"The clause in a SELECT statement that is used to filter the rows retrieved by a query."
---[SIDEBAR]---
The WHERE clause is used to define a condition or set of conditions that must be met for a row to be returned in a query result. These criteria are generally designed as a logical test that compares values for a column on a table to target values. Those values that meet the conditions are returned in the query result. Suppose you want an alphabetized of the artists, and their nationality, who were born in the year 1861. A SELECT statement that retrieves this data would be:
*SELECT full_name, nationality
FROM artist
WHERE birth=1861
ORDER BY full_name*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas|visible
owner:atlas-query
result:museum-name
---[SQL]---
SELECT full_name, nationality
FROM artist
WHERE birth = 1861
ORDER BY full_name;
---[SQL]---
The query returns an alphabetized list of the artists and their nationality. There are nine artists retrieved by this query, even though there are 421 artists on the artist table. The nine artists retrieved all meet the condition set by the WHERE clause that the artist was born in 1861.
---[SIDEBAR]---
"type":"note",
"title":"The Position of the WHERE Clause",
"text":"The WHERE clause must appear third in the SELECT statement. Since the SELECT and FROM clauses are required in all SELECT statements, they appear first and second. All other clauses appear after the WHERE clause (notice the position of the ORDER BY clause in the example query above)."
---[SIDEBAR]---
Consider the syntax of the WHERE clause. Generally a WHERE clause creates a logical test comparing values using a mathematical operator. In this example, the WHERE clause uses a logical test comparing the values in the birth column of the artist table with the value 1861. Those values on the table that are equal to 1861 meet the condition and will be retrieved by the query. Those that do not meet the condition are excluded from the query result. Only nine of the 421 artists in the table.
## The WHERE Clause Operators
You can use a number of different mathematical operators to compare values in the logical tests in the WHERE clause. Generally, these are constructed to compare the values in a column on the left side of an operator to a value on the right side (column = value). The example above uses the "=" operator, which requires that the values on the right side of the operator (in this case the values in the birth column) exactly match the value on the right side (the year 1861). The table below describes operators that can be used to compare values in the WHERE clause.
#### Where Clause Operators
| Operator | Description | Example |
|:--------:|--------------------------------------------------------------------------------|-------------------------------|
| = | The values in the specified column match a stated value | birth = 1861 |
| <> | The values in the specified column do not match a stated value | birth <> 1861 |
| != | The values in the specified column do not match a stated value | birth != 1861 |
| < | The values in the specified column are less than a stated value | birth < 1861 |
| <= | The values in the specified column are less than or equal to a stated value | birth <= 1861 |
| !< | The values in the specified column are not less than a stated value | birth !< 1861 |
| > | The values in the specified column are greater than a stated value | birth > 1861 |
| >= | The values in the specified column are greater than or equal to a stated value | birth >= 1861 |
| !> | The values in the specified column are not greater than a stated value | birth !> 1861 |
| BETWEEN | The values in the specified column are within the range of two stated values | birth BETWEEN 1860 AND 1861 |
| IS NULL | The values in the specified column are NULL (without data) | birth IS NULL |
## Checking Against a Single Value
The most basic WHERE clauses compare the values in a column to a single value. This is demonstrated in the last example. In the example, the value in the birth column had be 1861 for the row to be included in the query result. Consider two more examples using different operators.
Assume you want to artists, their nationality, and their birth for artists born before 1400. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE birth < 1400*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-birth-before-1400
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
where birth < 1400;
---[SQL]---
The query retrieves the name, nationality, and birth for artists born before 1400. There is only one artist born before 1400. Note that for an artist to be born before 1400, their birth year would be less than 1400.
You may want to also modify the last query to include any artists that were born in 1400 as well as those born before 1400. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE birth <= 1400*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-before-or-on-1400
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
where birth <= 1400;
---[SQL]---
In this query, the operator in the WHERE clause was modified include those artists born in 1400. The <= operator will select those artists born during or before 1400. Since there is an artist born in 1400 and an artist born before 1400, there are now two artists retrieved by the query.
## Checking for Nonmatches
In the previous examples, you have applied filters that select data that meet desired conditions. Sometimes it is easier to define the data you need by describing the values that should be excluded. For example, you may want to list the all artists who are not French. If you had to set up the filter only using matches, you would need to know all of the nationalities in the data and add conditions to include each nationality that isn't French. If there are French, English, and Canadian artists; the where clause would need conditions to include the English and Canadian artists and not the French. Imagine how difficult this would be if there are artists from 100 countries in the list. The WHERE clause would also need to be modified when artists with a new nationality is added to the query.
In this case, it is much easier to create a criterion that excludes the French artists. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE nationality <> 'French'*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
WHERE nationality <> 'French';
---[SQL]---
The condition in the WHERE clauses uses the <> operator. This means that when the value in the nationality column is not French, it will be retrieved by the query. There are 306 artists with 17 different nationalities who are not French. Notice the single quotes around the word French in the WHERE clause. Anytime the value you wish to compare is text, it must be enclosed within single quotes. Quotes are not used when the values are numbers.
---[SIDEBAR]---
"type":"note",
"title":"When to Use Single Quotes in a Condition",
"text":"When a value used in a condition in the WHERE clause is not a number, it must be enclosed within single quotes."
---[SIDEBAR]---
## Checking for a Range of Values
There are times that you want to retrieve data within a range of values. To do this, you use the BETWEEN and AND keywords in place of operators in the WHERE clause. For this condition to work, you need beginning and ending values for the range. When you use the BETWEEN and AND keywords, the DBMS will include data that matches the beginning and ending values. In other words, the condition BETWEEN 10 AND 20 is logically the same as the combined conditions >=10 *and* <=20.
---[SIDEBAR]---
"type":"note",
"title":"BETWEEN and AND",
"text":"The logical keywords BETWEEN and AND include the beginning and ending values. The condition BETWEEN 45 AND 50 is logically the same as the combined conditions >=45 *and* <=50"
---[SIDEBAR]---
Suppose that you want list the artists, their nationality, and their birth who were born between 1400 and 1500. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE birth BETWEEN 1400 AND 1500*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
where birth BETWEEN 1400 and 1500;
---[SQL]---
This example uses the keywords BETWEEN and AND to define a criterion that includes values for the birth column that fall in the range of 1400 to 1500. The query returns 17 artists whose birth is in years within this range. The query will include the beginning and ending values for the birth column. In other words, it will retrieve artists whose birth falls in 1400 or 1500 if there are any.
## Checking for No Value
There are times when some of the entries in a column contain no values. There is a special term for this situation, that entry is said to contain a NULL value. A NULL value is not 0, a space, or even an blank block of text. It is just the absence of data.
---[SIDEBAR]---
"type":"term",
"title":"NULL",
"text":"When an entry for a column of data is empty, it is called a NULL value. NULL values are not 0, spaces, or even an blank collection of text. It is the absence of data for that column."
---[SIDEBAR]---
When creating a criterion to filter for NULL values, you need special syntax. The logical statement column = NULL will not work. To check if a value is NULL, use the keywords IS NULL.
Suppose that you want list the artist's first and middle names for those entries on the artist table with a NULL value for the artist's middle names. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE middle_names IS NULL*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, middle_names
FROM artist
where middle_names IS NULL
---[SQL]---
This query lists all of the artists without middle names. Notice that 273 of the 421 artists on the artist table meet this criteria. NULL values do not always behave as you expect they might. For example, if you were to create a query to list all artists whose middle name is not "Van", you would expect to see all of the artists with a middle name other than "Van" - including those artists without a middle name. NULL values don't work that way, though. NULL values are not retrieved when you filter for non-matches. The syntax for this query is:
*SELECT full_name, middle_names
FROM artist
WHERE middle_names <> 'Van'*
---[SIDEBAR]---
"type":"objectives",
"title":"Learning Objectives",
"text":"After completing this lesson you can:
- Use the WHERE clause of a SELECT statement to filter the results of a query.
- Filter the rows of a query result to match against a single value.
- Filter the rows of a query result to look for non-matched values.
- Filter the rows of a query result to check against a range of values.
- Filter the rows of a query result find rows without a value.
---[SIDEBAR]---
## Using the WHERE Clause
All the the example queries you have encountered to this point in this resource have retrieved all of the rows from a database table. Usually, you will not need all of the rows from a table to meet your data needs. It is possible to create a SELECT statement that retrieves specific rows from a database table. These statements require the WHERE clause.
---[SIDEBAR]---
"type":"term",
"title":"WHERE",
"text":"The clause in a SELECT statement that is used to filter the rows retrieved by a query."
---[SIDEBAR]---
The WHERE clause is used to define a condition or set of conditions that must be met for a row to be returned in a query result. These criteria are generally designed as a logical test that compares values for a column on a table to target values. Those values that meet the conditions are returned in the query result. Suppose you want an alphabetized of the artists, and their nationality, who were born in the year 1861. A SELECT statement that retrieves this data would be:
*SELECT full_name, nationality
FROM artist
WHERE birth=1861
ORDER BY full_name*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas|visible
owner:atlas-query
result:museum-name
---[SQL]---
SELECT full_name, nationality
FROM artist
WHERE birth = 1861
ORDER BY full_name;
---[SQL]---
The query returns an alphabetized list of the artists and their nationality. There are nine artists retrieved by this query, even though there are 421 artists on the artist table. The nine artists retrieved all meet the condition set by the WHERE clause that the artist was born in 1861.
---[SIDEBAR]---
"type":"note",
"title":"The Position of the WHERE Clause",
"text":"The WHERE clause must appear third in the SELECT statement. Since the SELECT and FROM clauses are required in all SELECT statements, they appear first and second. All other clauses appear after the WHERE clause (notice the position of the ORDER BY clause in the example query above)."
---[SIDEBAR]---
Consider the syntax of the WHERE clause. Generally a WHERE clause creates a logical test comparing values using a mathematical operator. In this example, the WHERE clause uses a logical test comparing the values in the birth column of the artist table with the value 1861. Those values on the table that are equal to 1861 meet the condition and will be retrieved by the query. Those that do not meet the condition are excluded from the query result. Only nine of the 421 artists in the table.
## The WHERE Clause Operators
You can use a number of different mathematical operators to compare values in the logical tests in the WHERE clause. Generally, these are constructed to compare the values in a column on the left side of an operator to a value on the right side (column = value). The example above uses the "=" operator, which requires that the values on the right side of the operator (in this case the values in the birth column) exactly match the value on the right side (the year 1861). The table below describes operators that can be used to compare values in the WHERE clause.
#### Where Clause Operators
| Operator | Description | Example |
|:--------:|--------------------------------------------------------------------------------|-------------------------------|
| = | The values in the specified column match a stated value | birth = 1861 |
| <> | The values in the specified column do not match a stated value | birth <> 1861 |
| != | The values in the specified column do not match a stated value | birth != 1861 |
| < | The values in the specified column are less than a stated value | birth < 1861 |
| <= | The values in the specified column are less than or equal to a stated value | birth <= 1861 |
| !< | The values in the specified column are not less than a stated value | birth !< 1861 |
| > | The values in the specified column are greater than a stated value | birth > 1861 |
| >= | The values in the specified column are greater than or equal to a stated value | birth >= 1861 |
| !> | The values in the specified column are not greater than a stated value | birth !> 1861 |
| BETWEEN | The values in the specified column are within the range of two stated values | birth BETWEEN 1860 AND 1861 |
| IS NULL | The values in the specified column are NULL (without data) | birth IS NULL |
## Checking Against a Single Value
The most basic WHERE clauses compare the values in a column to a single value. This is demonstrated in the last example. In the example, the value in the birth column had be 1861 for the row to be included in the query result. Consider two more examples using different operators.
Assume you want to artists, their nationality, and their birth for artists born before 1400. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE birth < 1400*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-birth-before-1400
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
where birth < 1400;
---[SQL]---
The query retrieves the name, nationality, and birth for artists born before 1400. There is only one artist born before 1400. Note that for an artist to be born before 1400, their birth year would be less than 1400.
You may want to also modify the last query to include any artists that were born in 1400 as well as those born before 1400. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE birth <= 1400*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-before-or-on-1400
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
where birth <= 1400;
---[SQL]---
In this query, the operator in the WHERE clause was modified include those artists born in 1400. The <= operator will select those artists born during or before 1400. Since there is an artist born in 1400 and an artist born before 1400, there are now two artists retrieved by the query.
## Checking for Nonmatches
In the previous examples, you have applied filters that select data that meet desired conditions. Sometimes it is easier to define the data you need by describing the values that should be excluded. For example, you may want to list the all artists who are not French. If you had to set up the filter only using matches, you would need to know all of the nationalities in the data and add conditions to include each nationality that isn't French. If there are French, English, and Canadian artists; the where clause would need conditions to include the English and Canadian artists and not the French. Imagine how difficult this would be if there are artists from 100 countries in the list. The WHERE clause would also need to be modified when artists with a new nationality is added to the query.
In this case, it is much easier to create a criterion that excludes the French artists. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE nationality <> 'French'*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
WHERE nationality <> 'French';
---[SQL]---
The condition in the WHERE clauses uses the <> operator. This means that when the value in the nationality column is not French, it will be retrieved by the query. There are 306 artists with 17 different nationalities who are not French. Notice the single quotes around the word French in the WHERE clause. Anytime the value you wish to compare is text, it must be enclosed within single quotes. Quotes are not used when the values are numbers.
---[SIDEBAR]---
"type":"note",
"title":"When to Use Single Quotes in a Condition",
"text":"When a value used in a condition in the WHERE clause is not a number, it must be enclosed within single quotes."
---[SIDEBAR]---
## Checking for a Range of Values
There are times that you want to retrieve data within a range of values. To do this, you use the BETWEEN and AND keywords in place of operators in the WHERE clause. For this condition to work, you need beginning and ending values for the range. When you use the BETWEEN and AND keywords, the DBMS will include data that matches the beginning and ending values. In other words, the condition BETWEEN 10 AND 20 is logically the same as the combined conditions >=10 *and* <=20.
---[SIDEBAR]---
"type":"note",
"title":"BETWEEN and AND",
"text":"The logical keywords BETWEEN and AND include the beginning and ending values. The condition BETWEEN 45 AND 50 is logically the same as the combined conditions >=45 *and* <=50"
---[SIDEBAR]---
Suppose that you want list the artists, their nationality, and their birth who were born between 1400 and 1500. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE birth BETWEEN 1400 AND 1500*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, nationality, birth
FROM artist
where birth BETWEEN 1400 and 1500;
---[SQL]---
This example uses the keywords BETWEEN and AND to define a criterion that includes values for the birth column that fall in the range of 1400 to 1500. The query returns 17 artists whose birth is in years within this range. The query will include the beginning and ending values for the birth column. In other words, it will retrieve artists whose birth falls in 1400 or 1500 if there are any.
## Checking for No Value
There are times when some of the entries in a column contain no values. There is a special term for this situation, that entry is said to contain a NULL value. A NULL value is not 0, a space, or even an blank block of text. It is just the absence of data.
---[SIDEBAR]---
"type":"term",
"title":"NULL",
"text":"When an entry for a column of data is empty, it is called a NULL value. NULL values are not 0, spaces, or even an blank collection of text. It is the absence of data for that column."
---[SIDEBAR]---
When creating a criterion to filter for NULL values, you need special syntax. The logical statement column = NULL will not work. To check if a value is NULL, use the keywords IS NULL.
Suppose that you want list the artist's first and middle names for those entries on the artist table with a NULL value for the artist's middle names. The SELECT statement to retrieve this data is:
*SELECT full_name, nationality, birth
FROM artist
WHERE middle_names IS NULL*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, middle_names
FROM artist
where middle_names IS NULL
---[SQL]---
This query lists all of the artists without middle names. Notice that 273 of the 421 artists on the artist table meet this criteria. NULL values do not always behave as you expect they might. For example, if you were to create a query to list all artists whose middle name is not "Van", you would expect to see all of the artists with a middle name other than "Van" - including those artists without a middle name. NULL values don't work that way, though. NULL values are not retrieved when you filter for non-matches. The syntax for this query is:
*SELECT full_name, middle_names
FROM artist
WHERE middle_names <> 'Van'*
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:museum-name-sorted
---[SQL]---
SELECT full_name, middle_names
FROM artist
where middle_names <> 'Van'
---[SQL]---
Notice that all of the artists retrieved by the query have middle names. None of the artists with a null value for the middle_names column are included in the query result, even though those artist's middle name is also not "Van".
---[SIDEBAR]---
"type":"caution",
"title":"NULL Values and Non-Matches",
"text":"NULL values are not retrieved when you filter for non-matches using the <> or != operators."
---[SIDEBAR]---
## Summary
In this lesson, you learned how to use the WHERE clause to filter the data retrieved by a SELECT statement. In particular, you learned how to filter the rows of a query result to match against a single value, filter the rows of a query result to look for non-matched values, filter the rows of a query result to check against a range of values, and filter the rows of a query result find rows without a value.