Implementing COUNT(DISTINCT) with Fast Refresh Materialized Views
Author: Dallas Willett | 11 min read | December 24, 2010
Last week when we talked about how to implement a fast refresh materialized view with a MAX aggregate function. This week we will tackle another restriction on fast refresh materialized views. We will work through an example on how to modify a materialized view that includes a COUNT(DISTINCT) in order to get it to fast refresh.
An example COUNT(DISTINCT) query
One of the first SQL queries you probably learned to write was to get the number of rows using COUNT(*). If you want to get the number of distinct values from a set of rows, you’ll want to change this COUNT(*) into a COUNT(DISTINCT). Let’s code up an example with both COUNT(*) and COUNT(DISTINCT) using the same orders table that we used last week. Here is that table with some sample data.
CREATE TABLE orders ( order_id NUMBER(38) NOT NULL, customer_id NUMBER(38) NOT NULL, order_date DATE NOT NULL, state CHAR(1) NOT NULL, order_amount NUMBER(18,2) NULL, CONSTRAINT orders_pk PRIMARY KEY (order_id) ); SQL> SELECT * FROM orders ORDER BY TRUNC(order_date), customer_id; ORDER_ID CUSTOMER_ID ORDER_DAT S ORDER_AMOUNT ---------- ----------- --------- - ------------ 5 1010 06-DEC-10 C 42.42 3 1010 07-DEC-10 C 123.41 4 1010 07-DEC-10 C 55.01 7 1020 07-DEC-10 C 99.15 8 1020 07-DEC-10 C 74.85 2 1010 08-DEC-10 O 12.99 1 1010 08-DEC-10 C 5.05 6 1020 08-DEC-10 O 9 1030 08-DEC-10 O 13.33
For our COUNT(DISTINCT) example, let’s group by each day and get both the number of orders for that day as well as the number of distinct customers who placed orders on that day. We’ll use this example SQL query as the basis for our materialized view (mview).
SQL> SELECT TRUNC(order_date) as order_day , COUNT(*) as num_daily_orders , COUNT(DISTINCT customer_id) as num_daily_customers FROM orders GROUP BY TRUNC(order_date) ORDER BY TRUNC(order_date); ORDER_DAY NUM_DAILY_ORDERS NUM_DAILY_CUSTOMERS --------- ---------------- ------------------- 06-DEC-10 1 1 07-DEC-10 4 2 08-DEC-10 4 3
We can see in our example data that on December 8th, there were four orders placed by three different customers.
What happens when we try to use COUNT(DISTINCT) in a fast refresh materialized view?
Let’s try to create a fast refresh mview with the example query we came up with above. We will need a materialized view log (mlog) on the orders table, so let’s create that first and see what happens when we try to create the mview.
CREATE MATERIALIZED VIEW LOG on orders WITH PRIMARY KEY, ROWID, SEQUENCE (order_date, customer_id) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW orders_by_day_mv REFRESH FAST AS SELECT TRUNC(order_date) as order_day , COUNT(*) as num_daily_orders , COUNT(DISTINCT customer_id) as num_daily_customers FROM orders GROUP BY TRUNC(order_date); ERROR at line 7: ORA-12015: cannot create a fast refresh materialized view from a complex query
We get an ORA-12015 error, telling us that the mview cannot be created. Maybe DBMS_ADVISOR has something to offer us in terms of fixing this?
SQL> set serveroutput on SQL> DECLARE v_task_name varchar2(30); BEGIN DBMS_ADVISOR.TUNE_MVIEW(v_task_name,'CREATE MATERIALIZED VIEW orders_by_day_mv REFRESH FAST AS SELECT TRUNC(order_date) as order_day, COUNT(*) as num_daily_orders, COUNT(DISTINCT customer_id) as num_daily_customers FROM orders GROUP BY TRUNC(order_date)'); dbms_output.put_line(v_task_name); END; / TASK_433 PL/SQL procedure successfully completed. SELECT * FROM user_tune_mview WHERE task_name = 'TASK_433' ORDER BY action_id; TASK_NAME ACTION_ID SCRIPT_TYPE ------------------------------ ---------- -------------- STATEMENT -------------------------------------------------------------------------------- TASK_433 3 IMPLEMENTATION CREATE MATERIALIZED VIEW EXAMPLE.ORDERS_BY_DAY_MV REFRESH FAST WITH ROWID DISA BLE QUERY REWRITE AS SELECT EXAMPLE.ORDERS.ORDER_DATE C1, EXAMPLE.ORDERS.CUSTOME R_ID C2, COUNT(*) M1 FROM EXAMPLE.ORDERS GROUP BY EXAMPLE.ORDERS.ORDER_DATE, EXA MPLE.ORDERS.CUSTOMER_ID TASK_433 4 UNDO DROP MATERIALIZED VIEW EXAMPLE.ORDERS_BY_DAY_MV
What Oracle is basically telling us to do here is to remove the COUNT(DISTINCT) from our mview query and to change the GROUP BY clause from TRUNC(order_date) to a combination of order_date and customer_id. Not what we wanted, but somewhat helpful, as we will see later on.
It looks like we will be unable to implement a fast refresh mview that has a COUNT(DISTINCT) in it. So is this the end of the line? Are we stuck? No. We can create the fast refresh mview with the structure and view of the data the way we want, we’ll just need to find another way without the COUNT(DISTINCT).
A workaround for COUNT(DISTINCT) in a fast refresh materialized view
If we cannot create a fast refresh mview that includes a COUNT(DISTINCT), then perhaps we can rewrite the mview to get rid of the COUNT(DISTINCT). We still want the data in the same format that COUNT(DISTINCT) provided us, we just want to change the implementation. So how can we do this? The trick will be to create an mview on top of an mview.
For our two mview solution, we will want one mview that looks like the original SQL query. Let’s call this the “top” mview and this will be the mview that our end users will query. The “bottom” mview will look similar to the materialized view that DBMS_ADVISOR suggested. The two mviews will have different primary keys. The top mview will have a PK of order_day and the bottom mview will have a PK of order_day and customer_id. Notice how the bottom mview has the additional customer_id in it’s primary key and customer_id was also the column we were doing the COUNT(DISTINCT) on.
Let’s create this bottom mview first and name it “orders_by_day_and_cust_mv”. We’ll build the primary key for this mview based on the columns we are grouping by.
CREATE MATERIALIZED VIEW orders_by_day_and_cust_mv REFRESH FAST AS SELECT TRUNC(order_date) as order_day , customer_id , COUNT(*) as num_daily_customer_orders FROM orders GROUP BY TRUNC(order_date) , customer_id; ALTER TABLE orders_by_day_and_cust_mv ADD CONSTRAINT pk_orders_by_day_and_cust_mv PRIMARY KEY (order_day, customer_id);
The top mview will query this bottom mview we just created. We will implement the COUNT(*) from our original query as a SUM of num_daily_customer_orders and the COUNT(DISTINCT) as a COUNT of num_daily_customer_orders. The bottom mview will need a materialized view log as well, so we’ll create that first.
CREATE MATERIALIZED VIEW LOG on orders_by_day_and_cust_mv WITH PRIMARY KEY, ROWID, SEQUENCE (num_daily_customer_orders) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW orders_by_day_mv REFRESH FAST AS SELECT order_day , SUM(num_daily_customer_orders) as num_daily_orders , COUNT(num_daily_customer_orders) as num_daily_customers FROM orders_by_day_and_cust_mv GROUP BY order_day; SQL> SELECT * FROM orders_by_day_mv ORDER BY order_day; ORDER_DAY NUM_DAILY_ORDERS NUM_DAILY_CUSTOMERS --------- ---------------- ------------------- 07-DEC-10 1 1 08-DEC-10 4 2 09-DEC-10 4 3
So far so good. We have removed the COUNT(DISTINCT) from our mviews and the data looks exactly like the output of our example query. We missed something though, which causes us to get an error when we UPDATE some data and then try to refresh the mviews. Let’s see what that looks like and then we’ll fix that in a minute.
SQL> UPDATE orders SET order_date = order_date+1 WHERE customer_id = 1030; 1 row updated. SQL> COMMIT; Commit complete. SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv','FAST'); PL/SQL procedure successfully completed. SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST'); BEGIN DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST'); END; * ERROR at line 1: ORA-32314: REFRESH FAST of "EXAMPLE"."ORDERS_BY_DAY_MV" unsupported after deletes/updates ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2251 ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2457 ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2426 ORA-06512: at line 1
This ORA-32314 is easy enough to resolve. We just forgot to add an addition column because of the SUM aggregate function. Oracle requires us to have both COUNT(*) and COUNT(column_name) on our fast refresh mviews if we have SUM(column_name). In this example, we have SUM(num_daily_customer_orders), and had already added COUNT(num_daily_customer_orders), so let’s just add the missing COUNT(*) and try our UPDATE again.
CREATE MATERIALIZED VIEW orders_by_day_mv REFRESH FAST AS SELECT order_day , SUM(num_daily_customer_orders) as num_daily_orders , COUNT(num_daily_customer_orders) as num_daily_customers , COUNT(*) as prevent_errors1 FROM orders_by_day_and_cust_mv GROUP BY order_day; SQL> UPDATE orders SET order_date = order_date+1 WHERE customer_id = 1030; 1 row updated. SQL> COMMIT; Commit complete. SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv','FAST'); PL/SQL procedure successfully completed. SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST'); PL/SQL procedure successfully completed. SQL> SELECT * FROM orders_by_day_mv ORDER BY order_day; ORDER_DAY NUM_DAILY_ORDERS NUM_DAILY_CUSTOMERS PREVENT_ERRORS1 --------- ---------------- ------------------- --------------- 06-DEC-10 1 1 1 07-DEC-10 4 2 2 08-DEC-10 3 2 2 09-DEC-10 1 1 1
It works just the way we wanted. The only thing left to decide is how we are going to refresh both the top and bottom mviews together.
Your choice on how to fast refresh the two mview solution
We need a plan to refresh both mviews in our two mview approach to this problem. Last week when we talked about how to get an mview to fast refresh with a MAX aggregate function, we ended up creating a PL/SQL procedure to do our refresh. We had some custom code in that example, so a PL/SQL procedure was the way to go. We could use that same approach here, or we could use the built-in functionality in the DBMS_MVIEW.REFRESH procedure to handle this. The choice is yours.
PL/SQL procedure approach
This is fairly straight forward. We want to refresh the bottom mview before refreshing the top mview. This could be something as simple as:
CREATE OR REPLACE PROCEDURE ref_orders_by_day_mv AS BEGIN DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv','FAST'); DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST'); END; /
Or we might modify the procedure to optionally support a complete refresh of both mviews:
CREATE OR REPLACE PROCEDURE ref_orders_by_day_mv (p_refresh_mode IN VARCHR2 DEFAULT 'FAST') AS BEGIN DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv',p_refresh_mode); DBMS_MVIEW.REFRESH('orders_by_day_mv',p_refresh_mode); END; /
Nested refresh approach
Last week we were executing a MERGE statement inside our PL/SQL refresh procedure. In this example, we are using only the functionality of mviews as provided by Oracle, and Oracle has an optional parameter in the DBMS_MVIEW.REFRESH procedure that will refresh both mviews for us. If we supply a TRUE value to the “nested” parameter while executing a refresh of the top mview, Oracle will refresh the bottom mview first, followed by the top mview. Exactly what we want.
EXEC DBMS_MVIEW.REFRESH(list=>'orders_by_day_mv',method=>'FAST',nested=>TRUE);
A final look at the code
This code was tested on both 10g and 11g versions of Oracle. We started with the following materialized view definition that failed to create an mview.
CREATE MATERIALIZED VIEW LOG on orders WITH PRIMARY KEY, ROWID, SEQUENCE (order_date, customer_id) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW orders_by_day_mv REFRESH FAST AS SELECT TRUNC(order_date) as order_day , COUNT(*) as num_daily_orders , COUNT(DISTINCT customer_id) as num_daily_customers FROM orders GROUP BY TRUNC(order_date);
And we ended with solution that used two materialized views to remove the COUNT(DISTINCT) from our query without changing the final structure or data in the query. For the refresh of the mview, we changed our call to DBMS_MVIEW.REFRESH to pass in a TRUE value to the “nested” parameter.
CREATE MATERIALIZED VIEW LOG on orders WITH PRIMARY KEY, ROWID, SEQUENCE (order_date, customer_id) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW orders_by_day_and_cust_mv REFRESH FAST AS SELECT TRUNC(order_date) as order_day , customer_id , COUNT(*) as num_daily_customer_orders FROM orders GROUP BY TRUNC(order_date) , customer_id; ALTER TABLE orders_by_day_and_cust_mv ADD CONSTRAINT pk_orders_by_day_and_cust_mv PRIMARY KEY (order_day, customer_id); CREATE MATERIALIZED VIEW LOG on orders_by_day_and_cust_mv WITH PRIMARY KEY, ROWID, SEQUENCE (num_daily_customer_orders) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW orders_by_day_mv REFRESH FAST AS SELECT order_day , SUM(num_daily_customer_orders) as num_daily_orders , COUNT(num_daily_customer_orders) as num_daily_customers , COUNT(*) as prevent_errors1 FROM orders_by_day_and_cust_mv GROUP BY order_day; EXEC DBMS_MVIEW.REFRESH(list=>'orders_by_day_mv',method=>'FAST',nested=>TRUE);
Dallas Willett
Datavail – Remote DBA Services