Caching Expensive Queries with Materialized Views in PostgreSQL
PostgreSQL 9.3 introduced a new feature referred to as materialized views. This article attempts to explain when to use it (based on a contrived example).
Let’s assume you are working on a small database to keep track of your customers and the invoices you are sending to them. Currently, your database consists of the following three tables:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
CREATE TABLE customers ( id INT PRIMARY KEY, name TEXT NOT NULL ); CREATE TABLE invoices ( id INT PRIMARY KEY, customer_id INT NOT NULL REFERENCES customers (id), date DATE NOT NULL ); CREATE TABLE invoice_items ( id INT PRIMARY KEY, invoice_id INT NOT NULL REFERENCES invoices (id), amount_in_cents INTEGER NOT NULL );
As you can see, a customer has many invoices, which in turn has many invoice items.
With that in place, you now want a convenient way to calculate your customers’ total revenue. After thinking about the requirement for a second or two, you come up with the following query:
1 2 3 4 5 6 7 8 9
SELECT customers.*, SUM(invoice_items.amount_in_cents) AS revenue FROM customers INNER JOIN invoices ON customers.id = invoices.customer_id INNER JOIN invoice_items ON invoices.id = invoice_items.invoice_id GROUP BY customers.id ORDER BY revenue DESC;
While this query certainly works, it is pretty tedious to type repeatedly and may also be too slow for your needs. The easiest way to navigate around both of these problems is a materialized view:
1 2 3 4 5 6 7 8 9 10
CREATE MATERIALIZED VIEW customers_with_revenue AS SELECT customers.*, SUM(invoice_items.amount_in_cents) AS revenue FROM customers INNER JOIN invoices ON customers.id = invoices.customer_id INNER JOIN invoice_items ON invoices.id = invoice_items.invoice_id GROUP BY customers.id ORDER BY revenue DESC;
Firstly, it’s very convenient to use, because it hides the complex query from you – just like an ordinary view would. Secondly, it’s also improving performance because it’s caching the query’s result.
How does it work? Well, when you create the materialized view, it’s actually
computing the given
SELECT statement immediately and persisting its result
set. Later on, when you select something from the view, it’s using that
pre-computed result set, instead of computing the defining
There’s only one drawback: If the underlying data changes, you have to remember to refresh the materialized view manually. This can be accomplished as follows:
REFRESH MATERIALIZED VIEW customers_with_revenue;
If your data doesn’t change very frequently, you might want to consider using a
TRIGGER to execute the refresh-statement automatically.
Software: PostgreSQL v9.3.4