Christoph Schiessl's Blog

Caching Expensive Queries with Materialized Views in PostgreSQL

PostgreSQL 9.3 introduced a new feature referred to as materialized views. This article attempts to explain when to use it (based on a contrived example).


Let’s assume you are working on a small database to keep track of your customers and the invoices you are sending to them. Currently, your database consists of the following three tables:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name TEXT NOT NULL
);
CREATE TABLE invoices (
  id INT PRIMARY KEY,
  customer_id INT NOT NULL REFERENCES customers (id),
  date DATE NOT NULL
);
CREATE TABLE invoice_items (
  id INT PRIMARY KEY,
  invoice_id INT NOT NULL REFERENCES invoices (id),
  amount_in_cents INTEGER NOT NULL
);

As you can see, a customer has many invoices, which in turn has many invoice items.

With that in place, you now want a convenient way to calculate your customers’ total revenue. After thinking about the requirement for a second or two, you come up with the following query:

1
2
3
4
5
6
7
8
9
SELECT
  customers.*,
  SUM(invoice_items.amount_in_cents) AS revenue
FROM customers
  INNER JOIN invoices
    ON customers.id = invoices.customer_id
  INNER JOIN invoice_items
    ON invoices.id = invoice_items.invoice_id
GROUP BY customers.id ORDER BY revenue DESC;

While this query certainly works, it is pretty tedious to type repeatedly and may also be too slow for your needs. The easiest way to navigate around both of these problems is a materialized view:

1
2
3
4
5
6
7
8
9
10
CREATE MATERIALIZED VIEW customers_with_revenue AS
  SELECT
    customers.*,
    SUM(invoice_items.amount_in_cents) AS revenue
  FROM customers
    INNER JOIN invoices
      ON customers.id = invoices.customer_id
    INNER JOIN invoice_items
      ON invoices.id = invoice_items.invoice_id
  GROUP BY customers.id ORDER BY revenue DESC;

Firstly, it’s very convenient to use, because it hides the complex query from you – just like an ordinary view would. Secondly, it’s also improving performance because it’s caching the query’s result.

How does it work? Well, when you create the materialized view, it’s actually computing the given SELECT statement immediately and persisting its result set. Later on, when you select something from the view, it’s using that pre-computed result set, instead of computing the defining SELECT-query again and again.

There’s only one drawback: If the underlying data changes, you have to remember to refresh the materialized view manually. This can be accomplished as follows:

1
REFRESH MATERIALIZED VIEW customers_with_revenue;

If your data doesn’t change very frequently, you might want to consider using a TRIGGER to execute the refresh-statement automatically.


Notes
Software: PostgreSQL v9.3.4

comments powered by Disqus