This is a me too post inspired by a post by BarneyB. It drives me nuts the number of weird APIs that exist to “abstract” SQL when SQL is already one of the most elegant and widely known DSL’s around.
OK, usually I’d just cheer from the sidelines/comments thread, but I might actually be able to add some value here by fleshing out Barney’s point about the Structured aspect of Structured Query Language. The natural home for Barney’s Lego pieces is in SQL views (unless you’re using an old version of MySQL, in which case – sorry). How to best organize views? I define these sorts of objects I might use in a query:
- A base table
- A view that embodies a business rule.
- A view that pulls some data for display
And the rules are these:
- Business rule views must return only primary keys
- Data views must not be re-used by other views
Plus a couple of supporting observations regarding performance. What reallly stumps the query optimizer (at least for SQL Server) is:
- A large number of joins in a single statement. The optimizer copes well with very deeply nested views, as long as each view is relatively small (3-4 joins).
- Text data – whether as join criteria, in WHERE clauses, or even just in the SELECT list
Time for a fully worked example. I won’t write out the DDL, but hopefully you can fill in the blanks. Let’s say we have an access control system with these entities:
Person
Group
GroupMember (joins Person with Group)
MembershipStatus (pending, denied or approved)
Just to add spice, we’re using soft deletes for the Person table, so we have a deleted flag as well. Our task is to write a query that will give us the email address of all the people whose membership has been approved.
First, we need to sort out the soft deletes:
create view vwActivePerson
as
select
personID
from
Person
where
deleted = 0
Approved memberships look like this:
create view vwApprovedMemberships as
select
personID,
groupID
from
GroupMembership
inner join Status
on GroupMembership.statusID = Status.statusID
where
Status.code = "approved"
We use the text status code here so we have a human-readable query, but we limit its damage by only using it in this one view. Every other query that needs this concept simply joins to this view. So this is a business-rule view that defines the concept of an approved membership.
Finally, to pull our email addresses we do this:
create view vwData_approvedMembershipEmails
as
select
Person.email
from
Person
inner join vwApprovedMemberships
on Person.personID = vwApprovedMemberships.personID
inner join vwActivePerson
on Person.personID = vwActivePerson.personID
Ok, seeing as I’m having such fun with the ul tag, here’s a list of things to note about this query:
- There’s no WHERE clause. WHERE clauses embodying business rules often end up sprinkled around ad hoc query text, but we’ve avoided this by using business-rule views as filtering mechanisms. In SQL terms, I’m setting up a bunch of set intersections.
- Similarly, note that no fields from the business-rule views appear in the select list. There’s no reason they shouldn’t, but this does highlight the fact that those views are there for filtering, not for data extraction.
- The Person table is involved in this query in two places, once directly and once via the view. This is the typical tension between DRY and encapsulation, and in this case encapsulation has prevailed. However, I can reassure you that if your business rules return only foreign and primary keys, you can have dozens of them in a query without much impacting performance. You could have another view that filters for e.g. people with more than one denied group membership if that’s a useful concept. Maybe it’s a bit of a stretch, but I think of this as multiple inheritance for SQL.
- I’ve named this as a data view to remind programmers not to base other views on this one.
The no-reuse rule for data views has a couple of nice bonus side-effects. Because the view contains no intrinsic business logic all it is doing is marshalling the exact data needed for a specific widget or batch process. That means you can refactor these data views with gay abandon – if you delete the widget, just delete the view – and you can tune the data returned by the view for maximum efficiency without regard to generality.
Barney used mostly subquery syntax instead of joins, but the basic idea doesn’t change. You’ll notice Barney’s subqueries all return primary keys only. I like joins because they highlight the set-based nature of SQL whereas subqueries look like a for-loop in disguise, but that’s just personal preference.
Finally, what about stored procedures? I don’t consider stored procs to be part of the SQL DML (Data Manipulation Language) – they just provide somewhere for DML to live, plus that nasty procedural stuff that you sometimes can’t avoid. So if you’re a stored proc shop, everything above still holds. You just might be invoking your data views via a proc rather than directly.