Speaking at cfObjective(ANZ) 2011

I’ve had the honour of being accepted as a speaker at this year’s cfObjective(ANZ) conference in Melbourne. My topic will be “Why bother with OOP?”, which is a question that needs to be asked from time to time. By the way, in case you think I might be either a procedural Luddite or a functional zealot, I think we should bother with OOP – but we should know why we are doing it.

It’s a live issue for ColdFusion in a way that doesn’t apply to, say, Java, because in ColdFusion we have some very effective ways to write simple but powerful apps without writing any OO code at all. Object orientation, like most software design techniques, is a way to manage complexity. What if your platform has abstracted away so much of the complexity that there’s not much left to manage? That’s the situation some simple ColdFusion apps are in.

If you can’t make it to the conference, I’ll blog a bit more about the talk after the fact (i.e. once I’ve written it).

Media organizer blues

Both Windows Media Player and iTunes have some fundamental flaws that make them unsuitable for managing my music teacher wife’s music library.

A media organizer is in essence a pretty simple beast. There are some amazing bells and whistles out there, but basically a media organizer is just a way to manage file metadata (I consider playlist membership to be file metadata). Modern operating systems will now rip, burn and perform (some) device synchronization out of the box, but they still drop the ball when it comes to metadata management, despite having all the required support structures under the hood. If you’ve ever tried to manage playlists using nothing but Windows Explorer and shortcuts, you’ll know what I mean.

Fortunately just about every media organizer does a great job of managing playlists. Where the big two fall down massively is in an area that really should be the absolute bedrock functionality, which is the way they interact with the file system.

Filesystem synchronization

Windows Media Player (WMP) as of Windows 7 still does not provide any sensible way to keep its library consistent with the file system. The state of the art is to delete your entire library and re-import it. There are a variety of 3rd party add-ons to do things like directory watching and orphan pruning. If you got the right set of those installed and working together (and trojan-free), you’d have a workable system.

iTunes does a decent job of keeping itself consistent with its own special area of the file system, but you’re on your own if you want to have any say over how your music is filed. You can update the iTunes library by re-importing your set of folders, but on Windows this has one fatal flaw. iTunes will convert any .wma files it finds to some more Mac-ish format (mp4 maybe? can’t remember). Not only does this take forever, but iTunes does not remember which files it has already converted, meaning if you import once a week for five weeks, you end up with five versions of every wma file you have.
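The consistency check described above (add new files, prune orphans) is essentially a set difference between what the library knows about and what is actually on disk. Here is a minimal Python sketch, assuming a library that is just a dictionary keyed by file path; `scan` and `reconcile` are hypothetical names, not part of any media organizer's API. Keying on the path is what keeps a weekly re-import from creating duplicates.

```python
import os


def scan(root: str, extensions: tuple[str, ...] = (".mp3", ".wma", ".m4a", ".mid")) -> set[str]:
    """Collect the paths of all media files under root; the file system is the source of truth."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                found.add(os.path.join(dirpath, name))
    return found


def reconcile(library: dict[str, dict], on_disk: set[str]) -> tuple[set[str], set[str]]:
    """Make a library keyed by file path match the files on disk.

    Returns (added, pruned). Because entries are keyed by path, running this
    repeatedly never adds the same file twice.
    """
    known = set(library)
    added = on_disk - known
    pruned = known - on_disk
    for path in pruned:
        del library[path]                # orphan: the file no longer exists
    for path in added:
        library[path] = {"path": path}   # new file: real code would read its tags here
    return added, pruned


library = {
    "Lessons/old exam piece.wma": {"path": "Lessons/old exam piece.wma"},
    "Albums/01 Overture.mp3": {"path": "Albums/01 Overture.mp3"},
}
on_disk = {"Albums/01 Overture.mp3", "Lessons/new duet.mid"}
first = reconcile(library, on_disk)
second = reconcile(library, on_disk)
print(second)  # (set(), set())
```

In real use `on_disk` would come from `scan()` over the music folders.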

So the notion that the media library should actually reflect the filesystem on which it is based seems to be beyond both Microsoft and Apple. It gets worse though.

Title tags

Both of these packages seem to assume that every piece of music you have was bought in a store and arrives fully tagged with title, artist, album and genre. This isn’t always the case, especially for musicians. There is one piece of “metadata” that every file reliably must have – its filename. Unfortunately, this is the one piece that both WMP and iTunes decline to notice. When burning a CD or synchronizing to a device, these packages use the title field. If that’s missing, they simply number tracks sequentially, so the file “my great accompaniment in C major.mp3” becomes “track 17” when it gets to the iPod. Worse than that, neither package provides a way to use the filename to fill in missing title fields.
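Copying filenames into empty title fields, the job neither WMP nor iTunes will do, is simple enough to sketch. This minimal Python version works on plain dictionaries (hypothetical track records with `path` and `title` keys) rather than real tag data; writing the titles back into the files themselves would need a tag-editing library, which is deliberately left out.

```python
from pathlib import PurePath


def title_from_filename(path: str) -> str:
    """Derive a display title from a file path: drop the folders and the extension."""
    return PurePath(path).stem.strip()


def fill_missing_titles(tracks: list[dict]) -> list[dict]:
    """Return copies of the track records, using the filename wherever the title is blank."""
    filled = []
    for track in tracks:
        title = (track.get("title") or "").strip()
        filled.append({**track, "title": title or title_from_filename(track["path"])})
    return filled


# Hypothetical track records: a path plus whatever title tag the file carried.
tracks = [
    {"path": "Lessons/my great accompaniment in C major.mp3", "title": ""},
    {"path": "Albums/01 Overture.mp3", "title": "Overture"},
    {"path": "Lessons/scales warm-up.mid", "title": None},
]
for track in fill_missing_titles(tracks):
    print(track["title"])
```

Existing titles are left alone, so the transfer can safely be run over the whole library.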

A solution

No doubt this tale of woe is a well-trodden path and you’re all muttering “Just get [real media organizer brand X] and stop whining, for Pete’s sake”. Anyway, in our case brand X is J. River’s Media Center. Is this the best media organizer? I have no idea. After beating my head against WMP and iTunes I didn’t have the energy to do the full comparison. Is it free? Nope.

It does take a sensible attitude to keeping the library in sync (i.e. it works). It does still have the bad attitude about even a blank title tag being preferable to a full filename, but at least it provides tools to transfer filenames to titles in bulk. Incidentally, the bad attitude seems to be a new “feature”, as I’m pretty sure older versions used to happily use the filename. And it has a strange habit of importing MIDI files into the video section of the library, even though the .mid file extension has been configured as an audio type. Once again, the tools are there to bulk transfer them back into the audio section.

The real solution

This entire problem would just go away if we could do one simple thing – have a file in more than one directory, which is all a playlist really is. The file system structures are all there, only the UI is lacking. My next step (once the blood pressure has subsided a bit) is to investigate 3rd party file managers. I’ll keep you posted.
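On NTFS (and most Unix file systems), a hard link is one way to have the same file in more than one directory. This Python sketch treats each playlist as a folder of hard links; `add_to_playlist` is a hypothetical helper, and whether a given media organizer or file manager copes well with hard links is an assumption you would need to test. Hard links only work within a single volume.

```python
import os
import tempfile


def add_to_playlist(track: str, playlist_dir: str) -> str:
    """'Add' a track to a playlist folder by hard-linking it: one file, several directory entries."""
    os.makedirs(playlist_dir, exist_ok=True)
    link = os.path.join(playlist_dir, os.path.basename(track))
    if not os.path.exists(link):
        os.link(track, link)   # both names now point at the same data on disk
    return link


def demo() -> tuple[bool, bool, int]:
    with tempfile.TemporaryDirectory() as root:
        track = os.path.join(root, "Library", "scale practice.mp3")
        os.makedirs(os.path.dirname(track))
        with open(track, "wb") as f:
            f.write(b"not really audio")
        warm_ups = add_to_playlist(track, os.path.join(root, "Playlists", "Warm-ups"))
        grade_3 = add_to_playlist(track, os.path.join(root, "Playlists", "Grade 3"))
        return (os.path.samefile(track, warm_ups),
                os.path.samefile(track, grade_3),
                os.stat(track).st_nlink)   # one link count per directory entry


print(demo())  # (True, True, 3)
```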

Javascript’s attire considered insufficient

Perhaps the emperor isn’t entirely starkers, but he isn’t dressed in much more than a grubby loincloth. Which is to say that I’ve often been bemused lately by the enthusiasm with which web developers have taken up the notion of coding entire applications in javascript. The word “notion” is important here – I don’t know if it’s actually being done to any significant degree, but lots of people seem to be reading a book or writing a toy app. Anyway, I had put my bemusement down to my own ignorance and lack of insight, so I was somewhat heartened (and amused) to come across this interview with Gilad Bracha, who had some choice comments on the subject.

In response to a question from Markus:

“…negative influences, languages you wouldn’t want to influence your [language] – I’m thinking about javascript in particular…”

Gilad replies:

“…javascript is a fairly poorly thought-out language, considering its influences were Scheme and Smalltalk it’s rather sad what came out…”
“…we are relatively lucky ’cause a lot worse [than javascript as the default in-browser language] could have happened, I mean javascript is a wonderful assembly language…people should not be programming the web directly in javascript, they should program in whatever they want and they should compile it down…”

More in this vein can be found on Gilad’s blog.

Interestingly, javascript as a compilation target is an essential part of the recent crop of cross-platform mobile development tools.

View/model coupling

How important is it to decouple view code from the model?

Jeffry Houser and I contradicted each other on this point recently. The fact that we were both right (IMHO) led me to re-examine the issue and write this post.

My point: if I have a view that’s drawing, say, this page, it’s dependent on a blog data model. The view has to know about comments, replies, tags – all that blog stuff. Therefore the view is coupled to the blog data model, at least insofar as it has to know about all the same concepts. So there’s a semantic coupling. There are also the more practical considerations of field types and data lengths. You have to know stuff about the model to draw the view.

Therefore: why not just have the view invoke the model’s API directly? I contend this does not actually increase the coupling, which is already extensive. There are only two reasons not to take this approach. Firstly, you genuinely expect to use this same view with another model. So, for example, I could take this page from WordPress and use it on top of BlogCFC. (Does that ever happen? I’ve never heard of it.) Secondly, you have a 2-tier or n-tier architecture and remote invocation of model objects is problematic. This is a great argument for 1-tier architectures, but that’s a whole other discussion.

Jeffry’s point: reusable UI components are reusable precisely because they are decoupled from any model. Imagine if we needed a different text input control for each application!

Well, he’s right, isn’t he? Succinct, too. I could argue that there’s some fundamental difference between reusable components and full views, but the dividing line is so blurry that it becomes a circular argument – a full view is something that is dependent on a particular model, a component is something that is not.

More fruitfully, let’s take a look at Smalltalk-80 MVC. I’d strongly encourage anyone who uses the term “MVC” to read this in full, but I’ll summarize the main points here:

  • the view invokes the model directly when it needs data
  • the controller fields user input and updates the model and/or the view accordingly
  • the model may broadcast change events, which the view can register to receive (observer pattern)
  • there may be a nested hierarchy of views (composite pattern); in this case there will be an exactly parallel nested hierarchy of controllers
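As a rough illustration of these points, here is a minimal Python sketch of a model, view and controller for a comment list. The class names are hypothetical and "rendering" is just building a string; what matters is the direction of the dependencies: the view reads the model directly and registers as an observer, the controller turns input into model updates, and the model knows nothing about either.

```python
from typing import Callable


class CommentListModel:
    """Model: owns the data and broadcasts change events; knows nothing about views."""

    def __init__(self) -> None:
        self._comments: list[str] = []
        self._observers: list[Callable[[], None]] = []

    def add_observer(self, callback: Callable[[], None]) -> None:
        self._observers.append(callback)

    def comments(self) -> list[str]:
        return list(self._comments)

    def add_comment(self, text: str) -> None:
        self._comments.append(text)
        for notify in self._observers:   # observer pattern: tell whoever registered
            notify()


class CommentListView:
    """View: asks the model for data whenever it draws, and redraws on change events."""

    def __init__(self, model: CommentListModel) -> None:
        self.model = model
        self.rendered = ""
        model.add_observer(self.render)
        self.render()

    def render(self) -> None:
        self.rendered = "\n".join(f"- {c}" for c in self.model.comments())


class CommentController:
    """Controller: fields user input and updates the model accordingly."""

    def __init__(self, model: CommentListModel) -> None:
        self.model = model

    def on_submit(self, form_text: str) -> None:
        if form_text.strip():
            self.model.add_comment(form_text.strip())


model = CommentListModel()
view = CommentListView(model)
controller = CommentController(model)
controller.on_submit("Nice post")
controller.on_submit("   ")   # ignored: blank input
print(view.rendered)  # - Nice post
```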

(Lest you suspect that this only applies to weird dead programming languages, here’s a modern reference to the same concepts).

With this in mind, I’ll accommodate Jeffry’s argument by saying there’s also a nested hierarchy of models, one for each view. So for the area on this page that displays comments, I can just pass the comments list for this post. The comments list, along with related objects, is the model for that sub-view.

Let’s dive right down to the level of a text input component. This is a very simple sub-view, eminently reusable. Its sub-model is defined as a single object of type “string”. The text input is very tightly coupled to that model – it simply doesn’t make sense if you give it an array, for example. Fortunately, my overall model has lots of instances of that sub-model – i.e. it has fields of type “string”. Because the text input’s model is literally embedded in my overall model, I don’t even have to do any conversion. This might not be the case if, for instance, the text input wanted Unicode strings and my model only had ASCII. But essentially, I get to reuse that component because I have already reused its related model.
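The nested-models idea fits in a few lines of Python. In this hypothetical example, a generic `TextInput` sub-view takes one string attribute of the real model as its entire model, while a blog-specific `CommentsView` is handed just the post's comments list. Nothing is copied into value objects: each sub-view works directly on a piece of the actual model.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    body: str


@dataclass
class Post:
    title: str
    comments: list[Comment] = field(default_factory=list)


class TextInput:
    """Generic sub-view. Its entire model is one string attribute of some object."""

    def __init__(self, owner: object, attr: str) -> None:
        self.owner, self.attr = owner, attr

    def render(self) -> str:
        return f'<input value="{html.escape(getattr(self.owner, self.attr))}">'

    def on_change(self, new_value: str) -> None:
        setattr(self.owner, self.attr, new_value)   # writes straight into the real model


class CommentsView:
    """Blog-specific sub-view. Its model is the comments list, not the whole post."""

    def __init__(self, comments: list[Comment]) -> None:
        self.comments = comments

    def render(self) -> str:
        return "".join(f"<p>{html.escape(c.author)}: {html.escape(c.body)}</p>" for c in self.comments)


post = Post("View/model coupling", [Comment("Jeffry", "Reusable components!")])
title_input = TextInput(post, "title")        # sub-model: post.title, a plain str
comments_view = CommentsView(post.comments)   # sub-model: post.comments
title_input.on_change("View/model coupling, revisited")
print(title_input.render())
print(comments_view.render())
```

`TextInput` is reusable precisely because "a string field on some object" is a model that nearly every application already has.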

So, to generalize:

  • Every view is tightly coupled to its model. This is necessary and desirable.
  • A generic view will need a generic model
  • Reusability of a generic view is dependent on the reusability of its related generic model.

In conclusion: if you’re busy copying model data into value objects to pass to a view, and you are not explicitly creating a reusable component (and you don’t have remoting issues) – you’re wasting your time! Just pass the model object and be done with it.

Model-Glue

This is a response to Jeffry Houser’s critique of Model-Glue. You should read Jeffry’s post before this one, as I directly respond to some of his points. To cut to the chase – me too, Jeffry, me too!

I’ve used M-G for two medium size projects. Like Jeffry, I can’t see a case where I’d use it again.

I couldn’t agree more about the event/view structure – this is just a global scope by another name, which as a way of passing variables takes us back about 40 years in programming language evolution. Yes, there are intelligent ways to use it, but the fact is that a robust mechanism for defining APIs already exists in the language (public function parameter lists) and a really great argument needs to be mounted for disregarding it. So does the rest of M-G mount that argument?
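To make the global-scope comparison concrete, here is a small Python sketch contrasting the two styles. The dictionary stands in for a framework event object; it is not Model-Glue's actual API, and all the function names are hypothetical. Both styles produce the same result, but only the second states its inputs and outputs in the function signatures.

```python
from typing import Any


# Style 1: every step reads and writes a shared bag of values (like a framework's event object).
# Nothing in the signatures says which keys a step needs or produces.
def load_user_step(event: dict[str, Any]) -> None:
    event["user"] = {"id": event["userId"], "name": f"user{event['userId']}"}


def render_profile_step(event: dict[str, Any]) -> str:
    return "Profile: " + event["user"]["name"]   # only works if load_user_step ran first


# Style 2: the same work with explicit parameter lists; the API is the signature.
def load_user(user_id: int) -> dict[str, Any]:
    return {"id": user_id, "name": f"user{user_id}"}


def render_profile(user: dict[str, Any]) -> str:
    return "Profile: " + user["name"]


event: dict[str, Any] = {"userId": 42}
load_user_step(event)
bag_result = render_profile_step(event)
direct_result = render_profile(load_user(42))
print(bag_result == direct_result)  # True


def forgot_a_step() -> str:
    """In the bag style, a missing step only shows up at run time, far from the real mistake."""
    try:
        return render_profile_step({"userId": 42})
    except KeyError as missing:
        return f"missing key {missing}"
```

With parameter lists, the dependency of `render_profile` on a loaded user is visible at the call site instead of being discovered as a `KeyError`.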

It seems to me that the heart of M-G is the implicit invocation mechanism. Essentially this is an event-driven programming model, and like all event-driven programming it supports very strong decoupling. At the point where you raise an event, you have no control [but see Brian's comment below] over who will handle the event or what they will do with it. This is a powerful technique with applicability where system behaviour needs to remain loosely specified until load-time or even run-time (this is why you can change the menu structure in Microsoft Word while it is running). The tradeoff is increased opacity, increased debugging difficulty, and greater emphasis on good design – or to put it conversely, it’s much easier to make a mess of it.
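For anyone who hasn't met implicit invocation, here is a minimal Python sketch of the mechanism: handlers subscribe to a named event, and the code raising it has no knowledge of them. This is a generic illustration, not Model-Glue's implementation; `EventBus` and the event name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Implicit invocation: whoever raises an event does not know who (if anyone) handles it."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def raise_event(self, event_name: str, payload: dict[str, Any]) -> None:
        for handler in self._handlers[event_name]:
            handler(payload)


log: list[str] = []
bus = EventBus()

# Wiring lives apart from the code that raises the event (a framework would typically
# read it from configuration), so it can change without touching save_comment.
bus.subscribe("comment.saved", lambda p: log.append(f"indexed {p['id']}"))
bus.subscribe("comment.saved", lambda p: log.append(f"emailed author of {p['id']}"))


def save_comment(comment_id: int) -> None:
    # ...persist the comment, then announce it; no handler is called by name
    bus.raise_event("comment.saved", {"id": comment_id})


save_comment(7)
print(log)  # ['indexed 7', 'emailed author of 7']
```

The hardwired alternative is for `save_comment` to call the two handlers directly by name: less flexible, but you can see exactly what happens by reading one function.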

As a programming model, it absolutely is not what I want when I’m setting up an average web app. 99% of the time I know exactly what controller function I want to invoke, I know exactly what data I need, and stating that with clarity is good design. Adding several layers of indirection adds no value at all – rather it greatly increases the risk of regression during future changes. As mentioned above, M-G tends to obscure the APIs of the various components rather than help define them. This is not to say everything should be hardwired. My beef with M-G is that it pervades the whole application, unlike techniques such as dependency injection and aspect-oriented programming, which let me introduce extra abstraction and complexity only where I get the payoff.

As a piece of software, M-G is a great achievement. It’s just the wrong tool for pretty much every job I have. The tragic thing is that, even if I did have to write a complex event-driven GUI, I’m pretty sure I wouldn’t be using ColdFusion to do it.

P.S.
A minor disagreement – I don’t think there’s anything wrong with the view having a dependency on the model. In fact it’s kind of absurd to think that a view can avoid having a dependency on the data it is representing. The important thing is that the model doesn’t have a dependency on the view. (Trygve Reenskaug’s original MVC pattern is instructive in this regard, although it’s not directly applicable to the web). So having to pipe all data via the controller is another layer of useless indirection. Having said that, there’s a fair bit of confusion about where the boundaries between the controller and the view are, so maybe this is just an issue of definition.

Windows update firewall issue

Just putting this out there in case someone else is stuck. The symptom is that Windows Update just stops working. You may not find out about this until your PC complains that it hasn’t been updated for x weeks. In fact, depending on your version of Windows, you may not know unless you actually check the date of the last update.

I get a variety of error codes, all of which boil down to some networking problem (check DNS, etc.) and none of which are actually helpful. The real problem is that Windows Firewall is blocking traffic from my router to my PC. For reasons I haven’t been able to discover, Windows Update (and Microsoft Update) generates traffic from my router to my PC. The source of the traffic really is the router itself, not just outside traffic passed through.

So, the fix is:

  • Prepare your geek resources. If you’re not comfortable poking around in firewall rules, go out to the forest and capture a geek.
  • Find out the IP address of your router. Often this is printed on the bottom of an ADSL router and will be something like 192.168.1.1
  • Enable logging of dropped packets in your firewall. I’m not going to tell you how to do this as there are too many variations, so you’ll have to look it up. Just a tip, though – if you’re using the Windows built-in firewall, make sure you enable logging for the active profile (usually the private profile).
  • Kick off an update
  • Look in the logs for dropped packets with the router’s address as the source address. Make a note of the port and protocol (e.g. UDP port 2048).
  • Add a rule to the firewall (again, use the active profile) to allow that traffic.
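Reading the dropped-packet log by eye gets tedious, so here is a minimal Python sketch of the "look in the logs" and "add a rule" steps. It assumes the Windows Firewall log format (a `#Fields:` header followed by space-separated columns, by default in `%systemroot%\system32\LogFiles\Firewall\pfirewall.log`), and it only prints a `netsh advfirewall` command for you to check and run yourself. The sample lines are invented. The sketch reports both ports from each dropped packet and uses the destination port (the port on your PC) as the local port of the rule; that choice is an assumption to verify against your own log.

```python
from collections import Counter
from typing import Iterable


def dropped_from(log_lines: Iterable[str], source_ip: str) -> Counter:
    """Count DROP entries from source_ip, keyed by (protocol, src-port, dst-port).

    Column positions come from the log's own '#Fields:' header rather than being hard-coded.
    """
    fields: list[str] = []
    counts: Counter = Counter()
    for raw in log_lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        record = dict(zip(fields, line.split()))
        if record.get("action") == "DROP" and record.get("src-ip") == source_ip:
            counts[(record["protocol"], record["src-port"], record["dst-port"])] += 1
    return counts


def allow_rule_command(protocol: str, local_port: str, router_ip: str, profile: str = "private") -> str:
    """Build a netsh command for an inbound allow rule limited to traffic from the router."""
    return (f'netsh advfirewall firewall add rule name="Router {protocol} {local_port}" '
            f"dir=in action=allow protocol={protocol} localport={local_port} "
            f"remoteip={router_ip} profile={profile}")


# Invented sample lines in the Windows Firewall log format; in practice, read pfirewall.log.
sample_log = """#Version: 1.5
#Software: Microsoft Windows Firewall
#Time Format: Local
#Fields: date time action protocol src-ip dst-ip src-port dst-port size tcpflags tcpsyn tcpack tcpwin icmptype icmpcode info path
2011-03-01 09:15:02 DROP UDP 192.168.1.1 192.168.1.20 2048 2048 120 - - - - - - - RECEIVE
2011-03-01 09:15:03 ALLOW TCP 192.168.1.20 203.0.113.5 50123 80 0 - - - - - - - SEND
""".splitlines()

router_ip = "192.168.1.1"
counts = dropped_from(sample_log, router_ip)
(protocol, _src_port, dst_port), _hits = counts.most_common(1)[0]
command = allow_rule_command(protocol, dst_port, router_ip)
print(command)
```

Because the port can change after a router reboot, rerunning the log check is cheaper than guessing.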

The final twist is that the port may change when the router is rebooted. So unless you want to just allow all traffic from the router, you need to keep an eye on this. For a long time I only saw ports 2048 and 2049, but just lately it’s flipped over to 2051.

Not an especially straightforward fix for something as fundamental as Windows Update. It’s a disturbing thought that for a user without reasonable tech skills, this problem basically just disables updates, invisibly and permanently. I have found absolutely no mention of this anywhere on the net. Maybe nobody else has this problem – but I’ve seen it with two different routers, four different PCs and three different Windows versions.

I’d be intrigued to know if anyone has any insight into the cause. I can only guess there’s some sort of link monitoring, QoS heartbeat or some such going on. I have found port 2048 mentioned in a list of well-known ports as “dls-monitor”, but no luck finding out what that means.

Online/offline data sync with Adobe AIR

This is a bit of a progress report. Bit light on actual wisdom, sorry – check back later for that.

I’ve been playing with Adobe AIR as a way to develop cross-platform intermittently online applications. I’ve already got a fully fledged Java domain model using Spring and Hibernate, so for the server side Spring’s BlazeDS integration is a real godsend – and it really is as easy to set up as it looks in the Adobe evangelist’s easy-as-falling-off-a-log video. Flash Builder is pretty neat too, so all is good.

Local data storage is no drama, so one box left to tick – syncing offline data when we get back online.

There are some products that help with this:
  • Adobe’s LiveCycle Data Services – a vast, enterprise-grade JEE app. Not ruling it out, but hey, I just want to sync a couple of records – this is going to be a very lightweight app.
  • WebORB – OK, I’m a Tomcat noob, but I failed to get this running in two of three possible installation modes. The one that did work, the prepackaged install, doesn’t even let you change the server port. Guess what – 8080 is already taken (who would have thought?). I don’t want to diss WebORB, but between my install problems and the fact that I really don’t need or want another server, I’m moving on for now.
  • Farata’s Clear Toolkit. Does anyone else find SourceForge to be a complete PITA? Every second download fails. In this case *every* download failed. Life is too short to spend it looking at a page full of blinking ads.

OK, so I’ve parked the off-the-shelf idea until I’ve built up a bit more desperation. So how hard can it be to code this up from scratch? The server side is taken care of, all I need is the client-side code.

With SQLite on the client side, the term “database replication” springs to mind. On reflection, though, what I have on the client side is really just a cache, not a database in the business sense of the word. A write-through fault-tolerant cache with pluggable fetch and store strategies would fit the bill nicely. There must be a ton of those already out there, right?
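For the record, this is roughly what I mean by a write-through cache with pluggable fetch and store strategies. It is a minimal sketch in Python rather than AS3, with hypothetical names throughout: writes update the local copy and go straight to the server when possible, and are queued for a later `sync()` when the server is unreachable (signalled here by `ConnectionError`).

```python
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class WriteThroughCache(Generic[K, V]):
    """Local cache that writes through to a remote store and queues writes while offline.

    fetch(key) and store(key, value) are the pluggable strategies; both are expected to
    raise ConnectionError when the server cannot be reached.
    """

    def __init__(self, fetch: Callable[[K], V], store: Callable[[K, V], None]) -> None:
        self._fetch, self._store = fetch, store
        self._local: dict[K, V] = {}
        self._pending: dict[K, V] = {}   # latest unsynced value per key

    def get(self, key: K) -> V:
        if key not in self._local:
            self._local[key] = self._fetch(key)   # cache miss: ask the server (may raise)
        return self._local[key]

    def put(self, key: K, value: V) -> None:
        self._local[key] = value                  # the local copy is updated immediately
        try:
            self._store(key, value)               # write through...
            self._pending.pop(key, None)
        except ConnectionError:
            self._pending[key] = value            # ...or queue it for the next sync

    def sync(self) -> int:
        """Replay queued writes, e.g. when the connection returns; returns how many succeeded."""
        synced = 0
        for key, value in list(self._pending.items()):
            try:
                self._store(key, value)
            except ConnectionError:
                break                             # still offline: keep the rest queued
            del self._pending[key]
            synced += 1
        return synced

    @property
    def pending(self) -> int:
        return len(self._pending)


# A fake server and connection flag standing in for the real remote service.
server = {"r1": "v1"}
online = False


def fetch(key: str) -> str:
    if not online:
        raise ConnectionError("offline")
    return server[key]


def store(key: str, value: str) -> None:
    if not online:
        raise ConnectionError("offline")
    server[key] = value


cache: WriteThroughCache[str, str] = WriteThroughCache(fetch, store)
cache.put("r2", "draft")   # offline: kept locally and queued
online = True              # connection comes back
synced = cache.sync()
print(synced, server["r2"], cache.pending, cache.get("r1"))  # 1 draft 0 v1
```

A real version would also need to persist the queue (for example in the local SQLite database) and decide what happens when the server copy has changed in the meantime; this sketch does neither.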

Wrong. Not for AS3, anyway. Please ping me if you know of one.

So that leaves me with two options:

  1. Futz around with native process calls and try to package something like Ehcache with my app. Is Ehcache really going to work on a phone? I’m nervous.
  2. Write an AS3 cache myself. Now, I have the deepest respect for anyone who has written a bug-free caching system – that’s a very select bunch of people. So I’m still nervous.

Of course the third option is that my Google chops are rubbish and some kind reader will post one of those humiliating “if you looked for 0.7 seconds you would have found this” comments. Here’s hoping.

Slides and code from cf.objective(ANZ) 2010

Slides are here for download. Associated code samples are here.

Speaking at cf.objective (ANZ)

I’m delighted to have been accepted to speak at cf.objective (ANZ) again. The topic will be that timeless old favourite, design patterns. Timeless? Well, since about (turns around to check jacket of GoF book) 1994, anyway. Not entirely coincidentally, the centre where I work has got into design patterns in the education space, so I’ll be able to draw on that to show just how widely relevant the design pattern approach is.
Excited! Come and heckle if you’re in Melbourne Nov 18 and 19.

Embrace your SQL – use your views

This is a me-too post inspired by a post by BarneyB. The number of weird APIs that exist to “abstract” SQL drives me nuts, when SQL is already one of the most elegant and widely known DSLs around.

OK, usually I’d just cheer from the sidelines/comments thread, but I might actually be able to add some value here by fleshing out Barney’s point about the Structured aspect of Structured Query Language. The natural home for Barney’s Lego pieces is in SQL views (unless you’re using an old version of MySQL, in which case – sorry). How best to organize views? I distinguish three sorts of objects I might use in a query:

  • A base table
  • A view that embodies a business rule.
  • A view that pulls some data for display

And the rules are these:

  • Business rule views must return only primary keys
  • Data views must not be re-used by other views

Plus a couple of supporting observations regarding performance. What really stumps the query optimizer (at least for SQL Server) is:

  • A large number of joins in a single statement. The optimizer copes well with very deeply nested views, as long as each view is relatively small (3-4 joins).
  • Text data – whether as join criteria, in WHERE clauses, or even just in the SELECT list

Time for a fully worked example. I won’t write out the DDL, but hopefully you can fill in the blanks. Let’s say we have an access control system with these entities:

  • Person
  • Group
  • GroupMembership (joins Person with Group, and carries the membership’s statusID)
  • Status (the membership status: pending, denied or approved)

Just to add spice, we’re using soft deletes for the Person table, so we have a deleted flag as well. Our task is to write a query that will give us the email address of all the people whose membership has been approved.

First, we need to sort out the soft deletes:

create view vwActivePerson
as
select
  personID
from
  Person
where
  deleted = 0

Approved memberships look like this:

create view vwApprovedMemberships as
select
 personID,
 groupID
from
 GroupMembership
 inner join Status
   on GroupMembership.statusID = Status.statusID
where
 Status.code = 'approved'

We use the text status code here so we have a human-readable query, but we limit its damage by only using it in this one view. Every other query that needs this concept simply joins to this view. So this is a business-rule view that defines the concept of an approved membership.

Finally, to pull our email addresses we do this:

create view vwData_approvedMembershipEmails
as
select distinct
 Person.email
from
 Person
 inner join vwApprovedMemberships
   on Person.personID = vwApprovedMemberships.personID
 inner join vwActivePerson
   on Person.personID = vwActivePerson.personID
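Since the DDL is left to the reader, here is one way to fill in the blanks so the three views can actually be run. This is a Python sketch using the standard library's `sqlite3` module rather than SQL Server; the column names and types are my assumptions, and the sample rows are made up. The data view uses `select distinct` so that someone with several approved memberships appears only once.

```python
import sqlite3

SCHEMA = """
create table Person (personID integer primary key, email text not null, deleted integer not null default 0);
create table "Group" (groupID integer primary key, name text not null);
create table Status (statusID integer primary key, code text not null);
create table GroupMembership (personID integer, groupID integer, statusID integer,
                              primary key (personID, groupID));

-- business-rule view: active (not soft-deleted) people, keys only
create view vwActivePerson as
select personID from Person where deleted = 0;

-- business-rule view: approved memberships, keys only; the text code appears only here
create view vwApprovedMemberships as
select personID, groupID
from GroupMembership
  inner join Status on GroupMembership.statusID = Status.statusID
where Status.code = 'approved';

-- data view: exactly what one report needs; not to be reused by other views
create view vwData_approvedMembershipEmails as
select distinct Person.email
from Person
  inner join vwApprovedMemberships on Person.personID = vwApprovedMemberships.personID
  inner join vwActivePerson on Person.personID = vwActivePerson.personID;
"""

# Made-up rows: Ann is approved twice, Bob is approved but soft-deleted, Cat is pending.
SAMPLE = """
insert into Status values (1, 'pending'), (2, 'denied'), (3, 'approved');
insert into "Group" values (1, 'Staff'), (2, 'Students');
insert into Person values (1, 'ann@example.com', 0), (2, 'bob@example.com', 1), (3, 'cat@example.com', 0);
insert into GroupMembership values (1, 1, 3), (1, 2, 3), (2, 1, 3), (3, 1, 1);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + SAMPLE)
emails = [row[0] for row in conn.execute(
    "select email from vwData_approvedMembershipEmails order by email")]
print(emails)  # ['ann@example.com']
```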

OK, seeing as I’m having such fun with the ul tag, here’s a list of things to note about this query:

  • There’s no WHERE clause. WHERE clauses embodying business rules often end up sprinkled around ad hoc query text, but we’ve avoided this by using business-rule views as filtering mechanisms. In SQL terms, I’m setting up a bunch of set intersections.
  • Similarly, note that no fields from the business-rule views appear in the select list. There’s no reason they shouldn’t, but this does highlight the fact that those views are there for filtering, not for data extraction.
  • The Person table is involved in this query in two places, once directly and once via the view. This is the typical tension between DRY and encapsulation, and in this case encapsulation has prevailed. However, I can reassure you that if your business-rule views return only foreign and primary keys, you can have dozens of them in a query without much impact on performance. You could have another view that filters for e.g. people with more than one denied group membership if that’s a useful concept. Maybe it’s a bit of a stretch, but I think of this as multiple inheritance for SQL.
  • I’ve named this as a data view to remind programmers not to base other views on this one.
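Adding another business rule is a matter of one more keys-only view and one more join. This Python/SQLite sketch defines a hypothetical `vwRepeatedlyDeniedPerson` view for the "more than one denied membership" example and stacks it on top of the other two filters; the schema and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Person (personID integer primary key, email text not null, deleted integer not null default 0);
create table Status (statusID integer primary key, code text not null);
create table GroupMembership (personID integer, groupID integer, statusID integer);

create view vwActivePerson as
select personID from Person where deleted = 0;

create view vwApprovedMemberships as
select personID, groupID
from GroupMembership inner join Status on GroupMembership.statusID = Status.statusID
where Status.code = 'approved';

-- hypothetical extra business-rule view: people with more than one denied membership (keys only)
create view vwRepeatedlyDeniedPerson as
select GroupMembership.personID
from GroupMembership inner join Status on GroupMembership.statusID = Status.statusID
where Status.code = 'denied'
group by GroupMembership.personID
having count(*) > 1;

insert into Status values (1, 'pending'), (2, 'denied'), (3, 'approved');
insert into Person values (1, 'ann@example.com', 0), (2, 'dan@example.com', 0);
insert into GroupMembership values (1, 1, 3), (2, 1, 3), (2, 2, 2), (2, 3, 2);
""")

# Each business-rule view is just another set to intersect with; still no WHERE clause.
rows = conn.execute("""
select Person.email
from Person
  inner join vwApprovedMemberships on Person.personID = vwApprovedMemberships.personID
  inner join vwActivePerson on Person.personID = vwActivePerson.personID
  inner join vwRepeatedlyDeniedPerson on Person.personID = vwRepeatedlyDeniedPerson.personID
""").fetchall()
flagged = [r[0] for r in rows]
print(flagged)  # ['dan@example.com']
```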

The no-reuse rule for data views has a couple of nice bonus side-effects. Because the view contains no intrinsic business logic, all it is doing is marshalling the exact data needed for a specific widget or batch process. That means you can refactor these data views with gay abandon – if you delete the widget, just delete the view – and you can tune the data returned by the view for maximum efficiency without regard to generality.

Barney used mostly subquery syntax instead of joins, but the basic idea doesn’t change. You’ll notice Barney’s subqueries all return primary keys only. I like joins because they highlight the set-based nature of SQL whereas subqueries look like a for-loop in disguise, but that’s just personal preference.
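The join and subquery styles really are interchangeable here. This Python/SQLite sketch runs both against an assumed minimal schema and invented data. One practical difference shows up: the `in (...)` form never duplicates rows, while the join form needs `distinct` when a person has more than one approved membership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Person (personID integer primary key, email text not null, deleted integer not null default 0);
create table Status (statusID integer primary key, code text not null);
create table GroupMembership (personID integer, groupID integer, statusID integer);
create view vwActivePerson as select personID from Person where deleted = 0;
create view vwApprovedMemberships as
select personID, groupID from GroupMembership
  inner join Status on GroupMembership.statusID = Status.statusID
where Status.code = 'approved';
insert into Status values (1, 'pending'), (3, 'approved');
insert into Person values (1, 'ann@example.com', 0), (2, 'bob@example.com', 1), (3, 'cat@example.com', 0);
insert into GroupMembership values (1, 1, 3), (1, 2, 3), (2, 1, 3), (3, 1, 1);
""")

# Set intersection expressed as joins against the business-rule views.
JOIN_STYLE = """
select distinct Person.email
from Person
  inner join vwApprovedMemberships on Person.personID = vwApprovedMemberships.personID
  inner join vwActivePerson on Person.personID = vwActivePerson.personID
order by Person.email
"""

# The same filters expressed as key-only subqueries.
SUBQUERY_STYLE = """
select Person.email
from Person
where Person.personID in (select personID from vwApprovedMemberships)
  and Person.personID in (select personID from vwActivePerson)
order by Person.email
"""

join_rows = conn.execute(JOIN_STYLE).fetchall()
subquery_rows = conn.execute(SUBQUERY_STYLE).fetchall()
print(join_rows == subquery_rows)  # True
```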

Finally, what about stored procedures? I don’t consider stored procs to be part of the SQL DML (Data Manipulation Language) – they just provide somewhere for DML to live, plus that nasty procedural stuff that you sometimes can’t avoid. So if you’re a stored proc shop, everything above still holds. You just might be invoking your data views via a proc rather than directly.