A tale of utilization vs performance

I once worked on a SaaS product that’s typical of a lot of B2B and niche B2C products.  It had a userbase in the thousands or tens of thousands, and provided business functions that involved heavyweight, complex workflows.  Not many users, lots of compute.  I would say that this is the norm for line-of-business applications, outside of the industry rockstars that are in the news all the time.

We didn’t need Docker.  Docker’s claim to fame is that it’s lightweight.  Once you deploy a multi-gigabyte JVM app that can keep lots of cores busy, it really doesn’t matter whether you’ve deployed it onto a VM or a container.  The host overhead is in the noise.   In a containers-on-VMs scenario, no VM is ever going to run more than one instance of the container, so the container layer is just management and skillset overhead.

We didn’t need autoscaling.  This application could keep a server farm busy, but it really only needed the one server farm.  It was easily sharded, if we did hit those limits.  It wasn’t bursty, and it used enough queuing that it could handily absorb the bursts that did happen.

We didn’t need Kubernetes because we didn’t need Docker or autoscaling.  Again, unnecessary overhead in skillset and tooling.

We didn’t need these things because we weren’t Twitter.  We weren’t even one micro-Twitter (if that’s a unit of scale).  We might have got to one million users total, eventually (although I don’t think they ever did). We weren’t ever going to get to one million users a month, let alone a day.

We did have performance problems.  Of course we did.  They were caused by bad SQL and bad algorithms.  We know those were the causes, because we could see the issues staring us in the face.  Every time we did a PoC optimisation exercise, we could easily find improvements of multiple orders of magnitude.  But we never committed to regression testing those and getting them through to production.

Instead, we addressed our performance problems by spending so much on infrastructure that we could have hired two or three more developers.  You have to buy a lot of infrastructure to make your application 1000 times faster.  We settled for a bit less than that.

But it didn’t matter how fast or slow our application was, because we fixated on utilization.  The faster we made our application, the worse our utilization looked.  To a server admin, an average CPU utilization of 20% looks like a healthy server.  To an accountant, it looks like an 80% cost saving waiting to be had.

So we took our slow, unoptimized application, and moved it to Docker and Kubernetes.  We didn’t get our extra developers, so we never got to optimize at all.  We took a big hit in training and migration, so productivity dipped.  Reliability got worse for a while, because we made some mistakes in ops.  And our application still had performance problems, because any one request by any one user was still running massively underperforming algorithms and overwhelming the database with unoptimisable queries and deadlocks.

However, our utilization figures were immaculate.  As for the performance issues: when I left, they were talking about putting those bad algorithms into lambdas.


CQS and atomicity

I’d summarise Command Query Separation as:

  • divide methods into queries and commands
  • only mutate state in commands
  • never mutate state in queries
  • make it easy to tell which is which

All of which I agree with.  The part I think is silly is:

  • command methods should always return void

The argument is that this makes it easy to identify which methods have side effects.  The downside is that if you want to get some information on how your command fared, you have to make a second call.  That’s not an issue in itself.  The issue is that the object has to keep the information about the last command in case you want it.  You’ve taken an operation that you think should be atomic, and in order to honour CQS you’ve made it into two coupled operations.  This is the worst form of coupling, because it’s hidden.
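
Here’s a minimal Java sketch of the problem (Account and its members are invented for illustration):

public class Account {
	private int balance = 100;
	private boolean lastWithdrawalSucceeded; // exists only to serve the follow-up query

	public void withdraw(int amount) { // command: returns void, per strict CQS
		lastWithdrawalSucceeded = amount <= balance;
		if (lastWithdrawalSucceeded) {
			balance -= amount;
		}
	}

	public boolean wasLastWithdrawalSuccessful() { // query: only meaningful straight after withdraw()
		return lastWithdrawalSucceeded;
	}
}

Nothing ties the query to the command that produced its answer: any intervening call to withdraw(), from this caller or any other, silently changes the result.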

There are ways to restore atomicity to our newly dual operation.  You can make the command and the status request transactional in some way via locking or a transaction manager.  This implies that the status request is either undefined or unreliable outside of the transaction.  Things are getting worse, not better.

In practice, everybody gives themselves an out.  Tim Curry and Martin Fowler write in support of CQS yet both convince themselves that returning a value isn’t that bad as long as you do it for the right reasons.  As do many others.

Let’s look at what Bertrand Meyer himself said: “Asking a question should not change the answer.”

That’s a pretty succinct argument against writing query methods with side effects.  It doesn’t have much to say about writing command methods with return values.  Yes, always returning void from command methods makes it easy to see which methods have side effects.   But there are other ways to do that (naming conventions, anyone?) which avoid the serious problems that the “always return void” rule creates.

In conclusion, I’d replace that last rule with these ones:

  • use a naming convention to identify command methods
  • the return value of a command method should only contain information about the command, not about the system state
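
In Java, those two rules might look something like this (the names and the “do” prefix convention are mine, purely for illustration):

// Describes only how the command fared, not the system state
class WithdrawResult {
	final boolean succeeded;
	final String reason;

	WithdrawResult(boolean succeeded, String reason) {
		this.succeeded = succeeded;
		this.reason = reason;
	}
}

public class Account {
	private int balance = 100;

	// naming convention: the "do" prefix marks commands
	public WithdrawResult doWithdraw(int amount) {
		if (amount > balance) {
			return new WithdrawResult(false, "insufficient funds");
		}
		balance -= amount;
		return new WithdrawResult(true, null);
	}

	public int balance() { // query: no side effects
		return balance;
	}
}

The caller learns the outcome atomically, in the same call, and the command still doesn’t leak system state: if you want the new balance, you ask the query.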

P.S.
It might be claimed that the difficulties introduced by strict adherence to CQS are a symptom of deep problems with OO as a paradigm, not of any issue with CQS as a principle.  I’m sympathetic to that view; functional and reactive programming both address the objections raised above, and do so at a paradigm level.  However, this article is aimed at developers sitting squarely in an OO paradigm and wondering if they should feel guilty for returning a status code from a command function.

Software engineering reading list

Gang of Four Design patterns

http://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented-ebook/dp/B000SEIBB8

This book must be read and understood in detail by every developer.  Don’t learn the patterns.  Learn the thought process.


Eric Evans DDD

http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215

Although Evans does lay out a methodology in this book, that’s not where the book’s real value lies (as Evans himself now says).  The real message is about the role of good design, with an emphasis on particular design styles, in managing software complexity.


Scott Ambler Database refactoring

http://www.amazon.com/Refactoring-Databases-Evolutionary-paperback-Addison-Wesley/dp/0321774515

Read this to cure yourself of “don’t touch the database” disease.


Fowler PoEAA

http://www.amazon.com/Patterns-Enterprise-Application-Architecture-Martin/dp/0321127420

Fowler IMHO is the only true successor to the GoF, in that his pattern catalog is invariably interesting in detail.  In particular, Fowler’s set of ORM patterns are essential reading for anyone using an ORM.


Fowler refactoring

http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672


Larman Applying UML and Patterns

http://www.amazon.com/Applying-UML-Patterns-Introduction-Object-Oriented/dp/0131489062

This is the best and clearest demonstration of how the concepts of OOP and design patterns actually play out in a project.


Jim Highsmith Agile ecosystems

https://books.google.com.au/books/about/Agile_Software_Development_Ecosystems.html?id=uE4FGFOHs2EC&redir_esc=y

This book and the next one are IMHO all you ever need to read about agile methodologies.


Cockburn Crystal Clear

http://www.amazon.com/Crystal-Clear-Human-Powered-Methodology-Small/dp/0201699478


Beck TDD

http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530

TDD is another concept that many developers get weird ideas about.  Some people think that the point of TDD is to end up with lots of tests.  The guy who invented the concept sets the record straight.


Linda Rising Fearless Change

http://www.amazon.com/Fearless-Change-Patterns-Introducing-Ideas/dp/0201741571

This isn’t really a technical book, but it’s one of the best demonstrations of the generality of the design pattern concept that I’ve seen.  The idea of design patterns is one that many developers find hard to grasp (and in fact it’s common to get the concepts completely backwards).  Seeing how the concepts apply to a related but dissimilar field is very useful.  It’s also a great book on change management.


Kerievsky Refactoring to Patterns

https://www.amazon.com.au/Refactoring-Patterns-Joshua-Kerievsky/dp/0321213351

This is a bit of a bonus read.  It’s not as important as the core patterns books and Fowler’s refactoring, but it is an excellent example of applying a higher-level thought process to detailed program structures.


Adele Goldberg Smalltalk-80

http://www.amazon.com/dp/0201113716/?tag=stackoverfl08-20

OK, really nobody is going to read a 45-year-old book about a dead programming language.  But this is a sentimental favourite from the most prolific group of visionaries ever to grace computer science.  It’s refreshing to look back past all the decades of nonsense that has been written about OOP and realize that in 1975 these guys really got it.

AspectJ for generating custom compiler errors

One of my favourite uses of AspectJ is to generate compile-time error messages.  This allows you to provide guidance in the IDE for developers writing new code within a framework or library. 

Here’s a quick example. BaseDTO is a base class that developers will extend. It’s used with a framework that requires a no-arg constructor (Jackson in this case, but it’s a common requirement), but when constructed explicitly, the UriInfo parameter is mandatory.

	private Links _links;

	// No-arg constructor for unmarshalling, but otherwise don't call this one
	public BaseDTO() {}

	public BaseDTO(UriInfo uriInfo) {
		this._links = new Links(uriInfo);
	}

We can’t express that requirement in normal Java. As a result, developers can waste a lot of time debugging a new subclass. AspectJ to the rescue!

public aspect DTOChecker {
	
	pointcut dtoConstructor(): call(BaseDTO+.new(..)) 
	&& !call(BaseDTO+.new(javax.ws.rs.core.UriInfo, ..));
	
	declare error : dtoConstructor() :  "DTOChecker: Constructors for subclasses of BaseDTO must include a UriInfo parameter.";

}

Then, if we try to call a constructor for any subclass of BaseDTO without including a parameter of type UriInfo, we get a compile error.  Eclipse flags it inline in the editor, just like a native compiler error.

What’s a domain model for?

Intro from 2018: I wrote this article in 2009 and it’s been sitting in  my drafts ever since.  But I was inspired to look it up again by https://dzone.com/articles/the-secret-life-of-objects-information-hiding, and negatively inspired by https://medium.com/@cscalfani/goodbye-object-oriented-programming-a59cda4c0e53.  So here it is all these years later.

(Prompted by a discussion with Ben Nadel)

There’s a bit of a debate in the CF OO community. OO is good. OK, what’s it good for? You can have an OO domain model to capture all your business logic. What business logic? All I’m doing is inserting and updating records. Etc.

Then there’s all the discussion about the “anemic domain model” antipattern. I want to make my beans less anemic, but I just can’t find anything to put in them!

Maybe domain models are only useful for sophisticated, simulation-based apps? CRUD apps don’t have enough business logic. Right?

Maybe not so right. My CRUD apps have lots of business logic. If I trawl through my database schema and pull out all of the constraints, defaults, foreign keys etc, that adds up to a lot of business logic. If I went the whole hog and added triggers to enforce all the more complex invariants, I would have a complex, rich domain model implemented in my database schema. And that’s without any of the personified simulation-style objects that we think of as being the sweet spot for complex domain models.

Some of the data modelling people insist that this is the only way to implement a domain model. Use database constructs for invariants, and put all the calculation logic into stored procedures. Maybe that’s the way to go for a pure CRUD application. The database will throw an exception if I violate any constraint, so my CRUD application just needs to catch those and react. However, any SQL database is such a miserable development environment that I really don’t want to lock myself into that scenario.

Let’s go to the other extreme and implement all of these invariants in our OO application. In practice we would duplicate some of the constraints in the schema, but we’ll say that our app doesn’t rely on that. In the OO world, we have a much richer programming model, so we should be able to do better than just throwing exceptions. We should be able to design our model so that many invalid operations simply aren’t available, and others return sensible defaults, nulls or result codes.

Here’s an example. I need to be able to create and update user records. My invariant is that usernames must be unique.

In a SQL domain model, I would put a uniqueness constraint on my username column. Any attempt to INSERT or UPDATE with an existing username would throw an exception. In theory this should be enough. In practice we tend to write application code to predict whether or not we are going to get an SQL exception. Not quite sure why we do this extra work, but the end result is the same.

In an OO domain model, I can constrain the available operations to make violation of the constraint impossible. First, I create a Users object that represents the set of all users. Then I make the constructor for the User object private. I can’t actually create a new user. If I want a new user, I have to ask the Users object for it.

// me = new User("jmetcher") <--- operation does not exist!!
me = Users.create("jmetcher");

This gives the Users object a chance to enforce the invariant. If there is already a user with username “jmetcher”, it can return that object, or return a null object, or return false, or even throw an exception. Probably I’d return the existing object. So that takes care of the INSERT.

What about the UPDATE? I require that the User object does not have a setter for “username”. Username is part of the logical identity of the User object, so it must be immutable. I may provide a utility method (say, on Users) to change a username, but that will be a maintenance activity – a low-level, stop-the-world, reorganize-my-data kind of thing. It’s not part of the defined behaviour of a User.

// me.setUsername("notjmetcher") <--- operation does not exist!!

The domain model’s main purpose is to enforce those invariants. The lightbulb realization is that

A good domain model enforces invariants as much by its design as by its code

In this example, I’ve made my User object “richer” by hiding the constructor and taking away a setter – not by adding stuff.
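
Here’s a minimal Java sketch of that design (the internals are invented; assume both classes live in the same package):

import java.util.HashMap;
import java.util.Map;

public class Users {
	private static final Map<String, User> byUsername = new HashMap<>();

	// The only way to obtain a User, so the uniqueness invariant is
	// enforced by design: asking for an existing username returns
	// the existing object.
	public static User create(String username) {
		return byUsername.computeIfAbsent(username, User::new);
	}
}

class User {
	private final String username; // part of the logical identity, hence immutable

	User(String username) { // package-private: only Users can construct one
		this.username = username;
	}

	public String getUsername() {
		return username;
	}

	// deliberately no setUsername()
}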

There’s also a lot of discussion about validation. This cycle is taken for granted:

  • load
  • manipulate
  • validate
  • save

and then we talk a lot about where to put these responsibilities. My assertion is that we should be able to just:

  • load
  • manipulate

Save should just be automatic. Save should be the default. You should do something extra if you don’t want to save. Like:

  • load
  • copy
  • manipulate the copy
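
With a persistence layer that tracks loaded objects (JPA-style, say), that might look like the following sketch; users.find(), copy() and the displayName field are all hypothetical:

// Inside a unit of work: changes to a managed object are flushed
// automatically at commit; there is no explicit save() call.
User me = users.find("jmetcher");  // load
me.setDisplayName("Jim");          // manipulate; persisted by default

// To avoid saving, you do something extra: work on a copy.
User scratch = me.copy();              // hypothetical detached copy
scratch.setDisplayName("Throwaway");   // never persisted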

But what happened to the “validate” step? I’ve got us automatically saving things that haven’t been validated! But see above – I’ve designed the domain model so that I can’t make invalid transformations. So:

A good domain model makes direct manipulation of the domain data a safe operation.

So, what’s a domain model for? A good domain model on top of a full-featured persistence layer will:

  • Enforce invariants using a rich programming model
  • Make manipulating your data safe – without you having to remember to validate before save, or copy before manipulate, or save before exit.

Manipulating data safely while obeying invariants sounds like bread-and-butter CRUD to me.

Footnote from 2018:

What Riccardo says in the first article I linked above is so clear, at least to me.  How is it that Charles in the second article doesn’t get it?  Maybe OOP is like all design thinking, like design patterns and agile methodologies. If you can’t tolerate living in a world of judgement calls, if you can’t code to a conceptual model instead of or as well as a spec, if you think a bunch of smart people making independent decisions sounds like chaos, it’s not for you.  If you just want to know the rules, pick another door.  These are paradigms to help you write the rules.  Does anybody really think that is or can be easy?

5 essential tools for choosing a buzzword for your next listicle

Technology teams are not immune to hype and trends. <Buzzword> isn’t necessarily a new thing. A long time ago in a galaxy far away, <cool anecdote>.
We didn’t always know why things were broken, we had to examine the data to reveal the answers. It isn’t about what you call it or what tools you use.
Start with the strategy and desired outcomes.
<nice troubleshooting story>
At this point, the data reveals what is occurring.
<more nice troubleshooting stuff>
The trend towards <buzzword> tools reminds me of the craze around <every other buzzword> <since the dawn of time>.
There is no easy fix or magic pixie dust for ensuring <anything>.

Thanks and apologies to Mehdi Daoudi.  The above is a palimpsest of his article https://dzone.com/articles/practicality-of-observability – which is a good article with only a tiny bit of product placement.  But aside from the useful content, I was amused and inspired by the very first sentence.  Also as always entertained by DZone’s tagline writers, who in this case managed to take an article that is pretty strongly anti-buzzword and anti-tools-fetish, and give it a tagline that uses the buzzword du jour twice and promises toolz.


Java method overriding and visibility

This post is about a little test I set up to get my head around one aspect of method overriding in Java. A method in a superclass can call either another superclass method or a subclass method, depending on the visibility of the methods involved.

These are the demo classes:

public class SuperClass {
	
	public String a() {
		return b();
	}
	
	public String b() {
		return c();
	}
	
	public String c() {
		return "superclass";
	}

}

public class Subclass extends SuperClass {
		
	public String a() {
		return b() + b();
	}
	
	public String c() {
		return "subclass";
	}

}

new SuperClass().a() returns “superclass”. new Subclass().a() returns “subclasssubclass”.

If we change the visibility of method c() to private, however:

new SuperClass().a() returns “superclass”. new Subclass().a() returns “superclasssuperclass”.

In other words, superclass method b() will call the subclass implementation of c() if it is visible, or the superclass implementation if it is not. That’s because private methods are bound statically at compile time; they don’t participate in overriding, so dynamic dispatch never comes into play.

Of course, if we then override b() in the subclass as well, things change again. Then new Subclass().a() returns “subclasssubclass” no matter whether c() is public or private.
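
Spelled out, that variant looks like this:

public class Subclass extends SuperClass {

	public String a() {
		return b() + b();
	}

	public String b() { // now overridden: a() never reaches SuperClass.b()
		return c();     // binds to Subclass.c() whether or not SuperClass.c() is private
	}

	public String c() {
		return "subclass";
	}

}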

Redux, selectors, and access to state

There are a couple of things I’ve struggled a lot with in working out best practices for React/Redux:

  1. How to actually implement the advice to use selectors everywhere
  2. How to get access to state when I need it

These two things are related, because selectors in general need access to the whole state tree (I think).

So I use three basic techniques:

1. To pass state to React components, I use react-redux, where the mapStateToProps function has access to the global state.
2. To provide state to reducers, I use redux-thunk, which lets me use state-aware action creators and thereby add all required state to the action payloads.
3. Alternatively, I use the third argument to react-redux’s connect() function, mergeProps, which lets me access both global state and component properties and pass them to action creators (and through actions, to the reducers).

Here’s a very basic sketch of how these three approaches look:

// A redux-thunk action creator that uses the getState()
// function to pass state to selectors
export function actionCreator1(someState, someProps) {
	return function (dispatch, getState) {
		const someMoreState = selector3(getState());

		// action1() is a plain action creator (not shown)
		dispatch(action1(someState, someMoreState));
	};
}

// A normal action creator that just gets precalculated state
export const actionCreator2 = (someState) => ({
	type: 'ACTION_2', // illustrative type constant
	someState,
});

// A react-redux function that can use the global state tree to call selectors
const mapStateToProps = (state, ownProps) => ({
	state1: selector1(state),
	state2: selector2(state),
});

const mapDispatchToProps = (dispatch) => ({ dispatch });

const mergeProps = (stateProps, dispatchProps, ownProps) => {
	return {
		...ownProps,
		...stateProps,
		// using stateProps to pass state to action creators
		action1: () => dispatchProps.dispatch(actionCreator1(stateProps.state1, ownProps)),
		action2: () => dispatchProps.dispatch(actionCreator2(stateProps.state2)),
	};
};

export const Container = connect(
  mapStateToProps,
  mapDispatchToProps,
  mergeProps
)(Component);

Using these approaches, I can get access to whatever state I need, and therefore use selectors all over the place. I suspect this also lets me get away with a pretty suboptimal state tree and just paper over the gaps with global state and heavyweight selectors. But I suspect that even with a great state tree shape and great selector design, these techniques are still going to be necessary. Maybe just less so.

ES6 nested imports (Babel+react)

With the ES6 module system, you have a choice of whether to use a single default export:

export default DefaultObject

or potentially many named exports:

export const NondefaultObject = {}

You import these slightly differently, but otherwise they work the same:

import DefaultObject from './DefaultObject'
import {NondefaultObject} from './NondefaultObject'

const App = () => (
  <div>
    <DefaultObject/> 
    <NondefaultObject />
   </div>
)

Where things go awry is where you want to aggregate up imports, as per Jack Hsu’s excellent article on Redux application structure.

import * as dobj from './DefaultObject'   // "do" itself is a reserved word, hence dobj
import * as ndo from './NondefaultObject'

const App = () => (
  <div>
    <dobj.DefaultObject/>      {/* Does NOT work */}
    <ndo.NondefaultObject />   {/* works */}
    <dobj.default/>            {/* works */}
  </div>
)

Why is it so? A default export is actually stored under the name “default”. When you write import DefaultObject from './DefaultObject', Babel binds your local name directly to the module’s default export, so the JSX tag works. A namespace import gets no such magic: import * as dobj just gives you the namespace object, on which the default export is nothing more than a property called “default” – which is why <dobj.default/> works and <dobj.DefaultObject/> doesn’t.

AspectJ: using advised class fields

A short post to clarify something that was a little mysterious from the documentation.

AspectJ around advice typically looks something like:

	pointcut myPointCut() : execution(void my.package.myClass.myMethod());

	void around(): myPointCut() {
		// do some stuff
		proceed(); // call the advised method
		// do some other stuff
	}

What if I want to call other methods or use fields from myClass in the advice? There are a few moving parts here:

	pointcut myPointCut() : execution(void my.package.myClass.myMethod());

	void around(my.package.myClass myClass): target(myClass) && myPointCut() {
		myClass.method1(); // do some stuff
		proceed(myClass); // call the advised method
		myClass.publicField = null; // do some other stuff
	}

To break it down:

  1. Add a parameter to around() with the type of the advised class.
  2. Use the AspectJ target() pointcut designator to bind the advised object to that parameter.
  3. Use the parameter value within the advice however you like. But note that you’re limited to publicly accessible methods and members – despite what you might think, the advice is not within the lexical scope of the advised class.
  4. Add the parameter value as the first argument to proceed().

This example is for an advised method with no parameters. If the method has parameters:

	pointcut myPointCut() : execution(void my.package.myClass.myMethod(my.package.ParamClass));

	void around(my.package.myClass myClass, my.package.ParamClass param): target(myClass)
			&& args(param) && myPointCut() {
		myClass.method1(); // do some stuff
		proceed(myClass, param); // call the advised method
		myClass.publicField = null; // do some other stuff
	}