Hibernate, Javassist proxies and equals()

Hibernate uses Javassist proxies to help with lazy loading. If you’ve got no idea what that means, don’t worry – to use Hibernate, you’re not supposed to need to know anything about it. However, it is another one of those clever “transparent” framework programming techniques that can really give you some headaches in debugging if you don’t know it’s there (see http://lagod.id.au/blog/?p=266 for a couple more). There are four scenarios where the transparency gets a bit murky and you need to know a bit more of what’s under the hood to avoid going crazy.

The first two, along with a good description of what Hibernate is doing with the proxies and why, can be found here: http://blog.xebia.com/2008/03/08/advanced-hibernate-proxy-pitfalls/. The third is any kind of reflective programming – see http://blog.jeremymartin.name/2008/02/java-reflection-hibernate-proxy-objects.html for a nice description of a typical issue and a good workaround. [Edit March 2014: Also applies to any framework that uses reflection, such as Apache BeanUtils/PropertyUtils].

The fourth, which is the one I really want to talk about, is the fact that the proxy is not strictly equal to the real object (in the sense of ==). Sounds obvious really. It’s not usually a problem, as you usually just have a reference to the proxy. However, there is a particular circumstance where you can end up with a reference to both the proxy and the real object. You will almost certainly believe them to be the identical object. There will be nothing about their behavior that will indicate to you that they are not, except for the fact that when you compare them with == you’ll get a false result. In fact, because the default implementation of equals() uses ==, the same applies to comparing them with equals().

You don’t need to do anything too wacky to end up in this very odd state. Consider this object:

class MyClass {

    Item head;

    List<Item> items = new ArrayList<Item>(0);

    // ... plus getters and setters
}

mapped to a database structure that looks like (pseudocode):

Table MyClass
Integer myClassID
Integer FK_headID

Table Item
Integer itemID
Integer FK_myClassID

So it’s a list of items, with an extra reference to one of the items broken out into a separate field for easy access. A bit denormalised, but if it’s a big list and it’s expensive to calculate “head”, it’s the kind of thing we might do without thinking too much about it. The important thing is that “head” is also in the list. Hibernate supports this mapping quite happily. You can load, manipulate and save this object graph all day long. However, this:

for (Item i : myClass.getItems()) {
  System.out.println(i.equals(myClass.getHead()));
}

will print false for every value in the list, even when you’ve confirmed that for one of them i.getItemId() == head.getItemId().

So what’s the trick? Using a standard many-to-one mapping for head, and a standard one-to-many mapping for the item list, myClass.getHead() returns a Javassist proxy object, and iterating through the list returns the unproxied object. If you look at it in a debugger, you can see that the real object backing the proxy is in fact the identical object that you get from the list – this is not a case where you’ve inadvertently mapped one database record into two different in-memory objects. However, realobject != proxy (and vice versa).
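The situation can be reproduced in plain Java with no Hibernate on the classpath. In this minimal sketch, ItemProxy is a hand-written stand-in for the generated proxy (an assumption made for illustration: a subclass that forwards calls to the real instance), and Item deliberately does not override equals().

```java
// Toy stand-in for a mapped entity. Not Hibernate code.
class Item {
    private final int itemId;

    Item(int itemId) { this.itemId = itemId; }

    int getItemId() { return itemId; }
    // equals() is deliberately not overridden, so Object.equals()
    // applies, and that is plain reference identity (==).
}

// Hypothetical hand-written proxy: a subclass that delegates to the real object.
class ItemProxy extends Item {
    private final Item target;

    ItemProxy(Item target) {
        super(0);              // the proxy's own field is never used
        this.target = target;
    }

    @Override
    int getItemId() { return target.getItemId(); }
}

class ProxyIdentityDemo {
    public static void main(String[] args) {
        Item real = new Item(42);         // what iterating the list hands you
        Item head = new ItemProxy(real);  // what getHead() hands you

        System.out.println(real.getItemId() == head.getItemId()); // true
        System.out.println(real == head);                          // false
        System.out.println(real.equals(head));                     // false
        System.out.println(head.equals(real));                     // false
    }
}
```

Both references behave identically through their getters, which is why the mismatch only shows up when something compares them.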

The fix is easy. Although lazy loading is the default for Hibernate, it’s almost never a significant benefit for single objects. Simply set lazy="false" in your mapping file for the many-to-one association. You can leave lazy loading enabled for the list, as it works by a different mechanism. If you really do need lazy loading for the single object and don’t mind a slightly more complex build configuration, there’s lazy="no-proxy" (documented here: http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-fetching-proxies). And if you don’t mind Hibernate-specific code in your domain model (something I avoid like the plague, personally), you can also explicitly “unwrap” the proxy as described here: http://stackoverflow.com/questions/16383742/hibernate-javassistlazyinitializer-problems-with-validation.
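A further option, not one of the fixes listed above, is to override equals() so that it compares identifiers rather than references. The sketch below is plain Java under stated assumptions: KeyedItem and KeyedItemProxy are hypothetical names, and the proxy is again a hand-written subclass standing in for the generated one. The two details that matter are using instanceof (a proxy is a subclass, so a getClass() comparison would fail) and reading the id through the getter (a proxy's own fields are not populated).

```java
import java.util.Objects;

// Hypothetical entity with identifier-based equality.
class KeyedItem {
    private final Integer itemId;

    KeyedItem(Integer itemId) { this.itemId = itemId; }

    Integer getItemId() { return itemId; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        // instanceof, not getClass(): a proxy is a subclass of the entity.
        if (!(other instanceof KeyedItem)) return false;
        // Go through the getter so a proxy can delegate to its target.
        Integer otherId = ((KeyedItem) other).getItemId();
        return getItemId() != null && getItemId().equals(otherId);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(getItemId());
    }
}

// Hand-written stand-in for a generated proxy.
class KeyedItemProxy extends KeyedItem {
    private final KeyedItem target;

    KeyedItemProxy(KeyedItem target) {
        super(null);
        this.target = target;
    }

    @Override
    Integer getItemId() { return target.getItemId(); }
}
```

The trade-off is that equality now depends on the identifier being assigned, so objects that have not been saved yet never compare equal to anything but themselves, and their hash code changes once an id is set.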

So there you have it. A lesson in leaky abstraction, and, for a refreshing change, a head-scratching object equality problem that didn’t, for once, turn out to be some semi-intractable classloader wackiness.

Spring/Hibernate unit testing part 2

The basic Spring transactional testing approach is great for testing domain logic. If you want to test your persistence layer, though, it falls short in some important ways. Most importantly, with the usual approach you will never actually load an object from the database. All the objects you create will hang around in Hibernate’s session cache, and will always be fetched from there. Even objects you created in a @Before method will be within the one Hibernate session.

As an example of why this matters, you can misspell your primary key name in the hibernate mapping file, and these tests will never tell you. Don’t ask me how I know 🙂
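To see why a broken mapping can go unnoticed, here is a toy model of a session cache in plain Java. ToySession is a hypothetical class, not Hibernate's API, and it ignores flushing and everything else a real session does. The only point is that an object saved in a session is handed back from the cache, so the code that reads from the database never runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy first-level cache. Not Hibernate's API.
class ToySession {
    private final Map<Integer, Object> cache = new HashMap<>();
    private final Function<Integer, Object> databaseLoader;
    int databaseLoads = 0;

    ToySession(Function<Integer, Object> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    void save(int id, Object entity) {
        cache.put(id, entity);
    }

    Object get(int id) {
        Object cached = cache.get(id);
        if (cached != null) {
            return cached;             // the loader is never consulted
        }
        databaseLoads++;
        Object loaded = databaseLoader.apply(id);
        cache.put(id, loaded);
        return loaded;
    }
}

class SessionCacheDemo {
    public static void main(String[] args) {
        // A loader that always fails, standing in for a misspelled column name.
        Function<Integer, Object> brokenLoader = id -> {
            throw new IllegalStateException("bad column name in mapping");
        };

        ToySession testSession = new ToySession(brokenLoader);
        Object category = new Object();
        testSession.save(1, category);
        System.out.println(testSession.get(1) == category); // true
        System.out.println(testSession.databaseLoads);      // 0

        // A fresh session has an empty cache, so the broken loader is finally hit.
        ToySession freshSession = new ToySession(brokenLoader);
        try {
            freshSession.get(1);
        } catch (IllegalStateException e) {
            System.out.println("load failed: " + e.getMessage());
        }
    }
}
```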

The solution is to create some test data before the main test transaction kicks in. That has some significant disadvantages, so I only do it for test classes that are specifically testing the persistence layer. What are the disadvantages?

  • If something goes wrong with your setup or teardown code you can be left with a dirty database and you’ll have to clean up before any further tests will work. In a CI environment that can be a real pain, so ideally you’d be regenerating your test database after each run of the persistence tests.
  • Because you need three transactions per test, it’s significantly slower.
  • You have to write the transactional boilerplate for setup and teardown

So, it’s worth doing, but it by no means replaces the basic single-transaction approach. Most of your tests should still be using that approach.

In use, the persistence test class looks something like this:

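import javax.annotation.Resource;

import org.hibernate.SessionFactory;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.AbstractTransactionalJUnit4SpringContextTests;
import org.springframework.test.context.transaction.AfterTransaction;
import org.springframework.test.context.transaction.BeforeTransaction;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.DefaultTransactionDefinition;

@ContextConfiguration(locations={"/myApplicationContext.xml"})
public class MyPersistenceTests extends AbstractTransactionalJUnit4SpringContextTests {
	@Resource protected SessionFactory sf;
	@Resource protected org.springframework.orm.hibernate3.HibernateTransactionManager txManager;

	// Runs in its own committed transaction before the test transaction starts
	@BeforeTransaction
	public void setupBeforeTransaction() {
		DefaultTransactionDefinition def = new DefaultTransactionDefinition();
		def.setName("SomeTxName");
		def.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRED);

		TransactionStatus status = txManager.getTransaction(def);
		try {
			// Your setup code here
			txManager.commit(status);
		}
		catch (Exception e) {
			txManager.rollback(status);
			throw new Error(e);
		}
	}

	// Runs in its own committed transaction after the test transaction has rolled back
	@AfterTransaction
	public void teardownAfterTransaction() {
		DefaultTransactionDefinition def = new DefaultTransactionDefinition();
		def.setName("SomeTxName");
		def.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRED);

		TransactionStatus status = txManager.getTransaction(def);
		try {
			// Your teardown code here
			txManager.commit(status);
		}
		catch (Exception e) {
			txManager.rollback(status);
			throw new Error(e);
		}
	}

	// Your test methods here
}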

We’re still extending the same Spring base class as before. We’ve just injected our transaction manager and added a couple of methods. The @BeforeTransaction and @AfterTransaction annotations are provided by Spring for exactly this use case. Spring will still wrap our test methods in a transaction automatically, and still roll it back after each method invocation.

So now in our test methods we can load any of the objects we created in our “before transaction” code and be sure we are hitting the database.
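The setup and teardown methods above are nearly identical, which is the boilerplate mentioned in the list of disadvantages. One way to factor it out is a small helper that takes the work as a callback. The sketch below is plain Java: ToyTxManager is a hypothetical interface standing in for a real transaction manager so that the example runs on its own. In a Spring project, TransactionTemplate is worth a look for the same job.

```java
// Hypothetical minimal interface standing in for a transaction manager.
interface ToyTxManager {
    void begin();
    void commit();
    void rollback();
}

class TxBoilerplate {
    // Runs the work inside a transaction: commit on success, rollback on failure.
    static void inTransaction(ToyTxManager tx, Runnable work) {
        tx.begin();
        try {
            work.run();
        } catch (RuntimeException | Error e) {
            tx.rollback();
            throw e;
        }
        tx.commit();
    }
}

// Records the calls it receives so the behaviour can be inspected.
class RecordingTxManager implements ToyTxManager {
    final StringBuilder log = new StringBuilder();

    public void begin()    { log.append("begin,"); }
    public void commit()   { log.append("commit,"); }
    public void rollback() { log.append("rollback,"); }
}

class TxBoilerplateDemo {
    public static void main(String[] args) {
        RecordingTxManager ok = new RecordingTxManager();
        TxBoilerplate.inTransaction(ok, () -> ok.log.append("setup,"));
        System.out.println(ok.log); // begin,setup,commit,

        RecordingTxManager failing = new RecordingTxManager();
        try {
            TxBoilerplate.inTransaction(failing, () -> {
                throw new IllegalStateException("setup failed");
            });
        } catch (IllegalStateException expected) {
            System.out.println(failing.log); // begin,rollback,
        }
    }
}
```

With a helper like this, each of the two annotated methods shrinks to a single call that passes in the setup or teardown code.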

After writing 20 or 30 of these persistence test classes you start to notice that they all look very similar. Create an object. Make sure you can load it. Tear it down. This is even more pronounced if you use a base domain object class as I do. In my next post I’ll talk about some techniques for writing abstract test classes that do most of the repetitive stuff.

Spring/Hibernate unit testing

Spring provides some nice tools to assist in unit testing with Hibernate, which are well covered in Spring’s excellent documentation. I want to recap some basics here, though. Later on I want to talk about persistence layer test generation, and I’ll need to refer back to these notes.

One of Spring’s convenient tools is a base test class that does a number of nifty things:

  • Sets up your application context, including Hibernate SessionFactory if you have one.
  • Uses that context to do DI into your test classes, which is a nice way to access your singletons within your tests.
  • Wraps each test method in a transaction which gets rolled back at the end of the method. Complete test isolation with zero work on your part!

In use, it looks something like this:

import static org.junit.Assert.assertEquals;

import javax.annotation.Resource;

import org.hibernate.SessionFactory;
import org.junit.Test;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.AbstractTransactionalJUnit4SpringContextTests;

// This tells Spring where to find my config. Note that this could be a special test config rather than my production config
@ContextConfiguration(locations={"/myApplicationContext.xml"})
public class MyContextTests extends AbstractTransactionalJUnit4SpringContextTests {

	// Standard Spring DI
	@Resource protected SessionFactory sf;

	// A basic unit test (utility methods omitted). Note that I'm creating new objects and flushing them to the database,
	// but not bothering to clean up in any way.
	@Test
	public void testCategoryManager_createDuplicateInDifferentService() {
		Category category = createCategory("PCTEST");
		assertEquals("PCTEST", category.getName());
		sf.getCurrentSession().flush();
		createCategoryFromOtherService("PCTEST");
	}
}

This is a godsend for most test scenarios. It falls down in some ways, though, and I’ll talk about that in my next post.

Multiple datasources

It’s been stated that a weakness of CF9 ORM is that it only allows one ORM datasource. You can use other datasources via cfquery. It’s worth understanding why this is so. It certainly wasn’t left out arbitrarily.

In a nutshell, default CF9 ORM only allows one datasource because it wraps each request in a transaction. Generally speaking, this is a good idea, but it’s a characteristic of standard database transactions that each is tied to a single database connection. This is generally true, not just specific to ColdFusion datasources. It’s also why cftransaction only allows queries from one datasource within the transaction body.

I gave myself a couple of escape hatches in the preceding paragraph – I said “default” CF9 ORM and “standard” database transactions. Does that mean non-default CF9 ORM with non-standard transactions does allow multiple datasources? Why, yes it does! But be aware that this is a non-standard usage model. Not only do you need to delve into the underlying Hibernate configuration to make it work, many scenarios are not even standard practice within the Hibernate world.

I’m not going to write a how-to on CF9 ORM with multiple datasources, but I will provide some pointers to how this might be done.

  1. Delve into the underlying Hibernate and turn off the transactions. Non-transactional code is fairly common in plain CFML apps. This sort of thing is a large part of why “scripting” language developers are regarded as cowboys by “serious” (read Java) developers, but it can be a valid solution in some circumstances.
  2. Use XA datasources with a JTA transaction manager for true distributed transactions. There’s a succinct example for this in Java here. This is common practice in the J2EE world, but pretty outre for CF. For a start, you’ll have to find XA datasources for your database and replace ColdFusion’s built-in datasources. You may also need to configure your database server to participate in distributed transactions.
  3. Roll your own transaction manager in Java and plug it into Hibernate. There are any number of people out there in the Java world with strange use cases attempting just this. Google “hibernate multiple datasources” for some interesting and somewhat desperate reading. Or not 🙂
  4. Use CF9 ORM in its default transactional mode for your main datasource with non-transactional access (e.g. cfquery) to secondary datasources. You don’t have to completely give up on data integrity, though. Schedule a task to check data integrity and issue compensating transactions where necessary.
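Option 4 can be sketched in a few lines. This is a minimal illustration only: the two maps are hypothetical stand-ins for whatever you would read from the primary and secondary datasources, and IntegrityCheck, findDrift and repair are made-up names. A scheduled task would run the check and then issue the compensating writes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class IntegrityCheck {
    // Ids whose value in the secondary store is missing or differs from the primary.
    static Set<Integer> findDrift(Map<Integer, String> primary, Map<Integer, String> secondary) {
        Set<Integer> drifted = new TreeSet<>();
        for (Map.Entry<Integer, String> row : primary.entrySet()) {
            if (!Objects.equals(row.getValue(), secondary.get(row.getKey()))) {
                drifted.add(row.getKey());
            }
        }
        return drifted;
    }

    // Compensating step: bring the secondary store back in line with the primary.
    static void repair(Map<Integer, String> primary, Map<Integer, String> secondary) {
        for (Integer id : findDrift(primary, secondary)) {
            secondary.put(id, primary.get(id));
        }
    }
}
```

This only treats the primary datasource as the source of truth. Rows that exist solely in the secondary store are left alone, and deciding what to do with those is an application-level choice.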

There are some other possibilities, but that should be enough to get my basic points across. Firstly, this isn’t really a CF9 or even a Hibernate issue. And secondly, you really only have a choice between either relaxing your transactional guarantees (options 1 and 4), or doing a lot more work (options 2 and 3).

Transactions – optional?

No. Transactions are not optional.

One of the things that irked me in learning Hibernate was the amount of time spent worrying about transactions. Indeed, when I speak about Spring/Hibernate implementation models to ColdFusion developers, I get the same reaction that I initially had myself – “Why do you keep talking about transactions? We know what they are, it’s kind of interesting, but this is hardly a central concern.”

ColdFusion database connections are in autocommit mode by default. The Hibernate guys wrote an excellent article on autocommit. Essentially it means you automatically get a transaction per <cfquery>.

This is a good thing. The CF server can’t possibly decide on a transaction strategy for you, so it can only do one of two things:

  • Turn on autocommit so you can get coding
  • Turn off autocommit so you must implement a transaction strategy before you can run a single query. Alternatively, you can work out how to turn autocommit back on.

Given ColdFusion’s RAD focus, which do you think CF is going to choose? It’s going to let you get coding.

Unfortunately, many of us take that as permission to put off thinking about a transaction strategy indefinitely. Can you actually put your hand on your heart and say “No, data consistency is not important to my application”? Probably not. But if you’re like me, you might be able to let your application slide across the line from prototype to production without really addressing this issue.

Hibernate, which is much more interested in being right than being RAD, starts from the other position. Work out your transaction strategy. Then we’ll talk. Autocommit mode is available if you really need it, but is off by default and is considered a non-standard usage model.
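In plain JDBC terms, autocommit is a flag on the connection, and "working out your transaction strategy" starts with turning it off and deciding where the commit goes. The helper below is a minimal sketch of one unit of work on one connection. ExplicitTransaction and SqlWork are hypothetical names, and nothing here is what Hibernate or ColdFusion does internally.

```java
import java.sql.Connection;
import java.sql.SQLException;

class ExplicitTransaction {
    // The statements that must succeed or fail together.
    interface SqlWork {
        void run(Connection connection) throws SQLException;
    }

    static void run(Connection connection, SqlWork work) throws SQLException {
        boolean previous = connection.getAutoCommit();
        connection.setAutoCommit(false);   // stop committing after every statement
        try {
            work.run(connection);
            connection.commit();           // everything becomes visible together
        } catch (SQLException | RuntimeException e) {
            connection.rollback();         // or nothing does
            throw e;
        } finally {
            connection.setAutoCommit(previous);
        }
    }
}
```

With autocommit left on, each statement inside the work would be committed on its own, which is the transaction-per-query behaviour described above.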

So no, transactions aren’t optional. Hibernate smacks you over the head with this fact, so if you’ve been letting this aspect of your applications slide, you’ve got some catching up to do.

Note: CF9 ORM in its default configuration does in fact choose the transaction-per-page-request strategy for you. This is a bit of a philosophical departure from the default cfquery behaviour, but I think it manages to walk the line between CF RAD culture and Hibernate “serious software engineering” culture.

Non-standard usage model

You’re the chief technologist of Webz R Us. You’ve been around, used a few different languages. You think in the abstract and can adapt to the specific implementation. You can make ColdFusion sing and dance. Now you’ve given yourself the task of getting Spring and Hibernate up and running with your ColdFusion web apps. You’re not using CF9 ORM – you’re going to roll your own for ultimate control.

It’s natural to think that you can take your accumulated wisdom and cherished practices and tweak them a bit for this new implementation technology. If you’re a JEE web app developer, that might be true. If you’re a ColdFusion developer, it almost certainly is not.

So after a train-wreck or two, you jump onto the forums – this is open-source software with a vibrant community, right? – and explain your cherished practices, why they’re so great, and ask for help on that one little API that you need to make everything fall into place. It’s so obvious to you that this should work. You’ve been doing it for years in ColdFusion. In Java, with all these great frameworks, your cool approach should be even better supported and easier to do.

And what you’re told is:

That’s a non-standard usage model

Let me translate that for you:

  • What a dumb idea
  • Even if it works, nobody cares
  • Go read the documentation
  • You n00b

OK, I’m being a bit harsh here. Let me try another translation:

  • 99% of successful projects don’t use your technique
  • Maybe you should think about why that is so
  • You really do have an edge case? Time to man up and earn those big bucks. Good luck.

I love the quote below. The guy who wrote this really had put in the hard yards to understand the questioner’s use case – this was by no means a brush-off:

How about you just trust what Hibernate is doing, because it always has very good reasons for its very sophisticated caching behavior, and the people who designed this stuff have spent a lot, lot more time thinking about caching and transactions than you have.

I didn’t post this just to be amusing. I’ll be referring to this post a lot in the coming months. You won’t believe how many non-standard usages I’ve been able to come up with. Stay tuned…