Monday, October 27, 2008

SD Best Practices 2008 Notes

For those who attended in person, thank you for coming to my talks at SD Best Practices.

If you're interested in the slides I presented, please e-mail me. They always change up to the last minute!

If you want advice on how to improve xUnit adoption on your enterprise project, feel free to e-mail me! I'm happy to provide advice. I'm working on a book about adoption of xUnit testing in enterprise projects, so I want to learn as much as possible about the different obstacles out there.

If you can, please take Scott Ambler's 5 minute TDD adoption survey:

Recommended best practices for xUnit tests with the database:

1. Use the Dependent Object Framework (the DOF) for easy setup of database dependencies for xUnit tests.
2. Use JPA, possibly the Hibernate Entity Manager.
3. Use an in-memory database, such as H2 (see related blog entry).

To get started with the DOF:
1. Review the DOF User Guide
2. Review the "Hello DOF" example programs.

If you want help implementing the DOF, please contact me.

I'll be speaking at Voices That Matter in San Francisco on December 3.

So Wassup with writing software without adding automated unit tests as you go along? Well, this video came to mind. The Wassup video:

In-Memory DBs for TDD: H2 versus HSQLDB

Justin Gordon, October 27, 2008

Using an in-memory database can dramatically improve your ability to write xUnit tests. Should you use H2 or HSQLDB for xUnit testing, or some other open source Java database? My examples for the Dependent Object Framework support both H2 and HSQLDB, and it is relatively simple to switch between them. Mostly, it's a matter of changing the JDBC connection string.
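To make the switch concrete, here is a minimal sketch. The JDBC URLs are the standard in-memory forms for each database; the table name and data are invented for illustration, and running the main method requires the corresponding driver jar on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InMemoryDbDemo {
    // Standard in-memory JDBC URLs; switching databases is mostly a
    // matter of swapping the URL (and the driver jar).
    static final String H2_URL     = "jdbc:h2:mem:testdb";
    static final String HSQLDB_URL = "jdbc:hsqldb:mem:testdb";
    // H2's compatibility modes are also selected on the URL,
    // e.g. "jdbc:h2:mem:testdb;MODE=Oracle"

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(H2_URL, "sa", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE product (id INT PRIMARY KEY, name VARCHAR(50))");
            stmt.execute("INSERT INTO product VALUES (13, 'Widget')");
            // ... exercise the code under test against this schema ...
        } // the in-memory database vanishes when the last connection closes
    }
}
```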

Criteria for doing database TDD:

The desired functionality is:
  1. Speed: H2 claims to be the fastest. In a simple test I ran, H2 and HSQLDB performed almost identically.

  2. DB2 or Oracle compatibility: H2 claims to offer both an Oracle and an MS SQL Server compatibility mode. If these work well, it is a huge win.

  3. Stability: HSQLDB appears to be used in more commercial applications, so it may be better supported. In the past, Hibernate support for it also seemed more stable. Much of this difference is probably attributable to HSQLDB being the older project.

  4. Ease of use: H2 slightly beat out HSQLDB in this category, with a simpler install that provides a shortcut for starting an in-memory database and a browser session with a nice interface. It could hardly be simpler. Setting up HSQLDB was definitely trickier, and the UI is clearly better in H2.

Common DB features not required for TDD:
  1. Scalability
  2. Concurrency
  3. Stability under high load
  4. Data integrity

Those features are absolutely critical to your production application. However, TDD test suites run in single user mode and the tests do not even need to persist the data.

Anecdotes

I know of one commercial software company that completely relies on H2 for their unit tests, while they deploy on Oracle.

References

H2

HSQLDB


Miscellaneous

Article on H2 by the H2 creator: Mueller states that H2 is faster than HSQLDB for most
operations and is architecturally superior. Since he wrote the original HSQLDB, he is well qualified to comment. Here's the summary straight from InfoQ:

HSQLDB creator Thomas Mueller has released 1.0 final of H2, his pure Java database successor to HSQLDB.
H2's focus is to be best database for the lower end (low number of concurrent connections, embedded usage). H2's features are comparable to MySQL and PostgreSQL; it has views, subqueries, triggers, clustering, role based security, encryption, user defined functions, disk and in-memory usage, embedded and client-server usage, referential integrity, scrollable result sets, schema support, transaction isolation. There are a few tools like the browser based Console application (now with auto-complete). A few things are still missing: H2 currently only offers table-level locking, full outer joins are not supported yet, the ODBC driver is only 'experimental' so far, and the standard API for distributed transactions (two-phase commit) is incomplete, however for most use cases these may not be critical.


Here's a good comparison of open source databases by David Leung, based on the following criteria:
  • FREE or Open Source
  • Multiple platform support
  • Easy to install and use
  • Can be bundled in code so little or no extra installation is required
  • Can be scalable, i.e. load balancing, size limit, etc.
  • Reliable, i.e. backup, replication, etc.
  • Efficient, i.e. indexing, fast search, etc.

He recommends, in the following order:
  1. H2
  2. Derby
  3. HSQLDB
  4. MySQL
  5. PostgreSQL
Summary

It is not so critical which in-memory database you use; what matters is simply trying one out for your xUnit tests. H2 is a good starting point, especially with its Oracle and SQL Server compatibility modes.

Sunday, March 23, 2008

Database Dependent JUnit Tests and the “Dependent Object Framework”

10/27/2008: Please stay tuned for an update to this article in the next couple of days.

Have you ever worked on a large enterprise software project (one that heavily uses a database) and wanted to add JUnit tests? And did you quickly realize that it’s easier said than done? Why is that? Want to learn a better way? If so, keep reading to see how a new open source project solves this problem.

Back in 2004, I managed to introduce a huge body of JUnit tests in my team’s project, IBM’s “WebSphere Product Center,” but the tests focused solely on the core storage layer, which had few dependencies on the rest of the complicated system. Managers kept pestering us for more JUnit coverage. The conventional wisdom was that we needed to refactor the code into appropriate layers so that dependencies on the database could be isolated and “mocked out”. The conventional wisdom (and most definitely the managerial wisdom!) was also that thou shalt not make big code changes without lots of automated unit tests. Hmmm, this sounded eerily like “Catch-22”: we couldn’t change the code without JUnit tests, and we couldn’t add JUnit tests without changing the code!

We resigned ourselves to the reality that we had to have JUnit tests that ran against the database. What were our options for setting up the database to run tests?


Technique: SQL scripts
  Details: Write inserts to populate a blank database.
  Issues: Any time the database schema changes, the scripts need modification.

Technique: Restore a backup of the database
  Details: Use the UI or any other means to get the database ready for the unit tests, then save a backup copy.
  Issues: Any time the database schema changes, the backup copies need to be recreated.

Technique: Write Java code
  Details: Write Java code (probably many lines) to set up the objects for each test.
  Issues: Too much Java code to maintain, and bugs in the copious code.

All of these options were painful, fragile, monolithic, and tedious. If you’ve tried any of them, you will agree.

A co-worker and Lisp aficionado, Umair Akeel, proposed that at the beginning of a test, we list the database objects that must exist for that test. Suppose we are writing a test on an Invoice object: that test would require a Customer object and a couple of Product objects, and those Product objects would in turn require a couple of Manufacturer objects. What would that look like?

We would have a method called “require” that takes a file name. The file describes a given object, and the file name specifies the object’s type and primary key. In this example, we would list the following at the beginning of the JUnit test on an Invoice:

require("customer.25.xml");

require("manufacturer.12.xml");

require("manufacturer.30.xml");

require("product.13.xml");

require("product.31.xml");

In this example, Product 13 depends on Manufacturer 12 and Product 31 depends on Manufacturer 30. The strings passed as parameters to the “require” method specify the object type and the object’s primary key, as well as a file that uniquely defines the object.

I later made the suggestion that instead of having to remember to list out all dependencies (in this case, listing “manufacturer.12.xml” before “product.13.xml”), we put the dependencies of each object in its own object description file.

Then we would only need this at the beginning of the test (or test fixture):

require("customer.25.xml");

require("product.13.xml");

require("product.31.xml");

The “require” method is smart: it creates the object in the database only if it does not already exist, and once the object has been fetched, the framework caches it in a map.
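The logic just described can be sketched as follows. This is a simplified illustration, not the actual DOF source: the Handler interface, the file-name parsing, and the omission of dependency processing are all simplifications introduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the "require" caching logic described above.
public class RequireSketch {
    interface Handler {
        Object get(String primaryKey);  // fetch from the database, or null if absent
        Object create(String fileName); // create the object from its description file
    }

    private final Map<String, Object> cache = new HashMap<>();
    private final Handler handler;

    public RequireSketch(Handler handler) { this.handler = handler; }

    public Object require(String fileName) {
        // e.g. "product.13.xml" -> primary key "13"
        String pk = fileName.split("\\.")[1];
        // Create only if not already in the database; cache the result in a map.
        return cache.computeIfAbsent(fileName, f -> {
            Object existing = handler.get(pk);
            return existing != null ? existing : handler.create(f);
        });
    }
}
```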

This framework allowed us to be very modular in setting up objects for running a database test. That allowed us to achieve the following necessities of running JUnit tests effectively:

  1. Tests must run in any order, and any number of times, with the same result.
  2. Easy to add a new test, and run it without worrying about the state of the database.

We’ve got scripts to set up a blank database, ready for our customers, and we use the “require” framework to ensure that the objects a test requires exist. This framework has allowed us to achieve far more legacy test coverage than we ever would have otherwise.

In 2007, I got IBM’s permission to convert the technique into an open source project, which I’ve named the “Dependent Object Framework”.

The Dependent Object Framework (DOF) (http://sourceforge.net/projects/dof/) enables efficient JUnit testing and Test Driven Development against code that depends on objects that are persisted (e.g., database).

Let’s look at a simple example using the Dependent Object Framework. Consider the invoice example listed above. Your invoice needs a product record in the database. So you simply put this line at the top of your JUnit test:

Product product = (Product) DOF.require("product.13.xml");

What you have not done is pre-load the database with a SQL script or database backup. Your test is just saying “make sure that product record 13 exists in the database and give me that object.” You know that the product record will have other records that it depends on, but you only need to specify that your test requires the product record. In fact, maybe the file “product.13.xml” is already used by another test, so you didn’t even need to create it.

Can it get any easier than this for specifying what your test needs on top of an unpopulated database?

The DOF makes it easy to write tests with database dependencies. Advantages include:

1. Facilitation of testing legacy enterprise code lacking JUnit tests. With the DOF, you do not need to untangle the database dependencies in order to write mock objects.

2. Easy reuse of database objects needed for unit tests, which lets a team share the work of creating JUnit tests. The work of creating the database setup for JUnit tests is distributed and modular, versus the monolithic approach of using a SQL script or database backup.

3. Very simple to define which objects a test depends upon. Any indirect dependencies (dependencies of the dependencies) are specified in the files defining the dependent objects. Thus, a JUnit test can depend on object A, and object A might depend on object C. The test simply needs to specify object A, and the definition file for object A will specify object C.

4. Very easy to delete persistent objects created for a test. When one is tweaking tests, this is very helpful.

5. Very easy to add support for new database object types.

6. Fabulous for creating “integration” JUnit tests that run against the database and other real dependencies, just as customers will use the product. So even if you have mocked out the database for many tests, you can still leverage the DOF for running integration tests.

So how does this work?

1. You define a data file format for your object type, such as XML.

2. You write a “handler” class for this data file format. The handler class implements an interface, DependentObjectHandler, composed of “create(objectFileInfo)”, “get(objectFileInfo)”, and “delete(objectFileInfo)”. ObjectFileInfo is a struct with four fields: the path of the file that describes the object (e.g., "product.13.xml"), the object’s primary key (e.g., 13), the object type (e.g., product), and the file type (e.g., xml).
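As a sketch, the contract described above might look like this in Java. The names follow the description in the text, but the actual DOF signatures may differ slightly:

```java
// Sketch of the handler contract; names follow the description above,
// but the real DOF signatures may differ.
public class HandlerSketch {
    /** Struct describing one object data file, e.g. "product.13.xml". */
    static class ObjectFileInfo {
        String filePath;   // e.g. "product.13.xml"
        String pk;         // e.g. "13"
        String objectType; // e.g. "product"
        String fileType;   // e.g. "xml"
    }

    interface DependentObjectHandler {
        Object create(ObjectFileInfo info); // insert the object into the database
        Object get(ObjectFileInfo info);    // fetch it, or null if absent
        void delete(ObjectFileInfo info);   // remove it (handy when tweaking tests)
    }
}
```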

3. You specify the mapping of the object types to the handlers, in a file called handler_mappings.properties. For example, to specify that files named like product.13.xml map to the ProductXmlFactory class:

product.xml=myPackage.ProductXmlFactory
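A minimal sketch of how such a mapping could be consulted at runtime follows. The property loading and reflective instantiation shown here are an assumption about the mechanism, not the DOF's actual code:

```java
import java.io.InputStream;
import java.util.Properties;

// Sketch: look up a handler class by "{objectType}.{fileType}" and
// instantiate it reflectively. An assumed mechanism, not DOF source.
public class HandlerMappings {
    private final Properties mappings = new Properties();

    public HandlerMappings(InputStream in) throws Exception {
        mappings.load(in); // e.g. reads "product.xml=myPackage.ProductXmlFactory"
    }

    /** Instantiate the handler registered for the given object and file type. */
    public Object handlerFor(String objectType, String fileType) throws Exception {
        String className = mappings.getProperty(objectType + "." + fileType);
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```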

4. You create a data file for the object you want to create, named like {objectType}.{primaryKey}.{fileExtension} such as “product.13.xml”

5. The data file contains comment lines indicating its dependencies. For example, a comment line in the file for product 13 would indicate that it depends on manufacturer 33 existing in the database.
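Purely as a hypothetical illustration, such a data file might look like the following. The element names and the dependency-comment format here are invented, not the DOF's required syntax:

```xml
<!-- A hypothetical product.13.xml; the dependency comment and element
     names are illustrative only. -->
<!-- requires: manufacturer.33.xml -->
<product id="13">
    <name>Widget</name>
    <manufacturerId>33</manufacturerId>
</product>
```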

You can download the framework including the JavaDoc and an example that uses simple XML and hsqldb here: http://sourceforge.net/projects/dof/

Please let me know if you find this useful or if you’d like to contribute. I’m working on extending the example to support Hibernate JPA; I suspect the project might not need to change to support it.

I’m also looking for volunteers to port this code to:

  • C++
  • C#
  • Groovy
  • Other xUnit-supported languages.

I'd like to hear if you have any comments or questions on this project.

Cheers,

Justin Gordon

justingordon at yahoo.com

Wednesday, March 12, 2008

Enterprise TDD

Welcome to my new blog. I'm going to publish thoughts on how to build enterprise software more effectively, especially with regard to test-driven development.