Database testing strategy in build
[ vmassol ] 09:18, Monday, 16 February 2004

Applying a working build strategy for testing against a database is not easy. It depends on the complexity of the database model, it depends on the size of the teams. However, I've found that the strategy described below is the one that has worked the best for the projects I have been involved in:

  • Do not mix unit tests independent of the environment (i.e. where interactions with the environments are stubbed/mocked) with integration unit tests (IUT). They have to be separated and put in different different in the SCM. The reason is that the 2 kinds of tests do not support the same execution workflow. More below.
  • Have a database build project (in the sense of an Ant or Maven project) in your SCM. This is extremely important. The goal of this project is to provide the following build targets/goals:
    • create-schema: create the database schema from the ground up (in the database specified as properties)
    • load-static-data: loads static data (i.e. read only data)
    • load-minimal-data: loads a functionally minimal set of data. It should contain all data required functionally but only 1 or a few entries of each type. It's not supposed to reflect the state of the database when in production.
    • load-full-data: loads a full set of data as expected in production.
  • Put the database data (minimal + full sets) in your SCM as flat files (by opposition as keeping the data live in the database). The reason for this is because:
    • you get automatic notification of data changes by using your SCM send-email-on-commit feature that all good SCM have
    • it is build-friendly and allows automated and controlled builds
    • it is controlled, i.e. you know what you're doing with your data, who is modifying them, you can revert if need be, etc
  • Here's the workflow for executing IUT or functional tests. For each project and before the test suite runs:
    • execute database:create-schema
    • execute database:load-static-data
    • execute database:load-minimal-data

    Then, each test should also have the opportunity to load data in its setup (using DBUnit or similar). This is required for example to test special cases where the database is missing some required data and we wish to verify the exception handling part of the code. It is also required if the test requires more than the minimal data set (although that should be relatively infrequent).

    Note that the tests can also be ordered to save some database load time. Although not the best strategy I've found that this was sometimes required on some projects with complex database models.

On the other hand, here are strategies that have not been working so well for me in the past:

  • Have a live database where developers can directly update data. The problems encountered were:
    • it is not controlled. You do no know who's putting data and what is being modified. You cannot easily revert a change
    • it's difficult with distributed teams as you need to set up a replication mechanism. The problem is that often developers update their local database and forget to update the master database which leads to lots of build failures. The solution exposed above does not suffer from this problem.
    • It's hard to sync everyone on the exact same set of data. Some minimal data + variations works best.
  • Do not provide minimal data and let developers write from scratch the data they require for their tests and load these data before each test. This does work for small projects with simple database models but not for complex ones. There's really the need for a minimal data set.

Is that also your experience?


Comments

I like in a world where the dba is so retarded they don't want to let developers have the rights to create schemas in teh development database!!! we so urgently need what you describe yet some silly person stands between us and the database. we need to support two, maybe three versions of the database schema also. the solution offered is for dba to maintain two or three schemas per developer to use as private areas for unit testing! i kid not. instead of just letting us execute sql script or dbunit xml into own private schema.

but thanks for your blog vincent yet another data point to nail in to this stupid coffin that confines us.

--frustrated, February 16, 2004 12:41 PM

Great and timely entry!

http://www.martinfowler.com/articles/evodb.html

I saw Pramod give this talk at our local JUG last year, it was really good. I have been following the agileDatabases list at yahoogroups also. It may be the next level of what you're describing.

I agree with you that it is extremely difficult. Our challenge right now is our programmer tests ( integration included ) run on each commit. It is really slow, 90 minute builds for 50+ projects, the majority of the time spent on these integration type tests. I'm attempting to try maven as a build solution, and at the same time separate into projects the code+programmer tests, integration programmer tests, and customer tests.

I have a theory that the integration tests are happening because the code isn't written with testing in mind ( ala TDD ). That pushes developers to test-last and farther away from the code which leads to the integration-type tests.

We have the same issues with other external systems, like file shares, ldap, remote services, etc.

--Jeff Brekke, February 16, 2004 02:39 PM

This is something that we're sorting out at the moment on my current project and one of the decisions is whether to implement the stuff that sets up the database schema/data as a script (not portable but easy) or as Java classes (can run on the UNIX boxes too). In the short term we're probably going for a script but will migrate this to a Java class in the future. Any thoughts?

--Simon Brown, February 16, 2004 04:46 PM

I have personally experienced all of the issues that you pointed out with regards to testing against a live database. Normally I used some of the strategies to avoid the necessity. That was until I consulted in an environment where I had no choice but to work with the live database. The way the problem had been addressed was to just avoid testing altogether. This was not an option.

What had sort of worked for me in the past dealing with this sort of environment was to put data into the database in the setup method, and then remove it in the teardown. The only issue is that multiple people could not run the tests at the same time for fear of having conflicting primary key entries (I have not dbunit, but I think it might have solved that?) This was not going to work because the database we were dealing with was very trigger happy.

The final solution that has worked well for the last 6 months or so was to wrap each test in its own transaction, finally rolling back the transaction in the tear down. This worked very well, and even played nice with Hibernate since Hibernate just piggy backs off of an existing transaction if one exists.

I put up a version of the code on my blog.

Has anyone else done this, or see issues that we may not have encountered yet? If so please let me know. My mail is on my site, or just comment here or there. Its a topic that definitely interests me.

--Cobbie Behrend, February 16, 2004 08:27 PM

I see other drawbacks with tests wrapped in their own transaction.
First of all, it's not possible for a developer to check modified data with a classic database client and I think this can be disturbing, especially for beginners. However, it is possible to write assertions from inside the test against a (disconnected) ResultSet loaded for example through a little API allowing to perform common SQL queries.
Secondly, have you ever experienced the following: launch your test from an IDE in debug mode and stop brutally its execution. What do you expect to happen? From my experience, this can lead to deadlocks and force you to wait till the transaction timeout. And this is very annoying...
So I think this approach doesn't follow the KISS principle and introduces useless complexity. It can quickly become difficult to add functional test for example :) That's why I definitely prefer to use DBUnit for IUT. The strategy would be very close to rollbacked test in fact: use DBUnit to load a dataset in the setup() method and use DBUnit to delete the loaded data in the tearDown() method in order to restore the database. Moreover, there is already a Maven DBUnit Plugin with predefined goals to generate a DTD regarding your schema, dump and import data that can be really helpful while in development and/or for continuous integration.

--Pascal Thivent, February 16, 2004 10:59 PM

Hi Jeff,

[snip]

> http://www.martinfowler.com/articles/evodb.html

I hadn't read that link. I've skimmed through it and it looks great. I'll now read it in details. I think you're right and it might be the next level of what I'm describing. Thanks for the link.

[snip]

> I agree with you that it is extremely difficult. Our challenge right now
> is our programmer tests ( integration included ) run on each commit. It
> is really slow, 90 minute builds for 50+ projects, the majority of the
> time spent on these integration type tests. I'm attempting to try maven
> as a build solution, and at the same time separate into projects the
> code+programmer tests, integration programmer tests, and customer tests.

That’s cool. We're using Maven too (we have about 100+ projects and our build takes about 1 hour - We are really missing tests though - that's the reason it's not taking longer...). People here have mixed IUT and pure unit tests. I'd also like make a local project build on each commit (actually I was going to blog about this) and we'll need to separate IUT from pure unit tests for that.

> I have a theory that the integration tests are happening because the code
> isn't written with testing in mind ( ala TDD ). That pushes developers to
> test-last and farther away from the code which leads to the integration-
> type tests.

I don't quite agree :-) I believe you're right that writing a la TDD will remove integration test needs but you'll still need to test the integration/functional parts. So it's just pushing the problem a bit further. But I agree that all in all it should reduce the time the whole suite of tests takes.

With Cactus2 (http://blogs.codehaus.org/people/vmassol/archives/000520_cactus_v2_architecture_proposal.html), I'm trying to introduce a new type of test: white box functional testing... ;-). It's a bit what I had started a year ago with PatternTesting (http://patterntesting.sourceforge.net/).

The idea is to run functional tests but still be able to see what's going on in the application by putting probes inside the code. You're also able to perform IUT (a la Cactus) by redirecting the flow of execution to wherever you wish. This is useful for testing tricky parts that are communicating with the server. For example, say you wish to verify that your code behaves correctly when there is a container transaction rollback.

Don’t know if people will find it useful or not but there's only one way to find out... ;-

[snip]

--VincentMassol, February 17, 2004 08:40 AM

1) SQL-Script vs. Java code for DB setup:
We used SQL-Scripts as well as Java code to setup the database in the past.
We have a small Java-Programm that parses and runs our SQL-Scripts which makes them portable. We also have some adoptions built into the parser to run the same script on different databases.

We have used a variant of the ObjectMother-Pattern to create persistent test data. This was more convenient for complex object models, compared to writing SQL-Scripts by hand. A lot of foreign key relationships had to be coded by hand in SQL-Scripts and were done automatically if you just persist your Java objects.
Also loops for a bunch of data is more easy in Java. In SQL-scripts this leads to a lot of SQL text.

2) Developer local databases and DBA trouble:
We installed our own database on a local machine that was under our own control. So there we could do whatever we wanted. No trouble with the DBA.
The easiest way to do something like that is to use HSQL. That's a very light in memory database. But it can also be run as a "database server" on your local machine.

Greetings,
Malte Finsterwalder

--Malte Finsterwalder, February 18, 2004 09:20 AM

Challenging Pascal's proposal: DbUnit files have one inconvenient, they require DB structure - so DB development ! - to be stable; because refactoring dbunit files when you change the structure of your database is tricky, nearly impossible. Whereas refactoring directly "data" at the same time you change the structure is more easy.

And there are no reason why java developers could develop progressively, refactore their code and db developers not !

However I agree using a live database is painfull to "version control".

Unless someone spend some time on a system to manage dbunit files...

--Dominique, March 15, 2004 10:15 AM

I have a detailed article posted at http://www.agiledata.org/essays/databaseTesting.html which you might find interesting. It overviews how you go about testing a relational database. The interesting point is that you need to be concerned about both interface testing as well as internal testing. Most people just focus on interface testing.

- Scott

--Scott Ambler, December 11, 2006 12:16 AM

We use Scriptella ETL (http://scriptella.javaforge.com) to setup a database for both production deployment and in-memory testing. As a bonus you get an embeddable easy-to-use ETL tool with support for XML, CSV and LDIF.

--ejboy, February 5, 2007 05:40 PM
Post a comment









Remember personal info?