Think tank
[ vmassol ] 10:25, Wednesday, 19 December 2007

Jason Hunter was kind enough to set up a Markmail site for XWiki. Markmail is a mailing list archiving tool with a powerful search feature.

What sets it apart from other such tools IMO is the UI and the speed/quality of search. I especially like the ability to see who's sending the most mails to a list and the nice syntax-colored display of emails (in addition to the thread view). Another nice feature is that emails are indexed a few seconds after they have been received by the list (compared to several hours with other tools). I love it :)

[ vmassol ] 08:19, Tuesday, 18 July 2006

It would be nice if there were a tool that could verify that you have correctly added @since tags for methods added in the current version. It would do this by checking against the previous release.

This tool could be based on Clirr or JDiff for example. It would also have an option to fail the build if there are new methods without a @since tag.

Do you know if such a tool exists?
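Here's a rough sketch of what the core check could look like. It assumes the method signatures of the previous release and the Javadoc of the current methods have already been extracted by something else (Clirr or JDiff, for example). The class and method names are hypothetical; this is not the API of any existing tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical checker: reports methods that are new in this version but lack a @since tag. */
final class SinceTagChecker {

    /**
     * @param previousSignatures method signatures found in the previous release
     * @param currentJavadocs    current method signatures mapped to their Javadoc text
     * @return sorted signatures that are new and whose Javadoc has no @since tag
     */
    static List<String> findMissingSince(Set<String> previousSignatures,
                                         Map<String, String> currentJavadocs) {
        List<String> missing = new ArrayList<>();
        // Copy into a TreeMap so the report order is stable whatever map type is passed in.
        for (Map.Entry<String, String> entry : new TreeMap<>(currentJavadocs).entrySet()) {
            boolean isNew = !previousSignatures.contains(entry.getKey());
            String javadoc = entry.getValue() == null ? "" : entry.getValue();
            if (isNew && !javadoc.contains("@since")) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

The "fail the build" option would then simply be a matter of throwing an error when the returned list is not empty.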

[ vmassol ] 13:52, Monday, 17 July 2006

The experience that I'm relating here is part of an exploratory refactoring that I'm currently doing on the Cargo code base. Until now we were using Java File objects to represent J2EE archives and container installation and configuration directories. This is ok but it makes unit testing File operations a little bit complex. The reason is that you need to define a location on your local file system where you're going to read and write files, clean up the files afterwards, etc.

Here's a method we had (it expands a JAR file):

    public void expandToPath(String path) throws IOException
    {
        File workDir = new File(path);
        JarInputStream inputStream = getContentAsStream();

        byte[] buffer = new byte[40960];

        ZipEntry entry;
        while ((entry = inputStream.getNextEntry()) != null)
        {
            String entryName = entry.getName();
            entryName = entryName.replace('/', File.separatorChar);

            String outFileName = workDir.getPath() + File.separator + entryName;
            File outFile = new File(outFileName);

            if (outFileName.endsWith("/") || outFileName.endsWith("\\"))
            {
                outFile.mkdirs();
            }
            else
            {
                if (!outFile.getParentFile().exists())
                {
                    outFile.getParentFile().mkdirs();
                }

                if (!outFile.exists())
                {
                    outFile.createNewFile();
                }

                FileOutputStream out = new FileOutputStream(outFile);
                int read;
                while ((read = inputStream.read(buffer)) > 0)
                {
                    out.write(buffer, 0, read);
                }

                out.close();
            }
        }
        inputStream.close();
    }

Here's how I've transformed the method: I removed all File operations and introduced a FileHandler interface with the following methods, equivalent to the File ones:

  • append(URI, String): appends a suffix to a URI
  • getParent(URI): returns the parent of the URI
  • mkdirs(URI): creates directories for the URI
  • exists(URI): returns true if the URI exists
  • createFile(URI): creates a file
  • getOutputStream(URI): gets an output stream for the passed URI

    public void expandToPath(URI path) throws IOException
    {
        JarInputStream inputStream = getContentAsStream();

        byte[] buffer = new byte[40960];

        ZipEntry entry;
        while ((entry = inputStream.getNextEntry()) != null)
        {
            String entryName = entry.getName();

            URI outFile = getFileHandler().append(path, entryName);

            if (outFile.toString().endsWith("/"))
            {
                getFileHandler().mkdirs(outFile);
            }
            else
            {
                if (!getFileHandler().exists(getFileHandler().getParent(outFile)))
                {
                    getFileHandler().mkdirs(getFileHandler().getParent(outFile));
                }

                if (!getFileHandler().exists(outFile))
                {
                    getFileHandler().createFile(outFile);
                }

                OutputStream out = getFileHandler().getOutputStream(outFile);
                int read;
                while ((read = inputStream.read(buffer)) > 0)
                {
                    out.write(buffer, 0, read);
                }

                out.close();
            }
        }
        inputStream.close();
    }

The interesting part comes now. Because it was a bit hard to create a unit test for the original expandToPath method, nobody had done it. It would have involved passing a test JAR and, more difficult, passing a target directory where the JAR would be expanded. This is not easy as the location of this target dir would depend on where the tests are executed from, and making it work seamlessly from both a build tool and your IDE is not trivial. Here comes VFS to help us. By implementing the FileHandler interface using VFS, we can now write the following unit test:

    public void testExpandToPath() throws Exception
    {
        URI jarURI = new URI("ram:///test.jar");

        FileObject testJar = VFS.getManager().resolveFile(jarURI.toString());
        ZipOutputStream zos = new ZipOutputStream(testJar.getContent().getOutputStream());
        ZipEntry zipEntry = new ZipEntry("rootResource.txt");
        zos.putNextEntry(zipEntry);
        zos.write("Some content".getBytes());
        zos.closeEntry();
        zos.close();

        DefaultJarArchive jarArchive = new DefaultJarArchive(jarURI);
        jarArchive.setFileHandler(new VFSFileHandler());

        jarArchive.expandToPath(new URI("ram:///test"));

        // Verify that the rootResource.txt file has been correctly expanded
        FileObject rootResource = VFS.getManager().resolveFile("ram:///test/rootResource.txt");
        assertTrue(rootResource.exists());
    }

Notice the use of the "ram:" URI scheme. This is one of the many filesystems supported by VFS and it means that all file operations will happen in a virtual file system in memory. Also note that VFS doesn't currently support creating Zip files so we're using the JDK's ZipOutputStream API. The nice thing is that as this test operates in memory there's no need to define a target location on the file system.

The other nice thing is that by introducing VFS to this expandToPath() method it's now possible to expand a JAR to any file system supported by VFS. We could thus expand to an FTP server, to a WebDAV repository, to an HTTP URL, to a remote machine using SSH, etc. All this without changing a line of our code. Nice isn't it?
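To make the FileHandler abstraction a bit more concrete, here's a sketch of what the interface could look like, together with a tiny hand-rolled in-memory implementation. This is only an illustration: the real Cargo interface may declare different signatures, InMemoryFileHandler and its read() helper are hypothetical, and it is a stand-in written for this example, not the VFS-based handler used in the test above. It assumes hierarchical URIs and suffixes that need no escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of the abstraction described above; Cargo's real interface may differ. */
interface FileHandler {
    URI append(URI base, String suffix);
    URI getParent(URI uri);
    void mkdirs(URI uri);
    boolean exists(URI uri);
    void createFile(URI uri);
    OutputStream getOutputStream(URI uri);
}

/** Hypothetical in-memory implementation: nothing ever touches the disk. */
final class InMemoryFileHandler implements FileHandler {
    private final Set<String> directories = new HashSet<>();
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    InMemoryFileHandler() {
        directories.add("/"); // the root always exists
    }

    /** Entries are keyed by URI path; "/dir/" and "/dir" name the same entry. */
    private static String key(URI uri) {
        String path = uri.getPath();
        return path.length() > 1 && path.endsWith("/")
            ? path.substring(0, path.length() - 1) : path;
    }

    public URI append(URI base, String suffix) {
        String s = base.toString();
        return URI.create(s.endsWith("/") ? s + suffix : s + "/" + suffix);
    }

    public URI getParent(URI uri) {
        String s = uri.toString();
        if (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return URI.create(s.substring(0, s.lastIndexOf('/') + 1));
    }

    public void mkdirs(URI uri) {
        // Register the directory and every ancestor, like File.mkdirs().
        String path = key(uri);
        while (path.length() > 1) {
            directories.add(path);
            path = path.substring(0, Math.max(1, path.lastIndexOf('/')));
        }
    }

    public boolean exists(URI uri) {
        String path = key(uri);
        return directories.contains(path) || files.containsKey(path);
    }

    public void createFile(URI uri) {
        files.putIfAbsent(key(uri), new ByteArrayOutputStream());
    }

    public OutputStream getOutputStream(URI uri) {
        // Like FileOutputStream, start from an empty file each time.
        ByteArrayOutputStream content = new ByteArrayOutputStream();
        files.put(key(uri), content);
        return content;
    }

    /** Test helper (not part of the interface); assumes the file exists. */
    byte[] read(URI uri) {
        return files.get(key(uri)).toByteArray();
    }
}
```

The point of keeping the interface this small is that any backend (local files, VFS, or a fake like this one) can be swapped in without touching expandToPath().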

[ vmassol ] 09:54, Thursday, 13 July 2006

(Updated 2006-07-14: Added section on discovering modules and added disclaimer at the end)

IntelliJ IDEA has revolutionized the IDE landscape by adding "intelligence" to IDEs. A few days ago I did a thought experiment by asking myself the following question: "how feasible would it be to build a project without knowing any meta-data about it?". In other words, is it possible for a build tool to be intelligent enough to build a project without build files or POMs? Said differently, is it possible to figure out a project's POM automatically? Let's review some of the typically required meta-data and see how each piece could be guessed.

Source locations

It is possible to guess where the sources are by looking for *.java files (for Java projects; the same applies to other project types). We still need to differentiate main sources from test sources but that's also relatively easy to do. We can check for classes extending JUnit's TestCase for example, or the TestNG equivalent, or that of any other well-known testing framework.
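As a rough illustration of the test-detection idea, here's a sketch that classifies a Java source file by matching its text against a small list of rules. The class name and the two rules are assumptions made up for this example, and plain regex matching is naive (it would be fooled by a match inside a comment or a string literal).

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical classifier: decides whether a Java source file looks like a test. */
final class SourceClassifier {

    /** Example discovery rules; a real tool would load and extend these over time. */
    private static final List<Pattern> TEST_RULES = List.of(
        // JUnit 3 style: class Foo extends TestCase
        Pattern.compile("\\bextends\\s+(junit\\.framework\\.)?TestCase\\b"),
        // Annotation style used by TestNG (and later JUnit versions): @Test
        Pattern.compile("@(org\\.junit\\.|org\\.testng\\.annotations\\.)?Test\\b"));

    static boolean isTestSource(String javaSource) {
        return TEST_RULES.stream().anyMatch(rule -> rule.matcher(javaSource).find());
    }
}
```

Supporting a new testing framework would then only mean adding a rule to the list.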

Note: An interesting thing here is that to be intelligent we'd need the help of the community to add new rules to the discovery process. For example imagine that a new testing framework appears; we'd need to add it to the Test Discovery Rules. Thus, this type of intelligent build system would need to rely a lot on the community and thus would need to get its data from an online repository that could be edited by the community.

Dependencies

How do we detect project dependencies? One relatively easy way is to parse the sources that we have found above and find all external imports, then query ibiblio to find matching package names (this information is present in Maven POMs on ibiblio). For guessing the version, there's no easy magic. A first approach would be to get the latest released version of the dependencies we've found.
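The first half of that idea (finding external imports) could be sketched as follows. The names are hypothetical and the rules are simplifications: only packages starting with "java." are treated as part of the JDK (javax packages would need a finer rule) and static imports are ignored. The lookup against a repository is left out entirely.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner: collects the external packages imported by a Java source file. */
final class ImportScanner {

    // Captures the package part of "import a.b.C;" or "import a.b.*;".
    // Static imports do not match this pattern and are therefore skipped.
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+([\\w.]+)\\.(?:\\*|\\w+)\\s*;", Pattern.MULTILINE);

    static Set<String> externalPackages(String javaSource, String ownPackagePrefix) {
        Set<String> packages = new TreeSet<>();
        Matcher matcher = IMPORT.matcher(javaSource);
        while (matcher.find()) {
            String pkg = matcher.group(1);
            boolean fromJdk = pkg.startsWith("java.");
            boolean fromProject = pkg.startsWith(ownPackagePrefix);
            if (!fromJdk && !fromProject) {
                packages.add(pkg);
            }
        }
        return packages;
    }
}
```

Each package name in the result would then be a candidate for the repository query described above.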

Project type

Project types can easily be guessed by looking at some files. For example if a web.xml file is present then it's a WAR project, if an application.xml one is found then it's an EAR project, if a jnlp file is found then it's a JNLP project, etc.
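A sketch of that rule table, assuming the file names of the project have already been collected. The class name, the type labels and the "jar" fallback are assumptions for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical guesser: maps marker files to a project type. */
final class ProjectTypeGuesser {

    // Checked in insertion order, so an EAR that also bundles a web.xml is reported as an EAR.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("application.xml", "ear");
        MARKERS.put("web.xml", "war");
        MARKERS.put(".jnlp", "jnlp"); // a leading dot means "any file with this extension"
    }

    static String guess(Set<String> fileNames) {
        for (Map.Entry<String, String> marker : MARKERS.entrySet()) {
            for (String name : fileNames) {
                boolean matches = marker.getKey().startsWith(".")
                    ? name.endsWith(marker.getKey())
                    : name.equals(marker.getKey());
                if (matches) {
                    return marker.getValue();
                }
            }
        }
        return "jar"; // assumed default when no marker file is found
    }
}
```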

SCM

SCM can easily be guessed by looking for special files on the filesystem of the project. For example we would look for CVS directories for CVS, for .svn directories for Subversion, etc.
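For instance, a CVS working copy contains a directory named CVS and a Subversion working copy a directory named .svn, so the check could be sketched like this (the class name and the returned labels are made up for the example):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical guesser: detects the SCM from the metadata directory it leaves behind. */
final class ScmGuesser {

    static Optional<String> guess(Path projectDir) {
        if (Files.isDirectory(projectDir.resolve("CVS"))) {
            return Optional.of("cvs");
        }
        if (Files.isDirectory(projectDir.resolve(".svn"))) {
            return Optional.of("svn");
        }
        return Optional.empty(); // unknown SCM, or no SCM at all
    }
}
```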

Developers

Once we have the SCM URL we can then query the SCM to get the list of all developers.

Project name

The project name could be the name of the top level directory and the version could be set arbitrarily to 1.0. Actually we could even check ibiblio to see if the project is already on ibiblio, get the latest version there and increase the minor number by one as a first order guess. Another strategy would be to query the SCM and look for tags and deduce existing versions by parsing those tags (there are some usual conventions for naming tags so it should be possible to make a good guess).
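The tag-parsing strategy could be sketched like this, assuming tags embed a major.minor version separated by a dot or an underscore (for example "cargo-0.8" or "CARGO_0_9"). The class name and that tag convention are assumptions; real projects use many other schemes.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical guesser: derives the next version from existing SCM tag names. */
final class VersionGuesser {

    // First "<digits>.<digits>" or "<digits>_<digits>" group found in the tag name.
    private static final Pattern VERSION = Pattern.compile("(\\d+)[._](\\d+)");

    static String nextVersion(List<String> tags) {
        int bestMajor = -1;
        int bestMinor = -1;
        for (String tag : tags) {
            Matcher matcher = VERSION.matcher(tag);
            if (!matcher.find()) {
                continue; // not a release tag we understand
            }
            int major = Integer.parseInt(matcher.group(1));
            int minor = Integer.parseInt(matcher.group(2));
            if (major > bestMajor || (major == bestMajor && minor > bestMinor)) {
                bestMajor = major;
                bestMinor = minor;
            }
        }
        // No usable tag: fall back to the arbitrary 1.0, otherwise bump the minor number.
        return bestMajor < 0 ? "1.0" : bestMajor + "." + (bestMinor + 1);
    }
}
```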

Modules and artifacts

Discovering the different modules of a project is probably one of the hardest things to do. If you look at different projects in the wild I believe there are not that many directory structures out there. Maybe 10-15. Thus it should be possible to register knowledge of these structures and let the tool discover which one most closely matches the project at hand. This would also make it possible to deduce the different artifacts that have to be generated. Of course it won't be perfect as there are projects which generate several artifacts from the same module. Again it's a question of doing 80% of the job and leaving 20% to be done manually.

Additional information

Of course, the information found above is just a set of guesses. In most cases they could be correct but we would need to offer a way for the user to edit them and to add any missing information.

Conclusion

I believe it should be possible to create such an intelligent meta-build project which could be used to generate files for one of the existing build systems such as Maven, Ant, etc. For example it could create an internal POM file on which Maven could then be executed to produce the build results. At a minimum such a tool could be used to convert existing projects to Maven. I wonder how intelligent it could be but I guess it could go pretty far.

Disclaimer: Of course, such a tool would be bad from a conventions standpoint. One of the great strengths of Maven has been to standardize the directory structure of projects. I can go to any Maven project and I know exactly where stuff will be, what will be generated, etc.

Is there other information which you think could be guessed automatically? Can you think of better algorithms to guess some of the information shown above?

[ vmassol ] 10:07, Friday, 17 March 2006

Current wikis are great. However when used as development wikis I have found some limitations which are hampering their use. Please note that my experience is based on using Confluence and XWiki and other wikis may support some of the features mentioned below. Here's my top wishlist for development wikis and for Confluence and XWiki in particular:

  • Moderated wikis. Right now there are only two choices for a wiki: either they are open and anyone can edit a page or they are closed wikis and you need to register and get the rights to make modifications. For example most spaces on the Codehaus wiki are closed. They were initially open but vandalism was too high and we had to close them. This is hampering documentation contributions. A moderated wiki would alleviate this: when the page is saved, an email would be sent to a list of moderators for the space for approval or rejection (either by responding to a certain email address as for mailing list moderation or by clicking on a link in the email). Ideally, clicking on the validation link in the email would open the page in a browser with the modifications highlighted so that the moderator could make some changes before clicking on the save button.
  • Anonymous edits. Although this feature already exists, I'd like wikis to add 2 fields when anonymously editing a topic: a user name and an email address. The idea is to make it even easier to contribute to a wiki. If the wiki is moderated as explained above, moderators would receive an email. The idea of the username and email is to allow the moderator/community to discuss with the contributor if need be and to give them credit. These 2 fields would obviously be optional and there should be a text on the page explaining that the email will not get displayed on the wiki and that filling the fields will allow credits/acknowledgment to be given.
  • Diff notifications. Most wikis allow some form of space watch but the wikis I have used still do not offer the possibility to send notifications in a text diff format (wiki markup diff is good enough). For a development wiki, the idea is to send diff notifications to the development mailing list so that all developers are aware of wiki page modifications.
  • Daily notifications. This is also supported in some wikis but what I would like is the ability to watch a single space and to aggregate changes in that space (using the diff notification format mentioned above). Please note that Confluence does not support this as it requires you to modify the permissions of all other spaces so that the user doing the watch has no view rights on them, which is not usable on wikis such as the one on Codehaus which have hundreds of spaces.
  • In place comments. The idea is again to lower the barrier to contribution by allowing wiki users to highlight a portion of text in their browser and to associate a comment with it (like a post-it). There would be an option to turn these comments on/off. It's easier for a user to highlight a line and put a comment like "I don't understand this sentence" or fix a typo rather than have to use the current type of comments at the bottom of a page. Note that this is similar to how word processors such as Word allow adding comments to a document.
  • Patch handling. I'd like the ability to make modifications to a page and then instead of saving, have the ability to click on a "Generate patch" button which would generate a text file in wiki markup diff format. Then there would need to be an "Apply patch" action that can be done on a page. This would allow using wikis for project development web sites and allow contributors to provide documentation patches along with code patches. This is currently a big pain when using a wiki as a project development web site.

I have quite a few other suggestions for improvements but I feel those are the major ones when it comes to using a wiki as a project development wiki. Let's hope wiki vendors are listening... :-). Are these also on your wish list?

[ vmassol ] 13:48, Sunday, 12 February 2006

I see 2 use cases where ensuring binary compatibility is a must:

  • When you're developing a framework, i.e. a piece of software meant to be used at an API level by other developers. In that case, breaking binary compatibility is not something to do lightly.
  • When working in a large team it's common to define "interface" projects that represent the contracts to be followed by the different teams. In that case breaking the binary compatibility in an "interface" project is something that has to be planned and organized.

Enforcing binary compatibility in the build

The automated build is a nice place to enforce binary compatibility as the build is something executed by the individual developers before checking in and it's also executed by the continuous integration build. Thus any binary incompatibility can be quickly discovered. Of course this doesn't replace tests which can also help discover breakages. However the problem is that with all the nice refactoring IDEs we have now, it's easy to refactor the tests at the same time as the code and thus the introduction of a binary incompatibility is not always noticed.

A good strategy to discover an incompatibility is to compare the current code with the latest released code. This is what Clirr does. Clirr currently sports Ant and Maven1 integrations. The good news is that there's a Maven2 plugin in the works (more on that when it's released). However using a tool is only good if there's a strategy behind it.

Strategy for using Clirr

Here is what I believe can be done to automate binary compatibility checks in the build:

  • Start by organizing your packages so that you clearly demarcate the user-public API from the SPI from the internal implementations. You'll probably want to fail the build only on the user-public API (and possibly on the SPI too but that's probably a lower severity).
  • Use Clirr to make your build fail upon violation on the user-public API.
  • After discussing with the team and possibly with users, decide whether you wish to allow the binary incompatibility. Always consider going for a deprecation cycle. If you choose to allow the incompatibility, register it in an exception file that you pass to Clirr so that it builds without choking on those errors (Note: I believe Clirr needs to be improved to better support exceptions not only at the file level but at the violation level).
  • When the release time comes, you'll have a nice file listing all the binary incompatibilities. Include it in the release notes so that your users know what to expect and even better, for each incompatibility add a description that explains how to modify the user code to use the new version of the API.

Note: On the Cargo project we've tried to do this, even though there's still room for lots of improvement. Actually our main issue on Cargo is not detecting binary incompatibilities but rather deciding to release a 1.0 version, which would mean that from then on we would always look for a deprecation solution rather than break binary compatibility. We've always pushed back this 1.0 release because our API has been changing quite frequently but we're now nearing a 1.0 version. When that comes we'll turn Clirr on to fail the build upon breakage. I'll let you know how it goes...

[ vmassol ] 08:55, Saturday, 21 January 2006

I'm currently writing my third book and I'm starting to notice a pattern. Whenever I write a book about a tool/framework whose sources I have access to, the code ends up being better.

The way I work goes like this: I start writing about a topic. If it's taking too long to explain it, I consider that something is wrong about the code. I modify the source code so that the document I'm writing has the minimal required size to explain the topic.

The good thing with a book is that what you're explaining has to be simple and not convoluted, which leads to this nice effect of improving the usability of your code. I get a bit of the same result when I write project documentation but not to the same level. This is probably simply because writing a book is a more involved process: you dedicate more time to it and thus you want it to be as perfect as possible (and thus as readable as possible).

I guess nothing here is new. This is all about having a user of your code. Tests are "users" of your code and thus lead to better design. I guess documentation can also be a "user" of the code and thus help improve it.

If you're writing some framework/tool, consider writing a book for it and if you're diligent in your writing your code will end up being better! As an added benefit your users will love you... :-)

[ vmassol ] 18:30, Friday, 4 November 2005

Amazon has released a beta of the Mechanical Turk. It allows a program to programmatically ask a question to a human and wait for the answer. Here's an example (copied from Google Blogoscoped):

read (photo);
photoContainsHuman = callMechanicalTurk(photo);
if (photoContainsHuman == TRUE) {
   acceptPhoto;
}
else {
   rejectPhoto;
}

This is really like the Matrix except that the humans get paid a little bit of money (but in the end that's close to getting fed) and it's other humans that are controlling the programs... until we have web services using other web services using the Mechanical Turk. Then deciding who's controlling whom is going to be hard :-)

Source: Google Blogoscoped.

[ vmassol ] 18:49, Wednesday, 26 October 2005

I'm working on automating a J2EE build using Maven 2 and I'm in need of a Maven 2 plugin to do the following:

  1. load a database schema in the instance
  2. load data in the instance
  3. start/stop a database instance
  4. ability to create an instance from scratch

The ideal situation would be to find an existing Java framework that would already perform all or some of those steps. Then I could easily create a Maven 2 plugin wrapping it. So far I haven't been able to find such a tool. If you know any please suggest them!

Here's what I have found so far. Please note that I have probably made mistakes while filling in this table and I'd be happy to be corrected...

Tool | Load schema | Load data | Start/stop instance | Create instance | Comments
DBunit | | | | |
DDLUtils | | | | | I think DDLUtils is the old commons-sql project.
Derby ij | | | | | ij 10.1.1.0 requires the db2jcc.jar which is not on Ibiblio. I need to check the license to see if it could be uploaded.

Again, let me know if you know some tools that are not listed here.

If no such tool exists, an idea I have would be to add support for databases in Cargo. Indeed Cargo is meant for manipulating any kind of container. It happens that the first type of container we've implemented is J2EE containers but it should work for any other type and the interfaces should remain the same.

WDYT?

[ vmassol ] 18:19, Wednesday, 13 July 2005

I tried Copernic Desktop Search (CDS) today. I've been using Yahoo Desktop Search (YDS) for several months now and I'm very happy with it. It has some issues though: it brings my laptop to its knees when it performs indexing, it has no Windows taskbar integration, etc. I wanted to see how CDS fared against YDS.

Here are my findings after one day of using CDS. Please note that this is definitely not long enough to have a definitive opinion on the topic but I thought I'd still share what I've learnt today.

General opinion

CDS is a very good desktop search. I was very impressed. It seemed perfect at first and then slowly I started finding some little flaws compared to YDS. Still it is extremely good. It has all the features you'll find in YDS and Google Desktop Search (GDS).

Pros of CDS vs YDS

  • Integration with Windows taskbar
  • Low resource usage when indexing. It does not slow down my laptop while indexing. That's very good!
  • Immediate scanning of new resources. If you receive an email for example, it is immediately available for searching. No need to wait for the next indexing.

Cons of CDS vs YDS

  • No vertical layout for views (as there is in YDS). This means that you cannot fully see the message being previewed
  • No "All" categories search. You have to choose the category you wish to search (emails, files, contacts, etc)
  • No as-you-type results
  • No possibility to choose the columns to display (for example email folders or email size). There are only a few basic columns
  • Slower to search and display items than YDS. It was very fast initially and it quickly became slow and very slow as indexed items increased
  • XML preview uses the IE engine on Windows and thus there are lots of XML files that don't display correctly

Some minor details:

  • The Delete key does not work to delete an email
  • Cannot select several emails at once (to delete them for example)

Conclusion

If only it could have a better view layout and be faster to display results it would be perfect. Its killer features are really its CPU-friendly indexing for me and the immediate availability of new resources in searches.

I've just noticed that YDS released version 1.2beta yesterday and I'm installing it. For now, I'll keep using YDS which is still my favorite. YMMV.

[ vmassol ] 15:22, Saturday, 30 April 2005

Clirr is one of those tools that deserve to be better known. I have mentioned it several times in other posts but it's really the first time I get to use it for real. It rocks! I'm about to release Cargo 0.5 and I wanted to get an exact list of the API modifications we have done compared to version 0.4.

Here's the kind of output Clirr gives (the full output is available here):

ERROR: 8001: org.codehaus.cargo.deployment.DefaultJarArchive: Class org.codehaus.cargo.deployment.DefaultJarArchive removed
INFO: 8000: org.codehaus.cargo.module.DefaultJarArchive: Class org.codehaus.cargo.module.DefaultJarArchive added
ERROR: 7002: org.codehaus.cargo.container.Container: Method 'public void addDeployable(org.codehaus.cargo.container.deployable.Deployable)' has been removed
INFO: 7011: org.codehaus.cargo.ant.ConfigurationElement: Method 'public void addConfiguredEar(org.codehaus.cargo.ant.EARElement)' has been added
INFO: 4000: org.codehaus.cargo.container.jetty.JettyStandaloneConfiguration: Added org.codehaus.cargo.container.configuration.StandaloneConfiguration to the set of implemented interfaces
ERROR: 7005: org.codehaus.cargo.container.Container: Parameter 1 of 'public void setConfiguration(org.codehaus.cargo.container.Configuration)' has changed its type to org.codehaus.cargo.container.configuration.Configuration
ERROR: 7006: org.codehaus.cargo.ant.ConfigurationElement: Return type of method 'public org.codehaus.cargo.container.Configuration createConfiguration(org.codehaus.cargo.container.Container)' has been changed to org.codehaus.cargo.container.configuration.Configuration
ERROR: 4001: org.codehaus.cargo.container.jetty.JettyStandaloneConfiguration: Removed org.codehaus.cargo.container.Configuration from the set of implemented interfaces
INFO: 7003: org.codehaus.cargo.container.spi.AbstractConfiguration: Method 'public void configure()' has been removed, but an inherited definition exists.
ERROR: 5001: org.codehaus.cargo.container.deployable.EAR: Removed org.codehaus.cargo.util.MonitoredObject from the list of superclasses
INFO: 5000: org.codehaus.cargo.container.deployable.EAR: Added org.codehaus.cargo.util.monitor.MonitoredObject to the list of superclasses
ERROR: 7012: org.codehaus.cargo.container.Container: Method 'public java.io.File getOutput()' has been added to an interface
INFO: 7010: org.codehaus.cargo.container.spi.AbstractContainer: Accessibility of method 'protected java.io.File getOutput()' has been increased from protected to public
INFO: 6000: org.codehaus.cargo.container.property.GeneralPropertySet: Added public field JVMARGS

Even though we're using JIRA with an Iteration-Driven Development strategy (IDD) it was still a very interesting exercise to verify that we had not missed any issue by running Clirr on the source code. In addition, it provides a more detailed view of what exactly has changed in terms of API, which our JIRA report does not provide.

The next step would be to use it to fail our build whenever someone introduces a public API break. It would be quite easy for us because we've cleanly separated non-public APIs from public APIs by using internal packages (see the Cactus API design rule to see what it means). Of course sometimes you want to add a breaking change on purpose. That's legitimate but it has to be controlled. The strategy would be to have the build fail and then, if the change is intentional, to exclude it from Clirr.
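To illustrate that strategy, here's a sketch of a gate that reads report lines in the format shown above ("SEVERITY: code: class: message") and keeps only the breaks that should fail the build. Everything here is an assumption made for the example: the ClirrGate class, the "code:class" key format for accepted breaks and the ".internal." package convention. Clirr has its own exclusion mechanism, which this does not use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical gate: decides which reported API breaks should fail the build. */
final class ClirrGate {

    /**
     * @param reportLines    lines formatted as "SEVERITY: code: class: message"
     * @param acceptedBreaks intentional breaks, keyed as "code:class" (made-up format)
     * @return ERROR lines on non-internal classes that have not been accepted
     */
    static List<String> blockingViolations(List<String> reportLines, Set<String> acceptedBreaks) {
        List<String> blocking = new ArrayList<>();
        for (String line : reportLines) {
            String[] parts = line.split(": ", 4); // severity, code, class, message
            if (parts.length < 4 || !parts[0].equals("ERROR")) {
                continue; // INFO lines and malformed lines never fail the build
            }
            String className = parts[2];
            boolean internal = className.contains(".internal.");
            boolean accepted = acceptedBreaks.contains(parts[1] + ":" + className);
            if (!internal && !accepted) {
                blocking.add(line);
            }
        }
        return blocking;
    }
}
```

The build would fail when the returned list is not empty, and the set of accepted breaks doubles as the list of incompatibilities to document at release time.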

Well done Lars!

[ vmassol ] 16:29, Thursday, 7 April 2005

Where is Ant heading in the future? I would be very interested to learn more about this. I've been using Ant for several years now and I've always been a happy user. However these days I'm no longer using the XML scripting side of Ant much; instead I'm making heavy use of the Ant Java API. What I'm interested in are the Java Ant tasks.

I think this is really where the value of Ant is. All those years of implementing the basic building blocks of an OS-portable Java API have created a very useful set of tasks. I think every Java application that needs to copy files, delete a directory, spawn a Java application, etc should use these tasks. There's no point in reinventing the wheel!

For example, you may think that deleting a directory is simple. But it's not so easy. Have a look at the Delete Ant task source code. You'll find portions of code like this one:

/**
 * Accommodate Windows bug encountered in both Sun and IBM JDKs.
 * Others possible. If the delete does not work, call System.gc(),
 * wait a little and try again.
 */
private boolean delete(File f) {
    if (!f.delete()) {
        if (Os.isFamily("windows")) {
            System.gc();
        }
        try {
            Thread.sleep(DELETE_RETRY_SLEEP_MILLIS);
        } catch (InterruptedException ex) {
            // Ignore Exception
        }
        if (!f.delete()) {
            if (deleteOnExit) {
                int level = quiet ? Project.MSG_VERBOSE : Project.MSG_INFO;
                log("Failed to delete " + f + ", calling deleteOnExit."
                    + " This attempts to delete the file when the ant jvm"
                    + " has exited and might not succeed."
                    , level);
                f.deleteOnExit();
                return true;
            }
            return false;
        }
    }
    return true;
}

Would you have thought about this? Probably not, and you would have been right not to as this only happens on some rare occasions. But when one of your users reports it, it's going to be darn difficult to identify and fix. Personally I'd rather depend on a stable and well tested library than recode it myself.

The problem is that the Ant tasks are a bit too tightly linked to the execution engine (the XML scripting engine). For example reusing an Ant task requires you to create a Project object. This in turn drags in loggers, the Ant classloader (in some cases) and possibly other objects. I know it's possible to use Ant from Java (I've been doing it for a long time now) but I'd love it to be even easier to do so.

Instead of writing:

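// Project is in org.apache.tools.ant, Expand in org.apache.tools.ant.taskdefs
Project project = new Project();
project.init(); // registers the default task definitions, including "unzip"
Expand expander = (Expand) project.createTask("unzip");
expander.setSrc(new File(zipfile));
expander.setDest(new File(destdir));
expander.execute();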

I'd like to be able to write:

Expand expand = new Expand();
expand.setSrc(new File(zipfile));
expand.setDest(new File(destdir));
expand.setLogger(myLogger);
expand.execute();

I don't want to see the get/setLocation, get/setTaskName(), get/setDescription() and in general all methods from Task.java.

What I'd love to see is Ant moving in the direction of providing completely reusable Tasks that have 0% dependencies on the Ant engine. This means that loggers and classloaders would be passed to the Ant task by the program that uses it.

I'd like to see Ant provide 2 distributable jars: one containing the XML scripting engine only and one containing all the pure java beans Ant tasks that can be reused in any Java application.

I'd like to see Ant separate into 2 subprojects: one for the XML scripting engine (let's call it engine) and one for the Ant tasks (let's call it tasks). The reason for the 2 projects is to ensure there's no dependency in the direction tasks->engine.

I'd like to see Maven2 use those completely reusable Ant tasks instead of recreating them (this is a wish I'm addressing to both projects, not just Ant! :-)).

I'd like to see those Ant tasks being a JSR and incorporated in a future version of the JDK, thus providing a higher-level API than the best classes from the JDK.

Is that where Ant is heading today?

[ vmassol ] 11:21, Friday, 11 March 2005

I may be dense but I've just realized today that there is a potentially simple way to increase participation in an open source project. That's always been one of the questions on my mind: how do I make my open source projects more successful? For me a successful open source project is one which has a rich developer community. How do I make this possible? There are of course several ideas to make this happen but the one that dawned on me this morning is that the project has to reduce its complexity (by making it more modular for example).

Indeed, the barrier to participation is often due to the fact that a user who wants to participate will need to understand the whole design, how the different classes are entangled, what effect a change here will have on the rest of the project, etc. Thus, if we make the project more modular a contributor who wants to participate will only need to understand the design of a given 'module'.

A 'module' would need to have some good-to-have characteristics:

  • Very loose coupling with other modules
  • Clearly defined and *published* interfaces. For example, there should be some documentation on the project's web site explaining them and a tutorial showing how to implement new modules (or swap one module implementation for another).
  • Separate builds so that it's easy to build only the module (this need is reduced if the master build is easy to use, i.e. no property tweaking necessary, it just builds, as is the case with good Maven builds... ;-) )
  • Separate documentation on the web site, so that the website itself is modular and the complexity of each module is hidden in that module's web site. Thus the top level web site would be quite simple only listing what the project does as a whole and listing the different modules

Interestingly, one way to implement the 'very loose coupling with other modules' characteristic is by using a Service Architecture. This can be done for example by using the Dependency Injection pattern and/or a lightweight container (PicoContainer, Spring, etc).
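As a rough illustration of the loose coupling described above, here is a minimal constructor-injection sketch in plain Java, without any container. All the names (Notifier, RecordingNotifier, BuildReporter) are made up for the example; the point is only that BuildReporter knows the published interface and nothing about the implementation it is handed.

```java
import java.util.ArrayList;
import java.util.List;

public class ModularitySketch {

    /** Published interface: the only thing other modules are allowed to see. */
    public interface Notifier {
        void send(String message);
    }

    /** One implementation, which could live in its own module (hypothetical). */
    static class RecordingNotifier implements Notifier {
        final List<String> sent = new ArrayList<>();

        @Override
        public void send(String message) {
            sent.add(message);
        }
    }

    /** Depends on the interface only; the implementation is injected by the caller. */
    static class BuildReporter {
        private final Notifier notifier;

        BuildReporter(Notifier notifier) {
            this.notifier = notifier;
        }

        void report(String project, boolean success) {
            notifier.send(project + ": " + (success ? "OK" : "FAILED"));
        }
    }

    public static void main(String[] args) {
        RecordingNotifier notifier = new RecordingNotifier();
        new BuildReporter(notifier).report("trading", true);
        System.out.println(notifier.sent);
    }
}
```

A container such as PicoContainer or Spring would do the wiring performed by hand in main(), but the module boundary stays the same: swapping the Notifier implementation requires no change to BuildReporter.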

This is probably obvious stuff but I've just realized that it's not only good design practices but that it'll also help open source projects attract more contributions. Of course that leads to another topic which is when to accept contributions and how to maintain them in the long run but that would be another discussion...

[ vmassol ] 09:28, Friday, 11 March 2005

Continuing with my current build-mania, I'd like to propose the idea of a distributed build architecture. I'd love to see my favorite continuous integration tools (CruiseControl, DamageControl and, in the future, Continuum) support this notion (I know they're thinking about it already!).

So what is the need for a distributed build?

I can see several use cases:

  • building on several JDKs
  • building on different OS platforms
  • building with different environment setups (for example building with different application servers, different browsers, different databases, etc) to validate that a product integrates well with various environment setups
  • delegating the build load on several machines when the build starts to take too long (of course, the first solution should be to try to lighten your build as much as possible)

A proposed architecture

Disclaimer: this is only ONE potential solution. There are probably lots of other solutions, some even more valid than this one. Please feel free to add your ideas as comments to this post.

It could work as follows:

  1. The central build machine (aka the build orchestrator) decides to start a build. The orchestrator can be an existing continuous integration tool like CruiseControl, DamageControl, etc. It can trigger a build on anything it wants: time-based, change-based, manual, continuously, etc. The orchestrator sends a build request to the space. The request contains all the information about the requested build (e.g. JDK to run on, OS to run on, App. Server/DB/etc to run on).
  2. The space holds all requests. It could be a good idea to provide a browser to see pending requests (preferably using a simple HTTP browser so that people who wish to contribute can see what type of builds are required). In any case it's important that the space be transactional (Note: I'm not sure about the word "transactional". What I mean is that a request cannot be read by several build agents at the same time).
  3. Build agents listen on the space for build request objects that match their capabilities. Using Jini/JavaSpaces would be nice here because (among other things) agents would be able to easily listen for requests with Jini attributes (OS, JDK, etc). Once they read a request they start a local build and publish the result to the space as a Result object.
  4. The build orchestrator listens for Result objects and generates result reports, aggregating all results. Build results could contain anything required: result of the build, logs, generated artifacts, etc. The orchestrator gets the data from the Result object and performs the usual build operations (publishing, build result notification).
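The steps above can be sketched with a tiny in-memory stand-in for the space. This is not Jini/JavaSpaces code: the Space, BuildRequest and BuildResult classes and their fields are assumptions made up for the illustration. What it tries to show is the "transactional" property from step 2: take() removes the request while holding the lock, so two agents can never pick up the same one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BuildSpaceSketch {

    /** A build request (step 1). The fields are illustrative. */
    static final class BuildRequest {
        final String project, jdk, os;

        BuildRequest(String project, String jdk, String os) {
            this.project = project;
            this.jdk = jdk;
            this.os = os;
        }
    }

    /** What an agent publishes back to the space (step 3). */
    static final class BuildResult {
        final BuildRequest request;
        final boolean success;

        BuildResult(BuildRequest request, boolean success) {
            this.request = request;
            this.success = success;
        }
    }

    /** Hypothetical in-memory space holding pending requests and published results. */
    static final class Space {
        private final List<BuildRequest> requests = new ArrayList<>();
        private final List<BuildResult> results = new ArrayList<>();

        synchronized void write(BuildRequest request) {
            requests.add(request);
        }

        /** Removes and returns the first request matching the given capabilities, or null. */
        synchronized BuildRequest take(String jdk, String os) {
            for (Iterator<BuildRequest> it = requests.iterator(); it.hasNext();) {
                BuildRequest r = it.next();
                if (r.jdk.equals(jdk) && r.os.equals(os)) {
                    it.remove();
                    return r;
                }
            }
            return null;
        }

        synchronized void publish(BuildResult result) {
            results.add(result);
        }

        synchronized List<BuildResult> results() {
            return new ArrayList<>(results);
        }

        synchronized int pending() {
            return requests.size();
        }
    }

    /** An agent takes one matching request, "builds" it and publishes the result. */
    static boolean runAgentOnce(Space space, String jdk, String os) {
        BuildRequest request = space.take(jdk, os);
        if (request == null) {
            return false;
        }
        // A real agent would run the build here; this sketch always reports success.
        space.publish(new BuildResult(request, true));
        return true;
    }

    public static void main(String[] args) {
        Space space = new Space();
        space.write(new BuildRequest("cargo", "1.4", "linux"));
        space.write(new BuildRequest("cargo", "1.5", "windows"));
        runAgentOnce(space, "1.4", "linux");
        System.out.println(space.results().size() + " result(s), " + space.pending() + " pending");
    }
}
```

The orchestrator side would simply poll results() (or be notified) and aggregate what it finds, as in step 4.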

Of course there would be several details to sort out, like whether we should send 2 Request objects for each build need so that we can compare the results and only accept them if they match, etc.

Conclusion

I think this type of distributed build could be especially interesting for open source projects in order to build an active community around a project. This would be yet another way in which people can contribute to an open source project: by lending some of their machine CPU to perform continuous integration builds of this project. This usually makes sense as open source projects may be low on hardware resources and lending some would help. Of course it also brings its own security challenges that would need to be addressed...

Would you like such a distributed build system? I personally prefer this architecture over one where the orchestrator directly sends build requests to build agents as I find it more scalable and more flexible.

[ vmassol ] 10:44, Wednesday, 12 January 2005

The concept

The typical local builds that developers run on their machines work by building the subproject they're working on but also all the dependent subprojects it requires. Usually, as building all dependent subprojects takes a lot of time, the developer only infrequently checks out the other projects' sources and builds them on demand. His focus is on the subproject that he's modifying (and rightly so!). This strategy has the following drawbacks:

  • Setting up the build on a fresh machine is complex and takes time. Indeed you have to check out all the top level project sources and build all projects one by one until you reach the subproject you're concerned with.
  • It doesn't scale too well. Your local build starts taking tens of minutes, which does not encourage running it that often. And if you do, you don't rebuild all the subprojects even though there are probably lots of changes that have been made by other coworkers. Thus, you're increasing the possibility of an integration break (breaking your other coworkers when they integrate your changes).
  • When someone from another team inadvertently breaks your project's build, you'll have to switch context (i.e. stop what you're doing) and help out to restore the master build. If this happens infrequently, it's probably fine and even positive (as it increases team collaboration ;-)). However when it happens frequently (which is bound to happen as the team grows), you'll start suffering from it...

Because of all these problems, I have been using a different approach on my current project for the past 2 years. This was mostly motivated by the fact that the project is a big project (close to 100 developers) and we were hitting the issues mentioned above. I have called this strategy "Binary Dependencies Build". If you're interested this is an approach I have presented both at TSSS2004 and at Javapolis 2004.

Here is how it works:

Imagine that you have a "trading" subproject that depends on 2 other subprojects ("partners" and "referenceData"). The idea is that your local build will NOT build them from sources but instead will download their latest working versions from a remote artifact repository (a location where the result of the subproject build is located). In order to accelerate the build even further, the downloaded versions are stored locally. In our example, the latest "partners" jar is already available locally and is thus not downloaded but the "referenceData" one is not. It is downloaded and then stored locally. The "trading" subproject is built using these binary dependencies.

This is all fine but there is a burning question: How do I do continuous integration with such a system? Won't the binary dependencies be old versions when I get them? The solution to this is to have a continuous build server that continuously builds subprojects and puts their artifacts in the remote repository. Note that they are put in the repository only if their build passes with no errors. This ensures that there are always fresh versions available and that they are as "good" as they can get.

Doing it with Maven

The good news is that this feature is built into Maven. Maven implements this support for artifact repositories (local and remote) and it supports automatically downloading artifacts not available in the local repository. Usually Maven will first check whether the artifact's version exists in the local repository and, if so, will use it. However, if an artifact's version contains the "SNAPSHOT" keyword, Maven will always check whether there is a more up-to-date artifact in the remote repository. This makes it easy to implement the strategy defined above.
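To make the resolution rule concrete, here is a small sketch of the lookup logic as described above. It is not Maven's actual code: the class, the maps standing in for the two repositories and the timestamp standing in for a jar are all assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ArtifactResolverSketch {

    /** Maps "artifactId-version" to a timestamp that stands in for the jar itself. */
    private final Map<String, Long> local = new HashMap<>();
    private final Map<String, Long> remote;

    /** Number of downloads performed, to show when the remote copy is fetched. */
    int downloads = 0;

    ArtifactResolverSketch(Map<String, Long> remote) {
        this.remote = remote;
    }

    long resolve(String artifactId, String version) {
        String key = artifactId + "-" + version;
        Long cached = local.get(key);
        boolean snapshot = version.contains("SNAPSHOT");

        // A non-SNAPSHOT version found locally is used as-is, without a remote check.
        if (cached != null && !snapshot) {
            return cached;
        }

        Long latest = remote.get(key);
        if (latest == null) {
            if (cached != null) {
                return cached; // nothing newer to fetch, fall back to the local copy
            }
            throw new IllegalStateException("Artifact not found: " + key);
        }

        // Download when missing locally, or when a SNAPSHOT has a fresher remote copy.
        if (cached == null || latest > cached) {
            local.put(key, latest);
            downloads++;
            return latest;
        }
        return cached;
    }

    public static void main(String[] args) {
        Map<String, Long> remote = new HashMap<>();
        remote.put("partners-1.0", 100L);
        remote.put("referenceData-SNAPSHOT", 200L);

        ArtifactResolverSketch resolver = new ArtifactResolverSketch(remote);
        resolver.resolve("partners", "1.0");
        resolver.resolve("partners", "1.0");            // served from the local repository
        resolver.resolve("referenceData", "SNAPSHOT");
        remote.put("referenceData-SNAPSHOT", 300L);      // the CI server publishes a fresh build
        resolver.resolve("referenceData", "SNAPSHOT");  // fetched again because it is newer
        System.out.println("downloads: " + resolver.downloads);
    }
}
```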

Conclusion

We've been very happy with this solution so far. I think there are 2 key points in making this work:

  • A good build that provides assurance that the binary artifacts are working. Indeed we've experienced that our subproject build was not always good enough to tell how "good" a jar artifact was. This was usually caused by the lack of automated functional tests, which meant that even though the build was passing the jar was not working when executed on the developer's machine. The solution is of course to include integration/functional tests in the build (at least the master CI build).
  • A quick master build. It's important that it generates fresh jar artifacts as quickly as possible so that CI can happen as often as possible.

[ vmassol ] 12:28, Friday, 31 December 2004

Here's a non-ordered list of the main problems causing build-breaks that we had found on the current project I'm working on (Note that this list is now a year old and that we have fixed some of them - Unfortunately the majority still remains...). I've added some possible ideas on how to fix them.

  1. Build takes too long to execute (and thus it is executed less often)
    • Fix the build by having more subprojects with binary dependencies and/or streamline the build to ensure that only important build steps are run. Optimize it (f.e. offer different goals/targets: one for a clean build and another one that does not perform a clean).
  2. Local build not executed
  3. Public API breakage in dependent project without warning
  4. Not enough continuous commits (all packed up at end of iteration)
    • Team meetings to better explain the importance of continuous integration. Complementary idea: "unbreakable builds". The idea is that if you keep your changes to yourself and accumulate them, whenever you want to commit them, the unbreakable build will likely reject your changes as they will break some other part of the code. Thus you'll need to spend several days talking to other developers to not only fix your code but also fix theirs. Normally after doing this several times, you should understand that it is in your best interest to commit frequently.
  5. No functional/integration automated tests (f.e. no local verification of ejb-jar deployments)
    • Automated functional tests! Build a suite slowly over time, improving it at each iteration. And maintain it! Decide on a good data handling strategy (this is usually the main issue). Ensure that your data strategy keeps everyone in sync WRT DB data.
  6. Commit problems (forgetting to commit some files, problems due to the SCM tool - StarTeam: new directories do not appear in the StarTeam view!)
  7. Devs “building” with IDE but forgetting to use the automated build
  8. Checkstyle errors failing the build
    • Coaching. More team meetings to decide which checkstyle errors should fail the build. Get a strong team buy-in. Complementary idea: "unbreakable builds".
  9. Failing unit tests
    • It probably means that the unit tests are actually integration tests depending on database data. Ensure that unit tests are fast and independent of the environment. Complementary idea: "unbreakable builds".
  10. [Maven] project.xml not up to date and missing dependency
    • SCM diff emails on check-ins (team by team) in order for everyone to have the knowledge of what's happening. Complementary idea: "unbreakable builds".
  11. Database data modifications (voluntarily or involuntarily) leading to test breakage
  12. Continuous build not cleaned between different runs
    • Fix it. Perform a clean build from time to time.
  13. Local SCM update not done before local build (in order to get the latest files)
    • SCM diff emails on check-ins (team by team) in order for everyone to have the knowledge of what's happening. So you'll know better when to update your local workspace. Complementary idea: "unbreakable builds".
  14. Environment differences in local build vs central build
    • Work continuously towards making the developer's environment as close as possible to the integration environment. Complementary idea: "unbreakable builds". This allows executing the build on the server and thus it runs in the same environment as the continuous build.
  15. No local deployments done before commits (f.e. no EJB deployments)
    • Coaching (in order to ensure that developers do perform deployments on their machines before check-in) + add some checks in the build to automate the verification (they can be f.e. some hand-picked functional tests).
  16. Checkstyle errors hidden in tons of warnings
    • Fix it. Newest versions of Checkstyle allow filtering on severity.
  17. Non-atomic commits and central build starting with in-flight commits
    • Use a scheme a la CruiseControl (wait for some inactivity time on the SCM before triggering a build). Or change the SCM (to Subversion for example). Note: We have tried to use CC with StarTeam but even though the infrastructure team increased CPU + RAM, StarTeam falls over when it is polled by 3 or 4 CC builds in parallel... (Solution: Dump ST or ask Borland to come and tune the parameters). Complementary idea: "unbreakable builds". This forces "atomic" commits.
  18. [Distributed development] rsync issues: sometimes jars are corrupted or lost
    • Fix the rsync process (Note: this is now no longer happening I believe)
  19. [Distributed development] VPN instability making it difficult to SCM-update
    • Fixed mostly. However, using StarTeam is still extremely slow, making it hard to SCM-update from remote. Solutions: use an SCM that is less demanding on bandwidth and response time (f.e. Subversion), increase bandwidth (but the issue is mostly with response time which cannot be changed), or use a replication mechanism (I don't like this as I believe it introduces its own issues - I much prefer everyone working directly on the same repository, especially as I know it works: I've done it in the past using CVS with a team of 30 developers and it was working fine).
  20. Errors when executing the application
    • This is because there are no automated functional tests. Automate them!

[ vmassol ] 15:18, Wednesday, 29 December 2004

Let's create Unbreakable Builds

Out of my last two development projects, one had a strong sense of quality and excellence in general and continuous build failures were the exception (about 3-4 per week for a 30-developer team), while the other was quite the opposite and everyone was surprised when the continuous build was passing (there were about 5 build breaks a day on average for a 40-developer team). I'm sure this is also pretty common on other projects. Obviously the best is to build (pun intended) a build awareness in the team. However, you'll need strong evangelists for this to happen, who may not always be available, and other circumstances may make this difficult.

A thought struck me about a year back: what if we were able to prevent the continuous build from failing by design? There's a French saying that goes something like "it's better to prevent than to cure". I think this is definitely a good idea to apply to continuous build failures. Why not make a continuous build system that cannot fail? At that time I thought it was a nice idea (I had meant to blog about it but I forgot) but I could not see very well how it could work. Now, a year later, I really think it's a nice idea and I'd like to explore it.

The architecture

A potential basic architecture is shown in figure 1.

The general principle is to catch the commit data before they get committed to the SCM, to perform a build and to perform the actual commit only if the build is successful. Here are the detailed steps:

  1. The developer performs a commit using his favorite SCM client tool. Note that it is best if the tool is able to perform the commit asynchronously so that the developer can continue working on something else.
  2. The committed data are intercepted using a pre-commit hook script (all modern SCMs support this). This script is in charge of doing 2 things:
    • Finding out the list of projects to be built. Indeed, say that the commit contains 5 files belonging to 2 different projects. We need to rebuild these 2 projects. The algorithm for finding out the projects to which the changed sources belong can be as simple as a mapping between the file paths (which contain the project name) and the project name.
    • Creating a build job and pushing it on a queue. The reason for the queue is that building all the projects on the machine that hosts the SCM is not going to be scalable. We want the SCM to be as responsive as before. Hence the queue.
  3. We need build machines to perform the actual build. They could be dedicated build machines that continuously build the build jobs. They could also be developer workstations. The concept is to have one or several build kicker applications installed on those machines. The "continuous build kicker" will continuously get a job from the build job queue and build it, whereas the "idle build kicker" will only pick a job to build when the machine is idle (hey, look around you and see how many machines are unused because the people are either on holiday, sick, in a meeting, etc. That's a lot of power).
  4. The build kickers start by updating their workspace to have the latest files for the projects associated with the changed files. Then they try to "merge" the changed files into their workspace (note: this may be the tricky part to implement unless the SCM offers a way in the pre-commit hook to get the full file - I need to explore this). If they cannot succeed they stop with an error message that flows back to the user. This can happen if someone else has been working on the same source and their change has made it to the SCM before ours has. If the merge succeeds, the build kicker starts the build. The build has to be relatively quick so you should not build all the projects. I suggest building the modified projects and the ones that directly depend on them so that an API break can be detected (more on that below).
  5. When the build is finished (or if an error occurs), the build kicker sends the result back to the pre-commit hook (using an RPC mechanism for example).
  6. If the result is positive, the pre-commit script performs the real commit to the SCM; otherwise the commit is rejected.
  7. The resulting message is returned to the user. In case of error the user would see for example the build console log.
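The steps above can be sketched in a few lines of Java. Everything here is hypothetical: the "project name is the first path segment" layout, the BuildJob class and the use of a Predicate to stand in for a remote build kicker are assumptions made to keep the example self-contained. A real hook would talk to the SCM and to kickers on other machines.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class UnbreakableBuildSketch {

    /** Step 2, first bullet: map changed paths to projects, assuming a "project/..." layout. */
    static Set<String> projectsFor(List<String> changedPaths) {
        Set<String> projects = new TreeSet<>();
        for (String path : changedPaths) {
            int slash = path.indexOf('/');
            projects.add(slash < 0 ? path : path.substring(0, slash));
        }
        return projects;
    }

    /** Step 2, second bullet: the unit of work pushed on the queue. */
    static final class BuildJob {
        final Set<String> projects;

        BuildJob(Set<String> projects) {
            this.projects = projects;
        }
    }

    /**
     * Steps 2 to 6 in miniature: queue a job, let a kicker build it, and
     * report whether the real commit may proceed.
     */
    static boolean preCommit(List<String> changedPaths, Queue<BuildJob> queue, Predicate<BuildJob> kicker) {
        queue.add(new BuildJob(projectsFor(changedPaths)));
        // In the real design a kicker on another machine would pick the job up.
        BuildJob job = queue.remove();
        return kicker.test(job);
    }

    public static void main(String[] args) {
        List<String> commit = Arrays.asList("trading/src/Order.java", "partners/src/Partner.java");
        Queue<BuildJob> queue = new ArrayDeque<>();
        // Pretend kicker: the build fails whenever "partners" is involved.
        boolean accepted = preCommit(commit, queue, job -> !job.projects.contains("partners"));
        System.out.println(accepted ? "commit accepted" : "commit rejected");
    }
}
```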

Advantages

Here are the advantages of such a system:

  • Does not break other developers upon a build failure. All developers can work uninterrupted even though they can still work on HEAD in a continuous integration fashion
  • Lowers the effort required to get a CI system working thus it helps teams adopt CI
  • Prevents breakage of APIs. Indeed in step 4 above, we've mentioned that a good strategy is for the build to build not only the projects that have changes but also all projects that directly use those projects (one level). This will allow detecting unwanted API breakages.
  • Increases self-confidence when committing which (I hope) will make it easier to get developers to commit continuously
  • Allows continuing working on one's own machine (instead of having to wait for the current build to free the CPU which is being used at 100%!). You now get your own PBS (Personal Build Server)
  • Forces atomic commits!

Questions/Issues

I'm sure you're now burning with tons of remarks/questions showing why it wouldn't work :-) Here's what I've currently thought about. If you have any opinion or other questions, I'd love to hear them.

Q: What happens if someone else also commits a change to the same file?

It works in the same way as usual. The build kicker will try to "merge" the changes after having done a workspace update and if it cannot, the user will get an error explaining that the merge failed. The user will then need to perform an update on his local machine and resolve the conflict.

Q: Imagine I perform a commit and I start working on a new feature. Then my commit is rejected because of a failure. How do I fix this without losing my current changes?

Answer 1: This is actually relatively similar to what you're currently doing. Imagine you're committing something. Then you start working on something new and the continuous build tells you 2 hours later that your change has broken something. The difference is that your changes have been committed so you can easily create a new workspace and fix it there. We could do the same here by having the pre-commit hook actually make your changes available through a URL (sent in the commit answer) as a patch so that it is easy for you to apply it to a fresh new checkout.

Answer 2: You wait till the build is finished on the server. You can perform other activities like documenting, reading, thinking, designing, writing new classes, new tests, etc. Basically you work on stuff that does not conflict with the past changes. Actually this is probably what you're currently doing when your build is running as it is eating all your CPU...

Q: Doesn't it take too long to build?

You need to ensure your build is taking as little time as possible. I think 5-10 minutes should be ok. The best way to achieve this is probably to use binary dependencies instead of rebuilding dependent projects (a la Maven), except maybe direct dependencies. You'll still need a continuous build running continuously to produce fresh binary dependencies. I guess it's also best to use an SCM client that can do asynchronous commits in order to let you continue working while the commit is in progress.

Q: What if I want to modify an API but want each project to modify its own files?

Several options:

  • You could go through a deprecation cycle.
  • You could be doing the refactoring on one machine only (not always possible)
  • You could also plan it. Anyway an API breakage has to be planned with communications. Thus you could say: on that day, at such hour we're going to be committing this break and we have 1 day to fix all our dependent projects. When this happens you can turn off this "unbreakable build" feature for the day.

The interesting point here is that you *want* the API breakage to be detected as the default instead of the opposite.

Conclusion

It seems to me this would be particularly useful on big projects with lots of developers. It should also be useful to introduce continuous integration on an existing project as it lowers the discipline required by everyone. Obviously this is just an idea that I haven't tested yet. I'm very keen to see this in action. If any of you has any experience please share it. I'm planning to spend some time trying to implement it. If you're interested to help out, let me know too.

[ vmassol ] 11:11, Saturday, 4 December 2004

When working using a Time-boxing approach with JIRA there are some typical issue-smells that I have noticed appear frequently. In order to perform good deliveries it is important to fight them.

  • Issue smell 1: Too many unscheduled issues. This means that new issues are not assigned to iterations, i.e. that they are not planned to be fixed.
  • Issue smell 2: Open issue from past iterations. Any issue that is left from a previous iteration has to be rescheduled so that everyone knows when it is planned to be fixed. If some portion of the issue has been done, I've found that it is usually best to split the task into 2, so that the work done in the iteration it was scheduled is clearly shown in the release notes for that iteration and the unfinished part can be scheduled in a future iteration.
  • Issue smell 3: No iterations in changelog view. This means that past iterations that are finished have not been JIRA-released. The good thing about releasing an iteration is that it forces you to resolve the unfinished issues (see Issue smell 2). In addition it cleans up the roadmap view, which becomes less cluttered by past issues and gives a clear view of what's left to be done. Lastly, it provides an important feeling of achievement.
  • Issue smell 4: Issue types in issue description. I have often noticed that some JIRA projects were using some description conventions for some issue types. For example, using XXX - Code review for a code review issue on the XXX feature. In that case, a real JIRA issue type should be created. The reason is that by defining a proper JIRA issue type, it is now possible to perform operations on this new issue type: it will appear properly in the release notes under its own category, it can be searched for, etc.
  • Issue smell 5: Issue statuses are not in sync with reality. This is often a big problem (especially with distributed teams) as people usually rely on JIRA to provide an exact view of the progress. If issues are found not to be in sync, there's a tendency to not "trust" JIRA anymore, which in turn leads to using it less and losing visibility. One good strategy is to do Issue Driven Development (IDD). It goes like this: When a task is done and just before the code is checked in, ensure that the corresponding JIRA issue is marked as Resolved/Closed. If there's no issue, create one (unless the modification is a really minor one that the user should really not be concerned with). Then check in the code mentioning the issue number in the checkin comment (that allows for example using the JIRA CVS/Subversion plugins). Note: If you're using CVS/Subversion you could write a quick pre-commit hook that verifies that each comment has a reference to a JIRA issue.
  • Issue smell 6: Lots of resolved (but not closed) issues. Most projects I have seen do not use a Resolved state. However, people often mark the issue as resolved but not closed and the issue stays in this state for ages without anyone doing anything about it. So either remember to directly close issues or, if you're using JIRA 3, create a custom workflow that does not have a Resolved state (if you're not using the resolved state of course!).
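For the pre-commit check mentioned under Issue smell 5, the core of the hook could look something like the sketch below. The key pattern (uppercase project key, a dash, then a number, as in CARGO-42) is an assumption about how the JIRA project keys are set up, and the wiring into the CVS or Subversion hook mechanism is left out.

```java
import java.util.regex.Pattern;

public class CommitMessageCheck {

    // Assumed issue key format: uppercase project key, dash, number (e.g. CARGO-42).
    private static final Pattern ISSUE_KEY = Pattern.compile("\\b[A-Z][A-Z0-9]+-\\d+\\b");

    /** Returns true if the check-in comment mentions at least one issue key. */
    static boolean referencesIssue(String comment) {
        return comment != null && ISSUE_KEY.matcher(comment).find();
    }

    public static void main(String[] args) {
        String comment = args.length > 0 ? args[0] : "";
        if (!referencesIssue(comment)) {
            System.err.println("Commit rejected: please reference a JIRA issue in the comment.");
            System.exit(1); // a non-zero exit code is how hook scripts usually signal rejection
        }
    }
}
```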

Let me know if you have found other important issue-smells when using JIRA!

[ vmassol ] 19:43, Wednesday, 17 November 2004

Introduction

Automated tests are good. Automated Functional tests are even better as they are the proof that your application is working. In addition, with automated functional tests you can also automate your delivery process. However, writing automated functional tests is hard. The main reason it is hard is because you need to control your execution environment (database, application server, etc).

Cargo is a framework that you can use to automatically install, configure and execute J2EE containers. Thus it allows you to control your execution environment (for the J2EE container at least) and permits completely automated functional tests for J2EE applications.

Example

Let's walk through an example. Imagine we wish to start up Tomcat 4.1.31 before our test runs. Here's what we could write:

public class MyTest extends TestCase
{
     private Container container;
 
     protected void setUp()
     {
          // (1) Optional step to install the container from a URL pointing to its distribution
          Installer installer = new ZipURLInstaller(
              "http://www.apache.org/dist/jakarta/tomcat-4/v4.1.31/bin/jakarta-tomcat-4.1.31.zip");
          installer.install();
  
          // (2) Create the Cargo Container instance wrapping our physical container
          container = new Tomcat4xContainer();
          container.setHomeDir(installer.getHomeDir());
     }
 
     public void testSomething()
     {
          // (3) Statically deploy some WAR
          Deployable war = container.getDeployableFactory().createWAR("src/testinput/my.war");
          container.addDeployable(war);
  
          // (4) Start the container
          container.start();
  
          // (5) Perform any test you wish here
          [...]
     }
 
     protected void tearDown()
     {
          // (6) Stop the container
          container.stop();
     }
}

Step 1 is optional. You can also rely on the container being already installed on the test machine if you wish. However, it's nice to completely automate the testing and assume nothing (or very little - we still need an OS and a JDK on the machine). In this example we're fetching the Tomcat 4.1.31 installation from the web. We could fetch it from our intranet or from a location on the machine or from our SCM.

In step 2, we have not told Cargo what container Configuration to use. Thus Cargo will use a default Configuration and it will configure it so that your container will execute in a temporary directory that it will create in your OS's tmp dir. If you wish to control this you can use:

Configuration configuration = new CatalinaStandaloneConfiguration(container, "target/tomcat4x");
configuration.setProperty(ServletPropertySet.PORT, "8080");
container.setConfiguration(configuration);

In step 3, we create a Cargo wrapper around a physical WAR and we add it to our container so that it is deployed when the container starts.

We then start the container (step 4), perform any testing we wish (step 5) and ensure the container is always stopped at the end of our test (step 6).

If we wish to start and stop the container only once during our whole test suite we can use a standard JUnit TestSetup.

Conclusion

This is just a short introduction to Cargo to demonstrate how easy it is to start/stop a container. The API is of course richer. Also, we're showing here how to use Cargo for functional testing of J2EE applications but Cargo is also meant to be used by any application that requires a container to be up and running. It could also be used by IDE plugin writers, etc.

For more information on Cargo, please see the Cargo website and join us on the Cargo mailing lists. You'll be warmly welcomed! :-)

[ vmassol ] 15:50, Friday, 5 November 2004

How often do you try to debug some Java application only to find that you can't easily continue because the code is entering some third-party library?

At that point, either the library is open source and you can rush to download the source, modify the code to add some System.out.println and spend 3-4 hours finding out how to rebuild the project... or it's not open source and then there's not much that you can do except trying to find out the reason with your sheer brain power!

How good would it be if there was an application (let's call it a Logifier) at which you could throw a jar and it would return a new aspectified jar into which it would have woven some Logging aspect that you could configure!

This would allow us to realize the full power of aspects: an external Java application that was not built with logging can now be converted to log things for us...

So who wants to be the first to build such a handy application? :-) Does it already exist?

Update 7/11/04: I've just remembered reading about AntFlow on TSS. That would be an excellent way of implementing this. Imagine a hot folder called "logifier" and any jar you drop in there is automatically logified using an AspectJ/AspectWerkz/etc Ant task and a common logging aspect such as this one! Now that would be cool. It could be a good coding exercise for the next OSSGTP.

[ vmassol ] 10:34, Monday, 1 November 2004

Analysis

I've just tried Omea Pro (build 353) and I've got to say it's very promising! It's hard to explain what Omea is... I think it can be viewed as two things:

  1. A search tool that aggregates all data from your computer (files of all types including PDF, Word and Excel but excluding PPT; Outlook emails; ICQ/Miranda conversations; Outlook tasks; Outlook contacts; RSS/Atom feeds; newsgroups; etc)
  2. A productivity tool that you can use instead of all your different tools for managing all your incoming data (mails, files, feeds, newsgroups)

After using it for 2 days, here are the pros I have found:

  • The searching feature is excellent. I find it much better than Google Desktop or Lookout in terms of relevance, breadth of search and organization of the results

Here are the things to improve that I have noticed (please remember that it is beta software):

  • It's resource hungry: after a few hours of using it, it easily reaches 300MB and more. It's also a little bit slow.
  • It has not reached a level where it is at least as good as the tools it gets its data sources from. For example, it's not as good as Outlook, it's not as good as a dedicated RSS feed reader, etc. I believe it will never be able to be as good as those specialized tools. JetBrains has recognized this by trying to make it bi-directional data-wise (your changes from Omea are reflected in Outlook and vice-versa). However this doesn't work for Newsgroups and RSS feeds for example.

Now the real question is how should I use it? As a search tool? But then it's a bit heavy to be left sitting idle on my desktop. And it's too heavy to start it on demand (it currently takes 30 seconds to 1 minute to start - I'm sure it'll be improved in the future). As a productivity tool? Possibly, although it's missing some of the features I use in my specialized tools. For example:

  • I use the "Reading pane - Right" view of Outlook which gives me 3 vertical panes next to each other. Once you get used to this, it's hard to go back. I'm told this will be in the next version of Omea Pro.
  • I use NewzCrawler which lets me see only feeds which have unread items in them, feed by feed (I don't read all blogs at the same speed).

Conclusion

I currently don't think I'll be able to keep Omea open all the time as it's too heavy to have both Outlook and Omea open at the same time. Also, I don't like the fact that I have to stop using my favorite feed reader (NewzCrawler). I really do not want to manage 2 feed tools and tell each one which feeds I have already read. Of course, I could simply not use the feed feature of Omea. However I feel that using Omea just for searching makes it lose a lot of its attraction. If I just want a search tool, I can use Google Desktop or Lookout (even if they are less powerful they are probably good enough for my daily needs).

I think the real challenge for JetBrains is to make the tool good enough in each domain (mail handling, feed reading, etc) so that it can be used instead of the dedicated tool. That means that you would use Omea for day to day activities and the specialized tool from time to time only when you need one of its power features. Of course, this is a huge challenge for JetBrains and honestly I am not sure if it is achievable.

Anyway, the tool is promising and intriguing enough that I'll follow the different builds to see how it evolves.

If you're using it, please drop me a note on how you use it and how you handle it vs the specialized tools. Thanks

Update 04/11/2004: I've just tried build 358 and I am extremely pleased to report that the memory consumption has decreased a lot: whereas it was 300MB before for me, it's now 140MB. Good job!

[ vmassol ] 10:22, Thursday, 30 September 2004

On one of my projects at work we have moved to JIRA 3 (beta). We moved to benefit from the new custom workflow feature. Unfortunately it was missing one key feature we wanted: the ability to send notification emails on custom workflow transitions (I've just been told by Atlassian that this is a feature they're currently working on). To remedy this and thanks to Atlassian's support, I've decided to delve into the JIRA Java API and develop a workflow function plugin to implement email sending.

I have to say that JIRA 3's extensibility is great! JIRA can almost be seen as a full-fledged foundation for developing project tracking applications, in the same spirit as Eclipse is a full-fledged foundation for developing Java applications (RCP). Both Eclipse and JIRA come with a default application using this API to demonstrate their power (the IDE for Eclipse, the issue tracker for JIRA). Note that the new plugin system in JIRA has several similarities with the Eclipse plugin architecture. Of course, I'm sure JIRA still has a lot of ground to cover to expose a plugin API covering all domains of issue tracking (i.e. allowing all parts of the JIRA issue tracker to be replaced) but it's going in the right direction.

Here is a short tutorial on how to develop a workflow function plugin. Note that you should also check the Atlassian tutorial on how to develop plugins.

The source code is available here and the plugin jar is available here.

Setting up the project

Here's the directory structure I have chosen for my plugin. Please also note that I have used Maven to perform the build (extremely easy to set up as Atlassian is also using Maven and they have all their jars in a Maven remote repository on http://repository.atlassian.com).

A plugin is composed of several files (it is packaged as a JAR at runtime):

  • A plugin descriptor (atlassian-plugin.xml)
  • Java source files
  • Velocity templates for the plugin UI (the *.vm files)

The project.properties file simply adds the Atlassian Maven remote repo to the list of repos searched by Maven to download dependencies:

maven.repo.remote=http://repository.atlassian.com,http://www.ibiblio.org/maven

The project.xml contains the required JIRA dependencies and the definition of resources to include in the generated jar. Here's an extract:

<code>[...]
<dependencies>
  <dependency>
    <groupId>atlassian-jira</groupId>
    <artifactId>atlassian-jira</artifactId>
    <version>3.0-beta</version>
  </dependency>
  <dependency>
    <groupId>osworkflow</groupId>
    <artifactId>osworkflow</artifactId>
    <version>17Aug2004</version>
  </dependency>
  <dependency>
    <groupId>propertyset</groupId>
    <artifactId>propertyset</artifactId>
    <version>1.3</version>
  </dependency>
  [...]
<build>
  <sourceDirectory>src/main</sourceDirectory>
  <resources>
    <resource>
      <directory>src/etc</directory>
      <includes>
        <include>atlassian-plugin.xml</include>
      </includes>
    </resource>
    <resource>
      <directory>src/etc/templates</directory>
      <includes>
        <include>**/*.vm</include>
      </includes>
    </resource>
  </resources>
</build>
</code>

Generating the plugin jar is as simple as typing maven jar.

The Workflow Function plugin extension point

Here's what the atlassian-plugin.xml plugin descriptor contains:

<code><atlassian-plugin key="sendmail.jira.plugin.workflow.sendmail" name="SendMail Plugin">
  <plugin-info>
    <description>Plugin for sending emails on custom workflow transitions.</description>
    <version>1.0</version>
    <application-version min="3.0" max="3.0"/>
    <vendor name="Vincent Massol" url="http://blogs.codehaus.org/people/vmassol/"/>
  </plugin-info>
  <workflow-function key="sendmail-function" name="Send Notification Mail"
      class="sendmail.jira.plugin.workflow.SendMailFunctionPluginFactory">
    <description>Sends a notification email.</description>
    <function-class>sendmail.jira.plugin.workflow.SendMailFunction</function-class>
    <orderable>false</orderable>
    <unique>true</unique>
    <deletable>true</deletable>
    <weight>900</weight>
    <default>false</default>
    <resource type="velocity" name="view" location="sendmail-function-view.vm"/>
    <resource type="velocity" name="input-parameters" location="sendmail-function-input-params.vm"/>
  </workflow-function>
</atlassian-plugin>
</code>

What you have to understand:

  • A plugin is made of 2 Java classes: a plugin factory class (SendMailFunctionPluginFactory) which is in charge of setting up all that is necessary for the execution of the plugin feature, and the plugin execution class (SendMailFunction).
  • A workflow plugin is expected to bundle 2 Velocity template files: one for asking the user to input some data required by the plugin execution (this is the input-parameters Velocity template), and one for displaying what the function will do. The latter is visible if you click on a workflow transition in JIRA and then on the post-functions tab.

The Java API

The Plugin Factory class

Without further ado, here's the skeleton for the SendMailFunctionPluginFactory class:

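public class SendMailFunctionPluginFactory extends AbstractWorkflowPluginFactory
    implements WorkflowPluginFunctionFactory
{
    public SendMailFunctionPluginFactory(FieldManager fieldManager)
    {
        // Called when the function is added to the list of post-functions
    }

    protected void getVelocityParamsForInput(Map velocityParams)
    {
        // Store properties for the "input-parameters" template here
    }

    protected void getVelocityParamsForView(Map velocityParams,
        AbstractDescriptor descriptor)
    {
        // Store properties for the "view" template here
    }

    public Map getDescriptorParams(Map conditionParams)
    {
        // Copy the data entered by the user into the workflow descriptor here
        return new HashMap(); // placeholder so that the skeleton compiles
    }
}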

Those 4 methods are called by JIRA itself:

  • The constructor is called when you click on the "add" button to add the function to your list of post-functions. The FieldManager instance can be used to get issue fields meta-data (it does not contain any issue data as there's no issue associated with the function yet - This will only happen when the function is triggered by an issue transition).
  • The getVelocityParamsForInput() method can be used to store some properties in the velocityParams map. These properties will then be accessible from the "input-parameters" Velocity template.
  • The getVelocityParamsForView() method can be used to store some properties in the velocityParams map. These properties will then be accessible from the "view" Velocity template. In addition the descriptor parameter provides access to the data entered by the user in the input phase (these data are stored in the workflow data structure itself).
  • The getDescriptorParams() method is the bridge between the data contained in the Velocity context and the data in the Workflow context. More precisely you put in there the code to extract the data that have been entered by the user in the Velocity context and you put the data in the workflow descriptor context. This descriptor context is the second parameter that is available in your getVelocityParamsForView() method.

Here's a look at the input-parameters velocity template:

<code><tr bgcolor=ffffff>
  <td align="right" valign="top" bgcolor="fffff0">
    <span class="label">Group emails:</span>
  </td>
  <td bgcolor="ffffff" nowrap>
    <input type="text" name="groupEmails" value=""/>
    <br><font size="1">Comma-separated list of JIRA groups to send emails to.</font>
  </td>
</tr>
<tr bgcolor=ffffff>
  <td align="right" valign="top" bgcolor="fffff0">
    <span class="label">Individual emails:</span>
  </td>
  <td bgcolor="ffffff" nowrap>
    <input type="text" name="individualEmails" value=""/>
    <br><font size="1">Comma-separated list of JIRA users to send emails to.</font>
  </td>
</tr>
</code>

As you can see, the variables groupEmails and individualEmails will hold the data entered by the user.
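To make the "bridge" role of getDescriptorParams() more concrete, here's a minimal stand-alone sketch. It is not the plugin's actual code: it uses plain Maps so that it compiles without the JIRA jars, and it assumes that the submitted form values arrive in a map keyed by the input field names, either as a String or as a String[]. The class name DescriptorParamsSketch and the firstValue() helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone sketch of the bridge logic of getDescriptorParams().
 * Plain Maps only, so no JIRA classes are needed to compile it.
 */
class DescriptorParamsSketch {

    /**
     * Hypothetical helper: returns the first value submitted for a form field.
     * Assumption: values arrive either as String[] (servlet style) or as String.
     */
    static String firstValue(Map<String, ?> formParams, String name) {
        Object value = formParams.get(name);
        if (value instanceof String[]) {
            String[] values = (String[]) value;
            return values.length > 0 ? values[0].trim() : "";
        }
        return value == null ? "" : value.toString().trim();
    }

    /** Copies the two template fields into the map stored with the workflow. */
    static Map<String, String> getDescriptorParams(Map<String, ?> formParams) {
        Map<String, String> descriptorParams = new HashMap<String, String>();
        descriptorParams.put("groupEmails", firstValue(formParams, "groupEmails"));
        descriptorParams.put("individualEmails", firstValue(formParams, "individualEmails"));
        return descriptorParams;
    }
}
```

In the real factory the same two keys would be read from the conditionParams map and returned, so that they are stored with the workflow and handed back later to the function.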

The Plugin Function class

Here's the code that implements the plugin feature (in our case the sending of the notification email):

public class SendMailFunction implements FunctionProvider
{
    public void execute(Map transientVars, Map args, PropertySet ps)
    {
        // Build and send the notification email here
    }
}

The execute() method is called by JIRA when an issue transition happens.

The parameters have the following meanings:

  • The transientVars parameter holds useful data such as the issue that was modified. You get a reference to the issue by calling transientVars.get("issue");. It also contains other pieces of data such as the comment entered by the user, etc.
  • The args parameter holds all the data stored in the workflow context (aka the workflow descriptor). This is the data you have stored yourself in the getDescriptorParams() method explained above.
  • I'm not too sure what the ps parameter is used for. I think it holds data related to the workflow steps but this needs to be confirmed. Anyway, you shouldn't need it in most cases.
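Here's a rough sketch of what the body of execute() has to do with those parameters. Again it's plain Java with no JIRA dependency, so the issue is just an opaque object and nothing is actually sent: the method returns a description of the notification instead. The class name SendMailSketch and both method names are made up for the illustration; only the "issue", "groupEmails" and "individualEmails" keys come from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SendMailSketch {

    /** Splits a comma-separated value into trimmed, non-empty names. */
    static List<String> splitNames(Object commaSeparated) {
        List<String> names = new ArrayList<String>();
        if (commaSeparated == null) {
            return names;
        }
        for (String part : commaSeparated.toString().split(",")) {
            String name = part.trim();
            if (name.length() > 0) {
                names.add(name);
            }
        }
        return names;
    }

    /**
     * Mirrors the shape of execute(transientVars, args, ps), minus the ps parameter.
     * Returns the text of the notification instead of sending an email.
     */
    static String describeNotification(Map<String, ?> transientVars, Map<String, ?> args) {
        Object issue = transientVars.get("issue");
        List<String> groups = splitNames(args.get("groupEmails"));
        List<String> users = splitNames(args.get("individualEmails"));
        return "Issue " + issue + " changed; notify groups " + groups + " and users " + users;
    }
}
```

Resolving the group and user names to email addresses and sending the mail is where the real JIRA API comes in, which this sketch deliberately leaves out.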

Deploying and executing the plugin

Deployment is as simple as dropping the plugin jar in [jira install dir]/atlassian-jira/WEB-INF/lib for example (it may be possible to drop it in other classloaders but I haven't tried).

In the following image we can see how JIRA has automatically discovered our plugin and extracted information from the plugin descriptor to make it available at the right extension point in JIRA:

Here is the page (using the input-parameters Velocity template) to let the user enter data for our function:

Here is how our function is displayed (using the view Velocity template):

I hope you got a good feel of what's possible to do with the JIRA API.

Note: For those wondering, I am not affiliated with Atlassian at all. I simply happen to like their tools (JIRA, Confluence) and I like the spirit of their team.

[ vmassol ] 12:06, Saturday, 3 July 2004

The need for source code communication

When you're working in a team one of the most important aspects of development is to know what others are working on and what they are modifying. Indeed, in development everything is tied together one way or another:

  • if someone changes the database schema, I need to know
  • if someone updates a build dependency to a newer version of a framework I need to know
  • if someone changes a public API, I need to know
  • if someone starts working on the same set of source files as mine, I need to know
  • if someone modifies a best practice document, I need to know so that I can have a look at what's new

Basically, I need to know whenever some source file on which my work depends is modified.

Of course, I could rely on verbal communications: "Hey Joe, I'm gonna work on these files for the coming 4 hours". Sure, that'll work too. However it's not a scalable solution (imagine a project divided in 5 teams of 10 persons each) and it works only if the participants are acutely aware of what each developer depends upon, which is near impossible. In addition, if the team is distributed it becomes virtually impossible to communicate such small events in any efficient manner. There has to be a better solution.

And there is! There are actually 2 solutions I know of, one better than the other.

Solution 1: Diff emails on SCM commit

This first solution involves setting up a server-side hook in your SCM. Most SCMs support executing a script upon source check-in/commit. Just write a script that sends an email containing the diffs between what the user is committing and what already exists in the repository. The reason for the diff is that we're interested in what's different and not in the full content. This is what is done in most open source projects. Here's an example:

The problems with this solution are:

  • It's not very easy to let developers choose precisely which files/directories they want to monitor
  • The diff information is received in your email, among all your other emails. It's good to reserve emails for actions that you have to perform. Here we're not talking about actions but about information. I believe a better channel for information is for example an RSS feed. It still comes to you but it does not clutter your other emails.

Solution 2: RSS feeds

This solution involves setting up some kind of server that will generate RSS feeds for the developers. Implementing this could be quite involved. Fortunately there's a tool called FishEye from Cenqua that does exactly this. It's an improved viewcvs. Most importantly it allows developers to generate RSS feeds on any resource (any directories and any files). Here's what the HTML view looks like:

Then here's what you get in your favorite feed reader (I'm using NewzCrawler here):

Clicking on the diff link generates the following view:

Note: FishEye currently only supports CVS. Support for other SCMs like Subversion is planned for the future.

Conclusion

It is relatively simple and very effective to set up a way to notify developers about changes happening in the repository. I would even suggest that not doing so is a "communication anti-pattern". For it to be as effective as possible, developers should not be "spammed" by tons and tons of diff information. To prevent this spamming you should consider 2 options: using RSS feeds instead of emails and allowing developers to choose their feeds. There are obviously some feeds that should be more "mandatory" than others. For example, the one notifying of database schema changes, the one notifying of build changes and the one notifying of public API changes. It's important, as always, that there is a champion in the team, explaining to others the benefit of these feeds and how they should be best used. Otherwise the power may be there but it won't be harnessed.
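To illustrate the "let developers choose their feeds" part, here's a small sketch of the matching logic such a notifier needs, whatever the delivery channel (email or RSS). It's only an illustration under my own assumptions: subscriptions are simple path prefixes, and the ChangeSubscriptions class and its method names are invented for the example (this is not how FishEye is implemented).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Maps each developer to the repository path prefixes they want to watch. */
class ChangeSubscriptions {
    private final Map<String, List<String>> prefixesByDeveloper =
        new LinkedHashMap<String, List<String>>();

    void subscribe(String developer, String pathPrefix) {
        List<String> prefixes = prefixesByDeveloper.get(developer);
        if (prefixes == null) {
            prefixes = new ArrayList<String>();
            prefixesByDeveloper.put(developer, prefixes);
        }
        prefixes.add(pathPrefix);
    }

    /** Returns, in alphabetical order, the developers watching at least one changed path. */
    SortedSet<String> developersToNotify(List<String> changedPaths) {
        SortedSet<String> result = new TreeSet<String>();
        for (Map.Entry<String, List<String>> entry : prefixesByDeveloper.entrySet()) {
            if (matchesAny(entry.getValue(), changedPaths)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    private static boolean matchesAny(List<String> prefixes, List<String> changedPaths) {
        for (String path : changedPaths) {
            for (String prefix : prefixes) {
                if (path.startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

The "mandatory" feeds would simply be subscriptions that the team champion adds for everybody, for example a prefix covering the database schema directory.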

[ vmassol ] 08:50, Wednesday, 21 April 2004

Pattern Testing is the concept of automatically verifying that architectural/design patterns are correctly applied in code. It uses AOP to perform this feat.

Pattern Testing is a concept I started researching in 2002 on a big project at work. At that time, I was already using Checkstyle and PMD to perform verification of rules. However, I was finding them limited in several ways:

  • They were mostly focusing on syntactic rules, whereas I was mostly interested in semantic rule checking. Checkstyle has improved a lot in this area over the past 2 years. However, even though there are some semantic rules, they are usually simple and not very "business"-oriented. Note that this is not to diminish Checkstyle which I find great and use on all projects.
  • They were focusing on applying rules on a single file (using AST trees), whereas I was interested in applying rules that span several files (this is necessary for architectural/design checks). As a consequence writing pattern tests using Checkstyle is difficult.
  • They were limited to static checks. I wanted to be able to say "ensure that no method call is passed a null parameter", "ensure that there are no more than 10 calls to the database per use case", etc.

The result was the Pattern Testing project on SourceForge. It's implemented using AspectJ. It contains some pre-made Pattern Tests but more importantly it lets you write your own. There is a Maven plugin that makes it easy to run any Pattern Test on any mavenized project.

The possibilities of Pattern Testing are endless.

Here's an example of a rule that says that we do not want to instantiate business classes (i.e. a class that extends BasicBC). That's because they are instantiated by factories/service managers:

public aspect NoNewOnFrmwkClassesPatternTest
{
     declare error: 
       call(com.some.package.BasicBC+.new(..)) : 
         "Do not instantiate a business class directly";
}      
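The aspect above is a static check, reported at compile time. For the dynamic kind of rule mentioned earlier ("ensure that no method call is passed a null parameter") the check has to run while the code executes. To give a feel for it without requiring the AspectJ compiler, here's a plain Java stand-in based on a JDK dynamic proxy. Please note that this is not how the Pattern Testing project does it (it uses AspectJ), that a proxy only intercepts calls made through an interface, and that NullArgumentCheck and Greeter are names I made up for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Hypothetical interface, only here to have something to intercept. */
interface Greeter {
    String greet(String name);
}

/** Rejects any call made through the proxy that passes a null argument. */
class NullArgumentCheck implements InvocationHandler {
    private final Object target;

    private NullArgumentCheck(Object target) {
        this.target = target;
    }

    /** Wraps the target so that every call through the given interface is checked. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> type) {
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] {type}, new NullArgumentCheck(target));
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (args != null) {
            for (int i = 0; i < args.length; i++) {
                if (args[i] == null) {
                    throw new IllegalArgumentException(
                        "Null passed as argument " + i + " of " + method.getName());
                }
            }
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Rethrow the exception raised by the real method, not the reflection wrapper
            throw e.getCause();
        }
    }
}
```

Calling NullArgumentCheck.wrap(someGreeter, Greeter.class).greet(null) then fails with an IllegalArgumentException. The big advantage of the AOP version is that the pointcut selects the calls to check, so nothing has to be wrapped by hand.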

I used Pattern Tests for some time on that project at work and then I changed jobs. I was still interested in the concept but with my other open source involvements I didn't take the time to work on it as much as I wanted. Thus it became a bit abandoned. However, these past weeks, I have found renewed interest in the concept for several reasons:

  • I was contacted by Matt Smith who was interested in taking the lead of the Pattern Testing project. I gladly handed this over to him and he's now the one driving the project
  • I have started working on Cactus 2 and, as the Cactus 2 architecture is based on AOP, the concepts are very close.
  • On the new project I am on, we're starting to want to automate our architectural/design checks and some teams have initiated using Pattern Tests.

Are any of you using Pattern Testing too?

[ vmassol ] 11:32, Thursday, 8 April 2004

As part of my day job, I've met CAST Software in France. They are really nice guys. And they have a very good product suite that analyzes application source code and produces different kinds of reports (quality and reverse-engineering/browsing).

The really nice part is their open architecture:

It's nice for the following reasons:

  • There can be as many parsers as there are languages, script languages, etc. All data goes to the same database and this data is available to all analyzers. This means that it's possible to define analyzers that act at the level of the full application. For example, you can define some architecture pattern validation. This is the same idea I had some time ago with the PatternTesting project, using AOP to write these rules (I was relying on the Javac parser to perform the source code parsing for me).
  • It allows writing browsing tools that show the full flow of the application, from configuration files to database tables

Compare this to existing open source tools (or not open source for some) like Checkstyle, Clirr, PMD, Findbugs, jcoverage, Simian, Clover, etc. They all use their own parsers and, more importantly, their own persistence format for storing the parsed data. Even though they almost all offer an XML report format, it's already processed data (as opposed to raw parsing data). Thus it is very hard to reuse the information from these tools to offer a higher level integrated view. It's also a lot of work for these tools to develop all the necessary parsers.

What would be really nice is the creation of an open source project that does the following:

  • Define a common storage model (database schema, XML schema, other). It should allow storing historical data too.
  • Define a common API to store parsing data
  • Define a common API to query parsing data

If such an open source project existed, we could have parser projects which could parse Java code, .Net code, Hibernate configuration + code, Struts configs + code, AspectJ Aspects, etc. These parsers would use the common API to store parsed data in the common storage format, thus making the data available to other projects, such as the ones mentioned above (Checkstyle, PMD, etc). These projects would be analyzers and would use the common API to query parsed data and generate reports for their domain of work.
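To make the idea a bit more tangible, here's a very rough sketch of what the store/query side of such a common API could look like. Everything in it is hypothetical (the ParsedElement fields, the ParseDataStore interface and its in-memory implementation); a real storage model would obviously need relationships between elements, versions for the historical data, and a real database behind it.

```java
import java.util.ArrayList;
import java.util.List;

/** One parsed fact. All field names are assumptions made for this sketch. */
final class ParsedElement {
    final String language;   // e.g. "java" or "struts-config"
    final String kind;       // e.g. "class" or "action-mapping"
    final String name;
    final String sourceFile;

    ParsedElement(String language, String kind, String name, String sourceFile) {
        this.language = language;
        this.kind = kind;
        this.name = name;
        this.sourceFile = sourceFile;
    }
}

/** The common API: parsers call store(), analyzers call query(). */
interface ParseDataStore {
    void store(ParsedElement element);

    /** A null language or kind means "any". */
    List<ParsedElement> query(String language, String kind);
}

/** Simplest possible implementation, keeping everything in memory. */
class InMemoryParseDataStore implements ParseDataStore {
    private final List<ParsedElement> elements = new ArrayList<ParsedElement>();

    public void store(ParsedElement element) {
        elements.add(element);
    }

    public List<ParsedElement> query(String language, String kind) {
        List<ParsedElement> matches = new ArrayList<ParsedElement>();
        for (ParsedElement element : elements) {
            boolean languageMatches = language == null || language.equals(element.language);
            boolean kindMatches = kind == null || kind.equals(element.kind);
            if (languageMatches && kindMatches) {
                matches.add(element);
            }
        }
        return matches;
    }
}
```

The point of the split is that a Java parser and a Struts parser would both write to the same ParseDataStore, and an analyzer could then query across both without knowing anything about either parser.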

With this, we could build nice analyzers that could completely check for full architecture best practices, across technologies and across frameworks.

Do you think such a project would work?

Note: Maybe CAST should open source this part (storage model) and keep as closed source their added value Parsers and Analyzers? :-)

[ vmassol ] 19:36, Thursday, 1 April 2004

A year ago, I spent some time looking at Alice Bot. It's a bot that can hold a conversation with human beings. It has won the Loebner prize several times, which means it gets as close as possible to imitating a human person. The bot is controlled by XML files written in AIML.

The idea that I had at that time was to use it as a live FAQ answerer for the Cactus project. I never got around to doing it though... need more time...

I've been reminded of this idea by some posts on the Cactus mailing list. I'd venture that at least 50-70% of the questions asked have already been answered in some form in the past, are available in the Cactus documentation or are easy to answer by some generic rule. I think it could be fun to train a bot to answer these questions. For those that the bot couldn't answer, the user would be redirected to the mailing list where Cactus experts would find the answer and train the bot so that it gets better next time.

It should be possible to have rules that would generate the following kind of dialogue:

  • [user] I'm getting an error when running Cactus tests
  • [bot] Is the error happening on the client side or the server side?
  • [user] I don't know
  • [bot] Could you turn logging on and see if the error appears in the client side log or the server side log?
  • [user] how do I turn logging on?
  • [bot] See the Cactus logging configuration page
  • [user] It's happening on the client side
  • [bot] Is the error happening during the HTTP connection?
  • [user] yes
  • etc...
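Just to show the principle, here's a toy version of such rules in plain Java. It is not AIML and has nothing to do with how Alice Bot works internally: it only matches keywords, keeps no conversation state, and falls back to the mailing list when no rule applies. The FaqBotSketch class and its rules are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Keyword-based FAQ answerer: the first rule whose keyword appears in the question wins. */
class FaqBotSketch {
    static final String FALLBACK =
        "I don't know yet. Please ask on the Cactus mailing list.";

    // LinkedHashMap keeps the rules in insertion order, so specific rules can go first
    private final Map<String, String> replyByKeyword = new LinkedHashMap<String, String>();

    void addRule(String keyword, String reply) {
        replyByKeyword.put(keyword.toLowerCase(Locale.ROOT), reply);
    }

    String answer(String question) {
        String normalized = question.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : replyByKeyword.entrySet()) {
            if (normalized.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return FALLBACK;
    }
}
```

Training the bot then boils down to calling addRule() each time an expert answers a new question on the list. A real AIML bot goes much further (wildcards, context carried from one exchange to the next), which is exactly what the dialogue above needs.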

It would be cool! My only worry is that it would take too long to create the AIML files. But maybe not. Any experience?

What would be even better would be that the bot would learn from the mailing list posts itself. However, that's probably science fiction at this point in time...

Note: The image was borrowed from Alicebot.org

[ vmassol ] 08:58, Friday, 19 March 2004

I'm using SharpReader and I don't know whether the following is true for other readers.

Whenever a blog entry is updated by its author it appears in bold (indicating a change) and in italic (indicating that it's an update). However I have no clue what modifications were made to the entry and most of the time I don't have enough energy to reread the full entry trying to figure out by myself what has changed.

What would be really nice would be that the RSS reader shows a diff, highlighting the changes (as it's done for CVS commit diffs).

[ vmassol ] 22:30, Monday, 15 March 2004

All continuous builds have a common point: when a project fails we have to find out why. There can be several reasons:

  • Reason 1: a change in the project itself is causing the failure
  • Reason 2: a change in one of the project's dependencies is causing the failure

Solving case 1 is relatively easy. However, solving case 2 is usually quite difficult. How could we make it easier?

Here's a proposal for improving continuous build tools:

When a project goes from a "success" build state to a "failure" build state, perform a source diff (using CVS or whatever SCM is used) between the 2 dates on both the project itself and all its dependencies. Then generate a report showing all the changes.

This should give a clear view of all the things that changed and that led to the build failure. This strategy seems especially well suited to continuous build systems as these builds are executed often and thus the differences between 2 builds should be small enough to get a clear picture.

Has anyone done this already? Does it work?

Note: I have suggested this idea to the Gump project. I would be very eager to see how it works out.

Implementation details:

  • If your SCM is too slow to perform large diffs over several revisions there is still hope! Save the last build checkout and perform a file diff between the current checkout and the last saved one. If you're using CVS or an SCM that supports putting metainformation in source files (like $Id:$) then you'll also get to know who made the change. Otherwise, for each file change you'll need to query your SCM to know who made the change.
  • I think it might be interesting to target the build failure emails to the list of people who have made changes since the last build + at least one person from the failed project's team.
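The "saved checkout" variant can be sketched in a few lines. The sketch below assumes that each checkout has already been summarized as a map from relative file path to some content fingerprint (a checksum, or the $Id:$ revision string); how that map is built is left out, and the CheckoutDiff class name is made up.

```java
import java.util.Map;
import java.util.TreeMap;

class CheckoutDiff {

    /**
     * Compares the snapshot of the last successful build with the snapshot of the
     * failed build. Both maps go from relative file path to a content fingerprint.
     * Returns the changed paths, sorted, each marked added, modified or removed.
     */
    static Map<String, String> diff(Map<String, String> lastGoodBuild,
                                    Map<String, String> failedBuild) {
        Map<String, String> changes = new TreeMap<String, String>();
        for (Map.Entry<String, String> entry : failedBuild.entrySet()) {
            String before = lastGoodBuild.get(entry.getKey());
            if (before == null) {
                changes.put(entry.getKey(), "added");
            } else if (!before.equals(entry.getValue())) {
                changes.put(entry.getKey(), "modified");
            }
        }
        for (String path : lastGoodBuild.keySet()) {
            if (!failedBuild.containsKey(path)) {
                changes.put(path, "removed");
            }
        }
        return changes;
    }
}
```

The build tool would run this once for the failed project and once for each of its dependencies, and put the results in the failure report.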

Limitations:

  • Say we have project A which depends on project B which depends on project C. If project A fails because of a change in project C, then we won't know it unless we're doing diffs on transitive dependencies. But that might take too long. I think a good strategy would be to only generate diffs on the first level dependencies. Then possibly provide a feature to generate diffs on transitive dependencies on demand (but not at every build failure - it'll be too costly I think).

[ vmassol ] 09:06, Wednesday, 18 February 2004

Warning: Don't take this comparison too seriously... ;-) It's a bit like comparing apples and oranges and I'm sure the analogy breaks quite quickly if you pursue it too far. However, I do believe that the contained message is true.

I've just realized that collaborative offshore and EJB have a common point: they are both using a distributed model. By collaborative offshore, I mean teams developing on both sides (onsite and offshore) and interacting continuously to build a system.

I've been working on 2 big offshore projects so far for the past 2.5 years (working with an Indian partner) and I've found that there is an organizational model that does not work well: onsite people directly managing developers (Figure 1). In the same way, calling Entity Beans directly from the client side is a bad practice because 1) it involves a lot of network round-trips and thus is inefficient and 2) it does not allow changing the implementation without affecting the client (Figure 2).

Figure 1: Onsite Project lead managing directly offshore developers
Figure 2: Client calling Entity beans directly

What is the solution we have used for EJBs? Answer: introduce a facade (Session Bean) which "manages" the underlying components (Figure 3). I've found that it is the same with collaborative offshore: there is a strong need to always introduce a local Project Lead (Figure 4).

Figure 3: EJB client calling a facade
Figure 4: Onsite Project Lead interacting with a local Project Lead

It probably sounds very obvious but very often there is an initial tendency by onsite managers new to offshore to directly manage offshore "resources" (in order to reduce support costs). So far, whenever they've tried, it has failed (although we told them it wasn't a good idea but it seems some people need to see it by themselves to believe... ;-)).

I wonder if there'll be a time in the future where our communication skills will be so great that they will allow direct management across the wire. Probably... but this is still in the future...

Has this also been your experience?

[ vmassol ] 09:18, Monday, 16 February 2004

Applying a working build strategy for testing against a database is not easy. It depends on the complexity of the database model, it depends on the size of the teams. However, I've found that the strategy described below is the one that has worked the best for the projects I have been involved in:

  • Do not mix unit tests independent of the environment (i.e. where interactions with the environment are stubbed/mocked) with integration unit tests (IUT). They have to be separated and put in different locations in the SCM. The reason is that the 2 kinds of tests do not support the same execution workflow. More below.
  • Have a database build project (in the sense of an Ant or Maven project) in your SCM. This is extremely important. The goal of this project is to provide the following build targets/goals:
    • create-schema: create the database schema from the ground up (in the database specified as properties)
    • load-static-data: loads static data (i.e. read only data)
    • load-minimal-data: loads a functionally minimal set of data. It should contain all data required functionally but only 1 or a few entries of each type. It's not supposed to reflect the state of the database when in production.
    • load-full-data: loads a full set of data as expected in production.
  • Put the database data (minimal + full sets) in your SCM as flat files (as opposed to keeping the data live in the database). The reasons for this are:
    • you get automatic notification of data changes by using the send-email-on-commit feature that all good SCMs have
    • it is build-friendly and allows automated and controlled builds
    • it is controlled, i.e. you know what you're doing with your data, who is modifying them, you can revert if need be, etc
  • Here's the workflow for executing IUT or functional tests. For each project and before the test suite runs:
    • execute database:create-schema
    • execute database:load-static-data
    • execute database:load-minimal-data

    Then, each test should also have the opportunity to load data in its setup (using DBUnit or similar). This is required for example to test special cases where the database is missing some required data and we wish to verify the exception handling part of the code. It is also required if the test requires more than the minimal data set (although that should be relatively infrequent).

    Note that the tests can also be ordered to save some database load time. Although not the best strategy I've found that this was sometimes required on some projects with complex database models.
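As a rough sketch of that workflow, the code below runs the three goals once, in order, before a suite starts. It is plain Java with no Ant, Maven or DBUnit involved, and the DatabaseBuild interface, IntegrationSuiteSetup and RecordingDatabaseBuild are hypothetical names. A real implementation would delegate to the database build project instead of recording strings:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java view of the goals offered by the database build project. */
interface DatabaseBuild {
    void createSchema();
    void loadStaticData();
    void loadMinimalData();
}

/** Runs the goals once, in order, before an integration test suite starts. */
class IntegrationSuiteSetup {
    static void prepare(DatabaseBuild build) {
        build.createSchema();    // database:create-schema
        build.loadStaticData();  // database:load-static-data
        build.loadMinimalData(); // database:load-minimal-data
    }
}

/** Stand-in that only records which goals were executed, so the sketch runs. */
class RecordingDatabaseBuild implements DatabaseBuild {
    final List<String> executed = new ArrayList<>();

    public void createSchema() { executed.add("database:create-schema"); }
    public void loadStaticData() { executed.add("database:load-static-data"); }
    public void loadMinimalData() { executed.add("database:load-minimal-data"); }
}
```

Each test can then load its own extra data in its setup, on top of what prepare() has put in place.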

On the other hand, here are strategies that have not been working so well for me in the past:

  • Have a live database where developers can directly update data. The problems encountered were:
    • it is not controlled. You do not know who's adding data and what is being modified. You cannot easily revert a change
    • it's difficult with distributed teams as you need to set up a replication mechanism. The problem is that developers often update their local database and forget to update the master database, which leads to lots of build failures. The solution described above does not suffer from this problem.
    • It's hard to sync everyone on the exact same set of data. Some minimal data + variations works best.
  • Do not provide minimal data and let developers write from scratch the data they require for their tests and load these data before each test. This does work for small projects with simple database models but not for complex ones. There's really the need for a minimal data set.

Is that also your experience?

[ vmassol ] 08:23, Wednesday, 11 February 2004

I'm revisiting an old entry I posted about a year ago about Starteam woes. The reason is that the project I'm working on is delivering a first release soon and we'll be attacking the second leg of the journey... and it may be time to lobby for a source repository change... :-). Thus I need to prepare my ammunition again. By running it through you guys I hope to flush out the inaccuracies in my points and possibly find new arguments in favor of... CVS. Yeah, I am biased!

Here's what I feel is wrong with Starteam:

  • No "clean checkout" option. That is, if a file is deleted from the StarTeam repository, even if you perform a checkout all, the deleted files will not be removed from your local working copy. Actually it is possible but only through the command line interface.
  • No ability to send email diffs on commits (using tools like CVSSpam)
  • No nice IDE integration such as the CVS integrations we can see in IntelliJ, Eclipse, NetBeans, JBuilder, etc. More specifically the ability to see exactly what files are not in sync with the repository.
  • Limited integration with the majority of development tools: limited integration in Maven (BTW that's because it's been developed on this project itself by Emmanuel Venisse that there is limited integration in Maven!), no JIRA integration, etc
  • No JIRA integration. It seems there's a nice CVS integration which allows linking source code to resolved issues by entering the issue number in CVS commits. As we're using JIRA we could use this!
  • No possibility to run an Ant or Maven build every time there's a commit (a la Damage Control). That would allow our build to be in a better shape.
  • Starteam is very slow on WAN links. It may be due to our project policy to use locking on files. In any case if we had CVS we wouldn't have used this locking which is hampering productivity.
  • No windows explorer integration. To use Starteam you need to open yet another GUI application and perform operations from there (unless you use the command line but nobody is using it here). CVS has a nice TortoiseCVS client.
  • With Starteam a major problem for our build is that people forget to check in directories. No wonder as the Starteam GUI client does not show at all new directories!
  • Starteam is quite expensive and as a result we have only a limited number of fixed licenses. Anyone using floating licenses gets disconnected every few minutes. Very annoying.
  • Starteam admin seems more complex than CVS's. We have often had database problems over the past year.

Any more? Any inaccuracy in there (I'm sure there are as I am biased!)?

[ vmassol ] 20:22, Tuesday, 10 February 2004

I've just learnt that there was a name for "a computer-generated test that humans can pass but computer programs cannot". It's called a captcha. You can see those on some web sites during registration. Here's an example:

captcha.jpg

Some of my Octo workmates are developing a java framework for generating captchas called JCaptcha. What I find interesting is one possible use of captchas: preventing spam. More specifically the idea would be to use captchas to prevent blog spam. It means that people who enter blog comments would need to be humans. What I don't know is whether blog spam is being done manually by individuals or if it's automated. In any case this solution will prevent automated blog spam which is a good first step!

The JCaptcha project has just released a beta version. I guess one next step could be the creation of a MoveableType plugin. Then I would hope to convince Bob to let us try it on Codehaus blogs :-)

Update 18/04/04: It seems that James Seng beat me to this captcha idea. Not only the idea but also an MT implementation. In addition, he's created yet another MT plugin for preventing comment spam by implementing an MT Bayesian filter.

[ vmassol ] 15:28, Tuesday, 10 February 2004

I've started thinking about what would be the best possible IDE back in 2001. At that time, I had tried Sun's Jini and liked it quite a lot. I linked the 2 concepts (IDE and Jini) and came up with the idea of a Jini-based IDE. At that time I started writing down some ideas (here and here). However I did not pursue this idea as creating a full fledged IDE is a master achievement and I did not have the time nor the wish to do so!

However, even today in 2003, I still think it had some nice ideas that I would like to see in existing and future IDEs. Here are some ideas about this Jini IDE:

  • Each module would be a Jini service. Examples of modules are: javac compiler module, RMI compiler module, java editor module, source repository module, java execution module, junit execution module, etc.
  • It would be lightweight. It would be able to bootstrap with a minimal jar containing only the "microkernel" (+ possibly a module cache manager). Thus you could move from one machine to another easily. To install it, simply click on a browser link and download this minimal core. The rest will be downloaded as need be when you need the modules.
  • As each module is a Jini module, there would be 3 possibilities to implement a module:
    • The Proxy contains only local methods and there is no communication between the proxy and the back-end service,
    • The Proxy is a “smart proxy”, i.e. there are both local and remote methods which communicate with the back-end service,
    • The Proxy is only a client stub, all the methods are remote
  • It would be completely distributed. For example, the compile menu of the IDE would list all compiler module implementations the IDE has been able to discover when contacting the different Jini lookup services.
  • It would be self-healing: if a module is no longer available on a given server, another replacement will be automatically discovered (by the magic of Jini leases).
  • By using Jini leases, the IDE would support hot-patching/uninterrupted services
  • It would be secure using Jini security for modules. This would make it possible to support both open source modules and commercial modules.
  • What would be nice would be to have a repository service which would automatically save edited code on the server side (in a user-private zone). This would enable remote building. Developers on the move would be able to get their environment set up rapidly on any machine. Same if you wish to share environment with someone else, etc.
  • It would be completely modular with caching done on the client side to improve performances. The modularity should allow the creation of module repositories on the web. It would also allow creating IDE "a la carte".
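To illustrate the discovery and self-healing ideas, here is a toy lease-based lookup in plain Java. It does not use the Jini API at all. ModuleRegistration, ModuleLookup and the explicit "now" parameter are assumptions made for the sketch, and a real Jini lookup service with lease renewal is far richer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** One registered implementation of a module, valid until its lease expires. */
class ModuleRegistration {
    final String moduleType;   // e.g. "compiler"
    final String provider;     // e.g. the host offering the module
    final long leaseExpiresAt; // in milliseconds, on the lookup's clock

    ModuleRegistration(String moduleType, String provider, long leaseExpiresAt) {
        this.moduleType = moduleType;
        this.provider = provider;
        this.leaseExpiresAt = leaseExpiresAt;
    }
}

/** Toy lookup service: providers register with a lease, expired ones are skipped. */
class ModuleLookup {
    private final List<ModuleRegistration> registrations = new ArrayList<>();

    /** Registers a provider whose lease runs for leaseMillis starting at now. */
    void register(String moduleType, String provider, long now, long leaseMillis) {
        registrations.add(new ModuleRegistration(moduleType, provider, now + leaseMillis));
    }

    /** Returns the first provider of the given type whose lease is still valid. */
    Optional<String> find(String moduleType, long now) {
        for (ModuleRegistration registration : registrations) {
            if (registration.moduleType.equals(moduleType)
                && registration.leaseExpiresAt > now) {
                return Optional.of(registration.provider);
            }
        }
        return Optional.empty();
    }
}
```

If the first compiler provider stops renewing its lease, the next find() call simply returns another provider, which is the self-healing behavior described above.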

Of course, this is a bit utopian as we would need to overcome several difficulties:

  • definition of standard module interfaces. However lots of work has been done there already by Netbeans and Eclipse and that could be reused.
  • there would need to be some module certification tests to ensure a module is properly coded, does not hog the IDE, plays well with others, etc

Of course, this IDE of the future could be implemented in a technology other than Jini (P2P, Web services, etc). I still believe Jini is way ahead of web services but they are slowly catching up on security and transactions. To my knowledge there's still no notion of "leases" in web services, nor of code that moves around the network and can execute locally.

What do you think? Is that Jini-like IDE something you would also like to see in the future?

Update: This blog entry has been reposted on TSS (several interesting comments).

[ vmassol ] 10:34, Monday, 9 February 2004

Imagine you have a continuous build system in place and that it automatically builds your projects every few hours. When the team is large it can be quite challenging to coach all team members to be careful about the build and to run it locally on their machine before committing code. There are also other problems, like the build working locally but not on the continuous build machine.

Anyway, I've found that there are different kinds of projects. Some where the build is taken very seriously and a build-aware mentality quickly spreads and others where people do see the value of the build but have more difficulties taking it very seriously (leading to lots of build failures).

One idea that I've had is to use a physical artifact to represent a build success or a build failure. People like to see and touch things. Doing some research I've found the Ambient Orb which seems close enough to what I have in mind:

ambient-orb-alt1.jpg

(Image stolen from ThinkGeek)

The idea is that the orb will turn more and more red depending on the number of projects that failed to build during the past cycle.
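A possible mapping from build results to a color could look like the sketch below. It only computes an RGB value. How to send that value to an actual device is left out, since I haven't tried the orb, and the BuildOrbColor class and the linear green-to-red scale are my own assumptions:

```java
/** Maps the share of failed builds to an RGB color between green and red. */
class BuildOrbColor {
    /**
     * Returns {red, green, blue}: pure green when nothing failed,
     * pure red when every project failed, a linear blend in between.
     */
    static int[] colorFor(int failedProjects, int totalProjects) {
        if (totalProjects <= 0) {
            throw new IllegalArgumentException("totalProjects must be positive");
        }
        // Clamp so that bad input cannot produce an out-of-range color.
        int failed = Math.max(0, Math.min(failedProjects, totalProjects));
        int red = (255 * failed) / totalProjects;
        return new int[] { red, 255 - red, 0 };
    }
}
```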

I have yet to buy one and try it but I like the idea. Has anyone done this already? Can it be done easily with the Ambient Orb? Are there devices other than the Ambient Orb on the market (for example, I don't need the wireless radio network connection at all especially as I live in France)?

Update: This blog entry has been reposted on TSS.

[ vmassol ] 16:11, Sunday, 21 December 2003

Cactus v2 architecture

Rationale

Why a new architecture? Several reasons:

  • The existing architecture is restricted to testing Servlet components (and its variations: Taglibs, Filters, JSP). We've tried to create an SPI so that implementations for other containers can be written but it is not possible with the current architecture.
  • We'd like to make Cactus the de facto tool for performing integration unit testing (aka in-container testing) for any type of component in any type of container. See figure 1.
  • We'd like to make it easy for others to create Cactus extensions. This is not currently possible with the existing architecture.
  • We'd like to maximize the reusability of other testing tools. For example, instead of implementing in Cactus the HTTP layer that calls the server side, we'd like the user to use his favorite tool (e.g. HttpUnit). For unit testing Message-Driven Beans, the user will be able to use his favorite JMS injector (e.g. Commons Messenger), etc. This will allow leveraging all the features in those tools (for example, support for HTTPS in HttpUnit, support for Cookie handling, etc).
  • We'd like to standardize on a server side interception mechanism instead of inventing our own.
Figure 1: Scope of Cactus v2

Architecture choices

These are the high-level architecture choices on which Cactus v2 will be built:

  • An AOP framework for server-side interception. There are 2 possible contenders: AspectJ or AspectWerkz. We are currently favoring AspectWerkz because it allows writing test cases in plain Java. AspectJ extends the Java language and requires strong tool support. This will change with the advent of JDK 1.5 as the JDK will become meta-data compatible and AspectJ will probably take advantage of this. However, it will take several years before everyone is on JDK 1.5 and we need a solution before then.
  • Cactus v2 will continue to be a JUnit extension. A Cactus v2 test case will be a JUnit test case and thus any JUnit test runner will work (provided the server has been started and the components and tests deployed).

High level architecture

The Cactus system is composed of 3 parts (see figure 2):

  • A Cactus test case, which is a combination of a JUnit test case (with testXXX() methods executed on the client side) and aspects used on the server side to perform interception and/or validation (see figure 3). More specifically, 3 typical use cases for these aspects are:
    • Intercept the call to the component under test and redirect the flow of execution to a specific method to unit test it,
    • Prevent the flow of execution to call some subsystem. For example, stop the flow of execution before it goes to the database and instead return canned values.
    • Perform asserts to verify server-side expectations. For example, verify that the Servlet HTTP Session contains such and such values after executing such method, verify that the Database connection is closed as many times as it is open for such and such use case, verify that the number of SQL queries is below such number (e.g. less than 10 SQL queries per use case), etc. These are server-side expectations.
  • A Cactus runner to execute Cactus tests automatically. This involves starting the container, deploying the application and tests in the container, starting the tests and stopping the container.
  • A Cactus framework to support starting a test case on the client side and continuing it on the server side, and also to support transferring test results from the server side to the client side so that results can be displayed in the executing JUnit test runner. This framework also contains helper aspects and classes to help write test cases.
Figure 2: High-level architecture

Cactus test case example

Here is an example of a typical Cactus test case using AspectWerkz 0.9. Please note that this example is a work in progress and is non-functional at this stage. We're also working towards simplifying the syntax for test case writers:

Figure 3: Cactus test case sample
package org.apache.cactus.sample.servlet;

import java.util.Hashtable;

import javax.servlet.http.HttpServletRequest;

import org.codehaus.aspectwerkz.attribdef.Pointcut;
import org.codehaus.aspectwerkz.attribdef.aspect.Aspect;
import org.codehaus.aspectwerkz.joinpoint.JoinPoint;
import org.codehaus.aspectwerkz.joinpoint.MethodJoinPoint;

import com.meterware.httpunit.GetMethodWebRequest;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;

import junit.framework.TestCase;

public class TestSampleServletAspectWerkz extends TestCase
{
     /**
      * Intercepts Servlet's doXXX calls and instead redirect the flow of
      * execution to the {@link SampleServlet#getRequestParameters} method to
      * unit test.
      * 
      * @Aspect
      */
     public static class GetRequestParametersTestAdvice extends Aspect
     {
          /**
           * @Execution * *..SampleServlet.do*(..)
           */
          Pointcut interceptServlet;
          
          /**
           * @Around interceptServlet
           */
          public Object catchGetRequestParameters(JoinPoint joinPoint) 
              throws Throwable
          {
               MethodJoinPoint jp = (MethodJoinPoint) joinPoint;
               SampleServlet servlet = (SampleServlet) jp.getTargetInstance();
               Hashtable params = servlet.getRequestParameters(
                   (HttpServletRequest) jp.getParameters()[0]);
               assertNotNull(params.get("param1"));
               assertNotNull(params.get("param2"));
               assertEquals("value1", params.get("param1"));
               assertEquals("value2", params.get("param2"));
               return null;
           }
      }
 
     /**
      * Test {@link SampleServlet#getRequestParameters} by calling the server 
      * side using HttpUnit. On the server side, our aspect will kick in and
      * the {@link GetRequestParametersTestAdvice#catchGetRequestParameters} 
      * test method will be called to unit test our method.    
      */
     public void testGetRequestParameters() throws Exception
     {
          WebConversation conversation = new WebConversation();
          WebRequest request = new GetMethodWebRequest(
              "http://localhost:8080/test/SampleServlet?param1=value1&param2=value2");
          WebResponse response = conversation.getResponse(request);
      }    
}

We would like to be able to write the following (not yet supported by AspectWerkz but we've had a commitment from the AW team that they will make modifications to support it! :-)). The difference from the previous sample is the removal of the inner aspect class and the typed pointcut interception.

Figure 4: Ideal Cactus test case sample
package org.apache.cactus.sample.servlet;

import java.util.Hashtable;

import javax.servlet.http.HttpServletRequest;

import org.codehaus.aspectwerkz.attribdef.Pointcut;
import org.codehaus.aspectwerkz.joinpoint.JoinPoint;
import org.codehaus.aspectwerkz.joinpoint.MethodJoinPoint;

import com.meterware.httpunit.GetMethodWebRequest;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;

import junit.framework.TestCase;

public class TestSampleServletAspectWerkz extends TestCase
{
     /**
      * @Execution * *..SampleServlet.do*(..)
      * @And @Target(SampleServlet)
      * @And @Args(HttpServletRequest)
      */
     Pointcut interceptServlet;
         
     /**
      * @Around interceptServlet
      */
     public void catchGetRequestParameters(SampleServlet servlet,
         HttpServletRequest request) throws Throwable
     {
          Hashtable params = servlet.getRequestParameters(request);
          assertNotNull(params.get("param1"));
          assertNotNull(params.get("param2"));
          assertEquals("value1", params.get("param1"));
          assertEquals("value2", params.get("param2"));
      }
 
     /**
      * Test {@link SampleServlet#getRequestParameters} by calling the server 
      * side using HttpUnit. On the server side, our aspect will kick in and
      * the {@link #catchGetRequestParameters} test method will be called 
      * to unit test our method.    
      */
     public void testGetRequestParameters() throws Exception
     {
          WebConversation conversation = new WebConversation();
          WebRequest request = new GetMethodWebRequest(
              "http://localhost:8080/test/SampleServlet?param1=value1&param2=value2");
          WebResponse response = conversation.getResponse(request);
      }    
}

Detailed design

The detailed design of Cactus v2 is shown in figure 5 below.

Figure 5: Detailed design

It works as follows:

  1. The Cactus tests are started by a JUnit Test Runner (any JUnit Test Runner).
  2. The Cactus framework intercepts the JUnit call to the test case runBare() method. It checks if a listener socket has been set up. If not it sets up one. It passes the test name to it (so that the server side can later on find out what test is currently being executed),
  3. It calls the test case testXXX() method. In this method the test case writer has written the logic to call the server side (using any existing framework; for example HttpUnit for calling an HTTP service),
  4. The flow of execution reaches the application to test on the server side,
  5. Somewhere during the execution of the application, the test aspect defined by the test case writer kicks in. Before that aspect is executed, the Cactus framework intercepts it,
  6. The Cactus server side interceptor then calls the listener socket set up in step 2 to get the name of the test being executed (the test that was started on the client side). It checks if the aspect matches the current test,
  7. If the aspect matches, its advice is executed, performing whatever logic the test case writer has put in it,
  8. Before the call returns to the client side, the Cactus server side interceptor calls the socket listener to pass to it the server side test result (it passes to it any exception raised on the server side; for example AssertionFailedError exceptions),
  9. After the testXXX() method finishes its execution and before the test result is communicated to the JUnit Test Runner, the Cactus client side interceptor verifies if any error has been reported by the server side execution. If so, it rethrows the server side exception to the JUnit Test Runner. Otherwise it lets the result of testXXX() bubble up to the JUnit Test Runner.

Some additional comments/ideas:

  • If one of the catchXXX() methods is not called, it should result in an AssertionFailedError being raised. This is to prevent not executing server-side test code without knowing it. As we are using interception, I guess it's easy to make a mistake when defining the join point and thus we need this safeguard.
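The hand-over between the two sides (steps 2 and 6 to 9, plus the safeguard just mentioned) can be sketched in plain Java as follows. This is only an illustration of the control flow. There is no socket, no AspectWerkz and no JUnit here, TestExchange, ServerSideInterceptor and ClientSideInterceptor are hypothetical names, and java.lang.AssertionError stands in for JUnit's AssertionFailedError:

```java
/** In-memory stand-in for the listener socket shared by both sides. */
class TestExchange {
    private String currentTest;
    private Throwable serverSideError;
    private boolean adviceExecuted;

    synchronized void startTest(String testName) {
        currentTest = testName;
        serverSideError = null;
        adviceExecuted = false;
    }

    synchronized String getCurrentTest() { return currentTest; }

    synchronized void reportAdviceResult(Throwable errorOrNull) {
        adviceExecuted = true;
        serverSideError = errorOrNull;
    }

    synchronized Throwable getServerSideError() { return serverSideError; }

    synchronized boolean wasAdviceExecuted() { return adviceExecuted; }
}

/** Wraps the execution of a test aspect on the server side. */
class ServerSideInterceptor {
    static void runAdvice(TestExchange exchange, String adviceTestName, Runnable advice) {
        if (!adviceTestName.equals(exchange.getCurrentTest())) {
            return; // step 6: this advice belongs to another test
        }
        try {
            advice.run();                      // step 7: run the test writer's logic
            exchange.reportAdviceResult(null); // step 8: report success
        } catch (Throwable t) {
            exchange.reportAdviceResult(t);    // step 8: report the failure
        }
    }
}

/** Wraps the execution of testXXX() on the client side. */
class ClientSideInterceptor {
    static void runTest(TestExchange exchange, String testName, Runnable testBody)
        throws Throwable {
        exchange.startTest(testName); // step 2: publish the test name
        testBody.run();               // step 3: testXXX() calls the server side
        Throwable error = exchange.getServerSideError();
        if (error != null) {
            throw error;              // step 9: rethrow the server-side failure
        }
        if (!exchange.wasAdviceExecuted()) {
            // Safeguard: the join point never matched, so nothing was tested.
            throw new AssertionError("No server-side advice was executed for " + testName);
        }
    }
}
```

In the real design the exchange would live behind the listener socket and the two interceptors would themselves be aspects. The point here is only who reports what to whom.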

Challenges

The following challenges await us:

  • Make Cactus test cases easy to write for test case writers,
  • Make Cactus test cases easy to execute. Runtime code weaving would be nice but is not supported by old JVMs. We will probably have a mixed model as is being supported by AspectWerkz.
  • Find out if integration with Chad's VirtualMock is possible/desirable.

Please challenge us to improve our design! :-)

Disclaimer: Please also note that, at this point in time, this architecture and ideas are only mine and do not represent (yet!) the official view of the Cactus project. I am proposing it to the Cactus project members.

[ vmassol ] 12:29, Monday, 1 December 2003

Currently the Cactus project is a framework to help unit test J2EE components (and mostly Servlet/JSP/Taglib).

I'd like to expand its goal and make it a framework for building in-container testing solutions. Cactus would still offer an implementation for J2EE component testing but it will also open up an API for plugging other implementations. Some ideas are shown on the diagram below.

cactus_new_vision.jpg

For this to happen, the core helper classes will have to be separated from the HTTP protocol implementation and the existing Cactus TestCases. 2 SPIs will appear:

  • one for plugging in different protocol implementations (RMI, JMS, etc). Currently Cactus provides the HTTP implementation.
  • one for plugging in custom test case implementations (still looking for a good name for these). Currently Cactus provides the ServletTestCase, FilterTestCase and JspTestCase.

Moreover, the Cactus integration modules (aka front-ends) will also need to provide clearly-defined extension points to help automate the whole process of starting the container, deploying components, running the tests and shutting down the container.

I'm currently working on the Cactus code to make the 2 SPIs surface. The first test drive of these new SPIs will be to implement support for EJB TestCases.

[ vmassol ] 12:10, Sunday, 9 November 2003

Joel Shellman has coded an extension (easymock-patch-1.0.jar) of EasyMock which allows mocking classes (and not only interfaces).

To use it:

  • put the extension jar in front of the easymock jar in your classpath (it works with Easymock 1.0 only, not with the new 1.0.1b which has modifications for extending Easymock). It replaces the MockControl class with a new one accepting a class as parameter as shown below.
  • add cglib to your classpath (I've tried it with version rc2-1.0).
  • add bcel to your classpath (I've tried it with version 5.1).

Here's an example. First let's start with the class we wish to mock:

package test;

public class Calculator
{
     private int amount;
     
     public Calculator(Integer amount)
     {
          this.amount = amount.intValue();
     }
     
     public int compute()
     {
          return this.amount;
     }
}

Here's now the class we wish to test (it uses the Calculator class):

package test;

public class Account
{
     public int computeBalance(Calculator calculator)
     {
          return calculator.compute();
     }
}

Now, here's our test of computeBalance. Notice that we are mocking the Calculator class:

package test;

import org.easymock.MockControl;

import junit.framework.TestCase;

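public class AccountTest extends TestCase
{
     public void testComputeBalance()
     {
          Account test = new Account();
          
          // Create a mock of the Calculator class itself, passing the
          // constructor parameter types and values to use.
          MockControl control = MockControl.createControl(Calculator.class,
              new Class[]{Integer.class}, new Object[]{new Integer(5)});
          Calculator mock = (Calculator) control.getMock();
          
          // Record the expected call and the value it should return.
          mock.compute();
          control.setDefaultReturnValue(10);
          control.replay();
          
          // The mocked value (10) is returned, not the constructor value (5).
          int result = test.computeBalance(mock);
          assertEquals(10, result);
     }
}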

Note: I don't know how to mock a class whose constructor takes primitive types. I don't know if it is supported.

That's nice. However, there are a few drawbacks I can see:

  • It doesn't encourage refactoring and creating interfaces (which is the right way to go in most cases). However, it can still be useful in some cases like when mocking third party classes which do not have interfaces (and there are quite a lot of them, especially in the JDK...)
  • It doesn't work with final classes. Thus it is not possible to mock the JDK's URL class for example. It also won't work with private constructors of course
  • It forces you to know the constructor values when creating the mock
  • It still doesn't help in cases where it is not easy to introduce the mock in the class under test. In that regard, it's less powerful than interception (a la AOP).

So it's not a silver bullet but it will certainly help in cases where you have no control over the sources (like when using third-party libraries).
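For comparison, when you do control the sources, the interface route mentioned in the first drawback needs nothing beyond the JDK. The sketch below is not the EasyMock API: BalanceCalculator, InterfaceBasedAccount and StubFactory are hypothetical names, and the stub is built with java.lang.reflect.Proxy:

```java
import java.lang.reflect.Proxy;

/** Hypothetical interface extracted from the Calculator class. */
interface BalanceCalculator {
    int compute();
}

/** Variant of Account that depends on the interface instead of the class. */
class InterfaceBasedAccount {
    int computeBalance(BalanceCalculator calculator) {
        return calculator.compute();
    }
}

class StubFactory {
    /** Creates a stub whose compute() always returns the given value. */
    static BalanceCalculator calculatorReturning(int value) {
        return (BalanceCalculator) Proxy.newProxyInstance(
            BalanceCalculator.class.getClassLoader(),
            new Class<?>[] { BalanceCalculator.class },
            (proxy, method, args) -> {
                if (method.getName().equals("compute")) {
                    return Integer.valueOf(value);
                }
                // Anything else (toString, hashCode...) is not stubbed here.
                throw new UnsupportedOperationException(method.getName());
            });
    }
}
```

No constructor values are needed and final classes are not an issue, but of course this only works for code you are allowed to refactor.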

[ vmassol ] 08:32, Saturday, 27 September 2003

Let's imagine you wish to unit test code that calls the database. Let's also imagine that you want to test in integration, i.e. verify that the SQL query actually goes to the database and returns database data.

The traditional approach is to:

  1. Preload static data before the test suite runs
  2. Load specific test related data in the database before each test. A variation is to restore the modified data at the end of the test

There are several disadvantages to point 2:

  • Finding out the specific data that needs to be set up for the test is time-consuming and you often need to know the full database schema (and not only the domain you're working on). This is especially true for big projects
  • There are referential integrity concerns that force you to also set up data for a lot of tables other than the ones you are concerned with, as they are linked through keys

One solution that our project team discussed yesterday is about using transaction rollbacks. It would work as follows:

  • Before the test suite starts, load the database with full data (this is done only once and usually takes a good 5 minutes for complex projects)
  • Before the test starts, start a transaction (in JUnit's setUp() method for example)
  • run the test
  • roll back the transaction to restore the database data to a pristine state
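With plain JDBC the idea could look like the sketch below. It assumes that the code under test uses the very same Connection as the fixture and never commits on its own. The RollbackPerTest class is a hypothetical helper, while the java.sql.Connection calls are standard JDBC:

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Wraps each test in a transaction that is always rolled back. */
class RollbackPerTest {
    private final Connection connection;
    private boolean previousAutoCommit;

    RollbackPerTest(Connection connection) {
        this.connection = connection;
    }

    /** Call from setUp(): turning auto-commit off starts a transaction implicitly. */
    void begin() throws SQLException {
        previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
    }

    /** Call from tearDown(): discards everything the test wrote. */
    void end() throws SQLException {
        connection.rollback();
        connection.setAutoCommit(previousAutoCommit);
    }
}
```

In a JUnit test case, begin() would go in setUp() and end() in tearDown(), so the rollback happens even when the test fails.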

I was surprised to see no mention of this in the articles on database unit testing (http://www.dallaway.com/acad/dbunit.html, http://www.dbunit.org/bestpractices.html). The reason may be that there are some glitches with this technique that make it impossible to use in practice... I'd like to know what you think. Have you done this before?

Notes:

  • The code under test should not create a new transaction as nested transactions are usually not supported

[ vmassol ] 19:54, Saturday, 30 August 2003

The mock object strategy is nice but how do you apply it when you have some existing code that uses tons of static calls, does not have setters prepared so that you can introduce your mocks, etc?

One solution is to use AUT: AOP Unit Testing! (hehe... yet another acronym :-)).

Let's try that by writing a unit test using AspectJ for an EJB. Let's imagine we want to unit test the following createOrder() method:

package junitbook.ejb.service;

import java.rmi.RemoteException;
import java.util.Date;

import javax.ejb.EJBException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

import junitbook.ejb.domain.OrderLocal;
import junitbook.ejb.domain.OrderUtil;
import junitbook.ejb.util.JMSUtil;
import junitbook.ejb.util.JNDINames;

public abstract class PetstoreEJB implements SessionBean
{
     public int createOrder(Date orderDate, String orderItem)
     {
          OrderLocal order = OrderUtil.createOrder(orderDate, 
              orderItem);
  
          try
          {
               JMSUtil.sendToJMSQueue(JNDINames.QUEUE_ORDER, 
                   order.getOrderId(), false);
           }
          catch (Exception e)
          {
               throw new EJBException(e);
           }
          return order.getOrderId().intValue();
      }
 
     public void setSessionContext(SessionContext sessionContext) 
         throws EJBException, RemoteException {}
     public void ejbRemove() 
         throws EJBException, RemoteException {}
     public void ejbActivate() 
         throws EJBException, RemoteException {}
     public void ejbPassivate() 
         throws EJBException, RemoteException {}
}

Note that the nasty calls to OrderUtil.createOrder() and JMSUtil.sendToJMSQueue are static!

Our challenge here is to unit test this method in isolation from the rest. Here are the corresponding unit tests: one to verify it works when there is no exception (testCreateOrderOk) and one to verify it also works when there is a JMS exception raised, for example (testCreateOrderWhenJMSException):

package junitbook.ejb.service;

import java.util.Date;

import javax.ejb.EJBException;
import javax.jms.JMSException;

import com.mockobjects.dynamic.Mock;

import junit.framework.TestCase;
import junitbook.ejb.domain.OrderLocal;
import junitbook.ejb.domain.OrderUtil;
import junitbook.ejb.util.JMSUtil;

public class TestPetstoreEJB extends TestCase
{
     private PetstoreEJB ejb;
     
     protected void setUp()
     {
          this.ejb = new PetstoreEJB() {};
     }
 
     public void testCreateOrderOk()
     {
          int result = this.ejb.createOrder(new Date(), "1234");
          assertEquals(1234, result);
     }
 
     public void testCreateOrderWhenJMSException()
     {
          try
          {
               this.ejb.createOrder(new Date(), "1234");
               fail("Should have thrown an EJBException");
          }
          catch (EJBException expected)
          {
               assertEquals("some jms error", 
                   expected.getCausedByException().getMessage());
          }
     }
}

aspect TestPetstoreEJBAspect
{
     // Replace the static OrderUtil.createOrder() call with a mock OrderLocal
     // whenever it is reached from one of the testCreateOrder*() tests.
     OrderLocal around():
         call(* OrderUtil.createOrder(..)) &&
         cflow(execution(* testCreateOrder*()))
     {
          Mock mockOrderLocal = new Mock(OrderLocal.class);
          OrderLocal orderLocal = (OrderLocal) mockOrderLocal.proxy();
          mockOrderLocal.matchAndReturn("getOrderId", new Integer(1234));
          return orderLocal;
     }
 
     // Happy path: skip the real JMS send.
     void around():
         call(* JMSUtil.sendToJMSQueue(..)) &&
         cflow(execution(* testCreateOrderOk()))
     {
          return;
     }
 
     // Error path: simulate a JMS failure.
     void around() throws JMSException:
         call(* JMSUtil.sendToJMSQueue(..)) &&
         cflow(execution(* testCreateOrderWhenJMSException()))
     {
          throw new JMSException("some jms error");
     }
}

Note: I highly recommend using Eclipse 3.0M3 and the latest AJDT plugin if you wish to run this code. The AJDT works great for writing AspectJ projects.

That's quite nice and it works nicely but as you have noticed the test code is neither very easy to write nor to read. What I would love is a dynamic language geared towards writing these kinds of unit tests! Could Groovy provide that? That would be soooooo nice!

[ vmassol ] 14:49, Saturday, 30 August 2003

James Strachan has posted news about starting a new dynamic language for the Java platform called Groovy. He mentions using it for writing unit tests a la JUnit.

I think that's an excellent idea! On most projects I've been on, writing good unit tests often comes late in the development. Usually people write functional or integration tests but rarely unit tests which test code in isolation from the rest. And once the design best practices have been decided and lots of code has already been written, writing unit tests is hard because it requires severe refactoring and design changes. The hard part is to introduce mock objects in the code under test. Some code uses statics, other code instantiates domain objects instead of having them passed to it, etc.

Of course, extreme TDD practitioners would say that it is much better to refactor the code under test so that it becomes more flexible and better able to support change. That's true but that's difficult to practice for the majority of projects and difficult to apply if the project has not been started the TDD way.
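
To make the refactoring argument concrete, here is a minimal sketch of what such a change could look like in plain Java (Java 8 or later, for the lambdas). It is only an illustration: OrderFactory, OrderNotifier, OrderService and SeamSketch are hypothetical names, not classes from any real project, and the static helpers they would wrap are left out. The point is that once collaborators are passed in, a test can substitute stubs without any AOP or language support.

```java
import java.util.Date;

public class SeamSketch
{
    /** Hypothetical seam hiding a static order-creation helper. */
    interface OrderFactory
    {
        Integer createOrder(Date orderDate, String orderItem);
    }

    /** Hypothetical seam hiding a static messaging helper. */
    interface OrderNotifier
    {
        void notifyOrderCreated(Integer orderId) throws Exception;
    }

    /** Code under test: collaborators are injected instead of reached statically. */
    static class OrderService
    {
        private final OrderFactory factory;
        private final OrderNotifier notifier;

        OrderService(OrderFactory factory, OrderNotifier notifier)
        {
            this.factory = factory;
            this.notifier = notifier;
        }

        int createOrder(Date orderDate, String orderItem)
        {
            Integer orderId = this.factory.createOrder(orderDate, orderItem);
            try
            {
                this.notifier.notifyOrderCreated(orderId);
            }
            catch (Exception e)
            {
                // Wrap the failure, much like an EJB would wrap it in an EJBException
                throw new IllegalStateException(e);
            }
            return orderId.intValue();
        }
    }

    /** Happy path: both collaborators are replaced by trivial stubs. */
    public static int createWithStubs()
    {
        OrderService service = new OrderService(
            (date, item) -> Integer.valueOf(item),
            orderId -> { });
        return service.createOrder(new Date(), "1234");
    }

    /** Error path: the notifier stub fails; returns the message of the wrapped cause. */
    public static String createWithFailingNotifier()
    {
        OrderService service = new OrderService(
            (date, item) -> Integer.valueOf(item),
            orderId -> { throw new Exception("some jms error"); });
        try
        {
            service.createOrder(new Date(), "1234");
            return "no exception";
        }
        catch (IllegalStateException expected)
        {
            return expected.getCause().getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(createWithStubs());
        System.out.println(createWithFailingNotifier());
    }
}
```

The cost is exactly the one described above: every caller that used the static helpers has to be changed to build and pass the collaborators, which is why this is hard to retrofit on a large existing code base.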

What would be nice is to have a dynamic language geared towards easily introducing changes into the code so that it can support the mock objects approach. There is a unit testing framework out there that does almost this; it's called AgileTest from Polygenix.

Maybe Groovy could extend the ideas from AgileTest and become a generic dynamic language to easily insert mock objects (among other things)? Sprinkle this with an AspectJ-like syntax and you've got something quite nice!

I know I'm probably changing the original idea but what I'd love to have is a language purely geared towards unit testing.

What do you think?

[ vmassol ] 17:53, Tuesday, 26 August 2003

About 2 years ago, I discovered the Concept Map idea and I began to do a simple proof of concept as shown by the following diagram.

[Diagram: simple unit testing map]

Note that more thorough examples are available from the NASA web site.

I would like to revive this idea and start drawing some concept maps about the JUnit ecosystem. I would like to have some of these maps in my JUnit in Action book but I would also like to set up a collaborative web site gathering all knowledge around JUnit using these concept maps.

This is all nice but the software I used then (IHMC Concept Map Software) is now very old (2001), the GUI is flaky and the software does not lend itself well to distributed updating of the maps on the web (it uses a custom server which opens a specific TCP/IP port and thus does not easily allow updates from behind a firewall, etc).

I need your help! My questions to you are:

  1. Do you know how this concept has evolved? Is it still a hot topic nowadays?
  2. Do you know of any software (preferably free) that would allow me to easily draw maps such as the one above and allow collaborative editing over the web?

Thanks!

[ vmassol ] 09:47, Sunday, 13 July 2003

I have been coding in Java for the past 6 years and I thought I knew the language and platform quite well. Well, two days ago I was proved wrong as I discovered a new facet of it that I wasn't aware of: binary compatibility...

There is a good article called Evolving Java-based APIs by Jim des Rivieres.

It's a complex subject that not many people are aware of and which is extremely important if you're a framework writer. Let me give one example.

Imagine you have a ServletTestCase public API class (i.e. users can use it) and it inherits from an AbstractTestCase class. This latter class is not supposed to be used by end users. However, it has some public API methods accessed through the ServletTestCase by the users.

Now, if you think you can safely refactor the AbstractTestCase class you're wrong! For example, if you wish to split the AbstractTestCase class in 2, AbstractClientTestCase and AbstractWebTestCase, the second one inheriting from the first, you're in for trouble...

Indeed, code that has been compiled with the first version of the framework will have a reference to AbstractTestCase in its .class files (provided it was using methods inherited from AbstractTestCase). Thus when you bring in the new framework and put it in the runtime classpath of your other code, it will fail with a NoClassDefFoundError!

What it means is that preserving binary compatibility is difficult and is something that strongly impacts how you design your APIs. Delegation has to be preferred over class inheritance as it will allow you to change the implementation without breaking binary compatibility.
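
Here is a small sketch of the delegation idea, under stated assumptions: the class names are borrowed from the example above purely for illustration, ClientSideSupport is a hypothetical internal helper, and none of this is the actual framework code. The public class extends nothing internal; it holds the helper in a private field and forwards to it, so the helper never shows up in the public inheritance hierarchy.

```java
public class DelegationSketch
{
    /** Hypothetical internal helper: not part of the public API, so it can be renamed or split later. */
    static final class ClientSideSupport
    {
        String describe(String testName)
        {
            return "running " + testName;
        }
    }

    /** Public API class: delegates to the helper instead of inheriting from an internal base class. */
    public static class ServletTestCase
    {
        private final ClientSideSupport support = new ClientSideSupport();
        private final String name;

        public ServletTestCase(String name)
        {
            this.name = name;
        }

        /** Public method whose implementation lives in the internal helper. */
        public String describe()
        {
            return this.support.describe(this.name);
        }
    }

    public static String demo()
    {
        return new ServletTestCase("testLogin").describe();
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

Code written against this ServletTestCase only ever names ServletTestCase itself, so replacing ClientSideSupport with two differently named helpers would be an internal change rather than a change to the public class hierarchy.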

That's what was new to me. So far I had been designing simply using OO concepts and choosing the most suitable pattern. Now, I've discovered that I also need to bring binary compatibility into the picture and that it drives my design choices!

Evolving Java-based APIs is really difficult...

[ vmassol ] 09:12, Monday, 2 June 2003

Some time ago, I blogged about Subversion. One of the things I noticed was the lack of a good Eclipse plugin.

Daniel Bradby has informed me he has just released an Eclipse plugin for Subversion.

I haven't tried it yet... If you have, I'd be happy to hear what you think.

Update 22/07/2004: There is now a TortoiseSVN wrapper plugin for Eclipse which seems quite nice.

[ vmassol ] 11:05, Saturday, 17 May 2003

Having project documentation written in XML (xdocs) and stored in the project's CVS is great as it allows changing the style without changing the content. It also makes it easy to write the docs directly in XML format.

That said, many people find that writing xdocs is a pain. Moreover, wouldn't it be nice if end users could easily contribute to the documentation?

Here's a solution:

  • Use a Wiki for the project's web site. But not just any wiki. Use a wiki that stores its web pages in XML format, such as MoinMoin (Here's an example of generated XML. Use View Source for IE users).
  • Then write a hook (script) in that wiki so that the modifications get saved to your project's CVS (or even better use CVS as the underlying storage for the Wiki). Question: is that directly supported by MoinMoin?
  • (optional). Use a CVS syncmail script to send CVS commit diffs to the project's development mailing list. This allows everyone subscribed to see the changes to the documentation (and to the sources of course).
  • (optional). Now that we have our XML doc sources saved in the project's CVS we can (if we wish) generate the docs in an HTML format different from the one the wiki uses, for packaging them in the project's distribution for example.

Nice, no?

[ vmassol ] 09:13, Tuesday, 13 May 2003

Scott Stirling has posted a nice followup of the initial StarTeam woes I posted some time ago.

[ vmassol ] 11:24, Monday, 12 May 2003

I knew I was missing something! For some time I've had this feeling that everyone was enjoying some way of getting interesting news that I wasn't aware of... Now I know it was true... :-)

I've, at last, discovered the joy of RSS syndication. This is a revolution for me. I was used to opening my browser every morning and scanning my favorite sites... No longer. I'm now using FeedReader, a very nice Windows application that automatically polls your favorite RSS feeds and displays them in a nice manner.

The other nice features are

  • automatic pop-up notification when news arrives
  • automatic scans for news every N minutes
  • starts automatically in the systray upon Windows startup
  • integrated browser (IE) for easy navigation

To get you started, here is my list of subscriptions (FeedReader format). Drop this file in your Documents and Settings/yourusername/Application Data/FeedReader directory or simply merge it with your existing subscription file.

Update: I've just discovered that there are some other very nice RSS feed readers (I haven't tried them yet though): Syndirella and SharpReader.

Update: Thanks to Scott Stirling for the best RSS Feed Reader / News Aggregators Directory he's seen.

[ vmassol ] 23:07, Monday, 5 May 2003

This morning I've started reading about Subversion. I am very excited, though I have a few reservations. On the positive side:

  • Whenever you check out a file, the Subversion client puts a pristine copy of this file in a special .svn directory (equivalent to the CVS/ one for CVS). This means that whenever you later edit this file, Subversion is able to tell you without going to the server what changes you have made to the file. It is also able to revert your changes again without calling the server side. But the best is that when you perform a commit it is thus able to only send the difference whether you are working on text or binary files! That really rocks!
  • Ability to version anything (directories, etc)
  • Supports move of files, directories, branches, etc
  • Uses HTTP(S) and WebDAV (well, a specific version of it; see below)
  • Some nice GUI clients (especially TortoiseSVN which I haven't tried but which looks very nice)
  • Lots of other nice features described on the Subversion web site

On the negative side, it is still missing the following to be perfect:

  • It does not support locks at all. CVS did not support locks either but had the watch/edit feature, which could be used instead (the current version of Subversion does not have this feature). Not having locks can be a pain when you have a big team working on binary files such as Word documents, PowerPoint presentations, Rose .cat files, etc.
  • Missing a nice Eclipse plugin (there is some work in progress but nothing visible at this point in time)
  • Does not really support generic WebDAV clients such as Web Folders in Windows. Details about this can be found here.

[ vmassol ] 23:06, Monday, 5 May 2003

I have been working on a new project with the StarTeam SCM. Here are the drawbacks I have found when using StarTeam vs CVS/Subversion:

  • No "clean checkout" option. That is, if a file is deleted from the StarTeam repository, even if you perform a checkout all, the deleted files will not be removed from your local working copy.
    • Update: Scott Stirling has told me that by using the StarTeam CLI, you can run stcmd update-status $st_opts "$STARTEAM_URL" -cmp $ST_DEST and then stcmd delete-local $st_opts "$STARTEAM_URL" -cmp $ST_DEST -filter N > /dev/null. This is still far, far more complex than just doing a cvs update in CVS.
  • No server-side hooks. For example there is no way to get email check-in notifications (containing the diff of the check-in as with CVS/Subversion). The way to do it is to manually go into the StarTeam client GUI, check the 'out-of-date' files and perform a manual diff for each file that you wish to check...
  • StarTeam is very bandwidth intensive, especially if you work using the default locking mechanism. For every file that you need to edit, you'll need to acquire a lock, which results in a network operation. There might be a mode to work without locks but that's not the way our StarTeam administrator has set it up (thus resulting in everyone chasing each other to release locks on files, which is quite unproductive, especially in a distributed development team...)
  • Very expensive, especially when compared with CVS/Subversion. It is so expensive that we only have a few floating licenses, which means we get disconnected every 15 minutes to let others use the repository... :-(
  • The StarTeam GUI does not show directories that are not in StarTeam, i.e. you don't know if there are directories that you have not yet committed!
  • StarTeam does not have nice IDE integration like CVS has and the StarTeam GUI client is far from being as nice as TortoiseCVS for example, which integrates seamlessly into Windows Explorer. BTW, we've had to extend the Maven Changelog plugin to make it work with StarTeam (as it was only supporting CVS)...
  • No user community, no place to easily ask questions, no momentum...
  • I haven't yet found the equivalent of ViewCVS for StarTeam but it may exist