Think tank
[ vmassol ] 10:25, Wednesday, 19 December 2007

Jason Hunter was kind enough to set up a Markmail site for XWiki. Markmail is a mailing list archiving tool with a powerful search feature.

What sets it apart from other such tools IMO is the UI and the speed/quality of search. I especially like the ability to see who's sending the most mails to a list and the nice syntax coloring display of emails (in addition to the thread view). Another nice feature is that emails are indexed a few seconds after they have been received by the list (compared to several hours with other tools). I love it :)

[ vmassol ] 08:19, Tuesday, 18 July 2006

It would be nice if there were a tool that could verify that you have correctly added @since tags for methods added in the current version. It would do this by checking against the previous release.

This tool could be based on Clirr or JDiff for example. It would also have an option to fail the build if there are new methods without a @since tag.
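
The core of such a check is a set difference followed by a Javadoc scan. Here is a minimal sketch, assuming the method signatures of both releases and the Javadoc text of the current one have already been extracted by some other tool (Clirr, JDiff or anything else). The SinceTagChecker class and its missingSince method are hypothetical names, not part of any existing tool:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SinceTagChecker
{
    /**
     * Returns the signatures that exist only in the current version and whose
     * Javadoc does not contain an @since tag, sorted alphabetically.
     *
     * @param previousSignatures method signatures found in the previous release
     * @param currentJavadocs method signature to Javadoc text, for the current version
     */
    static Set<String> missingSince(Set<String> previousSignatures,
        Map<String, String> currentJavadocs)
    {
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, String> entry : currentJavadocs.entrySet())
        {
            boolean isNew = !previousSignatures.contains(entry.getKey());
            String javadoc = entry.getValue() == null ? "" : entry.getValue();
            if (isNew && !javadoc.contains("@since"))
            {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

A build integration would then simply fail when the returned set is not empty.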

Do you know if such a tool exists?

[ vmassol ] 13:52, Monday, 17 July 2006

The experience I'm relating here is part of an exploratory refactoring that I'm currently doing on the Cargo code base. Until now we were using Java File objects to represent J2EE archives and container installation and configuration directories. This is OK but it makes unit testing File operations a bit complex. The reason is that you need to define a location on your local file system where you're going to read and write files, clean up the files afterwards, etc.

Here's a method we had (it expands a JAR file):

    public void expandToPath(String path) throws IOException
    {
        File workDir = new File(path);
        JarInputStream inputStream = getContentAsStream();

        byte[] buffer = new byte[40960];

        ZipEntry entry;
        while ((entry = inputStream.getNextEntry()) != null)
        {
            String entryName = entry.getName();
            entryName = entryName.replace('/', File.separatorChar);

            String outFileName = workDir.getPath() + File.separator + entryName;
            File outFile = new File(outFileName);

            if (outFileName.endsWith("/") || outFileName.endsWith("\\"))
            {
                outFile.mkdirs();
            }
            else
            {
                if (!outFile.getParentFile().exists())
                {
                    outFile.getParentFile().mkdirs();
                }

                if (!outFile.exists())
                {
                    outFile.createNewFile();
                }

                FileOutputStream out = new FileOutputStream(outFile);
                int read;
                while ((read = inputStream.read(buffer)) > 0)
                {
                    out.write(buffer, 0, read);
                }

                out.close();
            }
        }
        inputStream.close();
    }

Here's how I've transformed the method by removing all File operations and instead introducing a FileHandler interface with the following methods, equivalent to the File ones:

  • append(URI, String): appends a suffix to a URI
  • getParent(URI): returns the parent of the URI
  • mkdirs(URI): creates directories for the URI
  • exists(URI): returns true if the URI exists
  • createFile(URI): creates a file
  • getOutputStream(URI): gets an output stream for the passed URI

    public void expandToPath(URI path) throws IOException
    {
        JarInputStream inputStream = getContentAsStream();

        byte[] buffer = new byte[40960];

        ZipEntry entry;
        while ((entry = inputStream.getNextEntry()) != null)
        {
            String entryName = entry.getName();

            URI outFile = getFileHandler().append(path, entryName);

            if (outFile.toString().endsWith("/"))
            {
                getFileHandler().mkdirs(outFile);
            }
            else
            {
                if (!getFileHandler().exists(getFileHandler().getParent(outFile)))
                {
                    getFileHandler().mkdirs(getFileHandler().getParent(outFile));
                }

                if (!getFileHandler().exists(outFile))
                {
                    getFileHandler().createFile(outFile);
                }

                OutputStream out = getFileHandler().getOutputStream(outFile);
                int read;
                while ((read = inputStream.read(buffer)) > 0)
                {
                    out.write(buffer, 0, read);
                }

                out.close();
            }
        }
        inputStream.close();
    }
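
For reference, here is roughly what the FileHandler interface used by this method could look like. This is only a sketch: the method names are the ones used in the listing, but the exact signatures, the declared exceptions and the InMemoryFileHandler implementation are my assumptions for illustration and not the actual Cargo code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** File operations expressed on URIs instead of java.io.File objects. */
interface FileHandler
{
    URI append(URI uri, String suffix);
    URI getParent(URI uri);
    void mkdirs(URI uri);
    boolean exists(URI uri);
    void createFile(URI uri);
    OutputStream getOutputStream(URI uri) throws IOException;
}

/** Hypothetical implementation that keeps everything in memory. */
class InMemoryFileHandler implements FileHandler
{
    private final Set<String> directories = new HashSet<>();
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    public URI append(URI uri, String suffix)
    {
        String base = uri.toString();
        return URI.create(base.endsWith("/") ? base + suffix : base + "/" + suffix);
    }

    public URI getParent(URI uri)
    {
        // Keep everything up to and including the last slash
        String path = key(uri);
        return URI.create(path.substring(0, path.lastIndexOf('/') + 1));
    }

    public void mkdirs(URI uri)
    {
        // Only records the directory itself, which is enough for exists()
        directories.add(key(uri));
    }

    public boolean exists(URI uri)
    {
        return directories.contains(key(uri)) || files.containsKey(key(uri));
    }

    public void createFile(URI uri)
    {
        files.putIfAbsent(key(uri), new ByteArrayOutputStream());
    }

    public OutputStream getOutputStream(URI uri)
    {
        // Replaces any previous content, as FileOutputStream does by default
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        files.put(key(uri), out);
        return out;
    }

    /** Strips a trailing slash so that "ram:///test" and "ram:///test/" name the same entry. */
    private static String key(URI uri)
    {
        String s = uri.toString();
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

The point of the interface is that expandToPath() no longer knows where the bytes end up: a java.io.File based implementation, a VFS based one or a purely in-memory one like this can be swapped in through setFileHandler().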

The interesting part comes now. Because it was a bit hard to create a unit test for the original expandToPath method, nobody had done it. It would have involved passing a test JAR but, more difficult, it would also have involved passing a target directory where the JAR would be expanded. This is not easy as the location of this target dir would depend on where the tests are executed from, and making it work seamlessly from both a build tool and your IDE is not trivial. Here comes VFS to help us. By implementing the FileHandler interface using VFS, we can now write the following unit test:

    public void testExpandToPath() throws Exception
    {
        URI jarURI = new URI("ram:///test.jar");

        FileObject testJar = VFS.getManager().resolveFile(jarURI.toString());
        ZipOutputStream zos = new ZipOutputStream(testJar.getContent().getOutputStream());
        ZipEntry zipEntry = new ZipEntry("rootResource.txt");
        zos.putNextEntry(zipEntry);
        zos.write("Some content".getBytes());
        zos.closeEntry();
        zos.close();

        DefaultJarArchive jarArchive = new DefaultJarArchive(jarURI);
        jarArchive.setFileHandler(new VFSFileHandler());

        jarArchive.expandToPath(new URI("ram:///test"));

        // Verify that the rootResource.txt file has been correctly expanded
        FileObject rootResource = VFS.getManager().resolveFile("ram:///test/rootResource.txt");
        assertTrue(rootResource.exists());
    }

Notice the use of the "ram:" URI scheme. This is one of the many filesystems supported by VFS and it means that all file operations will happen in a virtual file system in memory. Also note that VFS doesn't currently support creating Zip files so we're using the JDK's ZipOutputStream API. The nice thing is that as this test operates in memory there's no need to define a target location on the file system.

The other nice thing is that by introducing VFS to this expandToPath() method it's now possible to expand a JAR to any file system supported by VFS. We could thus expand to an FTP server, to a WebDAV repository, to an HTTP URL, to a remote machine using SSH, etc. All this without changing a line of our code. Nice, isn't it?

[ vmassol ] 09:54, Thursday, 13 July 2006

(Updated 2006-07-14: Added section on discovering modules and added disclaimer at the end)

IntelliJ IDEA has revolutionized the IDE landscape by adding "intelligence" to IDEs. A few days ago I did a thought experiment by asking myself the following question: "how feasible would it be to build a project without knowing any meta-data about it?". In other words, is it possible for a build tool to be intelligent enough to build a project without build files or POMs? Said differently, is it possible to figure out a project's POM automatically? Let's review some typical required meta-data and see how it could be guessed.

Source locations

It is possible to guess where sources are by looking for *.java files (for Java projects - The same applies for other project types). Now we still need to differentiate main sources from test sources but that's also relatively easy to do. We can check for classes extending JUnit's TestCase for example or the TestNG equivalent, or any other well-known testing framework.

Note: An interesting thing here is that to be intelligent we'd need the help of the community to add new rules to the discovery process. For example imagine that a new testing framework appears; we'd need to add it to the Test Discovery Rules. Thus, this type of intelligent build system would need to rely a lot on the community and thus would need to get its data from an online repository that could be edited by the community.
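
To make the idea of "Test Discovery Rules" concrete, here is a minimal sketch in which each rule is a regular expression applied to the text of a source file. The three rules and the TestSourceDetector name are assumptions for illustration only; a real tool would load its rules from the community-edited repository mentioned above and would probably parse the code instead of pattern matching it:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TestSourceDetector
{
    // Hypothetical rules: JUnit 3 style TestCase subclasses, TestNG imports
    // and @Test annotations placed at the start of a line.
    private static final List<Pattern> RULES = Arrays.asList(
        Pattern.compile("\\bextends\\s+(junit\\.framework\\.)?TestCase\\b"),
        Pattern.compile("^\\s*import\\s+org\\.testng\\.", Pattern.MULTILINE),
        Pattern.compile("^\\s*@(org\\.junit\\.)?Test\\b", Pattern.MULTILINE));

    /** Returns true if at least one rule matches the source text. */
    static boolean looksLikeTest(String source)
    {
        for (Pattern rule : RULES)
        {
            if (rule.matcher(source).find())
            {
                return true;
            }
        }
        return false;
    }
}
```

Supporting a new testing framework then only means adding one more entry to the rule list.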

Dependencies

How do we detect project dependencies? One relatively easy way is to parse the sources that we have found above and find all external imports. Then query ibiblio to find matching package names (this information is present in Maven POMs on ibiblio). Now for guessing the version, there's no easy magic. A first approach would be to get the latest released version of the dependencies we've found.
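
The first half of that process, finding the external imports, could be sketched as follows. The ImportScanner name and the projectPackage parameter are hypothetical, static imports are ignored, and only java.* packages are treated as part of the JDK, which is a simplification:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportScanner
{
    // Captures the package part of "import a.b.C;" and "import a.b.*;"
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+([\\w.]+?)\\.(?:\\w+|\\*)\\s*;", Pattern.MULTILINE);

    /**
     * Returns the imported packages that belong neither to the JDK (java.*)
     * nor to the project itself, sorted alphabetically.
     */
    static Set<String> externalPackages(String source, String projectPackage)
    {
        Set<String> result = new TreeSet<>();
        Matcher matcher = IMPORT.matcher(source);
        while (matcher.find())
        {
            String pkg = matcher.group(1);
            boolean jdk = pkg.startsWith("java.");
            boolean own = pkg.equals(projectPackage) || pkg.startsWith(projectPackage + ".");
            if (!jdk && !own)
            {
                result.add(pkg);
            }
        }
        return result;
    }
}
```

The resulting package names are what would then be looked up on ibiblio.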

Project type

Project types can easily be guessed by looking at some files. For example if a web.xml file is present then it's a WAR project, if an application.xml one is found then it's an EAR project, if a jnlp file is found then it's a JNLP project, etc.
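
As a sketch, this rule can be written as a lookup on file names. The precedence between the different markers (EAR first, then WAR, then JNLP, with JAR as the fallback) and the ProjectTypeGuesser name are my assumptions:

```java
import java.util.Collection;

class ProjectTypeGuesser
{
    /**
     * Guesses the project type from the relative paths of the project's files,
     * using '/' as the path separator.
     */
    static String guess(Collection<String> relativePaths)
    {
        boolean hasWebXml = false;
        boolean hasJnlp = false;
        for (String path : relativePaths)
        {
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.equals("application.xml"))
            {
                // An EAR may itself contain web modules, so it wins over web.xml
                return "ear";
            }
            hasWebXml |= name.equals("web.xml");
            hasJnlp |= name.endsWith(".jnlp");
        }
        if (hasWebXml)
        {
            return "war";
        }
        return hasJnlp ? "jnlp" : "jar";
    }
}
```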

SCM

SCM can easily be guessed by looking for special files on the filesystem of the project. For example we would look for CVS directories for CVS, for .svn directories for Subversion, etc.
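
A minimal sketch of that lookup, relying on the fact that a CVS working copy keeps its metadata in a directory named CVS and a Subversion one in a directory named .svn. The ScmGuesser name and the returned identifiers are hypothetical:

```java
import java.io.File;

class ScmGuesser
{
    /** Guesses the SCM from the metadata directories found at the project root. */
    static String guess(File projectDir)
    {
        if (new File(projectDir, ".svn").isDirectory())
        {
            return "svn";
        }
        if (new File(projectDir, "CVS").isDirectory())
        {
            return "cvs";
        }
        return "unknown";
    }
}
```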

Developers

Once we have the SCM URL we can then query the SCM to get the list of all developers.

Project name

The project name could be the name of the top level directory and the version could be set arbitrarily to 1.0. Actually we could even check ibiblio to see if the project is already on ibiblio, get the latest version there and increase the minor number by one as a first order guess. Another strategy would be to query the SCM and look for tags and deduce existing versions by parsing those tags (there are some usual conventions for naming tags so it should be possible to make a good guess).
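
The tag-based strategy could be sketched like this. The tag naming conventions handled here (a major and a minor number separated by a dot or an underscore, as in cargo-0.4 or CARGO_0_5) and the VersionGuesser name are assumptions; date-based tags would fool this simple pattern:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionGuesser
{
    // First "major.minor" or "major_minor" pair found in a tag name
    private static final Pattern VERSION = Pattern.compile("(\\d{1,4})[._](\\d{1,4})");

    /**
     * Finds the highest major.minor version among the tags and increases its
     * minor number by one. Falls back to 1.0 when no tag can be parsed.
     */
    static String nextVersion(List<String> tags)
    {
        int bestMajor = -1;
        int bestMinor = -1;
        for (String tag : tags)
        {
            Matcher matcher = VERSION.matcher(tag);
            if (!matcher.find())
            {
                continue;
            }
            int major = Integer.parseInt(matcher.group(1));
            int minor = Integer.parseInt(matcher.group(2));
            if (major > bestMajor || (major == bestMajor && minor > bestMinor))
            {
                bestMajor = major;
                bestMinor = minor;
            }
        }
        return bestMajor < 0 ? "1.0" : bestMajor + "." + (bestMinor + 1);
    }
}
```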

Modules and artifacts

Discovering the different modules of a project is probably one of the hardest things to do. If you look at different projects in the wild I believe there are not that many directory structures out there. Maybe 10-15. Thus it should be possible to register knowledge of these structures and let the tool discover which one matches the project at hand most closely. This would also allow deducing the different artifacts that have to be generated. Of course it won't be perfect as there are projects which generate several artifacts from the same module. Again it's a question of doing 80% of the job and leaving 20% to be done manually.

Additional information

Of course, the values found above are just guesses. In most cases they could be correct but we would need to offer a way for the user to edit them and to add any missing information.

Conclusion

I believe it should be possible to create such an intelligent meta-build project which could be used to generate files for one of the existing build systems such as Maven, Ant, etc. For example it could create an internal POM file on which Maven could then be executed to produce the build results. At a minimum such a tool could be used to convert existing projects to Maven. I wonder how intelligent it could be but I guess it could go pretty far.

Disclaimer: Of course, such a tool would be bad from a conventions standpoint. One of the great strengths of Maven has been to standardize the directory structure of projects. I can go to any Maven project and I know exactly where stuff will be, what will be generated, etc.

Is there other information which you think could be guessed automatically? Can you think of better algorithms to guess some of the information shown above?

[ vmassol ] 10:07, Friday, 17 March 2006

Current wikis are great. However when used as development wikis I have found some limitations which are hampering their use. Please note that my experience is based on using Confluence and XWiki and other wikis may support some of the features mentioned below. Here's my top wishlist for development wikis and for Confluence and XWiki in particular:

  • Moderated wikis. Right now there are only two choices for a wiki: either they are open and anyone can edit a page or they are closed wikis and you need to register and get the rights to make modifications. For example most spaces on the Codehaus wiki are closed. They were initially open but vandalism was too high and we had to close them. This is hampering documentation contributions. A moderated wiki would alleviate this: when the page is saved, an email would be sent to a list of moderators for the space for approval or rejection (either by responding to a certain email address as for mailing list moderation or by clicking on a link in the email). Ideally, clicking on the validation link in the email would open the page in a browser with the modifications highlighted so that the moderator could make some changes before clicking on the save button.
  • Anonymous edits. Although this feature already exists, I'd like wikis to add 2 fields when anonymously editing a topic: a user name and an email address. The idea is to make it even easier to contribute to a wiki. If the wiki is moderated as explained above, moderators would receive an email. The idea of the username and email is to allow the moderator/community to discuss with the contributor if need be and to give them credit. These 2 fields would obviously be optional and there should be a text on the page explaining that the email will not get displayed on the wiki and that filling the fields will allow credits/acknowledgment to be given.
  • Diff notifications. Most wikis allow some form of space watch but the wikis I have used still do not offer the possibility to send notifications in a text diff format (wiki markup diff is good enough). For a development wiki, the idea is to send diff notifications to the development mailing list so that all developers are aware of wiki page modifications.
  • Daily notifications. This is also supported in some wikis but what I would like is the ability to watch a single space and to aggregate changes in that space (using the diff notification format mentioned above). Please note that Confluence does not support this as it requires you to modify all other spaces permissions so that the user doing the watch has no view rights on the other spaces, which is not usable for example on wikis such as the one on Codehaus which have hundreds of spaces.
  • In place comments. The idea is again to lower the barrier to contribution by allowing wiki users to highlight a portion of text in their browser and to associate a comment with it (like a post-it). There would be an option to turn on/off these comments. It's easier for a user to highlight a line and put a comment like "I don't understand this sentence" or fix a typo rather than have to use the current type of comments at the bottom of a page. Note that this is similar to how word processors such as Word allow adding comments to a document.
  • Patch handling. I'd like the ability to make modifications to a page and then instead of saving, have the ability to click on a "Generate patch" button which would generate a text file in wiki markup diff format. Then there would need to be an "Apply patch" action that can be done on a page. This would allow using wikis for project development web sites and allow contributors to provide documentation patches along with code patches. This is currently a big pain when using a wiki as a project development web site.

I have quite a few other suggestions for improvements but I feel those are the major ones when it comes to using a wiki as a project development wiki. Let's hope wiki vendors are listening... :-). Are these also on your wish list?

[ vmassol ] 13:48, Sunday, 12 February 2006

I see 2 use cases where ensuring binary compatibility is a must:

  • When you're developing a framework, i.e. a piece of software meant to be used at an API level by other developers. In that case, breaking binary compatibility is not something to do lightly.
  • When working in a large team it's common to define "interface" projects that represent the contracts to be followed by the different teams. In that case breaking the binary compatibility in an "interface" project is something that has to be planned and organized.

Enforcing binary compatibility in the build

The automated build is a nice place to enforce binary compatibility as the build is something executed by the individual developers before checking-in and it's also executed by the continuous integration build. Thus any binary incompatibility can be quickly discovered. Of course this doesn't replace tests which can also help discover breakages. However the problem is that with all the nice refactoring IDEs we have now, it's easy to refactor the tests at the same time as the code, so a newly introduced binary incompatibility is not always noticed.

A good strategy to discover an incompatibility is to compare the current code with the latest released code. This is what Clirr does. Clirr currently sports Ant and Maven1 integrations. The good news is that there's a Maven2 plugin in the works (more on that when it's released). However using a tool is only good if there's a strategy behind it.

Strategy for using Clirr

Here is what I believe can be done to automate binary compatibility checks in the build:

  • Start by organizing your packages so that you clearly demarcate the user-public API from the SPI from the internal implementations. You'll probably want to fail the build only on the user-public API (and possibly on the SPI too but that's probably a lower severity).
  • Use Clirr to make your build fail upon violation on the user-public API.
  • After discussing with the team and possibly with users, decide whether you wish to allow the binary incompatibility. Always consider going for a deprecation cycle. If you choose to allow the incompatibility, register it in an exception file that you pass to Clirr so that it builds without choking on those errors (Note: I believe Clirr needs to be improved to better support exceptions not only at the file level but at the violation level).
  • When the release time comes, you'll have a nice file listing all the binary incompatibilities. Include it in the release notes so that your users know what to expect and even better, for each incompatibility add a description that explains how to modify the user code to use the new version of the API.

Note: On the Cargo project we've tried to do this, even though there's still room for lots of improvement. Actually our main issue on Cargo is not detecting binary incompatibilities but rather deciding to release a 1.0 version which would mean that from then on we would always look for a deprecation solution rather than break binary compatibility. We've always pushed back this 1.0 release because our API has been changing quite frequently but we're now nearing a 1.0 version. When that comes we'll turn Clirr on to fail the build upon breakage. I'll let you know how it goes...

[ vmassol ] 08:55, Saturday, 21 January 2006

I'm currently writing my third book and I'm starting to notice a pattern. Whenever I write a book about a tool/framework whose sources I have access to, the code ends up being better.

The way I work goes like this: I start writing about a topic. If it's taking too long to explain it, I consider that something is wrong about the code. I modify the source code so that the document I'm writing has the minimal required size to explain the topic.

The good thing with a book is that what you're explaining has to be simple and not convoluted, which leads to this nice effect of improving the usability of your code. I get a bit of the same result when I write project documentation but not to the same level. This is probably simply because writing a book is a more involved process, you dedicate more time to it and thus you want it to be as perfect as possible (and thus as readable as possible).

I guess nothing here is new. This is all about having a user of your code. Tests are "users" of your code and thus lead to better design. I guess documentation can also be a "user" of the code and thus help improve it.

If you're writing some framework/tool, consider writing a book for it and if you're diligent in your writing your code will end up being better! As an added benefit your users will love you... :-)

[ vmassol ] 18:30, Friday, 4 November 2005

Amazon has released a beta of the Mechanical Turk. It allows a program to programmatically ask a question to a human and wait for the answer. Here's an example (copied from Google Blogoscoped):

read (photo);
photoContainsHuman = callMechanicalTurk(photo);
if (photoContainsHuman == TRUE) {
   acceptPhoto;
}
else {
   rejectPhoto;
}

This is really like the Matrix except that the humans get paid a little bit of money (but in the end that's close to getting fed) and it's other humans that are controlling the programs... until we have web services using other web services using the Mechanical Turk. Then who's controlling whom is going to be hard to decide :-)

Source: Google Blogoscoped.

[ vmassol ] 18:49, Wednesday, 26 October 2005

I'm working on automating a J2EE build using Maven 2 and I'm in need of a Maven 2 plugin to do the following:

  1. load a database schema in the instance
  2. load data in the instance
  3. start/stop a database instance
  4. ability to create an instance from scratch

The ideal situation would be to find an existing Java framework that would already perform all or some of those steps. Then I could easily create a Maven 2 plugin wrapping it. So far I haven't been able to find such a tool. If you know any please suggest them!

Here's what I have found so far. Please note that I have probably made mistakes while filling in this table and I'd be happy to be corrected...

Tool | Load schema | Load data | Start/stop instance | Create instance | Comments
DBunit | | | | |
DDLUtils | | | | | I think DDLUtils is the old commons-sql project.
Derby ij | | | | | ij 10.1.1.0 requires the db2jcc.jar which is not on Ibiblio. I need to check the license to see if it could be uploaded.

Again, let me know if you know some tools that are not listed here.

If no such tool exists, an idea I have would be to add support for databases in Cargo. Indeed Cargo is meant for manipulating any kind of container. It happens that the first type of container we've implemented is J2EE containers but it should work for any other type and the interfaces should remain the same.

WDYT?

[ vmassol ] 18:19, Wednesday, 13 July 2005

I tried Copernic Desktop Search (CDS) today. I've been using Yahoo Desktop Search (YDS) for several months now and I'm very happy with it. It has some issues though: it brings my laptop to its knees when it performs indexing, it has no Windows taskbar integration, etc. I wanted to see how CDS fared against YDS.

Here are my findings after one day of using CDS. Please note that this is definitely not long enough to have a definitive opinion on the topic but I thought I'd still share what I've learnt today.

General opinion

CDS is a very good desktop search. I was very impressed. It seemed perfect at first and then slowly I started finding some little flaws compared to YDS. Still it is extremely good. It has all the features you'll find in YDS and Google Desktop Search (GDS).

Pros of CDS vs YDS

  • Integration with Windows taskbar
  • Low resource usage for indexing. It does not slow down my laptop when indexing. That's very good!
  • Immediate scanning of new resources. If you receive an email for example, it is immediately available for searching. No need to wait for the next indexing.

Cons of CDS vs YDS

  • No vertical layout for views (as there is in YDS). This means that you cannot fully see the message being previewed
  • No "All" categories search. You have to choose the category you wish to search (emails, files, contacts, etc)
  • No as-you-type results
  • No possibility to choose the columns to display (for example email folders or email size). There are only a few basic columns
  • Slower to search and display items than YDS. It was very fast initially but it quickly became slow, and then very slow, as the number of indexed items increased
  • XML preview is using IE engine on Windows and thus there are lots of XML files that don't display correctly

Some minor details:

  • The Delete key does not work to delete an email
  • Cannot select several emails at once (to delete them for example)

Conclusion

If only it could have a better view layout and be faster to display results it would be perfect. For me its killer features are really its CPU-friendly indexing and the immediate availability of new resources in searches.

I've just noticed that YDS released version 1.2beta yesterday and I'm installing it. For now, I'll keep using YDS which is still my favorite. YMMV.

[ vmassol ] 15:22, Saturday, 30 April 2005

Clirr is one of these tools that deserve to be better known. I have mentioned it several times in other posts but it's really the first time I get to use it for real. It rocks! I'm about to release Cargo 0.5 and I wanted to get an exact list of the API modifications we have done compared to version 0.4.

Here's the kind of output Clirr gives (the full output is available here):

ERROR: 8001: org.codehaus.cargo.deployment.DefaultJarArchive: Class org.codehaus.cargo.deployment.DefaultJarArchive removed
INFO: 8000: org.codehaus.cargo.module.DefaultJarArchive: Class org.codehaus.cargo.module.DefaultJarArchive added
ERROR: 7002: org.codehaus.cargo.container.Container: Method 'public void addDeployable(org.codehaus.cargo.container.deployable.Deployable)' has been removed
INFO: 7011: org.codehaus.cargo.ant.ConfigurationElement: Method 'public void addConfiguredEar(org.codehaus.cargo.ant.EARElement)' has been added
INFO: 4000: org.codehaus.cargo.container.jetty.JettyStandaloneConfiguration: Added org.codehaus.cargo.container.configuration.StandaloneConfiguration to the set of implemented interfaces
ERROR: 7005: org.codehaus.cargo.container.Container: Parameter 1 of 'public void setConfiguration(org.codehaus.cargo.container.Configuration)' has changed its type to org.codehaus.cargo.container.configuration.Configuration
ERROR: 7006: org.codehaus.cargo.ant.ConfigurationElement: Return type of method 'public org.codehaus.cargo.container.Configuration createConfiguration(org.codehaus.cargo.container.Container)' has been changed to org.codehaus.cargo.container.configuration.Configuration
ERROR: 4001: org.codehaus.cargo.container.jetty.JettyStandaloneConfiguration: Removed org.codehaus.cargo.container.Configuration from the set of implemented interfaces
INFO: 7003: org.codehaus.cargo.container.spi.AbstractConfiguration: Method 'public void configure()' has been removed, but an inherited definition exists.
ERROR: 5001: org.codehaus.cargo.container.deployable.EAR: Removed org.codehaus.cargo.util.MonitoredObject from the list of superclasses
INFO: 5000: org.codehaus.cargo.container.deployable.EAR: Added org.codehaus.cargo.util.monitor.MonitoredObject to the list of superclasses
ERROR: 7012: org.codehaus.cargo.container.Container: Method 'public java.io.File getOutput()' has been added to an interface
INFO: 7010: org.codehaus.cargo.container.spi.AbstractContainer: Accessibility of method 'protected java.io.File getOutput()' has been increased from protected to public
INFO: 6000: org.codehaus.cargo.container.property.GeneralPropertySet: Added public field JVMARGS

Even though we're using JIRA with an Iteration-Driven Development strategy (IDD) it was still a very interesting exercise to verify that we had not missed any issue by running Clirr on the source code. In addition, it provides a more detailed view of what exactly has changed in terms of API, which our JIRA report does not provide.

The next step would be to use it to fail our build whenever someone introduces a public API break. It would be quite easy for us because we've cleanly separated non-public APIs from public APIs by using internal packages (see the Cactus API design rule to see what it means). Of course sometimes, you want to intentionally add a breaking change. That's legitimate but it has to be controlled. The strategy would be to have the build fail and then, if the change is intentional, to exclude it from Clirr.
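
Here is a sketch of that policy applied to Clirr's plain-text output in the format shown above. It does not use Clirr's own API or exclusion mechanism: the ApiBreakGate name, the "code:class" format used for approved breaks and the ".internal." package convention are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ApiBreakGate
{
    /**
     * Returns the ERROR lines that concern public (non-internal) classes and
     * that have not been explicitly approved.
     *
     * @param clirrLines lines formatted as "SEVERITY: code: class: message"
     * @param approved approved breaks, each written as "code:class"
     */
    static List<String> unapprovedBreaks(List<String> clirrLines, Set<String> approved)
    {
        List<String> result = new ArrayList<>();
        for (String line : clirrLines)
        {
            if (!line.startsWith("ERROR: "))
            {
                continue;
            }
            // severity, code, class name, message
            String[] parts = line.split(": ", 4);
            if (parts.length < 4)
            {
                continue;
            }
            String className = parts[2];
            if (className.contains(".internal."))
            {
                // Non-public API: breaking it is allowed
                continue;
            }
            if (!approved.contains(parts[1] + ":" + className))
            {
                result.add(line);
            }
        }
        return result;
    }
}
```

The build would fail as long as the returned list is not empty, and an intentional break is accepted by adding its "code:class" entry to the approved set.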

Well done Lars!

[ vmassol ] 16:29, Thursday, 7 April 2005

Where is Ant heading in the future? I would be very interested to learn more about this. I've been using Ant for several years now and I've always been a happy user. However these days, I no longer use the XML scripting side of Ant much but I use the Ant Java API heavily; what I'm interested in are the Java Ant tasks.

I think this is really where the value of Ant is. All those years of implementing the base building block for a portable OS Java API have created a very useful Task set. I think every Java application that needs to do copying, deleting a directory, spawning a Java application, etc should use these tasks. There's no point in reinventing the wheel!

For example, you may think that deleting a directory is simple. But it's not so easy. Have a look at the Delete Ant task source code. You'll find portions of code like this one:

/**
 * Accommodate Windows bug encountered in both Sun and IBM JDKs.
 * Others possible. If the delete does not work, call System.gc(),
 * wait a little and try again.
 */
private boolean delete(File f) {
    if (!f.delete()) {
        if (Os.isFamily("windows")) {
            System.gc();
        }
        try {
            Thread.sleep(DELETE_RETRY_SLEEP_MILLIS);
        } catch (InterruptedException ex) {
            // Ignore Exception
        }
        if (!f.delete()) {
            if (deleteOnExit) {
                int level = quiet ? Project.MSG_VERBOSE : Project.MSG_INFO;
                log("Failed to delete " + f + ", calling deleteOnExit."
                    + " This attempts to delete the file when the ant jvm"
                    + " has exited and might not succeed."
                    , level);
                f.deleteOnExit();
                return true;
            }
            return false;
        }
    }
    return true;
}

Would you have thought about this? Probably not and you would have been right not to as this only happens in some rare occasions. But when one of your users reports it, it's going to be darn difficult to identify and fix. Personally I'd rather depend on a stable and well tested library rather than recode it myself.

The problem is that the Ant tasks are a bit too tightly linked to the execution engine (the XML scripting engine). For example reusing an Ant task requires you to create a Project object. This in turn drags in loggers, the Ant classloader (in some cases) and possibly other objects. I know it's possible to use Ant from Java (I've been doing it for a long time now) but I'd love it to be even easier to do so.

Instead of writing:

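Project project = new Project();
// init() registers Ant's default task definitions, including "unzip"
project.init();
// createTask() returns a generic Task; "unzip" is implemented by the Expand class
Expand expander = (Expand) project.createTask("unzip");
expander.setSrc(new File(zipfile));
expander.setDest(new File(destdir));
expander.execute();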

I'd like to be able to write:

Expand expand = new Expand();
expand.setSrc(new File(zipfile));
expand.setDest(new File(destdir));
expand.setLogger(myLogger);
expand.execute();

I don't want to see the get/setLocation, get/setTaskName(), get/setDescription() and in general all methods from Task.java.

What I'd love to see is Ant moving in the direction of providing completely reusable Tasks that have 0% dependencies on the Ant engine. This means that loggers and classloaders would be passed to the Ant task by the program that uses it.

I'd like to see Ant provide 2 distributable jars: one containing the XML scripting engine only and one containing all the pure java beans Ant tasks that can be reused in any Java application.

I'd like to see Ant separate into 2 subprojects: one for the XML scripting engine (let's call it engine) and one for the Ant tasks (let's call it tasks). The reason for the 2 projects is to ensure there's no dependency in the direction tasks->engine.

I'd like to see Maven2 use those completely reusable Ant tasks instead of recreating them (this is a wish I'm addressing to both projects, not just Ant! :-)).

I'd like to see those Ant tasks become a JSR and be incorporated in a future version of the JDK, thus providing a higher level API than the best classes from the JDK.

Is that where Ant is heading today?

[ vmassol ] 11:21, Friday, 11 March 2005

I may be dense but I've just realized today that there is a potentially simple way to increase participation in an open source project. That's always been one of the questions on my mind: how do I make my open source projects more successful? For me a successful open source project is one which has a rich developer community. How do I make this possible? There are of course several ways to make this happen but the one that dawned on me this morning is that the project has to reduce its complexity (by making it more modular for example).

Indeed, the barrier to participation is often due to the fact that a user who wants to participate will need to understand the whole design, how the different classes are entangled, what effect a change here will have on the rest of the project, etc. Thus, if we make the project more modular a contributor who wants to participate will only need to understand the design of a given 'module'.

A 'module' would need to have some good-to-have characteristics:

  • Very loose coupling with other modules
  • Clearly defined and *published* interfaces. There should be some documentation on the project's web site explaining them and, for example, a tutorial showing how to implement new modules (or how to swap one module implementation for another).
  • Separate builds so that it's easy to build only the module (this can be alleviated if the master build is easy to use, i.e. no property tweaking necessary, it just builds, as is the case with good Maven builds... ;-))
  • Separate documentation on the web site, so that the website itself is modular and the complexity of each module is hidden in that module's web site. Thus the top level web site would be quite simple only listing what the project does as a whole and listing the different modules

Interestingly, one way to implement the 'very loose coupling with other modules' characteristic is by using a Service Architecture. This can be done for example by using the Dependency Injection pattern and/or a lightweight container (PicoContainer, Spring, etc).
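As a minimal sketch of that kind of loose coupling, here is constructor injection in plain Java. The names `Notifier`, `InMemoryNotifier` and `BuildReporter` are made up for illustration and do not come from any real project. A container such as PicoContainer or Spring would take over the wiring that is done by hand in `main`:

```java
import java.util.ArrayList;
import java.util.List;

public final class ModuleWiringSketch {

    /** Published interface: the only thing other modules are allowed to see. */
    public interface Notifier {
        void send(String message);
    }

    /** One implementation, living in its own module (hypothetical name). */
    public static final class InMemoryNotifier implements Notifier {
        private final List<String> sent = new ArrayList<>();

        @Override
        public void send(String message) {
            sent.add(message);
        }

        public List<String> sent() {
            return sent;
        }
    }

    /** A module that depends on the interface, never on an implementation. */
    public static final class BuildReporter {
        private final Notifier notifier;

        // The dependency is handed in from outside (constructor injection).
        public BuildReporter(Notifier notifier) {
            this.notifier = notifier;
        }

        public void report(String project, boolean success) {
            notifier.send(project + ": " + (success ? "OK" : "FAILED"));
        }
    }

    public static void main(String[] args) {
        InMemoryNotifier notifier = new InMemoryNotifier();
        new BuildReporter(notifier).report("cargo", true);
        // The reporter never knew which Notifier implementation it was given.
        System.out.println(notifier.sent());
    }
}
```

A contributor who only wants to write a new `Notifier` needs to understand that one interface and nothing else, which is the point being made above.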

This is probably obvious stuff but I've just realized that it's not only good design practices but that it'll also help open source projects attract more contributions. Of course that leads to another topic which is when to accept contributions and how to maintain them in the long run but that would be another discussion...

[ vmassol ] 09:28, Friday, 11 March 2005

Continuing with my current build-mania, I'd like to propose the idea of a distributed build architecture. I'd love to see my favorite continuous integration tools (CruiseControl, DamageControl and, in the future, Continuum) support this notion (I know they're thinking about it already!).

So what is the need for a distributed build?

I can see several use cases:

  • building on several JDKs
  • building on different OS platforms
  • building with different environment setups (for example building with different application servers, different browsers, different databases, etc) to validate that a product integrates well with various environment setups
  • delegating the build load on several machines when the build starts to take too long (of course, the first solution should be to try to lighten your build as much as possible)

A proposed architecture

Disclaimer: this is only ONE potential solution. There are probably lots of other solutions even more valid than this one. Please feel free to add your ideas as comments to this post.

It could work as follows:

  1. The central build machine (aka the build orchestrator) decides to start a build. The orchestrator can be an existing continuous integration tool like CruiseControl, DamageControl, etc. It can trigger a build on anything it wants: time-based, change-based, manual, continuously, etc. The orchestrator sends a build request to the space. The request contains all the information about the requested build (e.g. JDK to run on, OS to run on, App. Server/DB/etc to run on)
  2. The space holds all requests. It could be a good idea to provide a browser to see pending requests (preferably using a simple HTTP browser so that people who wish to contribute can see what type of builds are required). In any case it's important that the space be transactional (Note: I'm not sure about the word "transactional". What I mean is that a request cannot be read by several build agents at the same time)
  3. Build agents listen on the space for build request objects that match their capabilities. Using Jini/Javaspace would be nice here because (among other things) agents would be able to easily listen to requests with Jini attributes (OS, JDK, etc). Once they read a request they start a local build and publish the result to the space as a Result object
  4. The build orchestrator listens for Result objects and generates result reports, aggregating all results. Build results could contain anything required: result of the build, logs, generated artifacts, etc. The orchestrator gets the data from the Result object and performs the usual build operations (publishing, build result notification).

Of course there would be several details to sort out, like whether we should send 2 Request objects for each build need so that we can compare the results and only accept them if they match, etc.
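To make the four steps a bit more concrete, here is a rough in-memory sketch in Java. It is not Jini/JavaSpaces code: the `Space`, `Request` and `Result` classes are hypothetical stand-ins written for this example, and the "transactional" property is approximated with a simple lock so that a request can only be taken by one agent.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

public final class BuildSpaceSketch {

    /** A build request as posted by the orchestrator (step 1). */
    public static final class Request {
        public final String project;
        public final String os;
        public final String jdk;

        public Request(String project, String os, String jdk) {
            this.project = project;
            this.os = os;
            this.jdk = jdk;
        }
    }

    /** A build result as published by an agent (step 3). */
    public static final class Result {
        public final Request request;
        public final boolean success;

        public Result(Request request, boolean success) {
            this.request = request;
            this.success = success;
        }
    }

    /** In-memory stand-in for the space (step 2). */
    public static final class Space {
        private final List<Request> requests = new ArrayList<>();
        private final List<Result> results = new ArrayList<>();

        public synchronized void post(Request request) {
            requests.add(request);
        }

        // The request is removed while the lock is held, so no two
        // agents can ever take the same request.
        public synchronized Optional<Request> take(String os, String jdk) {
            for (Iterator<Request> it = requests.iterator(); it.hasNext();) {
                Request r = it.next();
                if (r.os.equals(os) && r.jdk.equals(jdk)) {
                    it.remove();
                    return Optional.of(r);
                }
            }
            return Optional.empty();
        }

        public synchronized void publish(Result result) {
            results.add(result);
        }

        /** Hands all pending results to the orchestrator (step 4). */
        public synchronized List<Result> drainResults() {
            List<Result> copy = new ArrayList<>(results);
            results.clear();
            return copy;
        }
    }

    public static void main(String[] args) {
        Space space = new Space();
        space.post(new Request("cargo", "linux", "1.4"));      // orchestrator
        space.take("linux", "1.4")                              // agent with matching capabilities
             .ifPresent(r -> space.publish(new Result(r, true)));
        for (Result r : space.drainResults()) {                 // orchestrator aggregates
            System.out.println(r.request.project + " on " + r.request.os + ": " + r.success);
        }
    }
}
```

A real implementation would replace `Space` with something remote (a JavaSpace, a message queue, etc) and would have agents block or get notified instead of polling.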

Conclusion

I think this type of distributed build could be especially interesting for open source projects in order to build an active community around a project. This would be yet another way in which people can contribute to an open source project: by lending some of their machine's CPU to perform continuous integration builds of this project. This usually makes sense as open source projects may be low on hardware resources and lending some would help. Of course it also brings its share of security issues that would need to be addressed...

Would you like such a distributed build system? I personally prefer this architecture over one where the orchestrator directly sends build requests to build agents as I find it more scalable and more flexible.

[ vmassol ] 10:44, Wednesday, 12 January 2005

The concept

The typical local builds that developers run on their machines work by building the subproject they're working on but also all the dependent subprojects it requires. Usually, as building all dependent subprojects takes a lot of time, the developer only infrequently checks out other project sources and builds them on demand. His focus is on the subproject that he's making modifications to (and rightly so!). This strategy has the following drawbacks:

  • Setting up the build on a fresh machine is complex and takes time. Indeed you have to check out all the top level project sources and build all projects one by one until you reach the subproject you're concerned with.
  • It doesn't scale too well. Your local build starts taking tens of minutes which does not encourage running it that often. And if you do, you don't rebuild all the subprojects even though there are probably lots of changes that have been made by other coworkers. Thus, you're increasing the possibility of an integration break (breaking your other coworkers when they integrate your changes).
  • When someone from another team inadvertently breaks your project's build, you'll have to switch context (i.e. stop what you're doing) and help out to restore the master build. If this happens infrequently, it's probably fine and even positive (as it increases team collaboration ;-)). However when it happens frequently (which is bound to happen as the team grows), you'll start suffering from it...

Because of all these problems, I have been using a different approach on my current project for the past 2 years. This was mostly motivated by the fact that the project is a big project (close to 100 developers) and we were hitting the issues mentioned above. I have called this strategy "Binary Dependencies Build". If you're interested this is an approach I have presented both at TSSS2004 and at Javapolis 2004.

Here is how it works:

Imagine that you have a "trading" subproject that depends on 2 other subprojects ("partners" and "referenceData"). The idea is that your local build will NOT build them from sources but instead will download their latest working version from a remote artifact repository (a location where the result of the subproject build is located). To speed up the build even further, the downloaded versions are stored locally. In our example, the latest "partners" jar is already available locally and is thus not downloaded, but the "referenceData" one is not. It is downloaded and then stored locally. The "trading" subproject is built using these binary dependencies.

This is all fine but there is a burning question: How do I do continuous integration with such a system? Won't the binary dependencies be old versions when I get them? The solution to this is to have a continuous build server that continuously builds subprojects and puts their artifacts in the remote repository. Note that they are put in the repository only if their build passes with no errors. This ensures that there are always fresh versions available and that they are as "good" as they can get.

Doing it with Maven

The good news is that this feature is built into Maven. Maven implements this support of artifact repositories (local and remote) and it supports the automatic download of artifacts not available in the local repository. Usually Maven will first check in the local repository whether the artifact's version exists and if so will use it. However, if an artifact's version contains the "SNAPSHOT" keyword, Maven will always check whether there is a more up-to-date artifact in the remote repository. This makes it easy to implement the strategy defined above.
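The lookup rule described here can be sketched in a few lines of Java. This is not Maven's code: repositories are modelled as plain maps from "artifact-version" to a build timestamp, and the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public final class BinaryDependencySketch {

    // Repositories are modelled as maps from "artifact-version" to a build timestamp.
    private final Map<String, Long> local = new HashMap<>();
    private final Map<String, Long> remote;
    private int downloads = 0;

    public BinaryDependencySketch(Map<String, Long> remote) {
        this.remote = remote;
    }

    /** Returns the timestamp of the artifact the build will use. */
    public long resolve(String artifact, String version) {
        String key = artifact + "-" + version;
        Long cached = local.get(key);
        boolean snapshot = version.contains("SNAPSHOT");

        // Released versions never change: a local copy is always good enough.
        if (cached != null && !snapshot) {
            return cached;
        }
        Long latest = remote.get(key);
        if (latest == null) {
            if (cached != null) {
                return cached; // the remote repository has nothing to offer
            }
            throw new IllegalStateException("Cannot resolve " + key);
        }
        // Download when missing locally, or when the remote snapshot is fresher.
        if (cached == null || latest > cached) {
            local.put(key, latest);
            downloads++;
            return latest;
        }
        return cached;
    }

    public int downloads() {
        return downloads;
    }

    public static void main(String[] args) {
        Map<String, Long> remote = new HashMap<>();
        remote.put("referenceData-1.0-SNAPSHOT", 100L);
        BinaryDependencySketch repo = new BinaryDependencySketch(remote);

        repo.resolve("referenceData", "1.0-SNAPSHOT");  // not local yet: downloaded
        repo.resolve("referenceData", "1.0-SNAPSHOT");  // local copy is current: no download
        remote.put("referenceData-1.0-SNAPSHOT", 200L); // the CI server publishes a fresh build
        repo.resolve("referenceData", "1.0-SNAPSHOT");  // fresher snapshot: downloaded again
        System.out.println(repo.downloads() + " downloads");
    }
}
```

The important design point is the asymmetry: fixed versions are cached forever, while SNAPSHOT versions are re-checked so that the output of the continuous build reaches developers quickly.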

Conclusion

We've been very happy with this solution so far. I think there are 2 key points in making this work:

  • A good build that provides assurance that the binary artifacts are working. Indeed we've found that our subproject build was not always good enough to tell how "good" a jar artifact was. This was usually caused by the lack of automated functional tests, which meant that even though the build was passing, the jar was not working when executed on the developer's machine. The solution is of course to include integration/functional tests in the build (at least the master CI build).
  • A quick master build. It's important that it generates fresh jar artifacts as quickly as possible so that CI can happen as often as possible.

[ vmassol ] 12:28, Friday, 31 December 2004

Here's a non-ordered list of the main problems causing build-breaks that we had found on the current project I'm working on (Note that this list is now a year old and that we have fixed some of them - Unfortunately the majority still remains...). I've added some possible ideas on how to fix them.

  1. Build takes too long to execute (and thus it is executed less often)
    • Fix the build by having more subprojects with binary dependencies and/or streamline the build to ensure that only important build steps are run. Optimize it (f.e. offer different goals/targets: one for a clean build and another one that does not perform a clean).
  2. Local build not executed
  3. Public API breakage in dependent project without warning
  4. Not enough continuous commits (all packed up at end of iteration)
    • Team meetings to better explain the importance of continuous integration. Complementary idea: "unbreakable builds". The idea is that if you keep your changes to yourself and accumulate them, whenever you want to commit them, the unbreakable build will likely reject your changes as they will break some other part of the code. Thus you'll need to spend several days talking to other developers to not only fix your code but also fix theirs. Normally after doing this several times, you should understand that it is in your best interest to commit frequently.
  5. No functional/integration automated tests (f.e. no local verification of ejb-jar deployments)
    • Automated functional tests! Build a suite slowly over time, improving it at each iteration. And maintain it! Decide on a good data handling strategy (this is usually the main issue). Ensure that your data strategy keeps everyone in sync WRT DB data.
  6. Commit problems (forgetting to commit some files, problems due to the SCM tool - Starteam: new directories do not appear in the Starteam view!)
  7. Devs “building” with IDE but forgetting to use the automated build
  8. Checkstyle errors failing the build
    • Coaching. More team meetings to decide which checkstyle errors should fail the build and which should not. Get a strong team buy-in. Complementary idea: "unbreakable builds".
  9. Failing unit tests
    • It probably means that the unit tests are actually integration tests depending on database data. Ensure that unit tests are fast and independent of the environment. Complementary idea: "unbreakable builds".
  10. [Maven] project.xml not up to date and missing dependency
    • SCM diff emails on check-ins (team by team) in order for everyone to have the knowledge of what's happening. Complementary idea: "unbreakable builds".
  11. Database data modifications (voluntarily or involuntarily) leading to test breakage
  12. Continuous build not cleaned between different runs
    • Fix it. Perform a clean build from time to time.
  13. Local SCM update not done before local build (in order to get the latest files)
    • SCM diff emails on check-ins (team by team) in order for everyone to have the knowledge of what's happening. So you'll know better when to update your local workspace. Complementary idea: "unbreakable builds".
  14. Environment differences in local build vs central build
    • Work continuously towards making the developer's environment as close as possible to the integration environment. Complementary idea: "unbreakable builds". This allows executing the build on the server and thus it runs in the same environment as the continuous build.
  15. No local deployments done before commits (f.e. no EJB deployments)
    • Coaching (in order to ensure that developers do perform deployments on their machines before check-in) + add some checks in the build to automate the verification (they can be f.e. some hand-picked functional tests).
  16. Checkstyle errors hidden in tons of warnings
    • Fix it. Newest versions of Checkstyle allow filtering on severity.
  17. Non-atomic commits and central build starting with in-flight commits
    • Use a scheme a la CruiseControl (wait for some inactivity time on the SCM before triggering a build). Or change the SCM (to Subversion for example). Note: We have tried to use CC with StarTeam but even though the infrastructure team increased CPU + RAM, StarTeam falls over when it is polled by 3 or 4 CC builds in parallel... (Solution: Dump ST or ask Borland to come and tune the parameters). Complementary idea: "unbreakable builds". This forces "atomic" commits.
  18. [Distributed development] rsync issues: sometimes jars are corrupted or lost
    • Fix the rsync process (Note: this is now no longer happening I believe)
  19. [Distributed development] VPN instability making it difficult to SCM-update
    • Fixed mostly. However usage of Starteam is still extremely slow, making it hard to SCM-update from remote. Solutions: use an SCM that needs less bandwidth and is less sensitive to response time (f.e. Subversion), increase bandwidth (but the issue is mostly with response time, which cannot be changed), or use a replication mechanism (I don't like this as I believe it introduces its own issues. I much prefer everyone working directly on the same repository, especially as I know it works: I've done it in the past using CVS with a team of 30 developers and it was working fine).
  20. Errors when executing the application
    • This is because there are no automated functional tests. Automate them!

[ vmassol ] 15:18, Wednesday, 29 December 2004

Let's create Unbreakable Builds

Out of my last two development projects, one had a strong sense of quality and excellence in general and continuous build failures were the exception (about 3-4 per week for a 30-developer team), while the other one was quite the opposite and everyone was surprised when the continuous build was passing (there were about 5 build breaks a day on average for a 40-developer team). I'm sure this is also pretty common on other projects. Obviously the best is to build (pun intended) a build awareness in the team. However, you'll need strong evangelists for this to happen, who may not always be available, and other circumstances may make this difficult.

A thought struck me about a year back: what if we were able to prevent the continuous build from failing by design? There's a French saying that goes something like "it's better to prevent than to cure". I think this is definitely a good idea to apply to continuous build failures. Why not make a continuous build system that cannot fail? At that time I thought it was a nice idea (I had meant to blog about it but I forgot) but I could not see very well how it could work. Now, a year later, I really think it's a nice idea and I'd like to explore it.

The architecture

A potential basic architecture is shown in figure 1.

The general principle is to catch the commit data before they get committed to the SCM, to perform a build and to perform the actual commit only if the build is successful. Here are the detailed steps:

  1. The developer performs a commit using his favorite SCM client tool. Note that it is best if the tool is able to perform the commit asynchronously so that the developer can continue working on something else.
  2. The committed data are intercepted using a pre-commit hook script (all modern SCMs support this). This script is in charge of doing 2 things:
    • Finding out the list of projects to be built. Indeed, say that the commit contains 5 files belonging to 2 different projects. We need to rebuild these 2 projects. The algorithm for finding out the projects to which the changed sources belong can be as simple as a mapping between the file paths (which contain the project name) and the project name.
    • Creating a build job and pushing it on a queue. The reason for the queue is that building all the projects on the machine that hosts the SCM is not going to be scalable. We want the SCM to be as responsive as before. Hence the queue.
  3. We need build machines to perform the actual build. They could be dedicated build machines that continuously build the build jobs. They could also be developer workstations. The concept is to have one or several build kicker applications installed on those machines. The "continuous build kicker" will continuously get a job from the build job queue and build it, whereas the "idle build kicker" will only pick a job to build when the machine is idle (hey, look around you and see how many machines are unused because the people are either on holiday, sick, in a meeting, etc. That's a lot of power).
  4. The build kickers start by updating their workspace to have the latest files for the projects associated with the changed files. Then they try to "merge" the changed files into their workspace (note: this may be the tricky part to implement unless the SCM offers a way in the pre-commit hook to get the full file - I need to explore this). If they cannot succeed they stop with an error message that flows back to the user. This can happen if someone else has been working on the same source and their change has made it to the SCM before ours has. If the merge succeeds, the build kicker starts the build. The build has to be relatively quick so you should not build all the projects. I suggest building the modified projects and the ones that directly depend on them so that an API break can be detected (more on that below)
  5. When the build is finished (or if an error occurs), the build kicker sends the result back to the pre-commit hook (using a RPC mechanism for example).
  6. If the result is positive, the pre-commit script performs the real commit to the SCM; otherwise the commit is rejected
  7. The resulting message is returned to the user. In case of error the user would see for example the build console log
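The core of the pre-commit hook (steps 2 and 6) can be sketched as follows. This is only an illustration under stated assumptions: the convention that the first path segment is the project name is invented for this example, and the `builder` predicate stands in for "push a job on the queue and wait for a build kicker to report back".

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public final class PreCommitGateSketch {

    /**
     * Maps changed file paths to project names, assuming (as a convention
     * invented for this sketch) that the first path segment is the project.
     */
    public static Set<String> projectsFor(List<String> changedPaths) {
        Set<String> projects = new LinkedHashSet<>();
        for (String path : changedPaths) {
            int slash = path.indexOf('/');
            projects.add(slash < 0 ? path : path.substring(0, slash));
        }
        return projects;
    }

    /**
     * Accepts the commit only if every affected project builds.
     * The builder is a placeholder for the queue and the build kickers.
     */
    public static boolean accept(List<String> changedPaths, Predicate<String> builder) {
        for (String project : projectsFor(changedPaths)) {
            if (!builder.test(project)) {
                return false; // reject: nothing reaches the SCM
            }
        }
        return true; // the hook lets the real commit go through
    }

    public static void main(String[] args) {
        List<String> commit = Arrays.asList(
            "trading/src/java/Order.java",
            "trading/src/test/OrderTest.java",
            "partners/project.xml");
        System.out.println(projectsFor(commit));
        // Pretend the "partners" build fails: the whole commit is rejected.
        System.out.println(accept(commit, project -> !project.equals("partners")));
    }
}
```

Because the commit is accepted or rejected as a whole, this is also what makes commits atomic from the point of view of the other developers.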

Advantages

Here are the advantages of such a system:

  • Does not break other developers upon a build failure. All developers can work uninterrupted while still working on HEAD in a continuous integration fashion
  • Lowers the effort required to get a CI system working thus it helps teams adopt CI
  • Prevents breakage of APIs. Indeed in step 4 above, we've mentioned that a good strategy is for the build to build not only the projects that have changes but also all projects that directly use those projects (one level). This will allow detecting unwanted API breakages.
  • Increases self-confidence when committing which (I hope) will make it easier to get developers to commit continuously
  • Allows continuing working on one's own machine (instead of having to wait for the current build to free the CPU which is being used at 100%!). You now get your own PBS (Personal Build Server)
  • Forces atomic commits!
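The "one level" build scope mentioned in the API breakage point could be computed along these lines. This is a sketch only: the dependency map would in practice come from the project descriptors, and the class name is made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class BuildScopeSketch {

    /**
     * Returns the changed projects plus every project that depends
     * directly on one of them (one level only, not transitive).
     *
     * @param dependencies maps each project to the projects it depends on
     */
    public static Set<String> toBuild(Set<String> changed, Map<String, List<String>> dependencies) {
        Set<String> result = new TreeSet<>(changed);
        for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
            for (String dependency : entry.getValue()) {
                if (changed.contains(dependency)) {
                    result.add(entry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependencies = new HashMap<>();
        dependencies.put("trading", Arrays.asList("partners", "referenceData"));
        dependencies.put("reporting", Arrays.asList("trading"));

        // A change in "partners" rebuilds "trading" but not "reporting".
        System.out.println(toBuild(new TreeSet<>(Arrays.asList("partners")), dependencies));
    }
}
```

Stopping at one level keeps the build short while still catching the most likely victims of an API change.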

Questions/Issues

I'm sure you're now burning with tons of remarks/questions showing why it wouldn't work :-) Here's what I've currently thought about. If you have any opinion or other questions, I'd love to hear them.

Q: What happens if someone else also commits a change to the same file?

It works in the same way as usual. The build kicker will try to "merge" the changes after having done a workspace update and if it cannot, the user will get an error explaining that the merge failed. The user will then need to perform an update on his local machine and resolve the conflict.

Q: Imagine I perform a commit and I start working on a new feature. Then my commit is rejected because of a failure. How do I fix this without losing my current changes?

Answer 1: This is actually relatively similar to what you're currently doing. Imagine you're committing something. Then you start working on something new and the continuous build tells you 2 hours later that your change has broken something. The difference is that your changes have been committed so you can easily create a new workspace and fix it there. We could do the same here by having the pre-commit hook actually make your changes available through a URL (sent in the commit answer) as a patch so that it is easy for you to apply it to a fresh new checkout.

Answer 2: You wait till the build is finished on the server. You can perform other activities like documenting, reading, thinking, designing, writing new classes, new tests, etc. Basically you work on stuff that does not conflict with the past changes. Actually this is probably what you're currently doing when your build is running as it is eating all your CPU...

Q: Doesn't it take too long to build?

You need to ensure your build is taking as little time as possible. I think 5-10 minutes should be ok. The best way to achieve this is probably to use binary dependencies instead of rebuilding dependent projects (a la Maven), except maybe direct dependencies. You'll still need a continuous build running continuously to produce fresh binary dependencies. I guess it's also best to use an SCM client that can do asynchronous commits in order to let you continue working while the commit is in progress.

Q: What if I want to modify an API but want each project to modify its own files?

Several options:

  • You could go through a deprecation cycle.
  • You could be doing the refactoring on one machine only (not always possible)
  • You could also plan it. Anyway an API breakage has to be planned with communications. Thus you could say: on that day, at such hour we're going to be committing this break and we have 1 day to fix all our dependent projects. When this happens you can turn off this "unbreakable build" feature for the day.

The interesting point here is that you *want* the API breakage to be detected as the default instead of the opposite.

Conclusion

It seems to me this would be particularly useful on big projects with lots of developers. It should also be useful to introduce continuous integration on an existing project as it lowers the discipline required from everyone. Obviously this is just an idea that I haven't tested yet. I'm very keen to see this in action. If any of you has any experience with it, please share it. I'm planning to spend some time trying to implement it. If you're interested in helping out, let me know too.

[ vmassol ] 11:11, Saturday, 4 December 2004

When working using a Time-boxing approach with JIRA there are some typical issue-smells that I have noticed appear frequently. In order to perform good deliveries it is important to fight them.

  • Issue smell 1: Too many unscheduled issues. This means that new issues are not assigned to iterations, i.e. that they are not planned to be fixed.
  • Issue smell 2: Open issues from past iterations. Any issue that is left from a previous iteration has to be rescheduled so that everyone knows when it is planned to be fixed. If some portion of the issue has been done, I've found that it is usually best to split the task into 2, so that the work done in the iteration it was scheduled for is clearly shown in the release notes for that iteration and the unfinished part can be scheduled in a future iteration.
  • Issue smell 3: No iterations in changelog view. This means that past iterations that are finished have not been JIRA-released. The good thing about releasing an iteration is that it forces you to deal with the unfinished issues (see Issue smell 2). In addition it cleans up the roadmap view, which becomes less cluttered by past issues and gives a clear view of what's left to be done. Lastly, it provides an important feeling of achievement.
  • Issue smell 4: Issue types in issue description. I have often noticed that some JIRA projects were using some description conventions for some issue types. For example, using XXX - Code review for a code review issue on the XXX feature. In that case, a real JIRA issue type should be created. The reason is that by defining a proper JIRA issue type, it is now possible to perform operations on this new issue type: it will appear properly in the release notes under its own category, it can be searched for, etc.
  • Issue smell 5: Issue statuses are not in sync with reality. This is often a big problem (especially with distributed teams) as people usually rely on JIRA to provide an exact view of the progress. If issues are found out of sync, there's a tendency to not "trust" JIRA anymore, which in turn leads to using it less and losing visibility. One good strategy is to do Issue Driven Development (IDD). It goes like this: when a task is done and just before the code is checked in, ensure that the corresponding JIRA issue is marked as Resolved/Closed. If there's no issue, create one (unless the modification is a really minor one that the user should not be concerned with). Then check in the code, mentioning the issue number in the checkin comment (that allows for example using the JIRA CVS/Subversion plugins). Note: If you're using CVS/Subversion you could write a quick pre-commit hook that verifies that each comment has a reference to a JIRA issue.
  • Issue smell 6: Lots of resolved (but not closed) issues. Most projects I have seen do not use a Resolved state. However, people often mark the issue as resolved but not closed and the issue stays in this state for ages without anyone doing anything about it. So either remember to close issues directly or, if you're using JIRA 3, create a custom workflow that does not have a Resolved state (if you're not using the Resolved state of course!).
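The check that such a pre-commit hook would perform (see Issue smell 5) is small. Here is a sketch in Java, assuming issue keys look like CARGO-123; how the comment is obtained and how the commit is rejected depends on the SCM and is left out. The pattern is deliberately loose and would also accept things like "UTF-8", so a real hook might restrict it to the known project keys.

```java
import java.util.regex.Pattern;

public final class IssueKeyCheckSketch {

    // A JIRA-style key: upper-case project key, a dash, then a number (e.g. CARGO-123).
    private static final Pattern ISSUE_KEY = Pattern.compile("\\b[A-Z][A-Z0-9]+-\\d+\\b");

    /** Returns true when the commit comment mentions at least one issue key. */
    public static boolean referencesIssue(String commitComment) {
        return commitComment != null && ISSUE_KEY.matcher(commitComment).find();
    }

    public static void main(String[] args) {
        String[] comments = {
            "CARGO-42 Fixed port handling in the Tomcat configuration",
            "fixed port handling"
        };
        for (String comment : comments) {
            // A real hook would reject the commit instead of printing.
            System.out.println((referencesIssue(comment) ? "accept: " : "reject: ") + comment);
        }
    }
}
```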

Let me know if you have found other important issue-smells when using JIRA!

[ vmassol ] 19:43, Wednesday, 17 November 2004

Introduction

Automated tests are good. Automated functional tests are even better as they are the proof that your application is working. In addition, with automated functional tests you can also automate your delivery process. However, writing automated functional tests is hard. The main reason is that you need to control your execution environment (database, application server, etc).

Cargo is a framework that you can use to automatically install, configure and execute J2EE containers. Thus it allows you to control your execution environment (for the J2EE container at least) and permits completely automated functional tests for J2EE applications.

Example

Let's walk through an example. Imagine we wish to start up Tomcat 4.1.31 before such a test runs. Here's what we could write:

public class MyTest extends TestCase
{
     private Container container;

     protected void setUp()
     {
          // (1) Optional step to install the container from a URL pointing to its distribution
          Installer installer = new ZipURLInstaller(
              "http://www.apache.org/dist/jakarta/tomcat-4/v4.1.31/bin/jakarta-tomcat-4.1.31.zip");
          installer.install();

          // (2) Create the Cargo Container instance wrapping our physical container
          container = new Tomcat4xContainer();
          container.setHomeDir(installer.getHomeDir());
     }

     public void testSomething()
     {
          // (3) Statically deploy some WAR
          Deployable war = container.getDeployableFactory().createWAR("src/testinput/my.war");
          container.addDeployable(war);

          // (4) Start the container
          container.start();

          // (5) Perform any test you wish here
          [...]
     }

     protected void tearDown()
     {
          // (6) Stop the container
          container.stop();
     }
}

Step 1 is optional. You can also rely on the container being already installed on the test machine if you wish. However, it's nice to completely automate the testing and assume nothing (or very little - we still need an OS and a JDK on the machine). In this example we're fetching the Tomcat 4.1.31 installation from the web. We could fetch it from our intranet or from a location on the machine or from our SCM.

In step 2, we have not told Cargo what container Configuration to use. Thus Cargo will use a default Configuration and will set it up so that your container executes in a temporary directory that it creates in your OS temp directory. If you wish to control this you can use:

Configuration configuration = new CatalinaStandaloneConfiguration(container, "target/tomcat4x");
configuration.setProperty(ServletPropertySet.PORT, "8080");
container.setConfiguration(configuration);

In step 3, we create a Cargo wrapper around a physical WAR and we add it to our container so that it is deployed when the container starts.

We then start the container (step 4), perform any testing we wish (step 5) and ensure the container is always stopped at the end of our test (step 6).

If we wish to start and stop the container only once during our whole test suite we can use a standard JUnit TestSetup.

Conclusion

This is just a short introduction to Cargo to demonstrate how easy it is to start/stop a container. The API is of course richer. Also, we're showing here how to use Cargo for functional testing of J2EE applications but Cargo is also meant to be used by any application that requires a container to be up and running. It could also be used by IDE plugin writers, etc.

For more information on Cargo, please see the Cargo website and join us on the Cargo mailing lists. You'll be warmly welcomed! :-)

[ vmassol ] 15:50, Friday, 5 November 2004

How often are you debugging some Java application only to find that you can't easily continue because the code is entering some third-party library?

At that point, either the library is open source and you can rush to download the source, modify the code to add some System.out.println calls and spend 3-4 hours finding out how to rebuild the project... or it's not open source and then there's not much that you can do except try to find the reason with your sheer brain power!

How good would it be if there were an application (let's call it a Logifier) at which you could throw a jar and it would return a new aspectified jar into which it had woven some Logging aspect that you could configure!

This would allow us to realize the full power of aspects: an external Java application that was not built with logging can now be converted to log things for us...

So who wants to be the first to build such a handy application? :-) Does it already exist?

Update 7/11/04: I've just remembered reading about AntFlow on TSS. That would be an excellent way of implementing this. Imagine a hot folder called "logifier" and any jar you drop in there is automatically logified using an AspectJ/AspectWerkz/etc Ant task and a common logging aspect such as this one! Now that would be cool. It could be a good coding exercise for the next OSSGTP.
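To make the idea of a logging aspect a bit more concrete, here is a minimal sketch that uses a plain JDK dynamic proxy rather than AspectJ or AspectWerkz weaving. It only intercepts calls made through an interface, so it is a stand-in for the concept and not the jar-level Logifier described above. Greeter, SilentGreeter and logify are made-up names used purely for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoggingProxyDemo {
    /** Hypothetical third-party interface we would like to trace. */
    public interface Greeter {
        String greet(String name);
    }

    /** Hypothetical third-party implementation that was built without logging. */
    public static class SilentGreeter implements Greeter {
        public String greet(String name) {
            return "Hello " + name;
        }
    }

    /** Log lines are kept in memory so that the example is easy to inspect. */
    public static final List<String> LOG = new ArrayList<String>();

    /** Wraps target so that every call made through iface is logged on entry and exit. */
    @SuppressWarnings("unchecked")
    public static <T> T logify(final T target, Class<T> iface) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                LOG.add("> " + method.getName() + Arrays.toString(args));
                try {
                    Object result = method.invoke(target, args);
                    LOG.add("< " + method.getName() + " = " + result);
                    return result;
                } catch (InvocationTargetException e) {
                    // Rethrow the original exception instead of the reflection wrapper
                    LOG.add("! " + method.getName() + " threw " + e.getCause());
                    throw e.getCause();
                }
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Greeter greeter = logify(new SilentGreeter(), Greeter.class);
        greeter.greet("Cargo");
        for (String line : LOG) {
            System.out.println(line);
        }
    }
}
```

The limitation is the interesting part: a proxy needs an interface and needs you to control object creation, which is exactly what you don't have with a third-party jar. That's why bytecode weaving is the better fit for a real Logifier.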

[ vmassol ] 10:34, Monday, 1 November 2004

Analysis

I've just tried Omea Pro (build 353) and I've got to say it's very promising! It's hard to explain what Omea is... I think it can be viewed as two things:

  1. A search tool that aggregates all data from your computer: files (all types including PDF, Word and Excel, but excluding PPT), Outlook emails, ICQ/Miranda conversations, Outlook tasks, Outlook contacts, RSS/Atom feeds, newsgroups, etc.
  2. A productivity tool that you can use instead of all your different tools for managing all your incoming data (mails, files, feeds, newsgroups)

After using it for 2 days, here are the pros I have found:

  • The searching feature is excellent. I find it much better than Google Desktop or Lookout in terms of relevance, breadth of search and organization of the results

Here are the things to improve that I have noticed (please remember that it is beta software):

  • It's resource hungry: after a few hours of using it, it easily reaches 300MB and more. It's also a little bit slow.
  • It has not reached a level where it is at least as good as the tools it gets its data sources from. For example, it's not as good as Outlook, it's not as good as a dedicated RSS feed reader, etc. I believe it will never be able to be as good as those specialized tools. JetBrains has recognized this by trying to make it bi-directional data-wise (your changes from Omea are reflected in Outlook and vice-versa). However this doesn't work for Newsgroups and RSS feeds for example.

Now the real question is: how should I use it? As a search tool? But then it's a bit heavy to be left sitting idle on my desktop. And it's too heavy to start on demand (it currently takes 30 seconds to 1 minute to start - I'm sure it'll be improved in the future). As a productivity tool? Possibly, although it's missing some of the features I use in my specialized tools. For example:

  • I use the "Reading pane - Right" view of Outlook which gives me 3 vertical panes next to each other. Once you get used to this, it's hard to go back. I'm told this will be in the next version of Omea Pro.
  • I use NewzCrawler, which lets me see only feeds which have unread items in them, feed by feed (I don't read all blogs at the same speed).

Conclusion

I currently don't think I'll be able to keep Omea open all the time as it's too heavy to have both Outlook and Omea open at the same time. Also, I don't like the fact that I have to stop using my favorite feed reader (NewzCrawler). I really do not want to manage 2 feed tools and tell each one which feeds I have already read. Of course, I could simply not use the feed feature of Omea. However I feel that using Omea just for searching makes it lose a lot of its attraction. If I just want a search tool, I can use Google Desktop or Lookout (even if they are less powerful, they are probably good enough for my daily needs).

I think the real challenge for JetBrains is to make the tool good enough in each domain (mail handling, feed reading, etc) so that it can be used instead of the dedicated tool. That means that you would use Omea for day-to-day activities and the specialized tool only from time to time, when you need one of its power features. Of course, this is a huge challenge for JetBrains and honestly I am not sure if it is achievable.

Anyway, the tool is promising and intriguing enough that I'll follow the different builds to see how it evolves.

If you're using it, please drop me a note on how you use it and how you handle it vs the specialized tools. Thanks

Update 04/11/2004: I've just tried build 358 and I am extremely pleased to report that the memory consumption has decreased a lot: whereas it was 300MB for me before, it's now 140MB. Good job!

[ vmassol ] 10:22, Thursday, 30 September 2004

On one of my projects at work we have moved to JIRA 3 (beta). We moved to benefit from the new custom workflow feature. Unfortunately it was missing one key feature we wanted: the ability to send notification emails on custom workflow transitions (I've just been told by Atlassian that this is a feature they're currently working on). To remedy this, and thanks to Atlassian's support, I've decided to delve into the JIRA Java API and develop a workflow function plugin to implement email sending.

I have to say that JIRA 3's extensibility is great! JIRA can almost be seen as a full-fledged foundation for developing project tracking applications, in the same spirit as Eclipse is a full-fledged foundation for developing Java applications (RCP). Both Eclipse and JIRA come with a default application using this API to demonstrate their power (the IDE for Eclipse, the issue tracker for JIRA). Note that the new plugin system in JIRA has several similarities with the Eclipse plugin architecture. Of course, I'm sure JIRA still has a lot of ground to cover to expose a plugin API covering all domains of issue tracking (i.e. allowing all parts of the JIRA issue tracker to be replaced) but it's going in the right direction.

Here is a short tutorial on how to develop a workflow function plugin. Note that you should also check the Atlassian tutorial on how to develop plugins.

The source code is available here and the plugin jar is available here.

Setting up the project

Here's the directory structure I have chosen for my plugin. Please also note that I have used Maven to perform the build (extremely easy to set up as Atlassian is also using Maven and they have all their jars in a Maven remote repository at http://repository.atlassian.com).

A plugin is composed of several files (it is packaged as a JAR at runtime):

  • A plugin descriptor (atlassian-plugin.xml)
  • Java source files
  • Velocity templates for the plugin UI (the *.vm files)

The project.properties file simply adds the Atlassian Maven remote repo to the list of repos searched by Maven to download dependencies:

maven.repo.remote=http://repository.atlassian.com,http://www.ibiblio.org/maven

The project.xml contains the required JIRA dependencies and the definition of resources to include in the generated jar. Here's an extract:

<code>
[...]
<dependencies>
  <dependency>
    <groupId>atlassian-jira</groupId>
    <artifactId>atlassian-jira</artifactId>
    <version>3.0-beta</version>
  </dependency>
  <dependency>
    <groupId>osworkflow</groupId>
    <artifactId>osworkflow</artifactId>
    <version>17Aug2004</version>
  </dependency>
  <dependency>
    <groupId>propertyset</groupId>
    <artifactId>propertyset</artifactId>
    <version>1.3</version>
  </dependency>
[...]
<build>
  <sourceDirectory>src/main</sourceDirectory>
  <resources>
    <resource>
      <directory>src/etc</directory>
      <includes>
        <include>atlassian-plugin.xml</include>
      </includes>
    </resource>
    <resource>
      <directory>src/etc/templates</directory>
      <includes>
        <include>**/*.vm</include>
      </includes>
    </resource>
  </resources>
</build>
</code>

Generating the plugin jar is as simple as typing maven jar.

The Workflow Function plugin extension point

Here's what the atlassian-plugin.xml plugin descriptor contains:

<code>
<atlassian-plugin key="sendmail.jira.plugin.workflow.sendmail" name="SendMail Plugin">
  <plugin-info>
    <description>Plugin for sending emails on custom workflow transitions.</description>
    <version>1.0</version>
    <application-version min="3.0" max="3.0"/>
    <vendor name="Vincent Massol" url="http://blogs.codehaus.org/people/vmassol/"/>
  </plugin-info>
  <workflow-function key="sendmail-function" name="Send Notification Mail"
      class="sendmail.jira.plugin.workflow.SendMailFunctionPluginFactory">
    <description>Sends a notification email.</description>
    <function-class>sendmail.jira.plugin.workflow.SendMailFunction</function-class>
    <orderable>false</orderable>
    <unique>true</unique>
    <deletable>true</deletable>
    <weight>900</weight>
    <default>false</default>
    <resource type="velocity" name="view" location="sendmail-function-view.vm"/>
    <resource type="velocity" name="input-parameters" location="sendmail-function-input-params.vm"/>
  </workflow-function>
</atlassian-plugin>
</code>

What you have to understand:

  • A plugin is made of 2 Java classes: a plugin factory class (SendMailFunctionPluginFactory), which is in charge of setting up all that is necessary for the execution of the plugin feature, and the plugin execution class (SendMailFunction).
  • A workflow plugin is expected to bundle 2 Velocity template files: one for asking the user to input some data required by the plugin execution (this is the input-parameters Velocity template), and one for displaying what the function will do. The latter is visible if you click on a workflow transition in JIRA and then on the post-functions tab.

The Java API

The Plugin Factory class

Without further ado, here's the skeleton for the SendMailFunctionPluginFactory class:

public class SendMailFunctionPluginFactory extends AbstractWorkflowPluginFactory
    implements WorkflowPluginFunctionFactory
{
     public SendMailFunctionPluginFactory(FieldManager fieldManager)
     {
      }
 
     protected void getVelocityParamsForInput(Map velocityParams)
     {
      }
 
     protected void getVelocityParamsForView(Map velocityParams, 
          AbstractDescriptor descriptor)
     {
      }
 
     public Map getDescriptorParams(Map conditionParams)
     {
      }
}

Those 4 methods are called by JIRA itself:

  • The constructor is called when you click on the "add" button to add the function to your list of post-functions. The FieldManager instance can be used to get issue fields meta-data (it does not contain any issue data as there's no issue associated with the function yet - This will only happen when the function is triggered by an issue transition).
  • The getVelocityParamsForInput() method can be used to store some properties in the velocityParams map. These properties will then be accessible from the "input-parameters" Velocity template.
  • The getVelocityParamsForView() method can be used to store some properties in the velocityParams map. These properties will then be accessible from the "view" Velocity template. In addition the descriptor parameter provides access to the data entered by the user in the input phase (these data are stored in the workflow data structure itself).
  • The getDescriptorParams() method is the bridge between the data contained in the Velocity context and the data in the Workflow context. More precisely you put in there the code to extract the data that have been entered by the user in the Velocity context and you put the data in the workflow descriptor context. This descriptor context is the second parameter that is available in your getVelocityParamsForView() method.

Here's a look at the input-parameters velocity template:

<code>
<tr bgcolor=ffffff>
  <td align="right" valign="top" bgcolor="fffff0">
    <span class="label">Group emails:</span>
  </td>
  <td bgcolor="ffffff" nowrap>
    <input type="text" name="groupEmails" value=""/>
    <br><font size="1">Comma-separated list of JIRA groups to send emails to.</font>
  </td>
</tr>
<tr bgcolor=ffffff>
  <td align="right" valign="top" bgcolor="fffff0">
    <span class="label">Individual emails:</span>
  </td>
  <td bgcolor="ffffff" nowrap>
    <input type="text" name="individualEmails" value=""/>
    <br><font size="1">Comma-separated list of JIRA users to send emails to.</font>
  </td>
</tr>
</code>

As you can see, the variables groupEmails and individualEmails will hold the data entered by the user.
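As an illustration, here's a sketch of the kind of copying that getDescriptorParams() could do with those two fields. It is written against plain java.util.Map so that it runs without JIRA on the classpath. The class and helper names are made up, and the assumption that form values arrive either as a String or as a String[] (as in a servlet request parameter map) is mine and not something taken from the JIRA API.

```java
import java.util.HashMap;
import java.util.Map;

public class DescriptorParamsSketch {
    /**
     * Returns the first value stored under key, trimmed, or "" when it is absent.
     * Assumption: a value is either a String or a String[].
     */
    public static String firstValue(Map<String, Object> formParams, String key) {
        Object value = formParams.get(key);
        if (value instanceof String[]) {
            String[] values = (String[]) value;
            return values.length > 0 ? values[0].trim() : "";
        }
        return value == null ? "" : value.toString().trim();
    }

    /** Copies the two template fields into the map that would be kept with the workflow. */
    public static Map<String, String> toDescriptorParams(Map<String, Object> formParams) {
        Map<String, String> descriptorParams = new HashMap<String, String>();
        descriptorParams.put("groupEmails", firstValue(formParams, "groupEmails"));
        descriptorParams.put("individualEmails", firstValue(formParams, "individualEmails"));
        return descriptorParams;
    }

    public static void main(String[] args) {
        // Simulated form submission using the field names from the template
        Map<String, Object> formParams = new HashMap<String, Object>();
        formParams.put("groupEmails", new String[] {"jira-developers, jira-users"});
        formParams.put("individualEmails", "vmassol");
        System.out.println(toDescriptorParams(formParams));
    }
}
```

Keeping the keys identical to the input names in the template (groupEmails, individualEmails) means the same names can be used again when reading the data back later.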

The Plugin Function class

Here's the code that implements the plugin feature (in our case the sending of the notification email):

public class SendMailFunction implements FunctionProvider
{
     public void execute(Map transientVars, Map args, PropertySet ps)
     {
      }
}

The execute() method is called by JIRA when an issue transition happens.

The parameters have the following meanings:

  • The transientVars parameter holds useful data such as the issue that was modified. You get a reference to the issue by calling transientVars.get("issue");. It also contains other pieces of data such as the comment entered by the user, etc.
  • The args parameter holds all the data stored in the workflow context (aka the workflow descriptor). This is the data you have stored yourself in the getDescriptorParams() method explained above.
  • I'm not too sure what the ps parameter is used for. I