February 2004
[ vmassol ] 09:06, Wednesday, 18 February 2004

Warning: Don't take this comparison too seriously... ;-) It's a bit like comparing apples and oranges and I'm sure the analogy breaks quite quickly if you pursue it too far. However, I do believe that the contained message is true.

I've just realized that collaborative offshore and EJB have something in common: they both use a distributed model. By collaborative offshore, I mean teams developing on both sides (onsite and offshore) and interacting continuously to build a system.

I've been working on 2 big offshore projects over the past 2.5 years (with an Indian partner) and I've found that there is one organizational model that does not work well: onsite people directly managing offshore developers (Figure 1). In the same way, calling Entity Beans directly from the client side is a bad practice because 1) it involves a lot of network round-trips and is thus inefficient and 2) it does not allow changing the implementation without affecting the client (Figure 2).

Figure 1: Onsite Project lead managing directly offshore developers
Figure 2: Client calling Entity beans directly

What is the solution we have used for EJBs? Answer: introduce a facade (Session Bean) which "manages" the underlying components (Figure 3). I've found that it is the same with collaborative offshore: there is a strong need to always introduce a local Project Lead (Figure 4).

Figure 3: EJB client calling a facade
Figure 4: Onsite Project Lead interacting with a local Project Lead
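
To make the round-trip argument concrete, here is a minimal sketch in plain Java. It does not use the real EJB APIs: `Wire`, `OrderLine`, `RemoteOrderLine` and `OrderFacade` are made-up stand-ins, and a "round-trip" is just a counter incremented whenever a call is assumed to cross the network.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts simulated network round-trips (hypothetical helper). */
class Wire {
    int roundTrips = 0;
}

/** Server-side data; a plain-Java stand-in for an Entity Bean. */
class OrderLine {
    final int quantity;
    final int unitPriceCents;

    OrderLine(int quantity, int unitPriceCents) {
        this.quantity = quantity;
        this.unitPriceCents = unitPriceCents;
    }
}

/** Remote view of one entity: every getter is assumed to cross the wire. */
class RemoteOrderLine {
    private final OrderLine target;
    private final Wire wire;

    RemoteOrderLine(OrderLine target, Wire wire) {
        this.target = target;
        this.wire = wire;
    }

    int getQuantity() { wire.roundTrips++; return target.quantity; }
    int getUnitPriceCents() { wire.roundTrips++; return target.unitPriceCents; }
}

/** Facade (the Session Bean role): one coarse-grained remote call. */
class OrderFacade {
    private final List<OrderLine> lines;
    private final Wire wire;

    OrderFacade(List<OrderLine> lines, Wire wire) {
        this.lines = lines;
        this.wire = wire;
    }

    int totalCents() {
        wire.roundTrips++; // the only call the client makes across the wire
        int total = 0;
        for (OrderLine line : lines) {
            total += line.quantity * line.unitPriceCents; // local access
        }
        return total;
    }
}

class FacadeSketch {
    /** Client talking to each entity directly: two round-trips per line. */
    static int totalWithoutFacade(List<RemoteOrderLine> remoteLines) {
        int total = 0;
        for (RemoteOrderLine line : remoteLines) {
            total += line.getQuantity() * line.getUnitPriceCents();
        }
        return total;
    }

    public static void main(String[] args) {
        List<OrderLine> lines = new ArrayList<>();
        lines.add(new OrderLine(2, 500));
        lines.add(new OrderLine(1, 1250));
        lines.add(new OrderLine(4, 100));

        Wire direct = new Wire();
        List<RemoteOrderLine> remoteLines = new ArrayList<>();
        for (OrderLine line : lines) {
            remoteLines.add(new RemoteOrderLine(line, direct));
        }
        int totalDirect = totalWithoutFacade(remoteLines);

        Wire viaFacade = new Wire();
        int totalFacade = new OrderFacade(lines, viaFacade).totalCents();

        System.out.println("direct: total=" + totalDirect + " roundTrips=" + direct.roundTrips);
        System.out.println("facade: total=" + totalFacade + " roundTrips=" + viaFacade.roundTrips);
    }
}
```

Both paths compute the same total, but the direct client pays two round-trips per order line while the facade client pays one in all. The facade also hides how the total is computed, so the implementation can change without touching the client.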

It probably sounds very obvious, but onsite managers new to offshore often have an initial tendency to manage offshore "resources" directly (in order to reduce support costs). So far, whenever they've tried, it has failed (we told them it wasn't a good idea, but it seems some people need to see it for themselves to believe it... ;-)).

I wonder if there'll be a time in the future when our communication skills will be so great that they will allow direct management across the wire. Probably... but this is still in the future...

Has this also been your experience?

[ vmassol ] 09:18, Monday, 16 February 2004

Applying a working build strategy for testing against a database is not easy. It depends on the complexity of the database model and on the size of the teams. However, I've found that the strategy described below is the one that has worked best on the projects I have been involved in:

  • Do not mix unit tests that are independent of the environment (i.e. where interactions with the environment are stubbed/mocked) with integration unit tests (IUT). They have to be separated and put in different locations in the SCM. The reason is that the 2 kinds of tests do not support the same execution workflow. More below.
  • Have a database build project (in the sense of an Ant or Maven project) in your SCM. This is extremely important. The goal of this project is to provide the following build targets/goals:
    • create-schema: create the database schema from the ground up (in the database specified as properties)
    • load-static-data: loads static data (i.e. read only data)
    • load-minimal-data: loads a functionally minimal set of data. It should contain all data required functionally but only 1 or a few entries of each type. It's not supposed to reflect the state of the database when in production.
    • load-full-data: loads a full set of data as expected in production.
  • Put the database data (minimal + full sets) in your SCM as flat files (as opposed to keeping the data live in the database). The reasons for this are:
    • you get automatic notification of data changes by using the send-email-on-commit feature that all good SCMs have
    • it is build-friendly and allows automated and controlled builds
    • it is controlled, i.e. you know what you're doing with your data, who is modifying them, you can revert if need be, etc
  • Here's the workflow for executing IUT or functional tests. For each project and before the test suite runs:
    • execute database:create-schema
    • execute database:load-static-data
    • execute database:load-minimal-data

    Then, each test should also have the opportunity to load data in its setup (using DBUnit or similar). This is required for example to test special cases where the database is missing some required data and we wish to verify the exception handling part of the code. It is also required if the test requires more than the minimal data set (although that should be relatively infrequent).

    Note that the tests can also be ordered to save some database load time. Although it is not the best strategy, I've found that this was sometimes required on projects with complex database models.
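
The workflow above could be sketched roughly as follows. This is only an illustration: the `DatabaseBuild` interface, the recording implementation and the `IntegrationTest` holder are all made-up names, and a real setup would call the Ant or Maven targets and a tool such as DBUnit instead of recording step names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java view of the database build project's targets. */
interface DatabaseBuild {
    void createSchema();
    void loadStaticData();
    void loadMinimalData();
    void loadDataSet(String name); // extra per-test data (DBUnit or similar)
}

/** Records the steps instead of touching a real database. */
class RecordingDatabaseBuild implements DatabaseBuild {
    final List<String> steps = new ArrayList<>();

    public void createSchema() { steps.add("create-schema"); }
    public void loadStaticData() { steps.add("load-static-data"); }
    public void loadMinimalData() { steps.add("load-minimal-data"); }
    public void loadDataSet(String name) { steps.add("load:" + name); }
}

/** One integration test: an optional extra data set plus the test body. */
class IntegrationTest {
    final String extraDataSet; // null when the minimal data set is enough
    final Runnable body;

    IntegrationTest(String extraDataSet, Runnable body) {
        this.extraDataSet = extraDataSet;
        this.body = body;
    }
}

class IntegrationSuiteSketch {
    static void run(DatabaseBuild db, List<IntegrationTest> tests) {
        // Once per project, before the test suite runs.
        db.createSchema();
        db.loadStaticData();
        db.loadMinimalData();

        for (IntegrationTest test : tests) {
            // Each test may load its own variation on top of the minimal set.
            if (test.extraDataSet != null) {
                db.loadDataSet(test.extraDataSet);
            }
            test.body.run();
        }
    }

    public static void main(String[] args) {
        RecordingDatabaseBuild db = new RecordingDatabaseBuild();
        List<IntegrationTest> tests = new ArrayList<>();
        tests.add(new IntegrationTest(null, () -> {}));
        tests.add(new IntegrationTest("missing-customer", () -> {}));
        run(db, tests);
        System.out.println(db.steps);
    }
}
```

The point of the shape is that the three shared steps run once, in a fixed order, while anything test-specific stays in the test's own setup.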

On the other hand, here are strategies that have not been working so well for me in the past:

  • Have a live database where developers can directly update data. The problems encountered were:
    • it is not controlled. You do not know who is adding data or what is being modified. You cannot easily revert a change
    • it's difficult with distributed teams as you need to set up a replication mechanism. The problem is that developers often update their local database and forget to update the master database, which leads to lots of build failures. The solution described above does not suffer from this problem.
    • It's hard to sync everyone on the exact same set of data. Some minimal data + variations works best.
  • Provide no minimal data set and let developers write from scratch the data their tests require, loading it before each test. This does work for small projects with simple database models but not for complex ones, which really need a minimal data set.

Is that also your experience?

[ vmassol ] 08:23, Wednesday, 11 February 2004

I'm revisiting an old entry I posted about a year ago about Starteam woes. The reason is that the project I'm working on is delivering a first release soon and we'll be attacking the second leg of the journey... and it may be time to lobby for a source repository change... :-). Thus I need to prepare my ammunition again. By running it past you guys I hope to flush out the inaccuracies in my points and possibly find new arguments in favor of... CVS. Yeah, I am biased!

Here's what I feel is wrong with Starteam:

  • No "clean checkout" option. That is, if a file is deleted from the StarTeam repository, even if you perform a checkout all, the deleted files will not be removed from your local working copy. Actually it is possible but only through the command line interface.
  • No ability to send email diffs on commits (using tools like CVSSpam)
  • No nice IDE integration such as the CVS integrations we can see in IntelliJ, Eclipse, NetBeans, JBuilder, etc. More specifically the ability to see exactly what files are not in sync with the repository.
  • Limited integration with the majority of development tools: limited integration in Maven (BTW, it exists at all only because Emmanuel Venisse developed it on this very project!), etc.
  • No JIRA integration. It seems there's a nice CVS integration which lets you link source code to the issues it resolves by entering the issue number in CVS commit messages. As we're using JIRA we could use this!
  • No possibility to run an Ant or Maven build every time there's a commit (a la Damage Control). That would allow our build to be in a better shape.
  • Starteam is very slow on WAN links. It may be due to our project policy to use locking on files. In any case if we had CVS we wouldn't have used this locking which is hampering productivity.
  • No windows explorer integration. To use Starteam you need to open yet another GUI application and perform operations from there (unless you use the command line but nobody is using it here). CVS has a nice TortoiseCVS client.
  • With Starteam a major problem for our build is that people forget to check in directories. No wonder, as the Starteam GUI client does not show new directories at all!
  • Starteam is quite expensive and as a result we have only a limited number of fixed licenses. Anyone using floating licenses gets disconnected every few minutes. Very annoying.
  • Starteam admin seems more complex than CVS's. We have often had database problems over the past year.

Any more? Any inaccuracy in there (I'm sure there are as I am biased!)?

[ vmassol ] 20:22, Tuesday, 10 February 2004

I've just learnt that there was a name for "a computer-generated test that humans can pass but computer programs cannot". It's called a captcha. You can see those on some web sites during registration. Here's an example:

captcha.jpg

Some of my Octo workmates are developing a Java framework for generating captchas called JCaptcha. What I find interesting is one possible use of captchas: preventing blog spam. The idea is that anyone entering a blog comment would have to be a human. What I don't know is whether blog spam is done manually by individuals or whether it's automated. In any case this solution would prevent automated blog spam, which is a good first step!
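
As a rough illustration of how a comment form could be guarded this way, here is a small sketch. It is not the JCaptcha API: `CommentGate` and its methods are invented for the example, and the image generation itself (the hard part) is left out. The answer is simply assumed to be the text rendered in the picture.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical gate in front of a blog comment form. */
class CommentGate {
    private final Map<String, String> expectedAnswers = new HashMap<>();
    private int nextId = 0;

    /** Registers a challenge and returns the id to embed in the form. */
    String issue(String answerShownInImage) {
        String id = "c" + (nextId++);
        expectedAnswers.put(id, answerShownInImage);
        return id;
    }

    /** Accepts the comment only if the typed response matches; single use. */
    boolean accept(String challengeId, String response) {
        // remove() makes each challenge usable once, so a solved captcha
        // cannot be replayed by a script.
        String answer = expectedAnswers.remove(challengeId);
        return answer != null
                && response != null
                && answer.equalsIgnoreCase(response.trim());
    }
}

class CommentGateSketch {
    public static void main(String[] args) {
        CommentGate gate = new CommentGate();
        String id = gate.issue("x7Kq2");
        System.out.println(gate.accept(id, " x7kq2 ")); // true
        System.out.println(gate.accept(id, "x7Kq2"));   // false: already used
    }
}
```

Making each challenge single-use is a design choice of this sketch; without it, one human-solved captcha could be reused for any number of automated comments.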

The JCaptcha project has just released a beta version. I guess one next step could be the creation of a Movable Type plugin. Then I would hope to convince Bob to let us try it on Codehaus blogs :-)

Update 18/04/04: It seems that James Seng beat me to this captcha idea. Not only the idea but also an MT implementation. In addition, he's created yet another MT plugin for preventing comment spam by implementing an MT Bayesian filter.

[ vmassol ] 15:28, Tuesday, 10 February 2004

I started thinking about what would be the best possible IDE back in 2001. At that time, I had tried Sun's Jini and liked it quite a lot. I linked the 2 concepts (IDE and Jini) and came up with the idea of a Jini-based IDE, and wrote down some ideas (here and here). However, I did not pursue this idea as creating a full-fledged IDE is a major undertaking and I had neither the time nor the wish to do so!

However, even today in 2003, I still think it had some nice ideas that I would like to see in existing and future IDEs. Here are some ideas about this Jini IDE:

  • Each module would be a Jini service. Examples of modules are: javac compiler module, RMI compiler module, java editor module, source repository module, java execution module, junit execution module, etc.
  • It would be lightweight. It would be able to bootstrap with a minimal jar containing only the "microkernel" (+ possibly a module cache manager). Thus you could move from one machine to another easily. To install it, simply click on a browser link and download this minimal core. The rest will be downloaded as need be when you need the modules.
  • As each module is a Jini module, there would be 3 possibilities to implement a module:
    • The Proxy contains only local methods and there is no communication between the proxy and the back-end service,
    • The Proxy is a “smart proxy”, i.e. there are both local and remote methods which communicate with the back-end service,
    • The Proxy is only a client stub, all the methods are remote
  • It would be completely distributed. For example, the compile menu of the IDE would list all compiler module implementations the IDE has been able to discover when contacting the different Jini lookup services.
  • It would be self-healing: if a module is no longer available on a given server, another replacement will be automatically discovered (by the magic of Jini leases).
  • By using Jini leases, the IDE would support hot-patching/uninterrupted services
  • It would be secure, using Jini security for modules. This would make it possible to support both open source modules and commercial modules.
  • What would be nice would be to have a repository service which would automatically save edited code on the server side (in a user-private zone). This would enable remote building. Developers on the move would be able to get their environment set up rapidly on any machine. Same if you wish to share environment with someone else, etc.
  • It would be completely modular, with caching done on the client side to improve performance. The modularity should allow the creation of module repositories on the web. It would also allow creating an "à la carte" IDE.
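
The discovery and self-healing ideas in the list above can be sketched in a few lines. This is plain Java, not the real Jini API: `ModuleRegistry`, `ModuleRegistration` and the provider names are made up, and time is passed in explicitly so the behaviour is easy to follow. The idea it tries to capture is that a module stays visible only while its lease is renewed.

```java
import java.util.ArrayList;
import java.util.List;

/** One module offer held by the registry until its lease runs out. */
class ModuleRegistration {
    final String type;     // e.g. "compiler"
    final String provider; // e.g. "javac@hostA" (made-up name)
    long leaseExpiresAt;   // absolute time in milliseconds

    ModuleRegistration(String type, String provider, long leaseExpiresAt) {
        this.type = type;
        this.provider = provider;
        this.leaseExpiresAt = leaseExpiresAt;
    }
}

/** Hypothetical lookup service with lease-based registrations. */
class ModuleRegistry {
    private final List<ModuleRegistration> registrations = new ArrayList<>();

    ModuleRegistration register(String type, String provider, long now, long leaseMillis) {
        ModuleRegistration r = new ModuleRegistration(type, provider, now + leaseMillis);
        registrations.add(r);
        return r;
    }

    /** A healthy module keeps renewing; a dead one simply stops. */
    void renew(ModuleRegistration r, long now, long leaseMillis) {
        r.leaseExpiresAt = now + leaseMillis;
    }

    /** Providers of the given type whose lease is still valid at 'now'. */
    List<String> lookup(String type, long now) {
        List<String> live = new ArrayList<>();
        for (ModuleRegistration r : registrations) {
            if (r.type.equals(type) && r.leaseExpiresAt > now) {
                live.add(r.provider);
            }
        }
        return live;
    }
}

class JiniIdeSketch {
    public static void main(String[] args) {
        ModuleRegistry registry = new ModuleRegistry();
        registry.register("compiler", "javac@hostA", 0, 1000);
        ModuleRegistration b = registry.register("compiler", "jikes@hostB", 0, 1000);

        // What the IDE's compile menu would list at t=500.
        System.out.println(registry.lookup("compiler", 500)); // [javac@hostA, jikes@hostB]

        // hostA goes away and stops renewing; hostB renews at t=900.
        registry.renew(b, 900, 1000);
        System.out.println(registry.lookup("compiler", 1500)); // [jikes@hostB]
    }
}
```

Nothing has to detect that hostA died: its registration just expires, and the next lookup returns whatever is still alive.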

Of course, this is a bit utopian as we would need to overcome several difficulties:

  • definition of standard module interfaces. However, lots of work has already been done there by NetBeans and Eclipse and that could be reused.
  • there would need to be some module certification tests to ensure a module is properly coded, does not hog the IDE, plays well with others, etc

Of course, this IDE of the future could be implemented in a technology other than Jini (P2P, Web services, etc). I still believe Jini is way ahead of web services, although they are slowly catching up on security and transactions. To my knowledge there's still no notion of "leases" in web services, nor of code that moves around the network and can execute locally.

What do you think? Is that Jini-like IDE something you would also like to see in the future?

Update: This blog entry has been reposted on TSS (several interesting comments).

[ vmassol ] 10:34, Monday, 9 February 2004

Imagine you have a continuous build system in place that automatically builds your projects every few hours. When the team is large it can be quite challenging to coach all team members to be careful about the build and to run it locally on their machine before committing code. There are also other problems, such as the build working locally but not on the continuous build machine.

Anyway, I've found that there are different kinds of projects: some where the build is taken very seriously and a build-aware mentality quickly spreads, and others where people do see the value of the build but have more difficulty taking it seriously (leading to lots of build failures).

One idea that I've had is to use a physical artifact to represent a build success or a build failure. People like to see and touch things. Doing some research I've found the Ambient Orb which seems close enough to what I have in mind:

ambient-orb-alt1.jpg

(Image stolen from ThinkGeek)

The idea is that the orb will turn more and more red depending on the number of projects that failed to build during the past cycle.
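
One simple way to turn that into a colour is a linear blend. The sketch below assumes the orb can be given an RGB value and that an all-green orb means "nothing failed"; both are assumptions of mine, and `BuildOrb.colorFor` is a made-up helper, not part of any Ambient Orb API.

```java
class BuildOrb {
    /**
     * Maps the share of failed projects to a colour.
     * Returns {red, green, blue}, each in the range 0-255.
     */
    static int[] colorFor(int failedProjects, int totalProjects) {
        if (totalProjects <= 0 || failedProjects < 0 || failedProjects > totalProjects) {
            throw new IllegalArgumentException("need 0 <= failed <= total and total > 0");
        }
        // Integer arithmetic: red grows linearly with the failure ratio.
        int red = (255 * failedProjects) / totalProjects;
        return new int[] { red, 255 - red, 0 };
    }

    public static void main(String[] args) {
        int[] c = colorFor(1, 4);
        System.out.println(c[0] + "," + c[1] + "," + c[2]); // 63,192,0
    }
}
```

With 4 projects, no failure gives pure green (0,255,0) and 4 failures give pure red (255,0,0), with evenly spaced shades in between.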

I have yet to buy one and try it but I like the idea. Has anyone done this already? Can it be done easily with the Ambient Orb? Are there devices other than the Ambient Orb on the market (for example, I don't need the wireless radio network connection at all especially as I live in France)?

Update: This blog entry has been reposted on TSS.