December 2004
[
vmassol
]
12:28, Friday, 31 December 2004
Here's an unordered list of the main problems causing build breaks that we found on the project I'm currently working on (note that this list is now a year old and that we have fixed some of them; unfortunately the majority still remain...). I've added some possible ideas on how to fix them.
- Build takes too long to execute (and thus it is executed less often)
  - Fix the build by having more subprojects with binary dependencies and/or streamline the build to ensure that only important build steps are run. Optimize it (e.g. offer different goals/targets: one for a clean build and another one that does not perform a clean).
- Local build not executed
- Public API breakage in dependent project without warning
- Not enough continuous commits (all packed up at end of iteration)
  - Team meetings to better explain the importance of continuous integration. Complementary idea: "unbreakable builds". The idea is that if you keep your changes to yourself and accumulate them, then when you want to commit them the unbreakable build will likely reject them because they break some other part of the code. You will then need to spend several days talking to other developers to fix not only your code but also theirs. After doing this several times, you should understand that it is in your best interest to commit frequently.
- No functional/integration automated tests (e.g. no local verification of ejb-jar deployments)
  - Automated functional tests! Build a suite slowly over time, improving it at each iteration. And maintain it! Decide on a good data handling strategy (this is usually the main issue). Ensure that your data strategy keeps everyone in sync with regard to DB data.
- Commit problems (forgetting to commit some files, problems due to the SCM tool, e.g. StarTeam: new directories do not appear in the StarTeam view!)
- Devs “building” with IDE but forgetting to use the automated build
- Checkstyle errors failing the build
  - Coaching. More team meetings to decide which Checkstyle errors should fail the build. Get a strong team buy-in. Complementary idea: "unbreakable builds".
- Failing unit tests
  - It probably means that the unit tests are actually integration tests depending on database data. Ensure that unit tests are fast and independent of the environment. Complementary idea: "unbreakable builds".
- [Maven] project.xml not up to date and missing dependency
  - SCM diff emails on check-ins (team by team) so that everyone knows what's happening. Complementary idea: "unbreakable builds".
- Database data modifications (voluntarily or involuntarily) leading to test breakage
- Continuous build not cleaned between different runs
  - Fix it. Perform a clean build from time to time.
- Local SCM update not done before local build (in order to get the latest files)
  - SCM diff emails on check-ins (team by team) so that everyone knows what's happening. That way you'll know better when to update your local workspace. Complementary idea: "unbreakable builds".
- Environment differences in local build vs central build
  - Work continuously towards making the developer's environment as close as possible to the integration environment. Complementary idea: "unbreakable builds". This allows executing the build on the server and thus it runs in the same environment as the continuous build.
- No local deployments done before commits (e.g. no EJB deployments)
  - Coaching (in order to ensure that developers do perform deployments on their machines before check-in) + add some checks in the build to automate the verification (they can be e.g. some hand-picked functional tests).
- Checkstyle errors hidden in tons of warnings
  - Fix it. Newest versions of Checkstyle allow filtering on severity.
- Non-atomic commits and central build starting with in-flight commits
  - Use a scheme a la CruiseControl (wait for some inactivity time on the SCM before triggering a build). Or change the SCM (to Subversion for example). Note: We have tried to use CC with StarTeam but even though the infrastructure team increased CPU + RAM, StarTeam falls over when it is polled by 3 or 4 CC builds in parallel... (Solution: Dump ST or ask Borland to come and tune the parameters). Complementary idea: "unbreakable builds". This forces "atomic" commits.
- [Distributed development] rsync issues: sometimes jars are corrupted or lost
  - Fix the rsync process (Note: this is now no longer happening I believe)
- [Distributed development] VPN instability making it difficult to SCM-update
  - Mostly fixed. However StarTeam is still extremely slow, making it hard to SCM-update from remote. Solutions: use an SCM that consumes less bandwidth and is less sensitive to response time (e.g. Subversion), increase bandwidth (but the issue is mostly with response time, which cannot be changed), or use a replication mechanism (I don't like this as I believe it introduces its own issues. I much prefer everyone working directly on the same repository, especially as I know it works: I've done it in the past using CVS with a team of 30 developers and it was working fine).
- Errors when executing the application
  - This is because there are no automated functional tests. Automate them!
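To make the "wait for some inactivity time on the SCM" idea above concrete, here is a minimal sketch in Python. The function name `ready_to_build` and the 5-minute default are assumptions chosen for illustration; this is not CruiseControl's actual implementation, only the quiet-period check such a scheme relies on.

```python
from datetime import datetime, timedelta


def ready_to_build(commit_times, now, quiet_period=timedelta(minutes=5)):
    """Return True when there are pending commits and the newest one
    is at least `quiet_period` old, i.e. nobody is mid-commit.

    `commit_times` is a list of datetime objects for commits that have
    not been built yet (how they are collected depends on the SCM).
    """
    if not commit_times:
        return False  # nothing new to build
    return now - max(commit_times) >= quiet_period
```

A polling loop would call this on every tick and start the build only once it returns True, which reduces the chance of picking up half of a non-atomic commit.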
[
vmassol
]
15:18, Wednesday, 29 December 2004
Let's create Unbreakable Builds
Out of my last two development projects, one had a strong sense of quality and excellence in general, and continuous build failures were the exception (about 3-4 per week for a team of 30 developers).
The other one was quite the opposite: everyone was surprised when the continuous build was passing (there were about 5 build breaks a day on average for a team of 40 developers).
I'm sure this is also pretty common on other projects. Obviously the best is to build (pun intended)
a build awareness in the team. However, this requires strong evangelists, who may not always be available, and other circumstances may make it difficult.
A thought struck me about a year back: what if we were able to prevent the continuous build from failing by design? There's a French saying that goes something like
"it's better to prevent than to cure". I think this is definitely a good idea to apply to continuous build failures. Why not make a continuous build system that cannot fail? At that time
I thought it was a nice idea (I had meant to blog about it but I forgot) but I could not see very well how it could work. Now, a year later, I really think it's a nice idea and I'd like
to explore it.
The architecture
A potential basic architecture is shown in figure 1.
The general principle is to catch the commit data before they get committed to the SCM, to perform a build and to perform the actual commit only if the build is successful.
Here are the detailed steps:
1. The developer performs a commit using his favorite SCM client tool. Note that it is best if the tool is able to perform the commit asynchronously so that the developer can continue working on something else.
2. The committed data are intercepted by a pre-commit hook script (all modern SCMs support this). This script is in charge of doing 2 things:
   - Finding out the list of projects to be built. Say that the commit contains 5 files belonging to 2 different projects: we need to rebuild these 2 projects. The algorithm for finding the projects to which the changed sources belong can be as simple as a mapping between the file paths (which contain the project name) and the project names.
   - Creating a build job and pushing it on a queue. The reason for the queue is that building all the projects on the machine that hosts the SCM is not going to scale. We want the SCM to be as responsive as before. Hence the queue.
3. We need build machines to perform the actual build. They could be dedicated build machines that continuously build the queued jobs. They could also be developer workstations. The concept is to have one or several build kicker applications installed on those machines. The "continuous build kicker" will continuously get a job from the build job queue and build it, whereas the "idle build kicker" will only pick a job to build when the machine is idle (hey, look around you and see how many machines are unused because people are on holiday, sick, in a meeting, etc. That's a lot of power).
4. The build kickers start by updating their workspace to get the latest files for the projects associated with the changed files. Then they try to "merge" the changed files into their workspace (note: this may be the tricky part to implement unless the SCM offers a way in the pre-commit hook to get the full file - I need to explore this). If the merge fails, they stop with an error message that flows back to the user. This can happen if someone else has been working on the same source and their change has made it to the SCM before ours. If the merge succeeds, the build kicker starts the build. The build has to be relatively quick so you should not build all the projects. I suggest building the modified projects and the ones that directly depend on them so that an API break can be detected (more on that below).
5. When the build is finished (or if an error occurs), the build kicker sends the result back to the pre-commit hook (using an RPC mechanism for example).
6. If the result is positive, the pre-commit script performs the real commit to the SCM.
7. The resulting message is returned to the user. In case of error the user would see, for example, the build console log.
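The steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: all names (`projects_for`, `BuildJob`, `build_kicker`, `pre_commit`) are hypothetical, the project name is assumed to be the first segment of each file path, an in-process queue stands in for the real job queue and RPC mechanism, and the workspace update and "merge" step are left out entirely.

```python
import queue
import threading
from dataclasses import dataclass, field


def projects_for(paths):
    """Map changed file paths to project names.

    Assumption: the project name is the first path segment.
    """
    return sorted({p.split("/", 1)[0] for p in paths if "/" in p})


@dataclass
class BuildJob:
    projects: list
    paths: list
    done: threading.Event = field(default_factory=threading.Event)
    ok: bool = False
    log: str = ""


def build_kicker(jobs, run_build):
    """A 'continuous build kicker': take jobs off the queue and build them.

    `run_build(projects)` is supplied by the caller and must return
    a (success, console_log) pair.
    """
    while True:
        job = jobs.get()
        if job is None:  # sentinel used to stop the kicker
            break
        job.ok, job.log = run_build(job.projects)
        job.done.set()  # stands in for sending the result back over RPC


def pre_commit(paths, jobs, timeout=600):
    """Queue a build for the commit and wait for its result.

    Returns (accepted, message); the real commit should only happen
    when `accepted` is True.
    """
    job = BuildJob(projects_for(paths), list(paths))
    jobs.put(job)
    if not job.done.wait(timeout):
        return False, "build timed out"
    return job.ok, job.log
```

To try it, create a `queue.Queue()`, start `build_kicker` in a daemon thread with a `run_build` function of your own, and call `pre_commit` with a list of changed paths. An "idle build kicker" would be the same loop with an extra check that the machine is idle before calling `jobs.get()`.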
Advantages
Here are the advantages of such a system:
- Does not break other developers upon a build failure. All developers can work uninterrupted while still working on HEAD in a continuous integration fashion
- Lowers the effort required to get a CI system working, and thus helps teams adopt CI
- Prevents breakage of APIs. Indeed in step 4 above, we've mentioned that a good strategy is for the build to build not only the projects that have changes but also all projects that directly use those
projects (one level). This will allow detecting unwanted API breakages.
- Increases self-confidence when committing which (I hope) will make it easier to get developers to commit continuously
- Allows continuing working on one's own machine (instead of having to wait for the current build to free the CPU which is being used at 100%!). You now get your own PBS (Personal Build Server)
- Forces atomic commits!
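The "changed projects plus one level of direct dependents" strategy mentioned in the API-breakage point can be sketched as follows. The function name `projects_to_build` and the shape of the `depends_on` mapping (project name to the list of projects it directly depends on) are assumptions for illustration; in a Maven setup that mapping would have to be derived from each project's declared dependencies.

```python
def projects_to_build(changed, depends_on):
    """Return the changed projects plus every project that directly
    depends on one of them (one level only, not transitive)."""
    changed = set(changed)
    dependents = {
        project
        for project, deps in depends_on.items()
        if changed & set(deps)
    }
    return sorted(changed | dependents)
```

For example, with `{"web": ["core"], "batch": ["web"], "core": []}` a change in `core` selects `core` and `web` but not `batch`, which keeps the pre-commit build short while still catching an API break in the direct users.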
Questions/Issues
I'm sure you're now burning with tons of remarks/questions showing why it wouldn't work :-) Here's what I've currently thought about. If you have any opinion or other questions, I'd love to hear them.
Q: What happens if someone else also commits a change to the same file?
It works in the same way as usual. The build kicker will try to "merge" the changes after having done a workspace update and if it cannot, the user will get an error explaining that the merge failed. The
user will then need to perform an update on his local machine and resolve the conflict.
Q: Imagine I perform a commit and I start working on a new feature. Then my commit is rejected because of a failure. How do I fix this without losing my current changes?
Answer 1: This is actually relatively similar to what you're currently doing. Imagine you're committing something. Then you start working on something new and the continuous build tells you 2 hours later
that your change has broken something. The difference is that your changes have been committed so you can easily create a new workspace and fix it there. We could do the same here by having the
pre-commit hook actually make your changes available through a URL (sent in the commit answer) as a patch so that it is easy for you to apply it to a fresh new checkout.
Answer 2: You wait till the build is finished on the server. You can perform other activities like documenting, reading, thinking, designing, writing new classes, new tests, etc. Basically you work on
stuff that does not conflict with the past changes. Actually this is probably what you're currently doing when your build is running as it is eating all your CPU...
Q: Doesn't it take too long to build?
You need to ensure your build takes as little time as possible. I think 5-10 minutes should be ok. The best way to achieve this is probably to use binary dependencies instead of rebuilding dependent
projects (a la Maven), except maybe direct dependencies. You'll still need a regular continuous build running to produce fresh binary dependencies. I guess it's also best to use an SCM client that
can do asynchronous commits in order to let you continue working while the commit is in progress.
Q: What if I want to modify an API but I want each project to modify its own files?
Several options:
- You could go through a deprecation cycle.
- You could be doing the refactoring on one machine only (not always possible)
- You could also plan it. In any case an API breakage has to be planned and communicated. Thus you could say: on such a day, at such an hour,
we're going to commit this break and we have 1 day to fix all our dependent projects. When this happens you can turn off this "unbreakable build" feature for the day.
The interesting point here is that you *want* the API breakage to be detected as the default instead of the opposite.
Conclusion
It seems to me this would be particularly useful on big projects with lots of developers. It should also be useful for introducing continuous integration on an existing project as it lowers the
discipline required from everyone. Obviously this is just an idea that I haven't tested yet. I'm very keen to see this in action. If any of you has any experience with it, please share it. I'm planning to spend some
time trying to implement it. If you're interested in helping out, let me know too.
[
vmassol
]
15:55, Saturday, 18 December 2004
Javapolis 2004 was a real success. In the past, I've attended
TheServerSide symposiums and Javapolis was very similar: packed with technical sessions and full of well-known speakers.
The setup was excellent (kudos to Stephan and his team) and the rooms were amazing. Look at the size of this screen!
Picture shot by (c) Philippe Kernevez
There were the usual suspects (Mike Cannon-Brookes, Rod Johnson, Cedric Beust, Gavin King, etc) but as it was happening close to France, it was
cool to see that almost all my OSSGTP fellows were also there (Ludovic Dubost, Henry Story, Benjamin Mestrallet, Francois Le Droff, Didier Girard).
In addition it was good to see Jerome Lacoste and Philippe Kernevez there.
I've had the pleasure of giving 2 sessions:
- Maven: A session explaining why you would use Maven and trying to show Maven from all its different angles.
I hope I have been successful in sharing my enthusiasm for Maven. The room was packed (I'd say around 400 to 600 people). Before starting my session I asked how many people were already
using Maven and counted about 20 (but at that time the room was only half-packed), so I'd say it was about 3-5%. My second question was "How many are planning to use Maven?" and I got a resounding
three quarters of the people raising their hands. That shows that Maven is still in its early adoption phase and that it has some great potential.
- AOSD: Agile Offshore: The last day was for the business track presentations and I gave one, an experience report
on 3 years of doing offshore development using an agile methodology.
It seems the tracks were all recorded and you should be able to watch them online very soon (I'm really excited to see that for myself as I missed a few sessions during the first days of the conference
and watching oneself on video is a good way to improve one's presentation skills... ;-)).
See you next year at Javapolis 2005!
[
vmassol
]
11:11, Saturday, 4 December 2004
When working using a Time-boxing approach with JIRA there are some typical
issue-smells that I have noticed appear frequently. In order to perform good
deliveries it is important to fight them.
- Issue smell 1: Too many unscheduled issues. This means
that new issues are not assigned to iterations, i.e. that they are not planned
to be fixed.
- Issue smell 2: Open issues from past iterations. Any
issue that is left from a previous iteration has to be rescheduled so that
everyone knows when it is planned to be fixed. If some portion of the issue has
been done, I've found that it is usually best to split the task into 2,
so that the work done in the iteration it was scheduled is clearly shown in the
release notes for that iteration and the unfinished part can be scheduled in
a future iteration.
- Issue smell 3: No iterations in changelog view. This means that
past iterations that are finished have not been JIRA-released. The good
thing about releasing an iteration is that it forces you to resolve the unfinished
issues (see Issue smell 2). In addition it cleans up the roadmap view,
which becomes less cluttered by past issues and gives a clear view
of what's left to be done. Lastly, it provides an important feeling of achievement.
- Issue smell 4: Issue types in issue description. I have often
noticed that some JIRA projects were using some description conventions for
some issue types. For example, using "XXX - Code review" for a code review
issue on the XXX feature. In that case, a real JIRA issue type should be
created. The reason is that by defining a proper JIRA issue type, it is now
possible to perform operations on this new issue type: it will appear
properly in the release notes under its own category, it can be searched for,
etc.
- Issue smell 5: Issue statuses are not in sync with reality. This
is often a big problem (especially with distributed teams) as people usually
rely on JIRA to provide an exact view of the progress. If issues are found
to be out of sync, there's a tendency to not "trust" JIRA anymore, which in turn
leads to using it less and losing visibility. One good strategy is to
do Issue Driven Development (IDD). It goes like this: When a task is done and
just before the code is checked in, ensure that the corresponding JIRA issue
is marked as Resolved/Closed. If there's no issue, create one (unless the
modification is a really minor one that the user should really not be
concerned with). Then check in the code mentioning the issue number in the
checkin comment (that allows for example using the JIRA CVS/Subversion plugins).
Note: If you're using CVS/Subversion you could write a quick pre-commit hook that verifies
that each comment has a reference to a JIRA issue.
- Issue smell 6: Lots of resolved (but not closed) issues. Most projects
I have seen do not use a Resolved state. However, people often mark the issue
as resolved but not closed and the issue stays in this state for ages without
anyone doing anything about it. So either remember to directly close issues
or, if you're using JIRA 3, create a custom workflow that does not have a Resolved
state (if you're not using the resolved state of course!).
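The pre-commit check suggested under Issue smell 5 could look roughly like the Python sketch below. The key pattern is an assumption (an uppercase project key, a dash and a number, such as ABC-123) and the function names are hypothetical. How the hook obtains the commit message is SCM-specific and not shown; with Subversion, for instance, a pre-commit hook can read it with `svnlook log -t`.

```python
import re
import sys

# Assumed issue key format: uppercase project key, dash, number (e.g. ABC-123).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def references_issue(message):
    """Return True if the commit message mentions something shaped like an issue key."""
    return ISSUE_KEY.search(message) is not None


def check_commit_message(message):
    """Return a hook exit code: 0 accepts the commit, 1 rejects it."""
    if references_issue(message):
        return 0
    print("Commit rejected: please reference a JIRA issue (e.g. ABC-123).",
          file=sys.stderr)
    return 1
```

Note that this only checks the shape of the key, so a message such as "UTF-8 cleanup" would pass; verifying that the issue really exists would require a query against JIRA.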
Let me know if you have found other important issue-smells when using JIRA!