Dependencies: To Version or Not To Version?
jdcasey
17:28, Friday, 22 April 2005
Many Maven users don't see the value of requiring version information for project dependencies. I'd like to take a second to try to shed some light on the Maven perspective on this...
The key thing to remember here is that Maven is a build system. It's a tool for build managers, and one which developers use when building a development copy on their localhost. As a result, Maven may conform more closely to a build manager's idea of how best to define a project. Many developers don't understand the apparent anality - if you'll allow me to use a PC term here ;) - of versioned dependencies. Since I've spent quite a bit of time managing builds, let me try to explain this perspective a little.
The whole idea of having a build script/system is automation of many, many very mundane commands into a single, reproducible step. Many people overlook this: it's not about saving time when building a project, and it's not about having a faster build...although these are obvious advantages of not typing in dozens of commands by hand. The *real* advantage of having a build system is knowing that when you produce some artifact that represents your project, it is produced *exactly* the same way every time, and with exactly the same contents - unless explicitly changed in the build scripts.
This is motivated by more than a build manager's aesthetic sense; it's to ensure, for example, that what is deployed to your test environment is what gets deployed to your production environment, or what gets deployed in the event of a system crash. It's to ensure that the build manager (a person viewed as an externality by many development teams, not unlike a sysadmin) doesn't introduce subtle errors during a build. Such errors usually have nothing to do with the project source code, and can be extremely difficult to troubleshoot as a consequence. I'm sure you'll agree that typing in two dozen commands by hand is a little error-prone. :)
The bottom line is: what version of a given dependency are you using? If you perform one build today, and another tomorrow (when testing is finished, or when lightning strikes your production hosting facility), will those two builds contain the same versions of these dependencies? If, months from now when you're working on v2.x of your project, one of your biggest clients reports a critical bug in your 1.x line, can you reproduce the build in order to reproduce the client's problem? Nobody questions that they should keep their source files under version control. Why should the other components of your product be different? If you don't version something, it's because you don't care if it changes. In that case, why is it a part of the project and build?
If you're a build manager and you see someone using something like 'javamail.jar' as a dependency in their project, you get scared. Really scared. It's ridiculous to assume that the javamail project has reached perfection and therefore will have no more releases, so what version of javamail is this project dependent upon? What particular mix of bugs vs. functionality does this project expect from javamail? If you deploy this thing to a production environment, is someone going to come 'round tomorrow and ask why the hell the production copy is failing, and the test copy works fine?? There is a reason build managers are often dubbed 'buildmonkeys'. They should only have to manage the mapping of environment-specific information into the configurations of projects...details usually considered mundane. In a perfect world, buildmonkeys could be replaced by a medium-IQ machine process. In our world, build managers have to manage an extremely complicated mapping of environment specifics to projects. It doesn't help when projects give incomplete information about what they need in order to function correctly.
As an example for the developer crowd, imagine that you have a database table containing bank account information, of which a subset of rows is viewable by a given client from the web. Your webapp has the ability to uniquely identify a client, and should be able to pass along enough information about his identity that only that client's information reaches the resulting web page. However, the db table you're using only uses (firstname, lastname) to identify a row, and the sensitive information within. If John Smith logs in, how can you be sure you're showing him the right bank account info?
Performing a build without versioned dependencies is similar to this. It's really a question of establishing identity, and maintaining that identity in order to provide a unique and coherent set of information. If you started your bank with only one John Smith, you're fine...until that John Smith logs in later, after you've picked up another client named John Smith. Then you may still supply the correct information to the user, but how can you be sure? You can't count on the continued uniqueness of "John Smith" any more than you can count on the continued uniqueness of "javamail.jar".
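To make the identity problem concrete, here's a minimal sketch in Java. Everything in it is hypothetical and for illustration only: `ToyRepository`, its methods, and the "name:version" key are my own inventions, not anything from Maven, and the javamail version numbers are just sample values. It stores the same library two ways, once under a bare file name and once under name plus version, and then publishes a second release.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy repository; none of these names come from Maven itself.
class ToyRepository {
    // Keyed by bare file name, like "javamail.jar": one slot per library.
    private final Map<String, String> byName = new HashMap<>();
    // Keyed by name plus version: one slot per release.
    private final Map<String, String> byCoordinate = new HashMap<>();

    void publish(String name, String version, String contents) {
        byName.put(name + ".jar", contents);             // silently replaces any earlier release
        byCoordinate.put(name + ":" + version, contents); // earlier releases stay untouched
    }

    // Returns whatever was published most recently under this name.
    String resolveByName(String name) {
        return byName.get(name + ".jar");
    }

    // Returns the exact release asked for, or null if it was never published.
    String resolveByCoordinate(String name, String version) {
        return byCoordinate.get(name + ":" + version);
    }
}

class DependencyIdentityDemo {
    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        repo.publish("javamail", "1.3.1", "contents of release 1.3.1");
        repo.publish("javamail", "1.3.2", "contents of release 1.3.2");

        // The unversioned lookup now answers with the newer release,
        // whether or not that is what the project was tested against.
        System.out.println(repo.resolveByName("javamail"));

        // The versioned lookup still answers with the release that was named.
        System.out.println(repo.resolveByCoordinate("javamail", "1.3.1"));
    }
}
```

The unversioned lookup never fails, which is exactly the trouble: it hands back *something* called javamail.jar, and nothing in the build tells you which something it was.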
[TOPIC DRIFT :) ]
The reason to use Maven over Ant or some other build system is more a question of portability and consistency for your build method across multiple projects. It also helps you address many of these build-related concerns, without having to evolve your own solution. Maven also aims to be reasonably flexible, so how much of this knowledge you use is ultimately up to you. However, this is more of a discussion for another time.