IntelliB
[ vmassol ] 09:54, Thursday, 13 July 2006

(Updated 2006-07-14: Added section on discovering modules and added disclaimer at the end)

IntelliJ IDEA has revolutionized the IDE landscape by adding "intelligence" to IDEs. A few days ago I did a thought experiment by asking myself the following question: "how feasible would it be to build a project without knowing any meta-data about it?" In other words, is it possible for a build tool to be intelligent enough to build a project without build files or POMs? Said differently, is it possible to figure out a project's POM automatically? Let's review some of the meta-data that is typically required and see how it could be guessed.

Source locations

It is possible to guess where sources are by looking for *.java files (for Java projects; the same applies to other project types). We still need to differentiate main sources from test sources, but that's also relatively easy to do. We can check for classes extending JUnit's TestCase, for example, or the TestNG equivalent, or the equivalent in any other well-known testing framework.
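As a rough sketch of this discovery step, the Java class below walks a directory tree, picks up *.java files and splits them into test and main sources with a few textual rules. The class name SourceClassifier, its method names and the rule list are all made up for illustration; a real tool would probably parse the sources properly and load its rules from somewhere shared.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: guesses whether a Java source file is test code. */
final class SourceClassifier {
    // Assumed "Test Discovery Rules": plain regexes, not a real parser.
    private static final List<Pattern> TEST_RULES = List.of(
        Pattern.compile("\\bextends\\s+(junit\\.framework\\.)?TestCase\\b"),
        Pattern.compile("^\\s*import\\s+(static\\s+)?org\\.junit\\.", Pattern.MULTILINE),
        Pattern.compile("^\\s*import\\s+(static\\s+)?org\\.testng\\.", Pattern.MULTILINE));

    /** Returns true if any rule matches the source text. */
    static boolean isTestSource(String javaSource) {
        return TEST_RULES.stream().anyMatch(rule -> rule.matcher(javaSource).find());
    }

    /** Walks the tree under root; key true holds test sources, key false holds main sources. */
    static Map<Boolean, List<Path>> partitionSources(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(p -> p.toString().endsWith(".java"))
                .collect(Collectors.partitioningBy(p -> isTestSource(read(p))));
        }
    }

    private static String read(Path file) {
        try {
            // ISO-8859-1 decoding never fails, and the ASCII keywords we look for survive it.
            return new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The common parent directories of each group would then be candidates for the main and test source roots.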

Note: An interesting thing here is that to be intelligent we'd need the help of the community to add new rules to the discovery process. For example imagine that a new testing framework appears; we'd need to add it to the Test Discovery Rules. Thus, this type of intelligent build system would need to rely a lot on the community and thus would need to get its data from an online repository that could be edited by the community.

Dependencies

How do we detect project dependencies? One relatively easy way is to parse the sources that we have found above and find all external imports, then query ibiblio to find matching package names (this information is present in Maven POMs on ibiblio). As for guessing the version, there's no easy magic. A first approach would be to get the latest released version of each dependency we've found.
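The first half of this step, collecting external imports, could look roughly like the sketch below. ImportScanner and externalImports are hypothetical names, the project's own package prefix is assumed to be known, and the lookup against a repository is deliberately left out because it depends on whatever index is available.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects imports that point outside the JDK and the project. */
final class ImportScanner {
    // Captures the imported name; a trailing ".*" is dropped from the capture.
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+(?:static\\s+)?([\\w.]+?)(?:\\.\\*)?\\s*;", Pattern.MULTILINE);

    static Set<String> externalImports(String javaSource, String projectPackagePrefix) {
        Set<String> result = new TreeSet<>();
        Matcher m = IMPORT.matcher(javaSource);
        while (m.find()) {
            String name = m.group(1);
            // Simplification: only java.* is treated as JDK. javax.* is mixed
            // (some packages ship with the JDK, others such as javax.servlet do not)
            // and would need a finer rule.
            boolean jdk = name.startsWith("java.");
            boolean own = name.startsWith(projectPackagePrefix + ".");
            if (!jdk && !own) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Each returned name would then be matched against the packages known for published artifacts; how to do that lookup is the open part of the idea.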

Project type

Project types can easily be guessed by looking at some files. For example if a web.xml file is present then it's a WAR project, if an application.xml one is found then it's an EAR project, if a jnlp file is found then it's a JNLP project, etc.
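A minimal sketch of these rules, assuming the list of project files has already been collected. The names ProjectTypeGuesser and guessPackaging, the fallback to "jar" and the order in which the rules are checked (EAR before WAR, since an EAR project may contain a web module) are assumptions of this example, not something the rules above prescribe.

```java
import java.nio.file.Path;
import java.util.Collection;

/** Hypothetical helper: guesses the packaging type from well-known descriptor files. */
final class ProjectTypeGuesser {
    static String guessPackaging(Collection<Path> files) {
        boolean ear = false, war = false, jnlp = false;
        for (Path file : files) {
            Path last = file.getFileName();   // null for a root path
            if (last == null) {
                continue;
            }
            String name = last.toString();
            if (name.equals("application.xml")) ear = true;
            if (name.equals("web.xml")) war = true;
            if (name.endsWith(".jnlp")) jnlp = true;
        }
        // Assumed priority: the most "enclosing" project type wins.
        if (ear) return "ear";
        if (war) return "war";
        if (jnlp) return "jnlp";
        return "jar";                         // assumed default when nothing matches
    }
}
```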

SCM

SCM can easily be guessed by looking for special directories in the project's working copy. For example we would look for CVS directories for CVS and for .svn directories for Subversion, etc.
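This check is small enough to sketch directly. A CVS working copy keeps a directory named CVS in each checked-out directory, and a Subversion working copy keeps a .svn directory. ScmGuesser and guessScm are hypothetical names, and only the project root is inspected here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: guesses the SCM from administrative directories in the project root. */
final class ScmGuesser {
    static Optional<String> guessScm(Path projectRoot) {
        if (Files.isDirectory(projectRoot.resolve("CVS"))) {
            return Optional.of("cvs");
        }
        if (Files.isDirectory(projectRoot.resolve(".svn"))) {
            return Optional.of("svn");
        }
        return Optional.empty();   // unknown SCM, or not a working copy at all
    }
}
```

Supporting another SCM would mean adding one more marker check, which is the kind of rule the community could contribute.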

Developers

Once we have the SCM URL we can then query the SCM to get the list of all developers.

Project name

The project name could be the name of the top level directory and the version could be set arbitrarily to 1.0. Actually we could even check ibiblio to see if the project is already on ibiblio, get the latest version there and increase the minor number by one as a first order guess. Another strategy would be to query the SCM and look for tags and deduce existing versions by parsing those tags (there are some usual conventions for naming tags so it should be possible to make a good guess).
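The tag-based strategy could be sketched as follows, assuming the tag names have already been fetched from the SCM. VersionGuesser and nextVersion are hypothetical names, and the single pattern used here (two numbers separated by a dot or an underscore) is only one guess at the "usual conventions".

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derives the next version from existing SCM tag names. */
final class VersionGuesser {
    // Matches "1.2" or "1_2" anywhere in a tag name, e.g. "release-1.2" or "PROJECT_1_2".
    // Digit runs are capped at 9 so Integer.parseInt cannot overflow.
    private static final Pattern VERSION = Pattern.compile("(\\d{1,9})[._](\\d{1,9})");

    static String nextVersion(List<String> tags) {
        int bestMajor = -1;
        int bestMinor = -1;
        for (String tag : tags) {
            Matcher m = VERSION.matcher(tag);
            if (!m.find()) {
                continue;   // tag carries no recognizable version
            }
            int major = Integer.parseInt(m.group(1));
            int minor = Integer.parseInt(m.group(2));
            if (major > bestMajor || (major == bestMajor && minor > bestMinor)) {
                bestMajor = major;
                bestMinor = minor;
            }
        }
        if (bestMajor < 0) {
            return "1.0";   // arbitrary default when no version is found
        }
        return bestMajor + "." + (bestMinor + 1);   // bump the minor number by one
    }
}
```

The same bump could be applied to the latest version found in the repository instead of the latest tag.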

Modules and artifacts

Discovering the different modules of a project is probably one of the hardest things to do. If you look at different projects in the wild, I believe there are not that many directory structures out there, maybe 10-15. Thus it should be possible to register knowledge of these structures and let the tool discover which one matches the project at hand most closely. This would also make it possible to deduce the different artifacts that have to be generated. Of course it won't be perfect, as there are projects which generate several artifacts from the same module. Again it's a question of doing 80% of the job and leaving 20% to be done manually.
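One very simple way to picture "closest match" is to describe each known structure by a few marker directories and score a project by the fraction of markers it contains. Everything in the sketch below is an assumption made for illustration: the LayoutMatcher class, the three registered layouts and their markers, and the scoring rule itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper: finds the registered directory layout closest to a project. */
final class LayoutMatcher {
    // Illustrative registry only; a real one would be larger and community-maintained.
    static final Map<String, Set<String>> KNOWN_LAYOUTS = new LinkedHashMap<>();
    static {
        KNOWN_LAYOUTS.put("maven2", Set.of("src/main/java", "src/test/java"));
        KNOWN_LAYOUTS.put("maven1", Set.of("src/java", "src/test"));
        KNOWN_LAYOUTS.put("flat", Set.of("src", "test"));
    }

    /** projectDirs holds directory paths relative to the project root, using '/' separators. */
    static Optional<String> closestLayout(Set<String> projectDirs) {
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Set<String>> layout : KNOWN_LAYOUTS.entrySet()) {
            long hits = layout.getValue().stream().filter(projectDirs::contains).count();
            double score = (double) hits / layout.getValue().size();
            if (score > bestScore) {   // on a tie the layout registered first wins
                best = layout.getKey();
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);   // empty when no marker matched at all
    }
}
```

Running the same match on each subdirectory would be one way to spot modules, though that part is not shown here.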

Additional information

Of course, the information found above is just a set of guesses. In most cases the guesses could be correct, but we would still need to offer a way for the user to edit them and to add any missing information.

Conclusion

I believe it should be possible to create such an intelligent meta-build project which could be used to generate files for one of the existing build systems such as Maven, Ant, etc. For example it could create an internal POM file on which Maven could then be executed to produce the build results. At a minimum such a tool could be used to convert existing projects to Maven. I wonder how intelligent it could be but I guess it could go pretty far.
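To make the last step concrete, here is a minimal sketch that turns a few guessed values into a POM. PomSketch and minimalPom are hypothetical names, the groupId is assumed to have been guessed as well, and no XML escaping is done, so this only works for plain identifiers.

```java
/** Hypothetical helper: renders guessed metadata as a minimal Maven 2 POM. */
final class PomSketch {
    static String minimalPom(String groupId, String artifactId, String version, String packaging) {
        // No XML escaping: inputs are assumed to be simple identifiers.
        return String.join("\n",
            "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">",
            "  <modelVersion>4.0.0</modelVersion>",
            "  <groupId>" + groupId + "</groupId>",
            "  <artifactId>" + artifactId + "</artifactId>",
            "  <version>" + version + "</version>",
            "  <packaging>" + packaging + "</packaging>",
            "</project>");
    }
}
```

Guessed dependencies, SCM details and developers would be added as further elements in the same way.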

Disclaimer: Of course, such a tool would be bad from a conventions standpoint. One of the great strengths of Maven has been to standardize the directory structure of projects. I can go to any Maven project and I know exactly where stuff will be, what will be generated, etc.

Is there other information which you think could be guessed automatically? Can you think of better algorithms to guess some of the information shown above?

Comments

"At a minimum such a tool could be used to convert existing projects to Maven".

I think Maven is all about conventions, so I really don't understand where you are going with this. I have converted several projects from Ant to Maven, and I have to say that I ran into more than one problem on the way. None of these problems had anything to do with recognizing SCM systems or Java paths.
No, it was all about restructuring the projects into subprojects and creating Maven packages out of proprietary and external code. I don't think you can automate that.
It was also about getting the Maven plugins to work (not all Maven plugins are top quality and well documented).

I think what we should really focus on is dependency hell in Maven (I know that a lot of people are making an effort here, and it will be interesting to see what they come up with), and also keep improving/creating the plugins for M2.

--lazee, July 18, 2006 09:23 PM