
Always use jarjar to package implementation dependencies

jarjar is a sweet Java packaging tool that allows you to embed one .jar file in another. But rather than just smashing the jars together in one big archive, jarjar renames the embedded .jar's classes so that they live in the main jar's namespace. For example, Guice's ProxyFactory.java file has an impressive collection of imports:
package com.google.inject;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.google.inject.internal.*;
import java.lang.reflect.*;
import java.util.*;
import net.sf.cglib.proxy.Callback;
import net.sf.cglib.proxy.CallbackFilter;
import net.sf.cglib.proxy.Enhancer;
import net.sf.cglib.proxy.MethodProxy;
import net.sf.cglib.reflect.FastClass;
import net.sf.cglib.reflect.FastConstructor;

class ProxyFactory implements ConstructionProxyFactory {
...
}

These imports include classes from net.sf.cglib and com.google.common.collect. But when we package it up with jarjar, the embedded classes are renamed to live under the Guice package, com/google/inject:
0 Thu Jan 01 22:13:36 PST 2009 META-INF/
1726 Thu Jan 01 22:13:34 PST 2009 META-INF/MANIFEST.MF
11357 Sun May 25 20:34:04 PDT 2008 LICENSE
101 Sun May 25 20:34:04 PDT 2008 NOTICE
5975 Thu Jan 01 22:13:20 PST 2009 com/google/inject/AbstractModule.class
2466 Thu Jan 01 22:13:20 PST 2009 com/google/inject/Binder.class
806 Thu Jan 01 22:13:20 PST 2009 com/google/inject/Binding.class
414 Thu Jan 01 22:13:22 PST 2009 com/google/inject/BindingAnnotation.class
...
136 Sun May 25 20:34:04 PDT 2008 com/google/inject/internal/cglib/proxy/Callback.class
238 Sun May 25 20:34:04 PDT 2008 com/google/inject/internal/cglib/proxy/CallbackFilter.class
28315 Sun May 25 20:34:04 PDT 2008 com/google/inject/internal/cglib/proxy/Enhancer.class
5712 Sun May 25 20:34:04 PDT 2008 com/google/inject/internal/cglib/proxy/MethodProxy.class
5535 Sun May 25 20:34:04 PDT 2008 com/google/inject/internal/cglib/reflect/FastClass.class
1642 Sun May 25 20:34:04 PDT 2008 com/google/inject/internal/cglib/reflect/FastConstructor.class
...
8144 Sat Nov 29 14:11:22 PST 2008 com/google/inject/internal/collect/ImmutableList.class
9826 Sat Nov 29 14:11:26 PST 2008 com/google/inject/internal/collect/ImmutableMap.class
6026 Sat Nov 29 14:11:24 PST 2008 com/google/inject/internal/collect/Lists.class
12551 Sat Nov 29 14:11:26 PST 2008 com/google/inject/internal/collect/Maps.class
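
The renaming visible in this listing can be sketched as a prefix substitution on class names. This is only an illustration of the mapping, not how jarjar is implemented: the names RenameSketch and relocate are hypothetical, and the two prefix pairs are read off the listing above rather than taken from Guice's build files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mimics the package renaming seen in the jar listing. */
final class RenameSketch {
    // Prefix pairs inferred from the listing (an assumption, not Guice's actual rules).
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("net.sf.cglib.", "com.google.inject.internal.cglib.");
        RULES.put("com.google.common.collect.", "com.google.inject.internal.collect.");
    }

    /** Returns the renamed class name, or the input unchanged if no rule matches. */
    static String relocate(String className) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            String from = rule.getKey();
            if (className.startsWith(from)) {
                return rule.getValue() + className.substring(from.length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        System.out.println(relocate("net.sf.cglib.proxy.Enhancer"));
        System.out.println(relocate("com.google.common.collect.Lists"));
        // Classes outside the embedded libraries are left alone.
        System.out.println(relocate("java.util.List"));
    }
}
```

Classes that match no prefix, such as java.util.List, keep their original names, which is why only the embedded libraries move.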

By using jarjar, Guice encapsulates these library dependencies. This is fantastic! Many problems are avoided by encapsulating the library dependency:
  • Guice users don't need to tell their classpath, IDE or build.xml files about cglib or Google Collections.

  • Other versions of the libraries won't conflict with the Guice version. We can ship one binary and support users of either extremely old or extremely new versions of cglib. Even if cglib broke compatibility every release (it doesn't), we aren't impacted.

  • We can freely change which libraries we depend on. Should we add a dependency on paranamer in a future release, our users don't need to reconfigure their build scripts.

  • Most importantly, we can patch the libraries to fit our needs. If we want a special build of google-collections that includes our own hacks and tweaks, that's just fine. Our build doesn't even need to be compatible with the public version.


Standard jars for API dependencies


Guice's dependency on aopalliance doesn't use jarjar. Guice users implement the aopalliance interfaces, so the version independence and encapsulation offered by jarjar don't make much sense there.
I was thinking about this recently as well. There are other solutions to the problems you mention, and using jarjar everywhere could certainly lead to more bloat.

Solutions to the problems you mention:
1) I know it's a hot topic, but Maven helps here.
2) OSGi ftw!
3) Again, Maven makes this easier.
4) Jarjar probably is the best way to go here. But I usually prefer not to modify other projects' APIs, because then I have to manage merging new releases, etc.

The problem you introduce by using jarjar on every library is that you bloat your own jar, and that can bloat other library jars too. For instance, say there are libraries A, B and C. C depends on B and google-collections, B depends on A and google-collections, and A depends just on google-collections. B uses jarjar to repackage A and google-collections, because it shouldn't be using the google-collections packaged with A. C does the same thing with B and google-collections. Now, if I use library C, I've got a ton of extra baggage from all the copies of google-collections. I'm not supposed to use any of them because they're internal dependencies, so if I also want to use google-collections I need to ship that with my package as well.
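
To put a number on the A/B/C scenario, here is a rough sketch that counts how many copies of google-collections end up inside C's jar. It assumes every library, including A, embeds each of its dependencies with jarjar; the Lib type and the copies and libraryC methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: counts embedded copies of a jar in the A/B/C scenario. */
final class BloatSketch {
    /** A jar that embeds (via jarjar) each of the given jars. */
    static final class Lib {
        final String name;
        final List<Lib> embedded;

        Lib(String name, Lib... embedded) {
            this.name = name;
            this.embedded = Arrays.asList(embedded);
        }
    }

    /** Counts copies of the jar named target bundled anywhere inside lib. */
    static int copies(Lib lib, String target) {
        int count = 0;
        for (Lib dep : lib.embedded) {
            if (dep.name.equals(target)) {
                count++;
            }
            count += copies(dep, target);
        }
        return count;
    }

    /** Builds C as described: C embeds B and gc, B embeds A and gc, A embeds gc. */
    static Lib libraryC() {
        Lib gc = new Lib("google-collections");
        Lib a = new Lib("A", gc);
        Lib b = new Lib("B", a, gc);
        return new Lib("C", b, gc);
    }

    public static void main(String[] args) {
        System.out.println(copies(libraryC(), "google-collections"));
    }
}
```

Under these assumptions C carries three private copies, and an application that also uses google-collections directly ships a fourth.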

Granted, we've avoided jar/dependency hell, but we've gone and bloated our jars and runtime environment.

Now, I'm not saying you should never use jarjar; I think for applications it could make a lot of sense. And if you choose to use jarjar for your library, I think you should provide the jar in two formats, A.jar and A-all.jar. At the very least you should carefully consider the implications of using jarjar for your users rather than indiscriminately using it for everything.
More people need to learn about and understand OSGi.
JarJar is great, but I think issue 27 blocks widespread adoption. Every time I get an error about not being able to assign a Collections ImmutableList to a Guice ImmutableList, a fair amount of swearing follows. I certainly don't want to end up with N different versions of a given class on the classpath because people "always use jarjar to package implementation dependencies". IDE content assist would be a mess. Ick.
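
The assignment error described here comes from the renamed class being a different type from the original, even though the code is identical. A minimal sketch of that, using nested classes as stand-ins for the two packages (all names here are hypothetical; the real classes live in com.google.common.collect and under com/google/inject/internal/collect):

```java
/** Illustrative only: two classes with the same simple name are unrelated types. */
final class RenamedTypeSketch {
    /** Stand-in for the public google-collections package. */
    static final class PublicCollect {
        static final class ImmutableList {}
    }

    /** Stand-in for the copy that jarjar renamed into Guice's internal package. */
    static final class GuiceInternalCollect {
        static final class ImmutableList {}
    }

    /** True if a value of type source could be assigned to a variable of type target. */
    static boolean assignable(Class<?> target, Class<?> source) {
        return target.isAssignableFrom(source);
    }

    public static void main(String[] args) {
        // Same simple name, different types: prints "false".
        System.out.println(assignable(
                GuiceInternalCollect.ImmutableList.class,
                PublicCollect.ImmutableList.class));
        // A type is always assignable to itself: prints "true".
        System.out.println(assignable(
                PublicCollect.ImmutableList.class,
                PublicCollect.ImmutableList.class));
    }
}
```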