Saturday, 19 December 2009

How to code multiple dispatch in standard Java

This posting continues the theme of multiple dispatch related to the binary method problem and shows a basic double dispatch implementation in standard Java. An appropriate method for computing whether two Shape objects overlap is selected based on runtime argument type. More Shape classes can be added without modifying existing code for shapes (rectangle, square, circle, ect.) already written. Doing it with a drop-through instanceof list is contrasted with low-overhead selection by argument type from an array of methods.


Multiple and predicate dispatching

Single, dynamic dispatching is used by OOP languages to call a designated method based on the runtime type of an object currently at a reference. When the language also features static (compile time) type checking the object's actual type will generally be either the nominal type of the reference or a subclass of that type. In either case, if the object does not have a method matching the static signature of the called method a compile time or runtime error results. For applications written using so called managed-code an exception is thrown at runtime. Using native code languages that do not provide this level of implicit type checking during execution, the error may vary from the method operating on spurious data to uncontrolled execution and possible program termination by the operating system.

In a language that features static type checking and whatever the actual mechanism involved, we can model single dispatching by giving all objects a reference to a table of methods for their class. The compiler stores the table offset of any particular method and when a call against a reference nominally to an object of a particular class is encountered, code is generated to retrieve the entry point for the method from the table. In the model a base class and each subclass has its own table - perhaps not the case as implemented. Overriding a method in a subclass the base-class method is replaced by the subclass method in the subclass' table. If a nominal superclass reference cites a subclass object at runtime, the subclass overridden method entry point is got from the table and this method is executed.

Multiple dispatch extends this method selection mechanism to include the runtime type of method arguments. Extending the model: single dispatching in a statically type-checked environment requires a single method selection table with one dimension. Without static checking an additional dimension is needed related to runtime type. Multiple dispatch requires a table with as many additional dimensions as there are arguments significant to the selection process. The table can be implemented as one per class or one per base class and there can be a table for each multi-method or a single combined table with other tables referenced by its elements.

To prove the concept we will look at implementing a single-method table for a method with a single significant argument in addition to object type. This implementation can be extended to include additional multi-methods by introducing additional tables.

Predicate dispatch extends the multiple dispatch concept by allowing the runtime values of arguments and logical relationships between types and/or values to be significant for method selection. In OOP languages the multiple dispatch and predicate concepts are quite blurred. A reference can cite an object of its nominal class or a subtype and the value of a reference is the object it refers to, one value of which is its type. Ability to include logical selection statements that deal with type and/or other values appears to be the distinguishing factor.


Implementing multiple dispatch

There are a number of possibilities. To do multiple dispatch according to the book we are going to have to use the java.lang.reflect package. If you haven't tried any of this stuff already to get an idea of what it entails you can Google 'Java Reflection' and have a look at some of Sun's postings.

For a more low level view Dennis Sosnoski's articles on IBM developerWorks go down into the world of bytecode and beyond, up to replacing reflection with runtime code generation.

As Dennis shows, using standard reflection is slow, very slow. Reflection may have a use for class initialization but what we are going to look at here is somthing far simpler and more akin to a native code approach - a DIY method table. A great advantage of implementing the table in Java is that it is pretty well guaranteed to be robust but there a number of difficulties to be overcome - methods not being a first-class data type for one.


The binary method problem

A binary method is a method that operates on two or more objects of the same type. It has at least two arguments, perhaps one implicit - the reference to the current instance, the other the same type or to a related class. Suppose we write a base class with a binary method, how can we allow for this method being overridden in a way that works for instances of any subclass including those that have not yet been written and without changing code in the base class?

Using instanceof and typecasting is one way, discussed briefly later, but what is needed is something more akin to overriding methods but related to argument types as well as the instance type. So, how can you override a method substituting covariant arguments that can be used straight off with no typecasts? Without too much difficulty it turns out.

In their paper presenting MultiJava, a Java extension supporting open classes and multiple dispatch, the authors cite the example of a Shape class with subclasses Rectangle, Circle etc. Shape has a binary method intersect(Shape) that returns true if two shapes overlap. I am going to nick this example in a slightly modified form to show how we can implement multiple dispatch in bog standard code. The methodology is kept simple and easy to explain. Perhaps a more generalized solution could be posted later but currently it is worth noting that open classes and multiple dispatch are strongly related to not implementing the Visitor design pattern, enough said!


Using double dispatch

First let's firm up what we are trying to achieve. In Shape nothing is known about rules for checking subclass intersection but in the first subclass, Rectangle say, we can write a method intersect(Rectangle).

Using single dispatching we would override intersect(Shape) from the base class:
    @Override public boolean intersect(Shape argument) {
        if (argument instanceof Rectangle)
            return intersect((Rectangle)argument);
        return argument.intersect(this);
    }
In fact this is programmed double dispatching. If the current class does not have a method to compute intersection with the actual type of the argument, argument.intersect(this) is called to get the result: the method called depends on the argument type. Under this scheme each new subclass includes methods to compute intersection with an instance of itself and all the types that have already been written.

This approach is fine: new shapes can be added as required without recoding any existing shapes and there is no problem with the kind of objections often made to using instanceof selection. The only drawback is that if a class does not implement a method to handle intersection for all the other classes written to date, after an instance of one of those other classes calls its intersect() the subsequent call to argument.intersect(this) gets unterminated recursion and stack overflow. An unavoidable fact of life, but things do not look quite so good when we get up to ten or twelve shapes:
    @Override public boolean intersect(Shape argument) {
        if (argument instanceof Rectangle)
            return intersect((Rectangle)argument);
        if (argument instanceof Square)
            return intersect((Square)argument);
        if (argument instanceof Circle)
            return intersect((Circle)argument);
        if (argument instanceof Triangle)
            return intersect((Triangle)argument);

        //..........................

        return argument.intersect(this);
    }
If this sort of thing is considered an 'anathema to Object Oriented Design' for unary methods why should it be OK for binary methods?

Apart from the call to argument.intersect(this) there is really no element of multiple dispatching in the intersect(Shape) method. Case selection using instanceof or even a more efficient isA construct [here] does not qualify. Multiple dispatching supplies immediate direction of execution to an appropriate method based on argument type. What is needed is a way of calling the required intersect(type) method directly.

Initially we need to map type to the required offset in a table of methods and the least complicated way of doing it is to provide each class with an a property mtOffset() : method table offset - the offset of references to methods with an argument of that class' type in one or more method tables.


The Shape base class

Let's see how this works out in the abstract base class Shape:
public abstract class Shape {

    public static final int MAX_SHAPES = 10;

    protected static int classMTOffset;

    protected static int getOffset() {
        return classMTOffset++;
    }

    protected abstract int mtOffset();
For simplicity the maximum number of shape classes that can be written is the constant MAX_SHAPES. It looks probable that the only way to make the size of the multi-method table fit runtime requirements exactly is to give it a nominal initial size and then expand it if required. For the moment we skip this complication. The method getOffset() is called by each subclass in turn when the class is initialized. getOffset() returns the value of the classMTOffset variable which is also incremented. Subclasses then implement the abstract mtOffset() method to return the value got during initialization.

Now we come to the base type for intersect() methods:
    protected static abstract class Intersect
                                <T extends Shape, S  extends Shape> {

        abstract boolean intersect(T a, S b) ;

     // abstract boolean contains(T a, S b, int tolerance); // possibly
    }
No first-class method type in Java so methods are implemented in nested classes. I am going to call these types Method Classes. There are pros and cons. A big pro is that any number of methods can be defined in a single class. An intersect(Shape) multimethod only requires a single sub-method but we could include a method contains(T, S) returning true if the S type shape fits entirely within T type shape for example, or similar overloaded methods with additional arguments, a tolerance factor maybe.

The characteristic shared by all intersect(T a, S b) methods as overridden in Intersect subclasses is that they all have two arguments that are covariant with Shape. Argument type is constrained by the declaration of the generic parameters to be a Shape or a subclass. Constraint is optional but shows the intention. IntersectRectangle for example is declared as:
     protected static class IntersectRectangle
                            extends Intersect<Rectangle, Rectangle> {

         @Override public boolean intersect(Rectangle t, Rectangle r) {
             //..............
         }
     }
The important factor is that declaring Intersect as a generic class allows intersect(T, S) to be overridden in this way. If for example, we write
    Intersect ref = new IntersectRectangle();
    Boolean b = ref.intersect(t, s);
the method called is intersect(Rectangle t, Rectangle r). Not a typecast in sight but the danger is that any subclass instance can be passed as an argument and passing the wrong kind gets a runtime error. We can design around this problem but a con is that this for the enclosing-class object must be passed as an argument. Of course it gets passed anyway. The dot before a method call can be seen as passing the reference that precedes it.

In this implementation method classes are static nested classes rather than inner classes. Inner classes would be nice: the method body has implicit access to outer-class fields but, because every instance of an inner class carries a reference to the outer-class instance that created it, every outer-class object would need its own method table. Even worse, the execution cost of creating this table is added to the construction cost for every instance. Here a single table is used located in Shape:
 
    protected static Intersect[][] intersectTable =
                                     new Intersect[MAX_SHAPES][];
 
The table has two dimensions one for each argument type. If a method has more arguments subject to multiple dispatch the table is implemented with that number of dimensions. The array of method classes for each class is added to the table when the class is initialized. The index of a method-class array depends on the value of mtOffset() for the class, which results from class initialization order and is arbitrary.

The method-class array for all classes is the same size irrespective of how many Intersect descendants the class implements. If a method to suite a particular combination of arguments is not present the corresponding element will be null, a characteristic used implementing the intersect(Shape) method:
    public boolean intersect(Shape s) {
        if (this == s) return true;

        // get method table offsets for the two types
        int thisMTO = this.mtOffset();
        int sMTO = s.mtOffset();

        // thisMTO indexes the table for the current class of object
        // sMTO the offset for a method to process the argument's type
        if (intersectTable[thisMTO][sMTO] != null) {

            // this object has a method so call it
            return intersectTable[thisMTO][sMTO].intersect(this, s);
        }

        // otherwise call the argument's method
        return intersectTable[sMTO][thisMTO].intersect(s, this);
    }

}  // end of Shape
There is no possibility that the wrong method is called - it is selected programmatically rather than by some scatterbrained programmer ;).

The same strategy shown for instanceof method selection is used to allow additional subclasses to be written: if a null element is found in the current instance's table it does not have a method. It is assumed that the argument implements a suitable method but if it does not a null pointer exception will be thrown rather than stack overflow: still an error but a more traceable result.


Implementing a Rectangle class

MT_OFFSET is initialized by calling Shape.getOffset() and this class' entry in the multy-method table is done by Rectangle.initMT().

public class Rectangle extends Shape {
    public static final int MT_OFFSET = getOffset();
    public static final boolean CLASS_INITIALIZED = initMT();

    private static boolean initMT() {
        intersectTable[MT_OFFSET] = new Intersect[MAX_SHAPES];
        intersectTable[MT_OFFSET][MT_OFFSET] = new IntersectRectangle();
        return true;
    }

    @Override protected int mtOffset() {
        return MT_OFFSET;
    }

    public final int top, left, width, height, right, bottom;

    public Rectangle(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
        right = left + width;
        bottom = top + height;
    }

    public static class IntersectRectangle
            extends Intersect<Rectangle, Rectangle> {

        @Override public boolean intersect(Rectangle t, Rectangle r) {

            return (t.right > r.left && t.left < r.right &&
                    t.bottom > r.top && t.top < r.bottom);
        }
    }

    @Override public String toString() {

        return left + ", " + top + ", " + width + ", " + height;
    }
}
A new vector of sub-method references is created at the MT_OFFSET for this class. The class only declares a single sub-method of the intersect(Shape, Shape) multi-method leaving other entries in the vector blank. The implemented method is selected by the inherited intersect(Shape) method when this is a Rectangle and its argument is also a Rectangle.  

Implementing other shapes

Each new Shape implements sub-methods of intersect(Shape, Shape) to compare instances of the class with the instances of that class and other shapes implemented to date. Square declares two sub-methods:
public class Square extends Shape {

    public static final int MT_OFFSET = getOffset();
    public static final boolean CLASS_INITIALIZED = initMT();

    static boolean initMT() {
        intersectTable[MT_OFFSET] = new Intersect[MAX_SHAPES];
        intersectTable[MT_OFFSET][Rectangle.MT_OFFSET] =
            new IntersectRectangle();
        intersectTable[MT_OFFSET][MT_OFFSET] =
            new IntersectSquare();
        return true;
    }

    @Override protected int mtOffset() {
        return MT_OFFSET;
    }

    public final int top, left, side, right, bottom;

    public Square(int left, int top, int side) {
        this.left = left;
        this.top = top;
        this.side = side;
        right = left + side;
        bottom = top + side;
    }

    static class IntersectSquare extends Intersect<Square, Square> {

        @Override public boolean intersect(Square t, Square s) {
            return (t.right > s.left && t.left < s.right &&
                    t.bottom > s.top && t.top < s.bottom);
        }
    }

    static class IntersectRectangle extends Intersect<Square, Rectangle> {

        @Override public boolean intersect(Square t, Rectangle r) {
            return (t.right > r.left && t.left < r.right &&
                    t.bottom > r.top && t.top < r.bottom);
        }
    }

    @Override public String toString() {

        return left + ", " + top + ", " + side;
    }
}
Circle has three intersect() sub-methods:
public class Circle extends Shape {

    public static final int MT_OFFSET = getOffset();
    public static final boolean CLASS_INITIALIZED = initMT();

    static boolean initMT() {
        intersectTable[MT_OFFSET] = new Intersect[MAX_SHAPES];
        intersectTable[MT_OFFSET][Rectangle.MT_OFFSET] =
            new IntersectRectangle();
        intersectTable[MT_OFFSET][Square.MT_OFFSET] =
            new IntersectSquare();
        intersectTable[MT_OFFSET][MT_OFFSET] = new IntersectCircle();
        return true;
    }

    @Override protected int mtOffset() {
        return MT_OFFSET;
    }

    public final int left, top, radius, cx, cy ;

    public Circle(int left, int top, int radius) {
        this.left = left;
        this.top = top;
        this.radius = radius;
        cx = left + radius;
        cy = top + radius;
    }

    static class IntersectCircle extends Intersect<Circle, Circle>{
        @Override public boolean intersect(Circle t, Circle c) {
            int dx = c.cx - t.cx;
            int dy = c.cy - t.cy;
            double distance = Math.sqrt(dx*dx + dy*dy);
            return (distance < t.radius + c.radius);
        }
    }

    static class IntersectRectangle extends Intersect<Circle, Rectangle>{

        @Override public boolean intersect(Circle t, Rectangle r) {
            return t.intRect(r.left, r.top, r.right, r.bottom);
        }
    }

    static class IntersectSquare extends Intersect<Circle, Square>{

        @Override public boolean intersect(Circle t, Square s) {
            return t.intRect(s.left, s.top, s.right, s.bottom);
        }
    }

    @Override public String toString() {
        return left + ", " + top + ", " + radius;
    }

    private int distance(int x, int y) {
        int dx = cx - x;
        int dy = cy - y;
        return (int)Math.sqrt(dx*dx + dy*dy);
    }

    private boolean intRect(int left, int top, int right, int bottom) {
        int lr = left - radius;
        int rr = right + radius;
        int tr = top - radius;
        int br = bottom + radius;
        if (cx <= lr || cx >= rr || cy <= tr || cy >= br)
            return false;
        if (cx <= left) {
            if (cy <= top)
                return distance(left, top) < radius;
            if (cy >= bottom)
                return distance(left, bottom) < radius;
        } else if (cx >= right) {
            if (cy <= top)
                return distance(right, top) < radius;
            if (cy >= bottom)
                return distance(right, bottom) < radius;
        }
        return true;
    }
} 

Testing it out

First sketch some shapes and modify the test function from the implementing equals() article then code main to compare the shapes.

By the way, I make no claims that the intersect algorithms are efficient or even correct: Let's see how this works out in the abstract base class Shape:
public class TestIntersect {

    public static void interPrint(Shape a, Shape b) {
        String ac = a.getClass().getCanonicalName();
        if (ac == null)
            ac = "?";
        else
            ac = ac.substring(ac.indexOf('.') + 1);
        String bc = b.getClass().getCanonicalName();
        if (bc == null)
            bc = "?";
        else
            bc = bc.substring(bc.indexOf('.') + 1);
        System.out.println(ac + " " + a.toString() + " a intersect() " +
               bc + " " + b.toString() + " b is " + a.intersect(b));
        System.out.println("aoff: " + a.mtOffset() + " boff: " +
           b.mtOffset()+ ", " + bc + " b intersect() " + ac + " a is " +
           b.intersect(a) );
    }

    public static void main(String[] args) {
        Square square1 = new Square(10, 10, 40);
        Circle circle1 = new Circle(30, 30, 20);
        Rectangle rect1 = new Rectangle(40, 40, 20, 80);
        interPrint(square1, circle1);
        interPrint(circle1, rect1);
        interPrint(square1, rect1);
        System.out.println();
        Square square2 = new Square(60, 120, 40);
        Circle circle2 = new Circle(94, 85, 20);
        Rectangle rect2 = new Rectangle(103, 5, 20, 80);
        interPrint(square2, rect1);
        interPrint(square2, circle2);
        interPrint(circle2, rect2);
        interPrint(square2, rect2);
    }
}
Works O.K. Note that the Square's mtOffset() is zero because it is initialized first. Changing instance construction order in main() will change these values but functionality is not affected.  

Square 10, 10, 40 a intersect() Circle 30, 30, 20 b is true  
aoff: 0, boff: 2. Circle intersect() Square is true  
Circle 30, 30, 20 a intersect() Rectangle 40, 40, 20, 80 b is true  
aoff: 2, boff: 1. Rectangle intersect() Circle is true  
Square 10, 10, 40 a intersect() Rectangle 40, 40, 20, 80 b is true 
aoff: 0, boff: 1. Rectangle intersect() Square is true
   
Square 60, 120, 40 a intersect() Rectangle 40, 40, 20, 80 b is false  
aoff: 0, boff: 1. Rectangle intersect() Square is false 
Square 60, 120, 40 a intersect() Circle 94, 85, 20 b is false  
aoff: 0, boff: 2. Circle intersect() Square is false  
Circle 94, 85, 20 a intersect() Rectangle 103, 5, 20, 80 b is false  
aoff: 2, boff: 1. Rectangle intersect() Circle is false  
Square 60, 120, 40 a intersect() Rectangle 103, 5, 20, 80 b is false 
aoff: 0, boff: 1. Rectangle intersect() Square is false  

Summary
 
Generic nested classes provide a way of declaring methods capable of multiple dispatch selection in standard Java. Without performance testing and as-is the technique looks faster than selection from a list of isA statements.

Thursday, 10 December 2009

Test

Decided on this syntax highlighter. Needed some typeface tweaking but has the great advantage that text can be copied without the line numbers if anyone would like to try some of my crazy code.

public class Customer {

    private String firstName;
    private String lastName;
    private Date birthday;
    private String street;
    private String city;
    private String zipcode;

    // follows the Eclipse generated result - two nulls are equal, but a
    // couple of trivial static functions can take out many lines of code
    public static boolean eqFields (Object a, Object b) {
        if (a != null)
            return a.equals(b);
        else
            return (b == null);
    }

    protected boolean equalFields(Customer rhs) {
        return eqFields(firstName, rhs.firstName) &&
               eqFields(lastName, rhs.lastName) &&
               eqFields(birthday, rhs.birthday) &&
               eqFields(city, rhs.city) &&
               eqFields(zipcode, rhs.zipcode);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        return (obj instanceof CustomerOrg &&
                ((Customer)obj).equalFields(this));
    }

    public static int hash(Object obj) {
        return ((obj == null) ? 0 : obj.hashCode());
    }

    @Override
    public int hashCode() {
        int result = 21 + hash(birthday);
        result = result * 31 + hash(city);
        result = result * 31 + hash(firstName);
        result = result * 31 + hash(lastName);
        result = result * 31 + hash(street);
        return result * 31 + hash(zipcode);
    }
}

Sunday, 6 December 2009

The dangers of overloading

Posted the equals(Object) solution from my last blog on the artima discussion for this topic. As a contributor demonstrated, if the protected field comparison methods are called from outside an instance some very strange results can be produced. The answer is of course that protected methods should not be called from outside a class even if Java's same-package rule permits access. Default access gives plenty of scope for any required kludge without ignoring this last vestige of encapsulation.

These strange results should certainly discourage giving these methods public access or overloading an equals() with the class name, as seen in at least one published implementation. As things stand it is still quite possible to get the protected methods to return a symmetrical false when comparing the same object. All down to overloading, substitution and the aforementioned weak encapsulation but worth a look.

The first thing to do is write a Point base and a ColoredPoint subclass, as in the artima article, but use the new equals() implementation rather canEqual. In the base-class field comparison is factored of to fEquals(Point) and equals(Object) calls the received object's version of fEquals.

public class Point {

private final int x;
private final int y;

public Point(int x, int y) {
this.x = x;
this.y = y;
}

// accessors are optional for this discussion

@Override public boolean equals(Object obj) {
return (obj instanceof Point &&
((Point)obj).fEquals(this) );
}

protected boolean fEquals(Point obj) {
return (this.x == obj.x && this.y == obj.y);
}


@Override public int hashCode() {
return (41 * (41 + getX()) + getY());
}
}

Point can now be subclassed without overriding equals() and instances of all types will provide symmetry when compared for equality. ColouredPoint overrides equals() and overrides fEquals(Point) to return false - the required return comparing a ColoredPoint with a Point.

public class ColoredPoint extends Point {

private final Color color;

public ColoredPoint(int x, int y, Color color) {
super(x, y);
this.color = color;
}

// ....................

@Override public boolean equals(Object obj) {
return (obj instanceof ColoredPoint &&
((ColoredPoint)obj).fEquals(this));
}

@Override protected boolean fEquals(Point obj) {
return false;
}

protected boolean fEquals(ColoredPoint obj) {
return (this.color.equals(obj.color) && super.fEquals(obj));
}

@Override public int hashCode() {
return (41 * super.hashCode() + color.hashCode());
}
}

ColoredPoint gets a fEquals(ColoredPoint) overloading and again equals() calls the received object's version. Using this technique we can write as many Point and ColoredPoint subclasses as we like that do or do not override equals().

To show two references to the same object giving a symmetrical false the same ColoredPoint is assigned to a Point reference and a ColoredPoint reference:

public static void main(String[] args) {
Point p1 = new Point(1, 1);
ColoredPoint cp1 = new ColoredPoint(1, 1, Color.PINK);
Point p2 = cp1;

First do a check that equals(Object) is working O.K. Of the following the first two pairs should return symmetric false comparing a Point with a ColoredPoint. The last pair should get a symmetric true comparing the same ColoredPoint. A same-object check has not been put in equals so the full field comparison is done:

System.out.println("p1.equals(p2) is " + (p1.equals(p2)));
System.out.println("p2.equals(p1) is " + (p2.equals(p1)));
System.out.println("p1.equals(cp1) is " + (p1.equals(cp1)));
System.out.println("cp1.equals(p1) is " + cp1.equals(p1)));
System.out.println("p2.equals(cp1) is " + (p2.equals(cp1)));
System.out.println("cp1.equals(p2) is " + (cp1.equals(p2))+ "\n");

Output is:
p1.equals(p2) is false
p2.equals(p1) is false
p1.equals(cp1) is false
cp1.equals(p1) is false
p2.equals(cp1) is true
cp1.equals(p2) is true


Everything works as expected but now the overloaded fEquals() methods are tested:

System.out.println("p1.fEquals(p2) is " + (p1.fEquals(p2)));
System.out.println("p2.fEquals(p1) is " + (p2.fEquals(p1)) + "\n");
The p1 reference is a Point type that references a Point so Point's fEquals is called and the x, y fields are compared: true expected. The ClouredPoint argument referenced by p2 at runtime has no effect at compile or runtime. Runtime argument type never has any effect on which method gets called.

For the second comparison the p2 reference is also a Point type. The compile time call is to fEquals(Point) but p2 references a ColoredPoint and its overriden version of fEquals(Point) is called at runtime: false expected, this method always returns false.

Output is:
p1.fEquals(p2) is true
p2.fEquals(p1) is false
The next two comparisons behave in a similar way:

System.out.println("p1.fEquals(cp1) is " + (p1.fEquals(cp1)));
System.out.println("cp1.fEquals(p1) is " + (cp1.fEquals(p1)) + "\n");
p1.fEquals(cp1) behavior is the same as p1.fEquals(p2), true expected. In cp1.fEquals(p1) gets a call to ColoredPoint's version of fEquals(Point): always false.

Output is:
p1.fEquals(cp1) is true
cp1.fEquals(p1) is false
The second result is obvious but now the references to the same ColoredPoint are compared:

System.out.println("p2.fEquals(cp1) is " + (p2.fEquals(cp1)));
System.out.println("cp1.fEquals(p2) is " + (cp1.fEquals(p2)));
}
gets:
p2.fEquals(cp1) is false
cp1.fEquals(p2) is false
cp1.fEquals(p2) is false is particularly unintuitive unless we remember that the runtime type of an argument has no effect on which method gets called - ever, unless said method is an exception handler that is.


The binary method problem

Elsewhere multiple and predicate dispatching add another dimension to overridden methods by making the argument type/s direct execution to an appropriate method. A Java extension that supports predicate dispatching is [here] with a covering paper [pdf].

One thing that multiple dispatching is good at is solving the binary method problem but at least a limited solution is possible in standard Java. Suppose we write a base class that has a method with an argument referencing an instance of the class or a same type array argument: a binary method. How can we allow for this method being overridden so that it works for instances and arguments of any subclass including those that have not yet been written and without changing code in the base class? The equals() implementation is a special case solution to the binary method problem using programmed double dispatching.


Diverse identifiers

Overloading is simply a convenience that saves having to think up a new method name. It forms the basis of operator overloading but may not be appropriate when methods are subject to being overridden in a subclass. Overloading adds nothing to functionality and as shown above can get unwanted side effects when an overloaded method is overridden. Maybe we need to consider whether this kind of polymorphism is generally helpful or whether it only adds ambiguity in some and perhaps in many cases - not for the compiler of course but when trying to interpret someone else's code or our own at a later time.

There is no reason why fEquals should not be given different identifiers in different classes, pointEquals(Point) and coloredPointEquals(ColoredPoint) for example. On the other hand doing this gets another common OOP problem: linkedColoredPointEquals(something) - huge identifiers that wrap code onto the next line. When implementing equals() there is never any scope for calling the protected method against a reference with an inappropriate nominal type. Perhaps all that is needed to provide better clarity is a more explicit overloaded method name, equalFields(...), for example.

Sunday, 29 November 2009

Overriding equals(Object): an optimally correct solution

Overriding Object's apparently innocuous equals(Object) method can be a source of difficulties. The first problem is that there appears to be no single implementation that meets all requirements. The second problem is the way that HashMap, the prime suspect for causing errors if the letter of the equals() specification is not met, uses equality to maintain its structure.

When equals() is overridden the initial requirement is to ensure that the Object in equals(Object) belongs to a class that is comparable with the current object. Two different implementations can be found in publications from authorities on the subject. They have some utility but both cause problems if implemented outside their limited applicable usage. Neither can provide a correct implementation of equals() for all inheritance requirements in a class hierarchy. A third published implementation can supply an equals() with correct behaviour in the two most commonly required cases but brings a considerable execution overhead when equals() is overridden in a subclass.

A simple equals() configuration is described that allows a class to have any of the three possible equality relationships with its super classes. For the two most commonly required relationships modifying the standard equals() implementation found in Java library classes to provide correct behaviour is trivial, simply a matter of factoring-off field comparisons to a separate method. Overriding equals() in a subclass is similarly trivial. If no modification to equality is required the new class can be derived without any additional coding related to equals() functionality. The approach brings almost no penalty for execution efficiency and additional coding is minimal.


Background


Overriding equals() allows a different view of object equality from Object's same-object implementation to be introduced. Used with hashCode(), equals() also provides the basis for object inclusion, exclusion and retrieval from a HashMap, a fast storage and retrieval structure appropriate for many applications.

Making an overridden equals() work according to plan brings two requirements: a well defined set of objects that the new method applies to and compliance with the so called equals() contract - in fact a definition of the required equality relationship between all objects. Its requirements are reflexivity, symmetry, transitivity, consistency and that a comparison with a null should always return false. These properties define an equality relationship found generally in most applications outside of quantum mechanics and we are on fairly safe ground.

An exceptionally lucid expose of the equals() contract and discussion of problems with published equals() implementations can be found at

http://www.angelikalanger.com/Articles/JavaSolutions/SecretsOfEquals/Equals.html


The problem identified

The equals() problem occurs anywhere we want to derive a subclass that has a concrete superclass other than Object, for which equals() has already been overridden and add some fields that are significant when computing subclass equality.

Figure 1 A moot choice

As shown in Figure 1 - ClassA, we would probably like to inherit from an existing class getting all its methods and functionality without writing any code, taking advantage of normal OOP provision for code reuse. The requirement to override equals() may prevent this course because the standard equals() implementation found in many Java library classes does not work in this situation.

To get a better understanding of the problem we can formalize this requirement of an object model that makes object equality other than basic identity a property of all objects as:

The first code reuse requirement: ability to derive a class that declares zero or more new significant fields and is not a member of its superclass comparison set.

It is apparent there are four possible requirements and this statement covers two of them. The remaining two are:

Second code reuse requirement: ability to derive a class that declares no new significant fields and is a member of its superclass comparison set.

Third code reuse requirement: ability to derive a class that declares new significant fields and is a member of its superclass comparison set.

We now need a definition for comparison set.


A comparison-set model

A comparison set is a set of classes having a root class for which instances of the root and some but possibly not all of its subclasses are compared for equality using significant fields declared or present in the root. For completeness we must also say that significant fields declared in a subclass may also figure in a comparison but this is an unusual condition satisfying the third code reuse requirement. When an object is compared with an object of a class not in its comparison set equals() is unable to make a meaningful comparison and must always return false.

Figure 2 sketches a class hierarchy with classes in a comparison set coloured green. Other classes are not in the green set and supply a potential root for other comparison sets.

Figure 2 A comparison set

A class that is a descendant of Object and does not override equals() is in a comparison set with Object as its root. The particular contract of Object's equals() - no two objects are equal, makes the standard implementation viable for overriding equals() in a direct subclass, A in Figure 2, but not in a sub classes B & C of a class in a derived comparison set. The standard implementation meets the first code reuse requirement in a class with no superclasses that have used it to override equals(), elsewhere it fails.

The Figure illustrates two important properties of comparison sets. Firstly, all classes in the set are descendants of a single root. The equals() implemented in the root must include a condition that determines whether the argument in equals(Object) is a member of the set of comparable objects for which a field comparison should be done or whether it is of another kind for which equals() should return false.


The 'standard' equals() implementation

The standard implementation employs instanceof the-root to exclude objects not in the comparison set. The instanceof statement can appear in a number of different guises, as illustrated using a class that declares a value field with an overloaded == operator:
    @Override public boolean equals(Object obj) {
        return obj instanceof Root &&
            value == ((Root) obj).value;
    }

    @Override public boolean equals(Object obj) {
        if (obj instanceof Root) {
            return (value == ((Root) obj).value);
        }
        return false;
    }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (! (obj instanceof Root)) return false;
        return (value == ((Root) obj).value);
    }
All configurations exclude instances of any class that is not a descendant of Root from field comparison and return false. The same-object check shown in the third configuration is of course optional and can be used in any implementation.

Which brings us to the second property of comparison sets and the problem with the standard equals() implementation: not all classes that are descendants of the Root are members of the same comparison set. The instanceof condition succeeds for instances of all subclasses irrespective of new fields. A superclass object can be found equal to a subclass object if fields declared in the superclass are equal to inherited fields in the subclass. Presenting the same superclass object to an overridden equals() in the subclass its instanceof will reject the superclass object and return false.
    @Override public boolean equals(Object obj) {
        if (obj instanceof Subclass) {
            Subclass sco = (Subclass) obj;
            return ( .......... );
        }
        return false;
    }
The symmetry requirement of object equality is broken and in the first case no proper equality assessment has been done making the result a nonsense.

There are of course occasions when we want to derive a class from a superclass without overriding equals(): the second code reuse requirement. A subclass might introduce a next field for example, a reference to an instance of the subclass so that a linked list of objects can be constructed. The standard implementation meets the second requirement. It fails when presented with a subclass instance that uses new fields to check equality and is not in the comparison set. It does not enable us to implement ClassA in Figure 1. Can an additional or modified condition be introduced to exclude instances of these classes from field comparison?


equals() using class comparison


Another commonly seen equals() implementation compares the runtime Class reference of the two objects to determine comparison set membership:
obj != null && obj.getClass == this.getClass
replaces obj instanceof Root in the condition. It could be used to derive Figure 1 - ClassA if used in the existing and new classes but does not allow any further subclasses that belong to one or other comparison set to be derived. All objects that are not of the same class are excluded and equals() returns false for these objects.

This means that superclass objects are excluded from subclass superclass comparisons and get a false return. If anything this characteristic is more dangerous in use than the instanceof problem. When equals() is not overridden in a subclass its functionality is modified by proxy to get a constant false return for superclass objects. A person who derives the subclass and has not modified equals() may be left unaware of this side effect until the application fails at a later time. A class comparison equals() implementation meets the first code reuse requirement. It cannot meet the second.


The third code reuse requirement


Not surprisingly, overriding equals() from a concrete superclass is seldom seen in Java code but there is one pertinent and frequently quoted example: java.sql.Timestamp. This class is a java.util.Date subclass that overrides equals to include a nanos field. Comparing a Date with a Timestamp returns true if the dates match irrespective of the value of nanos. Comparing a Timestamp with a Date always returns false even with matching dates: 'because the nanos component of a date is unknown.'

If equals() was overridden in both classes using the implementation proposed here there would be an open choice as to whether to make both comparisons return false or to allow a Timestamp to be equal to a Date if and only if its nanos field was zero (or any other single value). The methodology also allows a Timestamp to have symmetrical equality with a Date if dates match or if a Timestamp is within 500,000 nanos of a Date but these options break the equals() specification requirement for transitivity while the zero nanos option does not.

Here the valid option where a Timestamp with a zero nanos field may be equal to a Date supplies an example of a third possible equality relationship between a sub and superclass and the third if seldom needed code reuse requirement.


Alternatives to the standard implementation

An alternative to producing a flawed equals() in a subclass using the standard or class comparison implementations is to use composition, as illustrated by Figure 1 - ClassB. Using composition to implement structural elements of a program or even a data type where it fits well is one thing. Being forced to use it to implement equals() is something else. Do this and we can be faced with another set of no-win choices: write a lot of methods that do nothing except call composed object methods or include a method that returns the composed object - a possible solution if the object is immutable, otherwise doing this is likely to cause more problems that it solves. There may still be a problem even when the returned object is a clone - the clone does not reflect legitimate modifications to the original.

The standard equals() implementation used in library classes leaves no satisfactory alternative to composition when equals() is overridden but otherwise it is not a desirable alternative.

Taking on board the fact that neither of the commonly used implementations work satisfactorily an artima article posts a working solution for overriding equals() to get two of the code reuse requirements identified above:

http://www.artima.com/lejava/articles/equality.html

The third code reuse requirement is seldom needed and is therefore relatively unimportant but a problem with this implementation is that it uses a canEqual(Object) method that introduces an additional instanceof check and execution overhead for all calls to equals() even when comparing same-class objects. Consequently it cannot be seen as an optimal replacement for the standard implementation that can be put in place as a matter of course to allow subclasses to be derived at a later time.

Another downside is that the basic cost of a call doubles for an overridden equals in a subclass: super equals() is called to get superclass field comparison involving a second call to canEqual() and duplicate instanceof tests. The primary indication for using hashed storage is to provide fast access making this implementation not ideal for specific use even when the requirement to override equals() is known in advance.

The artima article shows a way forward and that a solution is possible but additional requirements to be addressed here are to devise an equals() implementation with a minimal code footprint and execution overhead that can be used as a matter of course in any class that overrides equals(). The implementation must allow fully working subclasses to be derived in all three cases where they may be required to meet OOP standards for code reuse.

We have now identified the requirements that an optimal equals() implementation must meet and can assess how the proposed alternative stands up.


Devising a solution

Figure 3 shows an inheritance structure containing four comparison sets.

Figure 3 Green, red, yellow and blue comparison sets

Here we consider a base class that is a concrete subclass of Object. Using an abstract base has only minor implications and then only if it declares significant fields. The simplest possible class will supply examples: one with a single integer field is used. The number of fields or even a complete audit of the current state of a data structure as used by HashMap has no bearing on the problem. This class is called GreenBase in the diagram and is the root of a green comparison set.

With the exception of ZGreen, GreenBase descendants in the green comparison set do not override equals(). They introduce no new fields that figure in comparison and are comparable using the equals() inherited from GreenBase. They can of course introduce other fields, the one on the left could be a LinkedGreen with a field referencing the next object in a list.

Implementing these classes requires that the equals() implementation satisfies the second requirement for code reuse: ability to derive a class that declares no new significant fields and is a member of its superclass comparison set. From the Figure we can see that there is a secondary requirement - that the condition used to determine comparison set membership should include all possible descendants of the base on all branches.

The standard implementation satisfies both these requirements using instanceof the-base but now we come to the apparently conflicting code reuse requirement: ability to derive a class that declares zero or more new significant fields and is not a member of its superclass comparison set. The equals() in GreenBase inherited by green subclasses can do an initial assessment for comparison set membership using instanceof or some similar isA condition but if it is going to make provision for derived comparison sets it must exclude instances of RedBase, YellowBase and BlueBase that supply the base class for new sets. All these classes are also GreenBase subclasses and instances will pass the initial assessment.

This problem raised the question as to what additional isA or class comparison check could be put in equals() to exclude instances on the same or a different branch from the root. After some thought and deciphering of the contracts of various Class methods it was decided that there wasn't one. We could perhaps give all comparable objects an additional property, say Class comparisonBase() or a similar final field, but doing this adds additional coding and additional logic in equals().

Otherwise only the received object can do the necessary checks but it was also found that when the decision on membership was passed to that object checking ancestry was no longer a requirement. The same is true when the received object's class has the third equality relationship, ZGreen in Figure 3, but as discussed below an additional isA check is then required.

The following sections assess how this implementation fairs with regard to the criteria for minimum code size and effect on execution efficiency over the standard implementation.


Minimizing the code footprint

The requirement here was to develop an implementation that was altogether trivial to code and required a minimum of extra coding over the standard equals(). GreenBase supplies a standard where differences are not overwhelmed by other code when the two implementations are shown side by side:


Excluding braces from the calculation the difference amounts to the two extra lines of code used by the fEquals(GreenBase) method. The field comparison method can have any name even equals(GreenBase) but it is suggested that standardizing the name to fEquals will facilitate use of this implementation and that fEquals should be overloaded when equals() is overridden in descendants to maintain consistency in use.

The extra coding is nothing compared with the complication of using composition to get a subclass equivalent and the chore of writing many do-nothing methods in a real life situation.


Minimizing the overhead

In GreenBase the equals() implementation introduces the smallest possible overhead above the standard, only the cost of calling fEquals(). The initial task is to exclude equals(Object) arguments that are instances of any class not a GreenBase or a subclass. The instanceof operator includes a check that the argument is not null and provides an economical solution:
public class GreenBase
{
    private final  int value;

    public GreenBase(int value) {
        this.value = value;
    }

    public int getValue() {
         return value;
    }

    @Override public boolean equals(Object obj) {
        return obj instanceof GreenBase &&
                ((GreenBase) obj).fEquals(this);
    }

    protected boolean fEquals(GreenBase obj) {
         return  ( value == obj.value );
    }

    @Override public int hashCode() {
        return 21 + value;
    }

}
Having determined that the equals(Object) argument is a GreenBase or a subclass and therefore a type that has an fEquals(GreenBase) method the next step is to call the argument's version passing a reference to the current object. This call is the only addition to the overhead for an implementation with field comparison done inline.

The decision on equality is passed to the received object. It is already known that the object is a GreenBase or a subclass. If the object's class is a member of the same comparison set it will have the same version of fEquals(GreenBase), which does the field comparison and returns the result, otherwise an overridden version simply returns false.


Deriving a new comparison set

RedBase introduces a new significant field, is not comparable with a GreenBase or instances of any green subclass and forms the root of the red comparison set:
public class RedBase extends GreenBase
{
    private final  long x;

    public RedBase(int value, long x) {
        super (value);
        this.x = x;
    }

    public long getX() {
         return x;
    }

    @Override public boolean equals(Object obj) {
        return obj instanceof RedBase &&
            ((RedBase) obj).fEquals(this);
    }

    protected boolean fEquals(RedBase obj) {
        return (x == obj.x && super.fEquals(obj));
    }

    @Override protected boolean fEquals(GreenBase obj) {
        return false;
    }

    @Override public int hashCode() {
        return super.hashCode() * 31 + (int) (x ^ (x >>> 32));
    }

}
RedBase overrides equals() with a new instanceof condition excluding instances of GreenBase and its other subclasses and calls a new fEquals(RedBase) overloading of the field comparison method on the received object. fEquals(GreenBase) is overridden to return false.

When a RedBase equals() receives a GreenBase or an instance of any of its other subclasses, including those that may have override equals(), the instanceof RedBase condition fails and false is returned. When the equals() in a GreenBase or another subclass is passed a RedBase either its overriden instanceof fails − it is a member of another derived comaprison set, or it is a member of the GreenBase comparison set and calls the object's overridden version of fEquals(GreenBase). Either case gets a symmetric false result for the two comparisons − exactly what is required.

Implementing classes that form the root of the other two comparison sets shown in Figure 3 follows the same pattern. BlueBase for example, inherits RedBase's constant false fEquals(GreenBase). It will override fEquals(RedBase) to return a constant false and implement a new fEquals overloading to deal with objects in the blue comparison set.

The logic is completely straightforward, but now we move on to implementing the unusual equality relationship where a class declares a new field that is used for comparison with instances of the same class and its subclasses but comparison with a superclass and other subclasses in the comparison set is permitted.


Equality of the third kind

ZGreen introduces a z field that is significant for equality. Comparison with instances of GreenBase and all other classes in the green comparison set is allowed.

The only circumstance where doing this makes any sense is if GreenBase is seen as supplying some kind of default z. If z is a string the default might be "" or null. If it is an integer it could be zero but the default can be any value from the range, yes even 42 is a possible candidate for an integer default. No approximate or wildcard value is permitted for the default or transitivity will no longer apply.

A default of zero seems to make a little sense so ZGreen is implemented with an integer z field and all other objects in the green comparison set are seen as having z == 0:
public class ZGreen extends GreenBase
{
   private final  int z;

    public ZGreen(int value, int z) {
        super (value);
        this.z = z;
    }

    public int getZ() {
         return z;
    }


    @Override public boolean equals(Object obj) {
        if (obj instanceof GreenBase) {
            if (obj instanceof ZGreen)
                return ((ZGreen) obj).fEquals(this);
            return (z == 0 && ((GreenBase) obj).fEquals(this));
        }
        return false;
    }

    protected boolean fEquals(ZGreen obj) {
        return (z == obj.z && super.fEquals(obj));
    }

    @Override protected boolean fEquals(GreenBase obj) {
        return z == 0 && super.fEquals(obj);
    }
}
In GreenBase the first instanceof condition admits objects that are GreenBase or a subclass and false is returned for any other object. Here a second grouping of the comparison set is made to include instances of ZGreen and its subclasses using a second instanceof comparison. Other objects that may or may not belong to the green comparison set are treated separately.

ZGreen arguments are dealt with by calling the object's fEquals(ZGreen). For other types, a check for the default z value is made and if this succeeds the object's fEquals(GreenBase) is called. The local fEquals(GreenBase) is overridden to deal with direct calls from other objects in the comparison set when their equals() receives a ZGreen argument. Other GreenBase subclass objects will not be calling fEquals(GreenBase) because a ZGreen will fail their initial instanceof condition.

hashCode() is not overridden - the inherited version returns a value based on the inherited value field so that GreenBase and ZGreen objects with a matching value both have the same hashcode.


Implementing an isA comparison


In ZGreen the second instanceof comparison includes a redundant check that the Object argument is not null. It can be eliminated by substituting an isA comparison using the Class method isInstance(Object).
    @Override public boolean equals(Object obj) {
        if (obj instanceof GreenBase) {

            // if obj isA ZGreen
            if (ZGreen.class.isInstance(obj))
                return ((ZGreen) obj).fEquals(this);
            return (z == 0 && ((GreenBase) obj).fEquals(this));
        }
        return false;
    }
Interpreting the meaning of Class' isInstance() method is quite brain numbing. One trick is to correct the English in your head to what it actually means - 'obj.getClass().hasInstance(this)' for example. Another trick is to read the statement right to left: this isInstance obj.getClass(). Failing these you can put your own isA() method in the base class - the only con is execution efficiency:
public boolean isA(Object obj) {
        if (obj == null) return false;      // optional check here - may be better done by caller
        return obj.getClass().isInstance(this);
    }
Summary

The equals() implementation shown here produces a correct comparison that complies with the equals() meta-contract for all of the three possible equality relationships that a subclass can have with a superclass. Analysis using the comparison-set model demonstrates that there are only three such relationships that equals() needs to support.

It is suggested that any additional overhead from calling a factored-off field comparison method is insignificant in most cases and that the ability to produce subclasses with a fully functional overridden equals(), allowing OOP code reuse without employing composition indicates that this implementation should be used as a matter of course when coding any class that overrides equals().

There are other related topics that may be addressed in a later posting: using an abstract superclass, the HashMap put(K, V) gotcha and implications of the Liskov Substitution Principle. Currently all that needs to be said on the LSP is that the second Liskov and Wing paper excludes relationships between objects from general discussion but groups this aspect with other 'safety properties (nothing bad happens)'. Certainly the LSP does not appear to address anything beyond Object's view of equality. It is down to us to make sure 'nothing bad happens' when equals() is overridden but it will certainly help if the implementation used works in the first place.

Barbara Liskov, Jeannette M Wing - A Behavioral Notion of Subtyping - ACM 1994

Test listing:
public class TestEquals {

    public static void eqPrint(Object a, Object b) {
        String ac = a.getClass().getCanonicalName();
        if (ac == null) ac = "?"; else ac = 
            ac.substring(ac.indexOf('.') + 1);
        String bc = b.getClass().getCanonicalName();
        if (bc == null) bc = "?"; else bc = 
            bc.substring(bc.indexOf('.') + 1);
        System.out.println(ac + " a equals() " + bc + " b: " +
            a.equals(b));
        System.out.println(bc + " b equals() " + ac + " a: " + 
            b.equals(a));
        System.out.println("Hashcodes are equal: " +
                              (a.hashCode() == b.hashCode()));
    }
    public static void nl() {System.out.println();}
    public static void main(String[] args) {
        Object a = new GreenBase(42);
        eqPrint(a, "");                     // false - false
        eqPrint(a, new GreenBase(42));      // true  - true
        eqPrint(a, new GreenSubA(42));      // true  - true
        eqPrint(a, new GreenSubA(41));      // false - false
        Object b = new GreenBase(42) {
            String s = "local class";
        };
        eqPrint(a, b);                      // true  - true
        nl();
        a = new GreenSubA(42);
        eqPrint(a, new GreenSubA(42));      // true - true
        eqPrint(a, new GreenSubB(42));      // true - true
        nl();
        a = new RedBase(42, 7);
        eqPrint(a, new RedBase(42, 7));     // true  - true
        eqPrint(a, new RedBase(41, 7));     // false - false
        eqPrint(a, new GreenBase(42));      // false - false
        eqPrint(a, new GreenSubA(42));      // false - false
        nl();
        a = new ZGreen(42, 0);
        eqPrint(a, new GreenBase(42));      // true  - true
        eqPrint(a, new GreenSubA(42));      // true  - true
        eqPrint(a, new ZGreenSubA(42, 0));  // true  - true
        eqPrint(a, new GreenSubA(41));      // false - false
        eqPrint(a, new RedBase(42, 0));     // false - false
        eqPrint(a, new RedSubA(42, 0));     // false - false
    }

    public static class GreenBase {

        private final int value;

        public GreenBase(int value) {
            this.value = value;
        }

        public int getValue() {
            return value;
        }

        @Override public boolean equals(Object obj) {
            return obj instanceof GreenBase &&
                    ((GreenBase) obj).fEquals(this);
        }
        protected boolean fEquals(GreenBase obj) {
            return (value == obj.value);
        }
        @Override public int hashCode() {
            return 21 + value;
        }
    }

    public static class GreenSubA extends GreenBase {
        GreenSubA next;
        public GreenSubA(int value) {
            super (value);
        }
    }
    public static class GreenSubB extends GreenBase {
        GreenSubB next;
        public GreenSubB(int value) {
            super (value);
        }
    }

    public static class RedBase extends GreenBase {

        private final long x;

        public RedBase(int value, long x) {
            super(value);
            this.x = x;
        }

        public long getX() {
            return x;
        }

        @Override public boolean equals(Object obj) {
            return obj instanceof RedBase &&
                    ((RedBase) obj).fEquals(this);
        }
        protected boolean fEquals(RedBase obj) {
            return (x == obj.x && super.fEquals(obj));
        }
        @Override protected boolean fEquals(GreenBase obj) {
            return false;
        }
        @Override public int hashCode() {
            return super.hashCode() * 31 + (int) (x ^ (x >>> 32));
        }
    }

    public static class RedSubA extends RedBase {
        RedSubA next;
        public RedSubA(int value, long x) {
            super(value, x);
        }
    }

    public static class ZGreen extends GreenBase {

        private final int z;

        public ZGreen(int value, int z) {
            super(value);
            this.z = z;
        }

        public int getZ() {
            return z;
        }

        @Override public boolean equals(Object obj) {
            if (obj instanceof GreenBase) {
                if (obj instanceof ZGreen) {
                    return ((ZGreen) obj).fEquals(this);
                }
                return (z == 0 && ((GreenBase) obj).fEquals(this));
            }
            return false;
        }
        protected boolean fEquals(ZGreen obj) {
            return (z == obj.z && super.fEquals(obj));
        }
        @Override protected boolean fEquals(GreenBase obj) {
            return z == 0 && super.fEquals(obj);
        }
    }

    public static class ZGreenSubA extends ZGreen {
        ZGreenSubA next;
        public ZGreenSubA(int value, int z) {
            super(value, z);
        }
    }
}

Monday, 22 June 2009

Using Processing in NetBeans (6.5)

Boyd mentioned passing a reference to the current PApplet to separate classes from a Processing app. For anyone who has not yet tried combining Processing and NetBeans, points I have noted are here - there may be better alternatives. This post shows an example that implements a clipboard Copy and Paste subsystem. The code can be used in any processing application. Three variants are discussed: implementation as an inner class in a NetBeans (or PDE) environment, as a separate class in NetBeans and, using the resulting .jar from the Processing IDE (PDE). Initially a NetBeans Processing application is needed.

Creating a NetBeans Processing app.

1) In NetBeans make a New project, say 'Processing Template', with Create Main Class say 'processing.ProcessingApp'. (if you change the package name in the wizard, the package is created OK but the package processing; statement may be missing from the listing: just type it in)

2) Right click on the Libraries node in the project tree: Add JAR/folder, navigate to the folder where Processing core.jar is located, ...\processing-1.0.2-expert\ processing-1.0.2-expert\lib on my system. Select 'core.jar', and Open.

3) Modify the class ProcessingApp as shown:


package processing;

import processing.core.*;

/**
*
* @author Martin Humby
*/
public class ProcessingApp extends PApplet {

public static void main(String[] args) {
PApplet.main(new String[]{"processing.ProcessingApp"});
}

// Code that can optionally be copied to and compiled from the PDE goes here.
// The PDE only requires explicit import of packages it does not import
// implicitly: import java.awt.datatransfer and java.awt.image for the example
// below. Duplicating implicit imports is OK, for the standard imports
// that is. The PDE does not like public void setup() { } - delete public

} // end of ProcessingApp

Note that the string literal in main() must match the package name and class name. If you use Copy, Paste - Refactor to get a version copy and want to be able to run the copy, refactor does not modify the string and the new package and/or class-name must be changed by hand (otherwise the copy runs the current version of the original - confusing!).

To get a slightly more convenient way of accessing the library path to core.jar, select Tools - Libraries from the NetBeans menu bar. New Library, Library Name: 'Processing', Library Type: Class Libraries. Make sure Processing is selected in the Libraries tree and do Add JAR/Folder as in 2) above. Now right click the Libraries node for the project select Add Library. Scroll down the list of Global Libraries and select Processing.

Making the outline shown above a module template or plugin that can be installed into the IDE appears to have disadvantages. Seems to me the easiest way to keep a framework for Processing apps. handy is simply to save the outline as a standard project. Open the outline and use right-click Copy from the project node to get and rename a new project complete with the core.jar path already set up. You can of course include other Processing libraries with the outline.

Passing an explicit PApplet

An inner class that might be factored off to a separate class is a Clipboard Copy and Paste subsystem. Hiding its implementation means passing the current PApplet instance to its Copy and Paste functions. Writing subsystem access methods and the inner class as static ensures a separate class can be got with minimum modifications. Here the static keyword excludes passing an implicit this reference to the inner class: same situation as in a separate class. A difference is that enclosing class static fields remain accessible without qualification. Need to put this in if a straight transfer is going to work, (g.format == PApplet.ARGB) for example, or add such qualifications later.

To try this approach Copy the 'Processing Template' as a new project say 'Processing Clipboard' and modify the class ProcessingApp as shown. You can of course use Refactor to rename the class if you wish and modify the string in main() to suit.

package processing;

import java.awt.*;
import processing.core.*;
import java.awt.datatransfer.*;
import java.awt.image.BufferedImage;
import java.io.IOException;

/**
*
* @author Martin Humby
*/
public class ProcessingApp extends PApplet {

public static void main(String[] args) {
PApplet.main(new String[]{"processing.ProcessingApp"});
}

static final int WIDTH = 500;
static final int PANEL_HT = 160;
static final int HEIGHT = PANEL_HT + 500;
static final int DRAWING_HEIGHT = HEIGHT - PANEL_HT;
static final int PANEL_TOP = DRAWING_HEIGHT;

public void setup() {
size(WIDTH, HEIGHT);
background(255);

// include a panel below drawing area
fill(128);
rect(0, PANEL_TOP, WIDTH - 1, PANEL_HT - 1);
}

// draw() is required to get keyboard monitoring apparently
public void draw() {
}

public void keyReleased() {
// CTRL+C: Copy to Clipboard
if (key == 3) {
copyImageToClipboard(0, 0, width, height - PANEL_HT, this);
}

// CTRL+V: Paste Clipboard
if (key == 22) {
PImage img = getClipboardImage(this);
if (img == null)
return; // not an image on clipboard

// blank drawing area if required
background(255);

// paste image centred on area above panel
int w = img.width;
int h = img.height;
int dx = (width - w) / 2;
int dy = (height - PANEL_HT - h) / 2;
image(img, dx, dy);

// restore the panel area
fill(128);
rect(0, PANEL_TOP, WIDTH - 1, PANEL_HT - 1);
}
}

/////////////////////////////////////////////////////////////////////////////
//
// COPY AND PASTE SUBSYSTEM
//

/**
* Clipboard copy and paste: the following references were used
* http://processing.org/hacks/hacks:clipboard
* http://www.devx.com/Java/Article/22326/0/page/1 et seq.
*
*/

public static void copyImageToClipboard(int top, int left, int width,
int height, PApplet p) {

// could pass this.g directly but passing this for both functions gets
// consistency in use
PGraphics g = p.g;

// g is needed to get format
BufferedImage bImage = new BufferedImage(width, height,
(g.format == PApplet.ARGB) ?
BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB);
g.loadPixels();

// some explicit identifiers clarify setRGB() usage:
// setRGB(bImageX, bImageY, bothW, bothH, source[], srcOffset, sourceW);
bImage.setRGB(0, 0, width, height, g.pixels, left + top * g.width, g.width);

TransferablePImage imageSelection = new TransferablePImage(bImage);
Toolkit.getDefaultToolkit().
getSystemClipboard().setContents(imageSelection, null);
}

// return a PImage rather than an awt.Image
public static PImage getClipboardImage(PApplet p) {

Transferable t =
Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null);
try {
if (t != null && t.isDataFlavorSupported(DataFlavor.imageFlavor)) {
BufferedImage bimg = (BufferedImage)
t.getTransferData(DataFlavor.imageFlavor);
int w = bimg.getWidth();
int h = bimg.getHeight();

// make it ARGB to be on the safe side - makes no difference
// unless clipboard image has some tranparency could be?
PImage pImg = p.createImage(w, h, PApplet.ARGB);
pImg.loadPixels();
bimg.getRGB(0, 0, w, h, pImg.pixels, 0, w);
pImg.updatePixels();
return pImg;
}
} catch (UnsupportedFlavorException e) {
} catch (IOException e) {
}
return null;
}

static class TransferablePImage implements Transferable {

// the class-name makes more sense as a separate class
private Image image; // an awt.Image

public TransferablePImage(Image image) {
this.image = image;
}

public Object getTransferData(DataFlavor flavor)
throws UnsupportedFlavorException {
if (!flavor.equals(DataFlavor.imageFlavor)) {
throw new UnsupportedFlavorException(flavor);
}
return image;
}

public boolean isDataFlavorSupported(DataFlavor flavor) {
return flavor.equals(DataFlavor.imageFlavor);
}

public DataFlavor[] getTransferDataFlavors() {
// in some cases there may be more than one flavour?
return new DataFlavor[]{DataFlavor.imageFlavor};
}
}
} // end of ProcessingApp

To test it open say Paint and load an image. Copy a selection smaller than 500 x 500, and paste to the Processing app. Copy and paste it back to Paint - the white border now surrounds the pasted area.


Factoring off the Clipboard Subsystem

It is optional whether a separate TransferablePImage class is put in a new package but doing this will make life easier if you want to use it from the Processing IDE. Right click Source Packages - New - Java Package, 'utilities' then
utilities - New - Java Class - 'TransferablePImage'.

Copy the inner class to the new file and modify it copying the subsystem interface methods as static class methods:

package utilities;

import java.awt.*;
import java.awt.datatransfer.*;
import java.awt.image.BufferedImage;
import java.io.IOException;
import processing.core.*;

/**
* Copy or paste Processing PImage to / from system clipboard
*
* @author Martin Humby
*/
public class TransferablePImage implements Transferable{

private Image image;

public TransferablePImage(Image image) {
this.image = image;
}

// static methods now part of class

public static void copyImageToClipboard(int top, int left, int width,
int height, PApplet p) {

... body as inner class

}

public static PImage getClipboardImage(PApplet p) {

... body as inner class

}

... other methods as inner class
}


To test the new class make a copy of ProcessingApp, say ProcessingApp1, using copy and paste to the processing package. Do not forget to change the class name in the string literal! Add import utilities.TransferablePImage; Delete the COPY AND PASTE SUBSYSTEM. Modify lines that call Copy and Paste to:

TransferablePImage.copyImageToClipboard(0, 0, width, height - PANEL_HT, this);

PImage img = TransferablePImage.getClipboardImage(this);

Optionally delete ProcessingApp1 imports that are no longer used and proceed as before.


Using TransferablePImage from the PDE

Create a new NetBeans project 'processing utilities' with Create Main Class unchecked. Fix up the libraries path to core.jar. Copy and Paste the utilities package from its current location to the new project's Source Packages node. Build the project: the location of the resulting processing_utilities.jar file is shown in the NetBeans Output window.

Make a new folder called maybe 'jars'. A good place could be in the default My Docments\Processing folder. Copy the processing_utilities.jar file to the new folder. Open the Processing IDE and make a new Sketch. Copy in the code from ProcessingApp1.java from after the main() method. Do not include the last curly bracket that belongs to the enclosing class. Delete public from the setup() method.

From the PDE menu bar select Sketch - Add File... navigate to the 'jars' folder and open processing_utilities.jar. The IDE reports 'One file added to the sketch.' No import statement is required and putting one in gets a compilation error. Run the sketch - the result should be identical to the NetBeans version. You can of course include several other classes in processing_utilities.

For more on this subject see http://dev.processing.org/libraries/


Summary

Most code that compiles in the Processing IDE will compile in NetBeans with a PApplet wrapper and fixed up imports. There are a few exceptions. The non-Java color data type is an integer and needs changing to int. A second minor but annoying problem is the fact that Java has the default type for floating point as double. Processing makes the type implied but overflow checked on compile (neither check for values effectively zero apparently).

Processing: float h = 4.13566733e-15; Java: float h = 4.13566733e-15f; Compiler simplicity maybe but after inserting a few dozen 'f's you begin to wonder whether it generates float g = 42; as a promotion. Java of course insists that PApplet methods that get overridden have the same public access as in PApplet but adding this is trivial.

Similarly, most code that will compile in NetBeans compiles in the PDE. To avoid problems particularly when there are errors in the code, delete public access for setup(). Processing is perfectly happy with anonymous classes, which may save a few lines of code here and there. Currently generics are not supported. A pity. I like Java generics. They seem to fit nicely with the object model without providing a sledgehammer capability for wholesale obfuscation. Of course, this opinion may be a reaction to an excess of _traits and spiky brackets elsewhere but generics could be useful to reduce code size in some applications. A set of minimalistic controls managed and drawn by Processing for example:


There is little problem implementing a tiny control management subsystem using a state machine: there are less than a dozen states for a complete set and some can be excluded - no drag and drop for example. Similarly, a single Control type with action and graphics closures, implemented as classes and assigned to its attributes on a mix and match basis, keeps coding to a minimum. Controls manage themselves for radio-button selection and which control gets input (focus). Problems come up when programmatic access to a control is needed - to modify text in controls that show drawing status and size for example.

Arguably this is the wrong way to do it. Perhaps closures should monitor status in the wider environment of the enclosing PApplet but doing this means that controls that can otherwise be entirely passive require a specific action closure or active graphic to do the monitoring. A single line of code, in the draw() function say, can modify a passive control. Alternatively a new closure class is required to implement monitoring.

Control action and graphic attributes are coresponding base classes. Access to a specific attribute means either retaining a reference to the control and casting attributes to their actual type or retaining a reference to the attribute itself. Neither way is ideal as far as code size and clarity goes. Generics can do the casting once for a control reference and all specific attribute types are then accessible directly through that reference. There must be other situations where using generics from the PDE could be useful.