Saturday, October 22, 2011

Java Compile Time Method Binding

Introduction: Determining which method a particular invocation refers to looks simple, yet the Java Language Specification needs pages to describe it, mainly to account for auto-boxing/auto-unboxing and variable arguments. In this article, I will highlight how an actual method is bound to a particular method invocation at compile time.

Java method invocation instructions: Up to version 1.6, Java had four method invocation instructions, namely invokestatic, invokevirtual, invokeinterface and invokespecial. Short descriptions follow:
  1. invokestatic: used to invoke static methods.
  2. invokevirtual: the most commonly used invoke instruction. It is used to invoke non-static methods and thus supports method overriding.
  3. invokeinterface: very similar to invokevirtual, but used to invoke interface methods.
  4. invokespecial: used to invoke constructors, superclass methods and private methods.
The following will give an example of each. I will first list the source code and then what it becomes when compiled.
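A source listing consistent with the byte code below might look like this. It is a reconstruction, not necessarily the exact original: the class name, the method names and the "Hello" constant come from the byte code, while the empty method bodies, the default access modifiers and the omitted package declaration are assumptions.

```java
// Reconstruction inferred from the byte code; the original package
// (com.geekyarticles.java) is omitted so the file compiles standalone.
class MethodInvocationDemo {

    // Becomes <clinit>: pushes "Hello" and calls method1 with invokestatic.
    static {
        method1("Hello");
    }

    // Becomes <init>: invokespecial for the Object constructor,
    // then invokevirtual for method2 on this.
    MethodInvocationDemo() {
        method2("Hello");
    }

    // Static method, so callers use invokestatic.
    static void method1(String s) {
    }

    // Non-private instance method, so callers use invokevirtual.
    void method2(String s) {
    }
}
```

After compiling, `javap -c -p MethodInvocationDemo` prints the raw form of the instructions shown below.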



And here is the byte code (modified to make it human readable).
<clinit> ()V {
    0 ldc #8 <Hello>                                      //push String constant "Hello"
    2 invokestatic #10 <com/geekyarticles/java/MethodInvocationDemo/method1(Ljava/lang/String;)V> //call method1()
    5 return
}

<init> ()V {
    0 aload_0                                             //push the this object in the stack
    1 invokespecial #17 <java/lang/Object/<init>()V>      //invoke super class constructor
    4 aload_0                                             //push the this object in the stack
    5 ldc #8 <Hello>                                      //push String constant "Hello"
    7 invokevirtual #19 <com/geekyarticles/java/MethodInvocationDemo/method2(Ljava/lang/String;)V> //call method2()
    10 return
}

method1 (Ljava/lang/String;)V {
    0 return
}

method2 (Ljava/lang/String;)V {
    0 return
}


First, what are the <clinit> and <init> methods? <clinit> is a special method that holds the code written in the static blocks of your class. It also includes the code that initializes static fields, if any. Note that the name is not a legal Java method name; this ensures that it cannot conflict with any method defined in the class. The <init> method is a constructor. Our class has only one constructor, and hence only one <init> method. Note that it invokes its superclass constructor through the invokespecial instruction.
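To make the contents of <clinit> visible from plain Java, here is a small sketch. The class `StaticInitOrder` and its `record` helper are hypothetical names introduced only for this illustration. It logs each step so you can see that static field initializers and static blocks run in textual order, and that the constructor runs only afterwards.

```java
class StaticInitOrder {
    // Declared first so that it is initialized before anything logs to it.
    static final StringBuilder LOG = new StringBuilder();

    // Static field initializer: compiled into <clinit>.
    static int first = record("field-initializer");

    // Static block: also compiled into <clinit>, in textual order.
    static {
        record("static-block");
    }

    // A second static field initializer, placed after the static block.
    static int second = record("second-field-initializer");

    // Compiled into <init>; runs for each new object, after <clinit> is done.
    StaticInitOrder() {
        record("constructor");
    }

    // Appends the step name to the log and returns the log length.
    static int record(String step) {
        LOG.append(step).append(';');
        return LOG.length();
    }
}
```

Creating one instance with `new StaticInitOrder()` and then reading `StaticInitOrder.LOG` shows the four steps in that order.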

Now let us consider the invokestatic and invokevirtual instructions. Note that in both cases the instruction names the class in which the method is declared, together with the types of the arguments. invokevirtual supports overriding at runtime, which means that the method actually invoked depends on the runtime type of the object on which it is invoked. However, the method signature must match exactly (which is always the case for overriding). This means that the exact method among all the overloaded methods is resolved at compile time. I will now discuss the steps the compiler must take to determine the exact method to bind to.
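The difference between the two decisions can be seen in a short sketch. The classes `Shape`, `Circle` and `BindingDemo` are made-up names for this illustration. The overload is fixed by the compiler from the declared types, and only the choice between overriding implementations of that one signature is left to runtime.

```java
class Shape {
    String describe(Object o) {
        return "Shape.describe(Object)";
    }
}

class Circle extends Shape {
    // Overrides Shape.describe(Object): same signature.
    @Override
    String describe(Object o) {
        return "Circle.describe(Object)";
    }

    // An overload, not an override: the parameter type differs.
    String describe(String s) {
        return "Circle.describe(String)";
    }
}

class BindingDemo {
    static String viaShape() {
        Shape s = new Circle();
        // Compile time: Shape only has describe(Object), so that signature is bound.
        // Run time: invokevirtual dispatches to Circle's override of it.
        return s.describe("hello");
    }

    static String viaCircle() {
        Circle c = new Circle();
        // Compile time: Circle also has describe(String), which is more specific.
        return c.describe("hello");
    }
}
```

Here `viaShape()` returns "Circle.describe(Object)" even though the argument is a String, while `viaCircle()` returns "Circle.describe(String)".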

Determining the class or interface to search: This is the simplest part of method binding. It is simply the declared type of the reference on which the method is invoked. If the method is invoked without an object, the enclosing type of the method that contains the invocation is used. Static methods can also be called on the class itself. In the byte code, static methods are always invoked with the invokestatic instruction, and no object ever receives the call.
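A small sketch of this rule, using hypothetical classes `Base`, `Derived` and `SearchTypeDemo`: because a static call is bound purely from the declared type, the object behind the reference plays no part, and even a null reference works.

```java
class Base {
    static String kind() {
        return "Base.kind";
    }
}

class Derived extends Base {
    // Hides Base.kind(); static methods are not overridden.
    static String kind() {
        return "Derived.kind";
    }
}

class SearchTypeDemo {
    static String throughReference() {
        Base b = new Derived();
        // The declared type of b is Base, so Base is searched and Base.kind() is bound.
        return b.kind();
    }

    static String throughNullReference() {
        Base b = null;
        // Still fine: no object receives a static call, so nothing is dereferenced.
        return b.kind();
    }

    static String unqualified() {
        // No object and no class name: the enclosing class SearchTypeDemo is searched.
        return helper();
    }

    static String helper() {
        return "SearchTypeDemo.helper";
    }
}
```

Calling static methods through a reference is legal but usually flagged by compilers and IDEs; it is used here only to show which type is searched.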

Java 1.5 introduced autoboxing and varargs. To keep full backward compatibility, method binding is implemented in three phases. The next phase is used only if the current one finds no applicable method. If no method can be bound after all three phases, a compile-time error occurs.

Identifying potentially applicable methods: Every phase starts with finding the potentially applicable methods. A method is potentially applicable if:
  • The name of the method matches the one used in the invocation.
  • The method is accessible where it is used.
  • The arity (number of arguments) of the invocation matches what the method can accept. Let n be the number of parameters in the method definition. For non-vararg methods, the number of arguments in the invocation must be exactly n. For vararg methods, the number of arguments must be greater than or equal to (n-1).
  • If the method invocation uses explicit type parameters and the member considered is a generic method, the number of type parameters in the method definition matches the number of type parameters in the invocation.
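The arity rule can be tried out with a minimal sketch; `ArityDemo`, `countRest` and `pair` are hypothetical names. The vararg method has n = 2 parameters, so any call with at least one argument is potentially applicable, while the fixed-arity method accepts exactly two.

```java
class ArityDemo {
    // Variable arity with n = 2 parameters (prefix, rest):
    // potentially applicable to any call with at least n - 1 = 1 argument.
    static int countRest(String prefix, String... rest) {
        return rest.length;
    }

    // Fixed arity with n = 2: only two-argument calls are potentially applicable.
    // A call such as pair("a") is rejected by the compiler on arity alone.
    static String pair(String a, String b) {
        return a + "," + b;
    }

    static int oneArgument() {
        // n - 1 arguments: rest becomes an empty array.
        return countRest("p");
    }

    static int threeArguments() {
        // More than n arguments: the extras are collected into rest.
        return countRest("p", "a", "b");
    }
}
```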

Once all the potentially applicable methods are found, each phase filters them down to the ones that are applicable in that phase.

Inferring type arguments: For generic methods, the method's type arguments must be inferred before any applicability test is done. Inference is necessary only if the type arguments are not passed explicitly. The inference process is too complex to fit in this article (I will discuss it in detail in a different article). For now, it is enough to know that inference amounts to solving inequalities with subtype-supertype constraints that are imposed by the actual arguments passed to the method in the invocation. Note that capture conversion of the class's type parameters happens before this step.
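As a rough illustration of inference versus explicit type arguments, consider the sketch below. `InferenceDemo` and `firstNonNull` are hypothetical names, and the comments describe the outcome informally rather than the exact algorithm.

```java
class InferenceDemo {
    // Generic method: T is inferred from the arguments unless given explicitly.
    static <T> T firstNonNull(T a, T b) {
        return a != null ? a : b;
    }

    static String sameTypes() {
        // Both arguments are Strings, so the constraints are satisfied by T = String.
        return firstNonNull("first", "second");
    }

    static Number mixedTypes() {
        // Integer and Double (after boxing) both have to be subtypes of T,
        // so T is inferred as a common supertype that is compatible with Number.
        Number n = firstNonNull(1, 2.5);
        return n;
    }

    static Object explicitType() {
        // Explicit type argument: nothing is inferred, T is Object.
        return InferenceDemo.<Object>firstNonNull("text", 42);
    }
}
```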



Phase 1: Identifying matching arity methods applicable by subtyping: In this phase, the normal subtyping check is done. Before Java 1.5, only this phase existed; it has since been extended to handle generics.

This phase is pretty simple.
  1. First, all the potentially applicable methods are found.
  2. If a method is generic, its type arguments are inferred (if not explicitly specified).
  3. The actual arguments are checked for a subtype relationship (or a widening primitive conversion relationship) with the method's formal parameter types, after type inference if any.
  4. If the method is generic, the inferred (or explicitly specified) type arguments are checked against the bounds specified in the method declaration.
If all of the above conditions are met for at least one method, this phase succeeds and the following phases are skipped. If more than one method passes these tests, the most specific method is chosen; I will discuss how later in this article. If no method is the most specific, it is a compilation error.
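A sketch of phase 1 in action, with hypothetical overloads `f` and `g` in a made-up class `Phase1Demo`: a method that would need boxing never enters the picture as long as some method is applicable by subtyping or widening alone.

```java
class Phase1Demo {
    static String f(long x) {
        return "f(long)";
    }

    static String f(double x) {
        return "f(double)";
    }

    static String f(Integer x) {
        return "f(Integer)";
    }

    static String g(Object o) {
        return "g(Object)";
    }

    static String g(CharSequence s) {
        return "g(CharSequence)";
    }

    static String callF() {
        // int widens to long and to double, so f(long) and f(double) are applicable
        // in phase 1; f(long) is the most specific because long widens to double.
        // f(Integer) would need boxing, so it is not applicable in this phase.
        return f(1);
    }

    static String callG() {
        // String is a subtype of both parameter types;
        // CharSequence is a subtype of Object, so g(CharSequence) is most specific.
        return g("text");
    }
}
```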

Phase 2: Identifying matching arity methods applicable by method invocation conversion: This is where autoboxing comes into the picture. If a method passes the subtype test (or the widening primitive conversion test) after boxing or unboxing the arguments, it is considered applicable. The other tests are the same as in phase 1.
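The following sketch shows calls that can only be bound in phase 2. `Phase2Demo`, `target` and `wide` are hypothetical names chosen for the illustration.

```java
class Phase2Demo {
    static String target(Integer x) {
        return "target(Integer)";
    }

    static String target(Object x) {
        return "target(Object)";
    }

    static String wide(long x) {
        return "wide(long)";
    }

    static String callTarget() {
        // Phase 1 finds nothing: int is not a subtype of Integer or Object.
        // Phase 2: int boxes to Integer, which fits both overloads;
        // target(Integer) is the most specific of the two.
        return target(7);
    }

    static String callWide() {
        // Phase 1 finds nothing: Integer is not a subtype of long.
        // Phase 2: Integer unboxes to int, which then widens to long.
        return wide(Integer.valueOf(7));
    }

    // Note: a parameter of type Long would not accept the int literal 7,
    // because widening followed by boxing is not a permitted conversion.
}
```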

Phase 3: Identifying applicable variable arity methods: This is the last phase and is used only if the earlier phases fail to find any applicable method (it is not used when applicable methods were found but none could be unambiguously chosen as the most specific). The matching conditions are exactly those of phase 2, except that for a vararg method a variable number of arguments is allowed.
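The ordering of the three phases can be observed by removing candidates one at a time. In the sketch below, which uses the hypothetical class `PhaseOrderDemo`, the same call with a single int argument is bound to a different kind of method depending on which overloads exist.

```java
class PhaseOrderDemo {
    // All three kinds of candidate are present.
    static String all(long x) {
        return "all(long)";
    }

    static String all(Integer x) {
        return "all(Integer)";
    }

    static String all(int... xs) {
        return "all(int...)";
    }

    // No candidate is applicable by widening alone.
    static String boxOrVarargs(Integer x) {
        return "boxOrVarargs(Integer)";
    }

    static String boxOrVarargs(int... xs) {
        return "boxOrVarargs(int...)";
    }

    // Only a variable arity candidate exists.
    static String onlyVarargs(int... xs) {
        return "onlyVarargs(int...) with " + xs.length + " argument(s)";
    }

    static String callAll() {
        return all(1);            // phase 1 succeeds: int widens to long
    }

    static String callBoxOrVarargs() {
        return boxOrVarargs(1);   // phase 1 fails, phase 2 succeeds by boxing
    }

    static String callOnlyVarargs() {
        return onlyVarargs(1);    // phases 1 and 2 fail, phase 3 succeeds
    }
}
```

This is why code written before Java 1.5 keeps its meaning: adding a boxing or vararg overload never steals a call that an older-style method could already accept.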

Choosing the most specific method: Method m1 is more specific than m2 if and only if, for each parameter type ai of m1 and the corresponding parameter type bi of m2, ai can be converted to bi either through a subtype (superclass) relationship or through a widening primitive conversion. Please note that this has to hold for every value of i. It is quite possible that neither m1 nor m2 is more specific than the other. In that case no single method can be unambiguously chosen as the most specific, and a compile-time error occurs.
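A minimal sketch of such an ambiguity, using the hypothetical class `AmbiguityDemo`. The ambiguous call itself is left in a comment so that the class still compiles; the other calls show how changing an argument type leaves only one applicable method.

```java
class AmbiguityDemo {
    static String m(int a, long b) {
        return "m(int, long)";
    }

    static String m(long a, int b) {
        return "m(long, int)";
    }

    // m(1, 1) does not compile: both overloads are applicable in phase 1.
    // (int, long) fits (long, int) in the first position but not the second,
    // and the reverse holds for (long, int), so neither is more specific.

    static String firstOverload() {
        // The second argument is a long, so m(long, int) is not applicable.
        return m(1, 1L);
    }

    static String secondOverload() {
        // The first argument is a long, so m(int, long) is not applicable.
        return m(1L, 1);
    }
}
```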


