Saturday, July 30, 2011

An Anatomy of Java Generics

1. What are Generics: Generics is a feature of the Java language for creating parameterized types. A parameterized type is a generic type with type arguments supplied, such as List<String>. The main purpose is to let the compiler catch errors that would otherwise surface only as runtime exceptions.

For example, consider two programs. The first does not use generics; it uses the Java collection API to store and retrieve objects. In such a program, the compiler cannot know what is actually stored in the collection. It is the programmer's job to make sure that the stored objects are of the expected type and that they are not cast to the wrong type when retrieved. This takes a lot of nasty boilerplate code, and it also obscures the purpose of the collection altogether. Generics solves this by letting us pass type arguments that state what types the collection holds, as we will see.

In the non-generic program, suppose we try to cast a String retrieved from the collection to an Integer. Since the return type of the get method is Object, the compiler has no way to know that there is a problem, and a ClassCastException is thrown at runtime.

Now consider the generic version. Here we tell the compiler that we are creating a List of String objects, not just any objects. The compiler therefore knows that the get method returns only a String. It also raises an error if we try to add an Integer to the List. In the rest of the article, we will discuss the various semantics and constraints of Java generics.
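A minimal sketch of the two programs described above. The class name RawVsGeneric and its methods are hypothetical, chosen only for this illustration. The raw version compiles but fails at runtime. In the generic version, the same mistakes are rejected at compile time, so they are shown as comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: the same task with a raw List and with a List<String>.
class RawVsGeneric {

    // Without generics: the compiler cannot see what the list holds.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawVersionFails() {
        List list = new ArrayList();
        list.add("hello");
        try {
            Integer i = (Integer) list.get(0); // compiles, but the cast fails at runtime
            return false;
        } catch (ClassCastException e) {
            return true; // the error only shows up at runtime
        }
    }

    // With generics: get() is known to return String, so no cast is needed.
    static String genericVersion() {
        List<String> list = new ArrayList<String>();
        list.add("hello");
        // list.add(42);             // compile-time error: an Integer is not a String
        // Integer i = list.get(0);  // compile-time error: incompatible types
        return list.get(0);
    }

    public static void main(String[] args) {
        System.out.println(rawVersionFails()); // true
        System.out.println(genericVersion());  // hello
    }
}
```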

This tutorial is aimed at readers with some exposure to Java generics. Prior familiarity helps, but even without it you can catch up with a little more effort.

2. Generic Class Declarations: The following code gives an example of a generic class declaration.

public class GenericClass<E, T extends E> { ... }

Here E and T are called type variables. Each represents some type. The declaration also restricts the relation between them: T must be a subtype of E, whatever E turns out to be. Unlike local variables, type variables can be referenced anywhere in the type parameter section, even before they are declared, so the following is also legal.

public class GenericClass<T extends E, E> { ... } //OK: forward reference to E

What is an error is a cycle, such as <T extends E, E extends T>, in which a type variable ends up depending on itself.
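To make the bound T extends E concrete, here is a small sketch. The class Bucket and its methods are hypothetical. Because every T is known to be an E, values of type T can be stored where an E is expected without any cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container: one "general" value of type E plus any number of
// "specific" values of type T, where T must be a subtype of E.
class Bucket<E, T extends E> {
    private final E general;
    private final List<T> specifics = new ArrayList<T>();

    Bucket(E general) { this.general = general; }

    void addSpecific(T item) { specifics.add(item); }

    // Because T extends E, every T can be treated as an E without a cast.
    List<E> everything() {
        List<E> all = new ArrayList<E>();
        all.add(general);
        for (T t : specifics) {
            all.add(t); // T is assignable to E
        }
        return all;
    }

    public static void main(String[] args) {
        Bucket<Number, Integer> b = new Bucket<Number, Integer>(2.5);
        b.addSpecific(1);
        b.addSpecific(2);
        System.out.println(b.everything()); // [2.5, 1, 2]
        // Bucket<Integer, Number> bad;     // compile-time error: Number is not a subtype of Integer
    }
}
```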

A type variable can also be bounded by more than one type by separating the bounds with an & symbol, as shown in the following:

public class GenericClass<E, T extends List<E> & Comparable<E>> { ... }

At most one of the bounds may be a class. If there is one, it must come first, and every bound after an & must be an interface. Otherwise, it is a compile-time error.
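A short sketch of a class with an intersection bound. The class MaxTracker is hypothetical. Number is a class, so it must come first, and Comparable<T> is an interface that follows the &. Inside the class, methods of both bounds are available on T.

```java
// Hypothetical class: T must be a Number (a class, so it comes first)
// and also Comparable<T> (an interface, so it may follow '&').
class MaxTracker<T extends Number & Comparable<T>> {
    private T max;

    void offer(T value) {
        // compareTo is available because T extends Comparable<T>
        if (max == null || value.compareTo(max) > 0) {
            max = value;
        }
    }

    // doubleValue is available because T extends Number.
    // Assumes offer() has been called at least once.
    double maxAsDouble() {
        return max.doubleValue();
    }

    // class Bad<T extends Comparable<T> & Number> {}  // compile-time error:
    // Number is a class, so it cannot appear after '&'.

    public static void main(String[] args) {
        MaxTracker<Integer> tracker = new MaxTracker<Integer>();
        tracker.offer(3);
        tracker.offer(9);
        tracker.offer(4);
        System.out.println(tracker.maxAsDouble()); // 9.0
    }
}
```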

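3. Wildcard Type Parameters: Wildcard type arguments were introduced to express looser constraints than a single fixed type argument. For example, a reference of type List<? extends Number> can point to a List<Integer> or a List<Double>.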

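Wildcards can be used in the types of references (variables and fields), in method parameter and return types, and therefore in the arguments passed to methods. They are not allowed as a top-level type argument when creating an object, as in new ArrayList<?>(), or in the supertype of a class declaration.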
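A minimal sketch of a wildcard in a method parameter and in a reference type. WildcardDemo and sum are hypothetical names. The illegal uses are left as comments because they do not compile.

```java
import java.util.Arrays;
import java.util.List;

class WildcardDemo {
    // The parameter accepts List<Integer>, List<Double>, List<Number>, ...
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Double> doubles = Arrays.asList(1.5, 2.5);
        List<? extends Number> ref = ints; // wildcard in a reference type: OK
        System.out.println(sum(ref));      // 6.0
        System.out.println(sum(doubles));  // 4.0
        // new java.util.ArrayList<?>();   // compile-time error: wildcard in object creation
        // class MyList extends java.util.ArrayList<?> {}  // compile-time error: wildcard in a supertype
    }
}
```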

3.1. Bounds: If the wildcard is just a ?, it is an unbounded wildcard: its upper bound is Object and its lower bound is the null type. A wildcard of the form ? extends T has the upper bound T, and a wildcard of the form ? super T has the lower bound T.
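The upper-bounded case appears in the sum example above. Here is a sketch of the other two forms. LowerBoundDemo and its methods are hypothetical. A ? super Integer list is only known to accept Integers. A List<?> is only known to yield Objects.

```java
import java.util.ArrayList;
import java.util.List;

class LowerBoundDemo {
    // ? super Integer has lower bound Integer: the list may be a List<Integer>,
    // List<Number> or List<Object>, and any of them can safely accept an Integer.
    static void fillWithSquares(List<? super Integer> target, int count) {
        for (int i = 1; i <= count; i++) {
            target.add(i * i);
        }
    }

    // With an unbounded wildcard, elements are only known to be Objects.
    static int countNonNull(List<?> list) {
        int n = 0;
        for (Object o : list) {
            if (o != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<Number>();
        fillWithSquares(numbers, 3);
        System.out.println(numbers);               // [1, 4, 9]
        System.out.println(countNonNull(numbers)); // 3
    }
}
```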

3.2. Type Argument Containment and Equivalence: A type argument is said to contain another if the set of types represented by the first is a superset (not to be confused with a supertype) of the set of types represented by the second.

  1. ? extends T contains ? extends S if S is a subtype of T
  2. ? super T contains ? super S if T is a subtype of S
  3. T contains T
  4. ? extends T contains T
  5. ? super T contains T
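The containment rules decide which assignments between wildcard types compile. A hypothetical sketch (the class ContainmentDemo is illustrative only): each assignment below compiles because the target's type argument contains the source's.

```java
import java.util.ArrayList;
import java.util.List;

class ContainmentDemo {
    static int demo() {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(7);

        // ? extends Integer contains Integer
        List<? extends Integer> a = ints;
        // ? extends Number contains ? extends Integer, since Integer is a subtype of Number
        List<? extends Number> b = a;

        List<Number> nums = new ArrayList<Number>();
        // ? super Number contains Number
        List<? super Number> c = nums;
        // ? super Integer contains ? super Number, since Integer is a subtype of Number
        List<? super Integer> d = c;
        d.add(5); // an Integer fits anything that is a supertype of Integer

        // List<? extends Integer> e = b;  // error: ? extends Integer does not contain ? extends Number

        return b.get(0).intValue() + nums.get(0).intValue();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 12
    }
}
```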

3.3. Capture Conversion: A non-static method in a generic class can use the class's type parameters in its own parameter list, and it can also use wildcards there. In addition, the reference through which the method is called may itself have wildcard type arguments, such as List<? extends Number>. To handle this, the compiler replaces each such wildcard with a fresh, unnamed type variable called a capture. This is why error messages mention types like capture#2 of ? extends E. The bounds of a capture combine the bound of the wildcard with the declared bound of the corresponding type parameter. The call is then type-checked with the captures substituted into the method's parameter types. It compiles only if the arguments satisfy the resulting constraints.
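A common way to see capture conversion at work is the "capture helper" idiom, sketched below. CaptureDemo, swapFirstTwo and swapHelper are hypothetical names. Passing a List<?> to a generic method binds the method's T to the capture, which gives the unknown type a name that the code can use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CaptureDemo {
    // list.set(0, list.get(1)) does not compile on a List<?>: each use of the
    // wildcard-typed expression gets its own capture, and the compiler cannot
    // prove that the value read fits the slot being written.
    static void swapFirstTwo(List<?> list) {
        swapHelper(list); // capture conversion binds T to the captured type
    }

    // Hypothetical helper: inside it, the captured type has a name (T).
    private static <T> void swapHelper(List<T> list) {
        T first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        swapFirstTwo(words);
        System.out.println(words); // [b, a, c]
    }
}
```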

Note that if, after capture conversion, a method parameter's type is a capture of a wildcard with no lower bound (for example ? extends Number), the method cannot be called with any argument other than null. A call requires each argument to be a subtype of the corresponding parameter type. The compiler cannot prove that anything is a subtype of an unknown type whose only known bound is an upper bound. This is why add cannot be called on a List<? extends Number>. There is no such problem when the method itself declares a wildcard parameter, as containsAll(Collection<?> c) does. That method is written to accept any type in that position, so there is nothing to prove.
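A minimal sketch of which calls compile on a List<? extends Number>. The class UpperBoundCallDemo is hypothetical. The rejected call is left as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UpperBoundCallDemo {
    static String demo() {
        List<Integer> ints = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        List<? extends Number> numbers = ints;

        Number first = numbers.get(0); // OK: the result is some subtype of Number
        // numbers.add(4);             // error: add(capture#1 of ? extends Number)
        //                             // cannot be shown to accept an Integer
        numbers.add(null);             // OK: null belongs to every reference type
        boolean all = numbers.containsAll(Arrays.asList(1, 2)); // OK: parameter is Collection<?>

        return first + " " + all + " " + numbers.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1 true 4
    }
}
```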

4. Generic Methods: Generic methods are methods that declare their own type variables. When the method is invoked, those variables are substituted with actual types. The type arguments may be passed explicitly in the invocation or inferred implicitly. In both cases, they must satisfy the declared bounds.

4.1. Generic Method Definition: To define a generic method, declare its type parameters in angle brackets before the return type, as shown in the following example.

public static <T extends Comparable<T>> T max(T a, T b) { ... }

4.1.1. Implicit Type Inference: When a generic method is called without explicit type arguments, the types are inferred implicitly in the following steps:

  1. First, a set of initial constraints is formed from the arguments (after capture conversion).
  2. If the arguments passed do not involve any wildcards, the initial constraints are the solution. If the initial constraints cannot be satisfied, it is a compile-time error.
  3. If the initial constraints involve wildcards, the resulting subtyping relations are solved. If multiple solutions exist, the most specific one is chosen.
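A short sketch of implicit inference at call sites. InferenceDemo, max and twoOf are hypothetical names. No type arguments are written anywhere, and the compiler infers T from the arguments in each call.

```java
import java.util.ArrayList;
import java.util.List;

class InferenceDemo {
    // Hypothetical generic method: T is declared before the return type.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    static <T> List<T> twoOf(T a, T b) {
        List<T> list = new ArrayList<T>();
        list.add(a);
        list.add(b);
        return list;
    }

    public static void main(String[] args) {
        Integer i = max(3, 7);               // T inferred as Integer (after boxing)
        String s = max("apple", "pear");     // T inferred as String
        List<String> pair = twoOf("x", "y"); // T inferred as String
        System.out.println(i + " " + s + " " + pair); // 7 pear [x, y]
    }
}
```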

4.1.2. Explicit Type: Type arguments can also be passed explicitly, in angle brackets between the dot and the method name, as in Collections.<String>emptyList().

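It is not possible to pass explicit type arguments without qualifying the call with the class (for a static method) or the object (for a non-static method) it is called on. For example, inside the declaring class you must write this.<String>foo() or ClassName.<String>foo(). A bare <String>foo() does not compile.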
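A sketch of explicit type arguments on static and instance methods. ExplicitDemo, twoOf and identity are hypothetical names. Collections.emptyList is the real JDK method. Writing <Number> fixes T to Number instead of leaving it to inference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ExplicitDemo {
    static <T> List<T> twoOf(T a, T b) {
        List<T> list = new ArrayList<T>();
        list.add(a);
        list.add(b);
        return list;
    }

    <T> T identity(T value) { return value; }

    List<Number> demo() {
        // Static method: qualify the call with the class name.
        List<Number> nums = ExplicitDemo.<Number>twoOf(1, 2.5);
        // Instance method inside the class: qualify the call with 'this'.
        Number n = this.<Number>identity(42);
        // <Number>identity(42);  // error: explicit type arguments need a qualifier
        nums.add(n);
        List<String> empty = Collections.<String>emptyList();
        nums.add(empty.size());
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(new ExplicitDemo().demo()); // [1, 2.5, 42, 0]
    }
}
```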

5. Type Erasure: During compilation, the information about parameterized types is erased and each type is flattened to a non-generic type. Note that parameterized types are instances of generic types with type arguments supplied. The generic types themselves keep their type variable declarations in the compiled class files. This lets code that is compiled later against those binaries still use them as parameterized types. Erasure works as follows:

  1. Parameterized types are stripped to raw types.
  2. Type variables are replaced by the erasure of their leftmost bound, or by Object if they have no bound. (This applies to uses of type variables, not to their declarations.)
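Erasure can be observed through reflection. In this hypothetical sketch (ErasureDemo and its methods are illustrative), two differently parameterized lists share one runtime class. A method whose type variable is bounded by Comparable<T> is compiled with Comparable in place of T, so it is looked up by Comparable parameter types.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Hypothetical method: after erasure, T becomes its leftmost bound, Comparable.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        // Both parameterized types erase to the raw type ArrayList.
        return strings.getClass() == ints.getClass();
    }

    static Class<?> erasedReturnType() {
        try {
            // The method is found by its erased parameter types.
            Method m = ErasureDemo.class.getDeclaredMethod("max", Comparable.class, Comparable.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());  // true
        System.out.println(erasedReturnType());  // interface java.lang.Comparable
    }
}
```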

5.1 Reifiable Types: Reifiable types are types whose type information is completely available at runtime, because erasure removes nothing essential from them. Only the following types are reifiable:

  1. Non-generic class and interface types
  2. Raw types
  3. Primitive types
  4. Parameterized types in which every type argument is an unbounded wildcard
  5. Arrays whose element type is reifiable
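Reifiability matters wherever the runtime must check a type, for example in instanceof and in array creation. In this hypothetical sketch (the class ReifiableDemo is illustrative), List<?> is allowed in both places. The non-reifiable List<String> is rejected, as shown in the comments.

```java
import java.util.ArrayList;
import java.util.List;

class ReifiableDemo {
    static boolean isList(Object o) {
        // o instanceof List<String>  // error: List<String> is not reifiable
        return o instanceof List<?>;  // OK: List<?> is reifiable
    }

    static int arrayLength() {
        // List<String>[] bad = new List<String>[2];  // error: generic array creation
        List<?>[] lists = new List<?>[2];             // OK: the element type is reifiable
        return lists.length;
    }

    public static void main(String[] args) {
        System.out.println(isList(new ArrayList<String>())); // true
        System.out.println(isList("not a list"));            // false
        System.out.println(arrayLength());                   // 2
    }
}
```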

6. Backward Compatibility: For backward compatibility, instances of raw types can be assigned to parameterized references, and instances of parameterized types can be assigned to raw references. The same applies to passing method arguments. The conversion from a raw type to a parameterized type is called unchecked conversion. Together with unchecked calls made through raw references, it can cause a problem called heap pollution.

6.1 Heap Pollution: Because of unchecked conversions and calls, a variable of a parameterized type can end up referring to an object of the wrong type. This can cause a ClassCastException later at runtime, which is why the compiler issues an unchecked warning for such operations.
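A minimal sketch of heap pollution, using a hypothetical class HeapPollutionDemo. The warnings are suppressed only to keep the example quiet. In real code they are exactly the signal to pay attention to. The Integer enters the List<String> without complaint. The failure appears later, at the compiler-inserted cast on retrieval.

```java
import java.util.ArrayList;
import java.util.List;

class HeapPollutionDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean pollute() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;        // parameterized -> raw: allowed
        raw.add(42);               // unchecked call: an Integer now sits in a List<String>

        List<String> alias = raw;  // raw -> parameterized: unchecked conversion
        try {
            String s = alias.get(0); // the compiler-inserted cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true; // the failure shows up far from the line that caused it
        }
    }

    public static void main(String[] args) {
        System.out.println(pollute()); // true
    }
}
```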

7. Java 7 Enhancement: From Java 7 onward, the diamond operator (<>) can be used when instantiating parameterized types, letting the compiler infer the type arguments. I already explained it here (see diamond operator).
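A brief sketch of the diamond operator, using a hypothetical class DiamondDemo (requires Java 7 or later). The type arguments on the right-hand side are inferred from the declared type of the variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiamondDemo {
    static Map<String, List<Integer>> build() {
        // Before Java 7: new HashMap<String, List<Integer>>()
        Map<String, List<Integer>> map = new HashMap<>(); // type arguments inferred
        List<Integer> values = new ArrayList<>();
        values.add(1);
        map.put("one", values);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build()); // {one=[1]}
    }
}
```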


Debasish Ray Chawdhuri said...

Did a correction in the explanation to method argument wildcards.
