Thursday, August 11, 2011

Manipulating Java Class Files with BCEL - Part One : Hello World!

What is BCEL: Apache BCEL, the Byte Code Engineering Library, is a library that makes it simpler to manipulate Java byte code. Now the question is, why manipulate byte code? There can be any number of reasons. For example, you might want to insert some profiling code into a class file. Or you might want to write your own language that compiles to Java byte code. You could also provide some attractive extension to a framework you are creating. Or you can be more creative than I am and do something I cannot think of. But for that, you must first understand how Java class files work.

Since this is a BCEL tutorial, first get the BCEL library from the Apache BCEL project site.


Manipulating Java byte code directly is not trivial, so I decided to break the tutorial into a series. This is the first part: the hello world. Stay tuned to learn more.

Structure of Java Class File: Like any file format, the Java class file has a defined structure. The layout shown below gives the overall structure of the whole class. It is taken directly from the JVM specification.


    ClassFile {
        u4 magic;
        u2 minor_version;
        u2 major_version;
        u2 constant_pool_count;
        cp_info constant_pool[constant_pool_count-1];
        u2 access_flags;
        u2 this_class;
        u2 super_class;
        u2 interfaces_count;
        u2 interfaces[interfaces_count];
        u2 fields_count;
        field_info fields[fields_count];
        u2 methods_count;
        method_info methods[methods_count];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }


The first question now would be: what are u2, u4 and so on? u2 is an unsigned integer of 2 bytes, and of course u4 is an unsigned integer of 4 bytes. cp_info, field_info and the like are complex structures of variable length. I will cover the individual fields below.

magic: A four byte integer with a fixed value. It distinguishes class files from other kinds of files. Its value is always 0xCAFEBABE.

minor_version, major_version: A pair of two byte integers giving the version information. The version must fall within the range the JVM supports, otherwise the JVM refuses to load the class.

constant_pool: A store for all the constants used in the class. The constant pool stores every constant, including the class name, method names, field names, the super class name, class references and so on. It also stores the constants used in the class as literals. In most cases this is the biggest section of the class file.

access_flags: The access flags of the class. This field is a two byte integer in which specific bits are assigned to specific flags, representing whether the class is public, whether it is final, whether it is an interface and so on.

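this_class: Index of the constant pool entry (a class constant) that describes this class.

super_class: Index of the constant pool entry (a class constant) that describes the super class of this class.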

interfaces: List of all interfaces the class implements.

fields: List of all fields in this class.

methods: List of all methods in this class.

attributes: Class-level attributes, for example whether the class is deprecated and the name of its source file, if any.
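To see how the first few fields line up on disk, here is a minimal sketch that uses only the JDK's DataInputStream (which reads big-endian values, matching the class file format). The class name ClassHeaderSketch and the readHeader helper are hypothetical names made up for this illustration. The sketch stops at constant_pool_count, because everything after it has variable length.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of BCEL: reads the fixed-size start of a class file.
final class ClassHeaderSketch {

    /** Reads magic, minor_version, major_version and constant_pool_count. */
    static String readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();               // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        int minor = data.readUnsignedShort();     // u2 minor_version
        int major = data.readUnsignedShort();     // u2 major_version
        int cpCount = data.readUnsignedShort();   // u2 constant_pool_count
        return String.format("magic=0x%08X version=%d.%d constant_pool_count=%d",
                magic, major, minor, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this class was loaded from a location where its own
        // .class file is available as a resource.
        try (InputStream in = ClassHeaderSketch.class.getResourceAsStream("ClassHeaderSketch.class")) {
            if (in == null) {
                throw new IOException("Class file not found as a resource");
            }
            System.out.println(readHeader(in));
        }
    }
}
```

Running it against its own class file prints the magic number, the version the compiler targeted and the number of constant pool slots.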

That's it. Now we will look in a little more detail at how the constant pool, fields and methods are stored. We do not have to concentrate on every detail of the bit patterns used to store these, as BCEL takes care of them. So an overall understanding will suffice.

But before we dive in, let us look at a very simple program written using BCEL that prints a description of a class. I will also use the output of this program to clarify the explanation. I have concentrated only on the constant pool, fields and methods. This program shows how easily we can access the details of a class with BCEL.
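The original listing and its output are not reproduced here. A minimal sketch of such a program might look like the following, assuming BCEL 6.x is on the classpath. The class name DisplayDetailsSketch and the describe helper are hypothetical. The constant numbers quoted in the discussion come from the author's own run, so a different compiler may number the entries differently.

```java
import java.io.IOException;

import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.Code;
import org.apache.bcel.classfile.Constant;
import org.apache.bcel.classfile.Field;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;

// Hypothetical sketch, not the author's original DisplayDetails program.
final class DisplayDetailsSketch {

    /** Builds a textual description of the constant pool, fields and methods. */
    static String describe(JavaClass clazz) {
        StringBuilder out = new StringBuilder();

        out.append("Constant pool:\n");
        Constant[] constants = clazz.getConstantPool().getConstantPool();
        // Index 0 is unused, and the slot after a long or double is empty.
        for (int i = 1; i < constants.length; i++) {
            if (constants[i] != null) {
                out.append(i).append(": ").append(constants[i]).append('\n');
            }
        }

        out.append("Fields:\n");
        for (Field field : clazz.getFields()) {
            out.append(field).append('\n');
        }

        out.append("Methods:\n");
        for (Method method : clazz.getMethods()) {
            out.append(method).append('\n');
            Code code = method.getCode();   // null for abstract and native methods
            if (code != null) {
                out.append(code).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a compiled .class file.
        JavaClass clazz = new ClassParser(args[0]).parse();
        System.out.println(describe(clazz));
    }
}
```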



Constant Pool Revisited: Let us now check the constant pool. There are many types of constants in the constant pool: some are class constants, some are strings, some are UTF8 character sequences, and so on. Constants point to one another for additional information. For example, a class constant points to a UTF8 character sequence for its name. In this case, constant number 1 is a class constant whose name reference is 2. Hence the name of the class is constant number 2, which is "com/geekyarticles/bcel/DisplayDetails". Note that the package separator here is '/'.

The field's description is pretty simple and needs no explanation. Note that the name of the field is actually stored in the constant pool at number 5; BCEL dereferences this location and shows the name in place.

Methods: As seen in the example, the class here has two methods. We know about the main method, but what is <init>? It is the name of the constructor in the class file; a constructor always has this name there. Note that it is not a valid Java identifier, which ensures that no method defined in the class can conflict with the constructor's name.

At the end, we have printed the code of each method. The code surely looks scary, but don't panic, you are not supposed to understand it yet. I will cover it slowly. At this stage, just note that some attributes are also printed for each method's code. These attributes are the line numbers and the local variable scope maps. We will deal with them later.

In the class file structure, the code of the method is also just another attribute, but BCEL shows it separately.


JVM Code execution: Unlike most real machines, which are register based, the JVM is stack based. What does that mean? Well, for example, on an i386 processor, adding 2 and 3 would look something like this (the exact syntax differs between assemblers):
  • mov ax, 2
  • mov bx, 3
  • add ax, bx
This will put the sum in the ax register. In Java, however, instead of registers there is an operand stack. Hence the same thing would look like the following:
  • ldc 5
  • ldc 7
  • iadd
where 5 and 7 are (let's assume) the indices of the constant pool entries holding the values 2 and 3. The ldc instruction pushes a constant onto the operand stack, so here two constants are pushed first. The iadd instruction then pops two integer values from the operand stack, adds them and pushes the result back onto the operand stack. The rest of this part builds a hello world program that demonstrates using BCEL to create a class.
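To make the stack model concrete, here is a small simulation in plain Java (no BCEL involved). The class OperandStackSketch and its methods are hypothetical names invented for this illustration, and a real JVM operand stack holds typed slots rather than boxed integers. The constant pool is reduced to an int array with 2 at index 5 and 3 at index 7, as assumed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of the JVM operand stack, for illustration only.
final class OperandStackSketch {
    private final int[] constantPool;
    private final Deque<Integer> stack = new ArrayDeque<>();

    OperandStackSketch(int[] constantPool) {
        this.constantPool = constantPool;
    }

    /** ldc: push the constant stored at the given constant pool index. */
    void ldc(int index) {
        stack.push(constantPool[index]);
    }

    /** iadd: pop two ints, add them, push the sum. */
    void iadd() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    /** Returns the value on top of the stack without removing it. */
    int peek() {
        return stack.peek();
    }

    public static void main(String[] args) {
        int[] pool = new int[8];
        pool[5] = 2;   // assumed constant number 5
        pool[7] = 3;   // assumed constant number 7

        OperandStackSketch vm = new OperandStackSketch(pool);
        vm.ldc(5);
        vm.ldc(7);
        vm.iadd();
        System.out.println(vm.peek()); // prints 5
    }
}
```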

What we are trying to create: The following program shows what we are trying to create the byte code for. That is, when the following program is compiled, it produces byte code very close to what we will create through BCEL.
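The original listing is not reproduced here. Going by the class name and the message mentioned at the end of this post, it would be roughly the following. This is a sketch: the original class lives in the package com.geekyarticles.bcel and is presumably public, while here the package declaration and the public modifier are left out to keep the example to a single self-contained file.

```java
// Sketch of the target program; the original is in package com.geekyarticles.bcel.
class SyntheticClass {
    public static void main(String[] args) {
        System.out.println("You are a real geek!");
    }
}
```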



Yes, as simple as it could be. To do this, we must first get a reference to the static field out in the class java.lang.System. Then we need to call the method println on it passing the argument.

Getting reference to the out field is just one instruction.
  • getstatic [constant number for field reference]
Once the reference is obtained, it is, as usual, pushed onto the operand stack. Now we must also push the argument to the method. This is a simple ldc, as the argument is a constant.
  • ldc [constant number for the string argument]
Now we are ready to call the method. An ordinary instance method of a class, such as println here, is invoked through an invokevirtual instruction (constructors and superclass method calls use invokespecial, and interface methods use invokeinterface). Hence the following will do.
  • invokevirtual [constant number for method reference]
At the end, the method must return. The return instruction is mandatory even if the return type is void, which is the case here.
  • return
If the return type is not void, a typed return instruction such as ireturn or areturn is used instead, and the operand stack must contain the value to be returned.

Now that we understand the basics, we will put these together into code in GenerateClass.java
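The original GenerateClass.java listing is not reproduced here. A minimal sketch of the four instructions above, assuming BCEL 6.x (where the access flag constants live in org.apache.bcel.Const; BCEL 5.x used org.apache.bcel.Constants), could look like this. The class name GenerateClassSketch and the build helper are hypothetical, and the empty constructor is added only to mirror what javac would generate.

```java
import java.io.IOException;

import org.apache.bcel.Const;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.generic.ArrayType;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.GETSTATIC;
import org.apache.bcel.generic.INVOKEVIRTUAL;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.LDC;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.RETURN;
import org.apache.bcel.generic.Type;

// Hypothetical sketch, not the author's original GenerateClass program.
final class GenerateClassSketch {
    static final String CLASS_NAME = "com.geekyarticles.bcel.SyntheticClass";

    static JavaClass build() {
        ClassGen classGen = new ClassGen(CLASS_NAME, "java.lang.Object", "<generated>",
                Const.ACC_PUBLIC | Const.ACC_SUPER, new String[0]);
        ConstantPoolGen constantPool = classGen.getConstantPool();

        // Each add* call returns the constant pool index used by the instruction.
        int outField = constantPool.addFieldref("java.lang.System", "out", "Ljava/io/PrintStream;");
        int message = constantPool.addString("You are a real geek!");
        int println = constantPool.addMethodref("java.io.PrintStream", "println", "(Ljava/lang/String;)V");

        InstructionList code = new InstructionList();
        code.append(new GETSTATIC(outField));     // push System.out
        code.append(new LDC(message));            // push the string argument
        code.append(new INVOKEVIRTUAL(println));  // call println
        code.append(new RETURN());                // return from a void method

        MethodGen main = new MethodGen(Const.ACC_PUBLIC | Const.ACC_STATIC, Type.VOID,
                new Type[] {new ArrayType(Type.STRING, 1)}, new String[] {"args"},
                "main", CLASS_NAME, code, constantPool);
        main.setMaxStack();    // let BCEL compute the operand stack depth
        main.setMaxLocals();   // and the number of local variable slots
        classGen.addMethod(main.getMethod());
        code.dispose();

        classGen.addEmptyConstructor(Const.ACC_PUBLIC);
        return classGen.getJavaClass();
    }

    public static void main(String[] args) throws IOException {
        // Written relative to the current directory, following the package path.
        build().dump("com/geekyarticles/bcel/SyntheticClass.class");
    }
}
```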



Note here that we have added a constant to the constant pool whenever we have needed one. BCEL automatically ensures that the same constant is not added twice. The add methods of ConstantPoolGen (such as addString(), addFieldref() and addMethodref()) return the index of the constant in the constant pool, which can be used directly in the instructions.

Descriptor: Java class files have a very specific format for describing the type of a field or a method. During class file manipulation, whenever the type of a field or method needs to be provided, it must be given in descriptor format.

Field Descriptor: A field descriptor encodes the field's type. In the class file, the signatures are as follows:

    Signature        Java Type
    B                byte
    C                char
    D                double
    F                float
    I                int
    J                long
    S                short
    Z                boolean
    [                array. Hence [B is a byte array, [Ljava/lang/Object; is an object array
    L<classname>;    object of class <classname>. The package name is slash separated.


Method Descriptor: A method descriptor is written in the form (parameterList)returnType, where the parameter descriptors are simply concatenated without separators. Hence, a method String myMethod(int x, Object[] y) has the descriptor (I[Ljava/lang/Object;)Ljava/lang/String;. A void return type is represented by V.
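One way to check a descriptor without writing it by hand is to let the JDK produce it. The sketch below uses java.lang.invoke.MethodType, which is part of the JDK and not of BCEL. The class DescriptorSketch and its two helpers are hypothetical names for this illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: derives descriptors from Class objects using the JDK.
final class DescriptorSketch {

    /** Descriptor of a method with the given return and parameter types. */
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    /** Descriptor of a field of the given type. */
    static String fieldDescriptor(Class<?> type) {
        // A no-argument method type prints as "()" followed by the type's
        // descriptor, so dropping the first two characters leaves the field form.
        return MethodType.methodType(type).toMethodDescriptorString().substring(2);
    }

    public static void main(String[] args) {
        // String myMethod(int x, Object[] y)
        System.out.println(methodDescriptor(String.class, int.class, Object[].class));
        // prints (I[Ljava/lang/Object;)Ljava/lang/String;

        System.out.println(fieldDescriptor(byte[].class));  // prints [B
        System.out.println(fieldDescriptor(boolean.class)); // prints Z
    }
}
```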



After running this program, a class file should be generated. Now you should be able to run java com.geekyarticles.bcel.SyntheticClass and it should say You are a real geek!. Now that must be true, if the computer says so!

This ends part one of this tutorial. More to come.

1 comments:

Anonymous said...

Hey, very useful article. I have a question on when to do this, if I have to add new field in java class. is it during application build or application startup? I have web application developed using springs.
