JavaParty Logo

JavaParty

A distributed companion to Java
Current release 1.9.5

Bernhard Haumacher, Thomas Moschny and Michael Philippsen

Contents
General
 Features
 Requirements
 DownloadNew!
 Mailing ListsNew!
 Setup
 Quick Tour
 Command Reference
 API
Language
 Syntax
 Object Model
 Transparent Threads
 Distributed ThreadsCool!
 RMACool!
 Synchronization
 Object Location
 Migration
 Remote Threads
 Replicated ObjectsCool!
 Multi-Application
Tuning
 Debugging
displayed pageuka.transport
 KaRMICool!
 KaRMI API
 Myrinet/GM
 OpenPBS
Examples
 Hello JavaParty
 ObjectModel
 BenchmarksNew!
Other info
 Papers
 Trouble Shooting
 History


See also
 CJ
 Generic Java


Powered by
Apache Ant
SourceForge
Subversion

Fast Object Serialization uka.transport

Introduction

In the Java API an object serialization mechanism is provided that is able to transparently convert object structures into a linear byte sequence representation and vice versa. For sending objects across the network, this representation can be used. To make his classes serializable, the Java programmer only needs to declare his classes to implement the empty tagging interface java.io.Serializable.

public class Param implements java.io.Serializable {
   int    intValue;
   float  floatValue;
   Object objectReference;
}
A simple serializable class

Although transparency of object serialization is one of Java's beauties, its runtime costs are phenomenal. Since RMI uses Java's built-in object serialization mechanism, usually one third of the overall invocation time of a remote method is spent in marshaling and unmarshaling method arguments. On current hardware, a millisecond is easy to spend in serialization alone. For details please refer to the section "JavaParty papers".

// Param p, InputStream in, OutputStream out
new ObjectOutputStream(out).writeObject(p);
Param q = (Param) new ObjectInputStream(in).readObject();
Writing/reading an object into/from a stream

For high performance distributed computing inefficient serialization cannot be tolerated. Therefore, we've looked into the details of the official serialization, which is mostly implemented in Java itself, and derived an optimized replacement for it. In some benchmarks, we could eliminate 81-97% of the overhead caused by the regular Java serialization. For details please refer to the section "JavaParty papers".

Object Serialization vs. Fast Marshaling

The regular Java serialization is justifiable for persistent object storage, because it guarantees that objects can be reconstructed from their byte sequence representation even with future Java releases, and even with modified application classes. Our optimized serialization does not make such guarantees. Instead, it is designed for exchanging objects between running VMs that share the same class base.

Look and Feel

From the programmer's point of view our optimized serialization is slightly more difficult to use (you know, there is no free lunch.)

The main difference is that instead of java.io.Serializable a newly introduced interface uka.transport.Transportable has to be implemented. In contrast to java.io.Serializable, the interface uka.transport.Transportable is not empty:

package uka.transport;

public interface Transportable
    extends java.io.Serializable, Cloneable 
{
    // *** object marshaling ***
    public void marshal(MarshalStream s) 
      throws java.io.IOException;

    // *** object unmarshaling ***
    // public Transportable(UnmarshalStream s) 
    //   throws java.io.IOException;

    public void unmarshalReferences(UnmarshalStream s) 
      throws java.io.IOException, ClassNotFoundException;

    // *** local object copy ***
    public Object deepClone(DeepClone _helper) 
      throws CloneNotSupportedException;
}
The interface uka.transport.Transportable

Each class whose objects should be marshaled efficiently must provide an implementation for the methods declared in the Transportable interface. Subclasses of a "transportable" class must override these methods, if they add fields to the base class. If a subclass does not add additional fields, it nevertheless must provide a constructor related to marshaling. For details see below.

Usage of the efficient serialization with transportable objects is almost identical to the regular Java serialization. Just replace java.io.OutputStream with uka.transport.MarshalStream and java.io.InputStream with uka.transport.UnmarshalStream. In the following example, please assume that class TParam implements interface uka.transport.Transportable.

// TParam p, InputStream in, OutputStream out
new MarshalStream(out).writeObject(p);
TParam q = (TParam) new UnmarshalStream(in).readObject();
Writing/reading a transportable object

The following sections will explain how to efficiently implement the methods required by the interface uka.transport.Transportable. If you feel that it is unreasonable to write marshaling code for your classes, please have a look at the bottom of this page: Such marshaling code can be generated automatically.

Declaring Transportable Class

Subsequently, we will assume class TParam to contain three fields. Two of them are of basic type, and is of reference type. TParam should be declared transportable:

public class TParam implements Transportable {
   int    intValue;
   float  floatValue;
   Object objectReference;

   ...
}
Class TParam

For switching over from regular Java serialization to efficient marshaling, you have to do the following:

  1. Declare the class public.

    The uka.transport serialization is implemented as a pure Java API without special access privileges. Therefore access to a transportable class has to be granted, because the serialization mechanism needs to allocate objects of that class.

  2. Replace java.io.Serializable by uka.transport.Transportable in the interface list of the class.

Implementing marshaling methods

Marshaling a transportable object is done by the marshal stream calling the object's marshal(MarshalStream) method. A reference to the marshal stream is provided as argument. The object itself is responsible for writing its fields to the stream.

Since class TParam contains fields of basic type and reference type, the marshaling process is split into two phases:

Values of basic type. Values of basic type are marshaled efficiently by writing them in blocks of known size. The size in bytes of the data block written to the MarshalStream has to be announced in advance. Marshaling of basic typed fields must adhere to the following procedure:

  1. Announce the size of the data block to be written.

    Method reserve(int) of MarshalStream is called with the size in bytes as argument. The number of bytes marshaled in one chunk may not exceed TransportConstants.REQUEST_MAX bytes.

  2. Look up the stream buffer and the current position within the buffer.

  3. Insert the field values into the byte array buffer.

    Class BasicIO provides a variety of static methods to insert values of basic type into an array of bytes. It also manages the actual position within the buffer.

  4. Commit the write.

    Method deliver(int) commits the write of the data to the stream. The value of the parameter must match the value passed to method reserve() in the first step.

Values of reference type. After all basic typed values of a class have been written, references to other objects are marshaled. For each reference method writeObject(Object) of the MarshalStream is called.

To reuse the serialization code in subclasses, it is recommended to split it into three methods:

  • marshal(MarshalStream) is declared in interface Transportable and is called directly by the marshal stream. This method is responsible for announcing the correct size of primitive typed fields and calls the other two methods.

  • marshalPrimitives(byte[], int) is used to insert all fields of basic type into the byte array buffer at the current position.

  • marshalReferences(MarshalStream) marshals all fields of reference type to the stream.

The following example shows marshaling methods and declarations that enable correct marshaling of objects of class TParam.

protected static final int _SIZE = 
    uka.transport.BasicIO.SIZEOF_float + 
    uka.transport.BasicIO.SIZEOF_int;

public void marshal(uka.transport.MarshalStream _stream)
    throws java.io.IOException
{
    _stream.reserve(_SIZE);
    byte[] _buffer = _stream.getBuffer();
    int    _pos    = _stream.getPosition();
    marshalPrimitives(_buffer, _pos);
    _stream.deliver(_SIZE);
    marshalReferences(_stream);
}

protected void marshalPrimitives(byte[] _buffer, int _pos)
    throws java.io.IOException
{
    _pos = uka.transport.BasicIO.insert(_buffer, _pos, floatValue);
    _pos = uka.transport.BasicIO.insert(_buffer, _pos, intValue);
}

protected void marshalReferences(uka.transport.MarshalStream _stream)
    throws java.io.IOException
{
    _stream.writeObject(objectReference);
}
Marshaling: methods and declarations for class TParam

Implementing unmarshaling methods

Due to the fact that the uka.transport serialization is written in pure Java, it can not allocate uninitialized objects. When declaring a class to be transportable, a special constructor has to be provided that allocates a new object out of an UnmarshalStream.

Similar to marshaling, unmarshaling is also done in two phases:

  • A special constructor is called that is responsible for initializing all basic typed fields of the object with values unmarshaled from the stream. This constructor must have the signature <init>(UnmarshalStream).

    There is no way in Java to enforce a class implementing a certain interface to also provide a particular constructor. This rule can only be established as programming convention and is not enforced by the compiler.

  • Method unmarshalReferences(UnmarshalStream) is called after initialization via the constructor described above. It is responsible for unmarshaling the values of reference type from the stream and storing them to the corresponding fields.

    Unmarshaling references has to be done after construction of the object is complete. This will enable cyclic references to be resolved and stored into reference fields of the object currently being unmarshaled.

For easy reuse of unmarshaling code in subclass, it is useful to spit the unmarshaling of basic types into two different constructors one calling the other. Please refer to the example below:

public TParam(uka.transport.UnmarshalStream _stream)
    throws java.io.IOException, ClassNotFoundException
{
    this(_stream, _SIZE);
    _stream.accept(_SIZE);
}

protected TParam(uka.transport.UnmarshalStream  _stream, int _size)
    throws java.io.IOException, ClassNotFoundException
{
    _stream.request(_size); 
    byte[] _buffer = _stream.getBuffer();
    int    _pos    = _stream.getPosition();
    floatValue = uka.transport.BasicIO.extractFloat(_buffer, _pos);
    _pos += uka.transport.BasicIO.SIZEOF_float;
    intValue = uka.transport.BasicIO.extractInt(_buffer, _pos);
    _pos += uka.transport.BasicIO.SIZEOF_int;
}

public void unmarshalReferences(uka.transport.UnmarshalStream _stream)
    throws java.io.IOException, ClassNotFoundException
{
    objectReference = (java.lang.Object) _stream.readObject();
}
Unmarshaling: methods and declarations for class TParam

Implementing methods for deep object cloning

In a transparent local invocation on a remote object it is necessary to create deep clones of the objects passed as arguments. A deep clone of an object is identical to the result of a consecutive marshal/unmarshal operation on that object. Since it is extremely inefficient deeply clone an object by marshaling it into a memory buffer and reading it back, the uka.transport serialization also provides a facility for fast deep object cloning.

The following example shows how to implement the deep cloning part of interface Transportable. Please note that the methods shown below don't have to care about fields of basic type. Because they are based on the cloning mechanism provided by the regular Java API, these fields are already initialized correctly:

public final Object deepClone(uka.transport.DeepClone _helper)
    throws CloneNotSupportedException
{
    Object _copy = clone();
    _helper.add(this, _copy);
    ((TParam) _copy).deepCloneReferences(_helper);
    return _copy;
}

protected void deepCloneReferences(uka.transport.DeepClone _helper)
    throws CloneNotSupportedException
{
    this.objectReference = 
       (java.lang.Object) _helper.doDeepClone(this.objectReference);
}
Deep clone: methods and declarations for class TParam

Automatic generation of marshaling methods

Since it is an extremely boring task to write marshal and unmarshal methods, there is a tool that enables automatic generation of this code.

Assume you would like to enhance class Param from the beginning with fast marshaling. After you have compiled it to a class file, you can invoke the marshaling code generator with the following command:

javaparty retro TParam

This produces a file called TParam.h in the current directory. This file contains all necessary declarations to make the class transportable.

  • Insert all declarations from file TParam.h into the original source code of class TParam.

  • Declare class TParam public.

  • Replace java.io.Serializable by uka.transport.Transportable in the implements clause. Since Transportable extends Serializable you do not need both declarations for use of either the regular serialization or the fast marshaling.

For a quick start on SuSE Linux 8.0 with tcsh, you can copy and paste the following commands to your shell to see the transport generator at work:

mkdir transport
cd transport
cat > Param.java <<EOF
public class Param implements java.io.Serializable {
   int    intValue;
   float  floatValue;
   Object objectReference;
}
EOF
mkdir classes
javac -d classes Param.java
javaparty -cp classes retro Param
ls -las

On other unix systems, the above should work as well. On Windows, there is no convenient wrapper invocation script 'javaparty', so you have to invoke the generator by hand and care about the location of the JavaParty related jar files:

java -classpath jpc.jar;jp.jar;karmi.jar;%CLASSPATH%
      gjc.v6.RetroTransport TParam

Anyway, you should get output similar to the following:

drwxr-xr-x    3 user    group          107 May  6 15:28 ./
drwxr-xr-x    3 user    group           60 May  6 15:28 ../
-rw-r--r--    1 user    group         2381 May  6 15:28 Param.h
-rw-r--r--    1 user    group          124 May  6 15:28 Param.java
drwxr-xr-x    2 user    group           62 May  6 15:28 classes/

The file Param.java contains the original class, and the Param.h contains the additional methods, that must be inserted into Param.java to get the fast marshaling functionality. Do not forget to also change the implements declaration and the visibility modifier of the class as denoted above.

Compatibility

uka.transport is 100% compatible with the regular Java serialization mechanism. This means that you can marshal all objects that are serializable by means of regular Java object serialization. This is achieved by a fall-back mechanism that uses the regular Java object serialization if an object is encountered during marshaling that is not declared transportable.

Note: Due to the large overhead of regular object serialization, you will loose all performance benefits, if the fall-back mechanism is triggered.

Tips and tricks

Since cycle detection for references is time consuming, you may experience performance problems when multidimensional arrays are marshaled. Multidimensional arrays are implemented in Java as arrays of references to other arrays.

Consider the following piece of code that is legal Java:

int array = new int[] {1, 2, 3, 4};
int[][] marray = new int[][] {array, array};
Multidimensional array with aliasing

The multidimensional array marray consists of two references to the same one-dimensional array array. If marray is marshaled, it is guaranteed that an array with the same structure is read at the other end.

Especially for large arrays you may want to save the overhead for exactly analyzing the alias structure of arrays during marshaling. You can prevent this overhead by replacing the calls to writeObject() in your marshalReferences() method with calls to specialized methods in the class ValueIO:

protected void marshalReferences(MarshalStream _stream)
    throws java.io.IOException
{
    ValueIO.writeValue(_stream, marray);
}
Marshaling multidimensional arrays as values


For comments and bug reports please use the JavaParty users mailing list.
Page design & maintenance: Bernhard Haumacher.
Last update: Fri Mar 30 18:46:00 GMT+01:00 2007
Java is a trademark of Sun Microsystems.