- Introduction
In the Java API an object serialization mechanism is provided
that is able to transparently convert object structures into a
linear byte sequence representation and vice versa. For sending
objects across the network, this representation can be used. To
make his classes serializable, the Java programmer only needs to
declare his classes to implement the empty tagging interface
java.io.Serializable.
public class Param implements java.io.Serializable {
int intValue;
float floatValue;
Object objectReference;
}
|
|
A simple serializable class
|
Although transparency of object serialization is one of Java's
beauties, its runtime costs are phenomenal. Since RMI uses Java's
built-in object serialization mechanism, usually one third of the
overall invocation time of a remote method is spent in marshaling
and unmarshaling method arguments. On current hardware, a
millisecond is easy to spend in serialization alone. For details
please refer to the section "JavaParty papers".
// Param p, InputStream in, OutputStream out
new ObjectOutputStream(out).writeObject(p);
Param q = (Param) new ObjectInputStream(in).readObject();
|
|
Writing/reading an object into/from a stream
|
For high performance distributed computing inefficient
serialization cannot be tolerated. Therefore, we've looked into
the details of the official serialization, which is mostly
implemented in Java itself, and derived an optimized replacement
for it. In some benchmarks, we could eliminate 81-97% of the
overhead caused by the regular Java serialization. For details
please refer to the section "JavaParty papers".
- Object Serialization vs. Fast Marshaling
The regular Java serialization is justifiable for persistent
object storage, because it guarantees that objects can be
reconstructed from their byte sequence representation even with
future Java releases, and even with modified application
classes. Our optimized serialization does not make such
guarantees. Instead, it is designed for exchanging objects between
running VMs that share the same class base.
- Look and Feel
From the programmer's point of view our optimized serialization
is slightly more difficult to use (you know, there is no free
lunch.)
The main difference is that instead of java.io.Serializable
a newly introduced interface uka.transport.Transportable
has to be implemented. In contrast to
java.io.Serializable, the interface
uka.transport.Transportable is not empty:
package uka.transport;
public interface Transportable
extends java.io.Serializable, Cloneable
{
// *** object marshaling ***
public void marshal(MarshalStream s)
throws java.io.IOException;
// *** object unmarshaling ***
// public Transportable(UnmarshalStream s)
// throws java.io.IOException;
public void unmarshalReferences(UnmarshalStream s)
throws java.io.IOException, ClassNotFoundException;
// *** local object copy ***
public Object deepClone(DeepClone _helper)
throws CloneNotSupportedException;
}
|
|
The interface uka.transport.Transportable
|
Each class whose objects should be marshaled efficiently must
provide an implementation for the methods declared in the
Transportable interface. Subclasses of a "transportable"
class must override these methods, if they add fields to the base
class. If a subclass does not add additional fields, it
nevertheless must provide a constructor related to marshaling. For
details see below.
Usage of the efficient serialization with transportable objects
is almost identical to the regular Java serialization. Just
replace java.io.OutputStream with
uka.transport.MarshalStream and
java.io.InputStream with
uka.transport.UnmarshalStream. In the following example,
please assume that class TParam implements interface
uka.transport.Transportable.
// TParam p, InputStream in, OutputStream out
new MarshalStream(out).writeObject(p);
TParam q = (TParam) new UnmarshalStream(in).readObject();
|
|
Writing/reading a transportable object
|
The following sections will explain how to efficiently
implement the methods required by the interface
uka.transport.Transportable. If you feel that it is
unreasonable to write marshaling code for your classes, please
have a look at the bottom of this page: Such marshaling code can
be generated automatically.
- Declaring Transportable Class
Subsequently, we will assume class TParam to contain
three fields. Two of them are of basic type, and is of reference
type. TParam should be declared transportable:
public class TParam implements Transportable {
int intValue;
float floatValue;
Object objectReference;
...
}
|
|
Class TParam
|
For switching over from regular Java serialization to
efficient marshaling, you have to do the following:
Declare the class public.
The uka.transport serialization is implemented as
a pure Java API without special access privileges. Therefore
access to a transportable class has to be granted, because the
serialization mechanism needs to allocate objects of that
class.
- Replace java.io.Serializable by
uka.transport.Transportable in the interface
list of the class.
- Implementing marshaling methods
Marshaling a transportable object is done by the marshal stream
calling the object's marshal(MarshalStream) method. A
reference to the marshal stream is provided as argument. The
object itself is responsible for writing its fields to the
stream.
Since class TParam contains fields of basic type
and reference type, the marshaling process is split into two
phases:
Values of basic type. Values of basic type are marshaled
efficiently by writing them in blocks of known size. The size in
bytes of the data block written to the MarshalStream has
to be announced in advance. Marshaling of basic typed fields must
adhere to the following procedure:
Announce the size of the data block to be written.
Method reserve(int) of MarshalStream is
called with the size in bytes as argument. The number of bytes
marshaled in one chunk may not exceed
TransportConstants.REQUEST_MAX bytes.
Look up the stream buffer and the current position within
the buffer.
Insert the field values into the byte array buffer.
Class BasicIO provides a variety of static methods
to insert values of basic type into an array of bytes. It also
manages the actual position within the buffer.
Commit the write.
Method deliver(int) commits the write of the data
to the stream. The value of the parameter must match the value
passed to method reserve() in the first step.
Values of reference type. After all basic typed values
of a class have been written, references to other objects are
marshaled. For each reference method writeObject(Object)
of the MarshalStream is called.
To reuse the serialization code in subclasses, it is
recommended to split it into three methods:
marshal(MarshalStream) is declared in
interface Transportable and is called directly by the
marshal stream. This method is responsible for announcing the
correct size of primitive typed fields and calls the other two
methods.
marshalPrimitives(byte[], int) is used to insert
all fields of basic type into the byte array buffer at the
current position.
marshalReferences(MarshalStream) marshals all
fields of reference type to the stream.
The following example shows marshaling methods and declarations
that enable correct marshaling of objects of class
TParam.
protected static final int _SIZE =
uka.transport.BasicIO.SIZEOF_float +
uka.transport.BasicIO.SIZEOF_int;
public void marshal(uka.transport.MarshalStream _stream)
throws java.io.IOException
{
_stream.reserve(_SIZE);
byte[] _buffer = _stream.getBuffer();
int _pos = _stream.getPosition();
marshalPrimitives(_buffer, _pos);
_stream.deliver(_SIZE);
marshalReferences(_stream);
}
protected void marshalPrimitives(byte[] _buffer, int _pos)
throws java.io.IOException
{
_pos = uka.transport.BasicIO.insert(_buffer, _pos, floatValue);
_pos = uka.transport.BasicIO.insert(_buffer, _pos, intValue);
}
protected void marshalReferences(uka.transport.MarshalStream _stream)
throws java.io.IOException
{
_stream.writeObject(objectReference);
}
|
|
Marshaling: methods and declarations for class TParam
|
- Implementing unmarshaling methods
Due to the fact that the uka.transport serialization
is written in pure Java, it can not allocate uninitialized
objects. When declaring a class to be transportable, a special
constructor has to be provided that allocates a new object out of
an UnmarshalStream.
Similar to marshaling, unmarshaling is also done in two phases:
A special constructor is called that is responsible for
initializing all basic typed fields of the object with values
unmarshaled from the stream. This constructor must have the
signature <init>(UnmarshalStream).
There is no way in Java to enforce a class implementing a
certain interface to also provide a particular
constructor. This rule can only be established as programming
convention and is not enforced by the compiler.
Method unmarshalReferences(UnmarshalStream) is
called after initialization via the constructor described
above. It is responsible for unmarshaling the values of
reference type from the stream and storing them to the
corresponding fields.
Unmarshaling references has to be done after construction
of the object is complete. This will enable cyclic references
to be resolved and stored into reference fields of the object
currently being unmarshaled.
For easy reuse of unmarshaling code in subclass, it is useful
to spit the unmarshaling of basic types into two different
constructors one calling the other. Please refer to the example
below:
public TParam(uka.transport.UnmarshalStream _stream)
throws java.io.IOException, ClassNotFoundException
{
this(_stream, _SIZE);
_stream.accept(_SIZE);
}
protected TParam(uka.transport.UnmarshalStream _stream, int _size)
throws java.io.IOException, ClassNotFoundException
{
_stream.request(_size);
byte[] _buffer = _stream.getBuffer();
int _pos = _stream.getPosition();
floatValue = uka.transport.BasicIO.extractFloat(_buffer, _pos);
_pos += uka.transport.BasicIO.SIZEOF_float;
intValue = uka.transport.BasicIO.extractInt(_buffer, _pos);
_pos += uka.transport.BasicIO.SIZEOF_int;
}
public void unmarshalReferences(uka.transport.UnmarshalStream _stream)
throws java.io.IOException, ClassNotFoundException
{
objectReference = (java.lang.Object) _stream.readObject();
}
|
|
Unmarshaling: methods and declarations for class TParam
|
- Implementing methods for deep object cloning
In a transparent local invocation on a remote object it is
necessary to create deep clones of the objects passed as
arguments. A deep clone of an object is identical to the result
of a consecutive marshal/unmarshal operation on that object.
Since it is extremely inefficient deeply clone an object by
marshaling it into a memory buffer and reading it back, the
uka.transport serialization also provides a facility for
fast deep object cloning.
The following example shows how to implement the deep cloning
part of interface Transportable. Please note that the
methods shown below don't have to care about fields of basic
type. Because they are based on the cloning mechanism provided by
the regular Java API, these fields are already initialized
correctly:
public final Object deepClone(uka.transport.DeepClone _helper)
throws CloneNotSupportedException
{
Object _copy = clone();
_helper.add(this, _copy);
((TParam) _copy).deepCloneReferences(_helper);
return _copy;
}
protected void deepCloneReferences(uka.transport.DeepClone _helper)
throws CloneNotSupportedException
{
this.objectReference =
(java.lang.Object) _helper.doDeepClone(this.objectReference);
}
|
|
Deep clone: methods and declarations for class TParam
|
- Automatic generation of marshaling methods
Since it is an extremely boring task to write marshal and
unmarshal methods, there is a tool that enables automatic
generation of this code.
Assume you would like to enhance class Param from the
beginning with fast marshaling. After you have compiled it to a
class file, you can invoke the marshaling code generator with the
following command:
javaparty retro TParam
This produces a file called TParam.h in the
current directory. This file contains all necessary declarations
to make the class transportable.
Insert all declarations from file
TParam.h into the original source code of
class TParam.
Declare class TParam public.
Replace java.io.Serializable by
uka.transport.Transportable in the implements
clause. Since Transportable extends
Serializable you do not need both declarations for
use of either the regular serialization or the fast
marshaling.
For a quick start on SuSE Linux 8.0 with tcsh, you can copy and
paste the following commands to your shell to see the transport
generator at work:
mkdir transport
cd transport
cat > Param.java <<EOF
public class Param implements java.io.Serializable {
int intValue;
float floatValue;
Object objectReference;
}
EOF
mkdir classes
javac -d classes Param.java
javaparty -cp classes retro Param
ls -las
On other unix systems, the above should work as well. On
Windows, there is no convenient wrapper invocation script
'javaparty', so you have to invoke the generator by
hand and care about the location of the JavaParty related jar
files:
java -classpath jpc.jar;jp.jar;karmi.jar;%CLASSPATH%
gjc.v6.RetroTransport TParam
Anyway, you should get output similar to the following:
drwxr-xr-x 3 user group 107 May 6 15:28 ./
drwxr-xr-x 3 user group 60 May 6 15:28 ../
-rw-r--r-- 1 user group 2381 May 6 15:28 Param.h
-rw-r--r-- 1 user group 124 May 6 15:28 Param.java
drwxr-xr-x 2 user group 62 May 6 15:28 classes/
The file Param.java contains the original class, and
the Param.h contains the additional methods, that must be
inserted into Param.java to get the fast marshaling
functionality. Do not forget to also change the implements
declaration and the visibility modifier of the class as denoted
above.
- Compatibility
uka.transport is 100% compatible with the regular Java
serialization mechanism. This means that you can marshal all
objects that are serializable by means of regular Java object
serialization. This is achieved by a fall-back mechanism that uses
the regular Java object serialization if an object is encountered
during marshaling that is not declared transportable.
Note: Due to the large overhead of regular object
serialization, you will loose all performance benefits, if the
fall-back mechanism is triggered.
- Tips and tricks
Since cycle detection for references is time consuming, you may
experience performance problems when multidimensional arrays are
marshaled. Multidimensional arrays are implemented in Java as
arrays of references to other arrays.
Consider the following piece of code that is legal Java:
int array = new int[] {1, 2, 3, 4};
int[][] marray = new int[][] {array, array};
|
|
Multidimensional array with aliasing
|
The multidimensional array marray consists of two
references to the same one-dimensional array array. If
marray is marshaled, it is guaranteed that an array with
the same structure is read at the other end.
Especially for large arrays you may want to save the overhead
for exactly analyzing the alias structure of arrays during
marshaling. You can prevent this overhead by replacing the calls
to writeObject() in your marshalReferences()
method with calls to specialized methods in the class
ValueIO:
protected void marshalReferences(MarshalStream _stream)
throws java.io.IOException
{
ValueIO.writeValue(_stream, marray);
}
|
|
Marshaling multidimensional arrays as values
|