How Serialization Detects When a Class Has Changed
In order for serialization to gracefully detect when a versioning problem has occurred, it needs to be able to detect when a class has changed. As with all the other aspects of serialization, there is a default way that serialization does this. And there is a way for you to override the default.
The default involves a hashcode. Serialization creates a single hashcode, of type long, from the following information (in practice, it computes a SHA-1 digest of this information and keeps the first eight bytes):
- The class name and modifiers
- The names of any interfaces the class implements
- Whether the class has a static initializer
- Descriptions of all methods and constructors except private methods and constructors
- Descriptions of all fields except private static and private transient fields
Serialization Depends on Reflection
The dependence on reflection is the hardest of these to eliminate. Both serializing and deserializing require the serialization mechanism to discover information about the instance it is processing. At a minimum, the serialization algorithm needs to find out things such as the value of serialVersionUID, whether writeObject( ) is implemented, and what the superclass structure is. What's more, using the default serialization mechanism (or calling defaultWriteObject( ) from within writeObject( )) will use reflection to discover all the field values. This can be quite slow.
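The reflective lookups described above can be imitated with java.lang.reflect. The sketch below is only an approximation of what the serialization mechanism has to find out, not its actual code: the helper names are hypothetical, and defaultFieldValues looks only at the object's own class, ignoring superclasses.

```java
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ReflectionProbe {
    // Does this exact class declare private void writeObject(ObjectOutputStream)?
    static boolean declaresWriteObject(Class<?> cls) {
        try {
            Method m = cls.getDeclaredMethod("writeObject", ObjectOutputStream.class);
            return Modifier.isPrivate(m.getModifiers()) && m.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Serializable classes, from the topmost serializable superclass down to cls.
    static List<Class<?>> serializableHierarchy(Class<?> cls) {
        List<Class<?>> chain = new ArrayList<>();
        for (Class<?> c = cls; c != null && Serializable.class.isAssignableFrom(c); c = c.getSuperclass()) {
            chain.add(0, c);
        }
        return chain;
    }

    // Values of the fields default serialization would write: non-static, non-transient
    // (declared on the object's own class only).
    static List<Object> defaultFieldValues(Object obj) {
        List<Object> values = new ArrayList<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            try {
                f.setAccessible(true); // private fields need this before get()
                values.add(f.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        private int x = 3;
        private int y = 4;
        private transient int cachedHash = 99; // skipped: transient
    }

    public static void main(String[] args) {
        System.out.println(declaresWriteObject(Point.class)); // prints "false"
        System.out.println(serializableHierarchy(Point.class));
        System.out.println(defaultFieldValues(new Point())); // field order is not guaranteed
    }
}
```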
Comparing Externalizable to Serializable
Of course, this efficiency comes at a price. Serializable can frequently be implemented simply by declaring that a class implements the Serializable interface; the only other requirement is that the class's first non-serializable superclass (often Object) has an accessible zero-argument constructor. Furthermore, as an application evolves, the serialization mechanism automatically adapts. Because the metadata is automatically extracted from the class definitions, application programmers often don't have to do anything except recompile the program.
On the other hand, Externalizable isn't particularly easy to implement, isn't very flexible, and requires you to rewrite your marshalling and demarshalling code whenever you change your class definitions. However, because it eliminates almost all the reflective calls used by the serialization mechanism and gives you complete control over the marshalling and demarshalling algorithms, it can result in dramatic performance improvements.
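A minimal Externalizable class shows both sides of the trade-off. The Account class and the roundTrip helper are hypothetical. Externalizable does require a public zero-argument constructor (deserialization calls it before readExternal()), and the two hand-written methods must stay in step: adding a field means editing both.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class ExternalizableDemo {
    public static class Account implements Externalizable {
        private String owner;
        private long balanceCents;

        // Required: deserialization calls this public no-arg constructor, then readExternal().
        public Account() {
        }

        public Account(String owner, long balanceCents) {
            this.owner = owner;
            this.balanceCents = balanceCents;
        }

        public String owner() { return owner; }
        public long balanceCents() { return balanceCents; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // Hand-written marshalling: no reflection over fields (owner must be non-null here).
            out.writeUTF(owner);
            out.writeLong(balanceCents);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Must read values back in exactly the order writeExternal wrote them.
            owner = in.readUTF();
            balanceCents = in.readLong();
        }
    }

    // Hypothetical helper: serialize to a byte array and read the result back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = (Account) roundTrip(new Account("alice", 1250));
        System.out.println(copy.owner() + " " + copy.balanceCents()); // prints "alice 1250"
    }
}
```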
Serialization Fields
By default, the serializable fields of a class are its non-static, non-transient fields. This default can be overridden by declaring a special field in the Serializable class, serialPersistentFields. This field must be initialized with an array of ObjectStreamField objects that list the names and types of the serializable fields. The modifiers for the field are required to be private, static, and final. For example, the following declaration duplicates the default behavior.
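import java.io.ObjectStreamField;
import java.io.Serializable;

class List implements Serializable {
    List next;

    // Lists "next" as the only serializable field, same as the default.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("next", List.class)
    };
}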
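The declaration becomes interesting when it departs from the default. In the sketch below (the Config class and roundTrip helper are hypothetical), retries is neither static nor transient, yet it is left out of serialPersistentFields, so it is never written and comes back with its default value of 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class PersistentFieldsDemo {
    static class Config implements Serializable {
        private static final long serialVersionUID = 1L;

        String name;
        int retries; // not static or transient, but not listed below

        // Overrides the default: only "name" is written to the stream.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("name", String.class)
        };

        Config(String name, int retries) {
            this.name = name;
            this.retries = retries;
        }
    }

    // Hypothetical helper: serialize to a byte array and read the result back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Config copy = (Config) roundTrip(new Config("db", 5));
        System.out.println(copy.name + " " + copy.retries); // prints "db 0"
    }
}
```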
In this section, I describe a slightly simplified version of the serialization algorithm. I then proceed to a more complete description of the serialization process in the next section.
Writing
Because the class descriptions actually contain the metadata, the basic idea behind the serialization algorithm is pretty easy to describe. The only tricky part is handling circular references.
The problem is this: suppose instance A refers to instance B, and instance B refers back to instance A. Completely writing out A requires you to write out B. But writing out B requires you to write out A. Because you don't want to get into an infinite loop, or even write out an instance or a class description more than once, you need to keep track of what's already been written to the stream. (Serialization is already a slow process: it uses the reflection API heavily, in addition to consuming bandwidth.)
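The cycle described above can be checked directly against the standard streams. In this sketch (Node and roundTrip are hypothetical names), A and B point at each other; serialization terminates, and the copy rebuilds the same cycle instead of duplicating either node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CycleDemo {
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String label;
        Node peer;

        Node(String label) {
            this.label = label;
        }
    }

    // Hypothetical helper: serialize to a byte array and read the result back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.peer = b;
        b.peer = a; // A -> B -> A
        Node copyA = (Node) roundTrip(a);
        // The second visit to A is written as a back-reference, so the cycle is preserved.
        System.out.println(copyA.peer.peer == copyA); // prints "true"
    }
}
```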
ObjectOutputStream does this by maintaining a mapping from instances and classes to handles. When writeObject( ) is called with an argument that has already been written to the stream, the handle is written to the stream, and no further operations are necessary.
If, however, writeObject( ) is passed an instance that has not yet been written to the stream, two things happen. First, the instance is assigned a reference handle, and the mapping from instance to reference handle is stored by ObjectOutputStream. The handle that is assigned is the next integer in a sequence.
TIP: Remember the reset( ) method on ObjectOutputStream? It clears the mapping and resets the handle counter to 0x7E0000. RMI also automatically resets its serialization mechanism after every remote method call.
Second, the instance data is written out as per the data format described earlier. This can involve some complications if the instance has a field whose value is also a serializable instance. In this case, the serialization of the first instance is suspended, and the second instance is serialized in its place (or, if the second instance has already been serialized, the reference handle for the second instance is written out). After the second instance is fully serialized, serialization of the first instance resumes. The contents of the stream look a little bit like Figure 10-5.
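The handle bookkeeping can be modeled in a few lines. This is a hypothetical, heavily simplified stand-in for the table inside ObjectOutputStream, not the JDK's actual class: HandleTable and its methods are made-up names, and the real stream also assigns handles to class descriptions and strings. It uses an IdentityHashMap because handles belong to instances (==), not to objects that are merely equals().

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model of ObjectOutputStream's instance-to-handle mapping.
public class HandleTable {
    // First handle value in a serialization stream.
    public static final int BASE_WIRE_HANDLE = 0x7E0000;

    private final Map<Object, Integer> handles = new IdentityHashMap<>();
    private int next = BASE_WIRE_HANDLE;

    // Models writeObject(): a back-reference for a seen instance, otherwise a new handle.
    public String write(Object obj) {
        Integer existing = handles.get(obj);
        if (existing != null) {
            return "REFERENCE " + Integer.toHexString(existing);
        }
        int handle = next++; // next integer in the sequence
        handles.put(obj, handle);
        // (The real stream would now write the instance data, recursing into its fields.)
        return "NEW " + Integer.toHexString(handle);
    }

    // Models reset(): forget every mapping and restart the counter at the base handle.
    public void reset() {
        handles.clear();
        next = BASE_WIRE_HANDLE;
    }

    public static void main(String[] args) {
        HandleTable table = new HandleTable();
        Object a = new Object();
        Object b = new Object();
        System.out.println(table.write(a)); // prints "NEW 7e0000"
        System.out.println(table.write(b)); // prints "NEW 7e0001"
        System.out.println(table.write(a)); // prints "REFERENCE 7e0000"
        table.reset();
        System.out.println(table.write(a)); // prints "NEW 7e0000"
    }
}
```

With the real API, the effect of reset( ) is visible on the reading side: if the same object is passed to writeObject( ) twice with a reset( ) in between, readObject( ) returns two distinct copies, because the second write no longer finds a handle.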
Reading
From the description of writing, it's pretty easy to guess most of what happens when readObject( ) is called. Unfortunately, because of versioning issues, the implementation of readObject( ) is actually a little bit more complex than you might guess.
When it reads in an instance description, ObjectInputStream gets the following information:
- Descriptions of all the classes involved
- The serialization data from the instance
The problem is that the class descriptions that the instance of ObjectInputStream reads from the stream may not be equivalent to the class descriptions of the same classes in the local JVM. For example, if an instance is serialized to a file and then read back in three years later, there's a pretty good chance that the class definitions used to serialize the instance have changed.
This means that ObjectInputStream uses the class descriptions in two ways:
- It uses them to actually pull data from the stream, since the class descriptions completely describe the contents of the stream.
- It compares the class descriptions to the classes it has locally and tries to determine if the classes have changed, in which case it throws an exception (typically an InvalidClassException reporting mismatched serialVersionUID values). If the class descriptions match the local classes, it creates the instance and sets the instance's state appropriately.
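ObjectInputStream exposes a protected hook, readClassDescriptor(), that is called whenever a class description is read, which makes the two uses above observable. In the sketch below, RecordingInputStream, inspect, and matchesLocal are hypothetical names. The recorded descriptions carry the writer's view of the class (name, serialVersionUID, fields), and matchesLocal performs the serialVersionUID comparison against the local class that underlies the InvalidClassException check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DescriptorInspector {
    static class Point implements Serializable {
        private static final long serialVersionUID = 7L;
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Hypothetical subclass that records each class description as it is read from the stream.
    static class RecordingInputStream extends ObjectInputStream {
        final List<ObjectStreamClass> seen = new ArrayList<>();

        RecordingInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
            ObjectStreamClass streamDesc = super.readClassDescriptor(); // the writer's view of the class
            seen.add(streamDesc);
            return streamDesc;
        }
    }

    // Serializes obj, reads it back, and returns the class descriptions found in the stream.
    static List<ObjectStreamClass> inspect(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (RecordingInputStream in =
                     new RecordingInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return in.seen;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compares the stream's serialVersionUID with the local class's value.
    static boolean matchesLocal(ObjectStreamClass streamDesc) {
        try {
            ObjectStreamClass local = ObjectStreamClass.lookup(Class.forName(streamDesc.getName()));
            return local != null && local.getSerialVersionUID() == streamDesc.getSerialVersionUID();
        } catch (ClassNotFoundException e) {
            return false; // no local class at all
        }
    }

    public static void main(String[] args) {
        for (ObjectStreamClass d : inspect(new Point(1, 2))) {
            System.out.println(d.getName() + " uid=" + d.getSerialVersionUID()
                + " fields=" + d.getFields().length + " matchesLocal=" + matchesLocal(d));
        }
    }
}
```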
readResolve and writeReplace
One way of eliminating the extra instances and some of the unnecessary heap allocation would be to do something like this:
import java.io.ObjectStreamException;
import java.io.Serializable;

public class Gender implements Serializable {
    public static final Gender MALE = new Gender("Male");
    public static final Gender FEMALE = new Gender("Female");

    private String name;

    private Gender(String name) {
        this.name = name;
    }

    // Before a Gender is written, substitute its compact serialized form.
    Object writeReplace() throws ObjectStreamException {
        if (this.equals(MALE)) {
            return SerializedForm.MALE_FORM;
        } else {
            return SerializedForm.FEMALE_FORM;
        }
    }

    // What actually goes into the stream: an int identifying the constant.
    private static class SerializedForm implements Serializable {
        static final SerializedForm MALE_FORM = new SerializedForm(0);
        static final SerializedForm FEMALE_FORM = new SerializedForm(1);

        private int value;

        SerializedForm(int value) {
            this.value = value;
        }

        // After a SerializedForm is read, replace it with the canonical Gender constant.
        Object readResolve() throws ObjectStreamException {
            if (value == MALE_FORM.value) {
                return Gender.MALE;
            } else {
                return Gender.FEMALE;
            }
        }
    }
}
This also guarantees that in all cases where genderInstance.equals(MALE) is true, genderInstance == Gender.MALE is also true.
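A simpler variant, which may be enough when no separate serialized form is needed, puts readResolve directly on the class. The sketch below uses a hypothetical, cut-down Gender nested inside ReadResolveDemo: each read still briefly creates a fresh Gender, but readResolve swaps it for the canonical constant, so the identity guarantee holds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ReadResolveDemo {
    // Hypothetical variant of Gender without a separate serialized form.
    static final class Gender implements Serializable {
        private static final long serialVersionUID = 1L;

        static final Gender MALE = new Gender("Male");
        static final Gender FEMALE = new Gender("Female");

        private final String name;

        private Gender(String name) {
            this.name = name;
        }

        // Runs after the fresh copy has been deserialized; returns the canonical constant instead.
        private Object readResolve() throws ObjectStreamException {
            return "Male".equals(name) ? MALE : FEMALE;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Hypothetical helper: serialize to a byte array and read the result back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Gender.MALE) == Gender.MALE); // prints "true"
    }
}
```

Since Java 5, enum types provide this guarantee automatically: enum constants are serialized by name and resolved to the existing constant when read back.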