Serializing compressed objects to save space (and some issues we may find)

This is the idea: if our applications store objects on the hard disk (or send them over the network), we could save space and bandwidth by compressing them first. In fact, the idea is so good that one of the innovations in ASP.NET 4.0 is that out-of-process session storage (that is, when we store the session on a remote server or in SQL Server) has a new option to compress the information before sending it. This is useful for web sites that need to reduce the amount of data transferred in a server farm. As a downside, however, it puts extra load on the servers' processors because of the compression and decompression work.
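For reference, in web.config this new option looks roughly like the following; the compressionEnabled attribute is the ASP.NET 4.0 addition, and the connection string is just a placeholder:

<!-- Out-of-process session state with the new compression option. -->
<sessionState mode="SQLServer"
              sqlConnectionString="Data Source=MySqlServer;Integrated Security=SSPI"
              compressionEnabled="true" />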

Before studying this in detail, let's see how we can implement this process in our own code with very little effort.

The serialization code would be:

using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

// Serializes obj in binary format and compresses it on the fly
// while writing it to the file indicated by path.
public static void SerializeAndCompress(object obj, string path) {
    using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write)) {
        using (GZipStream zs = new GZipStream(fs, CompressionMode.Compress, true)) {
            BinaryFormatter bf = new BinaryFormatter();
            bf.Serialize(zs, obj);
        }
    }
}

We pass this method the object to be serialized and the path where we want to save it; the method then serializes the object in binary format with a BinaryFormatter, writing it through a GZipStream so it is compressed as it goes. It's that simple.

Of course, instead of saving it on disk we could have kept it in memory using a MemoryStream, or written it to any other destination; the code is basically the same.
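For example, if we wanted the compressed bytes in memory (say, to send them through the network), a minimal variant could look like this. This is just a sketch: the method name SerializeAndCompressToBytes is mine, not part of the code above.

// Hypothetical variant: same technique, but targeting a MemoryStream
// and returning the compressed bytes instead of writing a file.
public static byte[] SerializeAndCompressToBytes(object obj) {
    using (MemoryStream ms = new MemoryStream()) {
        using (GZipStream zs = new GZipStream(ms, CompressionMode.Compress, true)) {
            BinaryFormatter bf = new BinaryFormatter();
            bf.Serialize(zs, obj);
        }
        // Closing the GZipStream first guarantees all compressed data
        // has been flushed into the MemoryStream before we read it.
        return ms.ToArray();
    }
}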

The deserialization code would be:

// Reads the file indicated by path, decompresses it on the fly
// and deserializes the object stored in it.
public static object DecompressAndDeserialize(string path) {
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
        using (GZipStream zs = new GZipStream(fs, CompressionMode.Decompress, true)) {
            BinaryFormatter bf = new BinaryFormatter();
            return bf.Deserialize(zs);
        }
    }
}

This is just the reverse of the previous code: we indicate the file, a GZipStream decompresses its contents as they are read, and a BinaryFormatter deserializes the object from the decompressed data.

If we have a class to be serialized, such as this one:

[Serializable]
public class SerializableClassTest
{
    public string Property1 { get; set; }
    public int Property2 { get; set; }
    public long Property3 { get; set; }
}

and then we write this:

SerializableClassTest sct = new SerializableClassTest();
sct.Property1 = "Hello";
sct.Property2 = 2;
sct.Property3 = 3;

SerializeAndCompress(sct, @"D:\ComprTest.bin");

We'll get the file "ComprTest.bin" in the root of D: containing the object's whole state. We can use this file whenever we want to recreate the object in memory with the other method:

SerializableClassTest sct2 = (SerializableClassTest) DecompressAndDeserialize(@"D:\ComprTest.bin");

Done. We'll get the object sct2 with the same state as before, even if we do this days later or on a different machine (this is the core idea behind serialization).
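If we want to convince ourselves the round trip worked, a quick check (just an illustrative snippet) could be:

// Sanity check: the deserialized copy keeps the original values.
Console.WriteLine(sct2.Property1); // "Hello"
Console.WriteLine(sct2.Property2); // 2
Console.WriteLine(sct2.Property3); // 3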

Watch out! We could get the opposite effect

Let's compare the results for the same object from before, serialized with and without compression.

The uncompressed file takes up 229 bytes, while the compressed file takes up 294 bytes. The compressed file uses more space!

This happens because the compression process writes several bookkeeping elements as a header in the file, and that header is needed later by the decompression process. These header elements take up some space (a little more than 100 bytes). In this case the data barely compresses at all and, on top of that, we are adding the header, so the result is a bigger file! :-(
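To reproduce the comparison yourself, you could write the same object both ways and check the sizes with FileInfo. SerializeUncompressed below is a hypothetical helper, identical to SerializeAndCompress but without the GZipStream, and the file names are just examples:

// Hypothetical helper: plain binary serialization, no compression.
public static void SerializeUncompressed(object obj, string path) {
    using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write)) {
        new BinaryFormatter().Serialize(fs, obj);
    }
}

// Measuring both files:
SerializeUncompressed(sct, @"D:\PlainTest.bin");
SerializeAndCompress(sct, @"D:\ComprTest.bin");
Console.WriteLine(new FileInfo(@"D:\PlainTest.bin").Length); // 229 bytes in this test
Console.WriteLine(new FileInfo(@"D:\ComprTest.bin").Length); // 294 bytes in this test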

Let's try replacing the "Hello" in Property1 with the first four paragraphs of "Don Quixote", which total 5,046 characters.

Now the uncompressed file takes up 5,400 bytes and the compressed file only 2,968 bytes, which means the process is now very efficient: the compressed file is only about 55% of the size of the uncompressed one. Not bad.

Conclusions

Using compression when serializing objects that will be stored or sent is a good idea if the object contains a lot of information, but it is not recommended for small objects. We should also take into account the CPU resources of our servers, since compression and decompression add processing work.
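One simple way to act on this in code is to compress only above a certain size. The sketch below serializes to memory first and keeps the plain bytes if the payload is small; the 1024-byte threshold is an arbitrary value for illustration and should be tuned by measuring:

// Sketch: compress only when the serialized payload is large enough
// for compression to pay off. The threshold is illustrative, not a rule.
const int CompressionThreshold = 1024; // bytes

public static byte[] SerializeMaybeCompress(object obj) {
    byte[] plain;
    using (MemoryStream ms = new MemoryStream()) {
        new BinaryFormatter().Serialize(ms, obj);
        plain = ms.ToArray();
    }
    if (plain.Length < CompressionThreshold)
        return plain; // small object: compression would likely grow it

    using (MemoryStream ms = new MemoryStream()) {
        using (GZipStream zs = new GZipStream(ms, CompressionMode.Compress, true)) {
            zs.Write(plain, 0, plain.Length);
        }
        return ms.ToArray();
    }
}

In a real design we would also need to record somewhere whether the payload ended up compressed (for example, a flag stored alongside it), so the reading side knows whether to decompress before deserializing.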

I hope you can make the most of this article in practice!

JM Alarcon

ASP.NET/IIS MVP

ASP.NET/IIS MVP since 2004. He holds an MSc in Mechanical Engineering and works as a business consultant, and he has written several books and hundreds of articles on computer science and engineering for specialized press such as PC World, Windows Magazine and dotNetMania.