Up the GZipStream Without A Paddle
Let me say that, despite the snarky title of this post, my experience with GZipStream has been very positive. I had to find some obscure code to solve one of the issues, and thought I'd share that resolution and my overall findings here.
So I'm working on a web application where there is a huge amount of user-specific data; very, very, very granular detail about each user. Anywhere from 60 KB to 250 KB worth of text per user. Thousands of rows returned from the database. Too much to store in session. So, mostly as an exercise, I decided to (a) read the data in on login, (b) use GZipStream to compress this data, (c) store it in Sql, (d) retrieve it when needed for each user (based on a custom Authentication Ticket, and SessionID, etc.), and (e) decompress it for actual usage. That way, it's not carried in session, and although RAM usage on the box is a bit higher than I'd really like, at least it's stable.
When you use GZipStream to compress data, all is good. However, when you go to decompress your data into a single pre-sized buffer (which is what I wanted to do), you have to know HOW BIG the original uncompressed data was. (The alternative is to read in chunks until Read returns 0.) I suppose I could have a column in Sql, "Uncompressed_Size" or something. But how lame is that? Then I discovered that the info I needed is written into the footer of the compressed datablob: per the gzip format (RFC 1952), the last four bytes hold the uncompressed length, modulo 2^32, as a little-endian integer. It can be extracted and used. Very cool.
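To make the footer trick concrete, here is a minimal sketch, not the code from my app. The module name GZipFooterDemo and the helper ReadUncompressedLength are made up for illustration; the only things it relies on are GZipStream itself and the gzip trailer layout. It assembles the four bytes by hand so the result doesn't depend on the machine's byte order.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module GZipFooterDemo
    'Hypothetical helper: reads the last four bytes of a gzip blob,
    'which hold the uncompressed length modulo 2^32, little-endian.
    Function ReadUncompressedLength(ByVal blob As Byte()) As UInteger
        'A gzip member has a 10-byte header and an 8-byte trailer,
        'so anything shorter than 18 bytes cannot be valid.
        If blob Is Nothing OrElse blob.Length < 18 Then
            Throw New ArgumentException("Too short to be a gzip blob.")
        End If
        Dim n As Integer = blob.Length
        Return CUInt(blob(n - 4)) Or _
               (CUInt(blob(n - 3)) << 8) Or _
               (CUInt(blob(n - 2)) << 16) Or _
               (CUInt(blob(n - 1)) << 24)
    End Function

    Sub Main()
        Dim original As Byte() = Encoding.Unicode.GetBytes("alpha~beta~gamma")
        Dim blob As Byte()
        Using target As New MemoryStream()
            'leaveOpen = True keeps the MemoryStream usable after the
            'GZipStream is disposed; disposing is what writes the footer.
            Using zip As New GZipStream(target, CompressionMode.Compress, True)
                zip.Write(original, 0, original.Length)
            End Using
            blob = target.ToArray()
        End Using
        'Compare the footer value with the length we actually compressed.
        Console.WriteLine(ReadUncompressedLength(blob) = CUInt(original.Length))
    End Sub
End Module
```

One caveat: because the footer value is stored modulo 2^32, it is only trustworthy for payloads under 4 GB. At 60 KB to 250 KB per user, that's not a worry here.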
Also, I ended up rolling my own serializer-type code that used that old standby, delimited strings and (my friend and yours) the "Split" method. Yes, I know, serializing objects is way cool, but in this case there's no reason to Serialize other than convenience. I'm not communicating with disparate systems. And I know that if the object ever changes, I will have to handle that in a couple of places in the code. Still, the difference is enough to warrant that. Here are the numbers:
Uncompressed bytes when Serialized: 94,688.
Compressed bytes when Serialized: 54,258.
Uncompressed bytes when character-delimited: 72,920.
Compressed bytes when character-delimited: 13,871.
So there it is; what can I say, other than "~" is your friend and delimiter.
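For anyone who hasn't rolled this kind of thing by hand, here's a rough sketch of the delimited-string idea. It is not my actual GenerateStringFromObject or RegenerateFromString code; the Pack and Unpack names and the sample fields are hypothetical, and a real object would have far more fields than this.

```vb
Imports System

Module DelimitedDemo
    Const Delimiter As Char = "~"c

    'Hypothetical stand-in for the "object to string" step:
    'joins the field values with the delimiter.
    Function Pack(ByVal fields As String()) As String
        For Each field As String In fields
            'The scheme breaks if a value contains the delimiter,
            'so fail loudly rather than corrupt the data.
            If field.IndexOf(Delimiter) >= 0 Then
                Throw New ArgumentException("Field contains the delimiter.")
            End If
        Next
        Return String.Join(Delimiter.ToString(), fields)
    End Function

    'Hypothetical stand-in for the "string to object" step.
    Function Unpack(ByVal packed As String) As String()
        Return packed.Split(Delimiter)
    End Function

    Sub Main()
        Dim packed As String = Pack(New String() {"42", "Smith", "2006-01-15"})
        Console.WriteLine(packed)
        Console.WriteLine(Unpack(packed).Length)
    End Sub
End Module
```

The delimiter check is the price of admission: this only works if "~" can never show up in the data itself.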
Compression code:
This code would be in the object, which I'm calling OriginalObjectType, that holds the data. It returns a byte array that can be persisted to Sql.
'Requires Imports System.IO and System.IO.Compression at the top of the file.
Public Function Compress() As Byte()
    Try
        Dim resultsStream As New MemoryStream
        'Generate the delimited string. The GenerateStringFromObject
        'function returns a string; it loops through all the items
        'in the object and packs the delimiters on.
        Dim DelimitedObjectString As String = GenerateStringFromObject(Me)
        'Convert the string into a byte array.
        Dim encoding As New System.Text.UnicodeEncoding()
        Dim MyBytes As Byte() = encoding.GetBytes(DelimitedObjectString)
        'Use the new MemoryStream for the compressed data. Passing True
        'leaves resultsStream open after the GZipStream is disposed.
        Using compressedzipStream As New GZipStream(resultsStream, CompressionMode.Compress, True)
            compressedzipStream.Write(MyBytes, 0, MyBytes.Length)
            'End Using disposes the GZipStream, which flushes the data
            'and writes the footer, so no explicit Flush is needed.
        End Using
        Return resultsStream.ToArray()
    Catch ex As Exception
        'Exception handling goes here.
        Throw
    End Try
End Function
Decompression code:
So in your calling application code, read the compressed data in from the table, load it into a MemoryStream, and then call the Decompress method to reconstitute the object. There are other ways to do this, but I needed the compressed object in a MemoryStream for reasons that I'm not disclosing here.
(This would be in the SqlDataReader.Read() loop.)
Dim Obj As New OriginalObjectType
Dim tempStream As New System.IO.MemoryStream(sqlDR.GetSqlBinary(0).Value)
Obj = Obj.Decompress(tempStream)
Public Function Decompress(ByVal CompressedObj As MemoryStream) As OriginalObjectType
    Try
        'Determine the original uncompressed size of the object.
        'This info is stored in the last four bytes (the footer)
        'of the compressed stream.
        Dim footerBuffer(3) As Byte
        CompressedObj.Position = CompressedObj.Length - 4
        CompressedObj.Read(footerBuffer, 0, 4)
        Dim UncompressedLength As Integer = BitConverter.ToInt32(footerBuffer, 0)
        'Reset compressed stream's position to zero.
        CompressedObj.Position = 0
        Using zipStream As New GZipStream(CompressedObj, CompressionMode.Decompress, True)
            'VB array bounds are upper bounds, so subtract one to get
            'exactly UncompressedLength bytes.
            Dim decompressedBuffer(UncompressedLength - 1) As Byte
            'A single Read may return fewer bytes than requested,
            'so keep reading until the buffer is full.
            Dim totalRead As Integer = 0
            While totalRead < decompressedBuffer.Length
                Dim bytesRead As Integer = zipStream.Read(decompressedBuffer, totalRead, _
                    decompressedBuffer.Length - totalRead)
                If bytesRead = 0 Then Exit While
                totalRead += bytesRead
            End While
            'The RegenerateFromString function does the splits, and
            'loops through the resulting array to reconstitute the
            'original object. It returns the OriginalObjectType.
            Dim encoding As New System.Text.UnicodeEncoding()
            Return RegenerateFromString(encoding.GetString(decompressedBuffer, 0, totalRead))
        End Using
    Catch ex As Exception
        'Exception handling goes here.
        Throw
    End Try
End Function
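If peeking at the footer feels too clever, the size doesn't strictly have to be known up front. Here's a sketch of the other route, assuming you're happy to take a Byte array in and hand one back: read in chunks until Read returns 0. The module name ChunkedDecompressDemo, the DecompressAll helper, and the 4 KB chunk size are my own picks for illustration.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module ChunkedDecompressDemo
    'Hypothetical helper: decompresses a gzip blob without knowing
    'the uncompressed size in advance.
    Function DecompressAll(ByVal compressed As Byte()) As Byte()
        Using source As New MemoryStream(compressed)
            Using zip As New GZipStream(source, CompressionMode.Decompress)
                Using result As New MemoryStream()
                    Dim chunk(4095) As Byte '4096 bytes
                    Dim bytesRead As Integer = zip.Read(chunk, 0, chunk.Length)
                    'Read returns 0 once the compressed data is exhausted.
                    While bytesRead > 0
                        result.Write(chunk, 0, bytesRead)
                        bytesRead = zip.Read(chunk, 0, chunk.Length)
                    End While
                    Return result.ToArray()
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        'Round trip: compress a small delimited string, then expand it.
        Dim original As Byte() = Encoding.Unicode.GetBytes("alpha~beta~gamma")
        Dim blob As Byte()
        Using target As New MemoryStream()
            Using zip As New GZipStream(target, CompressionMode.Compress, True)
                zip.Write(original, 0, original.Length)
            End Using
            blob = target.ToArray()
        End Using
        Console.WriteLine(Encoding.Unicode.GetString(DecompressAll(blob)))
    End Sub
End Module
```

The trade-off is an extra copy through the intermediate MemoryStream, versus the footer approach's single exact-size buffer.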
All in all, I found this new class in v2.0 of the Framework to be very handy. Lotsa fun.
P.S. Other potential titles of this blog post:
"Islands in the GZipStream"
"I Scream You Scream We All Scream for GZipStream"
"Dream a Little GZipStream"
"Row, Row, Row Your Boat . . . " Ah, I can't finish that one. Too . . . lame!