db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Data Compression for Query Processing
Date Tue, 06 Sep 2011 20:13:46 GMT
Derby already does a kind of compression on character strings: its
native in-memory representation of character data is the standard Java
16-bit char, but on disk it uses a modified UTF-8 encoding that
optimizes storage for characters in the ASCII range.  This is all
handled in the readExternal and writeExternal routines, which convert
between the in-memory representation and the on-disk representation.
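A quick way to see this size difference is the JDK's own modified UTF-8 writer, DataOutputStream.writeUTF — the sketch below is standalone JDK code, not Derby code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Demo {

    /** Bytes used by writeUTF: a 2-byte length prefix + modified UTF-8 body. */
    static int modifiedUtf8Size(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(s);
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen in-memory
        }
    }

    /** Bytes used by the in-memory UTF-16 representation (2 bytes per char). */
    static int utf16Size(String s) {
        return s.length() * 2;
    }

    public static void main(String[] args) {
        // ASCII characters cost 1 byte each on disk vs 2 bytes in memory.
        System.out.println(modifiedUtf8Size("hello"));  // 7 = 2 (length) + 5
        System.out.println(utf16Size("hello"));         // 10
    }
}
```

For mostly-ASCII data the on-disk form is close to half the in-memory size, which is the "kind of compression" described above.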

Reka Thirunavukkarasu wrote:
> On Tue, Sep 6, 2011 at 11:36 PM, Mike Matrigali <mikem_app@sbcglobal.net> wrote:
>> The encryption points work at
>> a page level, and the system counts on the number of bytes staying
>> the same, because we locate pages by page number * number of bytes
>> per page.  If you decide to go at it from this level you will need to
>> implement an underlying filesystem to map the pages.  I don't think
>> this is very interesting, as I believe you can get this effect "for
>> free" on a number of OS's by just picking a compressed filesystem and
>> putting Derby on that filesystem.
>> If I were working on this I think I would look at the point where each
>> datatype that you are trying to compress is read and written to disk.
>> Start with looking at the various "readExternal*" and "writeExternal*"
>> routines for each datatype.  Start by understanding the current on
>> disk formats of the datatype and then propose the new on disk formats
>> for the datatype.
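As a minimal illustration of the readExternal/writeExternal pattern referred to above — CharValue is an invented class for this sketch, not one of Derby's actual types:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical value class: the in-memory form (a Java String) is
// converted to an on-disk form in writeExternal and back in readExternal.
public class CharValue implements Externalizable {
    private String value;

    public CharValue() { }                    // required by Externalizable
    public CharValue(String v) { value = v; }
    public String getValue() { return value; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(value);                  // on-disk form: modified UTF-8
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        value = in.readUTF();                 // back to the in-memory String
    }

    /** Serialize to bytes, the way a page write eventually would. */
    public static byte[] toBytes(CharValue v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(v);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read the value back from its serialized form. */
    public static CharValue fromBytes(byte[] bytes) {
        try {
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes));
            return (CharValue) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A compression scheme would slot into writeExternal (compress before writing) and readExternal (decompress after reading), leaving everything above that layer untouched.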
> Is there any way to do compression before executing the query?  For
> example, during query parsing or query optimisation.  We ask because
> we want to investigate whether the compression affects query
> execution speed.
>> Note the result of this work would not be appropriate for submission
>> as is; a complete project would specify how the user controls whether
>> or not to compress.  And the final format should allow the system to
>> tell the difference between the old and new formats.   There are many
>> options here; for instance we could track compression at the following
>> levels:
>> per single column value
>> per single column in table (ie. metadata indicates column is compressed in
>> this table)
>> per all columns in a table (ie. metadata indicates all columns in table
>> compressed)
>> per database (ie. metadata in database says all data is compressed).
>> The system is not set up well to track per single column value.  It would
>> not be too difficult to track at table level, with the creation
>> of new internal datatypes that would inherit from each other.  ie.
>> a CompressedSQLChar that inherits from a SQLChar.
>> Understanding code in the following directory is a good start:
>> C:/derby/s1/java/engine/org/apache/derby/iapi/types
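One way the "system should be able to tell the difference between the formats" requirement might look, sketched with invented format ids and java.util.zip as a stand-in codec — Derby's real format ids and storage layout are different:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// A leading format byte lets the reader distinguish compressed from plain
// values, so both formats can coexist on disk.
public class CompressedCharFormat {
    static final byte FORMAT_PLAIN = 0;       // invented ids, not Derby's
    static final byte FORMAT_COMPRESSED = 1;

    static byte[] write(String value, boolean compress) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            byte[] payload = compress
                    ? deflate(value)
                    : value.getBytes(StandardCharsets.UTF_8);
            out.writeByte(compress ? FORMAT_COMPRESSED : FORMAT_PLAIN);
            out.writeInt(payload.length);
            out.write(payload);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(byte[] bytes) {
        try {
            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes));
            byte format = in.readByte();      // dispatch on the format byte
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return format == FORMAT_COMPRESSED
                    ? inflate(payload)
                    : new String(payload, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] deflate(String s) {
        Deflater d = new Deflater();
        d.setInput(s.getBytes(StandardCharsets.UTF_8));
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[256];
        while (!d.finished()) out.write(tmp, 0, d.deflate(tmp));
        d.end();
        return out.toByteArray();
    }

    private static String inflate(byte[] data) throws IOException {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(tmp);
                if (n == 0) break;            // guard against truncated input
                out.write(tmp, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inf.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Tracking at the table level would then mean the metadata, rather than a per-value byte, decides which branch a CompressedSQLChar-style subtype takes.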
>> Rick Hillegas wrote:
>>> Hi Reka,
>>> I would recommend looking at the Derby logic for encrypting databases. You
>>> can probably get column compression to work by putting your (de)compression
>>> logic alongside the encryption/decryption touchpoints.
>>> Hope this helps,
>>> -Rick
>>> On 9/6/11 9:21 AM, Reka Thirunavukkarasu wrote:
>>>> Hi Rick,
>>>> Thank you for your immediate reply.  We are trying to achieve
>>>> attribute level compression (in your words, more compact storage of
>>>> columns).  Attribute level compression is best from the query
>>>> processing point of view.  Attributes fall into three major
>>>> categories: integer, floating point, and character string.  We would
>>>> have to apply three different compression techniques, one for each
>>>> data type, but for demonstration purposes we will apply compression
>>>> only to character string attributes.  We will test it in a database
>>>> which has only character strings.  This is our main goal.
>>>> Thank you.
>>>> On Tue, Sep 6, 2011 at 8:19 PM, Rick Hillegas <rick.hillegas@oracle.com>
>>>>  wrote:
>>>>> Hi Reka,
>>>>> Can you give us more detail about what you are trying to achieve? That
>>>>> may
>>>>> help us figure out what the right touchpoints are. Are you trying to
>>>>> achieve
>>>>> any of the following:
>>>>> 1) More aggressive garbage-collection of deleted rows...
>>>>> 2) More compact storage of columns...
>>>>> 3) More compact storage of rows...
>>>>> 4) More compact storage of pages...
>>>>> 5) Something else...
>>>>> Thanks,
>>>>> -Rick
>>>>> On 9/6/11 7:07 AM, Reka Thirunavukkarasu wrote:
>>>>>> Hi all,
>>>>>> We are from the University of Moratuwa, Sri Lanka.  We want to
>>>>>> apply data compression to Derby in query processing as a
>>>>>> requirement of our Advanced Database course project.
>>>>>> Currently Derby has a facility to trim the free space in a raw
>>>>>> data container (using the SYSCS_UTIL.SYSCS_COMPRESS_TABLE system
>>>>>> procedure).  Our goal is to apply data compression (run-length
>>>>>> encoding) to each of the values (not the field names) of a query
>>>>>> before executing it, and to decompress the data when the execution
>>>>>> finishes.
>>>>>> Initially we went through the code base and identified that the
>>>>>> data compression could be applied within the executeStatement()
>>>>>> method of the org.apache.derby.impl.jdbc.EmbedStatement class
>>>>>> before calling ps.execute(), and we thought that using the
>>>>>> getParameterValueSet() method of the Activation class, the
>>>>>> attribute values of the parsed query could be obtained.  But when
>>>>>> we try to print the contents of the ParameterValueSet for a
>>>>>> typical insert query, it prints null (it is just an empty set).
>>>>>> We are expecting help from the community regarding the following
>>>>>> questions:
>>>>>> 1) What is wrong with the point we identified for applying
>>>>>> compression?
>>>>>> 2) By applying compression before executing the query, will the
>>>>>> query execution process be affected?
>>>>>> 3) Is there any possible place to apply compression and
>>>>>> decompression before executing the query?
>>>>>> Thank you.
>>>>>> -Reka
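For reference, the run-length encoding Reka's project proposes could look like the toy codec below; the character-then-count text format, and its limitation that digits in the input would break decoding, are our own simplifications:

```java
// Minimal run-length codec: each run becomes the character followed by its
// decimal count, e.g. "aaabbc" -> "a3b2c1".  A real codec would need an
// escaping scheme so that digit characters in the input stay unambiguous.
public class RunLength {

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i;
            while (j < s.length() && s.charAt(j) == c) j++;  // end of run
            out.append(c).append(j - i);
            i = j;
        }
        return out.toString();
    }

    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            int start = i;
            while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            int count = Integer.parseInt(s.substring(start, i));
            for (int k = 0; k < count; k++) out.append(c);
        }
        return out.toString();
    }
}
```

Note that run-length encoding only pays off on data with long repeated runs; on typical character strings like names it can easily expand the data ("abc" becomes "a1b1c1"), which is worth measuring as part of the speed investigation.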
