db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Data Compression for Query Processing
Date Tue, 06 Sep 2011 18:06:57 GMT
The encryption points work at the page level, and the system counts on
the number of bytes per page staying the same.  The reason is that we
find pages by page number * number of bytes per page.  If you decide to
go at it from this level you will need to implement an underlying
filesystem to map the pages.  I don't think this is very interesting, as
I believe you can get this effect "for free" on a number of OSes by just
picking a compressed filesystem and putting Derby on that filesystem.
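
To illustrate the page addressing point, here is a minimal sketch.  The
class name, method name and the 4096 byte page size are made up for
illustration and are not Derby code; it only shows why the arithmetic
depends on every page occupying the same number of bytes.

```java
// Hypothetical sketch, not Derby's actual container code.
final class PageOffsetSketch {
    // Byte offset of a page when every page occupies exactly pageSize bytes.
    static long offsetOf(long pageNumber, int pageSize) {
        return pageNumber * pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 4096; // assumed page size, for illustration only
        // Page 3 is found by plain multiplication.
        System.out.println(offsetOf(3, pageSize));
        // If compression shrank page 1 to, say, 1500 bytes, page 2 would no
        // longer start at 2 * 4096, so a separate page map would be needed.
    }
}
```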

If I were working on this I think I would look at the point where each
datatype that you are trying to compress is read and written to disk.
Start by looking at the various "readExternal*" and "writeExternal*"
routines for each datatype.  First understand the current on-disk
format of the datatype, and then propose the new on-disk format
for the datatype.
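
As a rough sketch of what a new on-disk format for a character value
could look like, the following writes and reads a run-length encoded
string (the technique mentioned in the original question).  Everything
here is an assumption for illustration: the class, the method names and
the layout (run count, then a char and a length per run) are not Derby's
format.  The helpers take DataOutput and DataInput, which are the
super-interfaces of the ObjectOutput and ObjectInput types used by
Externalizable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not a Derby datatype.
final class RleStringSketch {
    // Assumed layout: int runCount, then (char, int runLength) per run.
    static void writeRle(DataOutput out, String value) throws IOException {
        int runs = 0;
        for (int i = 0; i < value.length(); i++) {
            if (i == 0 || value.charAt(i) != value.charAt(i - 1)) {
                runs++;
            }
        }
        out.writeInt(runs);
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            int j = i;
            while (j < value.length() && value.charAt(j) == c) {
                j++;
            }
            out.writeChar(c);
            out.writeInt(j - i);
            i = j;
        }
    }

    // Reverses writeRle by expanding each (char, runLength) pair.
    static String readRle(DataInput in) throws IOException {
        int runs = in.readInt();
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < runs; r++) {
            char c = in.readChar();
            int len = in.readInt();
            for (int k = 0; k < len; k++) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Encodes to an in-memory buffer and decodes it again.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeRle(new DataOutputStream(bytes), value);
            return readRle(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("aaabccdddd"));
    }
}
```

Note that this layout only saves space when the data has long runs; for
strings with few repeated characters it is larger than the plain form.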

Note that the result of this work would not be appropriate for
submission, as a complete project would also specify how the user
controls whether or not to compress.  The final format should also
allow the system to tell the difference between the formats.  There are
many options here; for instance, we could track compression at the
following levels:
per single column value
per single column in table (i.e. metadata indicates column is compressed in this table)
per all columns in a table (i.e. metadata indicates all columns in table are compressed)
per database (i.e. metadata in database says all data is compressed).

The system is not set up well to track per single column value.  It
would not be too difficult to track at table level, with the creation
of new internal datatypes that would inherit from the existing ones,
for example a CompressedSQLChar that inherits from SQLChar.
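
A rough sketch of the inheritance idea, with a format id that lets a
reader tell the two on-disk formats apart.  None of these classes are
Derby's: CharValueSketch stands in for something like SQLChar, the
format id values are invented, and java.util.zip deflate is used only as
a convenient stand-in for whatever compression is chosen.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for an uncompressed character datatype.
class CharValueSketch {
    static final int FORMAT_ID = 1; // invented id
    protected final String value;
    CharValueSketch(String value) { this.value = value; }
    int formatId() { return FORMAT_ID; }
    byte[] toDisk() { return value.getBytes(StandardCharsets.UTF_8); }
}

// Hypothetical compressed variant that inherits from the plain one.
class CompressedCharValueSketch extends CharValueSketch {
    static final int FORMAT_ID = 2; // invented id
    CompressedCharValueSketch(String value) { super(value); }
    @Override int formatId() { return FORMAT_ID; }
    @Override byte[] toDisk() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(super.toDisk());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}

final class ColumnFormatSketch {
    // Table-level metadata (here just a boolean) picks the datatype.
    static CharValueSketch newValue(boolean tableIsCompressed, String s) {
        return tableIsCompressed
                ? new CompressedCharValueSketch(s)
                : new CharValueSketch(s);
    }

    // The format id tells the reader which on-disk format to decode.
    static String fromDisk(int formatId, byte[] data) {
        switch (formatId) {
            case CharValueSketch.FORMAT_ID:
                return new String(data, StandardCharsets.UTF_8);
            case CompressedCharValueSketch.FORMAT_ID:
                try (InflaterInputStream in = new InflaterInputStream(
                        new ByteArrayInputStream(data))) {
                    ByteArrayOutputStream plain = new ByteArrayOutputStream();
                    byte[] buf = new byte[256];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        plain.write(buf, 0, n);
                    }
                    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            default:
                throw new IllegalArgumentException("unknown format id " + formatId);
        }
    }

    public static void main(String[] args) {
        CharValueSketch v = newValue(true, "aaaaaaaaaaaaaaaaaaaa");
        System.out.println(fromDisk(v.formatId(), v.toDisk()));
    }
}
```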

Understanding code in the following directory is a good start:

Rick Hillegas wrote:
> Hi Reka,
> I would recommend looking at the Derby logic for encrypting databases. 
> You can probably get column compression to work by putting your 
> (de)compression logic alongside the (de)encryption touchpoints.
> Hope this helps,
> -Rick
> On 9/6/11 9:21 AM, Reka Thirunavukkarasu wrote:
>> Hi Rick,
>> Thank you for your immediate reply. We are trying to achieve
>> attribute-level compression (in your words, more compact storage of
>> columns). Attribute-level compression is best from the query
>> processing point of view. Attributes fall into three major
>> categories: integer, floating point, and character string. We have to
>> apply a different compression technique to each of the three data
>> types, but for demonstration purposes we will apply compression only
>> to character string attributes. We will test it in a database which
>> has only character strings. This is our main goal.
>> Thank you.
>> On Tue, Sep 6, 2011 at 8:19 PM, Rick Hillegas <rick.hillegas@oracle.com> wrote:
>>> Hi Reka,
>>> Can you give us more detail about what you are trying to achieve?
>>> That may help us figure out what the right touchpoints are. Are you
>>> trying to achieve any of the following:
>>> 1) More aggressive garbage-collection of deleted rows...
>>> 2) More compact storage of columns...
>>> 3) More compact storage of rows...
>>> 4) More compact storage of pages...
>>> 5) Something else...
>>> Thanks,
>>> -Rick
>>> On 9/6/11 7:07 AM, Reka Thirunavukkarasu wrote:
>>>> Hi all,
>>>> We are from the University of Moratuwa, Sri Lanka. We would like to
>>>> apply data compression to Derby in query processing
>>>> as a requirement of our Advanced Database course project.
>>>> Currently Derby has facility to trim the free space in raw data
>>>> system procedure). Our goal is to apply data compression (run-length
>>>> encoding compression) to each of the values (not the field names)
>>>> of a query before executing it, and to decompress the data
>>>> when the execution finishes.
>>>> Initially we went through the code base and identified that the data
>>>> compression can be applied within the executeStatement()
>>>> method of the org.apache.derby.impl.jdbc.EmbedStatement class before
>>>> calling ps.execute(), and we thought that the attribute values of the
>>>> parsed query could be obtained using the getParameterValueSet()
>>>> method of the Activation class. But when we try to print the contents
>>>> of the ParameterValueSet for a typical insert query,
>>>> it prints null (it is just an empty set).
>>>> We would appreciate help from the community on the following questions.
>>>> 1) What is wrong with the point we identified for applying compression?
>>>> 2) If compression is applied before executing the query, will the
>>>> query execution process be affected?
>>>> 3) Is there any other place where compression and decompression
>>>> could be applied before executing the query?
>>>> Thank you.
>>>> -Reka
