Reply-To: "Derby Discussion" <derby-user@db.apache.org>
Message-ID: <426EAAE0.9050808@sbcglobal.net>
Date: Tue, 26 Apr 2005 13:56:00 -0700
From: Mike Matrigali <mikem_app@sbcglobal.net>
To: Derby Discussion <derby-user@db.apache.org>,
        Derby Development <derby-dev@db.apache.org>
Subject: Re: Features: Tablepartitioning, Tablespaces and replication and
 Loadbalancing
References: <426E04C1.1070904@yahoo.de> <426E820A.5050400@sbcglobal.net>
 <426EA2A4.3040506@yahoo.de>
In-Reply-To: <426EA2A4.3040506@yahoo.de>

What I meant to say is that I don't see much benefit in doing
physical table partitioning at the Derby software layer; I think it
is best done at the hardware level.  By physical I mean the usual
disk striping provided by OS or RAID technology.

Logical table partitioning (partitioning a table by some sort of
logical key) may be useful, but not until Derby is
enhanced to use more than one thread to process a single query.
And if that enhancement were made, table partitioning might not be
necessary, as current indexes could be used to partition the data
logically - and hardware level physical partitioning could be used to
scale the I/O throughput.
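
To make the "indexes as logical partitions" idea concrete, here is a
rough sketch.  It is not Derby code: a sorted in-memory map stands in
for an existing index, and the class and method names
(IndexRangeScanSketch, parallelSum) are made up for illustration.  The
key range is cut into contiguous slices and each slice is scanned on
its own thread, which is roughly what a multi-threaded query executor
would have to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the NavigableMap plays the role of an existing index on the key.
class IndexRangeScanSketch {

    // Sums all values by cutting the key range into `workers` contiguous slices
    // and scanning each slice on its own thread. Assumes workers >= 1 and that
    // the map is not modified while the scan runs.
    static long parallelSum(NavigableMap<Integer, Long> index, int workers) throws Exception {
        if (index.isEmpty()) {
            return 0L;
        }
        int lo = index.firstKey();
        int hi = index.lastKey();
        long span = (long) hi - lo + 1;
        long step = (span + workers - 1) / workers; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                long from = lo + i * step;
                if (from > hi) {
                    break; // fewer keys than workers
                }
                long to = Math.min(from + step - 1, hi);
                final int f = (int) from;
                final int t = (int) to;
                // Each task scans one logical "partition": a key range of the index.
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long v : index.subMap(f, true, t, true).values()) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> p : parts) {
                total += p.get(); // combine the partial results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<Integer, Long> index = new TreeMap<>();
        for (int k = 1; k <= 1000; k++) {
            index.put(k, (long) k);
        }
        System.out.println(parallelSum(index, 4));
    }
}
```

Note that nothing in the sketch requires the data to be stored in
separate partitions; the index ordering alone is enough to hand each
thread its own key range.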

One of the reasons that current databases have database level partitioning
is that the software was written before physical partitioning was
freely available through modern OSes and cheap RAID hardware.  I believe
the vendors had to write their own file systems, at least for some OS
implementations - so the OS could not be used to do physical striping.

Currently Derby uses only one thread per connection to execute queries,
so unless that changes, table partitioning does not help.
If you have multiple threads accessing the one big table, then hardware
level partitioning will probably avoid I/O bottlenecks just as
well as logical table partitioning would.

I agree that enabling query processing to spread a single
query across multiple threads, so that it scales on multi-processors,
would be an interesting and useful feature.


apoc9009@yahoo.de wrote:
> Mike Matrigali wrote:
> 
>> table partitioning-
>>       The question here is why do you want to partition the table.  If it
>>       is just to spread I/O randomly across disks, I don't think it is a
>>       very useful feature.  The same thing can easily be accomplished on
>>       most modern hardware/OS's at a lower level while presenting the
>>       disk farm as one disk to the JVM/derby.
> 
> 
> One simple but big table (larger than 1 GByte of data and expanding)
> could be split into several smaller units for faster query processing.
> (ORACLE and IBM DB2 are able to partition tables in this way, but
> the price is over 30,000 USD)
> 
>>       Now if you are talking about key partitioning then that may be
>>       useful, but only if accompanying work is done to partition
>>       query execution in parallel against those partitions.  Below
>>       I will describe one approach that I think is the easiest and most
>>       maintainable first step towards this.
> 
> 
> exactly