incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Per-Namespace / Per-Table Partitioner
Date Wed, 01 Apr 2009 02:20:26 GMT
Yes, I had loadable Partitioners implemented but it is out now pending
Avinash's new OPHF...

On Tue, Mar 31, 2009 at 8:58 PM, Sandeep Tata <sandeep.tata@gmail.com> wrote:
> I agree with Alexander.
>
> The partitioner per-namespace, while useful for some apps, really ends
> up looking like a quick and dirty hack for multiple tables.
> You could achieve all of what Neophytos described in his example by
> sticking the logic in the partitioner class if we eventually allowed
> users to stick a more complex partitioning class using:
>
> <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
>
> (See CASSANDRA-3)
>
> This is not an elegant solution, but I'm only making it quicker and dirtier :)
>
> Perhaps we should postpone this discussion to after we resolve CASSANDRA-3 ?
>
>
> On Tue, Mar 31, 2009 at 5:02 PM, Alexander Staubo
> <madevilgenius@gmail.com> wrote:
>> On Mon, Mar 30, 2009 at 10:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> But I do think there is nothing wrong with partitioner-per-namespace.
>>> It should be straightforward to implement (once we have real namespace
>>> support to begin with) and it might be interesting for some apps to
>>> have that ability.
>>
>> I can think of plenty of reasons why you would want or need to go
>> beyond mere namespaces. In my opinion "table" or possibly "database"
>> are the only sensible terms to describe such a division.
>>
>> For example, tables ought to support different replication factors
>> (BigTable and HBase both support this). You might also want to specify
>> different database directories for each table, e.g. to distribute them
>> across several disks.
>>
>> There are all sorts of settings you will want to apply differently to
>> different tables due to usage semantics; for example, I imagine
>> Cassandra could be improved to more efficiently support streaming
>> of large blobs of binary data, GFS-style; and that some of that
>> support may be enabled by table-level settings (e.g., flags to set
>> streaming buffers, append semantics or whatever). I also imagine the
>> partitioning and compaction algorithms could mature into providing
>> user-definable settings that could be tweaked according to load
>> requirements.
>>
>> It should also be possible to easily delete an entire table without
>> touching other tables. For testing purposes, for example, I would like
>> to be able to load an entire table into the system, play with it, then
>> drop the entire thing, without having to go through the process of a
>> whole new, separate Cassandra. Using temporary tables to store result
>> sets is also very common in MapReduce applications.
>>
>> Alexander.
>>
>
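
For illustration only, the idea Sandeep describes above (putting the per-table logic inside a single partitioner class) might look roughly like the Java sketch below. Everything in it is an assumption made for the example: the `KeyPartitioner` interface, the three classes, and the table names "events" and "users" are hypothetical and are not Cassandra's partitioner API. The ordered variant is only a toy stand-in for an order-preserving scheme.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for this sketch; not Cassandra's real partitioner API.
interface KeyPartitioner {
    BigInteger token(String key);
}

// Hash-based placement: spreads keys evenly but discards key order.
class HashingPartitioner implements KeyPartitioner {
    public BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // non-negative 128-bit token
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Toy order-preserving placement: the token is the first 16 key bytes,
// zero-padded, so token order follows unsigned byte order of the key prefix.
class OrderedPartitioner implements KeyPartitioner {
    public BigInteger token(String key) {
        byte[] prefix = Arrays.copyOf(key.getBytes(StandardCharsets.UTF_8), 16);
        return new BigInteger(1, prefix);
    }
}

// One "complex" partitioner that picks a strategy per table name.
class PerTablePartitioner {
    private final Map<String, KeyPartitioner> byTable = new HashMap<>();
    private final KeyPartitioner fallback;

    PerTablePartitioner(KeyPartitioner fallback) {
        this.fallback = fallback;
    }

    PerTablePartitioner register(String table, KeyPartitioner partitioner) {
        byTable.put(table, partitioner);
        return this;
    }

    BigInteger token(String table, String key) {
        // Tables without an explicit entry use the fallback strategy.
        return byTable.getOrDefault(table, fallback).token(key);
    }
}
```

With `new PerTablePartitioner(new HashingPartitioner()).register("events", new OrderedPartitioner())`, keys in "events" keep their order while every other table is hashed. As the thread notes, this works around the missing per-table setting rather than replacing it: the mapping lives in code instead of configuration.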
