accumulo-user mailing list archives

From Adam Fuchs
Subject Re: Major Compacting ISAMs
Date Sat, 28 Jul 2012 15:01:42 GMT
John is spot on. However, there's one additional implication to mention,
which is that you need to pick a table structure that doesn't require
adding more data to the same tablet over time if you are continuing to
write new data to your table. Depending on what type of indexing you would
like to use, this generally requires using a document-partitioned structure
like that used in the WikiSearch example:

For some problems (like building a graph or an RDF triple store) this isn't
really feasible, and you will eventually need to major compact.
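
To make the "don't keep adding data to the same tablet" idea concrete, here is a minimal sketch of one way to build row IDs for a document-partitioned table. It assumes the row starts with a date bucket followed by a shard number, so rows for new days sort after rows for old days and older tablets stop receiving writes. The function name, the date prefix, and the shard count are all hypothetical choices for illustration and are not taken from the WikiSearch example itself.

```python
import zlib
from datetime import date


def partition_row(doc_id: str, day: date, num_shards: int = 16) -> str:
    """Build a row ID of the form YYYYMMDD_SSSS (hypothetical layout).

    The date prefix makes newer data sort after older data; the shard
    suffix spreads one day's documents over num_shards partitions.
    """
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
    return f"{day:%Y%m%d}_{shard:04d}"


if __name__ == "__main__":
    # Same document ID on two different days lands in different row ranges.
    print(partition_row("enwiki-12345", date(2012, 7, 27)))
    print(partition_row("enwiki-12345", date(2012, 7, 28)))
```

All index entries for a document would then live under that document's partition row, which is the general shape of a document-partitioned index. A graph or triple store has no such natural bucket, which is why it ends up writing into existing tablets.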


On Fri, Jul 27, 2012 at 11:35 AM, John Armstrong wrote:

> On 07/27/2012 11:23 AM, Hugh Xedni wrote:
>> If I load sorted key-value map or ISAM files into HDFS via bulk loading,
>> how can I ensure only one file will be assigned to a tablet and major
>> compaction is avoided?
> I think (and those more knowledgeable will correct me if I'm wrong) that
> you could achieve this by
> (a) making sure that all your bulk-load files contain non-overlapping
> Accumulo key ranges,
> (b) making sure that each file is smaller than the maximum tablet size on
> the table, and
> (c) setting the table splits to the file key range boundaries before bulk
> importing.
> These should be sufficient conditions, though possibly (likely?) not
> necessary.
> hth
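
The three conditions John lists can be checked before loading anything. Below is a minimal sketch, assuming you already know each file's first row, last row, and size; `BulkFile` and `plan_splits` are hypothetical names and nothing here calls the Accumulo API. It relies on a tablet covering the range (previous split, split], so splitting at each file's last row gives every file its own tablet.

```python
from typing import List, NamedTuple


class BulkFile(NamedTuple):
    name: str
    first_row: str   # smallest row in the file
    last_row: str    # largest row in the file
    size_bytes: int


def plan_splits(files: List[BulkFile], max_tablet_bytes: int) -> List[str]:
    """Check conditions (a) and (b), then return split points for (c)."""
    ordered = sorted(files, key=lambda f: f.first_row)

    for f in ordered:
        if f.first_row > f.last_row:
            raise ValueError(f"{f.name}: first row sorts after last row")
        # (b) each file must be smaller than the maximum tablet size
        if f.size_bytes >= max_tablet_bytes:
            raise ValueError(f"{f.name}: not smaller than the tablet size limit")

    # (a) key ranges must not overlap; compare each file with its neighbour
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.first_row <= prev.last_row:
            raise ValueError(f"{prev.name} and {cur.name} overlap")

    # (c) split at the last row of every file except the final one
    return [f.last_row for f in ordered[:-1]]


if __name__ == "__main__":
    files = [
        BulkFile("b.rf", "m", "r", 100),
        BulkFile("a.rf", "a", "f", 200),
        BulkFile("c.rf", "s", "z", 50),
    ]
    print(plan_splits(files, max_tablet_bytes=1000))  # prints ['f', 'r']
```

The returned rows would be added as table splits before the bulk import. Rows are compared as Python strings here, which only approximates Accumulo's byte-wise ordering for non-ASCII data.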
