accumulo-user mailing list archives

From William Slacum <wsla...@gmail.com>
Subject Re: How to choose BinId for Document partitioned index
Date Sat, 06 Feb 2016 17:25:54 GMT
Often it'll be a hash of the document, mod the number of bins you're using.
The hash should be "good" in the sense that it effectively uniquely
identifies the document, so that documents spread evenly across the bins.
The input can be as simple as some unique field in the document, or you can
just hash the whole document (with something like murmur).
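
As a rough sketch of the "hash mod number of bins" idea, something like the
following would work. The names bin_id and NUM_BINS and the bin count of 16
are made up for illustration, and MD5 from the Python standard library stands
in for murmur only because it needs no extra install; any well-distributed
hash should behave similarly.

```python
import hashlib

# Hypothetical bin count; in practice this is a table design choice.
NUM_BINS = 16


def bin_id(doc_key: bytes, num_bins: int = NUM_BINS) -> int:
    """Map a document to a bin: hash(document) mod number of bins.

    doc_key is either a unique field of the document or the whole
    serialized document. MD5 is an assumption here, used as a stand-in
    for murmur.
    """
    digest = hashlib.md5(doc_key).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    return value % num_bins


if __name__ == "__main__":
    for key in (b"doc-0001", b"doc-0002", b"doc-0003"):
        print(key.decode(), "->", bin_id(key))
```

The same document always lands in the same bin, and a large set of distinct
documents should end up spread roughly evenly over the bins.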

On Saturday, February 6, 2016, Jamie Johnson <jej2003@gmail.com> wrote:

> Just found this excellent write up that explains a bit.
>
> https://www.slideshare.net/mobile/acordova00/text-indexing-in-accumulo
> On Feb 6, 2016 8:52 AM, "Jamie Johnson" <jej2003@gmail.com> wrote:
>
>> Reading the examples for table design, I've come across a question about
>> the document-partitioned index: what is typically chosen as the BinId?
>> Or, maybe more appropriately, what factors should influence the choice of
>> BinId, and what impact do they have?
>>
>
