accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Trouble with IntersectingIterator
Date Tue, 01 Oct 2013 21:46:34 GMT
Heath,

In your case, the question that you are effectively asking is "within each
partition, which documents' index entries include all of the given terms".
Since your partitions are aligned by field, and each document has only a
single index entry per field, you will not get any matches for queries with
more than one distinct term. The IntersectingIterator can't answer a
question that correlates index entries across a partition boundary. For
example, document "m1" has the index entry for "habelson" in the "sender"
partition, but its index entry for "mgiordano" is in the "receiver"
partition.
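
To make the partition-boundary point concrete, here is a toy model in plain
Python. It is not the IntersectingIterator itself, only a sketch of the
"intersect within each row" behaviour described above. The function name
intersect_within_rows and the tuple layout are made up for illustration, and
only the sender and receiver entries from the MailIndex table are included.

```python
from collections import defaultdict

# MailIndex entries as (row, term, document id). The row is the field name,
# so every field ends up in its own partition.
FIELD_PARTITIONED_INDEX = [
    ("receiver", "habelson", "m3"),
    ("receiver", "habelson", "m5"),
    ("receiver", "jmarcolla", "m2"),
    ("receiver", "mcross", "m4"),
    ("receiver", "mgiordano", "m1"),
    ("sender", "habelson", "m1"),
    ("sender", "habelson", "m2"),
    ("sender", "habelson", "m4"),
    ("sender", "mcross", "m5"),
    ("sender", "mgiordano", "m3"),
]


def intersect_within_rows(entries, terms):
    """Toy model: per row, return the doc ids that appear under every term."""
    if not terms:
        return []
    docs_by_row_and_term = defaultdict(lambda: defaultdict(set))
    for row, term, doc_id in entries:
        docs_by_row_and_term[row][term].add(doc_id)

    hits = []
    for row in sorted(docs_by_row_and_term):
        by_term = docs_by_row_and_term[row]
        # The intersection never looks outside the current row.
        matching = set.intersection(*(by_term.get(t, set()) for t in terms))
        hits.extend((row, doc_id) for doc_id in sorted(matching))
    return hits


# Two different terms: no single row holds both for the same document.
print(intersect_within_rows(FIELD_PARTITIONED_INDEX, ["habelson", "mgiordano"]))
# The same term twice: each row trivially intersects with itself.
print(intersect_within_rows(FIELD_PARTITIONED_INDEX, ["habelson", "habelson"]))
```

Under this model the first query comes back empty and the second returns
every "habelson" entry, which matches the behaviour reported in the original
question.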

Another thing you might try is to partition by field within the document
partitions. You can hack this together by building something like the
following, with p1 = {m1,m2,m3} and p2 = {m4,m5}:

p1 receiver_habelson:m3 []    habelson
p1 receiver_jmarcolla:m2 []    jmarcolla
p1 receiver_mgiordano:m1 []    mgiordano
p1 sender_habelson:m1 []    habelson
p1 sender_habelson:m2 []    habelson
p1 sender_mgiordano:m3 []    mgiordano
p1 sentTime_1380571500:m1 []    1380571500
p1 sentTime_1380571502:m2 []    1380571502
p1 sentTime_1380571504:m3 []    1380571504
p1 subject_Lunch:m1 []    Lunch
p1 subject_Lunch:m2 []    Lunch
p1 subject_Lunch:m3 []    Lunch
p2 receiver_habelson:m5 []    habelson
p2 receiver_mcross:m4 []    mcross
p2 sender_habelson:m4 []    habelson
p2 sender_mcross:m5 []    mcross
p2 sentTime_1380571506:m4 []    1380571506
p2 sentTime_1380571508:m5 []    1380571508
p2 subject_Lunch:m4 []    Lunch
p2 subject_Lunch:m5 []    Lunch

Here terms are prefixed by field_, and you can do queries for things like
{"sender_habelson", "receiver_mgiordano"}.
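
As a rough sketch of how that layout could be produced and queried, the
following plain-Python model builds the field_-prefixed entries from the
mail records and intersects terms within each document partition. It does
not use the Accumulo API; MAIL, PARTITION_OF, build_index and query are
hypothetical names for illustration, and the partition assignment is simply
the p1/p2 split given above.

```python
from collections import defaultdict

# The indexed fields of the five mail records from the original question.
MAIL = {
    "m1": {"receiver": "mgiordano", "sender": "habelson",
           "sentTime": "1380571500", "subject": "Lunch"},
    "m2": {"receiver": "jmarcolla", "sender": "habelson",
           "sentTime": "1380571502", "subject": "Lunch"},
    "m3": {"receiver": "habelson", "sender": "mgiordano",
           "sentTime": "1380571504", "subject": "Lunch"},
    "m4": {"receiver": "mcross", "sender": "habelson",
           "sentTime": "1380571506", "subject": "Lunch"},
    "m5": {"receiver": "habelson", "sender": "mcross",
           "sentTime": "1380571508", "subject": "Lunch"},
}

# Assumed partition assignment: p1 = {m1, m2, m3}, p2 = {m4, m5}.
PARTITION_OF = {"m1": "p1", "m2": "p1", "m3": "p1", "m4": "p2", "m5": "p2"}


def build_index(mail, partition_of):
    """Emit (row, term, doc id) with row = partition and term = field_value."""
    entries = []
    for doc_id, fields in mail.items():
        for field, value in fields.items():
            entries.append((partition_of[doc_id], f"{field}_{value}", doc_id))
    return sorted(entries)


def query(entries, terms):
    """Toy model: per partition, return doc ids that appear under every term."""
    if not terms:
        return []
    docs = defaultdict(lambda: defaultdict(set))
    for row, term, doc_id in entries:
        docs[row][term].add(doc_id)
    hits = []
    for row in sorted(docs):
        matching = set.intersection(*(docs[row].get(t, set()) for t in terms))
        hits.extend((row, doc_id) for doc_id in sorted(matching))
    return hits


INDEX = build_index(MAIL, PARTITION_OF)
print(query(INDEX, ["sender_habelson", "receiver_mgiordano"]))
print(query(INDEX, ["sender_habelson", "subject_Lunch"]))
```

Because all of a document's index entries now live in the same partition, a
multi-field query can match: in this model the first query finds m1 in p1,
and the second finds m1 and m2 in p1 plus m4 in p2.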

Adam

On Tue, Oct 1, 2013 at 4:13 PM, Heath Abelson <HAbelson@netcentricinc.com> wrote:

> Looking at this example, the index and record do not occur in the same
> row. That seems to be more related to the IndexedDocIterator.
>
> If we take my “mail” object as my document, and think of it as being
> partitioned by field name rather than some hash, it seems to me like the
> use of this iterator could still apply.
>
> From: William Slacum [mailto:wilhelm.von.cloud@accumulo.net]
> Sent: Tuesday, October 01, 2013 3:48 PM
> To: user@accumulo.apache.org
> Subject: Re: Trouble with IntersectingIterator
>
> That iterator is designed to be used with a sharded table format, wherein
> the index and record each occur within the same row. See the Accumulo
> examples page http://accumulo.apache.org/1.4/examples/shard.html
>
> On Tue, Oct 1, 2013 at 3:35 PM, Heath Abelson <HAbelson@netcentricinc.com> wrote:
>
> I am attempting to get a very simple example working with the
> IntersectingIterator. I made up some dummy objects for me to do this work:
>
> A scan on the “Mail” table looks like this:
>
> m1 mail:body [U&(USA)]    WTF?
> m1 mail:receiver [U&(USA)]    mgiordano
> m1 mail:sender [U&(USA)]    habelson
> m1 mail:sentTime [U&(USA)]    1380571500
> m1 mail:subject [U&(USA)]    Lunch
> m2 mail:body [U&(USA)]    I know right?
> m2 mail:receiver [U&(USA)]    jmarcolla
> m2 mail:sender [U&(USA)]    habelson
> m2 mail:sentTime [U&(USA)]    1380571502
> m2 mail:subject [U&(USA)]    Lunch
> m3 mail:body [U&(USA)]    exactly!
> m3 mail:receiver [U&(USA)]    habelson
> m3 mail:sender [U&(USA)]    mgiordano
> m3 mail:sentTime [U&(USA)]    1380571504
> m3 mail:subject [U&(USA)]    Lunch
> m4 mail:body [U&(USA)]    Dude!
> m4 mail:receiver [U&(USA)]    mcross
> m4 mail:sender [U&(USA)]    habelson
> m4 mail:sentTime [U&(USA)]    1380571506
> m4 mail:subject [U&(USA)]    Lunch
> m5 mail:body [U&(USA)]    Yeah
> m5 mail:receiver [U&(USA)]    habelson
> m5 mail:sender [U&(USA)]    mcross
> m5 mail:sentTime [U&(USA)]    1380571508
> m5 mail:subject [U&(USA)]    Lunch
>
> A scan on the “MailIndex” table looks like this:
>
> receiver habelson:m3 []    habelson
> receiver habelson:m5 []    habelson
> receiver jmarcolla:m2 []    jmarcolla
> receiver mcross:m4 []    mcross
> receiver mgiordano:m1 []    mgiordano
> sender habelson:m1 []    habelson
> sender habelson:m2 []    habelson
> sender habelson:m4 []    habelson
> sender mcross:m5 []    mcross
> sender mgiordano:m3 []    mgiordano
> sentTime 1380571500:m1 []    1380571500
> sentTime 1380571502:m2 []    1380571502
> sentTime 1380571504:m3 []    1380571504
> sentTime 1380571506:m4 []    1380571506
> sentTime 1380571508:m5 []    1380571508
> subject Lunch:m1 []    Lunch
> subject Lunch:m2 []    Lunch
> subject Lunch:m3 []    Lunch
> subject Lunch:m4 []    Lunch
> subject Lunch:m5 []    Lunch
>
> If I use an IntersectingIterator with a BatchScanner and pass it the terms
> “habelson”, “mgiordano” (or seemingly any pair of terms), I get zero
> results. If, instead, I use the same value as both terms (i.e.
> “habelson”, “habelson”), I properly get back the records that contain that
> value.
>
> My code is almost identical to the user guide example, and I am using
> Accumulo 1.4.3.
>
> Any help would be appreciated.
>
> Heath Abelson
> NetCentric Technology, Inc.
> 3349 Route 138, Building A
> Wall, NJ  07719
> Phone: 732-544-0888 x159
> Email: habelson@netcentricinc.com
