accumulo-user mailing list archives

From Heath Abelson <HAbel...@netcentricinc.com>
Subject RE: Trouble with IntersectingIterator
Date Tue, 01 Oct 2013 20:13:09 GMT
Looking at this example, the index and record do not occur in the same row. That description
seems to be more related to the IndexedDocIterator.

If we take my "mail" object as my document, and think of it as being partitioned by field
name rather than by some hash, it seems to me that this iterator could still apply.
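
To see what partitioning by field name would mean for an intersection, here is a minimal
sketch in plain Java (it is not Accumulo code). It assumes that the IntersectingIterator
treats each row as a partition, each column family as a term, and each column qualifier as
a document ID, and that it only intersects document IDs within a single row. The class and
method names are made up for illustration; the data is the receiver and sender portion of
the MailIndex scan quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory model of the MailIndex table:
// row (field name) -> term (column family) -> doc IDs (column qualifiers).
// Hypothetical helper for illustration only; it does not use the Accumulo API.
final class FieldPartitionedIndexSketch {

    static Map<String, Map<String, Set<String>>> mailIndex() {
        Map<String, Map<String, Set<String>>> index = new TreeMap<>();
        add(index, "receiver", "habelson", "m3");
        add(index, "receiver", "habelson", "m5");
        add(index, "receiver", "jmarcolla", "m2");
        add(index, "receiver", "mcross", "m4");
        add(index, "receiver", "mgiordano", "m1");
        add(index, "sender", "habelson", "m1");
        add(index, "sender", "habelson", "m2");
        add(index, "sender", "habelson", "m4");
        add(index, "sender", "mcross", "m5");
        add(index, "sender", "mgiordano", "m3");
        // sentTime and subject rows omitted: the query terms do not occur there.
        return index;
    }

    static void add(Map<String, Map<String, Set<String>>> index,
                    String row, String term, String docId) {
        index.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(term, t -> new TreeSet<>())
             .add(docId);
    }

    // For each row, keep only the doc IDs that are listed under every requested term.
    static List<String> intersect(Map<String, Map<String, Set<String>>> index,
                                  String... terms) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> row : index.entrySet()) {
            Set<String> docs = null;
            for (String term : terms) {
                Set<String> forTerm =
                    row.getValue().getOrDefault(term, new TreeSet<String>());
                if (docs == null) {
                    docs = new TreeSet<>(forTerm);
                } else {
                    docs.retainAll(forTerm);
                }
            }
            if (docs != null) {
                for (String docId : docs) {
                    hits.add(row.getKey() + ":" + docId);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> index = mailIndex();
        // prints []
        System.out.println(intersect(index, "habelson", "mgiordano"));
        // prints [receiver:m3, receiver:m5, sender:m1, sender:m2, sender:m4]
        System.out.println(intersect(index, "habelson", "habelson"));
    }
}
```

Under that assumption, "habelson" and "mgiordano" never share a document ID within one row,
because no message has both as sender or both as receiver. That would be consistent with the
zero results reported below, and with the same term used twice returning every record that
contains it.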

From: William Slacum [mailto:wilhelm.von.cloud@accumulo.net]
Sent: Tuesday, October 01, 2013 3:48 PM
To: user@accumulo.apache.org
Subject: Re: Trouble with IntersectingIterator

That iterator is designed to be used with a sharded table format, wherein the index and record
each occur within the same row. See the Accumulo examples page: http://accumulo.apache.org/1.4/examples/shard.html
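
As a rough illustration of a sharded index layout, the sketch below builds index entries
in which the row is a shard ID derived from the document ID, the column family is a term,
and the column qualifier is the document ID. The shard count, the use of String.hashCode,
and the class and method names are all assumptions made for this example; they are not
taken from the Accumulo shard example itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a document-partitioned (sharded) index layout.
// It only formats keys as text; it does not use the Accumulo API.
final class ShardedIndexSketch {

    // Assumption: two shards, chosen only to keep the output small.
    static final int NUM_SHARDS = 2;

    // Every term of a document maps to the same row, because the row
    // depends only on the document ID.
    static String shardFor(String docId) {
        return "shard" + Math.floorMod(docId.hashCode(), NUM_SHARDS);
    }

    // One entry per term, formatted like a shell scan line: row term:docId
    static List<String> indexEntries(String docId, String... terms) {
        List<String> entries = new ArrayList<>();
        for (String term : terms) {
            entries.add(shardFor(docId) + " " + term + ":" + docId);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Terms for m1 and m3 taken from the Mail scan quoted in this thread.
        indexEntries("m1", "habelson", "mgiordano", "Lunch")
            .forEach(System.out::println);
        indexEntries("m3", "mgiordano", "habelson", "Lunch")
            .forEach(System.out::println);
    }
}
```

With a layout like this, every term of a given message sits in one row, so a per-row
intersection on "habelson" and "mgiordano" can match m1 and m3 regardless of which field
each value came from.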

On Tue, Oct 1, 2013 at 3:35 PM, Heath Abelson <HAbelson@netcentricinc.com> wrote:
I am attempting to get a very simple example working with the Intersecting Iterator. I made
up some dummy objects for me to do this work:

A scan on the "Mail" table looks like this:

m1 mail:body [U&(USA)]    WTF?
m1 mail:receiver [U&(USA)]    mgiordano
m1 mail:sender [U&(USA)]    habelson
m1 mail:sentTime [U&(USA)]    1380571500
m1 mail:subject [U&(USA)]    Lunch
m2 mail:body [U&(USA)]    I know right?
m2 mail:receiver [U&(USA)]    jmarcolla
m2 mail:sender [U&(USA)]    habelson
m2 mail:sentTime [U&(USA)]    1380571502
m2 mail:subject [U&(USA)]    Lunch
m3 mail:body [U&(USA)]    exactly!
m3 mail:receiver [U&(USA)]    habelson
m3 mail:sender [U&(USA)]    mgiordano
m3 mail:sentTime [U&(USA)]    1380571504
m3 mail:subject [U&(USA)]    Lunch
m4 mail:body [U&(USA)]    Dude!
m4 mail:receiver [U&(USA)]    mcross
m4 mail:sender [U&(USA)]    habelson
m4 mail:sentTime [U&(USA)]    1380571506
m4 mail:subject [U&(USA)]    Lunch
m5 mail:body [U&(USA)]    Yeah
m5 mail:receiver [U&(USA)]    habelson
m5 mail:sender [U&(USA)]    mcross
m5 mail:sentTime [U&(USA)]    1380571508
m5 mail:subject [U&(USA)]    Lunch

A scan on the "MailIndex" table looks like this:

receiver habelson:m3 []    habelson
receiver habelson:m5 []    habelson
receiver jmarcolla:m2 []    jmarcolla
receiver mcross:m4 []    mcross
receiver mgiordano:m1 []    mgiordano
sender habelson:m1 []    habelson
sender habelson:m2 []    habelson
sender habelson:m4 []    habelson
sender mcross:m5 []    mcross
sender mgiordano:m3 []    mgiordano
sentTime 1380571500:m1 []    1380571500
sentTime 1380571502:m2 []    1380571502
sentTime 1380571504:m3 []    1380571504
sentTime 1380571506:m4 []    1380571506
sentTime 1380571508:m5 []    1380571508
subject Lunch:m1 []    Lunch
subject Lunch:m2 []    Lunch
subject Lunch:m3 []    Lunch
subject Lunch:m4 []    Lunch
subject Lunch:m5 []    Lunch

If I use an IntersectingIterator with a BatchScanner and pass it the terms "habelson", "mgiordano"
(or seemingly any pair of distinct terms), I get zero results. If, instead, I use the same value as
both terms (e.g. "habelson", "habelson"), I properly get back the records that contain that
value.

My code is almost identical to the user guide example, and I am using Accumulo 1.4.3.

Any help would be appreciated.





Heath Abelson
NetCentric Technology, Inc.
3349 Route 138, Building A
Wall, NJ  07719
Phone: 732-544-0888 x159
Email: habelson@netcentricinc.com


