accumulo-user mailing list archives

From "Williamson, Luke MR 1" <luke.williams...@defence.gov.au>
Subject Intersecting Iterators [SEC=UNCLASSIFIED]
Date Wed, 14 Aug 2013 01:58:31 GMT
UNCLASSIFIED

Hi,
 
I have field indexes that look something like this:
 
Row Id: <date>-<UUID>
CF: fi||<type>||<value>
CQ: <date>-<UUID>
 
For example: 

20130814-550e8400-e29b-41d4-a716-446655440000 fi||verb||run 20130814-550e8400-e29b-41d4-a716-446655440000
20130814-550e8400-e29b-41d4-a716-446655440000 page||58 line||16 "the boy can run up the hill"
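To make the layout concrete, here is a minimal sketch of how the three key parts of one field-index entry could be assembled. The class and method names (FieldIndexLayout, docId, columnFamily) are hypothetical helpers for illustration only, not part of the Accumulo API, and the code only builds strings; it does not write to a table.

```java
import java.util.UUID;

// Hypothetical helper (not part of Accumulo): builds the key parts of one
// field-index entry using the <date>-<UUID> / fi||<type>||<value> layout.
final class FieldIndexLayout {
    static final String SEP = "||";

    // The row ID and the column qualifier share the same <date>-<UUID> form.
    static String docId(String date, UUID uuid) {
        return date + "-" + uuid;
    }

    // Column family: fi||<type>||<value>
    static String columnFamily(String type, String value) {
        return "fi" + SEP + type + SEP + value;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("550e8400-e29b-41d4-a716-446655440000");
        String id = docId("20130814", uuid);
        // Prints the row ID, column family and column qualifier, space separated.
        System.out.println(id + " " + columnFamily("verb", "run") + " " + id);
    }
}
```

Note that in this layout the row ID and the column qualifier of a field-index entry are always identical.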

From what I could determine from the documentation and API, I should be executing the following code to
perform an intersecting query on two values:

Set<Range> shards = new HashSet<Range>();

Text[] terms = {new Text("fi||<type>||<value>"), new Text("fi||<type>||<value>")};

BatchScanner bs = conn.createBatchScanner(table, auths, 20);
bs.setTimeout(360, TimeUnit.SECONDS);

IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
IntersectingIterator.setColumnFamilies(iter, terms);
bs.addScanIterator(iter);

bs.setRanges(Collections.singleton(new Range()));

for (Entry<Key,Value> entry : bs) {
    shards.add(new Range(entry.getKey().getColumnQualifier()));
}

I then perform a second batch scan using the set of ranges returned by the above to get my
actual results.
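
To spell out the result the two-step query is meant to produce, here is a small in-memory model of the intersection. It is only a sketch of the intended semantics, not how IntersectingIterator works internally; the class name IntersectionModel, the second term and the shortened document IDs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory model only: maps each field-index column family to the set of
// document IDs (<date>-<UUID>) that carry it.
final class IntersectionModel {
    private final Map<String, SortedSet<String>> index =
            new TreeMap<String, SortedSet<String>>();

    void add(String columnFamily, String docId) {
        SortedSet<String> ids = index.get(columnFamily);
        if (ids == null) {
            ids = new TreeSet<String>();
            index.put(columnFamily, ids);
        }
        ids.add(docId);
    }

    // Returns the document IDs that appear under every one of the given terms.
    SortedSet<String> intersect(List<String> terms) {
        SortedSet<String> result = null;
        for (String term : terms) {
            SortedSet<String> ids = index.get(term);
            if (ids == null) {
                return new TreeSet<String>(); // an unknown term empties the result
            }
            if (result == null) {
                result = new TreeSet<String>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        IntersectionModel model = new IntersectionModel();
        model.add("fi||verb||run", "20130814-a");   // made-up short IDs
        model.add("fi||noun||hill", "20130814-a");
        model.add("fi||verb||run", "20130814-b");
        System.out.println(model.intersect(
                Arrays.asList("fi||verb||run", "fi||noun||hill")));
    }
}
```

The IDs returned by intersect correspond to the column qualifiers collected into shards, which then become the row ranges for the second batch scan.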

My issue is that the intersecting query takes several minutes to return, if it returns at all (in some
cases it times out). Is this expected? Is there some way to improve performance? Is there
a better way to do this sort of query?

Any guidance would be much appreciated.

Thanks

Luke


