accumulo-user mailing list archives

From "Williamson, Luke MR 1" <luke.williams...@defence.gov.au>
Subject RE: Intersecting Iterators [SEC=UNCLASSIFIED]
Date Wed, 14 Aug 2013 04:50:37 GMT
UNCLASSIFIED

I have tried increasing the number of threads. That seems to guarantee that the query returns
before it hits the timeout, but it still takes approx. 7 minutes to complete. Looking at the
Accumulo manager page, all the tablet servers appear to get hit equally (around 16 per
node) and start to return, but a couple of tablet servers take longer than the others. The
doco indicated that this behaviour could happen, but I was hoping it wouldn't take this
long.

________________________________

From: David Medinets [mailto:david.medinets@gmail.com]
Sent: Wednesday, 14 August 2013 12:45
To: accumulo-user
Subject: Re: Intersecting Iterators [SEC=UNCLASSIFIED]


I'm wondering about the 20 threads in the BatchScanner. Have you played with increasing it?
I've seen that number go above 15 per accumulo node. Are you seeing the scans in the Accumulo
monitor? Are the scans progressing through the Accumulo nodes?


On Tue, Aug 13, 2013 at 9:58 PM, Williamson, Luke MR 1 <luke.williamson1@defence.gov.au>
wrote:


	UNCLASSIFIED
	
	Hi,
	
	I have field indexes that look something like
	
	Row Id: <date>-<UUID>
	CF: fi||<type>||<value>
	CQ: <date>-<UUID>
	
	For example:
	
	20130814-550e8400-e29b-41d4-a716-446655440000 fi||verb||run 20130814-550e8400-e29b-41d4-a716-446655440000
	20130814-550e8400-e29b-41d4-a716-446655440000 page||58 line||16 "the boy can run up the hill"
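
The layout above can be restated as a small helper. This is only an illustrative sketch in plain Java: the class `FieldIndexKey` and its method names are hypothetical and are not part of Accumulo or of the original schema. Only the `fi||<type>||<value>` and `<date>-<UUID>` patterns come from the message.

```java
import java.util.UUID;

/** Hypothetical helper that builds the key parts described in the message. */
final class FieldIndexKey {
    private static final String SEP = "||";

    /** Row ID and column qualifier share the form <date>-<UUID>. */
    public static String docId(String yyyymmdd, UUID uuid) {
        return yyyymmdd + "-" + uuid;
    }

    /** Column family of a field index entry: fi||<type>||<value>. */
    public static String fieldIndexFamily(String type, String value) {
        return "fi" + SEP + type + SEP + value;
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("550e8400-e29b-41d4-a716-446655440000");
        String doc = docId("20130814", id);
        // Prints row, column family and column qualifier of one index entry.
        System.out.println(doc + " " + fieldIndexFamily("verb", "run") + " " + doc);
    }
}
```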
	
	From what I could determine from the doco and API I am executing the following code to perform
an intersecting query on two values...
	
	Set<Range> shards = new HashSet<Range>();
	
	// The two field index column families to intersect.
	Text[] terms = {new Text("fi||<type>||<value>"), new Text("fi||<type>||<value>")};
	
	// 20 query threads, 360 second timeout.
	BatchScanner bs = conn.createBatchScanner(table, auths, 20);
	bs.setTimeout(360, TimeUnit.SECONDS);
	
	IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
	IntersectingIterator.setColumnFamilies(iter, terms);
	bs.addScanIterator(iter);
	
	// Scan the entire table.
	bs.setRanges(Collections.singleton(new Range()));
	
	// Each returned column qualifier is a <date>-<UUID> row ID; keep it as a whole-row range.
	for (Entry<Key,Value> entry : bs) {
	    shards.add(new Range(entry.getKey().getColumnQualifier()));
	}
	bs.close();
	
	I then perform a second batch scan using the set of ranges returned by the above to get my
actual results.
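
As background for the query above: the IntersectingIterator documentation describes a table in which the row is a partition (shard), the column family is a term and the column qualifier is a document ID, and the iterator returns the document IDs that appear under every requested term within a row. The following plain-Java sketch mimics that intersection over in-memory sorted sets. It is not Accumulo code, and the class `TermIntersection` and its sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for one row of a term index. */
final class TermIntersection {

    /**
     * Returns the doc IDs listed under every requested term, in sorted order.
     * The index maps a term (column family) to its sorted doc IDs (column qualifiers).
     */
    public static SortedSet<String> intersect(Map<String, SortedSet<String>> index,
                                              List<String> terms) {
        SortedSet<String> result = null;
        for (String term : terms) {
            SortedSet<String> docs = index.get(term);
            if (docs == null) {
                return new TreeSet<String>(); // an unknown term matches nothing
            }
            if (result == null) {
                result = new TreeSet<String>(docs); // start from the first term
            } else {
                result.retainAll(docs);             // keep only IDs seen under every term
            }
            if (result.isEmpty()) {
                break; // no candidates left, later terms cannot add any
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> index = new HashMap<String, SortedSet<String>>();
        index.put("fi||verb||run", new TreeSet<String>(Arrays.asList("doc1", "doc3")));
        index.put("fi||noun||hill", new TreeSet<String>(Arrays.asList("doc1", "doc2")));
        System.out.println(intersect(index, Arrays.asList("fi||verb||run", "fi||noun||hill")));
    }
}
```

In the layout described in this thread each row holds a single document, and the scan range is the whole table, so, if that reading is right, the intersection may have to be evaluated across every row rather than within a handful of partition rows.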
	
	My issue is that the intersecting query takes several minutes to return, if it returns at all
(in some cases it times out). Is this expected? Is there some way to improve performance? Is there
a better way to do this sort of query?
	
	Any guidance would be much appreciated.
	
	Thanks
	
	Luke
	
	
	IMPORTANT: This email remains the property of the Department of Defence and is subject to
the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in
error, you are requested to contact the sender and delete the email.
	



