accumulo-user mailing list archives

From Josh Elser
Subject Re: Wikisearch Iterators
Date Thu, 06 Jun 2013 20:33:01 GMT
Hi Thomas,

A couple of things you can glean from this.

"full table scan" - Implies that, for some reason, the iterators or 
client code did not find one of the terms necessary to satisfy your 
query and fell back to an exhaustive search for matching records. 
IMO, this shouldn't even exist, as the Wikisearch indexes everything, and 
the 'feature' masks far more problems than it solves by satisfying the 
few queries that the index can't.
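
To make the point concrete, here is a minimal sketch of an index-only 
lookup. It is NOT the Wikisearch code: the class and method names are 
hypothetical, and an in-memory map stands in for the index table. It only 
shows that when a term is absent from a complete index, the answer is an 
empty result, with no need to scan every record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index; an illustration only, not Wikisearch's design.
public class TermIndexSketch {
    // term -> sorted set of document ids containing that term
    private final Map<String, SortedSet<String>> index = new HashMap<>();

    // Index every lower-cased word of the document text.
    public void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // AND query: intersect the posting sets of all terms.
    // A term missing from the index short-circuits to an empty result
    // instead of falling back to an exhaustive scan.
    public SortedSet<String> queryAll(String... terms) {
        SortedSet<String> result = null;
        for (String term : terms) {
            SortedSet<String> postings = index.get(term.toLowerCase());
            if (postings == null) {
                return new TreeSet<>();
            }
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TermIndexSketch idx = new TermIndexSketch();
        idx.add("doc1", "Accumulo sorted key value store");
        idx.add("doc2", "Sorted iterators in Accumulo");
        System.out.println(idx.queryAll("accumulo", "sorted"));      // prints [doc1, doc2]
        System.out.println(idx.queryAll("accumulo", "missingterm")); // prints []
    }
}
```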

OOME - Was this the tabletserver or the webserver? If the webserver, it 
could be that your query returned too many results to fit into the 
configured Java heap space. You could try upping -Xmx and see if you can 
find the sweet spot.
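
If it helps while tuning, a small check like the one below reports the 
heap ceiling the JVM actually picked up. This is only a sketch: the class 
name HeapReport is made up, and you would have to run it (or log the same 
values) inside the process you are tuning for the numbers to mean anything.

```java
// Hypothetical helper: prints the heap limit and current usage of this JVM.
public class HeapReport {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most the JVM will try to use (governed by -Xmx).
        long max = rt.maxMemory();
        // Heap currently in use = heap currently reserved minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap (MiB):  " + toMiB(max));
        System.out.println("used heap (MiB): " + toMiB(used));
    }
}
```

For example, running it as "java -Xmx2g HeapReport" should report a 
maximum close to 2048 MiB; the exact figure can vary a little by JVM.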

It should be said, also, that the iterators included in the Wikisearch 
application are *very* rough and are likely not great examples to use as 
a basis for good Accumulo SortedKeyValueIterator development. However, 
the basic algorithm which the iterators perform is sound, scalable, and 
can perform quite well, especially when coupled with certain optimizations.

I would agree with you that a white-paper or similar on the table 
structure and algorithm is long overdue.

If you have more specific problems, I'm sure the community at large 
(myself included) would be happy to help and go into more detail.

On 06/06/2013 04:05 PM, Thomas Jackson wrote:
> Hey everyone,
> I am taking the Wikisearch application for a test drive and ran into 
> some issues.  I have successfully ingested a number of wiki dumps for 
> several languages into Accumulo and have been able to search on terms 
> that I know exist in the corpus.  However, the issue I run into is 
> that I get an out of memory exception when the application performs a 
> full table scan searching for a term that does not exist in the index. 
> Has anyone else encountered this issue?
> Also I was hoping to find out if anyone had any documentation or 
> information on how the iterators in the wikisearch application work.
> Thanks
> TJ
