accumulo-user mailing list archives

From Frank Smith <>
Subject RE: Wikisearch
Date Tue, 11 Jun 2013 01:46:41 GMT
Ok, thanks for these insights. As I have mentioned, I am tweaking and changing things for my
own purposes, and I am trying to understand just how much my tweaking might have unintended
consequences.
To extend upon your thoughts for why there is a problem, I need to look in the web services
to make sure it isn't creating objects from the results of the search scan, because it should
return no results.  That is where I am still concerned: shouldn't the scan iterator pass
nothing through for a query with no results?  Again, I need to look harder myself, but
I am mostly trying to understand how the iterators notionally behave with this table structure.

Date: Sun, 9 Jun 2013 23:18:43 -0400
Subject: Re: Wikisearch

The forward and reverse index are very important, yes, with the in-partition "field index"
being even more important. 
Yes to full table scans being undesirable and probably useless in the scope of the wikisearch
as it should index most everything and thus there is nothing extra to be gleaned. 
I forget exactly how it was implemented, but tokens will appear in the global indices and
the doc partitioned table. 
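
To make the forward/reverse index idea concrete, here is a minimal sketch in plain Java with no Accumulo dependency. It assumes the reverse index stores each term reversed so that a leading-wildcard lookup becomes a prefix scan; that is one reading of the example's design, not something stated in this thread. The class name IndexSketch and its methods are hypothetical, and the TreeSet only stands in for the sorted row keys of an index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class IndexSketch {
    // Sorted sets stand in for the sorted row keys of the two global index tables.
    private final TreeSet<String> forward = new TreeSet<>();
    private final TreeSet<String> reverse = new TreeSet<>();

    private static String flip(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Each token is written to both indices: as-is and reversed.
    public void add(String term) {
        forward.add(term);
        reverse.add(flip(term));
    }

    // Trailing wildcard (e.g. "accum*"): a range scan over the forward index.
    public List<String> startsWith(String prefix) {
        return new ArrayList<>(forward.subSet(prefix, prefix + Character.MAX_VALUE));
    }

    // Leading wildcard (e.g. "*ulo"): reverse the suffix, range scan the
    // reversed terms, then flip each hit back to its original form.
    public List<String> endsWith(String suffix) {
        String reversed = flip(suffix);
        List<String> hits = new ArrayList<>();
        for (String r : reverse.subSet(reversed, reversed + Character.MAX_VALUE)) {
            hits.add(flip(r));
        }
        return hits;
    }
}
```

Either lookup touches only the narrow range of index entries sharing the prefix, and a term that was never indexed simply yields an empty list.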
The most likely reason for the OOME is that the included trivial web service attempts to suck
all results into memory. There's nothing inherently wrong with scanning all records in Accumulo,
but the webserver will easily fall over. 
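
As a rough illustration of the alternative to pulling everything into memory, the sketch below drains a result iterator one bounded page at a time. It is plain Java and not the wikisearch web service code; BoundedResults and nextPage are hypothetical names. An Accumulo Scanner is an Iterable of key/value entries, so the same pattern should carry over to its iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BoundedResults {
    // Pulls at most `limit` items from the iterator and leaves the rest unread,
    // so memory use is bounded by the page size rather than the result count.
    public static <T> List<T> nextPage(Iterator<T> results, int limit) {
        List<T> page = new ArrayList<>();
        while (page.size() < limit && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }
}
```

Calling nextPage repeatedly on the same iterator walks through the results in order, and an iterator with no results produces an empty page without creating any objects for it.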

On Jun 9, 2013 11:08 PM, "Frank Smith" <> wrote:

Appreciate everyone's help on the file storage question, but I was also looking at Josh's
response to Thomas Jackson, and do I understand him correctly that the scans of the Index (and
likely the ReverseIndex) tables are really the key part of the search query, and the full table
scan isn't really useful for much (because all of the tokens should go in the Index tables)?

So if I understand correctly, the partitioned main table is where documents and tokens get
written, and then a combiner feeds the index tables, which are then scanned during a search?

What would I lose if I wanted to avoid Thomas's OOME and just skip the full table scan part
of the search?  
Obviously, since I am not searching Wikipedia, I am going to be making some changes, just
want to do it smartly.
