lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: solr faceted search performance reason
Date Wed, 06 Apr 2011 18:06:29 GMT
On 4/6/2011 10:55 AM, Robin Palotai wrote:
> Therefore, Lucene supposedly has some advanced technique for multi-field
> queries other than just taking the intersection of matching documents based
> on the inverted index.

I don't think so, neccesarily.  It's just that Lucene's algorithms to 
doing this is very fast, with some additional  optimizations to make it 
even faster. There may be some edge cases where the optimizations take 
some shortcuts on top of this -- ie, if you ask for only the first ten 
facet values ordered by number of hits, in some cases solr/lucene won't 
even calculate the hit counts for facet values it already knows aren't 
going to be in the top 10.  The facetting code in 1.4+ is actually kind 
of tangled, in that several different calculation approaches can be 
taken depending on the nature of the result set and schema.


But anyway, I think you're right that you could set up an rdbms schema 
to _conceptually_ allow very similar operations to a lucene index. It 
would be unlikely to perform as well, because the devil is in the 
details of the storage formats and algorithms, and lucene has been 
optimized for these particular cases (at the expense of not covering a 
great many cases that an rdbms can cover).

In fact, while I can't find it now on Google, I think someone HAS in the 
past written an extension to lucene to have it store it's indexes in an 
rdbms using a schema much like you describe, instead of in the file 
system. I'm not sure why they would want to do this instead of just 
using the rdbms -- either lucene's access algorithms still provide a 
performance benefit even when using an rdbms as the underlying 'file 
system', or lucene provides convenient functions that you wouldn't want 
to have to re-implement yourself solely in terms of an rdbms, or both. 
Ah, here's a brief reference to that approach in the lucene FAQ: 
http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_store_the_Lucene_index_in_a_relational_database.3F

Jonathan

> So the question is, what is this technique/trick? More broadly: Why can
> Lucene/Solr achieve better faceted search performance theoretically than
> RDBMS could (if so)?
>
> *Note: My first guess would be that Lucene would use some space partitioning
> method for partitioning a vector space built from the document fields as
> dimensions, but as I understand Lucene is not purely vector space based.*
> Thanks,
> Robin
>
> On Wed, Apr 6, 2011 at 3:15 PM, Erick Erickson<erickerickson@gmail.com>wrote:
>
>> Please re-post the question here so others can see
>> the discussion without going to another list.
>>
>> Best
>> Erick
>>
>> On Wed, Apr 6, 2011 at 4:09 AM, Robin Palotai<m.palotai.robin@gmail.com
>>> wrote:
>>> Hello List,
>>>
>>> Please see my question at
>>>
>>>
>> http://stackoverflow.com/questions/5552919/how-does-lucene-solr-achieve-high-performance-in-multi-field-faceted-search
>>> ,
>>> I would be interested to know some details.
>>>
>>> Thank you,
>>> Robin
>>>

Mime
View raw message