lucene-dev mailing list archives

From Andi Vajda <va...@osafoundation.org>
Subject Re: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 02:28:51 GMT

On Thu, 28 Oct 2010, Robert Muir wrote:

> On Thu, Oct 28, 2010 at 8:35 PM,  <karl.wright@nokia.com> wrote:
>> In database queries, it is often useful to treat an empty value specially, and be
>> able to search explicitly for records that have (for instance) no field X, or no value for
>> field X.  I can't regurgitate offhand all the precise situations that I've used this and
>> claim that they would apply to a search engine, but it is conceivable that it could be helpful
>> to somebody.  Would your proposed change preclude current or future support for such null
>> queries?
>
> in a database, not having a value for field X is 'null'.
>
> 1. null is different than empty term.
> 2. comparing this concept with an inverted index vs a database record
> really isn't a comparison.
>
> by not having 'empty' terms (terms of length=0), what searches would
> be affected?
>
> I still haven't heard a real use case (though surely perhaps there is
> someone abusing this somehow), but there is a serious performance trap
> that is definitely real.

I've used this in a URL index. I needed to distinguish searching for URLs
that had, say, no path from searching URLs without constraining the path
component at all. The absence of a path was represented with an empty token
in the path field.

Andi..
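
For readers of the archive, here is a minimal sketch of the use case described above. It is not Andi's code and does not use Lucene; it is a toy in-memory inverted index in Python. The class ToyIndex, the field names "host" and "path", and the choice to treat a bare "/" as "no path" are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class ToyIndex:
    """Toy inverted index mapping (field, term) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = []

    def add_url(self, url):
        doc_id = len(self.docs)
        self.docs.append(url)
        parts = urlsplit(url)
        self.postings[("host", parts.netloc)].add(doc_id)
        # Assumption: a missing path (or a bare "/") is indexed as the
        # empty term "", so "no path" is itself something you can search for.
        self.postings[("path", parts.path.strip("/"))].add(doc_id)
        return doc_id

    def search(self, **clauses):
        """AND the given field=term clauses; fields left out are unconstrained."""
        matching = set(range(len(self.docs)))
        for field, term in clauses.items():
            matching &= self.postings.get((field, term), set())
        return sorted(self.docs[i] for i in matching)


index = ToyIndex()
for url in ("http://example.com", "http://example.com/docs", "http://other.org"):
    index.add_url(url)

# Only URLs on example.com that have no path at all.
no_path = index.search(host="example.com", path="")

# URLs on example.com, without constraining the path.
any_path = index.search(host="example.com")
```

In this sketch the two queries give different results only because the empty term is indexed. If empty terms were dropped at index time, the first query would have nothing to match on, and "no path" would need some other representation, such as a sentinel term.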
