lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Handling when a field does not exist in the document
Date Thu, 22 May 2008 13:44:50 GMT
See below...

On Thu, May 22, 2008 at 5:44 AM, lucene user <luz290@gmail.com> wrote:

> We have a requirement to inform users on a regular basis of new material in
> which they have expressed interest. How are we to know what is "new" from
> the point of view of a particular user? Our idea is to tag each new item in
> some way (perhaps a date/time stamp in the Lucene index indicating when the
> new document was indexed) and to remember the last time we sent out an
> alert to that user.
> How should we tag the documents? With a date/time-of-indexing stamp? An
> incrementing batch import ID number? Does it matter much?
>
> *I am reminded that ranges of dates and numbers (as well as wildcards) are
> evaluated as if they were a large OR query covering all the values that
> exist in the index. Lucene only finds exact matches; it does not do
> comparisons. This means that ranges with lots of distinct values in them
> are bad, and can actually crash with a 'too many clauses' exception if
> there are enough distinct values to push the number of clauses over 1024.
> Do I understand this correctly?*


Yes, but... I think ConstantScoreRangeQuery is your friend here. From the
docs:

"It does not have an upper bound on the number of clauses covered in the
range."

As I understand it, the whole expansion approach was designed to work well
with scoring. In cases like this I don't think you care how the tag
contributes to the score; it's just yes/no. You could create your own Filter
instead, but why bother?
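To make the clause-limit point concrete, here is a toy model in plain Java. It is not Lucene code and calls neither ConstantScoreRangeQuery nor BooleanQuery; the class RangeModel, its methods, and the minute-resolution "yyyyMMddHHmm" UTC stamp format are hypothetical choices for illustration, and the 1024 limit mirrors the figure quoted in the question. Expanding the range yields one OR clause per distinct indexed stamp and fails once that count passes the limit, while the filter-style check is a yes/no comparison per document with no clause list at all.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy model (NOT Lucene code) of why an expanding range query can hit the
 * 1024-clause limit while a constant-score, filter-style range check cannot.
 * All names here are hypothetical.
 */
public class RangeModel {

    // Mirrors the 1024 clause limit mentioned in the thread.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Assumed stamp format: fixed-width, zero-padded UTC, so string order
    // matches chronological order.
    static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    static String stamp(Instant t) {
        return STAMP.format(t);
    }

    /** Expansion style: one OR clause per distinct term in [lower, upper]. */
    static List<String> expandToClauses(TreeSet<String> indexTerms,
                                        String lower, String upper) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexTerms.subSet(lower, true, upper, true)) {
            if (clauses.size() == MAX_CLAUSE_COUNT) {
                throw new IllegalStateException("too many clauses: > " + MAX_CLAUSE_COUNT);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Filter style: a yes/no test per document, no clause list is built. */
    static boolean inRange(String docTerm, String lower, String upper) {
        return docTerm.compareTo(lower) >= 0 && docTerm.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // 2000 documents indexed one minute apart give 2000 distinct terms.
        Instant start = Instant.parse("2008-05-01T00:00:00Z");
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add(stamp(start.plusSeconds(60L * i)));
        }
        String lastAlert = stamp(start);
        String now = stamp(start.plusSeconds(60L * 2000));

        try {
            expandToClauses(terms, lastAlert, now);
        } catch (IllegalStateException e) {
            System.out.println("expansion failed: " + e.getMessage());
        }
        long matches = terms.stream().filter(t -> inRange(t, lastAlert, now)).count();
        System.out.println("filter-style matches: " + matches);
    }
}
```

Running main prints "expansion failed: too many clauses: > 1024" followed by "filter-style matches: 2000". Because range bounds are compared as strings, the fixed-width stamp is what keeps lexicographic and chronological order in step; Lucene's DateTools class produces strings of this general shape.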


>
> *How do we handle existing documents which do not have such a new field
> associated with them? Can we provide a default value for the existing
> documents?*


Not that I know of. You can certainly test whether each document you are
returning has the field: Document.get(<field>) returns null if the doc
doesn't have the field, so that should fix you up. But there's no way I
know of to assign a default for a non-existent field.
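The null check can double as a read-time default. The sketch below does not use Lucene: each stored document is modeled as a Map, whose get() returns null for a missing key in the same way Document.get(name) does. The class DefaultField, the field name "indexed", and the "000000000000" default for legacy documents are all hypothetical; the default is chosen to sort before any real stamp, so old documents never count as "new".

```java
import java.util.List;
import java.util.Map;

/**
 * Read-time default for documents that lack a field. Not Lucene code:
 * stored fields are modeled as a Map. Names and values are hypothetical.
 */
public class DefaultField {

    static final String INDEXED_FIELD = "indexed";

    // Assumed default for legacy documents: sorts before every real stamp.
    static final String LEGACY_DEFAULT = "000000000000";

    static String getOrDefault(Map<String, String> storedFields,
                               String field, String defaultValue) {
        String value = storedFields.get(field); // null when the field is absent
        return value != null ? value : defaultValue;
    }

    static boolean isNewSince(Map<String, String> storedFields, String lastAlert) {
        String stamp = getOrDefault(storedFields, INDEXED_FIELD, LEGACY_DEFAULT);
        return stamp.compareTo(lastAlert) > 0;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
                Map.of("title", "legacy doc"),                         // no stamp
                Map.of("title", "fresh doc", INDEXED_FIELD, "200805220930"));
        for (Map<String, String> doc : hits) {
            System.out.println(doc.get("title") + " -> new? "
                    + isNewSince(doc, "200805210000"));
        }
    }
}
```

This only changes how your application interprets a missing value; as noted above, nothing is stored in the index for those documents.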


>
>
> I did not find the place in the Lucene documentation that explains what
> you get when you try to retrieve or search on a field that does not exist
> in the document. I remember it not being a problem, but I couldn't find it.
> How do I do this? What should I read?
>

Searching on a field that doesn't exist just means that the field matches
nothing, so it contributes nothing to the scoring. So if your search
includes the field as an AND (required) clause, you'll get no matches.
Imagine that each document *did* have such a field, with a value that never
matched any value you search on, and you'll get the idea. So
+field1:stuff +nonexistentfield:morestuff will never return any document
that doesn't have a value in nonexistentfield.
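A toy evaluator shows why a required clause on a missing field rejects the document. This is plain Java, not Lucene's query engine, and it handles only required (+) term clauses; the class RequiredClauses and its Clause type are hypothetical. Each document maps a field name to its set of indexed terms, and a field the document lacks is treated as having no terms at all, which is the "imagine the field exists but never matches" view described above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy evaluator (not Lucene) for a query made only of required (+) term
 * clauses. A missing field has no terms, so a required clause on it can
 * never match. Names are hypothetical.
 */
public class RequiredClauses {

    static final class Clause {
        final String field;
        final String term;

        Clause(String field, String term) {
            this.field = field;
            this.term = term;
        }
    }

    static boolean matchesAll(Map<String, Set<String>> doc, List<Clause> required) {
        for (Clause c : required) {
            // Absent field behaves like a field with no indexed terms.
            Set<String> terms = doc.getOrDefault(c.field, Set.of());
            if (!terms.contains(c.term)) {
                return false; // one failed required clause rejects the document
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // +field1:stuff +nonexistentfield:morestuff
        List<Clause> query = List.of(
                new Clause("field1", "stuff"),
                new Clause("nonexistentfield", "morestuff"));
        Map<String, Set<String>> doc = Map.of("field1", Set.of("stuff", "other"));
        System.out.println(matchesAll(doc, query)); // prints false
    }
}
```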


Best
Erick

>
> Thanks!
>
