lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "lucene user" <>
Subject Handeling when a field does not exist in the document
Date Thu, 22 May 2008 09:44:29 GMT
We have a requirement to inform users on a regular basis of new material on
which they have expressed interest. How are we to know what is "new" from
the point of view of a particular user? Our idea is to tag each new item in
some way (perhaps a date/time stamp in the lucene index indicating when the
new document was indexed) and remember when the last time we sent out an
alert to that user.
How should we tag the documents? With a date/time of indexing stamp? An
incrementing batch import ID number? Does it matter much?

*I am reminded that ranges of dates and numbers, (as well as wild cards) are
evaluated as if they were a large OR query covering all the values that
exist in the index. Lucene only finds exact matches - it does not do
comparisons. This means that ranges with lots of different values in them
are bad - and can actually crash with a 'too many clauses' exception if
there are enough distinct values to push the number of clauses over 1024. Do
I understand this correctly?*

*How do we handle existing documents which do not have such a new field
associated with them? Can we provide a default value for the existing
documents? *

I did not find the place in the Lucene Documentation where it explains what
you get when you try to retrieve or search on a field that does not exist in
the document. I remember it not being a problem, but I couldn't find it. How
do I do this? What should I read?


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message