lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: datetools and index storage question
Date Thu, 14 Dec 2006 15:18:09 GMT
UN_TOKENIZED is probably the safest way to store your dates. You could get
by with using, say, WhitespaceAnalyzer for indexing and parsing the query,
but that would invite hard-to-track bugs to no advantage I can see.
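
To illustrate why a single untokenized term works well for dates, here is a
minimal sketch in plain Java with no Lucene dependency. It assumes that
DateTools.timeToString at Resolution.MINUTE yields a fixed-width yyyyMMddHHmm
string in GMT, which is my understanding of the DateTools format. The names
DateKeySketch and toMinuteKey are hypothetical and only mimic that format.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Lucene: mimics a minute-resolution date key.
class DateKeySketch {
    // Formats epoch milliseconds as a fixed-width yyyyMMddHHmm string in GMT.
    static String toMinuteKey(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmm", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long earlier = 1166109489000L;          // 14 Dec 2006 15:18:09 GMT
        long later = earlier + 24L * 3600000L;  // one day later

        String a = toMinuteKey(earlier);
        String b = toMinuteKey(later);

        // The key contains no whitespace, so it is naturally one term.
        System.out.println(a + " has whitespace: " + a.contains(" "));
        // Fixed-width digits make string order agree with time order,
        // which is what a term-based range comparison relies on.
        System.out.println(a + " sorts before " + b + ": " + (a.compareTo(b) < 0));
    }
}
```

Because the whole value is one term with no spaces, an analyzer has nothing
useful to do with it, which is why skipping analysis is the safer choice.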

I'll let someone more knowledgeable than me talk about NORMS.

The indexed-but-not-stored case puzzled me at first too... The short answer
is that you *can* search on a field if it's indexed but not stored; you just
can't reconstruct the entire field reliably. Imagine you are indexing, for
instance, a book. You care about which books contain text matching "gyre"
and "brillig" within 8 terms of each other, but you never want to display
the text of the page it occurs on with the results. You would index the text
but not store it.

Why should you want to do this? Well, imagine that you want to display only
the abstract of a document and direct the user to the full document in, say,
PDF format. Relevance is important. So, all you really care about is that
the document contains the terms you want so you can display the abstract (to
help the user decide if this is really what they want), and just provide a
link to the actual PDF document. You'd store the abstract (so you could
display it with data only from the index) but you wouldn't store the text,
just index it. In this scenario, you might NOT even want to index the
abstract assuming you didn't want to search it (you'd get all your searching
satisfied by searching the text).

Note, you still can do proximity searches, wildcard searches, etc. on the
unstored text.

Thought of another way, you index something you want to search, and you store
something you want to display without going outside the index. There are
circumstances where you want each of the 4 possible combinations of
indexed/not indexed and stored/not stored.
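
The index/store distinction can be sketched with a toy model in plain Java.
This is only an illustration under my own assumptions, not Lucene's API or
its internal format: ToyIndex and its methods are hypothetical. Indexed
values go into a term-to-positions map (enough for term and proximity
matching), and stored values go into a separate map that is the only place
the original text can be read back from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical class, not Lucene's API or on-disk format.
class ToyIndex {
    // "field:term" -> docId -> positions of the term within that field
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();
    // docId -> field -> original value, kept only when store == true
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void add(int docId, String field, String value, boolean index, boolean store) {
        if (index) {
            int pos = 0;
            for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) continue;
                postings.computeIfAbsent(field + ":" + term, k -> new HashMap<>())
                        .computeIfAbsent(docId, k -> new ArrayList<>())
                        .add(pos++);
            }
        }
        if (store) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
    }

    private Map<Integer, List<Integer>> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT),
                Collections.<Integer, List<Integer>>emptyMap());
    }

    // Doc ids whose indexed field contains the term.
    Set<Integer> search(String field, String term) {
        return new TreeSet<>(lookup(field, term).keySet());
    }

    // Doc ids where the two terms occur within maxDistance positions of each other.
    Set<Integer> near(String field, String a, String b, int maxDistance) {
        Set<Integer> hits = new TreeSet<>();
        Map<Integer, List<Integer>> pb = lookup(field, b);
        for (Map.Entry<Integer, List<Integer>> e : lookup(field, a).entrySet()) {
            List<Integer> other = pb.get(e.getKey());
            if (other == null) continue;
            for (int x : e.getValue())
                for (int y : other)
                    if (Math.abs(x - y) <= maxDistance) hits.add(e.getKey());
        }
        return hits;
    }

    // Returns the stored value, or null if the field was never stored.
    String get(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        // Full text: indexed but not stored. Abstract: stored but not indexed.
        idx.add(1, "text", "Twas brillig, and the slithy toves did gyre", true, false);
        idx.add(1, "abstract", "A nonsense poem.", false, true);

        System.out.println("near match: " + idx.near("text", "gyre", "brillig", 8));
        System.out.println("text retrievable: " + (idx.get(1, "text") != null));
        System.out.println("abstract: " + idx.get(1, "abstract"));
    }
}
```

In this model the unstored text still answers term and proximity searches
from its postings, while only the stored abstract can be displayed.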

You wind up with a smaller (sometimes MUCH smaller) faster index when you
don't store stuff.


On 12/14/06, Cam Bazz <> wrote:
> Hello Everyone,
> I have two fields that contain the original and modification dates of
> certain documents.
> I decided to store them like:
> Document entry = new Document();
> entry.add(new Field("edate", DateTools.timeToString(edate.getTime(),
> DateTools.Resolution.MINUTE), Field.Store.YES, Field.Index.UN_TOKENIZED));
> is this correct? also I should use an un_tokenized index from what I
> understand, correct?
> I am using un_tokenized index for unique things, and tokenized for
> everything I like to search. What are the benefits of a NO_NORMS field
> index?
> also I am curious to know under what circumstances an unstored field is
> used?
> if the field is not stored, it is not there, so why even put it?
> Best Regards,
> C.B.
