lucene-java-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject Re: datetools and index storage question
Date Thu, 14 Dec 2006 15:37:52 GMT
This made it very clear. Thank you.

On 12/14/06, Erick Erickson <erickerickson@gmail.com> wrote:
>
> UN_TOKENIZED is probably the safest way to store your dates. You could get
> by with using, say, WhitespaceAnalyzer for indexing and parsing the query,
> but that would invite hard-to-track bugs to no advantage I can see.
>
> I'll let someone more knowledgeable than me talk about NORMS.
>
>
> Field.Store.NO puzzled me at first too... The short answer is that you
> *can* search on it if it's indexed but not stored; you just can't
> reconstruct the entire field reliably. Imagine you are indexing, for
> instance, a book. You care about which books contain text matching "gyre"
> and "brillig" within 8 terms, but you never want to display the text of
> the page it occurs on with the results. You would use Field.Store.NO for
> the text.
>
> Why should you want to do this? Well, imagine that you want to display
> only the abstract of a document and direct the user to the full document
> in, say, PDF format. Relevance is important. So, all you really care about
> is that the document contains the terms you want so you can display the
> abstract (to help the user decide if this is really what they want), and
> just provide a link to the actual PDF document. You'd store the abstract
> (so you could display it with data only from the index) but you wouldn't
> store the text, just index it. In this scenario, you might NOT even want
> to index the abstract, assuming you didn't want to search it (you'd get
> all your searching satisfied by searching the text).
>
> Note, you still can do proximity searches, wildcard searches, etc. on the
> unstored text.
>
> Thought of another way, you index something you want to search, you store
> something you want to display without going outside the index. There are
> circumstances where you want to do each of the 4 possibilities.
>
> You wind up with a smaller (sometimes MUCH smaller) faster index when you
> don't store stuff.
>
> Erick
>
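To make the indexed-versus-stored distinction concrete, here is a toy model in plain Java. It is not Lucene and does not use Lucene's API; the class ToyIndexSketch and its methods are hypothetical names invented for illustration. The full text is only indexed (its terms go into a postings map and the text itself is thrown away), while the abstract is only stored (kept verbatim, never searched).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a hypothetical stand-in for an index, NOT Lucene's implementation.
class ToyIndexSketch {
    // term -> ids of documents whose indexed text contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> stored field values (what can be displayed from the index alone)
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Stores the abstract without indexing it; indexes the full text without storing it.
    int add(String abstractText, String fullText) {
        int id = stored.size();
        Map<String, String> fields = new HashMap<>();
        fields.put("abstract", abstractText);
        stored.add(fields);
        for (String term : fullText.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Looks up a single term in the indexed text.
    Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase());
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    // Returns a stored field value, or null if the field was never stored.
    String storedField(int id, String name) {
        return stored.get(id).get(name);
    }

    public static void main(String[] args) {
        ToyIndexSketch index = new ToyIndexSketch();
        int id = index.add("A nonsense poem.", "Twas brillig, and the slithy toves did gyre");
        System.out.println(index.search("gyre"));             // the document is found
        System.out.println(index.storedField(id, "abstract")); // the abstract can be shown
        System.out.println(index.storedField(id, "text"));     // the full text cannot
    }
}
```

Searching for "gyre" finds the document, and the abstract can be displayed, but the original text cannot be recovered from the index, which is the trade-off Erick describes.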
> On 12/14/06, Cam Bazz <cambazz@gmail.com> wrote:
> >
> > Hello Everyone,
> >
> > I have two fields that contain the original and modification dates of
> > certain documents.
> > I decided to store them like:
> >
> > Document entry = new Document();
> > entry.add(new Field("edate",
> >     DateTools.timeToString(edate.getTime(), DateTools.Resolution.MINUTE),
> >     Field.Store.YES, Field.Index.UN_TOKENIZED));
> >
> > Is this correct? Also, I should use an UN_TOKENIZED index from what I
> > understand, correct?
> >
> > I am using an UN_TOKENIZED index for unique things, and TOKENIZED for
> > everything I'd like to search. What are the benefits of a NO_NORMS field
> > index? Also, I am curious to know under what circumstances Field.Store.NO
> > is used. If the field is not stored, it is not there, so why even put it?
> >
> > Best Regards,
> > C.B.
> >
> >
>
>
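For reference, a rough sketch of why an untokenized date string works well as a single term. It assumes that DateTools.timeToString with Resolution.MINUTE yields a GMT string of the form yyyyMMddHHmm; the helper minuteKey below is a hypothetical stand-in written with the JDK only, not Lucene's own code. Because the most significant digits come first, plain string comparison of such keys agrees with chronological order.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Stand-in sketch using only the JDK; minuteKey is a hypothetical helper, not a Lucene method.
class DateKeySketch {
    // Formats a timestamp as yyyyMMddHHmm in GMT (assumed to mirror minute resolution).
    static String minuteKey(long millis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmm");
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(new Date(millis));
    }

    public static void main(String[] args) {
        long sent = 1166110672000L;          // Thu, 14 Dec 2006 15:37:52 GMT
        long later = sent + 60L * 60 * 1000; // one hour later
        String a = minuteKey(sent);
        String b = minuteKey(later);
        System.out.println(a);                  // 200612141537
        System.out.println(a.compareTo(b) < 0); // true: string order matches time order
    }
}
```

If the field were tokenized, an analyzer could alter or split the value, so keeping it as one exact term is the safer choice for range and sort operations.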
