lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Indexing multiple instances of the same field and counting their frequency afterward
Date Wed, 04 Apr 2007 13:46:32 GMT
See below

On 4/4/07, Sengly Heng <sengly.heng@gmail.com> wrote:
>
> Dear all,
>
> My problem is a little bit strange. Instead of parsing the content of the
> document to the indexer. I am adding one by one. Here is a piece of my
> code
> :
>
> Document doc = new Document();
> doc.add(Field.Text("Features", "blue");
> doc.add(Field.Text("Features","beautiful");
> doc.add(Field.Text("Features", "black");
> doc.add(Field.Text("Features","white");
> doc.add(Field.Text("Features", "blue");
> doc.add(Field.Text("Features","blue");
>
> I'd like to know whether the internal representation of this is like when
> we
> add at once the whole content of the document as a long string?


There's no internal difference unless you implement an Analyzer that
returns a value other than 1 from getPositionIncrementGap(). Why would
you do this you ask? Occasionally, it is useful to index data in the
same field but NOT allow proximity queries to span separate
additions. Imagine indexing a book and, for some reason, you
didn't want span queries to cross, say, chapters. You could
do something like
doc.add("page", <contents of page 1>);
doc.add("page", <contents of page 2>);



then in your Analyzer, have something in getPositionIncrementGap
like
if (page is beginning of chapter)  {
    return 1000;
} else {
return 1;
}

Now, no Span query with a slop of less than 1,000 would
match from the last page in one chapter to the first page of another.

FWIW
Erick

I'd like
> also to count the number of "blue" occurence from the document. How to do
> this?


See the other reply <G>.

Thank you very much for your suggestion in advance.
>
> Regards,
>
> Sengly
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message