lucene-java-user mailing list archives

From "Sengly Heng" <sengly.h...@gmail.com>
Subject Re: Indexing multiple instances of the same field and counting their frequency afterward
Date Wed, 04 Apr 2007 15:07:43 GMT
Thanks so much for your explanation. There is one thing I want to
make sure of: if I add the same token to the same field several times,
is it stored redundantly internally?

Also, suppose I have many fields. What is the best way to list the
frequency of all the tokens from the different fields without knowing in
advance which tokens they contain?

Once again, thank you very much for your reply.

Best regards,

Sengly
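
On listing the frequency of every token per field without knowing the tokens in advance: the idea can be sketched in plain Java. This is a toy model and does not call Lucene; the class FieldTokenCounter, the method countTokens, the whitespace tokenization, and the second field "Tags" are all assumptions made for illustration. In Lucene itself the equivalent information comes from walking the index's term dictionary (IndexReader.terms() in the 2.x API, to my knowledge). The sketch also illustrates the redundancy question: in an inverted index a repeated token is one entry whose count goes up, not several copies of the text.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not Lucene API): per-field token counts for a single document. */
final class FieldTokenCounter {

    /**
     * Counts tokens per field. Each element of additions is a {field, value}
     * pair, mirroring one doc.add(...) call. Values are lower-cased and split
     * on whitespace, which is an assumed stand-in for a real Analyzer.
     */
    static Map<String, Map<String, Integer>> countTokens(String[][] additions) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String[] addition : additions) {
            String field = addition[0];
            String value = addition[1];
            Map<String, Integer> perField =
                    counts.computeIfAbsent(field, f -> new TreeMap<>());
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    // A repeated token increments one counter; nothing is duplicated.
                    perField.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] additions = {
            {"Features", "blue"}, {"Features", "beautiful"}, {"Features", "black"},
            {"Features", "white"}, {"Features", "blue"}, {"Features", "blue"},
            {"Tags", "blue sale"}  // hypothetical second field
        };
        System.out.println(countTokens(additions));
    }
}
```

Running main prints {Features={beautiful=1, black=1, blue=3, white=1}, Tags={blue=1, sale=1}}. The nested sorted maps keep each field's tokens separate, so the same token in two fields is counted independently.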


On 4/4/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> See below
>
> On 4/4/07, Sengly Heng <sengly.heng@gmail.com> wrote:
> >
> > Dear all,
> >
> > My problem is a little bit strange. Instead of passing the content of the
> > document to the indexer all at once, I am adding it one value at a time.
> > Here is a piece of my code:
> >
> > Document doc = new Document();
> > doc.add(Field.Text("Features", "blue"));
> > doc.add(Field.Text("Features", "beautiful"));
> > doc.add(Field.Text("Features", "black"));
> > doc.add(Field.Text("Features", "white"));
> > doc.add(Field.Text("Features", "blue"));
> > doc.add(Field.Text("Features", "blue"));
> >
> > I'd like to know whether the internal representation of this is the same
> > as when we add the whole content of the document at once as a long string?
>
>
> There's no internal difference unless you implement an Analyzer that
> returns a value other than 0 (the default) from getPositionIncrementGap().
> Why would you do this, you ask? Occasionally, it is useful to index data
> in the same field but NOT allow proximity queries to span separate
> additions. Imagine indexing a book and, for some reason, you
> didn't want span queries to cross, say, chapters. You could
> do something like
> doc.add("page", <contents of page 1>);
> doc.add("page", <contents of page 2>);
>
> then in your Analyzer, have something in getPositionIncrementGap
> like
> if (page is beginning of chapter) {
>     return 1000;
> } else {
>     return 0;
> }
>
> Now, no Span query with a slop of less than 1,000 would
> match from the last page in one chapter to the first page of another.
>
> FWIW
> Erick
>
> > I'd like also to count the number of occurrences of "blue" in the
> > document. How can I do this?
>
>
> See the other reply <G>.
>
> > Thank you very much in advance for your suggestions.
> >
> > Regards,
> >
> > Sengly
> >
>
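
The position increment gap described in the quoted reply can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Lucene code: the class PositionGapModel and its methods are hypothetical, tokens are split on whitespace, every token advances the position by one, and the gap is added before each value after the first. The withinSlop check is a simplified stand-in for how a proximity query compares two term positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Lucene API) of token positions across repeated field values. */
final class PositionGapModel {

    /**
     * Assigns a position to every token of every value added to one field.
     * The gap is applied before each value except the first, mimicking what
     * an Analyzer's getPositionIncrementGap() is described as doing.
     */
    static List<Integer> assignPositions(List<String> values, int gap) {
        List<Integer> positions = new ArrayList<>();
        int next = 0;
        for (String value : values) {
            if (!positions.isEmpty()) {
                next += gap;  // extra distance between separate additions
            }
            for (String token : value.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    positions.add(next++);
                }
            }
        }
        return positions;
    }

    /** Simplified proximity test: adjacent positions match with slop 0. */
    static boolean withinSlop(int posA, int posB, int slop) {
        return Math.abs(posB - posA) - 1 <= slop;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("end of chapter", "new chapter starts");
        System.out.println(assignPositions(pages, 0));     // contiguous positions
        System.out.println(assignPositions(pages, 1000));  // large jump between pages
    }
}
```

With a gap of 0 the two pages get positions [0, 1, 2, 3, 4, 5], so a proximity match can run straight across the boundary. With a gap of 1000 they become [0, 1, 2, 1003, 1004, 1005], and in this model withinSlop(2, 1003, 999) is false, which is the effect the reply describes.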
