lucene-java-user mailing list archives

From csahat <csa...@gmail.com>
Subject Re: Counting term frequency without using Explanation
Date Wed, 07 Feb 2007 15:27:24 GMT
  Hi Erick,

    Thanks for your help. I found it now. Maybe I will contribute an article
or FAQ entry about this, since many newbies like me ask about things like this.


On 2/7/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Before you go too far down this path, please consider what a "hit" is.
> It's more complicated than you think <G>.
>
> If all you want to do is count up the number of times any term appears in
> the document, it's not too hard. You should be able to use a
> TermEnum/TermDocs process to count them.
>
> TermDocs should work, just seek to a term, skip to the document number
> (which you'll have to get somewhere else), and keep adding to your count
> while the docid is the same as your target. Repeat for each term.
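
As a rough illustration of the seek / skip / count idea above, here is a minimal sketch in plain Java. It is not Lucene code: `ToyPostings`, `addDocument`, `termFreq` and `countQueryTerms` are hypothetical names, and a map of sorted postings lists stands in for the real index. It assumes whitespace tokenization and lowercasing, which may not match the analyzer used at index time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index: term -> postings of {docId, freq}, sorted by docId. */
final class ToyPostings {
    private final Map<String, List<int[]>> postings = new HashMap<>();

    /** Documents must be added in increasing docId order so postings stay sorted. */
    void addDocument(int docId, String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new int[] {docId, e.getValue()});
        }
    }

    /** "Seek" to the term, then "skip" forward until the target document is reached or passed. */
    int termFreq(String term, int targetDoc) {
        List<int[]> list = postings.get(term.toLowerCase());
        if (list == null) {
            return 0; // term not in the index at all
        }
        for (int[] posting : list) {
            if (posting[0] == targetDoc) {
                return posting[1];
            }
            if (posting[0] > targetDoc) {
                break; // went past the target: the term is not in that document
            }
        }
        return 0;
    }

    /** Repeats the lookup for each query term, preserving query order. */
    Map<String, Integer> countQueryTerms(String query, int targetDoc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : query.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                counts.put(term, termFreq(term, targetDoc));
            }
        }
        return counts;
    }
}
```

Calling `countQueryTerms("WebSphere Java", docId)` for each hit would give the per-document counts for just the query terms. A lookup that always returns 0 is often a sign that the term text being looked up does not match the analyzed form stored in the index (for example, a case difference), though that is only a guess here.
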
>
> But it's a much more complicated story if you want to accurately reflect a
> query. For instance, consider a near query, that is terms within, say, 3 of
> each other. If you do something like the above, you'll present "hits" that
> aren't real. For instance...
>
> a b c d e f g h i j a
>
> If you search for a and c within 3 of each other, is this one hit? Two? It
> definitely isn't three, which is what you'd get if you just counted the
> occurrences of the terms a and c. What about a NOT clause? How does a phrase
> query get counted?
>
> There have been several discussions of various aspects of this issue, but
> often in the context of highlighting. You'll probably get some good
> information from the following threads...
>
> Counting terms' hits from phrases
> Counting hits in a document
>
> as well as searching the archive on highlighting and/or hitcount
>
> Best
> Erick
>
>
>
>
> On 2/7/07, csahat <csahat@gmail.com> wrote:
> >
> > Hi all,
> >
> >   I'm sorry if this question has already been answered on this list, but
> > I have already searched the list and couldn't find the answer.
> >
> >   This is what I want to do:
> >
> >   When the user types in a query, for example "WebSphere Java", Lucene
> > should show not only the score but also the term count per document,
> > like this:
> >
> >   doc1    0.8333    websphere=3, Java=2
> >   doc2    0.817     websphere=2, Java=2
> >
> >
> >   I already tried to implement this with TermFreqVector, but TermFreqVector
> > shows all the terms in the field, whereas I want only the terms that occur
> > in the query. I also tried using TermDocs, but it always gave a result of 0.
> >
> >   I tried using the Explanation class and its toString method, but then I
> > have to "clean" the information.
> >
> >
> >   Is there any "direct" way to do this in Lucene? Or perhaps someone can
> > give me a hint?
> >
> >   Thanks in advance
> >
>
