lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ariel" <ionat...@gmail.com>
Subject Re: MoreLikeThis over a subset of documents
Date Tue, 22 Apr 2008 22:52:25 GMT
Smart idea, but it won't help me. I have almost 50 categories and eventually
I would like to "filter" not just on category but maybe also on language,
etc.
Karl: what do you mean by measure the distance between the term vectors and
cluster them in real time?

On Tue, Apr 22, 2008 at 7:39 PM, Glen Newton <glen.newton@gmail.com> wrote:

> Sorry, I misunderstood the problem. My mistake.
>
> While not optimal and rather expensive space-wise, you could have - in
> addition to existing keyword field - a field for each category.  If
> the document being indexed is in category A, only add the text to the
> catA field. Now do MoreLikeThis on catA. This assumes you know the
> categories at index time, of course.
> Redundant but workable.
>
> -Glen
>
> 2008/4/22 Jonathan Ariel <ionathan@gmail.com>:
> > Is there any way to execute a MoreLikeThis over a subset of documents? I
> >  need to retrieve a set of interesting keywords from a subset of
> documents
> >  and not the entire index (imagine that my index has documents
> categorized as
> >  A, B and C and I just want to work with those categorized as A). Right
> now
> >  it is using docFreq from the IndexReader. So I looked into the
> >  FilterIndexReader to see if I can override the docFreq behavior, but
> I'm not
> >  sure if it's possible.
> >
> >  What do you think?
> >
> >  Jonathan
> >
>
>
>
> --
>
> -
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message