lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael A. Schoen" <>
Subject Re: clustering results
Date Sun, 11 Apr 2004 03:55:50 GMT
So as Venu pointed out, sorting doesn't seem to help the problem. If we have
to walk the result set, access docs and dedupe using brute force, we're
better off w/ the standard order by relevance.

If you've got an example of this type of clustering done in a more efficient
way, that'd be great.

Any other ideas?

----- Original Message ----- 
From: "Erik Hatcher" <>
To: "Lucene Users List" <>
Sent: Saturday, April 10, 2004 12:35 AM
Subject: Re: clustering results

> On Apr 9, 2004, at 8:16 PM, Michael A. Schoen wrote:
> > I have an index of urls, and need to display the top 10 results for a
> > given query, but want to display only 1 result per domain. It seems
> > that using either Hits or a HitCollector, I'll need to access the doc,
> > grab the domain field (I'll have it parse ahead of time) and only
> > take/display documents that are unique.
> >
> > A significant percentage of the time I expect I may have to access
> > thousands of results before I find 10 in unique domains. Is there a
> > faster approach that won't require accessing thousands of documents?
> I have examples of this that I can post when I have more time, but a
> quick pointer... check out the overloaded
> methods which accept a Sort.  You can do really really interesting
> slicing and dicing, I think, using it.  Try this one on for size:
>      example.displayHits(allBooks,
>          new Sort(new SortField[]{
>            new SortField("category"),
>            SortField.FIELD_SCORE,
>            new SortField("pubmonth", SortField.INT, true)
>          }));
> Be clever indexing the piece you want to group on - I think you may
> find this the solution you're looking for.
> Erik
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message