lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karsten Konrad" <Karsten.Kon...@xtramind.com>
Subject AW: clustering results
Date Sun, 11 Apr 2004 20:13:14 GMT

Hi (danger: shameless advertising below),

our partner, Brox It Solutions, is using our - XtraMind Technologies GmbH - clustering for
implementing meta-search clustering of search results ala Vivisimo. Check out:

http://www.anyfinder.de/

The clustering is done on the snipplets coming from search engines, but the original version
that we still use in our own products is based on modified Lucene indexes as these can efficiently
handle lots of information on texts and terms. Our clustering engine does not only cluster
search results, but also performs trend recognition for competitive intelligence and similar
tasks, but not too many people require such specialized features.

Brox' price models for this engine may be interesting for those who find other products too
expensive; it also works with all existing search engines, not only Lucene. 

--

Dr.-Ing. Karsten Konrad
Head of Artificial Intelligence Lab

XtraMind Technologies GmbH
Stuhlsatzenhausweg 3
D-66123 Saarbr├╝cken
Phone: +49 (681) 3025113
Fax: +49 (681) 3025109
konrad@xtramind.com
www.xtramind.com







-----Urspr├╝ngliche Nachricht-----
Von: kevin@ckhill.com [mailto:kevin@ckhill.com] 
Gesendet: Sonntag, 11. April 2004 19:03
An: Lucene Users List
Betreff: Re: clustering results


I got all excited reading the subject line "clustering results" but this isn't 
really clustering is it?  This is more sorting.  Does anyone know of any work 
within Lucene (or another indexer) to do actual subject clustering (i.e. like 
Vivisimo @ http://vivisimo.com/ or Kartoo @ http://www.kartoo.com/)?  It would 
be pretty awesome if Lucene had such ability, I know there aren't a whole lot 
of clustering options, and the commercial products are very expensive.  
Anyhow, just curious.

A brief definition of clustering: automatically organizing search or database 
query results into meaningful hierarchical folders ... transforming long lists 
of search results into categorized information without any clumsy pre- processing of the source
documents.

I'm not sure how it would be done...?  Based off of top Term Frequencies for a 
document?

-K

Quoting "Michael A. Schoen" <schoenm@earthlink.net>:

> So as Venu pointed out, sorting doesn't seem to help the problem. If 
> we have to walk the result set, access docs and dedupe using brute 
> force, we're better off w/ the standard order by relevance.
> 
> If you've got an example of this type of clustering done in a more 
> efficient way, that'd be great.
> 
> Any other ideas?
> 
> 
> ----- Original Message -----
> From: "Erik Hatcher" <erik@ehatchersolutions.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Saturday, April 10, 2004 12:35 AM
> Subject: Re: clustering results
> 
> 
> > On Apr 9, 2004, at 8:16 PM, Michael A. Schoen wrote:
> > > I have an index of urls, and need to display the top 10 results 
> > > for a given query, but want to display only 1 result per domain. 
> > > It seems that using either Hits or a HitCollector, I'll need to 
> > > access the doc, grab the domain field (I'll have it parse ahead of 
> > > time) and only take/display documents that are unique.
> > >
> > > A significant percentage of the time I expect I may have to access 
> > > thousands of results before I find 10 in unique domains. Is there 
> > > a faster approach that won't require accessing thousands of 
> > > documents?
> >
> > I have examples of this that I can post when I have more time, but a 
> > quick pointer... check out the overloaded IndexSearcher.search() 
> > methods which accept a Sort.  You can do really really interesting 
> > slicing and dicing, I think, using it.  Try this one on for size:
> >
> >      example.displayHits(allBooks,
> >          new Sort(new SortField[]{
> >            new SortField("category"),
> >            SortField.FIELD_SCORE,
> >            new SortField("pubmonth", SortField.INT, true)
> >          }));
> >
> > Be clever indexing the piece you want to group on - I think you may 
> > find this the solution you're looking for.
> >
> > Erik
> >
> >
> > --------------------------------------------------------------------
> > -
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 






---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message