lucene-java-user mailing list archives

From Dawid Weiss <>
Subject Re: Clustering with Lucene?
Date Wed, 27 Apr 2011 06:19:18 GMT
They may not be dictionary fields, but there is a limited number of term
entries and they seem regular. Your questions suggest you need a faceting
feature (or even an SQL-like set of queries backed by a fast index...),
probably with some pruning.

Clustering is an unsupervised process that attempts to find latent
relationships between concepts in text (and describe these somehow).
Faceting is, ehm... "flattening" of your search result with respect to some
category for which the dictionary of terms is relatively limited. Any
product categories (with counts) that you see in search results on Amazon or
other sites like these are facets.
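
To make the idea concrete, here is a minimal sketch of facet counting in plain
Java. It deliberately uses no Lucene or Solr API: the class FacetSketch, the
method countFacets, and the representation of each hit as a field-to-value map
are all assumptions made for illustration, and a real index would gather these
counts far more efficiently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr.
final class FacetSketch {

    // Counts how often each distinct value of `field` occurs among the hits.
    // Each hit is assumed to be a simple field-name -> value map.
    static Map<String, Integer> countFacets(List<Map<String, String>> hits, String field) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> hit : hits) {
            String value = hit.get(field);
            if (value != null) {
                // Hits that lack the field are skipped.
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the hits matching one IP address, calling countFacets(hits, "user") would
give the per-user counts, and countFacets(hits, "application") the
per-application counts.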


On Wed, Apr 27, 2011 at 6:07 AM, vivek sar <> wrote:

> Thanks Dawid. I was trying to give some example, but this is not
> exactly our text. Our fields include things like "user name", "IP
> Address", "Application Name", "Port 3", "Byte Count" - all network
> related stuff. So, if user searches on certain IP address then we
> would need to group the result by user, application, i.e. show me all
> the users who have used this IP, what applications have been used on
> that IP etc. These are definitely not dictionary fields.
> I'm looking at faceting right now - checking if this would work with
> Lucene (as we can not change to Solr at this point). What's the main
> difference between clustering and faceting?
> Thanks,
> -vivek
> On Tue, Apr 26, 2011 at 12:02 PM, Dawid Weiss <>
> wrote:
> >> 1) We index around 20 fields, of that we want to have grouping option
> >> for five of them. For ex., user can search on name of the city and we
> >> should have option to group by products available in that city (and
> >> vice-versa).
> >>
> >
> > Are these fields strictly defined or free text? Because if they are
> > product/dictionary fields then what you're looking for is not text
> > clustering, but faceting and the solution is to use either SOLR or its
> > components for doing exactly this.
> >
> >
> >> 2) We also need an aggregation facility, which would allow us to
> >> aggregate certain field values from that group. For ex., sum the qty
> >> for all the products in a category. The aggregation may not be part of
> >> clustering, but could be an add-on to it.
> >>
> >
> > This definitely looks like faceting. Take a look at Solr's faceting
> > functionality -- I think this will solve your problem.
> >
> > Dawid
> >
