lucene-java-user mailing list archives

From Otis Gospodnetic
Subject Re: Using categories with Lucene
Date Mon, 09 Aug 2010 05:03:54 GMT
Hello Luan,

I think you are looking for facets and faceted search.  In short, it means
storing the category of each document (web page) in a Field of its Document
in the Lucene index.  Then, at search time, you count how many matches fall
in each category.  You can implement this yourself or you can use Solr, which
has this functionality built in.  If you want to stick with Lucene and don't
want Solr, you can use Bobo Browse with Lucene.  Lucene in Action (2nd
edition) has a case study about Bobo Browse, where you can learn how this is
done.  Slick stuff.
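
To make the idea concrete, here is a minimal sketch of faceting in plain Java.
It deliberately does not use the Lucene, Solr, or Bobo Browse APIs: the Page
class, the substring-based search method, and the sample data are all
hypothetical stand-ins for an index whose documents each carry one category
field.  It only illustrates the two steps, counting matches per category and
restricting the results to one category.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class FacetSketch {

    /** Hypothetical stand-in for an indexed document: text plus one stored category. */
    public static final class Page {
        public final String url;
        public final String category;
        public final String text;

        public Page(String url, String category, String text) {
            this.url = url;
            this.category = category;
            this.text = text;
        }
    }

    /** Naive "search": a page matches if its text contains the term, ignoring case. */
    public static List<Page> search(List<Page> index, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Page> hits = new ArrayList<>();
        for (Page p : index) {
            if (p.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(p);
            }
        }
        return hits;
    }

    /** Facet counting: for the matching pages, count how many fall in each category. */
    public static Map<String, Integer> countByCategory(List<Page> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Page p : hits) {
            counts.merge(p.category, 1, Integer::sum);
        }
        return counts;
    }

    /** Category filter: keep only the hits stored under the requested category. */
    public static List<Page> filterByCategory(List<Page> hits, String category) {
        List<Page> kept = new ArrayList<>();
        for (Page p : hits) {
            if (p.category.equals(category)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Page> index = List.of(
            new Page("example.org/1", "sports", "Football results and league tables"),
            new Page("example.org/2", "business", "Football club finances"),
            new Page("example.org/3", "sports", "Tennis rankings"));

        List<Page> hits = search(index, "football");
        System.out.println("Matches per category: " + countByCategory(hits));
        System.out.println("Matches in 'sports': " + filterByCategory(hits, "sports").size());
    }
}
```

In a real index the category would be assigned at indexing time and the
counting and filtering would be done by the search engine itself rather than
by looping over the hits, which is what Solr and Bobo Browse give you.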

Sematext :: Solr - Lucene - Nutch

----- Original Message ----
> From: Luan Cestari
> To:
> Sent: Sun, August 8, 2010 7:16:05 PM
> Subject: Using categories with Lucene
> Lucene developers,
> We've been working on an undergraduate college project that changes
> Apache Nutch (which uses Lucene to index its web pages) to include a
> category filter, and we are having problems with the query part. We want to
> develop an application with good performance, so we thought that this
> would be the best place to ask this kind of question. The idea is that the
> user can search the stored pages within a single category, so the number of
> results found should be the number of pages that are actually classified in
> that category.
> The problem is how to add the category information to the Lucene indexes,
> and how to filter the search on it. We tried looking on the Nutch
> mailing list (Nabble) and asked for help there, but people there think
> that we should use a plug-in like Carrot, which takes about 100 pages and
> classifies them at query time. We are not very confident that it's the
> best solution. We thought of two other ideas: #1 Classify the pages, store
> that information in a DB, and at query time use that DB to filter the
> results. #2 Use different index servers, one for each category and one to
> search without filtering by category.
> We have seen on this project that there are pre-defined categories. We
> think that the pages should be classified at indexing time, as we wanted.
> Do you have any other ideas about how to do that?
> Sincerely,
> Daniel Costa Gimenes & Luan Cestari
> Undergraduate students of University Center of FEI
> Brazil