lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From findbestopensource <>
Subject Re: Using categories with Lucene
Date Mon, 09 Aug 2010 05:13:01 GMT
Hello Daniel & Luan

1. Carrot is not required for your purpose. Carrot helps to
consolidate the results from multiple search results.

2. You need to add a category to the pages at the index time and
filter out the results during search time. If you want to use Lucene,
then you could store the category information in a separate DB or XML
and filter out the search results. If you want an automatic system to
display the category data then you could consider using Solr. You
could checkout our website, we have used Solr to display the category
/ tagged field data.


On Mon, Aug 9, 2010 at 4:46 AM, Luan Cestari <> wrote:
> Lucene developers,
> We’ve been working on a undergraduate project to the college about changing
> Apache Nutch (that uses Lucene do index it’s web pages) to include a
> category filter, and we are having problems about the query part. We want to
> develop an application with a good performance, so we thought that here
> would be the best place to ask this kind of question. The idea is that the
> user can search pages stored for only a category. So the number of results
> found should display the number of pages that actually is classified in that
> category.
> The problem is about how to add to the Lucene indexes the category
> information, and how filter the search on that. We tried to look on the
> Nutch mailing-list (Nabble) about that and asked some help, but people from
> there think that we should use some plug-in like Carrot, that get like 100
> of pages and classify it in the query time. We are not very confident that
> it’s the best solution. We thought in other two different ideas: #1 To
> classify those pages and store that information on a DB and in the query
> time filter the result that DB to filter the result. #2 Use different index
> servers, one for each category and one to search without filtering by
> category.
> We have seen on this project that there are
> pre-defined categories. We think that this should be classified at indexing
> time, as we wanted.
> Do you have any other idea about how to do that?
> Sincerely,
> Daniel Costa Gimenes & Luan Cestari
> Undergraduate students of University Center of FEI
> Brazil
> --
> View this message in context:
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message