lucene-java-user mailing list archives

From Luan Cestari <>
Subject Using categories with Lucene
Date Sun, 08 Aug 2010 23:16:05 GMT

Lucene developers, 

We are working on an undergraduate project for our college that changes
Apache Nutch (which uses Lucene to index its web pages) to include a
category filter, and we are having problems with the query part. We want the
application to perform well, so we thought this would be the best place to
ask this kind of question. The idea is that the user can search only the
pages stored under a given category, so the number of results shown should be
the number of pages that are actually classified in that category.
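The requirement that the displayed count cover the whole category matters: if the category check is applied only to the first N hits the engine returns, the count can never exceed N and is usually too low. A minimal Java illustration, using made-up hits and a hypothetical `Hit` type (not Lucene API):

```java
import java.util.List;

public class TopNCategoryCount {
    // A matching page: its id and its category (hypothetical data model).
    record Hit(int docId, String category) {}

    // Counts matches in `category` among ALL hits, which is what a
    // category filter applied inside the search would report.
    static long countAll(List<Hit> hits, String category) {
        return hits.stream().filter(h -> h.category().equals(category)).count();
    }

    // Counts matches in `category` among only the first `topN` hits,
    // which is what filtering a fixed batch of results afterwards reports.
    static long countTopN(List<Hit> hits, String category, int topN) {
        return hits.stream().limit(topN).filter(h -> h.category().equals(category)).count();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(1, "sports"), new Hit(2, "news"), new Hit(3, "news"),
                new Hit(4, "sports"), new Hit(5, "sports"), new Hit(6, "sports"));
        System.out.println("all matches: " + countAll(hits, "sports"));    // prints "all matches: 4"
        System.out.println("top 3 only: " + countTopN(hits, "sports", 3)); // prints "top 3 only: 1"
    }
}
```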

The problem is how to add the category information to the Lucene index and
how to filter searches by it. We searched the Nutch mailing list (on Nabble)
and asked for help there, but people suggested using a plug-in such as
Carrot, which takes around 100 result pages and classifies them at query
time. We are not confident that this is the best solution. We have thought
of two other ideas:
#1 Classify the pages, store that information in a database, and at query
time use the database to filter the results.
#2 Use separate index servers, one for each category, plus one for searching
without filtering by category.
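On how to add the category to the index and filter on it: one commonly used alternative to both ideas is to keep a single index and store each page's category as an extra untokenized field, so the restriction is applied inside the search itself and the hit count already covers the whole category. In Lucene this usually means indexing the category without analysis and adding a `TermQuery` on that field as a filter clause (the exact classes differ between Lucene versions). The toy inverted index below only models that idea in plain Java; `CategoryIndex`, `add` and `search` are hypothetical names, not Lucene API.

```java
import java.util.*;

public class CategoryIndex {
    // term -> sorted doc ids (a toy posting list; not Lucene's data structure)
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Indexes a page: each word becomes a text term, and the category is stored
    // as one extra untokenized term "category:<name>". Words are split on \W+,
    // so they can never contain ':' and cannot collide with category terms.
    void add(int docId, String text, String category) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
        postings.computeIfAbsent("category:" + category, k -> new TreeSet<>()).add(docId);
    }

    // Returns the docs containing every query word; when category is non-null,
    // the category term is intersected too, so only that category's docs remain.
    SortedSet<Integer> search(String query, String category) {
        List<String> terms = new ArrayList<>();
        for (String w : query.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                terms.add(w);
            }
        }
        if (category != null) {
            terms.add("category:" + category);
        }
        TreeSet<Integer> result = null;
        for (String t : terms) {
            TreeSet<Integer> docs = postings.getOrDefault(t, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // posting-list intersection
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        CategoryIndex idx = new CategoryIndex();
        idx.add(1, "Java search engine tutorial", "software");
        idx.add(2, "Search for cheap flights", "travel");
        idx.add(3, "Lucene search internals", "software");
        System.out.println(idx.search("search", null));       // prints [1, 2, 3]
        System.out.println(idx.search("search", "software")); // prints [1, 3]
    }
}
```

Because the category is just another term, the count of a filtered search is exact, and no external database lookup or separate per-category index is needed for it.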

We have seen that this project has predefined categories, and we think
pages should be classified into them at indexing time, as we originally
intended.
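Classifying at indexing time means each page's category is computed once, when the page is added to the index, and stored with it, so queries only have to filter. The sketch below assumes a tiny keyword-based classifier over three made-up categories purely for illustration; `IndexTimeClassifier`, `classify`, `toIndexedDocument` and the keyword lists are hypothetical, and a real system would use the project's predefined categories and a proper classifier.

```java
import java.util.*;

public class IndexTimeClassifier {
    // Hypothetical categories with a few keywords each (illustration only).
    static final Map<String, Set<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("software", Set.of("java", "lucene", "compiler", "code"));
        KEYWORDS.put("travel", Set.of("flight", "flights", "hotel", "airport"));
        KEYWORDS.put("sports", Set.of("football", "match", "league", "goal"));
    }

    // Picks the category whose keywords occur most often in the text; returns
    // "other" if none match. Ties go to the category listed first in KEYWORDS.
    static String classify(String text) {
        String best = "other";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> e : KEYWORDS.entrySet()) {
            int score = 0;
            for (String word : text.toLowerCase().split("\\W+")) {
                if (e.getValue().contains(word)) {
                    score++;
                }
            }
            if (score > bestScore) {
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }

    // Builds the record that would be written to the index; the category is
    // computed here, once, at indexing time rather than at query time.
    static Map<String, String> toIndexedDocument(String url, String text) {
        Map<String, String> doc = new HashMap<>();
        doc.put("url", url);
        doc.put("content", text);
        doc.put("category", classify(text));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toIndexedDocument("http://example.org/a",
                "Cheap flights and hotel deals").get("category")); // prints travel
        System.out.println(classify("Lucene is written in Java"));  // prints software
        System.out.println(classify("Nothing relevant here"));      // prints other
    }
}
```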

Do you have any other ideas about how to do this?


Daniel Costa Gimenes & Luan Cestari
Undergraduate students at the University Center of FEI