lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julien Nioche <>
Subject Re: Using categories with Lucene
Date Wed, 11 Aug 2010 17:16:41 GMT
BTW I don't remember anyone on the Nutch list suggesting you to use Carrot
for this (see : or classifying at
querying time

What I suggested in was about
classifying during the parsing or indexing and generating a field for Lucene
or SOLR. As Otis pointed out you can of course use SOLR for faceting. Since
you will be using Nutch anyway, you might as well avoid an external DB just
for storing the results of the classification and just keep the labels e.g.
in the parse metadata

DigitalPebble Ltd

Open Source Solutions for Text Engineering

On 9 August 2010 00:16, Luan Cestari <> wrote:

> Lucene developers,
> We’ve been working on a undergraduate project to the college about changing
> Apache Nutch (that uses Lucene do index it’s web pages) to include a
> category filter, and we are having problems about the query part. We want
> to
> develop an application with a good performance, so we thought that here
> would be the best place to ask this kind of question. The idea is that the
> user can search pages stored for only a category. So the number of results
> found should display the number of pages that actually is classified in
> that
> category.
> The problem is about how to add to the Lucene indexes the category
> information, and how filter the search on that. We tried to look on the
> Nutch mailing-list (Nabble) about that and asked some help, but people from
> there think that we should use some plug-in like Carrot, that get like 100
> of pages and classify it in the query time. We are not very confident that
> it’s the best solution. We thought in other two different ideas: #1 To
> classify those pages and store that information on a DB and in the query
> time filter the result that DB to filter the result. #2 Use different index
> servers, one for each category and one to search without filtering by
> category.
> We have seen on this project that there are
> pre-defined categories. We think that this should be classified at indexing
> time, as we wanted.
> Do you have any other idea about how to do that?
> Sincerely,
> Daniel Costa Gimenes & Luan Cestari
> Undergraduate students of University Center of FEI
> Brazil
> --
> View this message in context:
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message