lucene-java-user mailing list archives

From Luan Cestari <luan.cest...@gmail.com>
Subject Re: Using categories with Lucene
Date Wed, 11 Aug 2010 19:43:56 GMT
Julien,

You're right. We discovered Carrot by searching the mailing list and mistakenly
thought it had been suggested in one of our conversations. We apologize for the
mistake.

Best Regards,
Daniel Gimenes
Luan Cestari

On Wed, Aug 11, 2010 at 2:16 PM, Julien Nioche <lists.digitalpebble@gmail.com> wrote:

> BTW, I don't remember anyone on the Nutch list suggesting that you use Carrot
> for this (see http://search-lucene.com/?q=luan+carrot) or that you classify
> at query time.
>
> What I suggested in http://search-lucene.com/m/JWZTj1q4lB92 was to classify
> during parsing or indexing and generate a field for Lucene or SOLR. As Otis
> pointed out, you can of course use SOLR for faceting. Since you will be using
> Nutch anyway, you might as well avoid an external DB just for storing the
> results of the classification and keep the labels, e.g., in the parse
> metadata.
>
> Julien
> --
> DigitalPebble Ltd
>
> Open Source Solutions for Text Engineering
> http://www.digitalpebble.com
>
> On 9 August 2010 00:16, Luan Cestari <luan.cestari@gmail.com> wrote:
>
> >
> > Lucene developers,
> >
> > We’ve been working on an undergraduate college project that modifies
> > Apache Nutch (which uses Lucene to index its web pages) to include a
> > category filter, and we are having problems with the query part. We want
> > to develop an application with good performance, so we thought this would
> > be the best place to ask this kind of question. The idea is that the user
> > can restrict a search to the stored pages of a single category, so the
> > number of results displayed should be the number of pages actually
> > classified in that category.
> >
> > The problem is how to add the category information to the Lucene indexes
> > and how to filter the search on it. We looked on the Nutch mailing list
> > (Nabble) and asked for help, but people there think we should use a
> > plug-in like Carrot, which takes about 100 pages and classifies them at
> > query time. We are not confident that it’s the best solution. We thought
> > of two other ideas: #1 Classify the pages, store that information in a
> > DB, and at query time use that DB to filter the results. #2 Use different
> > index servers, one for each category and one to search without filtering
> > by category.
> >
> > We have seen that the project at http://search-lucene.com/ has
> > predefined categories. We think these are assigned at indexing time,
> > which is what we want.
> >
> > Do you have any other idea about how to do that?
> >
> > Sincerely,
> >
> > Daniel Costa Gimenes & Luan Cestari
> > Undergraduate students of University Center of FEI
> > Brazil
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Using-categories-with-Lucene-tp1049232p1049232.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
>
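
For readers following the thread, the index-time approach Julien describes can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: `TinyIndex`, its `add` and `search` methods, the field names `content` and `category`, and the sample pages are all hypothetical and are not Lucene, SOLR, or Nutch API. It is written in Python with the standard library only, to show the idea rather than an implementation. The category label is assumed to come from a classifier run during parsing, and it is stored as one more indexed field, so filtering by category becomes an intersection of posting lists and the hit count reflects only pages in that category.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index; a stand-in for a real engine, not Lucene's API."""

    def __init__(self):
        # (field, term) -> set of ids of the documents containing that term
        self.postings = defaultdict(set)

    def add(self, doc_id, text, category):
        # Assumption: the category was assigned before indexing (for example
        # by a classifier at parse time) and is indexed as its own field.
        for token in text.lower().split():
            self.postings[("content", token)].add(doc_id)
        self.postings[("category", category)].add(doc_id)

    def search(self, term, category=None):
        # Copy the posting set so the index itself is never mutated.
        hits = set(self.postings.get(("content", term.lower()), set()))
        if category is not None:
            # Category filtering is an intersection of two posting lists.
            hits &= self.postings.get(("category", category), set())
        return sorted(hits)


index = TinyIndex()
index.add(1, "Lucene search basics", "search")
index.add(2, "Football results and search tips", "sports")
index.add(3, "Search engines compared", "search")

print(index.search("search"))                 # all pages matching the term
print(len(index.search("search", "search")))  # count within one category
```

Because the label lives in the index, no external DB lookup and no query-time classification is needed to produce a per-category result count.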



-- 
Luan Cestari

"All the gold which is under or upon the earth is not enough to give in
exchange for virtue."
Plato
"At his best, man is the noblest of all animals; separated from law and
justice he is the worst."
"A true friend is one soul in two bodies."
Aristotle
