lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adam Saltiel" <adam.salt...@btinternet.com>
Subject RE: carrot2 question too - Re: Fun with the Wikipedia
Date Sat, 29 Jan 2005 19:29:30 GMT
Strangely enough this subject is being taken up in the RE: -> Grouping
Search Results by Clustering Snippets: thread.

Adam

> -----Original Message-----
> From: Owen Densmore [mailto:owen@backspaces.net]
> Sent: Friday, January 28, 2005 4:57 PM
> To: lucene-user@jakarta.apache.org
> Cc: Owen Densmore
> Subject: Re: carrot2 question too - Re: Fun with the Wikipedia
>
> I looked at the Carrot2 docs which mentioned dimension reduction via
> singular value decomposition (SVD) .. and other forms too I think.
>
> Question: Does anyone have pointers to successful clustering
techniques
> used with lucene?  I'm particularly interested in 2D and 3D graphics
as
> well, possibly SOM (Self Organizing Maps).
>
> I'm hoping to combine lucene with a graphical auto-clustering stunt of
> some kind but am not sure how to do it yet.
>
> Owen
>
>
> > From: Akmal Sarhan <as@byteaction.de>
> > Date: January 28, 2005 8:19:03 AM MST
> > To: Lucene Users List <lucene-user@jakarta.apache.org>
> > Subject: Re: carrot2 question too - Re: Fun with the Wikipedia
> >
> >
> > Hello,
> >
> > we have been experimenting with carrot2 and are very pleased so far,
> > only one issue: there is no release not even an alpha one and the
> > dependencies seemed to be patched (jama)
> > is there any intentions to have any releases in the near future?
> >
> > thanks
> >
> > Akmal
> > Am Montag, den 17.01.2005, 10:15 +0100 schrieb Dawid Weiss:
> >> Hi David,
> >>
> >> I apologize about the delay in answering this one, Lucene is a busy
> >> mailing list and I had a hectic last week... Again, sorry for
belated
> >> answer, hope you still find it useful.
> >>
> >>>> That is awesome and very inspirational!
> >>
> >> Yes, I admit what you've done with Wikipedia is quite interesting
and
> >> looks very good. I'm also glad you spent some time working out
Carrot
> >> integration with Lucene. It works quite nice.
> >>
> >>>> Carrot2 looks very interesting. Wondering if anybody has a list
of
> >>>> all
> >>>> the
> >>>
> >>> Technically I don't think carrot2 uses lucene per-se- it's just
that
> >>> you
> >>> can integrate the two, and ditto for Nutch - it has code that uses
> >>> Carrot2.
> >>
> >> Yes, this is true. Carrot2 doesn't use all of Lucene's potential --
it
> >> merely takes the output from a query (titles, urls and snippets)
and
> >> attempts to cluster them into some sensible groups. I think many
> >> things
> >> could be improved, the most important of them is fast snippet
> >> retrieval
> >>    from Lucene because right now it takes 50% of the time of the
> >> clustering; I've seen a post a while ago describing a faster
snippet
> >> generation technique, I'm sure that would give clustering a huge
boost
> >> speed-wise.
> >>
> >>> And here's my question. I reread the Carrot2<->Lucene code, esp
> >>> Demo.java, and there's this fragment:
> >>>
> >>>     // warm-up round (stemmer tables must be read etc).
> >>>     List clusters = clusterer.clusterHits(docs);
> >>>
> >>>     long clusteringStartTime = System.currentTimeMillis();
> >>>     clusters = clusterer.clusterHits(docs);
> >>>     long clusteringEndTime = System.currentTimeMillis();
> >>>
> >>> Thus it calls clusterHits() twice.
> >>>
> >>> I don't really understand how to use Carrot2 - but I think the
above
> >>> is
> >>> just for the sake of benchmarking clusterHits() w/o the effect of
> >>> 1-time
> >>> initialization - and that there's no benefit of repeatedly calling
> >>> clusterHits (where a benefit might be that it can find nested
> >>> clusters
> >>> or whatever) - is that right (that there's no benefit)?
> >>
> >> No, there is absolutely no benefit from it. It was merely to show
> >> people
> >> that the clustering needs to be warmed up a bit. I should not have
put
> >> it in the code knowing people would be confused by it. You can
safely
> >> use clusterHits just once. It will just have a small delay at the
> >> first
> >> invocation.
> >>
> >>
> >> Thanks for experimenting. Please BCC me if you have any urgent
> >> projects
> >> -- I read Lucene's list in batches and my personal e-mail I try to
> >> keep
> >> up to date with.
> >>
> >> Dawid
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message