lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Assign rich-text document's title name from clustering results
Date Wed, 10 Jun 2015 09:31:40 GMT
The main objective here is actually to assign a title to the documents as
they are being indexed.

We found that the cluster labels provide good information on the key
points of the documents, but I'm not sure whether we can get good cluster
labels from a single document.

Besides deriving it from cluster labels, are there other methods we can
use to assign a title?


Regards,
Edwin


On 10 June 2015 at 17:16, Alessandro Benedetti <benedetti.alex85@gmail.com>
wrote:

> Hi Edwin,
> let's do this step by step.
>
> Clustering is a problem solved by unsupervised machine learning algorithms.
> The goal of clustering is to group a corpus of documents by similarity,
> trying to produce groups that are meaningful to a human being.
> Solr currently provides different approaches for *Query Time Clustering*
> (also known as Online Clustering).
> There's an out-of-the-box integration that allows you to use clustering at
> query time on the query results.
> Different algorithms can be selected, mainly provided by Carrot2.
>
> These algorithms also provide a guess for the cluster name.
>
> Given this introduction, let's look at your problem.
>
> 1) The first part can be solved with a custom UpdateProcessor that will
> process the document and add the automatically generated title.
> Now the problem is: how do we want to extract this new title?
> Honestly, I cannot understand how clustering can fit here …
>
> 2) Index time clustering is not yet provided in Solr (I remember there was
> only an interface ready, but no implementation).
> You should cluster the content before indexing it in Solr, using a machine
> learning library.
> Index time clustering is delicate. What will happen at the next re-index?
> Should we cluster everything again?
> This topic must be investigated more.
>
> Anyway, let me know, as the original problem may not require clustering
> at all.
>
> Cheers
>
>
> 2015-06-10 4:13 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
>
> > Hi,
> >
> > I'm currently using Solr 5.1, and I'm thinking of ways to let the system
> > automatically give a title to the rich-text documents that are being
> > indexed, instead of the user entering it manually. We might have to
> > index a whole folder of documents together, so it is not practical for
> > the user to enter the titles one by one.
> >
> > I would like to check whether it's possible to run the clustering, get
> > the results, and use the top-scoring label as the title of the document.
> > Apparently, we would need to run the clustering prior to the indexing,
> > so I'm not sure if that is possible.
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
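
As one alternative to cluster labels, the custom UpdateProcessor mentioned in
point 1 of the quoted reply could compute the title with a simple heuristic.
The sketch below is an assumption, not something proposed in the thread: it
takes the first non-blank line of the extracted text as the title and falls
back to the file name. The class name TitleHeuristic, the method deriveTitle,
the 80-character cap and the "Untitled" fallback are all hypothetical. It is
plain Java with no Solr dependency; in Solr it would presumably be called from
the processAdd method of a custom UpdateRequestProcessor, which is not shown
here.

```java
// Hypothetical helper: derives a title from extracted document text.
// Not part of Solr; names and limits are illustrative assumptions.
final class TitleHeuristic {
    // Assumed cap so that a long first paragraph does not become the title.
    private static final int MAX_TITLE_LENGTH = 80;

    private TitleHeuristic() {}

    /**
     * Returns the first non-blank line of the content with whitespace
     * collapsed, truncated to MAX_TITLE_LENGTH. If the content has no usable
     * line, falls back to the file name without its extension, and finally
     * to "Untitled".
     */
    static String deriveTitle(String content, String fileName) {
        if (content != null) {
            // \R matches any line break sequence (Java 8+).
            for (String line : content.split("\\R")) {
                String candidate = line.trim().replaceAll("\\s+", " ");
                if (!candidate.isEmpty()) {
                    return truncate(candidate);
                }
            }
        }
        if (fileName != null && !fileName.trim().isEmpty()) {
            String name = fileName.trim();
            int dot = name.lastIndexOf('.');
            // Strip the extension only if the dot is not the first character.
            return truncate(dot > 0 ? name.substring(0, dot) : name);
        }
        return "Untitled";
    }

    private static String truncate(String s) {
        return s.length() <= MAX_TITLE_LENGTH
                ? s
                : s.substring(0, MAX_TITLE_LENGTH).trim();
    }
}
```

A call such as TitleHeuristic.deriveTitle(extractedText, "minutes_2015.docx")
would use the first line of the extracted text when there is one, and the
name "minutes_2015" otherwise. Whether the first line is a good title depends
heavily on the document type, so this is only a starting point.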
