lucene-solr-user mailing list archives

From Szűcs Roland <szucs.rol...@bookandwalk.hu>
Subject Re: MoreLikeThisHandler with mltipli input documents
Date Tue, 29 Sep 2015 09:39:50 GMT
Hi Alessandro,

My original goal was to get offline suggestions based on content similarity
for every e-book we have. We wanted to run a bulk More Like This
calculation in the evening, when usage of our site is low, whenever we submit
a new e-book. Real-time More Like This can take a while, as our documents
are typically long (2-5 MB of text) with all the content indexed.

When we upload a new document, we want to recalculate the More Like This
suggestions and the tf-idf based tag clouds. Both are delivered by the
MoreLikeThisHandler, but only for one document at a time, as you wrote.
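
For illustration, here is a minimal sketch of how the tag cloud could be derived from one handler response. It assumes the JSON response carries an "interestingTerms" entry that is either a flat list alternating term and score or a term-to-score map, and that terms may carry a "field:" prefix. The function name tag_cloud is hypothetical, and the response shape should be checked against your Solr version.

```python
def tag_cloud(mlt_response):
    """Return (term, score) pairs sorted by descending score.

    Assumption: "interestingTerms" is either a flat list
    [term, score, term, score, ...] or a {term: score} mapping.
    """
    raw = mlt_response.get("interestingTerms", [])
    if isinstance(raw, dict):
        pairs = list(raw.items())
    else:
        # Pair up even-indexed terms with odd-indexed scores.
        pairs = list(zip(raw[0::2], raw[1::2]))
    # Drop an assumed "field:" prefix such as "content:" from each term.
    cleaned = [(term.split(":", 1)[-1], float(score)) for term, score in pairs]
    return sorted(cleaned, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = {"interestingTerms": ["content:river", 1.0, "title:castle", 12.0]}
    for term, score in tag_cloud(sample):
        print(term, score)
```

The scores can then be mapped to font sizes or weights when the tag cloud is rendered.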

The text input does not work for us because we need the list of similar
docs for each of the matched documents. If I concatenate the text of 10
documents, I cannot tell which suggestion relates to which matched document,
and the tag cloud will also belong to the mixed text.

Most likely we will call the MoreLikeThisHandler for each of the documents,
parse the JSON response and store the result in an SQL database.
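
A minimal sketch of that per-document loop, under stated assumptions: the handler URL is the one quoted further down in this thread, the similar documents are assumed to sit under "response" -> "docs" in the JSON, and SQLite stands in for whatever relational database is actually used. The names mlt_url, fetch_mlt, similar_ids, store and rebuild, as well as the table layout, are hypothetical.

```python
import json
import sqlite3
from urllib.parse import urlencode
from urllib.request import urlopen

MLT_URL = "http://localhost:8983/solr/bandwhu/mlt"  # handler URL from this thread


def mlt_url(doc_id, base=MLT_URL):
    """Build the request URL for a single document id."""
    params = {"q": "id:%s" % doc_id, "fl": "id", "wt": "json"}
    return base + "?" + urlencode(params)


def fetch_mlt(doc_id):
    """Call the handler for one document and decode the JSON body."""
    with urlopen(mlt_url(doc_id)) as resp:
        return json.load(resp)


def similar_ids(mlt_response):
    """Extract similar document ids (assumed under response -> docs)."""
    docs = mlt_response.get("response", {}).get("docs", [])
    return [str(doc["id"]) for doc in docs]


def store(conn, doc_id, ids):
    """Replace the stored suggestions for one document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similar ("
        "doc_id TEXT, pos INTEGER, similar_id TEXT, "
        "PRIMARY KEY (doc_id, pos))"
    )
    conn.execute("DELETE FROM similar WHERE doc_id = ?", (doc_id,))
    conn.executemany(
        "INSERT INTO similar VALUES (?, ?, ?)",
        [(doc_id, pos, sid) for pos, sid in enumerate(ids, 1)],
    )
    conn.commit()


def rebuild(conn, doc_ids, fetch=fetch_mlt):
    """One handler call per document; results stay attributable to it."""
    for doc_id in doc_ids:
        store(conn, str(doc_id), similar_ids(fetch(doc_id)))
```

Because each document gets its own request, every suggestion list stays tied to exactly one source document, which the mixed-text input cannot offer. The fetch parameter makes it possible to swap in a stub when trying the loop without a running Solr instance.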

Thanks for your help.

2015-09-29 11:18 GMT+02:00 Alessandro Benedetti <benedetti.alex85@gmail.com>:

> Hi Roland,
> what is your exact requirement?
> Do you basically want to build a "description" for a set of documents and
> then find documents in the index that are similar to this description?
>
> By default, based on my experience (and on the code), this is the entry
> point for the Lucene More Like This:
>
>
> > org.apache.lucene.queries.mlt.MoreLikeThis
> >
> > /**
> >  * Return a query that will return docs like the passed lucene document ID.
> >  *
> >  * @param docNum the documentID of the lucene doc to generate the 'More Like This" query for.
> >  * @return a query that will return docs like the passed lucene document ID.
> >  */
> > public Query like(int docNum) throws IOException {
> >   if (fieldNames == null) {
> >     // gather list of valid fields from lucene
> >     Collection<String> fields = MultiFields.getIndexedFields(ir);
> >     fieldNames = fields.toArray(new String[fields.size()]);
> >   }
> >   return createQuery(retrieveTerms(docNum));
> > }
>
> This means that, as far as "documents" go, you can feed it only one Solr doc.
>
> But you can also feed the MLT with simple text.
>
> So you should study your use case more closely and work out which option
> fits better:
>
> 1) customising the MLT component, starting from Lucene
>
> 2) doing some processing client side and using the "text" similarity feature.
>
>
> Cheers
>
>
> 2015-09-29 10:05 GMT+01:00 Roland Szűcs <roland.szucs@bookandwalk.com>:
>
> > Hi all,
> >
> > Is it possible to feed multiple Solr ids to a MoreLikeThisHandler?
> >
> > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
> >   <lst name="defaults">
> >     <str name="mlt.match.include">false</str>
> >     <str name="mlt.interestingTerms">details</str>
> >     <str name="mlt.fl">title,content</str>
> >     <str name="mlt.minwl">4</str>
> >     <str name="mlt.qf">title^12 content^1</str>
> >     <str name="mlt.mintf">2</str>
> >     <int name="mlt.count">10</int>
> >     <str name="mlt.boost">true</str>
> >     <str name="wt">json</str>
> >     <str name="indent">true</str>
> >   </lst>
> > </requestHandler>
> >
> > When I call this: http://localhost:8983/solr/bandwhu/mlt?q=id:8&fl=id
> > it works fine. Is there any way to make a kind of "bulk" call to the More
> > Like This handler? I need the interesting terms as well and, as far as I
> > know, if I use More Like This as a search component it does not return
> > them, so that is not an alternative.
> >
> > Thanks in advance,
> >
> >
> > --
> > Roland Szűcs
> > Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> > CEO, Bookandwalk.hu <https://bookandwalk.hu/>
> > Phone: +36 1 210 81 13
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Szűcs Roland
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Managing Director, Bookandwalk.hu <https://bookandwalk.hu/>
Phone: +36 1 210 81 13
