lucene-solr-user mailing list archives

From Darko Todoric <todo...@mdpi.com>
Subject Re: Search by similarity?
Date Mon, 28 Aug 2017 07:17:42 GMT
Hm... I cannot get this DisMax query to work on my Solr...

In Solr I have documents with titles:
  - "title-1-end"
  - "title-2-end"
  - "title-3-end"
  - ...
  - ...
  - "title-312-end"

and when I run the query
http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json

I get all documents from Solr :\
What am I doing wrong?

Also, I don't know if it affects the results, but on the "title" field I use
"WhitespaceTokenizerFactory".

Kind regards,
Darko


On 08/25/2017 06:38 PM, Junte Zhang wrote:
> If you already have the title of the document, then you could run that title as a new
> query against the whole index and exclude the source document from the results as a filter.
>
> You could use the DisMax query parser: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>
> And then set the minimum match ratio of the OR clauses to 90%.
>
> /JZ
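
A minimal sketch of the request described above, assuming the title text is searched through the standard DisMax `qf` parameter and the source document is excluded with an `fq` filter. The helper name `build_similar_title_url` and the unique key field `id` are hypothetical and not taken from this thread. `urlencode` also percent-encodes the `%` in `mm`, which a hand-written URL can easily get wrong.

```python
from urllib.parse import urlencode


def build_similar_title_url(base_url, title, source_id, min_match="90%"):
    """Build a DisMax select URL that searches `title` text in the title field.

    Assumptions (not confirmed by the thread): the unique key field is
    called "id", and `source_id` contains no double quotes.
    """
    params = {
        "defType": "dismax",
        "q": title,                    # plain text; the field comes from qf
        "qf": "title",                 # field(s) DisMax searches
        "mm": min_match,               # minimum share of clauses that must match
        "fq": f'-id:"{source_id}"',    # filter out the source document
        "wt": "json",
    }
    # urlencode escapes "%", quotes, colons and spaces for us.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_similar_title_url(
        "http://localhost:8983/solr/SciLit/select", "title-123-end", "42"))
```

Note that `mm` counts query clauses, and a field analyzed with a whitespace tokenizer keeps a value such as "title-123-end" as a single term, so there may be only one clause for `mm` to act on.
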
>
> -----Original Message-----
> From: Darko Todoric [mailto:todoric@mdpi.com]
> Sent: Friday, August 25, 2017 5:49 PM
> To: solr-user@lucene.apache.org
> Subject: Search by similarity?
>
> Hi,
>
>
> I have 90,000,000 documents in Solr and I need to compare the "title" of a document and
> get all documents with more than 80% similarity. PHP has "similar_text", but it's not so smart
> to insert 90m documents into an array...
> Can I run some query in Solr which will give me the documents with more than 80% similarity?
>
>
> Kind regards,
> Darko Todoric
>
> --
> Darko Todoric
> Web Engineer, MDPI DOO
> Veljka Dugosevica 54, 11060 Belgrade, Serbia
> +381 65 43 90 620
> www.mdpi.com
>
> Disclaimer: The information and files contained in this message are confidential and
> intended solely for the use of the individual or entity to whom they are addressed.
> If you have received this message in error, please notify me and delete this message from
> your system.
> You may not copy this message in its entirety or in part, or disclose its contents to
> anyone.
>

-- 
Darko Todoric
Web Engineer, MDPI DOO
Veljka Dugosevica 54, 11060 Belgrade, Serbia
+381 65 43 90 620
www.mdpi.com


