lucene-java-user mailing list archives

From Malgorzata Urbanska
Subject Re: compare paragraphs of text - which Query Class to use?
Date Fri, 14 Jun 2013 16:44:35 GMT
Thanks, I will try it.

On Fri, Jun 14, 2013 at 10:33 AM, Jack Krupansky wrote:

> First, start with Solr and use the edismax query parser with the default
> query operator set to "OR", set pf, pf2, and pf3, and then simply query by
> the raw text of the paragraph. This will order the results by how closely
> the indexed paragraphs match the query paragraph.
> This is also a good technique for detecting plagiarism where a lot of the
> text is similar if not identical.
> Once you get experience using this technique in Solr, then simply look at
> the parsed query that edismax generates and do the same in your Lucene Java
> code.
> -- Jack Krupansky
> -----Original Message-----
> From: Malgorzata Urbanska
> Sent: Friday, June 14, 2013 12:23 PM
> To:
> Subject: compare paragraphs of text - which Query Class to use?
> Hello,
> I've just started using Lucene and I'm not sure which Query classes I
> should use in my project.
> My goal is to compare paragraphs of text. Paragraph A is the query and
> paragraph B is a document for which I would like to calculate a similarity
> score.
> Paragraphs A and B may or may not be exactly the same.
> Generally, I would like to check whether they talk about the same topic.
> In my project I have a set of paragraphs A and a set of paragraphs B, so I'm
> looking for a universal solution that allows me to compute a similarity
> score for each paragraph A against all paragraphs B.
> Do you have any suggestions? I would really appreciate any ideas.
> --
> gosia
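
For anyone trying the suggestion above, here is a minimal sketch of the Solr request parameters it describes: edismax as the parser, "OR" as the default operator, and pf, pf2, and pf3 set so that whole-phrase, word-pair, and word-triple matches boost the score. The field name "text", the class name, and the /select path printed in main are assumptions for illustration; adjust them to your own schema and setup. debugQuery=true is included so the response shows the parsed query that edismax generates, which is what you would then reproduce in Lucene Java code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: builds Solr request parameters for a paragraph-as-query edismax search. */
final class EdismaxParagraphQuery {

    /**
     * @param paragraph raw text of paragraph A, used directly as the query
     * @param field     indexed text field holding the paragraphs B
     *                  (hypothetical name; depends on your schema)
     */
    static Map<String, String> build(String paragraph, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");  // select the edismax query parser
        params.put("q", paragraph);        // query by the raw paragraph text
        params.put("q.op", "OR");          // default operator: any term may match
        params.put("qf", field);           // field searched for individual terms
        params.put("pf", field);           // boost when the whole query matches as a phrase
        params.put("pf2", field);          // boost on matching word pairs
        params.put("pf3", field);          // boost on matching word triples
        params.put("fl", "*,score");       // return the similarity score with each hit
        params.put("debugQuery", "true");  // include the parsed query in the response
        return params;
    }

    /** URL-encodes the parameters into a query string, preserving insertion order. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String paragraphA = "Lucene is a full-text search library written in Java.";
        // Append this to your Solr core's base URL.
        System.out.println("/select?" + toQueryString(build(paragraphA, "text")));
    }
}
```

To score every paragraph A against all paragraphs B, index the B paragraphs once and send one such request per paragraph A. Special query-syntax characters in the paragraph text (quotes, colons, parentheses) may need escaping before being sent as q.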
