lucene-java-user mailing list archives

From Malgorzata Urbanska <urban...@cs.colostate.edu>
Subject Re: compare paragraphs of text - which Query Class to use?
Date Fri, 14 Jun 2013 16:44:35 GMT
Thanks, I will try it.
gosia


On Fri, Jun 14, 2013 at 10:33 AM, Jack Krupansky <jack@basetechnology.com> wrote:

> First, start with Solr: use the edismax query parser with the default
> query operator set to "OR", set pf, pf2, and pf3, and then simply query
> with the raw text of the paragraph. This will order the results by how
> closely the indexed paragraphs match the query paragraph.
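
The setup described above can be sketched as a set of Solr request parameters. This is a minimal illustration, not a tested configuration: the field name "text" is an assumption, and how q.op interacts with other settings (for example mm) depends on the Solr version, so check the documentation for your release.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class EdismaxParagraphQuery {

    // Builds request parameters for an edismax "query by paragraph" search.
    // "field" is a hypothetical name for the indexed paragraph field.
    static Map<String, String> params(String paragraph, String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "edismax"); // select the edismax query parser
        p.put("q.op", "OR");         // default operator: terms are optional
        p.put("qf", field);          // field searched for individual terms
        p.put("pf", field);          // phrase boost on the whole query text
        p.put("pf2", field);         // phrase boost on adjacent word pairs
        p.put("pf3", field);         // phrase boost on adjacent word triples
        p.put("q", paragraph);       // the raw text of the query paragraph
        return p;
    }

    // URL-encodes the parameters into a query string for a select request.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }
}
```

The resulting string would be appended to the select URL of your own Solr core. Long paragraphs may be better sent as a POST body than in a URL.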
>
> This is also a good technique for detecting plagiarism where a lot of the
> text is similar if not identical.
>
> Once you get experience using this technique in Solr, then simply look at
> the parsed query that edismax generates and do the same in your Lucene Java
> code.
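
As a rough picture of what such a parsed query contains, the sketch below derives the pieces from a paragraph: the individual terms, plus the adjacent word pairs and triples that pf2 and pf3 turn into phrase clauses. It assumes plain whitespace tokenization and lower-casing, which is only a stand-in for the field's real Analyzer, and the class and method names are hypothetical. In Lucene, each term would plausibly become an optional TermQuery and each word group an optional PhraseQuery inside a BooleanQuery, but the parsed query reported by Solr (for example with debugQuery=true) is the authority on the exact structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ParagraphQueryParts {

    // Lower-cases and splits on whitespace. This is an assumption standing in
    // for the Analyzer that the indexed field actually uses.
    static List<String> tokens(String paragraph) {
        String trimmed = paragraph.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Returns every run of n adjacent tokens, joined by single spaces.
    // n = 2 corresponds to pf2-style pairs, n = 3 to pf3-style triples.
    static List<String> shingles(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }
}
```

A paragraph shorter than n words yields no groups of size n, so the corresponding phrase clauses would simply be absent.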
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Malgorzata Urbanska
> Sent: Friday, June 14, 2013 12:23 PM
> To: java-user@lucene.apache.org
> Subject: compare paragraphs of text - which Query Class to use?
>
>
> Hello,
>
> I've just started using Lucene and I'm not sure which Query Classes I
> should use in my project.
>
> My goal is to compare paragraphs of text. Paragraph A is a query and
> paragraph B is a document for which I would like to calculate similarity
> score.
>
> Paragraphs A and B may in some cases be exactly the same, but often are not.
> Generally, I would like to check whether they talk about the same topic.
>
> In my project I have a set of paragraphs A and a set of paragraphs B, so I'm
> looking for a general solution that lets me compute a similarity
> score for each paragraph A against all paragraphs B.
>
> Do you have any suggestions? I would really appreciate any ideas.
>
> --
> gosia
>