lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: compare paragraphs of text - which Query Class to use?
Date Fri, 14 Jun 2013 16:33:14 GMT
First, start with Solr and use the edismax query parser with the default
query operator set to "OR". Set pf, pf2, and pf3 (the phrase-boost fields
for the whole query, for word pairs, and for word triples), and then simply
query with the raw text of the paragraph. This ranks the indexed paragraphs
by how closely they match the query paragraph.
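
As a rough illustration of the setup described above, here is a minimal
sketch that assembles the request parameters for such a query. The field
names `text` and `id` are hypothetical and depend on your schema; the
`debugQuery=true` parameter is added so that the response includes the
parsed query. Sending the request (for example to a core's /select handler)
is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EdismaxParagraphQuery {

    /** Collects the edismax parameters for a "paragraph as query" search. */
    public static Map<String, String> buildParams(String paragraph, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");   // use the edismax query parser
        params.put("q.op", "OR");           // every word is optional
        params.put("q", paragraph);         // raw paragraph text as the query
        params.put("qf", field);            // field searched for single words
        params.put("pf", field);            // boost docs matching the whole phrase
        params.put("pf2", field);           // boost docs matching word pairs
        params.put("pf3", field);           // boost docs matching word triples
        params.put("fl", "id,score");       // hypothetical id field, plus the score
        params.put("debugQuery", "true");   // include the parsed query in the response
        return params;
    }

    /** URL-encodes the parameters into a query string. */
    public static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = buildParams("the quick brown fox", "text");
        System.out.println(toQueryString(params));
    }
}
```

Using the same field for qf, pf, pf2, and pf3 is only the simplest choice;
per-field boosts can be tuned once the basic ranking looks reasonable.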

This is also a good technique for detecting plagiarism where a lot of the 
text is similar if not identical.

Once you get experience using this technique in Solr, then simply look at 
the parsed query that edismax generates and do the same in your Lucene Java 
code.
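
To give a feel for the shape of such a query, here is a simplified sketch
that builds a query string in Lucene's classic query syntax: one optional
clause per word, plus phrase clauses for adjacent word pairs and triples and
for the whole paragraph. This is an approximation, not the exact edismax
output (which, among other things, wraps clauses in DisjunctionMaxQuery), so
check it against the parsed query Solr reports. The class and method names
are hypothetical, the tokenizing is naive whitespace splitting rather than a
real analyzer, and query-syntax special characters are not escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ParagraphQuerySketch {

    /** Returns one quoted phrase clause for every run of n adjacent words. */
    public static List<String> phraseClauses(String[] words, String field, int n) {
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            clauses.add(field + ":\"" + phrase + "\"");
        }
        return clauses;
    }

    /** Builds a space-separated list of optional clauses for a paragraph. */
    public static String build(String paragraph, String field) {
        String trimmed = paragraph.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        List<String> clauses = new ArrayList<>();
        for (String word : words) {
            clauses.add(field + ":" + word);              // single-word clauses
        }
        clauses.addAll(phraseClauses(words, field, 2));   // word pairs, like pf2
        clauses.addAll(phraseClauses(words, field, 3));   // word triples, like pf3
        if (words.length > 3) {
            // Whole-paragraph phrase, like pf; shorter inputs are already
            // covered by the pair and triple clauses above.
            clauses.addAll(phraseClauses(words, field, words.length));
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("Quick brown fox jumps", "text"));
    }
}
```

In Lucene Java code, the same structure would typically be expressed with
TermQuery and PhraseQuery clauses combined in a BooleanQuery using
Occur.SHOULD, with the tokens produced by the same analyzer used at index
time.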

-- Jack Krupansky

-----Original Message----- 
From: Malgorzata Urbanska
Sent: Friday, June 14, 2013 12:23 PM
To: java-user@lucene.apache.org
Subject: compare paragraphs of text - which Query Class to use?

Hello,

I've just started using Lucene and I'm not sure which Query Classes I
should use in my project.

My goal is to compare paragraphs of text. Paragraph A is a query and
paragraph B is a document for which I would like to calculate similarity
score.

Paragraphs A and B may in some cases be exactly the same, but often they
are not. Generally, I would like to check whether they talk about the same
topic.

In my project I have a set of paragraphs A and a set of paragraphs B, so I'm
looking for a universal solution that allows me to compute the similarity
score of each paragraph A against all paragraphs B.

Do you have any suggestions? I really appreciate all of the ideas.

-- 
gosia 

