lucy-user mailing list archives

From "Zebrowski, Zak" <...@mitre.org>
Subject RE: [lucy-user] C library - Phrase Searches
Date Tue, 28 Nov 2017 18:00:36 GMT
You want to look at how the documents are being analyzed. Look at the Lucy::Analysis
set of Perl modules (for hints as to how this is handled in C). You can write your own analyzer
to handle special cases for you.  $0.02
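
To make the "consecutive term positions" idea from the question below concrete, here is a
minimal sketch in Python. It is not Lucy's code and does not use Lucy's API: the function
names build_positional_index and phrase_search are made up for illustration, tokenization
is a plain whitespace split, and there is no stemming (so "unite state" would not match
"united states" here, whereas an analyzer with a stemmer could make it match).

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical helper, naive tokenizer)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return {doc_id: phrase_count} for docs holding the terms at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return {}

    # Only documents that contain every term can contain the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])

    hits = {}
    for doc_id in candidates:
        # Position sets for the second, third, ... term of the phrase.
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # A start position p matches when term i sits at position p + i.
        count = sum(
            1
            for p in index[terms[0]][doc_id]
            if all(p + i + 1 in positions for i, positions in enumerate(later))
        )
        if count:
            hits[doc_id] = count
    return hits


docs = {
    1: "the united states of america",
    2: "states united the",
    3: "the united states and the united states",
}
index = build_positional_index(docs)
print(phrase_search(index, "the united states"))
```

The extra cost compared with a plain term query comes from reading and comparing the
position lists for every candidate document. The per-document count is the kind of number
a scorer could use as a phrase frequency, but how Lucy actually scores phrases is not
something this sketch shows.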

Zachary Zebrowski
Forensic Database Engineer / Division Mentoring Liaison
The MITRE Corporation
(W) 202-406-6346
(C) 571-232-5643
(AR) KM4ZZE

-----Original Message-----
From: serkanmulayim@gmail.com [mailto:serkanmulayim@gmail.com] 
Sent: Tuesday, November 28, 2017 12:56 PM
To: user@lucy.apache.org
Subject: [lucy-user] C library - Phrase Searches

Hi guys again :)

I have a question about phrase searches and their scoring. As far as I can see, when we search
for a phrase in quotation marks, e.g. "the united states", only messages that contain "the
united states" are returned. (To be more exact, messages containing "the unite state"
would be returned as well.)

My question is: how are such queries handled in the library? Is it by looking at consecutive
term positions in documents? What is the performance impact of such queries?

Secondly, how are they scored? Is it still tf/idf? If so, what are the definitions of tf
and idf for these queries?

Thanks as always,
Serkan

