lucene-java-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: variable string search
Date Fri, 13 Sep 2013 17:59:31 GMT
Brian,

  It looks like "variable" is itself variable, and you'll probably want to use some combination of
PhraseQuery, FuzzyQuery and maybe BooleanQuery.  Below is my best guess at the underlying
Query types that would meet your use cases.

"free text" : Doc1, Doc2  :: PhraseQuery
"version text" : Doc2, Doc3 :: PhraseQuery with slop or BooleanQuery, depending on what exactly
you mean
"text version" : Doc2, Doc3 :: PhraseQuery with slop
"some version text" : Doc2, Doc3  :: BooleanQuery (I don't see "some" in your documents)?
"long" : Doc2 :: You'll need to use a stemming analyzer to match this, or use FuzzyQuery with
maxEdits = 2 (long~2)
"anothr" : Doc3 :: FuzzyQuery with maxEdits = 1 (anothr~1)

And maybe even:

"another longer free text" : Doc1, Doc2, Doc3  :: BooleanQuery

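As a rough illustration of the BooleanQuery cases above, here is a plain-Java sketch that does not use Lucene at all. It only mimics the idea of required and optional clauses over the three sample documents. The class name, the `tokens` helper (a crude stand-in for an analyzer) and the choice of which terms are required are all assumptions made for this example, not something taken from the original question.

```java
import java.util.*;

class BooleanMatchSketch {
    // Hypothetical stand-in for an analyzer: lowercase, split on non-letters.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // A document matches if it contains every required term. When no term is
    // required, it must contain at least one optional term instead.
    static List<String> match(Map<String, String> docs,
                              List<String> required, List<String> optional) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            Set<String> toks = tokens(e.getValue());
            boolean ok = toks.containsAll(required);
            if (ok && required.isEmpty()) {
                ok = false;
                for (String t : optional) {
                    if (toks.contains(t)) ok = true;
                }
            }
            if (ok) hits.add(e.getKey());
        }
        return hits;
    }

    // The three documents from the original question, in insertion order.
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("Doc1", "This is free text");
        docs.put("Doc2", "This is a longer version of free text");
        docs.put("Doc3", "Yet another version of text");
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> docs = sampleDocs();
        // "another longer free text": every term optional.
        System.out.println(match(docs, Collections.<String>emptyList(),
                Arrays.asList("another", "longer", "free", "text")));
        // "some version text": assume "version" and "text" required, "some" optional.
        System.out.println(match(docs, Arrays.asList("version", "text"),
                Arrays.asList("some")));
    }
}
```

With all four terms optional, every document matches because each contains at least one of them. Requiring "version" and "text" while leaving "some" optional is one way to arrive at Doc2 and Doc3 only.
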
FuzzyQuery captures variation within a token, measured by Levenshtein edit distance (strictly
speaking, optimal string alignment distance): you can get from "another" to "anothr" with a
single-character edit. PhraseQuery allows flexibility in how tokens are combined.
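
To make the edit-distance numbers concrete, here is a minimal, self-contained Java sketch of optimal string alignment distance. It is a textbook dynamic-programming version written for illustration, not Lucene's implementation, and the class and method names are made up for this example.

```java
class EditDistanceSketch {
    // Optimal string alignment distance: an insertion, deletion, substitution,
    // or swap of two adjacent characters each counts as one edit.
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "txet" vs "text".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(osaDistance("anothr", "another")); // prints 1
        System.out.println(osaDistance("long", "longer"));    // prints 2
    }
}
```

This is why "anothr" needs only one edit to reach "another", while "long" needs two to reach "longer".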

  Do you need to generate your queries by hand in code, or would a query parser help? The
classic parser's syntax is documented here: http://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html

  Best,

            Tim

-----Original Message-----
From: Wasikowski, Brian [ JRDUS] [mailto:bwasikow@its.jnj.com] 
Sent: Friday, September 13, 2013 1:03 PM
To: java-user@lucene.apache.org
Subject: variable string search

First let me start by saying:  I'm sorry!

I know this question has probably been asked and answered already, but I am new to this project
and just trying to get up to speed.  I do have a very simple example working, but not quite
how I'd like.   So let me explain what I'd like to do and see if the community can suggest
the proper analyzer and query.

Consider the following data to be indexed:

Doc1: This is free text
Doc2: This is a longer version of free text
Doc3: Yet another version of text

Right now, the working example I have will match a query for "text" with all 3 documents.
However, any combination of more words or partial words does not work.

Use cases:

"free text" : Doc1, Doc2  :: PhraseQuery
"version text" : Doc2, Doc3 :: PhraseQuery with slop or BooleanQuery, depending on what exactly
you mean
"text version" : Doc2, Doc3 :: PhraseQuery with slop and no directionality
"some version text" : Doc2, Doc3  :: BooleanQuery ??
"long" : Doc2 :: You'll need to use a stemming analyzer to match this or use FuzzyQuery long~2
"anothr" : Doc3 :: FuzzyQuery anothr~1

And maybe even:

"another longer free text" : Doc1, Doc2, Doc3  :: BooleanQuery

Any help is appreciated.  Here are the components I am currently using:

Lucene.Net.Analysis.Standard.StandardAnalyzer
Lucene.Net.Search.Query query = new Lucene.Net.Search.FuzzyQuery
Lucene.Net.Search.TopDocs hits = searcher.Search

________________________________________
Brian Wasikowski
Director, HIT Alliances and Support
Janssen Diagnostics, Inc.
Tel: +1 919 786 9153
Fax: +1 919 882 0913
Email: bwasikow@its.jnj.com<mailto:bwasikow@its.jnj.com>
Web: www.janssendiagnostics.com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

