lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: 1:n queries again
Date Wed, 12 Nov 2008 15:07:54 GMT
Christian,

If I understand your situation correctly, you should look at sloppy phrases and at Span family
of queries.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: Christian Reuschling <christian.reuschling@gmail.com>
To: java-user@lucene.apache.org
Sent: Wednesday, November 12, 2008 8:58:53 AM
Subject: 1:n queries again

Hello Friends,

In order to offer some simple 1:n matching, currently we create several, counted
attributes and expand our queries that we search inside each attribute, e.g.:

Query 'attName:myTerm'  => Query 'attName1:myTerm attName2:myTerm'

This is not the fastest way, and sometimes not easy to handle - also we have to
consider the 1:n attributes during indexing, and must remember the highest 'n'
for query expansion. We get very big queries.


Currently I have some other scenario in mind, but I'm not sure how I can achieve
this. The idea is to write all n datasets into one attribute, with a specialized
start and end delimiter term, e.g.:

document entry for attName:
"startDelimiter myterm1 myterm2 endDelimiter startDelimiter myterm3 myterm4 endDelimiter"

When I look to this, it would go somehow into the direction of a PhraseQuery,
where I can search e.g. for

attName:"startDelimiter myterm1 myterm2 endDelimiter"
but the query
attName:"startDelimiter myterm1 myterm4 endDelimiter"

would not match.

The only thing that lacks now is that the queries
attName:"startDelimiter myterm1 endDelimiter"
attName:"startDelimiter myterm2 myterm1 endDelimiter"

also should match - which of course isn't possible with the current PhraseQuery
implementation.

Best would be some construct like attName:"startDelimiter (myterm1 myterm2) endDelimiter"

Whereby the stuff inside the brackets would be a standard BooleanQuery, but only
applied inside the range of the delimiters. Is this somehow possible, or do I
have to write my own Query implementation - and what would be the best way in this case.


Thanks in advance

Christian Reuschling
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message