lucene-dev mailing list archives

From Svetlana <>
Subject Searching for sentences containing a list of words with a configurable number of words not in the list in between?
Date Mon, 09 Jul 2012 18:28:42 GMT

I am just about to work through the demo and get to know Lucene, now that I
have actually got it to build :)  I was wondering if someone could point me in
the right direction for my project.

I want to query using a list of words, but the order in which they appear and
how common they are are not relevant (i.e. no 'stop words', if I have the
terminology right).  The only relevant things are how closely grouped they
are and how many of the words in the list occur, and I want to be able to
configure the allowed gap from 0 (no non-queried words in between) up to 'n'
non-queried words in between.

So, for example, if I query for 'a and in house I go together or' (a silly
example, I guess) and specify 0 words in between, then I would only want to
get hits containing those query words in any order, sorted by relevance based
on how many of those words occurred.  For example:

'In a house together' may be the most relevant result

If I allow 1 other non-query word, the results may look like:

1. 'In a house together.'
2. 'In a house sleeping together.'  ('sleeping' being the one extra word)
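To make the matching rule concrete, here is a rough sketch in plain Java of how I imagine a single sentence being scored. It is not Lucene code, and ProximityScore, tokenize and bestClusterScore are names I made up for illustration. It assumes that "0 to n words in between" limits the gap between each pair of neighbouring query words (not the total across the whole match), and that a query word repeated in a sentence only counts once.

```java
import java.util.*;

class ProximityScore {
    // Hypothetical helper: lower-cases the text and splits it on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Returns the largest number of distinct query words in one "cluster", where a
    // cluster is a chain of query-word occurrences (in any order) in which neighbouring
    // occurrences are separated by at most maxGap non-query words.
    static int bestClusterScore(List<String> tokens, Set<String> query, int maxGap) {
        int best = 0;
        Set<String> current = new HashSet<>(); // distinct query words in the current chain
        int gap = 0;                           // non-query words seen since the last query word
        for (String raw : tokens) {
            String t = raw.toLowerCase(Locale.ROOT);
            if (query.contains(t)) {
                if (gap > maxGap) {
                    current.clear();           // too far from the previous hit: start a new chain
                }
                current.add(t);
                gap = 0;
                best = Math.max(best, current.size());
            } else {
                gap++;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(tokenize("a and in house I go together or"));
        List<String> s1 = tokenize("In a house together.");
        List<String> s2 = tokenize("In a house sleeping together.");
        System.out.println(bestClusterScore(s1, query, 0)); // 4
        System.out.println(bestClusterScore(s2, query, 0)); // 3 (only "in a house" is contiguous)
        System.out.println(bestClusterScore(s2, query, 1)); // 4
    }
}
```

With a gap of 0 the second sentence only scores 3, and allowing one extra word lets 'sleeping' bridge to 'together', which matches the ranking in my examples.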

These should also be complete sentences or clauses, i.e. not fragments; I
guess I would need a grammar analyser to determine that.
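For the sentence requirement, a crude first step might be to split the text into sentences with the JDK's java.text.BreakIterator and treat each sentence as its own unit. This is only a sketch, and SentenceSplitter and sentences are made-up names. BreakIterator uses punctuation rules rather than grammar, so it will not reject fragments; that part would still need a real parser.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Splits text into sentences using the JDK's rule-based sentence BreakIterator.
    // It does not check grammar, so fragments are NOT filtered out.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim(); // drop the trailing space after '.'
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "In a house together. In a house sleeping together.";
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```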

Any help is very much appreciated. I realise that this is probably deceptively
difficult, but if anyone can give some pointers, that would be amazing.


Sent from the Lucene - Java Developer mailing list archive at
