lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Search in all fields
Date Mon, 19 Feb 2007 19:01:31 GMT
Sure. Convert your simple queries into span queries (which are also
relatively simple). Then, when you index everything in the "all" field,
subclass your analyzer to return a large PositionIncrementGap. Explaining
how this works with words is awkward, so....

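// Lucene 2.x Field API; each add() creates another instance of the "all" field.
doc.add(new Field("all", "one two three", Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("all", "four five six", Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("all", "seven eight nine", Field.Store.NO, Field.Index.TOKENIZED));
// Then index the document.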

Assume you've implemented an analyzer that returns 1000 for the PositionIncrementGap.

Now, the term positions in the single document will be
one - 0
two - 1
three - 2
four - 1003
five - 1004
six - 1005
seven - 2006
eight - 2007
nine - 2008
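
The arithmetic behind the numbers listed above can be reproduced with a small stand-alone sketch. It does not use Lucene at all: the names PositionGapSketch and positions are made up for illustration, whitespace splitting stands in for a real analyzer, and the rule "add the gap once before each later value of the field" is an assumption chosen to match the figures in this message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionGapSketch {

    /**
     * Models token positions for several values added to one field.
     * Assumptions: every token advances the position by 1, the gap is
     * added once before each value after the first, and each term
     * occurs only once (a repeated term would overwrite its entry).
     */
    public static Map<String, Integer> positions(List<String> values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1;              // so the first token lands on 0
        boolean first = true;
        for (String value : values) {
            if (!first) {
                position += gap;        // jump between field instances
            }
            first = false;
            for (String token : value.trim().split("\\s+")) {
                position++;
                result.put(token, position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values =
                Arrays.asList("one two three", "four five six", "seven eight nine");
        // Prints one "term - position" line per term, in insertion order.
        positions(values, 1000)
                .forEach((term, pos) -> System.out.println(term + " - " + pos));
    }
}
```

With a gap of 1000, "three" ends the first value at position 2, so "four" lands on 2 + 1000 + 1 = 1003, and "seven" lands on 1005 + 1000 + 1 = 2006.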

Now, if you use SpanNearQuery with a slop of 900 (i.e. "one nine"~900) you
won't get a match because the "distance" between one and nine is more than
900. But "one three"~900 will match.
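
To see why a slop of 900 keeps matches inside a single field value, here is a rough stand-alone model of the check. It is not Lucene's SpanNearQuery implementation: SlopSketch and withinSlop are hypothetical names, only two terms matched in order are handled, and the slop a pair needs is assumed to be the number of positions lying strictly between them.

```java
public class SlopSketch {

    /**
     * Simplified two-term, in-order proximity check.
     * Assumption: the slop needed is the count of positions strictly
     * between the two terms (endPos - startPos - 1).
     */
    public static boolean withinSlop(int startPos, int endPos, int slop) {
        if (endPos <= startPos) {
            return false;               // this sketch only handles in-order pairs
        }
        int positionsBetween = endPos - startPos - 1;
        return positionsBetween <= slop;
    }

    public static void main(String[] args) {
        int slop = 900;
        // Positions are taken from the example in this message.
        System.out.println("one three:  " + withinSlop(0, 2, slop));
        System.out.println("one nine:   " + withinSlop(0, 2008, slop));
        System.out.println("three four: " + withinSlop(2, 1003, slop));
    }
}
```

Under this model "one three" passes, while "one nine" and "three four" do not: even neighbouring terms from adjacent values are 1000 positions apart, which is why the gap has to be larger than any slop you intend to use.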

It's possible to transform any query into a set of span queries. See the
thread "Multiword Highlighting" that Mark Miller and I were exchanging ideas
on recently. Be aware that the code we were talking about needs a
modification before it can be used on a "regular" index: it has to pay
attention to the document that each sub-clause comes from. The code, as
written, assumes you're using a MemoryIndex for one and only one document,
so unless you need complex queries, I'd just think about rewriting simple
queries with ANDs as a SpanNearQuery.


On 2/19/07, Kainth, Sachin <> wrote:
> Hi All,
> I want to be able to do a search for a term in all fields in a document.
> One way this can be done is to put every element of a document in the
> default field (or I guess any other single named field) as well as
> separate fields in which those elements belong.  So for example if for
> my documents I had the following fields:
> A, B, C, D and E
> If I then set up a field called
> All
> And for all documents I processed as well as putting the elements of
> that document in A, B, C, D and E I would also put them as a
> concatenation into All as well.
> One problem with this is that if for a particular document I had these
> values for my five fields:
> A -> Hello
> B -> How
> C -> Are
> D -> You
> E -> Mate
> (All -> Hello How Are You Mate)
> Then a search for "How Are You" in All would return true when no single
> field contains this string which is not ideal.
> Another problem with this is that it would double the size of the index
> (unless Lucene does something clever here).
> A way to solve the original issue is to convert the search for "How Are
> You" into this:
> A:"How Are You" OR B:"How Are You" OR C:"How Are You" OR D:"How Are You" OR
> E:"How Are You"
> This solves both the problems of the solution where we set up the All
> field (viz. increasing the size of the index and  bringing back more
> results than we should).
> However, this solution also has its drawback, and that is that now we
> have gone from a simple query to a complex ORing of all fields in the
> document.
> My question is this: is there a third way?
> Cheers
> Sachin
