lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: Search Expansion
Date Fri, 02 Apr 2004 15:19:22 GMT
On Friday 02 April 2004 02:30, wrote:
> Dear all,
> I want to expand a search in order to retrieve
> matching XML documents with the help of a domain
> taxonomy. That means if someone is specifying a term
> high up in the taxonomy it will have lots of
> subconcepts.
> Everything is working fine so far except that Lucene
> creates an 'input string too long' error when I ask for
> e.g. subject:(term001 ... term800).

If I'm not mistaken, you may not need to expand terms at all; you could run a 
path query over the taxonomy components instead. That way, instead of your 
app flattening the structure and ending up with hundreds of leaves to match, 
you use Lucene in a sort of "hierarchy-aware" way.
There have been a few questions (and answers) regarding implementation of such 
a feature.
Almost seems like there should be a FAQ entry (or Eric could add an example to 
his book? :-) ).

There are at least 2 ways to do this. One is to combine the components into 
one long 'word' (indexed so that the analyzer does not split it into separate 
tokens, i.e. words), something like:

doohickeys-gadgets-foobar

and search using a prefix query ("doohickeys-gadgets-*"), or:

STARTMARKER doohickeys gadgets foobar ENDMARKER

and use a phrase query ("STARTMARKER doohickeys gadgets"). (STARTMARKER and 
ENDMARKER are needed only if components are not guaranteed to be unique and 
you need to make sure the query is restricted to an individual classification 
entry.)
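
To make the two variants concrete, here is a rough sketch in plain Java. It 
only models the matching semantics (prefix match on one long term, anchored 
phrase match on a token list); in Lucene itself the counterparts would be 
PrefixQuery and PhraseQuery. The class TaxonomyPaths and its method names are 
made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative helper; class and method names are hypothetical, not Lucene API. */
final class TaxonomyPaths {
    static final String SEP = "-";
    static final String START = "STARTMARKER";
    static final String END = "ENDMARKER";

    /** Variant 1: join components (root first) into one untokenized term. */
    static String toTerm(List<String> components) {
        return String.join(SEP, components);
    }

    /**
     * Models the prefix query "doohickeys-gadgets-*". The trailing separator
     * keeps "gadgets" from matching a sibling such as "gadgetsplus".
     */
    static boolean prefixMatches(String indexedTerm, List<String> concept) {
        return indexedTerm.startsWith(toTerm(concept) + SEP);
    }

    /** Variant 2: one token per component, wrapped in start/end markers. */
    static List<String> toTokens(List<String> components) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        tokens.addAll(components);
        tokens.add(END);
        return tokens;
    }

    /**
     * Models the phrase query "STARTMARKER doohickeys gadgets": the terms
     * must appear adjacent and in order, anchored at the start marker.
     */
    static boolean phraseMatches(List<String> fieldTokens, List<String> concept) {
        List<String> phrase = new ArrayList<>();
        phrase.add(START);
        phrase.addAll(concept);
        return Collections.indexOfSubList(fieldTokens, phrase) >= 0;
    }
}
```

Note that in both variants the query for "doohickeys gadgets" picks up 
everything below that node with a single query, so the app never has to 
enumerate the subconcepts. With the prefix form as written, a document filed 
exactly at "doohickeys-gadgets" (no further component) would not match; the 
phrase form does match it.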

Both approaches can be varied by using some internal ids instead of actual 
Strings (UUIDs, sequence numbers).
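
As a sketch of the id variation, assuming simple sequence numbers handed out 
the first time a concept name is seen (ConceptIds is a hypothetical helper, 
not something Lucene provides):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry mapping concept names to sequence numbers. */
final class ConceptIds {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    /** Returns the id for a concept, assigning the next number if it is new. */
    int idOf(String concept) {
        Integer id = ids.get(concept);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(concept, id);
        }
        return id;
    }

    /** Builds the single-term form of a path from ids instead of names. */
    String toTerm(List<String> components) {
        List<String> parts = new ArrayList<>();
        for (String component : components) {
            parts.add(String.valueOf(idOf(component)));
        }
        return String.join("-", parts);
    }
}
```

The same mapping has to be applied at index time and at query time, and the 
separator after the last id in a prefix matters even more here, since 
otherwise "1-2" would also be a prefix of "1-23".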

Does the above make sense?

-+ Tatu +-

> I am aware that it is not a usual task for a search
> engine to take several hundred terms as input.
> Is there a distinct limit (I haven't got the source
> yet) and - more important - is there a configuration to
> work around ?
> If not I would break down the input terms in blocks and
> send them to Lucene sequentially...
> However I would prefer if I could adjust the limit.
> Is search expansion something you as developers are
> looking into ?
> Thanks
> Holger
