lucene-dev mailing list archives

From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject Re: Search Expansion
Date Fri, 02 Apr 2004 17:41:59 GMT
I'll add an entry on the wiki describing ways to handle hierarchical
information. If the info moves to the FAQ, we can remove it then.

cheers,
sv

On Fri, 2 Apr 2004, Tatu Saloranta wrote:

> On Friday 02 April 2004 02:30, hgadm@cswebmail.com wrote:
> > Dear all,
> >
> > I want to expand a search in order to retrieve
> > matching XML documents with the help of a domain
> > taxonomy. That means if someone is specifying a term
> > high up in the taxonomy it will have lots of
> > subconcepts.
> >
> > Everything is working fine so far except that Lucene
> > creates an 'input string too long' error when I ask for
> > e.g. subject:(term001 ... term800).
>
> If I'm not mistaken, perhaps you shouldn't expand terms at all, but do a path
> query with components instead. This effectively means that instead of your
> app flattening the structure and ending up with hundreds of leaves to match,
> you use Lucene in a sort of "hierarchy-aware" way.
> There have been a few questions (and answers) regarding implementation of such
> a feature.
> Almost seems like there should be a FAQ entry (or Eric could add an example to
> his book? :-) ).
>
> There are at least 2 ways to do this; one is to combine the components into
> one long 'word' (a unit the analyzer does not split into separate tokens,
> i.e. words), something like:
>
> doohickeys-gadgets-foobar
>
> and search using a prefix query ("doohickeys-gadgets-*"), or:
>
> STARTMARKER doohickeys gadgets foobar ENDMARKER
>
> and use a phrase query ("STARTMARKER doohickeys gadgets"). (STARTMARKER and
> ENDMARKER are needed only if components are not guaranteed to be unique and
> one needs to make sure the query is restricted to an individual
> classification entry.)
>
> Both approaches can be varied by using some internal ids instead of actual
> Strings (UUIDs, sequence numbers).
>
> Does the above make sense?
>
> -+ Tatu +-
>
> >
> > I am aware that it is not a usual task for a search
> > engine to take several hundred terms as input.
> > Is there a distinct limit (I haven't got the source
> > yet) and - more importantly - is there a configuration
> > setting to work around it?
> >
> > If not I would break down the input terms in blocks and
> > send them to Lucene sequentially...
> > However, I would prefer it if I could adjust the limit.
> >
> > Is search expansion something you as developers are
> > looking into ?
> >
> > Thanks
> >
> > Holger
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

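For the wiki entry, here is a minimal sketch of the two encodings Tatu describes in the quoted message, written in plain Java with no Lucene dependency. The class and method names are hypothetical, and the string and list checks only stand in for what a prefix query or phrase query would match against an index; this is an illustration of the idea, not of Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; none of these names come from Lucene or from the thread.
final class TaxonomyPathSketch {

    // Variant 1: join the path into one untokenized term,
    // e.g. "doohickeys-gadgets-foobar".
    static String joinedTerm(List<String> path) {
        return String.join("-", path);
    }

    // Prefix that selects everything below an ancestor; this corresponds to
    // the "doohickeys-gadgets-*" query without the trailing '*'.
    // The trailing '-' keeps "doohickeys-gadgetsX" from matching.
    static String descendantPrefix(List<String> ancestorPath) {
        return String.join("-", ancestorPath) + "-";
    }

    // Variant 2: one token per component, wrapped in marker tokens.
    static List<String> markedTokens(List<String> path) {
        List<String> tokens = new ArrayList<>();
        tokens.add("STARTMARKER");
        tokens.addAll(path);
        tokens.add("ENDMARKER");
        return tokens;
    }

    // Phrase that anchors the ancestor path at the start of an entry.
    static List<String> ancestorPhrase(List<String> ancestorPath) {
        List<String> phrase = new ArrayList<>();
        phrase.add("STARTMARKER");
        phrase.addAll(ancestorPath);
        return phrase;
    }

    // Stand-in for a phrase match: the phrase must occur as a contiguous run.
    static boolean containsPhrase(List<String> tokens, List<String> phrase) {
        return Collections.indexOfSubList(tokens, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> leaf = Arrays.asList("doohickeys", "gadgets", "foobar");
        List<String> ancestor = Arrays.asList("doohickeys", "gadgets");

        String term = joinedTerm(leaf);
        System.out.println(term + " matches prefix: "
                + term.startsWith(descendantPrefix(ancestor)));
        System.out.println("phrase matches: "
                + containsPhrase(markedTokens(leaf), ancestorPhrase(ancestor)));
    }
}
```

Either way the query names only the ancestor path, so its size depends on the depth of the taxonomy rather than on the number of subconcepts. Note that the prefix as written matches descendants only, not the ancestor entry itself.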

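Holger's fallback of breaking the input terms into blocks could look roughly like the sketch below. It only builds query strings in the subject:(...) form from his message; the block size of 100 is an arbitrary assumption, since the thread does not state the actual limit, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it builds query strings and does not call Lucene.
final class TermBatchSketch {

    // Split the terms into consecutive blocks of at most batchSize entries.
    static List<List<String>> batches(List<String> terms, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i += batchSize) {
            int end = Math.min(i + batchSize, terms.size());
            out.add(new ArrayList<>(terms.subList(i, end)));
        }
        return out;
    }

    // Build one query string per block, e.g. "subject:(term001 term002)".
    static String queryString(String field, List<String> terms) {
        return field + ":(" + String.join(" ", terms) + ")";
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 1; i <= 800; i++) {
            terms.add(String.format("term%03d", i));
        }
        // Block size 100 is an assumption, not a known Lucene limit.
        for (List<String> block : batches(terms, 100)) {
            System.out.println(queryString("subject", block).length()
                    + " characters in query for " + block.size() + " terms");
        }
    }
}
```

The caller would still have to run each query separately, merge the hits, and drop documents that match more than one block.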

