lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: combine wildcard and phrase query
Date Thu, 06 Mar 2008 13:38:43 GMT
No, as far as I know you can't combine wildcards in phrases. This would
get extraordinarily ugly extraordinarily quickly. The way Lucene handles
wildcards (conceptually) is to expand all the matching terms into one large OR
clause. Say my index contains term1, term2, and term3. The search for term*
really expands into term1 OR term2 OR term3. Now imagine the
complexity of a phrase like "dog* cat* hors*". Say your index contained
10 terms starting with dog, 10 with cat and 10 with hors. You'd have
10 x 10 x 10 = 1,000 ORed phrase queries. And this is a tiny example....
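
To make the arithmetic concrete, here is a rough sketch in plain Java. It does not use any Lucene classes; it only models the conceptual expansion described above, and the class and method names (WildcardPhraseExpansion, expand, expandPhrase) as well as the toy term list are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of wildcard expansion inside a phrase. Not Lucene code:
// all names here are hypothetical and exist only for this illustration.
class WildcardPhraseExpansion {

    // All terms in the toy index that start with the given prefix.
    static List<String> expand(String prefix, List<String> indexTerms) {
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    // Every concrete phrase obtained by picking one expansion per slot.
    static List<String> expandPhrase(List<String> prefixes, List<String> indexTerms) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (String prefix : prefixes) {
            List<String> next = new ArrayList<>();
            for (String soFar : phrases) {
                for (String term : expand(prefix, indexTerms)) {
                    next.add(soFar.isEmpty() ? term : soFar + " " + term);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Ten terms for each prefix, as in the example above.
        List<String> indexTerms = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            indexTerms.add("dog" + i);
            indexTerms.add("cat" + i);
            indexTerms.add("hors" + i);
        }
        List<String> phrases =
                expandPhrase(Arrays.asList("dog", "cat", "hors"), indexTerms);
        // Prints: 1000 concrete phrases to OR together
        System.out.println(phrases.size() + " concrete phrases to OR together");
    }
}
```

The count is the product of the per-slot expansion counts, so one more wildcard slot with 10 matches multiplies it by 10 again.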

You can try various approximations, and depending upon your index size they
may or may not work. For instance, you could index all the successively
shorter forms of each term with position increments of 0 (see the synonym
analyzer), i.e. index horse, hors$, hor$, ho$, h$ all in the same position.
Then searching for hor* becomes searching for hor$ and it all "just works":
hor$ is an ordinary term, so it can sit inside a phrase. Of course this
makes your index bigger.....
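
As a rough illustration of the token side of this idea, the sketch below generates the forms to index for one term and rewrites a trailing-wildcard query term. It is plain Java with no Lucene classes; wiring it into an analyzer that emits these tokens at position increment 0 is left out, and the names PrefixForms, indexForms and rewriteQueryTerm are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "index every shorter form" approximation. Not Lucene code:
// a real analyzer would emit these tokens at the same position.
class PrefixForms {

    // Marker appended to truncated forms, as in hors$ / hor$ above.
    static final char MARKER = '$';

    // The term itself followed by each shorter prefix with the marker.
    static List<String> indexForms(String term) {
        List<String> forms = new ArrayList<>();
        forms.add(term);
        for (int len = term.length() - 1; len >= 1; len--) {
            forms.add(term.substring(0, len) + MARKER);
        }
        return forms;
    }

    // Turns a trailing-wildcard query term such as "hor*" into "hor$".
    // Terms without a trailing '*' are returned unchanged.
    static String rewriteQueryTerm(String queryTerm) {
        if (queryTerm.length() > 1 && queryTerm.endsWith("*")) {
            return queryTerm.substring(0, queryTerm.length() - 1) + MARKER;
        }
        return queryTerm;
    }

    public static void main(String[] args) {
        // Prints: [horse, hors$, hor$, ho$, h$]
        System.out.println(indexForms("horse"));
        // Prints: hor$
        System.out.println(rewriteQueryTerm("hor*"));
    }
}
```

Note that with exactly these forms, a query for horse* would be rewritten to horse$, which was never indexed. Whether to also index the full-length form with the marker is a design choice the scheme above leaves open.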

About your second issue: I'm not clear what you're trying to accomplish. It's
no problem to add the same field multiple times for a document. That is, you
can call
doc.add(new Field("field1", ......));
doc.add(new Field("field1", ......));
doc.add(new Field("field1", ......));
doc.add(new Field("field1", ......));
as many times as you want before you add the document to the index. For
retrieval you can call getFields("field1") and get an array of Fields back,
one for each call to add above. You can also set the PositionIncrementGap
while indexing to separate the term position of the first term of each
successive add() call by, say, 100 (or whatever) if you need to worry about
SpanNear or some such.
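
To show why the gap matters, here is a simplified model of term positions across several values of the same field. It is plain Java, not Lucene: it assumes whitespace tokenization and that every token advances the position by 1, so the exact numbers Lucene assigns may differ. The class name PositionGapModel and the sample signature values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of token positions in a multi-valued field.
// Not Lucene code: it only illustrates the effect of a position gap.
class PositionGapModel {

    // Positions of all tokens, in order, when each value is added as a
    // separate instance of the same field and `gap` extra positions are
    // inserted between consecutive values.
    static List<Integer> positions(List<String> values, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += gap; // jump before the next value starts
            }
            for (String token : values.get(v).trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip the empty token produced by a blank value
                }
                position++;
                result.add(position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical signature values for one book.
        List<String> signatures = Arrays.asList("A 123", "B 456");
        // Prints: [0, 1, 2, 3]  ("123" and "B" are adjacent)
        System.out.println(positions(signatures, 0));
        // Prints: [0, 1, 102, 103]  ("123" and "B" are far apart)
        System.out.println(positions(signatures, 100));
    }
}
```

With no gap, the last term of one value sits right next to the first term of the next, so a phrase or a tight SpanNear could match across two values. With a gap larger than the slop you search with, matches stay inside a single value.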

This may be waaaay off base. If so, could you give a concrete example of
what your inputs are and how you want to search them?

Best
Erick

On Thu, Mar 6, 2008 at 7:28 AM, JensBurkhardt <jensburkhardt@web.de> wrote:

>
> okay, another problem occurred. I have different fields with the same name.
> I can't separate them by naming them field1, field2, etc., because while
> indexing I don't know how many fields I will need.
> For example, a book has several signature numbers. I want to save them in a
> field called signature, and when I search for such a number I want the
> search to hit every single field and not all fields together.
> Right now I separate the string using a unique separator (in this case
> just $$$) so I can split the string into the numbers, but I think this is
> kind of the worst way of doing it.
>
>
>
>
> JensBurkhardt wrote:
> >
> > hey everybody,
> >
> > I'm wondering if it's possible to combine wildcards and phrase query.
> >
> > For example "term1 term*"
> >
> > I know that the documentation says "Lucene supports single and multiple
> > character wildcard searches within single terms (not within phrase
> > queries)" but maybe someone has had the same problem and found a
> > solution.
> >
> > Thanks for your help
> >
> > Jens Burkhardt
> >
>
