lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: combine wildcard and phrase query
Date Fri, 07 Mar 2008 20:08:22 GMT
Have you considered indexing that field UN_TOKENIZED? Make sure
you build your queries that way too. I'm not at *all* clear about
how this works with wildcards, so you'll have to test that.

This assumes you never want to just be able to search on LA
and get a hit.
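
To make the idea concrete, here is a minimal plain-Java model of an untokenized field, with no Lucene dependency. It only illustrates the matching behaviour and is not a statement of how Lucene implements it. The class and method names (UntokenizedPrefixDemo, prefixMatch) are made up for this sketch. Each signature is kept as one term, blanks included, so a trailing wildcard turns into a prefix test on the whole value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model (no Lucene) of an untokenized field:
// every stored value is a single term, blanks included.
class UntokenizedPrefixDemo {

    // Returns every whole-value term that starts with the prefix,
    // which is what a query with a trailing * asks for.
    static List<String> prefixMatch(List<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> signatures = Arrays.asList(
                "LA A 100", "LA A 201", "LA A 202", "LA B 200", "LC B 300");

        // "LA A 20*" against whole-value terms
        System.out.println(prefixMatch(signatures, "LA A 20"));
        // prints: [LA A 201, LA A 202]

        // An exact search on just "LA" finds nothing, because no term equals "LA".
        System.out.println(signatures.contains("LA"));
        // prints: false
    }
}
```

In real Lucene the query parser would, as far as I understand, split "LA A 20*" on the blanks, so you would probably have to build the query in code (for example a PrefixQuery or WildcardQuery over a single Term) rather than parse it. That part is an assumption you should test.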

Best
Erick

On Fri, Mar 7, 2008 at 10:35 AM, JensBurkhardt <jensburkhardt@web.de> wrote:

>
> hi again,
>
> Referring to my second issue, I've got another question. The field
> approach works pretty well, but my fields look like:
> signature: LA A 100
> signature: LA A 201
> signature: LA A 202
> signature: LA B 200
> signature: LC B 300
> Now I use getFields and search them.
> Let's assume I'm searching for a signature like "LA B 200". If I use a
> phrase query, no problem: I search all the fields and get a hit only if
> the field value and the query match exactly.
> But what if I want to use wildcards and search for something like
> LA A 20*? Now all the LA signatures show up in my results, even though I
> only want two of them. The blanks are the problem, but I have no idea how
> this could work.
>
> Thanks and have a nice evening
>
> Jens
>
>
> Erick Erickson wrote:
> >
> > No, as far as I know you can't combine wildcards in phrases. This would
> > get extraordinarily ugly extraordinarily quickly. The way Lucene handles
> > wildcards (conceptually) is to expand all the possible terms into a large
> > OR clause. Say my index contains term1, term2, and term3. The search for
> > term* really expands into term1 OR term2 OR term3. Now imagine the
> > complexity of a phrase like "dog* cat* hors*", and say your index
> > contained 10 terms starting with dog, 10 with cat and 10 with hors. You'd
> > have 1,000 ORed phrase queries. And this is a tiny example....
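
The arithmetic above can be checked with a small plain-Java sketch. It models the conceptual expansion only and does not use Lucene. The class name and the made-up index terms (dog0..dog9 and so on) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual model of wildcard expansion; not Lucene's actual code.
class WildcardExpansionDemo {

    // Expand "prefix*" into every indexed term that starts with the prefix.
    static List<String> expand(List<String> indexedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    // Build every concrete phrase by picking one expanded term per slot
    // (the Cartesian product of the per-slot expansions).
    static List<String> expandPhrase(List<String> indexedTerms, String... prefixes) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (String prefix : prefixes) {
            List<String> next = new ArrayList<>();
            for (String partial : phrases) {
                for (String term : expand(indexedTerms, prefix)) {
                    next.add(partial.isEmpty() ? term : partial + " " + term);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList("term1", "term2", "term3");
        System.out.println(String.join(" OR ", expand(small, "term")));
        // prints: term1 OR term2 OR term3

        // Hypothetical index: 10 terms each for dog*, cat* and hors*.
        List<String> index = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            index.add("dog" + i);
            index.add("cat" + i);
            index.add("hors" + i);
        }
        System.out.println(expandPhrase(index, "dog", "cat", "hors").size());
        // prints: 1000
    }
}
```

The count is the product of the per-slot expansions, so it grows multiplicatively with every wildcard added to the phrase.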
> >
> > You can try various approximations, and depending upon your index size
> > they may or may not work. For instance, you could index all the
> > successively shorter forms with position increments of 0 (see the synonym
> > analyzer), i.e. index horse, hors$, hor$, ho$, h$ all in the same
> > position. Then searching for hor* becomes searching for hor$ and it all
> > "just works". Of course this makes your index bigger.....
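
A rough sketch of that scheme in plain Java, assuming the $ marker convention from the paragraph above. The names PrefixFormsDemo, prefixForms and rewriteQuery are hypothetical. In a real index these forms would be emitted by a custom token filter with a position increment of 0, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Models which tokens would be indexed for one term and how a
// trailing-* query is rewritten; it does not touch Lucene.
class PrefixFormsDemo {

    // For "horse" produce: horse, hors$, hor$, ho$, h$
    static List<String> prefixForms(String term) {
        List<String> forms = new ArrayList<>();
        forms.add(term);
        for (int len = term.length() - 1; len >= 1; len--) {
            forms.add(term.substring(0, len) + "$");
        }
        return forms;
    }

    // Turn "hor*" into the single literal term "hor$";
    // queries without a trailing * are returned unchanged.
    static String rewriteQuery(String query) {
        if (query.endsWith("*")) {
            return query.substring(0, query.length() - 1) + "$";
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(prefixForms("horse"));
        // prints: [horse, hors$, hor$, ho$, h$]
        System.out.println(rewriteQuery("hor*"));
        // prints: hor$
        System.out.println(prefixForms("horse").contains(rewriteQuery("hor*")));
        // prints: true
    }
}
```

Because hor$ is an ordinary term, it can sit inside a phrase query like any other word. One caveat of the scheme as sketched: a term that is exactly "hor" would not be found by hor$ unless you also index the full term with the marker appended.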
> >
> > About your second issue: I'm not clear what you're trying to accomplish.
> > It's no problem to add the same field multiple times for a document. That
> > is, you can
> > doc.add(new Field("field1", ......));
> > doc.add(new Field("field1", ......));
> > doc.add(new Field("field1", ......));
> > doc.add(new Field("field1", ......));
> > as many times as you want before you add the document to the index. For
> > retrieval you can call getFields("field1") and get an array of Fields
> > back, one for each call to add above. You can also set the
> > PositionIncrementGap while indexing to separate the term position of the
> > first term of successive add() calls by, say, 100 (or whatever) if you
> > need to worry about SpanNear or some such.
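
Why the gap matters can be shown with a simplified plain-Java model of term positions. The position arithmetic here is an assumption for illustration and may not match Lucene's exact numbers. The class and method names are made up. Without a gap, a phrase can match across the boundary between two values of the same field. With a gap, it cannot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of positions for a field added several times.
class PositionGapDemo {

    // Tokenize each value on whitespace and record token positions,
    // skipping 'gap' extra positions between successive values.
    static Map<String, List<Integer>> index(List<String> values, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = 0;
        for (String value : values) {
            for (String token : value.split("\\s+")) {
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
                position++;
            }
            position += gap;
        }
        return postings;
    }

    static List<Integer> positions(Map<String, List<Integer>> postings, String token) {
        return postings.getOrDefault(token, Collections.emptyList());
    }

    // Exact phrase match: the tokens must occupy consecutive positions.
    static boolean phraseMatches(Map<String, List<Integer>> postings, String... phrase) {
        for (int start : positions(postings, phrase[0])) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                ok = positions(postings, phrase[i]).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("LA A 100", "LA B 200");

        // No gap: "100 LA" matches across the two values.
        System.out.println(phraseMatches(index(values, 0), "100", "LA"));   // prints: true
        // Gap of 100: the cross-value match disappears.
        System.out.println(phraseMatches(index(values, 100), "100", "LA")); // prints: false
        // A phrase inside one value still matches.
        System.out.println(phraseMatches(index(values, 100), "LA", "B", "200")); // prints: true
    }
}
```

In Lucene itself the gap is, as far as I know, supplied by overriding getPositionIncrementGap(String fieldName) on your Analyzer.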
> >
> > This may be waaaay off base. If so, could you give a concrete example of
> > what your inputs are and how you want to search them?
> >
> > Best
> > Erick
> >
> > On Thu, Mar 6, 2008 at 7:28 AM, JensBurkhardt <jensburkhardt@web.de> wrote:
> >
> >>
> >> Okay, another problem occurred. I have different fields with the same
> >> name. I can't separate them by naming them field1, field2, etc., because
> >> while indexing I don't know how many fields I will need.
> >> A book can have several signature numbers. I want to save them in a
> >> field "signature", and when I search for such a number I want the search
> >> to hit every single field and not all fields together.
> >> Right now I separate the strings using a unique separator (in this case
> >> just $$$) so I can split the string into the numbers, but I think this
> >> is kind of the worst way of doing it.
> >>
> >>
> >>
> >>
> >> JensBurkhardt wrote:
> >> >
> >> > hey everybody,
> >> >
> >> > I'm wondering if it's possible to combine wildcards and phrase query.
> >> >
> >> > For example "term1 term*"
> >> >
> >> > I know that the documentation says "Lucene supports single and
> >> > multiple character wildcard searches within single terms (not within
> >> > phrase queries)" but maybe someone has had the same problem and found
> >> > a solution.
> >> >
> >> > Thanks for your help
> >> >
> >> > Jens Burkhardt
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/combine-wildcard-and-phrase-query-tp15870647p15872169.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/combine-wildcard-and-phrase-query-tp15870647p15896083.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
