lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Problems with "tagged" and "non tagged" text
Date Sun, 12 Dec 2010 02:28:14 GMT
Unless you provide details on how you are indexing these documents,
it's pretty hard to help.

It's also hard to reconcile your statement that OR is the default operator
with the results you posted. The '+' in front of every clause really points
to AND as the default.
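
To make the notation concrete, here is a toy sketch of how a parsed query
prints under each default operator. It is not Lucene code: the class
ClauseNotation and the method render are made up for illustration, and the
only thing it mimics is that required clauses print with a leading '+' while
optional clauses print with no prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClauseNotation {
    /**
     * Toy stand-in (hypothetical, not a Lucene API): renders one clause per
     * term, marking each clause as required when AND is the default operator.
     */
    public static String render(String field, List<String> terms, boolean andIsDefault) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            // Required clauses get a '+' prefix; optional clauses get none.
            clauses.add((andIsDefault ? "+" : "") + field + ":" + term);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("actin", "assembl");
        System.out.println(render("content", terms, true));  // AND as default
        System.out.println(render("content", terms, false)); // OR as default
    }
}
```

If OR really were the default, I'd expect the posted query to look like the
second line, with no '+' on the clauses.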

There's no magic in Lucene that will automatically put the "content" of
an (X)HTM document in the content field of your document. How are you
ensuring that the doc is indexed as you expect?
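
As a rough illustration of what that could involve, here is a minimal sketch
that strips the markup from an (X)HTM string so that only the visible text
would be handed to the content field. The class XhtmlText and the method
extractText are hypothetical names, and the regex approach is an assumption
chosen to keep the example short; a real HTML parser would be more robust,
and I don't know what your indexing code actually does.

```java
import java.util.regex.Pattern;

public class XhtmlText {
    // Crude patterns: good enough for a sketch, not for arbitrary real-world HTML.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the visible text of an (X)HTML string with markup removed. */
    public static String extractText(String xhtml) {
        // Drop script and style blocks entirely, including their bodies.
        String text = SCRIPT_OR_STYLE.matcher(xhtml).replaceAll(" ");
        // Replace every remaining tag with a space so adjacent words stay separate.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a handful of common entities; &amp; goes last to avoid double decoding.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the removed markup.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><body><p>actin <b>assembly</b></p></body></html>";
        System.out.println(extractText(page)); // prints "actin assembly"
    }
}
```

Whatever extraction step you use, the string it produces is what should be
added to the content field, so it's worth printing it for both files and
comparing.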

Luke is a very valuable tool for inspecting your index to see if it is what
you think it is...

Best
Erick

On Sat, Dec 11, 2010 at 8:34 PM, Celso Fontes <celsowm@gmail.com> wrote:

> Hi, I have the same text in two files:
>
> TXT file: http://pastebin.com/u9Rd9VVA
> (X)HTM file: http://pastebin.com/ydHmTQZ8
>
> And I am running this query:
>
>   APC (adenomatous polyposis coli) actin assembly
>
> with the OR operator and the Snowball analyzer, which results in:
>
>    +content:apc +(+content:adenomat +content:polyposi +content:coli)
> +content:actin +content:assembl
>
>
> But only the TXT file is returned. Why?
>
>
> PS: if I try without the "()" I get the same result.
> Thanks,
> Celso
