lucene-java-user mailing list archives

From Celso Fontes <cels...@gmail.com>
Subject Re: Problems with "tagged" and "non tagged" text
Date Sun, 12 Dec 2010 02:46:23 GMT
Dear Erick,
Sorry, I really am using the "AND" operator; I wrote it wrong in my
email (I am very tired)...
Here is the main part of the code:

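            Document document = new Document();
            String path = file.getCanonicalPath();

            // Store the file path as an analyzed, stored "title" field.
            document.add(new Field("title", path,
                    Field.Store.YES,
                    Field.Index.ANALYZED));

            // Index the raw file contents from a Reader (tokenized, not stored).
            // Any markup in an .htm file reaches the analyzer unchanged.
            Reader reader = new FileReader(file);
            document.add(new Field("content", reader));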

As you can see, I am indexing the documents! And for the other
questions I get good results with the htm files. This htm file, for
example, is returned correctly for this question:
******APC (adenomatous polyposis coli) Colon Cancer

Thanks,
Celso.
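
For context on the snippet above: `new Field("content", reader)` passes the file's raw characters to the analyzer, so for an .htm file the markup is tokenized together with the text. Whether that explains the missing .htm hit is not established in this thread. The following is only a minimal sketch of extracting plain text before indexing, assuming a crude regex-based tag stripper is acceptable (a real HTML parser would be more robust). The class name `HtmlTextSketch` and the method `stripTags` are hypothetical and use only the Java standard library.

```java
import java.util.regex.Pattern;

public class HtmlTextSketch {
    // Hypothetical helper: crude regex-based tag removal, not a real HTML parser.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String stripTags(String html) {
        // Drop script/style blocks first so their bodies are not indexed as text.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Replace every remaining tag with a space to keep words separated.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a few common entities; "&amp;" goes last to avoid double decoding.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the removed markup.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>APC (adenomatous polyposis coli)</p>"
                + "<p>actin&nbsp;assembly</p></body></html>";
        System.out.println(stripTags(html));
    }
}
```

The resulting string could then be indexed with `new Field("content", text, Field.Store.NO, Field.Index.ANALYZED)` instead of the Reader-based field. Inspecting the index with Luke, as suggested in the quoted reply below, would show which terms actually ended up in the "content" field in either case.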

2010/12/12 Erick Erickson <erickerickson@gmail.com>:
> Unless you provide details on how you are indexing these documents,
> it's pretty hard to help.
>
> It's also hard to reconcile your statement that OR is the default operator
> with the results you posted; the '+' all over the place really points to AND
> as the default.
>
> There's no magic in Lucene that will automatically put the "content" of
> an (X)HTM document in the content field of your document. How are you
> ensuring that the doc is indexed as you expect?
>
> Luke is a very valuable tool for inspecting your index to see if it is what
> you think it is...
>
> Best
> Erick
>
> On Sat, Dec 11, 2010 at 8:34 PM, Celso Fontes <celsowm@gmail.com> wrote:
>
>> Hi, I have the same text in two files:
>>
>> ****TXT      file: http://pastebin.com/u9Rd9VVA
>> ****(X)HTM file: http://pastebin.com/ydHmTQZ8
>>
>> And I am running this query:
>>
>>   APC (adenomatous polyposis coli) actin assembly
>>
>> With the OR operator and the Snowball analyzer, this results in:
>>
>>    +content:apc +(+content:adenomat +content:polyposi +content:coli)
>> +content:actin +content:assembl
>>
>>
>> But only the txt file is returned. Why?
>>
>>
>> PS: if I try without the "()" I get the same result.
>> Thanks,
>> Celso
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
