From: Erick Erickson
To: java-user@lucene.apache.org
Date: Sat, 11 Dec 2010 21:28:14 -0500
Subject: Re: Problems with "tagged" and "non tagged" text

Unless you provide details on how you are indexing these documents, it's
pretty hard to help. It's also hard to reconcile your statement that OR is
the default operator with the results you posted: the '+' on every clause
really points to AND as the default.

There's no magic in Lucene that will automatically put the "content" of an
(X)HTM document in the content field of your document. How are you ensuring
that the doc is indexed as you expect? Luke is a very valuable tool for
inspecting your index to see if it is what you think it is...

Best
Erick

On Sat, Dec 11, 2010 at 8:34 PM, Celso Fontes wrote:
> Hi, I have the same text in two files:
>
> TXT file: http://pastebin.com/u9Rd9VVA
> (X)HTM file: http://pastebin.com/ydHmTQZ8
>
> And I am running this query:
>
> APC (adenomatous polyposis coli) actin assembly
>
> With the OR operator and the Snowball analyzer, it results in:
>
> +content:apc +(+content:adenomat +content:polyposi +content:coli)
> +content:actin +content:assembl
>
> But only the TXT file is returned. Why?
>
> PS: if I try without "()" I get the same result.
>
> Thanks,
> Celso
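
To make the "no magic" point above concrete: something has to turn the (X)HTM markup into plain text before it goes into the content field. The following is only a rough sketch of that step, not the way the original poster indexes and not a Lucene API. The class name HtmlTextSketch and the method toPlainText are hypothetical, and the regex-based stripping is an assumption chosen to keep the example dependency-free. It will mishandle comments, CDATA, and most entities, so a real HTML parser is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Lucene: reduces (X)HTML markup to plain
// text so that the same words reach the analyzer as for the .txt version.
class HtmlTextSketch {
    // Whole <script>...</script> and <style>...</style> blocks (case-insensitive, multi-line).
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag; naive, does not handle '>' inside comments or attributes.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Only a handful of entities are decoded; "&amp;" goes last so that
        // "&amp;lt;" becomes the literal text "&lt;" rather than "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>APC (adenomatous polyposis coli)</p>"
                + "<p>actin&nbsp;assembly</p></body></html>";
        // The returned string is what you would hand to the "content" field.
        System.out.println(toPlainText(xhtml));
    }
}
```

Whatever extraction you use, the check is the same: open the index in Luke and confirm that the terms stored under content for the (X)HTM document match those stored for the TXT document.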