From: Simone Tripodi
To: Commons Users List
Subject: Re: [digester] How to deal with flexible XML ?
Date: Sun, 27 Feb 2011 10:53:18 +0100

Hi Patrick,

I used different Field.Index values just to show that the Lucene rule
accepts parameters. Unfortunately your question is more Lucene/domain
related, so I suggest asking on the Lucene ML.

HTH, have a nice WE,
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/

On Sun, Feb 27, 2011 at 10:39 AM, Patrick Diviacco wrote:
> hi,
>
> thanks for the snippet. I see in your code you are
> using Field.Index.NOT_ANALYZED for the title.
>
> It is not clear to me what I should analyze and what not. I need to add
> tf-idf weights to all terms of all fields.
>
> Should I use Field.Index.ANALYZED for all of them?
>
> thanks
>
>
>
> On 27 February 2011 09:55, Simone Tripodi wrote:
>
>> Hi Patrick,
>> I quickly had a look at your code and I didn't see anything wrong; the
>> Digester should work whether the tag is empty or not.
>>
>> When you have documents such as
>>
>>
>> ..
>>
>>
>>
>> the `collection/doc/geo/(latitude|longitude)` pattern will never
>> match, so the set(Latitude|Longitude) methods won't be invoked.
>> I can suggest 2 options:
>>
>>  * quick solution: when building the Lucene document, check that the
>> latitude/longitude is not null before setting it
>>
>>    if (flickrDoc.getLatitude() != null) {
>>        document.add(new Field("latitude", flickrDoc.getLatitude(),
>>                Field.Store.YES, Field.Index.ANALYZED));
>>    }
>>
>>  * a little more complex - but more efficient - solution that I wrote
>> for you and pasted at [1]; it parses & indexes the document into a
>> Lucene Document in one shot; the LuceneFieldRule is parametrized just
>> in case you need to configure the Lucene Field depending on the
>> matching pattern.
>>
>> HTH,
>> Simo
>>
>> [1] http://pastie.org/1612471
>>
>> http://people.apache.org/~simonetripodi/
>> http://www.99soft.org/
>>
>>
>>
>> On Fri, Feb 25, 2011 at 9:21 PM, Patrick Diviacco wrote:
>> > hi,
>> >
>> > I need to understand how to deal with changing xml fields such as
>> > these ones:
>> >
>> >
>> > ..
>> >
>> >
>> >
>> >
>> > ..
>> >
>> >  2432
>> >  2342
>> >
>> >
>> >
>> > As you can see, the geo element can be empty or a parent element. I
>> > need to build a dedicated parser to deal with it. This is my current
>> > code, but I get an error since latitude doesn't always work...
>> > http://codepad.org/jpKXmGZq
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>
>
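
The null check from the quick solution in the thread can be tried out without Lucene or Digester on the classpath. The sketch below is only an illustration under stated assumptions: it uses the JDK's DOM parser instead of Digester, a plain Map as a stand-in for the Lucene Document, and hypothetical names (OptionalGeoExample, textOrNull, toFields, parse). The sample XML is also assumed, since the original markup did not survive in the archive; only the element names from the `collection/doc/geo/(latitude|longitude)` pattern and the two values are taken from the thread, and which value belongs to which element is a guess.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OptionalGeoExample {

    /** Returns the trimmed text of the first descendant named tag, or null if there is none. */
    public static String textOrNull(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return null; // element absent, e.g. an empty <geo/>
        }
        return nodes.item(0).getTextContent().trim();
    }

    /** Collects the optional geo fields; the Map is a stand-in for a Lucene Document. */
    public static Map<String, String> toFields(Element doc) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"latitude", "longitude"}) {
            String value = textOrNull(doc, name);
            // Same guard as in the thread: only add the field when a value exists.
            if (value != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    /** Parses an XML string and returns its root element (hypothetical helper). */
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample documents: geo as an empty element, and geo as a parent element.
        Element empty = parse("<doc><geo/></doc>");
        Element full = parse(
                "<doc><geo><latitude>2432</latitude><longitude>2342</longitude></geo></doc>");
        System.out.println(toFields(empty)); // prints {}
        System.out.println(toFields(full));  // prints {latitude=2432, longitude=2342}
    }
}
```

With Digester the effect is the same: when geo has no children the setter is never called, the bean property stays null, and the guard simply skips the field.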