commons-user mailing list archives

From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: [digester] How to deal with flexible XML ?
Date Sun, 27 Feb 2011 09:39:37 GMT
hi,

thanks for the snippet. I see that in your code you are
using Field.Index.NOT_ANALYZED for the title.

It is not clear to me which fields I should analyze and which I should not.
I need tf-idf weights for all terms in all fields.

Should I use Field.Index.ANALYZED for all of them?

thanks



On 27 February 2011 09:55, Simone Tripodi <simonetripodi@apache.org> wrote:

> Hi Patrick,
> I quickly had a look at your code and I didn't see anything wrong; the
> Digester should work whether the <geo> tag is empty or not.
>
> When you have documents such as
>
> <doc>
> ..
> <geo></geo>
> </doc>
>
> the `collection/doc/geo/(latitude|longitude)` pattern will never
> match, so the set(Latitude|Longitude) methods won't be invoked.
> I can suggest two options:
>
>  * quick solution: when building the Lucene document, check that the
> latitude/longitude is not null before setting it
>
>    if (flickrDoc.getLatitude() != null) {
>        document.add(new Field("latitude", flickrDoc.getLatitude(),
> Field.Store.YES, Field.Index.ANALYZED));
>    }
>
>  * a slightly more complex, but more efficient, solution that I wrote
> for you and pasted at [1]: it parses and indexes each document into a
> Lucene Document in one shot; the LuceneFieldRule is parameterized in
> case you need to configure the Lucene Field depending on the matching
> pattern.
>
> HTH,
> Simo
>
> [1] http://pastie.org/1612471
>
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
>
>
>
> On Fri, Feb 25, 2011 at 9:21 PM, Patrick Diviacco
> <patrick.diviacco@gmail.com> wrote:
> > hi,
> >
> > I need to understand how to deal with changing XML fields such as these:
> >
> > <doc>
> > ..
> > <geo></geo>
> > </doc>
> >
> > <doc>
> > ..
> > <geo>
> >  <latitude>2432</latitude>
> >  <longitude>2342</longitude>
> > </geo>
> > </doc>
> >
> > As you can see, the geo element can be either empty or a parent element.
> > I need to build an appropriate parser to deal with it. This is my current
> > code, but I get an error because latitude is not always present...
> > http://codepad.org/jpKXmGZq
> >
>

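For reference, the null guard from the "quick solution" quoted above can be tried without Digester or Lucene on the classpath. The following is only a sketch under stated assumptions: it uses the JDK's DOM parser in place of Digester, a plain Map stands in for the Lucene Document, and the names GeoNullGuard, childText and toFields are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Digester or Lucene.
public class GeoNullGuard {

    /** Returns the text of the first <tag> descendant of parent, or null if there is none. */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Collects only the geo fields that are present; the Map stands in for a Lucene Document. */
    static Map<String, String> toFields(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element doc = dom.getDocumentElement();
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"latitude", "longitude"}) {
            String value = childText(doc, name);
            // Same guard as in the quoted snippet: skip the field when the value is missing.
            if (value != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Empty <geo>: no fields are added.
        System.out.println(toFields("<doc><geo></geo></doc>"));
        // Populated <geo>: both fields are added.
        System.out.println(toFields(
                "<doc><geo><latitude>2432</latitude><longitude>2342</longitude></geo></doc>"));
    }
}
```

With an empty <geo> element the map stays empty, which corresponds to skipping document.add(...) for the missing latitude/longitude.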