commons-user mailing list archives

From Simone Tripodi <simonetrip...@apache.org>
Subject Re: [digester] How to deal with flexible XML ?
Date Sun, 27 Feb 2011 09:53:18 GMT
Hi Patrick,
I used different Field.Index values just to show that the Lucene rule
accepts parameters.
Unfortunately your question is more Lucene/domain related; I suggest
asking on the Lucene ML.
HTH, have a nice WE,
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/



On Sun, Feb 27, 2011 at 10:39 AM, Patrick Diviacco
<patrick.diviacco@gmail.com> wrote:
> hi,
>
> thanks for the snippet. I see in your code you are
> using Field.Index.NOT_ANALYZED for the title.
>
> It is not clear to me what I should analyze and what not. I need to add
> tf-idf weights to all terms of all fields.
>
> Should I use Field.Index.ANALYZED for all of them ?
>
> thanks
>
>
>
> On 27 February 2011 09:55, Simone Tripodi <simonetripodi@apache.org> wrote:
>
>> Hi Patrick,
>> I quickly had a look at your code and I didn't see anything wrong; the
>> Digester should work whether the <geo> tag is empty or not.
>>
>> When you have documents such as
>>
>> <doc>
>> ..
>> <geo></geo>
>> </doc>
>>
>> the `collection/doc/geo/(latitude|longitude)` pattern will never
>> match, so the set(Latitude|Longitude) methods won't be invoked and
>> those values stay unset.
>> I can suggest 2 options:
>>
>>  * quick solution: when building the Lucene document, check if the
>> latitude/longitude is not null before setting it
>>
>>    if (flickrDoc.getLatitude() != null) {
>>        document.add(new Field("latitude", flickrDoc.getLatitude(),
>>                Field.Store.YES, Field.Index.ANALYZED));
>>    }
>>
>>  * a slightly more complex, but more efficient, solution that I wrote
>> for you and pasted at [1]: it parses the document and indexes it into a
>> Lucene Document in one shot; the LuceneFieldRule is parameterized just in
>> case you need to configure the Lucene Field depending on the matching
>> pattern.
>>
>> HTH,
>> Simo
>>
>> [1] http://pastie.org/1612471
>>
>> http://people.apache.org/~simonetripodi/
>> http://www.99soft.org/
>>
>>
>>
>> On Fri, Feb 25, 2011 at 9:21 PM, Patrick Diviacco
>> <patrick.diviacco@gmail.com> wrote:
>> > hi,
>> >
>> > I need to understand how to deal with changing xml fields such as these ones:
>> >
>> > <doc>
>> > ..
>> > <geo></geo>
>> > </doc>
>> >
>> > <doc>
>> > ..
>> > <geo>
>> >  <latitude>2432</latitude>
>> >  <longitude>2342</longitude>
>> > </geo>
>> > </doc>
>> >
>> > As you can see, the geo element can be empty or a parent element. I need to
>> > build an appropriate parser to deal with it. This is my current code, but
>> > I get an error since latitude is not always present...
>> > http://codepad.org/jpKXmGZq
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>
>
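
For readers of the archive, the "quick solution" quoted above (only add a field when its value is not null) can be sketched without Digester or Lucene on the classpath. This is only an illustration under stated assumptions, not the code from the pastie or codepad links: the class GeoFieldSketch and its methods are hypothetical names, the JDK DOM parser stands in for Digester, and a plain Map stands in for the Lucene Document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: JDK DOM parsing stands in for Digester,
// and a Map stands in for the Lucene Document.
public class GeoFieldSketch {

    /** Returns the trimmed text of the first descendant element with this name, or null if there is none. */
    public static String textOrNull(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        if (nodes.getLength() == 0) {
            return null; // e.g. <geo></geo> has no <latitude> child
        }
        return nodes.item(0).getTextContent().trim();
    }

    /** Parses one <doc> element and keeps only the geo fields that are actually present. */
    public static Map<String, String> toFields(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element doc = dom.getDocumentElement();

        Map<String, String> fields = new LinkedHashMap<>();
        String latitude = textOrNull(doc, "latitude");
        String longitude = textOrNull(doc, "longitude");
        // The null checks mirror the guard suggested in the thread.
        if (latitude != null) {
            fields.put("latitude", latitude);
        }
        if (longitude != null) {
            fields.put("longitude", longitude);
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Empty <geo>: no geo fields are added.
        System.out.println(toFields("<doc><geo></geo></doc>"));
        // Populated <geo>: both fields are added.
        System.out.println(toFields(
                "<doc><geo><latitude>2432</latitude><longitude>2342</longitude></geo></doc>"));
    }
}
```

With real Lucene code, the two put(...) calls would be replaced by the guarded document.add(new Field(...)) call shown in the quoted message.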
