nutch-user mailing list archives

From xiao yang <yangxiao9...@gmail.com>
Subject Re: How to Add a new field
Date Wed, 02 Sep 2009 15:27:20 GMT
Hi, Parvez:

conf/custom-fields.xml only configures a QueryFilter; it does not add the
field to the index. You need to add a parse extension and an index
extension so that the metadata can be extracted from the HTML and added
to the index.
This article may help you: http://wiki.apache.org/nutch/WritingPluginExample
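
To make the two steps concrete, here is a rough standalone sketch of the flow. It is not the Nutch plug-in API: the class name, method names, and the regex are hypothetical stand-ins, and it assumes the tag looks like <meta name="page_title" content="...">. In a real plug-in the first step would live in a parse filter and the second in an indexing filter, as the wiki article describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone illustration only; none of these names come from Nutch.
class PageTitleFieldSketch {

    // Assumption: name comes before content and both use double quotes.
    private static final Pattern PAGE_TITLE = Pattern.compile(
        "<meta\\s+name=\"page_title\"\\s+content=\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    // Step 1 (parse extension): extract the meta value from the HTML
    // and keep it as parse metadata.
    static Map<String, String> parse(String html) {
        Map<String, String> parseMeta = new HashMap<>();
        Matcher m = PAGE_TITLE.matcher(html);
        if (m.find()) {
            parseMeta.put("page_title", m.group(1));
        }
        return parseMeta;
    }

    // Step 2 (index extension): copy the value from the parse metadata
    // into the document that gets written to the index.
    static Map<String, String> index(Map<String, String> parseMeta) {
        Map<String, String> doc = new HashMap<>();
        String title = parseMeta.get("page_title");
        if (title != null) {
            doc.put("page_title", title);
        }
        return doc;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"page_title\" content=\"Actual Title\">"
            + "</head></html>";
        Map<String, String> doc = index(parse(html));
        System.out.println(doc.get("page_title"));
    }
}
```

If either step is missing, the field never reaches the index, which matches what you see in Luke.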

Xiao

On Fri, Aug 28, 2009 at 11:31 PM, MilleBii<millebii@gmail.com> wrote:
> Well, extracting a new field requires a purpose-built indexing filter plug-in,
> so if none of the current plug-ins does it for you, you have to build one
> yourself. It is relatively easy and well explained on the wiki.
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg14397.html
>
>
> I'm not completely sure what the conf/custom-fields.xml is used for, but it
> does not create a new field. I know because I made the same assumption as you
> did when I started. I assume it tells the different components what to do
> when they see the new field.
>
>
>
>
> 2009/8/28 Mohamed Parvez <parvez@gmail.com>
>
>> what plug-in is it?
>>
>> In the plug-in directory I don't see anything by the name custom-fields.
>>
>> ----
>> Thanks/Regards,
>> Parvez
>> GV : 786-693-2228
>>
>>
>> On Fri, Aug 28, 2009 at 4:40 AM, MilleBii <millebii@gmail.com> wrote:
>>
>> > If there is nothing in the index, it is most probably because you forgot
>> > to add the plug-in that indexes this field.
>> >
>> > 2009/8/28 Mohamed Parvez <parvez@gmail.com>
>> >
>> > > Hello All,
>> > >
>> > >        I am using Nutch 1.0
>> > >
>> > >        In the html pages of my website, there is a meta tag called
>> > > page_title, which holds the actual page title.
>> > >
>> > >        I see that there is an option to add custom fields in the file
>> > > conf/custom-fields.xml
>> > >
>> > > <properties>
>> > >  <entry key="field.name">page_title</entry>
>> > >  <entry key="field.indexed">yes</entry>
>> > >  <entry key="field.stored">yes</entry>
>> > >  <entry key="field.tokenized">no</entry>
>> > >  <entry key="field.boost">1.0</entry>
>> > >  <entry key="field.multi">false</entry>
>> > > </properties>
>> > >
>> > >        I added the field name in that file but don't see it in the
>> > > index when I open the index using Luke.
>> > >
>> > >        Is there any documentation on using the file
>> > > conf/custom-fields.xml? If someone can tell me how to use it, that
>> > > would be a great help.
>> > >
>> > > ---
>> > > Thanks/Regards,
>> > > Parvez
>> > >
>> >
>> >
>> >
>> > --
>> > -MilleBii-
>> >
>>
>
>
>
> --
> -MilleBii-
>
