nutch-user mailing list archives

From Cam Bazz <camb...@gmail.com>
Subject Re: injecting url and url metadata
Date Tue, 26 Jul 2011 16:55:09 GMT
Hello,

And how can I access this data afterwards, when parsing or indexing?

Is it going to be in the parseMeta?

Best

On Mon, Jul 25, 2011 at 7:50 PM, Julien Nioche
<lists.digitalpebble@gmail.com> wrote:
> That's just the way the toString() method concatenates things; the keys and
> values are stored correctly, so this should not be a problem.
> Look at plugin/urlmeta for a way of propagating the features to the outlinks.
>
> On 25 July 2011 17:47, Cam Bazz <cambazz@gmail.com> wrote:
>
>> Hello,
>>
>> I have figured out that it can indeed be done. However, when I
>> inject, generate, and then dump with readdb, I see:
>>
>> Score: 1.0
>> Signature: null
>> Metadata: status: 9catId: 1
>>
>> In the metadata part there is no space between 9 and catId; I wonder
>> if that is a problem.
>>
>> Best Regards,
>> C.B.
>>
>>
>>
>> On Mon, Jul 25, 2011 at 7:21 PM, Cam Bazz <cambazz@gmail.com> wrote:
>> > Hello,
>> >
>> > How could I inject metadata for urls that I provide?
>> >
>> > In Injector.java :
>> >
>> > /** This class takes a flat file of URLs and adds them to the of pages to be
>> >  * crawled.  Useful for bootstrapping the system.
>> >  * The URL files contain one URL per line, optionally followed by custom metadata
>> >  * separated by tabs with the metadata key separated from the corresponding value by '='. <br>
>> >  * Note that some metadata keys are reserved : <br>
>> >  * - <i>nutch.score</i> : allows to set a custom score for a specific URL <br>
>> >  * - <i>nutch.fetchInterval</i> : allows to set a custom fetch interval for a specific URL <br>
>> >  * e.g. http://www.nutch.org/ \t nutch.score=10 \t nutch.fetchInterval=2592000 \t userType=open_source
>> >  **/
>> >
>> >
>> > Could I extend this structure to store metadata about URLs?
>> >
>> > Best Regards,
>> > -C.B.
>> >
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
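
The seed-line format described in the quoted Injector Javadoc (one URL per line, optionally followed by tab-separated key=value metadata) can be illustrated with a minimal Java sketch. This is not Nutch's own Injector code: the class name SeedLineSketch and its parse method are hypothetical, and trimming whitespace around each field is an assumption made here so that the Javadoc's spaced-out example also parses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Nutch): splits one seed line into URL and metadata. */
final class SeedLineSketch {
    final String url;
    final Map<String, String> metadata;

    private SeedLineSketch(String url, Map<String, String> metadata) {
        this.url = url;
        this.metadata = metadata;
    }

    /** Parses "url \t key=value \t key=value ..." as described in the Injector Javadoc. */
    static SeedLineSketch parse(String line) {
        String[] fields = line.split("\t");
        String url = fields[0].trim();
        // LinkedHashMap keeps the metadata in the order it appears on the line.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim(); // assumption: surrounding spaces are not significant
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' or empty key: skip this field
            }
            metadata.put(field.substring(0, eq), field.substring(eq + 1));
        }
        return new SeedLineSketch(url, metadata);
    }

    public static void main(String[] args) {
        SeedLineSketch seed = parse(
            "http://www.nutch.org/\tnutch.score=10\tnutch.fetchInterval=2592000\tuserType=open_source");
        System.out.println(seed.url);
        System.out.println(seed.metadata);
    }
}
```

In this sketch the reserved keys (nutch.score, nutch.fetchInterval) are stored like any other key. According to the Javadoc, the real Injector gives them special meaning (custom score and fetch interval), which the sketch does not attempt to reproduce.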
