nutch-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Adding additional metadata
Date Mon, 11 Jan 2010 12:18:59 GMT

First of all: I didn't know about the list archive, so sorry for not 
searching that resource before I sent a new post.

MilleBii wrote:
> For lastModified just enable the index|query-more plugins it will do
> the job for you.

Unfortunately not. Our pages include Dublin Core metadata whose field 
names are Norwegian.

> For other meta, search the mailing list; it's explained many times how to do it

I found several posts concerning metadata, but for me, one question is 
still unanswered: Do I really have to create a lot of new classes/XML 
files in order to store the content of just two metadata fields? I tried 
to rewrite the HtmlParser class, but I have not managed to parse the 
content of the lastModified metadata. So instead I tried adding 
hard-coded metadata values in HtmlParser, like this:

entry.getValue().getData().getParseMeta().set("dato.endret", "01.01.2008");

My modified MoreIndexingFilter managed to pick up the hard-coded values, 
and the dates were successfully stored in my Solr index after running 
the solrindex command.

This means that it is not necessary to write a new MoreIndexingFilter 
class, but I'm still unsure about the HtmlParser class since I haven't 
managed to parse the content of the metadata.
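
For the missing piece, reading the content of the meta tag, here is a 
minimal sketch using only the JDK's DOM classes. It assumes the parser 
can hand over a DOM node for the page. The class name MetaTagSketch and 
the helpers findMetaContent and parse are hypothetical names made up for 
this illustration, not part of Nutch, and the sample page is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Nutch.
public class MetaTagSketch {

    // Walks the DOM depth-first and returns the content attribute of the
    // first <meta> element whose name attribute matches, or null if none.
    static String findMetaContent(Node root, String metaName) {
        if (root instanceof Element) {
            Element el = (Element) root;
            if ("meta".equalsIgnoreCase(el.getNodeName())
                    && metaName.equalsIgnoreCase(el.getAttribute("name"))) {
                return el.getAttribute("content");
            }
        }
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String found = findMetaContent(children.item(i), metaName);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Test helper only: builds a DOM from a well-formed XHTML string.
    // Inside a parser plugin the DOM would come from the HTML parser instead.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample page", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample page carrying the Norwegian-named meta tag.
        String page = "<html><head>"
                + "<meta name=\"dato.endret\" content=\"01.01.2008\"/>"
                + "</head><body/></html>";
        System.out.println(findMetaContent(parse(page), "dato.endret"));
    }
}
```

The value returned by findMetaContent could then replace the hard-coded 
"01.01.2008" in the getParseMeta().set(...) call shown earlier, after a 
null check for pages that lack the tag.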

Erlend

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
