nutch-user mailing list archives

From Max S <maximillian...@googlemail.com>
Subject RE: Customise scoring
Date Tue, 08 Sep 2009 21:46:05 GMT
Thanks MilleBii,

That sounds logical. I'll look at a query plugin instead.

Regards
Max S

 

-----Original Message-----
From: MilleBii [mailto:millebii@gmail.com] 
Sent: Thursday, September 03, 2009 8:04 AM
To: nutch-user@lucene.apache.org
Subject: Re: Customise scoring

I think the scoring filter has more to do with crawling and how you would
want to do search in the webgraph (crawldb).

Since you talk about search, you need to write a query plug-in instead that
implements your algorithm and sets the document boost adequately.
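
As a rough illustration of the "document boost" part only, here is a minimal
sketch in plain Java. It does not use any Nutch or Lucene API; the class name
GeoBoost, the method names, and the decay formula are all hypothetical choices
made for this example, not something taken from the GeoPosition plugin. It
computes the great-circle distance between the query location and a document's
coordinates, then maps that distance to a boost that shrinks as the document
gets farther away.

```java
public final class GeoBoost {
    // Mean Earth radius in kilometres (a common approximation).
    private static final double EARTH_RADIUS_KM = 6371.0;

    private GeoBoost() {}

    /** Great-circle distance in km between two lat/lon points (haversine formula). */
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding errors before asin.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Hypothetical decay: boost is 1.0 at distance 0 and 0.5 at scaleKm,
     * approaching 0 as the distance grows. scaleKm must be positive.
     */
    public static float proximityBoost(double distanceKm, double scaleKm) {
        return (float) (1.0 / (1.0 + distanceKm / scaleKm));
    }
}
```

A query plug-in could then multiply a document's score by something like
GeoBoost.proximityBoost(GeoBoost.haversineKm(qLat, qLon, docLat, docLon), 10.0),
assuming the latitude and longitude were stored as index fields at parse time.
How that value is wired into the query depends on the Nutch version in use.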

Having said that, I vote for having XML/EXIF parser standard in a future
nutch build...



2009/9/2 Max S <maximillian009@googlemail.com>

> Hi all,
>
> I have installed / imported an XML and EXIF parser plugin into Nutch 
> to parse xml files and EXIF metadata from JPG images.
>
> The idea would be to:
> 1. Fetch and extract data and links from XML file
>        NB: The XML file contains Geo coordinates (latitude and 
> longitude), title and image links.
> 2. Fetch image and extract EXIF metadata
> 3. Store the extracted data from both parsers in the index.
>
> I would like to customise search so the results are ordered by the 
> following priority.
> 1. Proximity to location
> 2. Keywords from EXIF Metadata
> 3. Keywords from XML title
>
> From what I can see at the moment, I will need to:
> 1. Set a higher score to the fields according to the priority above
> 2. Repurpose the algorithm within GeoPosition plugin
> (http://wiki.apache.org/nutch/GeoPosition)
> 3. Update ScoringFilter logic to include Geo Position algorithm?
>
>
> The question here is, is the last item correct? Or is there another 
> approach?
> Where should I start looking? Appreciate any suggestions.
>
> Regards
> Max S
>
>
>
>


--
-MilleBii-

