nutch-dev mailing list archives

From "Morille Jerome (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic
Date Fri, 11 Dec 2009 20:42:18 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789515#action_12789515 ]

Morille Jerome commented on NUTCH-422:
--------------------------------------

No, it doesn't work with Nutch 1.0.
It still uses the Lucene Document instead of the NutchDocument from the new APIs.
That is easy to correct.

If you want to use it, be careful with this code. Even a quick read shows that:
 - an InputStream is opened and never closed
 - exceptions are caught and converted to null
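A minimal sketch of both fixes, in Java. The helper below is hypothetical (it is not taken from the index-extra attachment) and assumes the plugin reads its XML config through some stream source. try-with-resources (Java 7 or later, so newer than the Java 1.5 build attached to the issue) closes the stream on every path, and parse errors propagate to the caller instead of being turned into null.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not code from index-extra.
class SafeConfigLoader {

    // Whatever opens the config file (file system, classpath, ...).
    interface StreamSource {
        InputStream open() throws IOException;
    }

    // try-with-resources closes the stream on every path, including when
    // parsing fails; errors are thrown to the caller, never mapped to null.
    static Document load(StreamSource source)
            throws IOException, ParserConfigurationException, SAXException {
        try (InputStream in = source.open()) {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(in);
        }
    }
}
```

The caller then decides what a broken config means (fail the job, log and skip the plugin, ...) instead of hitting a NullPointerException later in the indexer.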

The idea is good:
the plugins shipped with the Nutch distribution do not make it easy to customize the indexed data.

There is something to do here!



> index-extra plugin creates additional fields in the index, based on configurable logic
> --------------------------------------------------------------------------------------
>
>                 Key: NUTCH-422
>                 URL: https://issues.apache.org/jira/browse/NUTCH-422
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 0.8.1
>         Environment: All environments
>            Reporter: Alan Tanaman
>            Assignee: Sami Siren
>         Attachments: index-extra-v1.0-bin-java1.5.zip, index-extra-v1.0-source.zip
>
>
> Extract from the Readme file:
> A.  Introduction
>     The index-extra plugin allows you to configure additional fields that you wish to be added to the index, based on one of the following sources:
>       - The parsed text
>       - Meta data fields
>       - Previously created document-to-be-indexed fields
>       - Plain constant string
>       - Java expression combining one or more of the above, and resolving to a string
>     A regex can also be applied to any of the above, allowing fields to be created based on patterns extracted from the source.
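To illustrate the regex option, here is a small Java sketch of deriving a field value from a source string. RegexFieldExtractor is a hypothetical name, and the rule "use capture group 1 if the pattern has one, otherwise the whole match" is our assumption, not necessarily what index-extra does.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: apply a regex to a source value (parsed text,
// metadata, an existing field, ...) to derive a new field value.
class RegexFieldExtractor {
    private final Pattern pattern;

    RegexFieldExtractor(String regex) {
        this.pattern = Pattern.compile(regex); // compile once, reuse per document
    }

    // Returns capture group 1 if the pattern defines one, otherwise the whole
    // match; empty when the source is missing or the pattern does not occur.
    Optional<String> extract(String source) {
        if (source == null) {
            return Optional.empty();
        }
        Matcher m = pattern.matcher(source);
        if (!m.find()) {
            return Optional.empty();
        }
        // ofNullable: an optional group may not have participated in the match
        return Optional.ofNullable(m.groupCount() >= 1 ? m.group(1) : m.group());
    }
}
```

For example, new RegexFieldExtractor("(\\d{4})-\\d{2}-\\d{2}").extract("Date: 2009-12-11") yields Optional[2009], which could then be stored as a year field.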
> B.  Installation
>     1)  Binaries only:  Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
>                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>                         Enable the plugin by updating the nutch-site.xml file
>     2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
>                         Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
>                         Update the build.xml in NUTCHDIR/src/plugin to include the plugin
>                         Update the NUTCHDIR/default.properties file to include the plugin
>                         Run ant to build
>                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>                         Enable the plugin by updating the nutch-site.xml file
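Enabling the plugin in nutch-site.xml means making its ID match the plugin.includes property, which is a regular expression. The Java sketch below assumes, as we understand Nutch's plugin loading, that each plugin ID must match the whole expression, and that the plugin's ID is 'index-extra'; the sample property values are made up for illustration and are not Nutch's defaults.

```java
import java.util.regex.Pattern;

// Sketch only: mimics a full-string regex match of a plugin ID against a
// plugin.includes value (assumed to be how Nutch selects plugins).
class PluginIncludesCheck {

    static boolean isIncluded(String pluginIncludes, String pluginId) {
        // matches() requires the entire ID to match, not just a substring
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }
}
```

With a hypothetical value such as "protocol-http|parse-(text|html)|index-(basic|anchor)" the plugin would be skipped; extending the last alternative to "index-(basic|anchor|extra)" would include it.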
> C.  Known Issues
>     1)  For this plugin to work correctly on any document field, the other index filters must run first,
>     so that all basic document fields already exist.  To do this, configure the indexingfilter.order
>     property.  (Please see patch NUTCH-421, which enables the indexingfilter.order property. If this patch is not applied,
>     the plugin will still work, but will not be able to use document fields created by other index filter plugins.)
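The ordering problem can be modelled in a few lines of Java. Everything here is hypothetical and not Nutch's API: a document is just a map of field names to values, filters are registered by name, and the order is given as a space-separated list of names, loosely following the indexingfilter.order idea from NUTCH-421 (the real property's syntax is whatever that patch defines).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of an ordered indexing-filter chain.
class FilterChain {
    // Filters registered by name; each returns the (possibly extended) document.
    private final Map<String, UnaryOperator<Map<String, String>>> filters =
            new LinkedHashMap<>();

    void register(String name, UnaryOperator<Map<String, String>> filter) {
        filters.put(name, filter);
    }

    // Applies the filters in the order listed in a space-separated string,
    // so later filters can read fields written by earlier ones.
    Map<String, String> run(String order, Map<String, String> doc) {
        Map<String, String> current = doc;
        for (String name : order.trim().split("\\s+")) {
            if (name.isEmpty()) {
                continue; // an empty order string yields a single empty token
            }
            UnaryOperator<Map<String, String>> filter = filters.get(name);
            if (filter == null) {
                throw new IllegalArgumentException("unknown filter: " + name);
            }
            current = filter.apply(current);
        }
        return current;
    }
}
```

If a "basic" filter sets a title field and an "extra" filter derives a new field from it, the order "basic extra" produces the derived field, while "extra basic" silently produces none: the failure mode described in the known issue above.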
>     2)  At this stage, field boost cannot be used, as Nutch scoring overrides the field boost with its own
>     document-level boost calculation.  This occurs at the end of the reduce method of org.apache.nutch.indexer.Indexer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

