nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-898) Multi valued subcollection is not multi valued
Date Mon, 06 Sep 2010 16:44:33 GMT
Multi valued subcollection is not multi valued
----------------------------------------------

                 Key: NUTCH-898
                 URL: https://issues.apache.org/jira/browse/NUTCH-898
             Project: Nutch
          Issue Type: Bug
          Components: indexer
         Environment: nutch-2010-07-07_04-49-04
            Reporter: Markus Jelsma
             Fix For: 1.2


NUTCH-716 concatenates multiple values into a single string instead of adding each value
separately to a multi-valued field. For a test crawl I have defined the following two subcollections:

<subcollection>
<name>asdf</name>
<id>asdf-site</id>
<whitelist>http://asdf/</whitelist>
<blacklist/>
</subcollection>

<subcollection>
<name>news</name>
<id>asdf-news</id>
<whitelist>http://asdf/news/</whitelist>
<blacklist/>
</subcollection>

Reindexing the segments by sending them to Solr yields the following results for the home
page and a news URL:

<doc>
<arr name="subcollection">
<str>asdf</str>
</arr>
<str name="url">http://asdf/home/</str>
</doc>
<doc>
<arr name="subcollection">
<str>asdf news</str>
</arr>
<str name="url">http://asdf/news/</str>
</doc>

Instead, I expected the following result for the second document:

<doc>
<arr name="subcollection">
<str>asdf</str>
<str>news</str>
</arr>
<str name="url">http://asdf/news/</str>
</doc>

My Solr schema.xml has the following declaration for the subcollection field:

<field name="subcollection" type="string" stored="true" indexed="true" multiValued="true" />
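
To make the difference concrete, here is a minimal Java sketch of the two behaviours. It is not the Nutch subcollection plugin code: the class name, the helper methods and the plain Map standing in for an index document are all hypothetical, chosen only to illustrate joining the names into one value versus adding one value per matching subcollection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a Map of field name -> values stands in for an index
// document. None of these names come from Nutch or Solr.
public class SubcollectionFieldSketch {

    // Hypothetical stand-in for an empty index document.
    static Map<String, List<String>> newDoc() {
        return new LinkedHashMap<>();
    }

    // Behaviour described in this report: all matching subcollection names
    // are joined with a space and stored as a single field value.
    static void addConcatenated(Map<String, List<String>> doc, String field, List<String> names) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(String.join(" ", names));
    }

    // Expected behaviour: one field value per matching subcollection name.
    static void addMultiValued(Map<String, List<String>> doc, String field, List<String> names) {
        for (String name : names) {
            doc.computeIfAbsent(field, k -> new ArrayList<>()).add(name);
        }
    }

    public static void main(String[] args) {
        // Both subcollections from the definitions above match http://asdf/news/
        List<String> matches = List.of("asdf", "news");

        Map<String, List<String>> reported = newDoc();
        addConcatenated(reported, "subcollection", matches);

        Map<String, List<String>> expected = newDoc();
        addMultiValued(expected, "subcollection", matches);

        System.out.println(reported); // one value: "asdf news"
        System.out.println(expected); // two values: "asdf" and "news"
    }
}
```

With the second approach each subcollection name arrives as its own value, which is what a field declared multiValued="true" is meant to hold.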


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

