flume-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject SolrCell help!
Date Mon, 22 Jul 2013 16:18:25 GMT
Hi all,
I'm trying to understand how to "master" Morphline configuration files in
order to put some data into Solr, but I'm facing some problems with
TestMorphlineSolrSink. This is what I did:

1) I want to index the title of testXML.xml (i.e. "Tika test document"),
so I commented out all the parsers except
org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core metadata).
2) In schema.xml I added the following field:
    <field name="title" type="text_en" indexed="true" stored="true" multiValued="false" />

 - If I don't add anything to fmap or capture, everything works fine, but I
don't understand why (who fills that field?). If instead I add title to
capture and/or title: title (or dc_title: title) to fmap, Solr complains
that 2 values are retrieved for 'title' (debugging the values, I see the
title and one empty value in the 'title' metadata array...).
Thus, the problem is that everything works magically if the field is named
title, but if I change its name to something like doc_title there's no way
to make it non-multivalued. Am I right? How can I fix this problem?
 - I'd like to manage JSON files. How can I map JSON fields to Solr fields?
Could someone give a simple example?
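
To make the first point concrete, the solrCell section in question looks
roughly like the sketch below. Only the capture and fmap entries and the
parser class are the ones described above; solrLocator and the overall
layout are assumed from the usual morphline structure and are placeholders,
not the exact contents of the file.

```
# Sketch only: solrLocator and the layout are assumptions/placeholders.
solrCell {
  solrLocator : ${SOLR_LOCATOR}   # assumed to be defined elsewhere in the morphline

  # Adding either of the next two entries triggers the "2 values for 'title'" error.
  capture : [title]
  fmap : { dc_title : title }     # title : title gives the same result

  parsers : [
    # all other parsers are commented out
    { parser : org.apache.tika.parser.xml.DcXMLParser }
  ]
}
```

With both capture and fmap left out, the same section indexes the title
without complaints.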

