lucene-solr-user mailing list archives

From Sascha Szott <>
Subject Re: Indexing multiple documents in Solr/SolrCell
Date Tue, 17 Nov 2009 14:13:09 GMT

Kerwin wrote:
> Our approach is similar to what you have mentioned in the jira issue except
> that we have all metadata in the xml and not in the database. I am therefore
> using a custom XmlUpdateRequestHandler to parse the XML and then calling
> Tika from within the XML Loader to parse the content. So far this seems
> to work.
> When and in which Solr version do you expect the jira issue to be
> addressed?
That's a good question. Since I'm not a Solr committer, I cannot give 
any estimate on when it will be released (hopefully in Solr 1.5).


> On Mon, Nov 16, 2009 at 5:02 PM, Sascha Szott <> wrote:
>> Hi,
>> the problem you've described -- an integration of DataImportHandler (to
>> traverse the XML file and get the document urls) and Solr Cell (to extract
>> content afterwards) -- is already addressed in issue SOLR-1358.
>> Best,
>> Sascha
>> Kerwin wrote:
>>> Hi,
>>> I am new to this forum and would like to know if the function described
>>> below has been developed or exists in Solr. If it does not exist, is it a
>>> good idea, and can I contribute?
>>> We need to index multiple documents with different formats. So we use Solr
>>> with Tika (Solr Cell).
>>> Question:
>>> Can you index both metadata and content for multiple documents iteratively
>>> in Solr?
>>> For example, I have an XML file with metadata and links to the documents'
>>> content. There are many documents in this XML file, and I would like to
>>> index them all in one request rather than firing multiple URLs.
>>> Example of XML
>>> <add>
>>> <doc>
>>> <field name="id">34122</field>
>>> <field name="author">Michael</field>
>>> <field name="size">3MB</field>
>>> <field name="URL">URL of the document</field>
>>> </doc>
>>> <!-- ... further <doc> elements, one per document, up to document N ... -->
>>> </add>
>>> I need to index all these documents by sending this XML in a single URL.
>>> The collection of documents to be indexed could be on a file system.
>>> I have altered the Solr code to be able to do this but is there an already
>>> existing feature?
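
The workflow described in the original question (read the metadata XML once, then fetch and extract the content behind each URL field) can be sketched outside Solr as follows. This is only a minimal illustration under stated assumptions, not the code Kerwin wrote and not what SOLR-1358 implements: the function names, the sample file URLs, and the `fake_extract` stand-in for a Tika call are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample input modelled on the XML in the question.
# The file:// URLs are made up for illustration.
SAMPLE = """<add>
  <doc>
    <field name="id">34122</field>
    <field name="author">Michael</field>
    <field name="size">3MB</field>
    <field name="URL">file:///data/docs/34122.pdf</field>
  </doc>
  <doc>
    <field name="id">34123</field>
    <field name="author">Michael</field>
    <field name="size">1MB</field>
    <field name="URL">file:///data/docs/34123.doc</field>
  </doc>
</add>"""


def parse_add_xml(xml_text):
    """Return one dict of metadata fields per <doc> element.

    Simplification: a repeated field name overwrites the earlier value,
    so multi-valued fields are not handled here.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        fields = {f.get("name"): (f.text or "").strip()
                  for f in doc.findall("field")}
        docs.append(fields)
    return docs


def build_documents(xml_text, extract_content):
    """Merge metadata with extracted content for every document.

    extract_content is a hypothetical callable (URL -> text); in a real
    setup it would wrap whatever performs the extraction, e.g. Tika.
    """
    result = []
    for fields in parse_add_xml(xml_text):
        url = fields.pop("URL", None)
        if url:
            fields["content"] = extract_content(url)
        result.append(fields)
    return result


def fake_extract(url):
    # Stand-in for real content extraction; assumption for this sketch.
    return "extracted text of " + url


if __name__ == "__main__":
    for document in build_documents(SAMPLE, fake_extract):
        print(document)
```

Each resulting dict holds the metadata from the XML plus a `content` field, so all documents can be handed to the indexer in one batch instead of one request per document.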
