lucene-solr-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Split one string into many fields
Date Mon, 22 Jan 2007 04:35:28 GMT
On 1/21/07, Ryan McKinley <ryantxu@gmail.com> wrote:
> Deep within the "Update Plugin" discussion, Hoss and I agreed that
> adding an interface and registry for DocumentParsers is a good idea:
>
> interface SolrDocumentParser
> {
>    Document parse(ContentStream content);
> }
>
> SolrDocumentParser parser = core.getDocumentParser("text/html");
>
> This would let update plugins share (pluggable) logic for how to
> convert a single stream into a single document... this is more than
> we are talking about doing now, but something (else) to keep in mind.
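For illustration only, here is a minimal sketch of what the proposed interface plus a MIME-type registry might look like. It is self-contained, so `ContentStream` and `SimpleDocument` are simplified stand-ins (not the real Solr or Lucene types), and `DocumentParserRegistry`, `register`, `PlainTextParser` and `StringContentStream` are hypothetical names. The registry is a standalone map rather than a method on the core, and `parse` is assumed to throw `IOException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for a content stream: bytes plus a declared MIME type.
interface ContentStream {
    String getContentType();
    InputStream getStream() throws IOException;
}

// Simplified stand-in for a Lucene Document: ordered field name -> value.
class SimpleDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// The proposed plugin interface: one stream in, one document out.
interface SolrDocumentParser {
    SimpleDocument parse(ContentStream content) throws IOException;
}

// Hypothetical registry keyed by exact MIME type string.
class DocumentParserRegistry {
    private final Map<String, SolrDocumentParser> parsers = new HashMap<>();

    void register(String mimeType, SolrDocumentParser parser) {
        parsers.put(mimeType, parser);
    }

    // Returns null when no parser is registered for this MIME type.
    SolrDocumentParser getDocumentParser(String mimeType) {
        return parsers.get(mimeType);
    }
}

// Example parser: reads the whole stream as UTF-8 into a single "text" field.
class PlainTextParser implements SolrDocumentParser {
    @Override
    public SimpleDocument parse(ContentStream content) throws IOException {
        SimpleDocument doc = new SimpleDocument();
        try (InputStream in = content.getStream()) {
            doc.fields.put("text", new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        return doc;
    }
}

// In-memory ContentStream backed by a String, handy for testing.
class StringContentStream implements ContentStream {
    private final String body;
    private final String type;

    StringContentStream(String body, String type) {
        this.body = body;
        this.type = type;
    }

    public String getContentType() { return type; }

    public InputStream getStream() {
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }
}

public class ParserRegistryDemo {
    public static void main(String[] args) throws IOException {
        DocumentParserRegistry registry = new DocumentParserRegistry();
        registry.register("text/plain", new PlainTextParser());

        ContentStream stream = new StringContentStream("hello solr", "text/plain");
        SolrDocumentParser parser = registry.getDocumentParser(stream.getContentType());
        SimpleDocument doc = parser.parse(stream);
        System.out.println(doc.fields); // prints {text=hello solr}
    }
}
```

Because the lookup is an exact string match, a header such as `text/plain; charset=UTF-8` would find no parser; a fuller registry would presumably strip MIME parameters before the lookup.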

Yes, please, for another day... ;-)

It would be interesting to explore what we could share with Nutch
too... they're in the business of doc parsing.

When we get to it, I'd like to hear why document parsing (PDF parsing,
for example) should happen inside Solr rather than outside it, with the
results sent in through our update interfaces.
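To make the "outside Solr" alternative concrete, here is a rough sketch of the client side. It assumes the text has already been extracted from the PDF by some external tool (not shown) and only builds the XML `<add><doc>` message that Solr's XML update handler accepts. The field names `id` and `text` are assumptions that would have to match the schema, and `UpdateMessageBuilder` is a hypothetical helper. The resulting string would then be POSTed to the update handler (typically `/solr/update`), followed by a `<commit/>` message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateMessageBuilder {
    // Escape the XML special characters so arbitrary extracted text is safe.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a single-document add message from field name/value pairs.
    static String buildAddMessage(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for text pulled out of a PDF.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "report-42");
        fields.put("text", "Q1 revenue & costs");
        System.out.println(buildAddMessage(fields));
        // prints <add><doc><field name="id">report-42</field><field name="text">Q1 revenue &amp; costs</field></doc></add>
    }
}
```

In this split, Solr only ever sees plain field values, and the choice of PDF library stays entirely on the client.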

-Yonik
