lucene-solr-dev mailing list archives

From Zacarias <zacar...@linebee.com>
Subject Re: Solr Cell revamped as an UpdateProcessor?
Date Tue, 05 Jan 2010 15:48:55 GMT
Hi, I'm developing a directory monitor to add to a Solr implementation.
Tell me if it would be interesting for you; we will be glad to share it with
the community. I would also like your opinion on the proposal: whether it
looks OK to you, and any changes or questions you have will be very
welcome.

Regards
Zacarias
www.linebee.com


2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् <noble.paul@corp.aol.com>

> I was referring to SOLR-1358. Anyway, SolrCell as an UpdateProcessor
> is a good idea
>
> On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll <gsingers@apache.org>
> wrote:
> >
> > On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> >
> >> Integrating Extraction w/ DIH is a better option. DIH makes it easier
> >> to do the mapping of fields etc.
> >
> > Which comment is this directed at?  I'm lacking context here.
> >
> >>
> >>
> >> On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll <gsingers@apache.org> wrote:
> >>>
> >>> On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
> >>>
> >>>>
> >>>> As someone with very little knowledge of Solr Cell and/or Tika, I
> >>>> find myself wondering if ExtractingRequestHandler would make more
> >>>> sense as an extractingUpdateProcessor -- where it could be configured
> >>>> to take either binary fields (or string fields containing URLs) out of
> >>>> the Documents, parse them with Tika, and add the various XPath
> >>>> matching hunks of text back into the document as new fields.
> >>>>
> >>>> Then ExtractingRequestHandler just becomes a handler that slurps up
> >>>> its ContentStreams and adds them as binary data fields and adds the
> >>>> other literal params as fields.
> >>>>
> >>>> Wouldn't that make things like SOLR-1358, and using Tika with
> >>>> URLs/filepaths in XML and CSV based updates, fairly trivial?
> >>>
> >>> It probably could, but I am not sure how it works in a processor
> >>> chain. However, I'm not sure I understand how they work all that much
> >>> either. I also plan on adding, BTW, a SolrJ client for Tika that does
> >>> the extraction on the client. In many cases, the ExtrReqHandler is
> >>> really only designed for lighter weight extraction cases, as one would
> >>> simply not want to send that much rich content over the wire.
> >>
> >>
> >>
> >> --
> >> -----------------------------------------------------
> >> Noble Paul | Systems Architect| AOL | http://aol.com
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> > Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect| AOL | http://aol.com
>
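
For readers following the thread, the extractingUpdateProcessor idea quoted
above might be sketched roughly as below. This is only an illustration in
plain Java and does not use the real Solr or Tika classes: TextExtractor,
DocProcessor, ExtractingProcessor, ProcessorChain, HandlerSketch and the
"raw_content" field name are all hypothetical stand-ins. The handler only
stores the raw stream and the literal params on the document; a later step
in the chain replaces the binary field with the extracted text fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Tika: turns raw bytes into named pieces of text.
interface TextExtractor {
    Map<String, String> extract(byte[] raw);
}

// Hypothetical stand-in for one step in an update processor chain.
interface DocProcessor {
    void process(Map<String, Object> doc);
}

// Takes a binary field out of the document, runs extraction on it, and
// adds the extracted pieces back into the document as new fields.
final class ExtractingProcessor implements DocProcessor {
    private final String sourceField;
    private final TextExtractor extractor;

    ExtractingProcessor(String sourceField, TextExtractor extractor) {
        this.sourceField = sourceField;
        this.extractor = extractor;
    }

    @Override
    public void process(Map<String, Object> doc) {
        Object value = doc.get(sourceField);
        if (!(value instanceof byte[])) {
            return; // nothing to extract; leave the document untouched
        }
        doc.remove(sourceField);
        doc.putAll(extractor.extract((byte[]) value));
    }
}

// Runs each processor in order against the same document.
final class ProcessorChain {
    private final List<DocProcessor> steps;

    ProcessorChain(List<? extends DocProcessor> steps) {
        this.steps = new ArrayList<>(steps);
    }

    void run(Map<String, Object> doc) {
        for (DocProcessor step : steps) {
            step.process(doc);
        }
    }
}

// The "handler" side: copy the literal params onto the document and store
// the incoming stream as a binary field, leaving extraction to the chain.
final class HandlerSketch {
    static final String RAW_FIELD = "raw_content"; // assumed field name

    static Map<String, Object> toDocument(byte[] stream, Map<String, String> literals) {
        Map<String, Object> doc = new LinkedHashMap<>(literals);
        doc.put(RAW_FIELD, stream);
        return doc;
    }
}
```

Under this arrangement an XML or CSV update that already carries a binary
field could go through the same ExtractingProcessor step without passing
through the handler at all, which seems to be the point of the suggestion.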
