lucene-solr-user mailing list archives

From "mike st. john" <>
Subject Re: indexing db records via SolrJ
Date Mon, 16 Mar 2015 18:39:33 GMT
Take a look at some of the integrations people are using with Apache Storm.
We do something similar on a larger scale, having created a pgsql spout
and a Solr indexing bolt.


On Mon, Mar 16, 2015 at 11:08 AM, Hal Roberts <> wrote:

> We import anywhere from five to fifty million small documents a day from a
> postgres database.  I wrestled to get the DIH stuff to work for us for
> about a year and was much happier when I ditched that approach and switched
> to writing the few hundred lines of relatively simple code to handle
> directly the logic of what gets updated and how it gets queried from
> postgres ourselves.
> The DIH stuff is great for lots of cases, but if you are getting to the
> point of trying to hack its undocumented internals, I suspect you are
> better off spending a day or two of your time just writing all of the
> update logic yourself.
> We found a relatively simple combination of postgres triggers, export to
> csv based on those triggers, and then just calling update/csv to work best
> for us.
> -hal
> On 3/16/15 9:59 AM, Shawn Heisey wrote:
>> On 3/16/2015 7:15 AM, sreedevi s wrote:
>>> I had checked this post. I don't know whether this is possible, but my
>>> question is whether I can use the DIH configuration for indexing via SolrJ.
>> You can use SolrJ for accessing DIH.  I have code that does this, but
>> only for full index rebuilds.
>> It won't be particularly obvious how to do it.  Writing code that can
>> interpret DIH status and know when it finishes, succeeds, or fails is
>> very tricky because DIH only uses human-readable status info, not
>> machine-readable, and the info is not very consistent.
>> I can't just share my code, because it's extremely convoluted ... but
>> the general gist is to create a SolrQuery object, use setRequestHandler
>> to set the handler to "/dataimport" or whatever your DIH handler is, and
>> set the other parameters on the request like "command" to "full-import"
>> and so on.
>> Thanks,
>> Shawn
> --
> Hal Roberts
> Fellow
> Berkman Center for Internet & Society
> Harvard University
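
The request Shawn describes amounts to calling the DIH handler with a
"command" parameter. Here is a minimal sketch of the equivalent plain HTTP
URLs, built with the JDK only rather than with SolrJ itself. The class and
method names and the "wt=json" parameter are assumptions for illustration,
and "/dataimport" stands for whatever path your DIH handler is registered
under.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the URLs for a DIH full import and a status poll.
class DihRequestSketch {

    // Joins the core URL, handler path, and URL-encoded parameters.
    static String buildUrl(String coreUrl, String handler, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return coreUrl + handler + "?" + query;
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // needs Java 10+
    }

    // command=full-import starts a full rebuild.
    static String fullImportUrl(String coreUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import");
        params.put("wt", "json"); // assumption: ask for a JSON response
        return buildUrl(coreUrl, "/dataimport", params);
    }

    // command=status is what you would poll while the import runs.
    static String statusUrl(String coreUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "status");
        params.put("wt", "json");
        return buildUrl(coreUrl, "/dataimport", params);
    }
}
```

With SolrJ, the same parameters would go on a SolrQuery through
setRequestHandler and set, as described above. Deciding from the status
response whether the import finished, succeeded, or failed is the hard part
and is deliberately left out of this sketch.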

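For the CSV route Hal mentions, the last step is turning exported rows into
CSV text and posting it to update/csv. The sketch below is one possible way
to do that with the JDK only (java.net.http needs Java 11+). It is not Hal's
code: the class name, the field names in the usage note, the "commit=true"
parameter, and the "application/csv" content type are assumptions, and the
trigger and export side in postgres is not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: formats rows as CSV and builds a POST for update/csv.
class CsvUpdateSketch {

    // Quotes a value only when it contains a comma, quote, or line break;
    // embedded quotes are doubled, as in standard CSV.
    static String field(String value) {
        if (value == null) return "";
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    static String line(List<String> values) {
        return values.stream().map(CsvUpdateSketch::field)
                .collect(Collectors.joining(",")) + "\n";
    }

    // The first line is the header naming the Solr fields.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder(line(header));
        for (List<String> row : rows) {
            sb.append(line(row));
        }
        return sb.toString();
    }

    // Builds (but does not send) the POST request carrying the CSV body.
    static HttpRequest updateRequest(String coreUrl, String csv) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/update/csv?commit=true"))
                .header("Content-Type", "application/csv")
                .POST(HttpRequest.BodyPublishers.ofString(csv, StandardCharsets.UTF_8))
                .build();
    }
}
```

For example, toCsv(List.of("id", "title"), rows) followed by
updateRequest(coreUrl, csv) gives a request that can be sent with
java.net.http.HttpClient. Batching many rows into each request keeps the
number of HTTP round trips small at the volumes discussed above.
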