lucene-solr-user mailing list archives

From Arun Rangarajan <arunrangara...@gmail.com>
Subject Re: Why is my DIH delta import doing a full import?
Date Tue, 04 Jun 2013 16:53:59 GMT
Thanks, Raheel. That's the approach I took. I modified the deltaQuery like
this:

deltaQuery="SELECT l.list_id AS id FROM lists l
     LEFT JOIN agg_list_view_stats agglvs ON agglvs.list_id = l.list_id
     WHERE l.status = 'ACTIVE' AND l.is_public = 1 AND
     (
       (('${dih.request.entity1}' = 'true') AND (l.modified_on &gt; '${dih.last_index_time}'))
       OR
       (('${dih.request.entity2}' = 'true') AND (agglvs.overall_view_modified_date &gt; DATE_SUB(NOW(), INTERVAL 1 HOUR)))
     )"

Then I pass entity1=true as a request parameter when running the delta import
for what used to be my first entity, and entity2=true for what used to be my
second entity.
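For illustration, here is a minimal Python sketch of how the two cron jobs might build their delta-import URLs. It assumes the handler is registered at /dataimport and uses a hypothetical core URL (http://localhost:8983/solr/lists); the function name delta_import_url is also hypothetical. The only parts taken from the message are command=delta-import and the entity1/entity2 flags, which the deltaQuery reads back as ${dih.request.entity1} and ${dih.request.entity2}. Any flag that is not passed resolves to an empty string, so its branch of the OR evaluates to false.

```python
from urllib.parse import urlencode

# Hypothetical core URL; the /dataimport path is an assumption about solrconfig.xml.
SOLR_CORE_URL = "http://localhost:8983/solr/lists"


def delta_import_url(core_url, entity_flag):
    """Build a DIH delta-import URL that sets one custom request parameter.

    The flag name (e.g. "entity1") is visible in data-config.xml as
    ${dih.request.entity1}, which the deltaQuery compares against 'true'.
    """
    params = {"command": "delta-import", entity_flag: "true"}
    return f"{core_url}/dataimport?{urlencode(params)}"


# One cron job per former entity, each on its own schedule.
print(delta_import_url(SOLR_CORE_URL, "entity1"))
print(delta_import_url(SOLR_CORE_URL, "entity2"))
```

Passing the flag as a plain request parameter keeps a single entity in data-config.xml. That way the import only relies on the global ${dih.last_index_time}, not on a per-entity timestamp.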


On Tue, Jun 4, 2013 at 9:21 AM, Raheel Hasan <raheelhasan.fsd@gmail.com> wrote:

> Maybe this will help you:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
>
>
> On Tue, Jun 4, 2013 at 8:38 PM, Arun Rangarajan <arunrangarajan@gmail.com> wrote:
>
> > Shawn,
> >
> > Thanks for your reply. My data-config.xml actually has two entities. I sent
> > only the first entity in my previous email. Since I had not run any imports
> > on the 2nd entity, dataimport.properties did not have an entry for it yet.
> > This worked fine in 3.6.2, so it looks like a bug in 4.2.1.
> >
> > For now, I am thinking that I can skip using the DIH properties entirely.
> > For the first entity, I can look for documents that changed in the DB in the
> > last 10 minutes and run the delta import cron job every 10 minutes. For the
> > 2nd entity, the interval is 1 hour. Of course, if one of the delta imports
> > fails, this approach may skip some documents, but we do a full import once a
> > day, so those docs should eventually catch up. Guess that's the best I can
> > get with DIH for now!
> >
> >
> > On Tue, Jun 4, 2013 at 7:05 AM, Shawn Heisey <solr@elyograg.org> wrote:
> >
> > > On 6/4/2013 7:52 AM, Arun Rangarajan wrote:
> > > > I upgraded from Solr 3.6.2 to 4.2.1 and I am noticing that my data import
> > > > handler's delta import is actually doing a full import.
> > >
> > > <snip>
> > >
> > > > What changed, and how do I get delta import to only index the documents
> > > > that got modified after ${dih.Lists.last_index_time}?
> > >
> > > It's a bug.  I've built a test that shows the problem, but I haven't
> > > figured out yet how to actually fix it.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-4788
> > >
> > > I now have one more data point to add to the mix that I didn't know
> > > before: it works in 3.6.2.
> > >
> > > It looks like you only have the one entity showing a last_index_time,
> > > so you should be able to use ${dih.last_index_time} instead of
> > > ${dih.Lists.last_index_time}.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
>
> --
> Regards,
> Raheel Hasan
>
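The fixed-window fallback described earlier in the thread works like this: every 10 minutes, reindex the rows changed in the last 10 minutes, and rely on the daily full import to catch anything missed. That scheme leaves a gap whenever a run fails. The Python sketch below models the selection step as an in-memory filter that mimics a `modified_on > NOW() - INTERVAL 10 MINUTE` condition. The row ids, timestamps and the helper name ids_to_reindex are all hypothetical, chosen only to show which rows a later run misses.

```python
from datetime import datetime, timedelta


def ids_to_reindex(modified_on, now, window):
    """Mimic `modified_on > NOW() - INTERVAL <window>`: rows changed within the lookback."""
    return {doc_id for doc_id, ts in modified_on.items() if ts > now - window}


# Hypothetical modification times for three list rows.
modified_on = {
    1: datetime(2013, 6, 4, 10, 5),
    2: datetime(2013, 6, 4, 10, 12),
    3: datetime(2013, 6, 4, 10, 18),
}
window = timedelta(minutes=10)

# Suppose the 10:10 run fails; the 10:20 run only looks back to 10:10.
picked_up = ids_to_reindex(modified_on, datetime(2013, 6, 4, 10, 20), window)
missed = set(modified_on) - picked_up
print(sorted(picked_up), sorted(missed))  # [2, 3] [1]
```

Row 1 would have been caught by the failed 10:10 run, so it stays stale until the next full import. Widening the lookback beyond the cron interval (for example 15 minutes for a 10-minute job) narrows this gap, at the cost of reindexing some rows twice.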
