incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: [jira] [Created] (BLUR-95) IndexImporter class - add a double check on the rowid to validate the index.
Date Thu, 23 May 2013 13:35:27 GMT
Yep, that sounds right.  Pretty straightforward.  I think that if it
detects a bad rowid, it should:

1. Log the error.
2. Exit the import of that shard (which means calling rollback on the
writer).
3. Rename the directory of the index that it was trying to import from
"whatever_the_name_is.commit" to "whatever_the_name_is.bad_rowids".  That
way the importer will not try to import it again.

Aaron
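
[Editor's sketch, not part of the original message.] The check and the three steps above could look roughly like the Java below. Everything in it is an assumption made for illustration: the class and method names are hypothetical, shardFor() is a plain hash-modulo stand-in rather than Blur's actual BlurPartitioner logic, the rollback is passed in as a Runnable instead of calling a real index writer, and java.nio.file is used in place of whatever file system API the importer really uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from the Blur code base.
final class RowIdCheckSketch {
    private static final Logger LOG = Logger.getLogger("RowIdCheckSketch");
    private static final String COMMIT = ".commit";
    private static final String BAD = ".bad_rowids";

    // Stand-in for the partitioner (assumption: NOT Blur's real hashing scheme).
    static int shardFor(String rowId, int shardCount) {
        return Math.floorMod(rowId.hashCode(), shardCount);
    }

    // Returns the first rowid that does not map to expectedShard, if any.
    static Optional<String> firstBadRowId(List<String> rowIds, int expectedShard, int shardCount) {
        for (String rowId : rowIds) {
            if (shardFor(rowId, shardCount) != expectedShard) {
                return Optional.of(rowId);
            }
        }
        return Optional.empty();
    }

    // Step 3: rename "name.commit" to "name.bad_rowids" so it is not picked up again.
    static Path markBad(Path commitDir) throws IOException {
        String name = commitDir.getFileName().toString();
        if (!name.endsWith(COMMIT)) {
            throw new IllegalArgumentException("not a .commit directory: " + commitDir);
        }
        String base = name.substring(0, name.length() - COMMIT.length());
        return Files.move(commitDir, commitDir.resolveSibling(base + BAD));
    }

    // Returns true if the import may proceed; otherwise logs, rolls back and renames.
    static boolean validateOrQuarantine(Path commitDir, List<String> rowIds, int expectedShard,
                                        int shardCount, Runnable rollback) throws IOException {
        Optional<String> bad = firstBadRowId(rowIds, expectedShard, shardCount);
        if (!bad.isPresent()) {
            return true;
        }
        // Step 1: log the error.
        LOG.severe("rowid [" + bad.get() + "] does not belong to shard " + expectedShard
                + " in " + commitDir);
        // Step 2: abandon the import of this shard (the caller supplies the writer rollback).
        rollback.run();
        // Step 3: move the directory out of the importer's way.
        markBad(commitDir);
        return false;
    }
}
```

In this sketch a caller would pass something like a lambda that invokes rollback on its own writer, and would skip the rest of the import whenever validateOrQuarantine() returns false.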


On Thu, May 23, 2013 at 8:28 AM, Gagan Juneja <gagandeepjuneja@gmail.com> wrote:

> So in the applyDeletes() method of the IndexImporter class we can get the
> shard name for a rowid using the BlurPartitioner.getPartition() method and
> compare it with the shard id available in IndexImporter through
> _shardContext.getShard(). What do you think?
>
> Regards,
> Gagan
>
> On Thu, May 23, 2013 at 3:09 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > Sure.
> >
> > The BlurOutputFormat class loads data into a table in Blur by delivering
> > new indexes into the shard directory where the Blur table is configured to
> > store its data.  The new indexes are just subdirectories in the shard
> > index directory.  The index importer looks for subdirectories that are
> > named *.commit.  Each one is opened, its rowids are scanned, and a delete
> > is issued for each rowid in the new index (replacing the rows).  This task
> > is meant to guard the shard from getting rows that are not meant for this
> > index.  Basically it's a double check that a row for shard-000001 doesn't
> > make it into shard-000007, which could happen if someone changed the
> > reducer count to an invalid number before running the map reduce job.
> >
> > Hope this helps.
> >
> > Aaron
> >
> >
> > On Wed, May 22, 2013 at 11:43 AM, Gagan Juneja <gagandeepjuneja@gmail.com> wrote:
> >
> >> I understand a bit of it. Could you please explain this a bit more?
> >>
> >> Regards,
> >> Gagan
> >>
> >> On Tue, May 21, 2013 at 5:09 AM, Aaron McCurry (JIRA) <jira@apache.org>
> >> wrote:
> >> > Aaron McCurry created BLUR-95:
> >> > ---------------------------------
> >> >
> >> >              Summary: IndexImporter class - add a double check on the rowid to validate the index.
> >> >                  Key: BLUR-95
> >> >                  URL: https://issues.apache.org/jira/browse/BLUR-95
> >> >              Project: Apache Blur
> >> >           Issue Type: Improvement
> >> >     Affects Versions: 0.1.5
> >> >             Reporter: Aaron McCurry
> >> >              Fix For: 0.1.5
> >> >
> >> >
> >> > In the IndexImporter add a double check to the importer that validates
> >> > the rowids in the import are valid ids for the given shard.  This can be
> >> > done when the rowids in the new index are iterated over during the delete
> >> > phase.  A BlurPartitioner class can validate that the rowid should be in
> >> > the given shard.
> >> >
> >> > --
> >> > This message is automatically generated by JIRA.
> >> > If you think it was sent incorrectly, please contact your JIRA administrators
> >> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >>
>
