incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-95) IndexImporter class - add a double check on the rowid to validate the index.
Date Fri, 24 May 2013 16:34:21 GMT

    [ https://issues.apache.org/jira/browse/BLUR-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666437#comment-13666437 ]

Aaron McCurry commented on BLUR-95:
-----------------------------------

Overall I think your patch is pretty good, and the logic is sound. However, there are a few optimizations
that I would like to see.

1. I would like to see the logic for comparing the shards in the applyDeletes method changed
a little.  I'm concerned about how many String objects are going to be created.  A more optimized
way of doing the comparison is to take the shard String passed into the applyDeletes method
and call "int currentShardId = BlurUtil.getShardIndex(shard);" once to get the integer index of this shard.
Then when you call "int partition = blurPartitioner.getPartition(key, null, numberOfShards);"
you can just check that partition == currentShardId.

2. Also, you should reuse Hadoop Writable objects like BytesWritable by calling set(byte[], int, int) instead
of creating a new object on every iteration with "BytesWritable key = new BytesWritable(rowId.getBytes());".
If you have to, just inline the method into the loop to make it easier to reuse the objects;
that is fine.  I am more concerned about performance than about keeping methods small.

3. The code "_shardContext.getTableContext().getDescriptor().getShardCount()" should be called
once before the loop instead of on every iteration through the loop.

4. ref.utf8ToString() is an expensive call because it creates a String.  You should be
able to set the bytes into the BytesWritable object without first converting them to a String.
This will be much faster, because utf8ToString() turns the byte[] in the BytesRef into a String
and rowId.getBytes() just turns it back into a byte[].
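The four suggestions above can be combined into a single loop. The following is a minimal, self-contained sketch, not the actual BLUR-95 patch: `RowIdSlice` is a hypothetical stand-in for Lucene's `BytesRef` (which exposes public `bytes`, `offset` and `length` fields), `ReusableKey` is a hypothetical stand-in that mimics reusing a Hadoop `BytesWritable` through `set(byte[], int, int)`, and the hash-modulo `partition` method is an assumed placeholder for `BlurPartitioner.getPartition`, whose real hashing may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Lucene's BytesRef: a slice of a shared byte[].
final class RowIdSlice {
  final byte[] bytes;
  final int offset;
  final int length;

  RowIdSlice(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical stand-in mimicking reuse of Hadoop's BytesWritable via set(byte[], int, int).
final class ReusableKey {
  private byte[] buffer = new byte[16];
  private int size;

  // Copies the slice into the internal buffer, growing it only when it is too small.
  void set(byte[] data, int offset, int length) {
    if (buffer.length < length) {
      buffer = Arrays.copyOf(buffer, Math.max(length, buffer.length * 2));
    }
    System.arraycopy(data, offset, buffer, 0, length);
    size = length;
  }

  byte[] getBytes() { return buffer; } // backing array; only the first getLength() bytes are valid
  int getLength() { return size; }
}

final class OptimizedRowIdCheck {
  // Assumed placeholder for BlurPartitioner.getPartition: hash of the valid bytes mod shard count.
  static int partition(ReusableKey key, int numberOfShards) {
    byte[] b = key.getBytes();
    int hash = 1;
    for (int i = 0; i < key.getLength(); i++) {
      hash = 31 * hash + b[i];
    }
    return (hash & Integer.MAX_VALUE) % numberOfShards;
  }

  // Counts row ids that the partitioner assigns to a shard other than currentShardId.
  static int countMisplaced(List<RowIdSlice> rowIds, int currentShardId, int numberOfShards) {
    ReusableKey key = new ReusableKey(); // (2) one key object for the whole loop
    // (3) numberOfShards is a parameter, so the caller looks it up once, not once per row
    int misplaced = 0;
    for (RowIdSlice ref : rowIds) {
      key.set(ref.bytes, ref.offset, ref.length); // (4) no utf8ToString()/getBytes() round trip
      if (partition(key, numberOfShards) != currentShardId) { // (1) int compare, no shard-name String
        misplaced++;
      }
    }
    return misplaced;
  }
}
```

With the real classes, the copy step would be a single `key.set(ref.bytes, ref.offset, ref.length)` call on one `BytesWritable` created before the loop.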

Thanks!  Let me know if you need any help with these.

Aaron


                
> IndexImporter class - add a double check on the rowid to validate the index.
> ----------------------------------------------------------------------------
>
>                 Key: BLUR-95
>                 URL: https://issues.apache.org/jira/browse/BLUR-95
>             Project: Apache Blur
>          Issue Type: Improvement
>    Affects Versions: 0.1.5
>            Reporter: Aaron McCurry
>             Fix For: 0.1.5
>
>         Attachments: 0001-BLUR-ID-95-double-check-on-the-rowid.patch
>
>
> In the IndexImporter, add a double check to the importer that validates that the rowids in
> the import are valid ids for the given shard.  This can be done when the rowids in the new
> index are iterated over during the delete phase.  A BlurPartitioner class can validate that the rowid
> belongs in the given shard.
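The check requested in the issue could take roughly the following shape: resolve the shard's integer index once, then fail the import as soon as a row id maps to a different shard. This is a hedged sketch only. `ImportRowIdValidator` is a hypothetical name, `shardIndex` assumes shard names of the form `shard-00000003` and stands in for `BlurUtil.getShardIndex`, and the hash-modulo `partition` method is an assumed placeholder, not BlurPartitioner's actual algorithm.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of the BLUR-95 double check; not the Blur implementation.
final class ImportRowIdValidator {

  // Assumes shard names like "shard-00000003" and parses the trailing integer (3).
  static int shardIndex(String shard) {
    return Integer.parseInt(shard.substring(shard.lastIndexOf('-') + 1));
  }

  // Assumed placeholder partitioner: hash of the row id bytes modulo the shard count.
  static int partition(byte[] rowId, int numberOfShards) {
    int hash = 1;
    for (byte b : rowId) {
      hash = 31 * hash + b;
    }
    return (hash & Integer.MAX_VALUE) % numberOfShards;
  }

  // Rejects the import if any row id belongs to a shard other than the one being loaded.
  static void validate(String shard, List<byte[]> rowIds, int numberOfShards) throws IOException {
    final int currentShardId = shardIndex(shard); // resolved once, compared as an int
    for (byte[] rowId : rowIds) {
      int target = partition(rowId, numberOfShards);
      if (target != currentShardId) {
        throw new IOException("Row id maps to shard " + target
            + " but is being imported into shard " + currentShardId);
      }
    }
  }
}
```

Throwing on the first mismatch treats a misplaced row id as a corrupt import; a caller that prefers to report every bad row id could collect them instead.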

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
