accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: OfflineScanner
Date Thu, 19 Feb 2015 19:46:29 GMT
Want to file a ticket, Ara? I didn't realize it wasn't directly in the 
public API (only via m/r). I think it would make a nice addition.

Ara Ebrahimi wrote:
> OfflineScanner is package-private, so I'll need to hack around that. If it
> proves to be at least 20% faster, then it's worth having in the public
> API. Perhaps it could even let the user ask for a specific file to be
> scanned, rather than steering the scan to the intended file by carefully
> defining the range.
>
> Ara.
>
> On Feb 19, 2015, at 8:15 AM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>> On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi
>> <ara.ebrahimi@argyledata.com> wrote:
>>
>>     Hi,
>>
>>     I’m trying to optimize a connector we’ve written for Presto. In
>>     some cases we need to perform full table scans. These run
>>     across all the nodes, but each node is assigned to process only a
>>     sharded subset of the data, and each shard is stored in a single
>>     RFile. I’m looking at AbstractInputFormat and OfflineIterator, and
>>     the code seems fairly easy to reuse for this case. Is there any
>>     drawback? It seems that if the table is offline, OfflineIterator
>>     is used, which apparently reads the RFiles directly without any
>>     RPC, so I think it should be significantly faster. Is that right?
>>     And is there any drawback to using it while the table is online
>>     but no other application is modifying it?
>>
>>
>> The code will throw an exception if the table is not offline (the intent
>> is to ensure the files are stable and not garbage collected). As
>> others have stated, you can clone the table and take the clone offline.
>> Currently, offline scanning is only supported in the public API through
>> MapReduce. Curious: would you be interested in seeing this in the client
>> public API?
>>
>>
>>     Thanks,
>>     Ara.
>>
>>
>>
>>     ________________________________
>>
>>     This message is for the designated recipient only and may contain
>>     privileged, proprietary, or otherwise confidential information. If
>>     you have received it in error, please notify the sender
>>     immediately and delete the original. Any other use of the e-mail
>>     by you is prohibited. Thank you in advance for your cooperation.
>>
>>     ________________________________
>>
>

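Keith's suggestion (clone the table, take the clone offline, and use the MapReduce offline scan) can be sketched roughly as below. This is a minimal, non-authoritative sketch assuming an Accumulo 1.6-era client API and Hadoop 2 MapReduce on the classpath; the class `OfflineScanSetup`, the method `configureOfflineScan`, and the table names passed to it are hypothetical and not part of Accumulo.

```java
import java.util.Collections;

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper class (assumption); not part of Accumulo.
public class OfflineScanSetup {

  // Clones srcTable into cloneTable, takes the clone offline, and returns a
  // MapReduce job configured to scan the clone's RFiles directly.
  public static Job configureOfflineScan(ClientConfiguration clientConf,
      String user, PasswordToken token, String srcTable, String cloneTable)
      throws Exception {
    Connector conn = new ZooKeeperInstance(clientConf).getConnector(user, token);

    // A clone references the source table's files rather than copying them.
    // flush=true first writes the source's in-memory data to files so the
    // clone includes it; later writes to srcTable are not visible in the clone.
    conn.tableOperations().clone(srcTable, cloneTable, true,
        Collections.<String, String>emptyMap(), Collections.<String>emptySet());

    // Offline scans refuse to run against an online table; wait=true blocks
    // until the clone's tablets are unloaded.
    conn.tableOperations().offline(cloneTable, true);

    Job job = Job.getInstance(new Configuration());
    job.setInputFormatClass(AccumuloInputFormat.class);
    AccumuloInputFormat.setConnectorInfo(job, user, token);
    AccumuloInputFormat.setZooKeeperInstance(job, clientConf);
    AccumuloInputFormat.setInputTableName(job, cloneTable);
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());
    // Read RFiles from HDFS in the mappers instead of scanning via tablet servers.
    AccumuloInputFormat.setOfflineTableScan(job, true);
    return job;
  }
}
```

Because the clone shares the source's files, cloning is cheap, and the offline clone keeps those files stable for the duration of the job. Once the job finishes, the clone can be dropped with `conn.tableOperations().delete(cloneTable)`.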