accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: OfflineScanner
Date Thu, 19 Feb 2015 16:12:04 GMT
On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi <ara.ebrahimi@argyledata.com>
wrote:

> Hi,
>
> I’m trying to optimize a connector we’ve written for Presto. In some cases
> we need to perform full table scans. This happens across all the nodes but
> each node is assigned to process only a sharded subset of data. Each shard
> is hosted by only 1 RFile. I’m looking at the AbstractInputFormat and
> OfflineIterator and it seems like the code is not that hard to use for this
> case. Is there any drawback? It seems like if the table is offline then
> OfflineIterator is used which apparently reads the RFiles directly and
> doesn’t involve any RPC and I think should be significantly faster. Is it
> so? Is there any drawback to using this while the table is not offline but
> no other app is messing with the table?
>

The code will throw an exception if the table is not offline; the intent is
to ensure the files are stable and are not garbage collected during the scan.
As others have stated, you can clone the table and take the clone offline.
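
For reference, the clone-then-offline preparation can be sketched roughly as
follows. This is only a sketch under assumptions: TableOps is a hypothetical
stand-in interface declared here so the snippet compiles without Accumulo on
the classpath. Its two methods are meant to mirror TableOperations.clone(...)
and TableOperations.offline(table, wait) from the client API. The
OfflineSnapshot class and the suffix parameter are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for the two table operations used below.
 * Assumption: the signatures mirror Accumulo's TableOperations.clone(...)
 * and TableOperations.offline(String, boolean).
 */
interface TableOps {
    void clone(String srcTable, String newTable, boolean flush,
               Map<String, String> propertiesToSet,
               Set<String> propertiesToExclude) throws Exception;

    void offline(String table, boolean wait) throws Exception;
}

final class OfflineSnapshot {
    private OfflineSnapshot() {}

    /**
     * Clones liveTable and takes only the clone offline, so the live table
     * keeps serving reads and writes. Returns the name of the offline clone.
     */
    static String prepare(TableOps ops, String liveTable, String suffix)
            throws Exception {
        String cloneName = liveTable + suffix;

        // flush=true asks for in-memory data to be written to files first,
        // so the clone's files include recent writes.
        ops.clone(liveTable, cloneName, true,
                  Collections.<String, String>emptyMap(),
                  Collections.<String>emptySet());

        // wait=true blocks until the clone is actually offline; an offline
        // scan would otherwise throw, as described above.
        ops.offline(cloneName, true);

        return cloneName;
    }
}
```

In real code the TableOps argument would be a thin adapter over
connector.tableOperations(), and the returned clone name would be the input
table for the MapReduce job, which, as far as I recall, is switched to offline
mode with AccumuloInputFormat.setOfflineTableScan(job, true). Delete the clone
once the scan is done so its files can be garbage collected.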

Currently, offline scanning is only supported in the public API with
MapReduce. Out of curiosity, would you be interested in seeing this in the
public client API?


> Thanks,
> Ara.
