From Mick Semb Wever
Subject Re: RF=1 w/ hadoop jobs
Date Fri, 02 Sep 2011 06:54:29 GMT
On Fri, 2011-09-02 at 08:20 +0200, Patrik Modesto wrote:
> As Jonathan
> already explained himself: "ignoring unavailable ranges is a
> misfeature, imo" 

Generally it's not what one would want, I think.
But I can see the case where data is to be treated as volatile and
ignoring unavailable ranges may be acceptable.

For example, suppose you are searching for something or some pattern and
one hit is enough. If you get the hit, it's a positive result regardless
of whether ranges were ignored; if you don't, and you *know* a range was
ignored along the way, you can re-run the job later. The worst-case
scenario here is no worse than the job always failing on you. Some
indication of which ranges were ignored is required, though.
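
To make the idea concrete, here is a rough sketch in plain Python, not
the actual Hadoop/Cassandra input format code. Everything in it is
hypothetical and made up for illustration: the RangeUnavailable
exception, the search_tolerating_gaps function, and the scan callback
that stands in for reading one range.

```python
class RangeUnavailable(Exception):
    """Hypothetical error: no replica for this range could be reached."""


def search_tolerating_gaps(ranges, scan):
    """Scan ranges in order and stop at the first hit.

    `scan` is an assumed callback: it returns a hit (anything other than
    None) or None for a range, and raises RangeUnavailable when the range
    cannot be read. Returns (hit_or_None, skipped_ranges).
    """
    skipped = []
    for r in ranges:
        try:
            hit = scan(r)
        except RangeUnavailable:
            # Remember the range instead of failing the whole job.
            skipped.append(r)
            continue
        if hit is not None:
            # One hit is enough, whatever was skipped before it.
            return hit, skipped
    # No hit: only conclusive if nothing was skipped.
    return None, skipped
```

A caller would treat (None, []) as a real negative, and (None, [some
ranges]) as "inconclusive, re-run later". The skipped list is the
indication of ignored ranges mentioned above.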

Another example is when you're just trying to extract a small random
sample (like a Pig SAMPLE) of data out of Cassandra.

Patrik: is it possible to describe the use-case you have here?


