hadoop-common-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Sorting data sets
Date Wed, 08 Jul 2009 16:46:46 GMT
I know that this is probably old news, but sorting on time has the wonderful
property of being unique up to ties.  If you have bounds on timing errors,
having input that is already sorted makes all of these things much easier
than sorting all of the data.
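
As a rough illustration of the point (a minimal sketch, not anything from an
actual job): assuming each lane is already sorted by timestamp and that
timing errors across lanes are bounded by some max_skew, the lanes can be
lined up in a single streaming pass with a k-way merge instead of a full
sort. The names merge_lanes, align and max_skew are made up for this example.

```python
import heapq


def tag(lane_id, lane):
    # Attach the lane id to every (timestamp, value) reading of one lane.
    for ts, value in lane:
        yield (ts, lane_id, value)


def merge_lanes(lanes):
    # Each lane is assumed to be sorted by timestamp already, so a k-way
    # merge yields a globally time-ordered stream without a full sort.
    tagged = [tag(i, lane) for i, lane in enumerate(lanes)]
    return heapq.merge(*tagged, key=lambda rec: (rec[0], rec[1]))


def align(lanes, max_skew):
    # Group readings into rows: a row collects every reading whose
    # timestamp is within max_skew (the assumed bound on timing error)
    # of the first timestamp in that row. Lanes with a missing point
    # are simply absent from the row's dict.
    row_start, row = None, {}
    for ts, lane_id, value in merge_lanes(lanes):
        if row_start is not None and ts - row_start > max_skew:
            yield row_start, row
            row_start, row = None, {}
        if row_start is None:
            row_start = ts
        row[lane_id] = value  # a later reading from the same lane overwrites
    if row_start is not None:
        yield row_start, row


if __name__ == "__main__":
    lanes = [
        [(0, "a0"), (10, "a1"), (20, "a2")],
        [(1, "b0"), (21, "b2")],  # lane 1 has no point near t=10
    ]
    for start, row in align(lanes, max_skew=2):
        print(start, row)
```

Memory use here is bounded by the number of lanes rather than the size of
the data, which is the sense in which sorted input is cheaper than sorting
everything.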

On Wed, Jul 8, 2009 at 7:28 AM, Patterson, Josh <jpatterson0@tva.gov> wrote:

> We'll throw that in some of our jobs (single lane ones), but others require
> us to scan 10+ "lanes" of time series data per bucket at a time, so we still
> have to line those up before we scan in some sort of data structure. It's an
> odd alignment, but basically researchers want to scan "across" the "lanes"
> (to see if a condition occurs in certain ways across various sensors) as the
> time window slides forward (plus there are missing points, misaligned
> timings on certain lanes, etc.)
