hive-user mailing list archives

From Oliver Keyes <oke...@wikimedia.org>
Subject Sampling from a single column
Date Tue, 11 Feb 2014 21:07:43 GMT
Hey all

So, what I'm looking to do is get N randomly-sampled distinct values from a
column in a table. I'm kind of flummoxed by how to do this without using
TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be
'select these values, from this sample, from these distinct values'). I
could swear I saw a simple sample() function while browsing the
documentation just last week, but I'll be damned if I can find it again.
Can anyone help me out, or is Yet Another Subquery the way to go?
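
A rough, untested sketch of the subquery approach, for concreteness. The names my_table and my_column and the value N = 100 are placeholders, not real objects, and it shuffles with ORDER BY rand() instead of TABLESAMPLE, so it shows the general shape only:

```sql
-- Hypothetical names: my_table, my_column. N is assumed to be 100.
SELECT my_column
FROM (
  -- Inner query: reduce the column to its distinct values first.
  SELECT DISTINCT my_column
  FROM my_table
) distinct_values
-- Shuffle the distinct values and keep the first N.
-- ORDER BY forces a single reducer, which may be slow on large inputs.
ORDER BY rand()
LIMIT 100;
```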

Thanks!
