hive-user mailing list archives

From Navis류승우 <>
Subject Re: Sampling from a single column
Date Thu, 13 Feb 2014 04:19:02 GMT
If the values have to be sampled, a subquery seems inevitable. Something like:

select x from (select distinct key as x from src)a where rand() > 0.9 limit
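
To spell that out, here is a sketch with the limit filled in. The sample size of 10 is an assumed value for illustration only (the limit in the query above is cut off), and `src` / `key` are the table and column from that query:

```sql
-- Sketch only: 10 is an assumed sample size, not taken from the message above.
-- The inner query reduces the column to its distinct values;
-- the outer filter keeps each distinct value with probability of about 0.1.
SELECT x
FROM (SELECT DISTINCT key AS x FROM src) a
WHERE rand() > 0.9
LIMIT 10;

-- Possible alternative, assuming ORDER BY accepts an expression in your version:
-- shuffle all distinct values and take the first 10.
SELECT x
FROM (SELECT DISTINCT key AS x FROM src) a
ORDER BY rand()
LIMIT 10;
```

The first form can return fewer than 10 rows when fewer than 10 distinct values pass the filter. The second returns exactly 10 whenever at least 10 distinct values exist, at the cost of a global sort through a single reducer.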

2014-02-12 6:07 GMT+09:00 Oliver Keyes <>:

> Hey all
> So, what I'm looking to do is get N randomly-sampled distinct values from
> a column in a table. I'm kind of flummoxed by how to do this without using
> TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be
> 'select these values, from this sample, from these distinct values'). I
> could swear I saw a simple sample() function while browsing the
> documentation just last week, but I'll be damned if I can find it again.
> Can anyone help me out, or is Yet Another Subquery the way to go?
> Thanks!
