hive-user mailing list archives

From Tommy Chheng <tommy.chh...@gmail.com>
Subject Re: sampling conditionality from a large table
Date Fri, 01 Oct 2010 18:21:14 GMT
Thanks, I ended up writing a Scala program that uses the Hive JDBC
connector. Performance was still reasonable.
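
A minimal sketch of that loop-over-domains approach, written against the standard java.sql API (which a Scala program calls in the same way). This is not the original program: the class and method names are hypothetical, the table and column names are taken from the question quoted below, and it assumes the Hive JDBC driver in use supports PreparedStatement parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: fetches up to N rows from pages for each domain in top_domains.
final class PerDomainSampler {

    // Builds the per-domain query. The limit is inlined because it is a
    // validated integer; the domain is bound as a parameter.
    static String sampleQuery(int rowsPerDomain) {
        if (rowsPerDomain <= 0) {
            throw new IllegalArgumentException("rowsPerDomain must be positive");
        }
        return "SELECT title, domain, url FROM pages WHERE domain = ? LIMIT " + rowsPerDomain;
    }

    // Reads the full list of domains to iterate over.
    static List<String> loadDomains(Connection conn) throws SQLException {
        List<String> domains = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT domain FROM top_domains")) {
            while (rs.next()) {
                domains.add(rs.getString(1));
            }
        }
        return domains;
    }

    // Runs one limited query per domain and collects (title, domain, url) rows.
    static List<String[]> sample(Connection conn, int rowsPerDomain) throws SQLException {
        List<String> domains = loadDomains(conn);
        List<String[]> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sampleQuery(rowsPerDomain))) {
            for (String domain : domains) {
                ps.setString(1, domain);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows.add(new String[] {
                            rs.getString(1), rs.getString(2), rs.getString(3)
                        });
                    }
                }
            }
        }
        return rows;
    }
}
```

Calling PerDomainSampler.sample(conn, 5) with a Connection opened through the Hive JDBC driver returns at most 5 rows per domain. Note that LIMIT returns whichever rows come back first, not a random sample, and that each domain is a separate Hive query.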

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 9/27/10 11:13 PM, Guru Prasad wrote:
> Hi,
> Please see the attachment; it might help you.
> It helped me solve a similar kind of problem.
>
>
> Thanks & Regards
> ~guru prasad
>
> On 09/28/2010 06:20 AM, Tommy Chheng wrote:
>> I have two tables:
>> pages( title, domain, url )
>> top_domains(domain)
>>
>> top_domains was created from a group by domain operation on the pages table.
>>
>>
>> Because the pages table is very large, I only want to sample 5 rows for each domain in top_domains.
>>
>> In a traditional programming language, I could just use a for loop to iterate over the domain field and perform a select with a limit 5 clause.
>> Is there a way to express this query in Hive?
>
