hive-user mailing list archives

From "Fabio C." <anyte...@gmail.com>
Subject Re: Dataset for hive
Date Thu, 02 Apr 2015 10:20:51 GMT
https://github.com/hortonworks/hive-testbench
The official procedure to generate and upload the data has never worked for
me (and it doesn't look like supported software), so you may need to do it
manually and on a single host, which can be a bit tricky. The upside is that
you already get several queries, and you can set the size of the data you
want to generate.

On Thu, Apr 2, 2015 at 8:29 AM, xiaohe lan <zombiexcoder@gmail.com> wrote:

> Hi Vivek Veeramani,
>
> Actually, I already have that. But with the wiki dataset, I can only do
> "select *" queries.
>
> Thanks,
> Xiaohe
>
> On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani <
> vivek.veeramani87@gmail.com> wrote:
>
>> Hi Xiaohe,
>>
>> If it's a data set that you're looking for, you can find Wikipedia data
>> dumps at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps
>> is at http://meta.wikimedia.org/wiki/Data_dumps.
>>
>> Hope this helps.
>>
>>
>> On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan <zombiexcoder@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am new to Hive. I just set up a 5-node Hadoop environment and want to
>>> try out HiveQL.
>>> Is there any dataset I can download to play with HiveQL? The dataset should
>>> have several tables so I can write some complex joins. About 100 GB should
>>> be fine.
>>>
>>> Thanks,
>>> Xiaohe
>>>
>>
>>
>>
>> --
>> Thanks ,
>> Vivek Veeramani
>>
>>
>> cell : +91-9632 975 975
>>         +91-9895 277 101
>>
>
>
