drill-user mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Sequence File Support
Date Fri, 07 Feb 2014 17:44:33 GMT
Hello Tom,

Steven just submitted a patch for a Hive SerDe storage engine. I believe he
was able to read sequence files successfully with this technique. We will be
adding a native reader in the future (for improved performance), but for
now this should be a decent way to get sequence file data into Drill. The
patch is currently up for review, so if you are comfortable applying a
patch, building the project, and trying to read some of your data, we would
certainly appreciate feedback. It should be merged into mainline in the
near future, which would remove the need to apply the patch.

https://reviews.apache.org/r/17833/
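
On the conversion question: one possible stopgap, offered only as an untested
sketch, is to dump the sequence files as text (for example with
`hadoop fs -text`) and rewrite each record as one JSON object per line. The
code below assumes each dumped line has the form key<TAB>value with the JSON
object as the value. The function names are hypothetical, and filling missing
fields with null is an illustrative choice, not something Drill is known to
require.

```python
import json


def parse_dumped_line(line):
    """Return the JSON object from one 'key<TAB>value' line, or None if unusable."""
    # Assumption: the (null) key comes before the first tab character.
    _, sep, value = line.rstrip("\n").partition("\t")
    if not sep:
        value = line  # no tab found: treat the whole line as the value
    try:
        record = json.loads(value)
    except ValueError:
        return None  # skip lines that are not valid JSON
    return record if isinstance(record, dict) else None


def normalize(records):
    """Give every record the same set of fields, filling gaps with None."""
    records = list(records)  # note: holds all records in memory
    fields = sorted({name for record in records for name in record})
    return [{name: record.get(name) for name in fields} for record in records]


def convert(lines):
    """Turn dumped lines into newline-delimited JSON strings."""
    parsed = (parse_dumped_line(line) for line in lines)
    usable = [record for record in parsed if record is not None]
    return [json.dumps(record, sort_keys=True) for record in normalize(usable)]
```

Passing the dumped lines to convert() and writing each returned string on its
own line gives a file with one JSON object per record. Because normalize()
keeps every record in memory, large inputs would need a two-pass variant.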

-Jason Altekruse


On Fri, Feb 7, 2014 at 7:51 AM, Sebastian Schelter <ssc@apache.org> wrote:

> There's no need to apologize for asking questions :)
>
>
> On 02/07/2014 02:49 PM, Tom Kiley wrote:
>
>> Hello,
>>
>>
>> Are there plans to support Hadoop's SequenceFile format
>> (http://wiki.apache.org/hadoop/SequenceFile)?  Or is it already supported
>> and I missed it?  I could see this being useful for running Drill on the
>> output of MapReduce jobs.
>>
>> The sequence files I have currently all have NULL keys and JSON objects as
>> values.  Does anyone have a recommendation on converting them to JSON files
>> or Parquet files for Drill?  The JSON objects generally share the same
>> format, but there may be some outliers with differences.  Some fields may
>> be missing from some objects.
>>
>>
>> Thanks,
>> Tom
>>
>> P.S. Apologies for the noob questions.  I've just started looking at
>> Drill.
>>
>>
>
