hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: how to parse a sequence file in my local filesystem
Date Tue, 28 Jun 2011 14:54:55 GMT
If you want to write your own parser, you can always use the
SequenceFile.Reader[1] class. That will let you scan through all of
the key, value pairs in the file and perform whatever operation you
need. SequenceFile.Reader reads files through Hadoop's FileSystem API,
so for a local file you need a reference to the local file system.
You can get one with something like this:

FileSystem fs = FileSystem.get(new URI("file:///"), new Configuration());
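A minimal sketch of the SequenceFile.Reader approach, assuming a hadoop-core (0.20.x) or hadoop-common jar on the classpath and that both the key and value classes stored in the file implement Writable. The class name SeqFileDump and its dump method are hypothetical. It uses the (FileSystem, Path, Configuration) Reader constructor from the 0.20.2 API; newer releases deprecate that constructor in favour of option-based ones but still provide it. Each record is rendered with toString(), similar in spirit to `hadoop fs -text`.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/** Hypothetical helper: dumps every key/value pair of a SequenceFile on local disk. */
public class SeqFileDump {

    /** Reads all records from the given file URI and returns them as "key<TAB>value" lines. */
    public static List<String> dump(String fileUri) throws IOException {
        Configuration conf = new Configuration();
        // Local file system, as suggested above (URI.create avoids the checked URISyntaxException).
        FileSystem fs = FileSystem.get(URI.create("file:///"), conf);
        Path path = new Path(fileUri);

        List<String> lines = new ArrayList<>();
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
            // The file header records the key/value classes, so instantiate them reflectively.
            // Assumption: both classes implement Writable.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            // next() refills key and value in place and returns false at end of file.
            while (reader.next(key, value)) {
                lines.add(key + "\t" + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SeqFileDump <file-uri>");
            System.exit(1);
        }
        for (String line : dump(args[0])) {
            System.out.println(line);
        }
    }
}
```

Usage would look like `java -cp <hadoop jars>:. SeqFileDump file:///D:/test.seq`; the Windows URI form is an untested assumption. FileSystem.getLocal(conf) is an alternative way to obtain the local file system.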

I've never used the Hadoop APIs to access a Windows file system, so
there may be some peculiarities there.

-Joey

[1] http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/io/SequenceFile.Reader.html


On Mon, Jun 27, 2011 at 8:52 PM, ling cao <ling.caol@gmail.com> wrote:
> Maybe I didn't describe my question clearly. I know the hadoop fs
> command can do it, but I need to parse the file without an HDFS
> environment. The file is on my local disk (for example: D://test.seq).
> How do I write a Java class to parse it?
>
> 2011/6/27 Joey Echeverria <joey@cloudera.com>
>>
>> If the data is text, you can always print out the sequence file using
>> this command:
>>
>> hadoop fs -text file:///my/directory/file.seq
>>
>> This will parse the sequence file, convert each key and value to a
>> string and print it to stdout. Note the file:// in the path; it
>> causes Hadoop to access the local file system.
>>
>> -Joey
>>
>> On Mon, Jun 27, 2011 at 5:04 AM, ling cao <ling.caol@gmail.com> wrote:
>> > Hi,
>> > I have a small sequence file (about 1 KB) produced by a Hive job,
>> > and I need to parse it on my local filesystem, not in HDFS.
>> > Is there an easy way to do it?
>> > Thanks
>> >
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>
>



