hadoop-common-user mailing list archives

From william kinney <william.kin...@gmail.com>
Subject Re: PCAP file format support
Date Thu, 30 Jul 2009 04:21:52 GMT
+1

In general I think you would just need to parse the interesting fields
via a Java pcap format reader (or do the byte reading yourself; the
format is pretty standard:
http://wiki.wireshark.org/Development/LibpcapFileFormat), put them
into a Writable object, and write them to HDFS in SequenceFile
format.
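For the do-the-byte-reading-yourself route, here is a minimal sketch of a libpcap record reader using only the JDK. The class and field names are my own invention; the offsets follow the Wireshark wiki page linked above. Each parsed record is the kind of thing you would wrap in a Writable and append to a SequenceFile.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Minimal libpcap reader: skips the 24-byte global header, then walks
// the 16-byte per-packet record headers and captured payloads.
public class PcapReader {
    public static class Packet {
        public final long tsSec, tsUsec;  // capture timestamp
        public final byte[] data;         // captured bytes (incl_len of them)
        public final long origLen;        // original length on the wire
        Packet(long tsSec, long tsUsec, byte[] data, long origLen) {
            this.tsSec = tsSec; this.tsUsec = tsUsec;
            this.data = data; this.origLen = origLen;
        }
    }

    public static List<Packet> parse(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);  // defaults to big-endian
        int magic = buf.getInt();  // magic number signals the byte order
        if (magic == 0xd4c3b2a1) {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (magic != 0xa1b2c3d4) {
            throw new IllegalArgumentException("not a pcap file");
        }
        buf.position(24);  // skip the rest of the global header
        List<Packet> packets = new ArrayList<>();
        while (buf.remaining() >= 16) {
            long tsSec   = buf.getInt() & 0xffffffffL;
            long tsUsec  = buf.getInt() & 0xffffffffL;
            int  inclLen = buf.getInt();
            long origLen = buf.getInt() & 0xffffffffL;
            if (buf.remaining() < inclLen) break;  // truncated capture
            byte[] data = new byte[inclLen];
            buf.get(data);
            packets.add(new Packet(tsSec, tsUsec, data, origLen));
        }
        return packets;
    }
}
```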

Another option is using a binary serialization package such as Avro,
Thrift, or protobuf and writing the serialized form to HDFS. You
would then need to write your own InputFormat/RecordReader for it, or
wait for http://issues.apache.org/jira/browse/MAPREDUCE-377 or some
other native support.
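Whichever serializer you pick, note that a bare stream of serialized records is not self-describing: a RecordReader needs some framing to find record boundaries. One simple convention is a length prefix before each record; a JDK-only sketch (the `writeRecord`/`readRecord` names are my own, not from any Hadoop API):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Length-prefixed framing: each record is a 4-byte big-endian length
// followed by the serialized bytes (e.g. Avro/Thrift/protobuf output).
// A custom RecordReader would use the same loop to walk its split.
public class Framing {
    public static void writeRecord(DataOutputStream out, byte[] record)
            throws IOException {
        out.writeInt(record.length);
        out.write(record);
    }

    // Returns the next record, or null at a clean end of stream.
    public static byte[] readRecord(DataInputStream in) throws IOException {
        int len;
        try {
            len = in.readInt();
        } catch (EOFException e) {
            return null;  // no more records
        }
        byte[] record = new byte[len];
        in.readFully(record);
        return record;
    }
}
```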

Will

On Wed, Jul 29, 2009 at 7:21 PM, Ariel Rabkin <asrabkin@gmail.com> wrote:
> I remember looking at this some months back.
>
> My recollection is that PCAP is a somewhat awkward format to
> MapReduce over, since it isn't splittable -- you can't find record
> boundaries if you start at a random offset.
>
> You may want to do some preprocessing before you upload your logs
> to HDFS to fix this.  Irritatingly, the existing code I've seen
> for processing PCAP files doesn't seem very friendly to parsing
> arbitrary packet-trace data in memory.
>
> --Ari
>
> On Tue, Jul 28, 2009 at 8:31 AM, Wasim Bari <wasimbari@msn.com> wrote:
>> Hi,
>>
>>   I have data in PCAP file format (packet capture for network traffic). Is it
>> possible to process this file in Hadoop in the same format? Or is there any
>> supporting tool over Hadoop to analyze data from PCAP files?
>>
>> Bye
>>
>> Wasim
>>
>
>
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>
