chukwa-dev mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: PipelineStageWriter doesn't work as expected
Date Fri, 18 Dec 2009 21:24:11 GMT
I agree that the current writer should be kept as it is.  My SeqFileWriter
will be renamed to PipelineSeqFileWriter.  I also like the idea of an
abstract class to reduce duplicated code across the writer implementations.
+1 on 432.
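
Roughly what I have in mind for the shared base class -- an untested
sketch; writeChunks() is a made-up hook name, and the setNextStage()/add()
shapes are assumed from the existing PipelineableWriter:

  public abstract class PipelineableWriter implements ChukwaWriter {
    protected ChukwaWriter next;

    public void setNextStage(ChukwaWriter next) {
      this.next = next;
    }

    // the pass-along logic lives here once, instead of in every writer
    public void add(List<Chunk> chunks) throws WriterException {
      writeChunks(chunks);        // subclass-specific output
      if (next != null)
        next.add(chunks);         // hand the same chunks to the next stage
    }

    // each concrete writer implements only its own output step
    protected abstract void writeChunks(List<Chunk> chunks)
        throws WriterException;
  }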

Regards,
Eric

On 12/18/09 9:39 AM, "Jerome Boulon" <jboulon@netflix.com> wrote:

> Hi Eric,
> Can you create another class that takes a writer and makes it a pipeline
> writer? The pipeline logic should be extracted, and the current writers
> should be kept clean.
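> 
> Something like this, maybe -- a rough, untested sketch; the adaptor name
> is made up, and it assumes PipelineableWriter exposes the next stage as a
> protected field named next:
> 
>   public class PipelineWriterAdaptor extends PipelineableWriter {
>     private final ChukwaWriter wrapped;
> 
>     public PipelineWriterAdaptor(ChukwaWriter wrapped) {
>       this.wrapped = wrapped;
>     }
> 
>     public void add(List<Chunk> chunks) throws WriterException {
>       wrapped.add(chunks);     // delegate to the real writer, untouched
>       if (next != null)
>         next.add(chunks);      // then forward to the next pipeline stage
>     }
>   }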
> 
> I'm saying that because I have a new writer implementation, and I would have
> to do something similar to what you're doing for near real-time monitoring.
> 
> Thanks,
>   /Jerome.
> 
> On 12/18/09 9:16 AM, "Eric Yang" <eyang@yahoo-inc.com> wrote:
> 
>> Correction: the data has been written to HDFS correctly.  Data were stuck at
>> post-processing because the postProcess program crashed.  I still need
>> to determine the cause of the postProcess crash.  I think the modified
>> SeqFileWriter does what I wanted, and I will implement next.add() so that
>> the ordering of the writers can be interchanged.
>> 
>> Regards,
>> Eric
>> 
>> On 12/18/09 8:59 AM, "Eric Yang" <eyang@yahoo-inc.com> wrote:
>> 
>>> I'd like to tee the incoming data: one writer goes into HDFS, and
>>> another writer enables real-time pub/sub monitoring of the data.  In my
>>> case, the data are mirrored, not filtered.  However, I am not getting the
>>> right result: it seems the data isn't getting written into HDFS regardless
>>> of the ordering of the writers.
>>> 
>>> Regards,
>>> Eric
>>> 
>>> On 12/17/09 9:53 PM, "Ariel Rabkin" <asrabkin@gmail.com> wrote:
>>> 
>>>> What's the use case for this?
>>>> 
>>>> The original motivation for pipelined writers was so that we could do
>>>> things like filtering before data got written.  Then it occurred to me
>>>> that SocketTeeWriter fit fairly naturally into a pipeline.
>>>> 
>>>> Putting it "after" the seq file writer wouldn't be too bad --
>>>> SeqFileWriter.add() would need to call next.add().  But I would be
>>>> hesitant to commit that change without a really clear use case.
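>>>> 
>>>> Concretely, something like this in SeqFileWriter.add() -- just a sketch;
>>>> writeToSeqFile() stands in for the existing HDFS write path:
>>>> 
>>>>   public void add(List<Chunk> chunks) throws WriterException {
>>>>     writeToSeqFile(chunks);   // existing sequence-file write to HDFS
>>>>     if (next != null)
>>>>       next.add(chunks);       // pass the same chunks down the pipeline
>>>>   }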
>>>> 
>>>> --Ari
>>>> 
>>>> On Thu, Dec 17, 2009 at 8:39 PM, Eric Yang <eyang@yahoo-inc.com> wrote:
>>>>> It works fine after I put SocketTeeWriter first.  What needs to be
>>>>> implemented in SeqFileWriter for it to pipe correctly?
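>>>>> 
>>>>> For reference, the pipeline ordering that works, with the tee first:
>>>>> 
>>>>>   <property>
>>>>>     <name>chukwaCollector.pipeline</name>
>>>>>     <value>org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
>>>>>   </property>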
>>>>> 
>>>>> Regards,
>>>>> Eric
>>>>> 
>>>>> On 12/17/09 5:26 PM, "asrabkin@gmail.com" <asrabkin@gmail.com> wrote:
>>>>> 
>>>>>> Put the SocketTeeWriter first.
>>>>>> 
>>>>>> sent from my iPhone; please excuse typos and brevity.
>>>>>> 
>>>>>> On Dec 17, 2009, at 8:12 PM, Eric Yang <eyang@yahoo-inc.com> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'd set up SocketTeeWriter by itself, with the data streaming to the
>>>>>>> next socket reader program.  When I tried to configure two writers,
>>>>>>> i.e., SeqFileWriter followed by SocketTeeWriter, it didn't work,
>>>>>>> because SeqFileWriter doesn't extend PipelineableWriter.  I went ahead
>>>>>>> and extended SeqFileWriter as a PipelineableWriter, implemented the
>>>>>>> setNextStage method, and configured the collector with:
>>>>>>> 
>>>>>>>   <property>
>>>>>>>     <name>chukwaCollector.writerClass</name>
>>>>>>>     <value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
>>>>>>>   </property>
>>>>>>> 
>>>>>>>   <property>
>>>>>>>     <name>chukwaCollector.pipeline</name>
>>>>>>>     <value>org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter,org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter</value>
>>>>>>>   </property>
>>>>>>> 
>>>>>>> SeqFileWriter writes the data correctly, but when connected to
>>>>>>> SocketTeeWriter, no data is visible in SocketTeeWriter.  Commands
>>>>>>> work fine, but data streaming doesn't happen.  How do I configure the
>>>>>>> collector and PipelineStageWriter to be able to write data into
>>>>>>> multiple writers?  Is there something in SeqFileWriter that could
>>>>>>> prevent this from working?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Eric
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 

