incubator-chukwa-user mailing list archives

From Kirk True <k...@mustardgrain.com>
Subject Re: How to set up HDFS -> MySQL from trunk?
Date Fri, 19 Mar 2010 03:59:31 GMT
Hi Eric,

Awesome - everything's working great now.

So, as you've said, the SQL portion of Chukwa is deprecated, and the
HDFS-based replacement is six months out. What should I do to get data
through the adapters -> collectors -> HDFS -> HICC pipeline? Is the
HDFS-based HICC replacement spec'ed out enough for others to contribute?

Thanks,
Kirk

Eric Yang wrote:
> Hi Kirk,
>
> 1. The host selector currently shows hostnames collected from the SystemMetrics
> table, so you need to have top, iostat, df, and sar output collected to populate
> the SystemMetrics table correctly.  The hostname is also cached in the user
> session, so you will need to "switch to a different cluster, and switch back" or
> restart HICC to flush the cached hostnames from the user session.  The hostname
> selector should probably pick up hostnames from a different data source in a
> future release.
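>
> For example, the exec adaptors for those commands look roughly like this in the
> agent's initial_adaptors file (I'm writing these from memory, so check the file
> shipped in conf/ for the exact class name and parameters):
>
>   add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor Df 60 /bin/df -l 0
>   add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor Top 60 /usr/bin/top -b -n 1 -c 0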
>
> 2.  The server should run in UTC.  Timezone support was never implemented
> completely, so a server in another timezone will not work correctly.
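>
> If the box is not already in UTC, the easiest workaround is to force the
> processes into UTC when you start them, e.g. (these are just the standard
> Unix/JVM knobs, not Chukwa-specific settings):
>
>   export TZ=UTC                  # everything launched from this shell runs in UTC
>   $CHUKWA_HOME/bin/start-all.sh  # or add -Duser.timezone=UTC to your JVM options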
>
> 3. The SQL aggregator (deprecated, by the way) runs as part of dbAdmin.sh; it
> down-samples data from the weekly tables into the monthly, yearly, and decade
> tables.  I wrote that submodule over a weekend as a show-and-tell prototype.
> I strongly recommend avoiding the SQL part of Chukwa altogether.
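>
> Conceptually, each aggregation step is just an INSERT ... SELECT into the
> coarser table with a wider time bucket, something like this (the table and
> column names below are only an illustration; the real ones come from the
> aggregator's SQL templates):
>
>   -- illustrative sketch of the weekly -> monthly down-sampling
>   insert into disk_2098_month (timestamp, host, mount, used_pcnt)
>   select from_unixtime(floor(unix_timestamp(timestamp) / 3600) * 3600),
>          host, mount, avg(used_pcnt)
>   from disk_2098_week
>   group by 1, host, mount;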
>
> Regards,
> Eric
>
> On 3/18/10 1:15 PM, "Kirk True" <kirk@mustardgrain.com> wrote:
>
>> Hi Eric,
>>
>> I believe I have most of steps 1-5 working. Data from "/usr/bin/df" is being
>> collected, parsed, stuck into HDFS, and then pulled out again and placed into
>> MySQL. However, HICC isn't showing me my data just yet...
>>
>> The disk_2098_week table is filled out with several entries and looks great.
>> If I select my cluster from the "Cluster Selector" and "Last 12 Hours" from
>> the "Time" widget, the "Disk Statistics" widget still says "No Data
>> available."
>>
>> It appears to be because part of the SQL query filters on the host name, which
>> is coming across in the SQL parameters as "". Since the disk_2098_week table
>> properly includes the host names, nothing is returned by the query. Just for
>> grins, I manually blanked out the host names in MySQL and got a super cool,
>> pretty graph (which looks great, BTW).
>>
>> Additionally, if I select other time periods such as "Last 1 Hour", I see the
>> query is using UTC or something (at 1:00 PDT, I see the query is using a range
>> of 19:00-20:00). However, the data in MySQL is based on PDT, so no matches are
>> found. It appears that the "time_zone" session attribute contains the value
>> "UTC". Where is this coming from and how can I change it?
>>
>> Problems:
>>
>> 1. How do I get the "Hosts Selector" in HICC to include my host name so that
>> the generated SQL queries are correct?
>> 2. How do I make the "time_zone" session parameter use PDT vs. UTC?
>> 3. How do I populate the other tables, such as "disk_489_month"?
>>
>> Thanks,
>> Kirk
>>
>> Eric Yang wrote: 
>>>
>>> Df command is converted into disk_xxxx_week table in mysql, if I remember
>>> correctly.  In mysql are the database tables getting created?
>>> Make sure that you have:
>>>
>>>   <property>
>>>     <name>chukwa.post.demux.data.loader</name>
>>>     <value>org.apache.hadoop.chukwa.dataloader.MetricDataLoaderPool,org.apache.hadoop.chukwa.dataloader.FSMDataLoader</value>
>>>   </property>
>>>
>>> In Chukwa-demux.conf.
>>>
>>> The rough picture of the data flows looks like this:
>>>
>>> 1. demux -> Generate chukwa record outputs.
>>> 2. archive -> Generate bigger files by compacting data sink files.
>>>    (Concurrent with step 1)
>>> 3. postProcess -> Look up what files are generated by demux process and
>>>    dispatch using different data loaders.
>>> 4. MetricDataLoaderPool -> Dispatch multiple threads to load chukwa
>>>    record files to different MDL.
>>> 5. MetricDataLoader -> Load sequence file to database by record type
>>>    defined in mdl.xml.
>>> 6. HICC widgets have a descriptor language in JSON.  You can find the widget
>>>    descriptor files in hdfs://namenode:port/chukwa/hicc/widgets; each one
>>>    embeds the full SQL template, like:
>>>
>>>    Query="select cpu_user_pcnt from [system_metrics] where timestamp between
>>>    [start] and [end]"
>>>
>>>    This outputs all the metrics in JSON format, and the HICC graphing widget
>>>    renders the graph.
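>>>
>>>    You can inspect the descriptors straight out of HDFS, e.g. (substitute a
>>>    real file name for the placeholder):
>>>
>>>    hadoop fs -ls /chukwa/hicc/widgets
>>>    hadoop fs -cat /chukwa/hicc/widgets/<widget descriptor file>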
>>>
>>> If there is no data, look at postProcess.log and make sure the data loading
>>> is not throwing exceptions (a quick check is below).  Steps 3 to 6 are
>>> deprecated and will be replaced with something else.  Hope this helps.
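>>>
>>> A quick way to spot loader problems (the log directory below is just where my
>>> install keeps it; adjust for yours):
>>>
>>>    grep -i exception $CHUKWA_HOME/logs/postProcess.log | tail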
>>>
>>> Regards,
>>> Eric
>>>
>>> On 3/17/10 4:16 PM, "Kirk True" <kirk@mustardgrain.com> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Eric Yang wrote:
>>>>>
>>>>> Hi Kirk,
>>>>>
>>>>> I am working on a design which removes MySQL from Chukwa.  I am making this
>>>>> departure from MySQL because the MDL framework was built for prototyping
>>>>> purposes.  It will not scale in a production system where Chukwa could be
>>>>> hosted on a large Hadoop cluster.  HICC will serve data directly from HDFS
>>>>> in the future.
>>>>>
>>>>> Meanwhile, dbAdmin.sh from Chukwa 0.3 is still compatible with the trunk
>>>>> version of Chukwa.  You can load ChukwaRecords using the
>>>>> org.apache.hadoop.chukwa.dataloader.MetricDataLoader class or mdl.sh from
>>>>> Chukwa 0.3.
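>>>>>
>>>>> Off the top of my head, the invocation is along these lines (check the 0.3
>>>>> script itself for the real arguments):
>>>>>
>>>>>   $CHUKWA_HOME/bin/mdl.sh <path to a ChukwaRecords sequence file in HDFS>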
>>>>
>>>> I'm to the point where the "df" example is working and demux is storing
>>>> ChukwaRecord data in HDFS. When I run dbAdmin.sh from 0.3.0, no data is
>>>> getting updated in the database.
>>>>
>>>> My question is: what's the process to get a custom Demux implementation to be
>>>> viewable in HICC? Are the database tables magically created and populated for
>>>> me? Does HICC generate a widget for me?
>>>>
>>>> HICC looks very nice, but when I try to add a widget to my dashboard, the
>>>> preview always reads, "No Data Available." I'm running
>>>> $CHUKWA_HOME/bin/start-all.sh followed by $CHUKWA_HOME/bin/dbAdmin.sh (which
>>>> I've manually copied to the bin directory).
>>>>
>>>> What am I missing?
>>>>
>>>> Thanks,
>>>> Kirk
>>>>
>>>>> The MetricDataLoader class will be marked as deprecated, and it will not be
>>>>> supported once we make the transition to Avro + TFile.
>>>>>
>>>>> Regards,
>>>>> Eric
>>>>>
>>>>> On 3/15/10 11:56 AM, "Kirk True" <kirk@mustardgrain.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I recently switched to trunk as I was experiencing a lot of issues with
>>>>>> 0.3.0. In 0.3.0, there was a dbAdmin.sh script that would run and try to
>>>>>> stick data in MySQL from HDFS. However, that script is gone and when I
>>>>>> run the system as built from trunk, nothing is ever populated in the
>>>>>> database. Where are the instructions for setting up the HDFS -> MySQL
>>>>>> data migration for HICC?
>>>>>>
>>>>>> Thanks,
>>>>>> Kirk
