hadoop-mapreduce-user mailing list archives

From Michael Katzenellenbogen <mich...@cloudera.com>
Subject Re: discrepancy du in dfs are fs
Date Thu, 29 Nov 2012 15:28:28 GMT
It's quite possible that it's normal, considering that MySQL will use
additional space for indexes, table definitions, etc.

You should be able to validate this fairly easily by doing a mysqldump
on the data and comparing the size of the dump to what is stored in
HDFS. Those two numbers should roughly be in the same ballpark.
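
A minimal sketch of that comparison, under stated assumptions: the helper name ratio_percent, the database and table names, and the HDFS path are all hypothetical, connection options for mysqldump are omitted, and the exact form and column layout of "hadoop fs -du" vary between Hadoop versions (older releases use -dus instead of -du -s).

```shell
#!/bin/sh
# ratio_percent A B: print A as an integer percentage of B (hypothetical helper).
ratio_percent() {
    if [ "$2" -eq 0 ]; then
        echo "n/a"
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}

# Usage sketch; needs a live MySQL server and HDFS, so it is left commented out.
# "mydb", "mytable" and the HDFS path are placeholders, not values from this thread.
#   dump_bytes=$(mysqldump --no-create-info mydb mytable | wc -c)
#   hdfs_bytes=$(hadoop fs -du -s /user/andy/mytable | awk '{print $1}')
#   echo "HDFS import is $(ratio_percent "$hdfs_bytes" "$dump_bytes")% of the dump"
```

The dump contains SQL syntax around the row data, so only a rough match should be expected, not an exact one.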

-Michael

On Nov 29, 2012, at 10:19 AM, "Kartashov, Andy" <Andy.Kartashov@mpac.ca> wrote:

> I also see a discrepancy when Sqoop'ing data from MySQL. Both MySQL "select count(*)
> from.." and "sqoop -eval -query "select count(*).." return the same number of rows. But after
> importing the data into HDFS, hadoop fs -du shows the imported data at roughly 1/2 the size
> of the actual table in the MySQL DB. Is that normal?
>
> Cheers.
>
>
> -----Original Message-----
> From: "Christoph Böhm" [mailto:listenbruder@gmx.net]
> Sent: Wednesday, November 28, 2012 3:10 PM
> To: user@hadoop.apache.org
> Subject: Re: discrepancy du in dfs are fs
>
>
> You're right.
> "du -b" returns the expected value.
>
> Thanks.
> Chris
>
> -------- Original Message --------
>> Date: Wed, 28 Nov 2012 20:17:18 +0530
>> From: Mahesh Balija <balijamahesh.mca@gmail.com>
>> To: user@hadoop.apache.org
>> Subject: Re: discrepancy du in dfs are fs
>
>> Hi Chris,
>>
>>          Can you try the following on your local machine,
>>
>>               du -b myfile.txt
>>
>>          and compare this with the output of hadoop fs -du myfile.txt.
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>> On Wed, Nov 28, 2012 at 7:43 PM, <listenbruder@gmx.net> wrote:
>>
>>>
>>> Hi all,
>>>
>>> I wonder why there is a difference between "du" on HDFS and "get" + "du" on
>>> my local machine.
>>>
>>> Here is an example:
>>>
>>> hadoop fs -du myfile.txt
>>>> 81355258
>>>
>>> hadoop fs -get myfile.txt .
>>> du myfile.txt
>>>> 34919
>>>
>>> --- nevertheless ---
>>>
>>> hadoop fs -cat  myfile.txt | wc -l
>>>> 4789943
>>>
>>> cat myfile.txt | wc -l
>>>> 4789943
>>>
>>>
>>> Any idea?
>>> Thanks.
>>> Chris
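
The difference between plain "du" and "du -b" discussed in this thread can be reproduced on a small local file. This is only a sketch, assuming GNU coreutils (the -b flag is not available in every du implementation); the temporary file stands in for myfile.txt. Plain du reports allocated disk space in blocks, while du -b reports the apparent size in bytes, which is the kind of number hadoop fs -du prints for a file.

```shell
#!/bin/sh
# Create a small sample file with a known byte count (11 bytes).
f=$(mktemp)
printf 'hello hdfs\n' > "$f"

# Apparent size in bytes, measured two ways.
bytes_wc=$(wc -c < "$f" | tr -d ' ')
bytes_du=$(du -b "$f" | cut -f1)   # GNU du: -b means apparent size, in bytes

# Allocated disk usage in blocks (typically 1024-byte units with GNU du).
blocks_du=$(du "$f" | cut -f1)

echo "wc -c : $bytes_wc bytes"
echo "du -b : $bytes_du bytes"
echo "du    : $blocks_du blocks"

rm -f "$f"
```

The two byte counts agree with each other, while the block count depends on the local filesystem and is not directly comparable to a byte count.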