hadoop-mapreduce-user mailing list archives

From "Kartashov, Andy" <Andy.Kartas...@mpac.ca>
Subject RE: discrepancy du in dfs are fs
Date Thu, 29 Nov 2012 16:06:23 GMT
The MySQL size is in MB. I ran the following statement:
SELECT TABLE_NAME, table_rows, data_length, index_length, round(((data_length + index_length)
/ 1024 / 1024),2) 'Size in MB'....

Mahesh, good point! My Sqoop import actually uses SequenceFile as the output format. Wow, that is a pretty
good space saving after all, sweet....
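
For reference, the arithmetic in the size query above can be reproduced outside MySQL. The following is a minimal sketch, not the statement actually run: the function name and the sample byte counts are hypothetical, and only the formula (data_length + index_length, divided by 1024 twice, rounded to 2 places) comes from the query. It also shows the data-only figure, since index_length covers index storage, which is not part of the imported rows.

```python
def table_size_mb(data_length, index_length=0):
    """Convert MySQL byte counts to MB, mirroring
    round((data_length + index_length) / 1024 / 1024, 2)."""
    return round((data_length + index_length) / 1024 / 1024, 2)


# Hypothetical byte counts, for illustration only.
data_bytes = 1572864    # 1.5 MB of row data
index_bytes = 524288    # 0.5 MB of index data

total_mb = table_size_mb(data_bytes, index_bytes)  # what the query reports
data_only_mb = table_size_mb(data_bytes)           # row data without indexes

print(total_mb, data_only_mb)
```

If the indexes are large, the data-only figure may be the fairer number to compare against hadoop fs -du, in addition to any savings from the output format.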

From: Mahesh Balija [mailto:balijamahesh.mca@gmail.com]
Sent: Thursday, November 29, 2012 10:31 AM
To: user@hadoop.apache.org
Subject: Re: discrepancy du in dfs are fs

Hi Andy,

       I am not entirely sure, but you could check which unit (bytes, KB, MB, etc.) your MySQL
size is reported in.
       Based on that you may be able to draw a conclusion, or maybe MySQL is storing some additional
metadata, which could be the reason for the difference.

       One more possibility is that your HDFS data is stored compressed or as sequence files.

Best,
Mahesh Balija,
Calsoft Labs.
On Thu, Nov 29, 2012 at 8:48 PM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
I also see a discrepancy when Sqoop'ing data from MySQL.  Both MySQL "select count(*)  from.."
and "sqoop -eval -query "select count(*).."  return the same number of rows. But after importing
the data into HDFS, hadoop fs -du shows the imported data at roughly half the size of the
table in the MySQL DB.  Is that normal?

Cheers.


-----Original Message-----
From: "Christoph Böhm" [mailto:listenbruder@gmx.net]
Sent: Wednesday, November 28, 2012 3:10 PM
To: user@hadoop.apache.org
Subject: Re: discrepancy du in dfs are fs


You're right.
"du -b" returns the expected value.

Thanks.
Chris
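
The difference confirmed here can be illustrated with a short sketch. It rests on two assumptions: that GNU du without -b reports allocated disk space in blocks, while du -b reports the apparent size in bytes, and that hadoop fs -du reports the file length in bytes. The helper name and the temporary demo file are hypothetical, and st_blocks is only available on POSIX systems.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size in bytes, allocated bytes or None).

    The apparent size corresponds to what `du -b` prints; the allocated
    size (st_blocks * 512) is what plain `du` is based on.
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # missing on non-POSIX platforms
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


# Write a small demo file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1000)
    demo_path = tmp.name

apparent, allocated = apparent_and_allocated(demo_path)
print("apparent bytes:", apparent, "allocated bytes:", allocated)
os.remove(demo_path)
```

The allocated figure depends on the local filesystem (block size, compression, sparse files), so it can be larger or smaller than the byte count, while the apparent size is the number to compare with HDFS.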

-------- Original Message --------
> Date: Wed, 28 Nov 2012 20:17:18 +0530
> From: Mahesh Balija <balijamahesh.mca@gmail.com>
> To: user@hadoop.apache.org
> Subject: Re: discrepancy du in dfs are fs

> Hi Chris,
>
>           Can you try the following on your local machine,
>
>                du -b myfile.txt
>
>           and compare this with the output of hadoop fs -du myfile.txt.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
> On Wed, Nov 28, 2012 at 7:43 PM, <listenbruder@gmx.net> wrote:
>
> >
> > Hi all,
> >
> > I wonder why there is a difference between "du" on HDFS and "get" + "du" on
> > my local machine.
> >
> > Here is an example:
> >
> > hadoop fs -du myfile.txt
> > > 81355258
> >
> > hadoop fs -get myfile.txt .
> > du myfile.txt
> > > 34919
> >
> > --- nevertheless ---
> >
> > hadoop fs -cat  myfile.txt | wc -l
> > > 4789943
> >
> > cat myfile.txt | wc -l
> > > 4789943
> >
> >
> > Any idea?
> > Thanks.
> > Chris
> >
NOTICE: This e-mail message and any attachments are confidential, subject to copyright and
may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not
the intended recipient, please delete and contact the sender immediately. Please consider
the environment before printing this e-mail. AVIS : le présent courriel et toute pièce jointe
qui l'accompagne sont confidentiels, protégés par le droit d'auteur et peuvent être couverts
par le secret professionnel. Toute utilisation, copie ou divulgation non autorisée est interdite.
Si vous n'êtes pas le destinataire prévu de ce courriel, supprimez-le et contactez immédiatement
l'expéditeur. Veuillez penser à l'environnement avant d'imprimer le présent courriel
