From: "Kartashov, Andy"
To: "user@hadoop.apache.org"
Subject: RE: discrepancy du in dfs are fs
Date: Thu, 29 Nov 2012 16:06:23 +0000

The MySQL size is in MB. I run the following statement:

SELECT TABLE_NAME, table_rows, data_length, index_length, round(((data_length + index_length) / 1024 / 1024),2) 'Size in MB'….
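
As a rough sketch of the arithmetic in that statement (Python, with made-up byte counts; `size_mb` is a hypothetical helper, not part of MySQL or Sqoop): the 'Size in MB' column adds index_length to data_length, while an import into HDFS carries only the rows, so it may be fairer to compare the `hadoop fs -du` figure against data_length alone.

```python
def size_mb(data_length, index_length=0):
    """Approximate round((data_length + index_length) / 1024 / 1024, 2) from the SQL."""
    return round((data_length + index_length) / 1024 / 1024, 2)


# Hypothetical byte counts, for illustration only (not taken from this thread).
data_length = 150 * 1024 * 1024
index_length = 50 * 1024 * 1024

# Data plus indexes, which is what the query above reports.
print("data + index (MB):", size_mb(data_length, index_length))
# Data only, which is closer to what an import actually copies.
print("data only (MB):", size_mb(data_length))
```

If the table carries large indexes, part of the gap between the two numbers could come from there rather than from the output format.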

 

Mahesh, good point! My Sqoop import actually uses SequenceFile as the output format. Wow, that is a pretty good space saving after all, sweet….

 

From: Mahesh Balija [mailto:balijamahesh.mca@gmail.com]
Sent: Thursday, November 29, 2012 10:31 AM
To: user@hadoop.apache.org
Subject: Re: discrepancy du in dfs are fs

 

Hi Andy,

 

       I am not very sure, but you can check which unit (bytes/KB/MB etc.) your MySQL size is reported in.

       Based on that you may be able to conclude; or maybe MySQL is storing some additional metadata, which could be the reason for the difference.

       One more possibility is that your HDFS data is compressed or stored as sequence data.

 

Best,

Mahesh Balija,

Calsoft Labs.

On Thu, Nov 29, 2012 at 8:48 PM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:

I also see a discrepancy when Sqoop'ing data from MySQL. Both MySQL's "select count(*) from .." and Sqoop's "sqoop eval --query 'select count(*) ..'" return an equal number of rows. But after importing the data into HDFS, hadoop fs -du shows the imported data at roughly 1/2 the size of the actual table in the MySQL DB. Is that normal?

Cheers.



-----Original Message-----
From: "Christoph Böhm" [mailto:listenbruder@gmx.net]
Sent: Wednesday, November 28, 2012 3:10 PM
To: user@hadoop.apache.org
Subject: Re: discrepancy du in dfs are fs


You're right.
"du -b" returns the expected value.

Thanks.
Chris
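
For anyone hitting the same thing, here is a minimal Python sketch of the two numbers involved, assuming a Linux-like system where `os.stat` exposes `st_blocks` in 512-byte units: `du -b` reports the apparent size in bytes (the same kind of figure `hadoop fs -du` prints), while plain `du` reports allocated disk space in blocks. `sizes` is a hypothetical helper name used only for this illustration.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent size in bytes, allocated bytes on disk) for path."""
    st = os.stat(path)
    # st_size is the byte count that `du -b` shows.
    # Assumption: st_blocks counts 512-byte units, as on Linux; the field
    # is not available on every platform (for example Windows).
    return st.st_size, st.st_blocks * 512


# Write a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 5000)
    tmp_path = f.name

apparent, allocated = sizes(tmp_path)
os.remove(tmp_path)

print("apparent bytes (like du -b):", apparent)
print("allocated bytes on disk:", allocated)
# Plain GNU du counts 1024-byte blocks by default, rounding up.
print("approx. 1K blocks (like plain du):", -(-allocated // 1024))
```

The allocated figure depends on the filesystem (block size, compression, sparse files), so it can be larger or smaller than the apparent size.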

-------- Original Message --------
> Date: Wed, 28 Nov 2012 20:17:18 +0530
> From: Mahesh Balija <balijamahesh.mca@gmail.com>
> To: user@hadoop.apache.org
> Subject: Re: discrepancy du in dfs are fs

> Hi Chris,
>
>           Can you try the following on your local machine,
>
>                du -b myfile.txt
>
>           and compare this with the output of hadoop fs -du myfile.txt.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
> On Wed, Nov 28, 2012 at 7:43 PM, <listenbruder@gmx.net> wrote:
>
> >
> > Hi all,
> >
> > I wonder why there is a difference between "du" on HDFS and "get" + "du" on
> > my local machine.
> >
> > Here is an example:
> >
> > hadoop fs -du myfile.txt
> > > 81355258
> >
> > hadoop fs -get myfile.txt .
> > du myfile.txt
> > > 34919
> >
> > --- nevertheless ---
> >
> > hadoop fs -cat  myfile.txt | wc -l
> > > 4789943
> >
> > cat myfile.txt | wc -l
> > > 4789943
> >
> >
> > Any idea?
> > Thanks.
> > Chris
> >

NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail.

 
