hadoop-common-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Efficient query to directory-num-files?
Date Mon, 04 Oct 2010 17:41:13 GMT
- I want to know how many files are in a directory.
- Well, actually, I want to know how many files are in a few thousand directories.
- I expect the total to be approximately four million.
- If I were to pipe "hadoop fs -ls | wc", I estimate that about 360 MB of textual ls
data would be returned to my client (each hadoop ls entry is about 90 bytes, since the
output is always "ls -l" style), when all I really want is the file count.
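
For reference, the arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope check: the function name is hypothetical, and the 90-byte average entry size and four-million file count are the rough figures quoted above, not measured values.

```python
def estimate_listing_bytes(num_files: int, bytes_per_entry: int = 90) -> int:
    """Approximate size of the ls text the client would receive.

    Assumes every entry is about the same length (roughly 90 bytes,
    as guessed above for "ls -l" style output).
    """
    return num_files * bytes_per_entry


# Roughly four million files at about 90 bytes per listing line.
total = estimate_listing_bytes(4_000_000)
print(f"{total / 1_000_000:.0f} MB")  # 360 MB
```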

Is there a smarter way to do this?


Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
  -- Keith Wiley
