hadoop-common-user mailing list archives

From Sasha Dolgy <sdo...@gmail.com>
Subject Re: large files vs many files
Date Sat, 09 May 2009 07:44:44 GMT
Would WritableFactories not allow me to open one output stream and continue
to write() and sync()?

Maybe I'm reading that wrong.  Although a UUID would be nice, it would
still leave me with the problem of having lots of little files instead of a
few large files.
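[Editor's note: the UUID naming discussed in this thread can be sketched with plain java.util.UUID. The `uniqueName` helper below is hypothetical, not code from the thread; the actual HDFS write call is omitted.]

```java
import java.util.UUID;

// Sketch of UUID-based unique file naming, so concurrent writers
// never collide on the same path. uniqueName is a hypothetical helper.
public class UniqueNames {
    static String uniqueName(String prefix, String suffix) {
        // UUID.randomUUID() gives a 122-bit random identifier; two calls
        // colliding is practically impossible.
        return prefix + "-" + UUID.randomUUID() + suffix;
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("events", ".log"));
        System.out.println(uniqueName("events", ".log"));
    }
}
```

Each call yields a distinct name such as `events-<uuid>.log`, at the cost of producing one small file per writer, which is exactly the trade-off Sasha is pushing back on.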

-sd

On Sat, May 9, 2009 at 8:37 AM, jason hadoop <jason.hadoop@gmail.com> wrote:

> You must create unique file names; I don't believe (though I do not know) that
> the append code will allow multiple writers.
>
> Are you writing from within a task, or as an external application writing
> into Hadoop?
>
> You may try using UUID,
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html, as part of
> your
> filename.
> Without knowing more about your goals, environment and constraints it is
> hard to offer any more detailed suggestions.
> You could also have an application aggregate the streams and write out
> chunks, with one or more writers, one per output file.
>
>
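[Editor's note: the aggregation idea in the quoted reply, many producers feeding one writer per output file, can be sketched with the standard library. All names below are hypothetical, and a ByteArrayOutputStream stands in for an HDFS output stream.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: producers submit small records to a queue; a single writer
// drains the queue and appends everything to one large stream, avoiding
// many little files. The real target would be an HDFS output stream.
public class Aggregator {
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Called by any number of producer threads.
    void submit(byte[] record) throws InterruptedException {
        queue.put(record);
    }

    // Called by the single writer thread: drain queued records into one stream.
    void drain() throws IOException {
        byte[] rec;
        while ((rec = queue.poll()) != null) {
            out.write(rec);
        }
    }

    int bytesWritten() {
        return out.size();
    }
}
```

With one writer per output file there is no contention on the stream itself, which sidesteps the open question above about whether append supports multiple writers.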
