hadoop-common-user mailing list archives

From "Jim R. Wilson" <wilson.ji...@gmail.com>
Subject Re: can hadoop read files backwards
Date Thu, 17 Jul 2008 21:59:38 GMT
> does wordcount get the lines in order? or are they random? can i have
> hadoop return them in reverse order?

You can't really depend on the order in which the lines are given - it's
best to think of them as random.  The purpose of MapReduce/Hadoop is
to distribute a problem among a number of cooperating nodes.

The idea is that any given line can be interpreted separately,
completely independent of any other line.  So in wordcount, this makes
sense.  For example, say you and I are nodes. Each of us gets half the
lines in a file and we can count the words we see and report on them -
it doesn't matter what order we're given the lines, or which lines
we're given, or even whether we get the same number of lines (if
you're faster at it, or maybe you get shorter lines, you may get more
lines to process in the interest of saving time).
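
To make the "you and I are nodes" picture concrete, here is a minimal sketch in plain Python (not Hadoop code). The names count_words and word_count are made up for illustration, and the two "nodes" are just two slices of a list. It only shows that the merged counts come out the same no matter how the lines are ordered or divided.

```python
from collections import Counter
import random


def count_words(lines):
    """Count the words in whatever lines this hypothetical node was handed."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines, split_at):
    """Divide the lines between two pretend nodes and merge their reports."""
    node_a = count_words(lines[:split_at])
    node_b = count_words(lines[split_at:])
    return node_a + node_b


lines = ["the quick brown fox", "jumps over the lazy dog", "the end"]

# Original order, one line for node A and two for node B.
in_order = word_count(lines, split_at=1)

# Shuffled order, and a different division of work between the nodes.
shuffled = lines[:]
random.shuffle(shuffled)
out_of_order = word_count(shuffled, split_at=2)

print(in_order == out_of_order)  # prints True
```

Counting is commutative and associative, which is why neither the order nor the split matters here.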

So if the project you're working on requires getting the lines in a
particular order, then you probably need to rethink your approach. It
may be that Hadoop isn't right for your problem, or maybe that the
problem just needs to be attacked in a different way.  Without knowing
more about what you're trying to achieve, I can't offer any specifics.
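
As one illustration of attacking it differently (an assumption about the goal, not something established in this thread): Hadoop's TextInputFormat passes each mapper the line's byte offset within the file as the key, so the order can be carried along as data and rebuilt after the fact instead of relied upon on the way in. The sketch below simulates that idea in plain Python. map_lines and reduce_reversed are hypothetical names, and a real job would still need all records for a file to reach a single reducer, sorted in descending key order.

```python
import random


def map_lines(text):
    """Emit (byte_offset, line) pairs, imitating an offset-keyed line reader."""
    offset = 0
    for raw in text.encode("utf-8").splitlines(keepends=True):
        yield offset, raw.decode("utf-8").rstrip("\r\n")
        offset += len(raw)  # advance by the full byte length, newline included


def reduce_reversed(pairs):
    """Rebuild the lines last-first by sorting on the offset, descending."""
    ordered = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return [line for _, line in ordered]


text = "first\nsecond\nthird\n"
pairs = list(map_lines(text))

# Arrival order is deliberately scrambled; only the keys carry the order.
random.shuffle(pairs)

print(reduce_reversed(pairs))  # prints ['third', 'second', 'first']
```

The offsets are per file, so with several input files the key would also need to include the file name.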

Good luck!

-- Jim

On Thu, Jul 17, 2008 at 4:41 PM, Elia Mazzawi
<elia.mazzawi@casalemedia.com> wrote:
> I have a program based on wordcount.java
> and I have files that are smaller than 64 MB (so I believe each file is
> one task)
> does wordcount get the lines in order? or are they random? can i have
> hadoop return them in reverse order?
> Jim R. Wilson wrote:
>> It sounds to me like you're talking about hadoop streaming (correct me
>> if I'm wrong there).  In that case, there's really no "order" to the
>> lines being doled out as I understand it.  Any given line could be
>> handed to any given mapper task running on any given node.
>> I may be wrong, of course, someone closer to the project could give
>> you the right answer in that case.
>> -- Jim R. Wilson (jimbojw)
>> On Thu, Jul 17, 2008 at 4:06 PM, Elia Mazzawi
>> <elia.mazzawi@casalemedia.com> wrote:
>>> is there a way to have hadoop hand over the lines of a file backwards to
>>> my
>>> mapper ?
>>> as in give the last line first.
