hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Strange input read behavior
Date Fri, 16 Nov 2012 06:30:28 GMT
TextInputFormat has no problem, so I was able to test Graph jobs with the
desired number of tasks.

I prefer the first solution.
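
For reference, a minimal sketch of the goal-size arithmetic from the
computeGoalSize() change quoted further down. Only the two return
expressions are taken from that diff; the class name GoalSizeSketch, the
method names and the sample sizes are made up for illustration, and this is
not the actual FileInputFormat code.

```java
// Sketch only: GoalSizeSketch and the sample numbers are hypothetical.
// The two return expressions are copied from the computeGoalSize() diff.
final class GoalSizeSketch {

  // Goal size before the change.
  static long oldGoalSize(int numSplits, long totalSize) {
    return totalSize / (numSplits == 0 ? 1 : numSplits);
  }

  // Goal size after the change ("the minus 1 is for the remainder").
  static long newGoalSize(int numSplits, long totalSize) {
    return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
  }

  public static void main(String[] args) {
    long totalSize = 1000L; // bytes of input, made-up value
    for (int tasks : new int[] {1, 4, 19, 2000}) {
      System.out.println(tasks + " tasks: old=" + oldGoalSize(tasks, totalSize)
          + " new=" + newGoalSize(tasks, totalSize));
    }
  }
}
```

Both variants use integer division, so with 1000 bytes and 2000 tasks each
returns 0. That is one way to picture "too many tasks for the input": the
goal size can drop below the size of a single record long before it reaches
zero.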

On Fri, Nov 16, 2012 at 3:00 PM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:
> The problem is, that the tasks were too many for the input to split.
> There are two ways to solve this:
> - make the splitter honor the record boundary, which basically means we
> have to determine the number of records before splitting which is crazy
> - remove the functionality for a "goal-size" and make the split totally
> based on the filesystem's blocksize, which is safer because it takes care
> of record boundaries.
>
> You choose, I would be +1 on the latter - we would then have to make the
> error more transparent to users when a job can't be scheduled.
>
> 2012/11/16 Edward J. Yoon <edwardyoon@apache.org>
>
>> I didn't look at SequenceFile closely when I implemented the I/O system,
>> so I don't know exactly.
>>
>> FYI, https://twitter.com/QwertyManiac/status/269093180220272640
>>
>> On Thu, Nov 15, 2012 at 11:57 PM, Thomas Jungblut
>> <thomas.jungblut@gmail.com> wrote:
>> > Most interesting is that we took the stuff from Hadoop, so the bug must
>> > also be present in Hadoop.
>> >
>> > 2012/11/15 Edward J. Yoon <edwardyoon@apache.org>
>> >
>> >> I think we have to fix InputFormatters, BSPJobClient, and the splitter in
>> >> FileInputFormat (+ unit tests). I'm not sure when I can do it.
>> >>
>> >> On Thu, Nov 15, 2012 at 10:58 PM, Tommaso Teofili
>> >> <tommaso.teofili@gmail.com> wrote:
>> >> > I've tried and it works with a small no of tasks (< 19) but it fails if
>> >> > it's not set (so getting the default behavior).
>> >> > I'm not sure I understand the rationale of the fix without going deeper
>> >> > into the code, I'm just concerned if this is just a corner case or may
>> >> > affect some others which would be bad.
>> >> > I see that adding some more lines to my test file the error doesn't occur
>> >> > anymore ...
>> >> >
>> >> > If that is not a major issue but just a corner case then it's ok otherwise
>> >> > I think it'd be better to fix before releasing.
>> >> > Regards,
>> >> > Tommaso
>> >> >
>> >> > 2012/11/15 Edward J. Yoon <edwardyoon@apache.org>
>> >> >
>> >> >> > Tommaso, your job works with different 'tasknum' correctly for same
>> >> >> > input?
>> >> >>
>> >> >> Not working. (and I found HAMA-476)
>> >> >>
>> >> >> Let's release 0.6 first. I'll fix this problem ASAP, then release 0.6.1.
>> >> >>
>> >> >> What do you think?
>> >> >>
>> >> >> On Thu, Nov 15, 2012 at 7:10 PM, Edward J. Yoon <edwardyoon@apache.org>
>> >> >> wrote:
>> >> >> > I've changed only computeGoalSize().
>> >> >> >
>> >> >> >    protected long computeGoalSize(int numSplits, long totalSize) {
>> >> >> > -    return totalSize / (numSplits == 0 ? 1 : numSplits);
>> >> >> > +    // The minus 1 is for the remainder.
>> >> >> > +    return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
>> >> >> >    }
>> >> >> >
>> >> >> > I don't remember exactly what happens if a split is not on a record
>> >> >> > boundary?
>> >> >> >
>> >> >> > Tommaso, your job works with different 'tasknum' correctly for same
>> >> >> > input?
>> >> >> >
>> >> >> > On Thu, Nov 15, 2012 at 6:23 PM, Thomas Jungblut
>> >> >> > <thomas.jungblut@gmail.com> wrote:
>> >> >> >> Edward changed something in the split behaviour last night. Maybe it
>> >> >> >> broke it.
>> >> >> >>
>> >> >> >> 2012/11/15 Tommaso Teofili <tommaso.teofili@gmail.com>
>> >> >> >>
>> >> >> >>> Hi guys,
>> >> >> >>>
>> >> >> >>> I was just running a couple of tests with GradientDescentBSP when I
>> >> >> >>> realized that using the newly installed RC5 the algorithm fails at its very
>> >> >> >>> beginning because it seems it cannot read from input.
>> >> >> >>>
>> >> >> >>> java.io.IOException: cannot read input vector size
>> >> >> >>>  at org.apache.hama.ml.regression.GradientDescentBSP.getXSize(GradientDescentBSP.java:268)
>> >> >> >>>  at org.apache.hama.ml.regression.GradientDescentBSP.getInitialTheta(GradientDescentBSP.java:244)
>> >> >> >>>  at org.apache.hama.ml.regression.GradientDescentBSP.bsp(GradientDescentBSP.java:72)
>> >> >> >>>  at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:254)
>> >> >> >>>  at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:284)
>> >> >> >>>  at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> >> >> >>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >> >> >>>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >> >> >>>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >> >> >>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >> >> >>>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >> >> >>>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >> >> >>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >> >> >>>  at java.lang.Thread.run(Thread.java:680)
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> Since I didn't change anything on that side and it works with
>> >> >> >>> 0.6.0-SNAPSHOT I wonder if the latest stuff related to input split caused
>> >> >> >>> problems.
>> >> >> >>>
>> >> >> >>> WDYT?
>> >> >> >>>
>> >> >> >>> Tommaso
>> >> >> >>>
>> >> >> >>> p.s.:
>> >> >> >>> I noticed this just after my +1 on the RC vote but please keep it on hold
>> >> >> >>> while we track this issue
>> >> >> >>>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Best Regards, Edward J. Yoon
>> >> >> > @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
