hadoop-mapreduce-user mailing list archives

From Eric Fiala <e...@fiala.ca>
Subject Re: Weird NPE at TaskLogAppender.flush()
Date Fri, 28 Oct 2011 12:47:14 GMT
Marco,
I'm not familiar with Terrier. However, I do notice that the download
package includes [ hadoop-0.20.2+228-core.jar ] - try swapping that out for
the core jar provided in your Hadoop distribution.
If that doesn't fix it, look into the other jars bundled with the package
(or make sure the ones from your Hadoop distro come before them on the
classpath) - your error on pastebin feels a lot like a slight version
mismatch.
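
One way to check which jar a class is actually loaded from is a small
helper like the sketch below. It is only an illustration: the class name
JarLocator is made up, and the default class name
org.apache.hadoop.mapred.TaskLogAppender is my assumption based on the
subject line. Pass whatever classes show up in your stack trace instead.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop or Terrier): reports which jar or
// directory a class was loaded from, to help spot duplicate or mismatched jars.
public class JarLocator {

    static String locate(String className) {
        try {
            // Load without running static initializers; we only want the origin.
            Class<?> cls = Class.forName(className, false,
                    JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // JDK classes typically have no code source.
                return className + " -> (bootstrap or unknown source)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on classpath (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed default; replace with the classes named in your own trace.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.hadoop.mapred.TaskLogAppender"};
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

Run it with the same classpath your job uses. If the location printed is
the jar bundled with Terrier rather than the one from your Hadoop distro,
that would point to the mismatch.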

hth

EF

On Thu, Oct 27, 2011 at 10:43 AM, Marco Didonna <m.didonna86@gmail.com> wrote:

> Hello everybody,
> I am working on Terrier (www.terrier.org), an IR toolkit that leverages
> Hadoop for indexing large amounts of data (i.e. documents). I am working
> both locally with a small subset of the dataset and on Amazon EC2
> with the full-size dataset. I am experiencing a weird (at least to me)
> exception which always occurs at 66% of the map phase. Here's the log:
> http://pastebin.com/XtUkHFYE
> I really have no idea where the problem could be.
> From the original Terrier 3.5 I've only modified the input format which
> is used to read the collection of documents: I use a custom
> sequencefileinputformat in order to process a custom sequence file
> made up of all the tiny documents of the TREC collection (a standard
> document collection used in IR).
> I guess the problem is not here, since even using the unmodified version
> of Terrier I get the same error. In that case, however, there is no
> failure, maybe because the authors of Terrier use MultiFileCollection.
>
> I'd love to hear from somebody, since when running the indexing job on
> the whole dataset the job fails because this error happens more than
> once. In pseudo-distributed mode the job still completes after a
> failure ... on the cloud it doesn't.
>
> Thanks for your time
>
> Marco Didonna
>
> PS: I use the latest version of Cloudera's distribution for Hadoop
> both locally and on the cloud
>



-- 
*Eric Fiala*
*Fiala Consulting*
T: 403.828.1117
E: eric@fiala.ca
http://www.fiala.ca
