accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: documentation on dealing with legacy Hadoop versions
Date Fri, 03 Jan 2014 16:15:09 GMT

On 1/3/14, 10:55 AM, Sean Busbey wrote:
> Heya!
> Earlier this week we had a user in IRC that was having difficulty running
> 1.5.0 because their classpath didn't include commons-configuration.
> In one case, they just needed to fix their accumulo-site to include Hadoop
> 2 paths. In the other, they were using Apache Hadoop 0.20.2, which has no
> commons-configuration.
> Initially, the user thought they were running a CDH3 version. This turned
> out not to be the case, but it so happens that CDH3 also does not have
> commons-configuration provided by Hadoop.
> This interaction pointed out two issues, and I'd like some opinions on how to
> handle them before I file JIRAs and possibly patches.
> 1) We are not sufficiently warning people about the need for durable sync.
> Or maybe we're just not getting across when durable sync is available.
> Hadoop versions are nonsensical for most outsiders, so I think we need to
> spell it out in docs. Waiting for users to start an instance and then look
> at a log is insufficient.

I recently did:

If these are still lacking, then we can open tickets for the omissions. 
I thought I had tracked down everything for Apache Hadoop pretty well 
and got appropriate checks for durable.sync/append and synconclose.

(talking to Sean on IRC) We can make a ticket to add stronger warnings 
about 0.20 releases not supporting append/sync correctly and how you 
can/will lose data.
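
For illustration only, the kind of check discussed above could be sketched roughly as follows. This is a minimal sketch, not Accumulo's actual code: the function name `sync_warnings` is hypothetical, the full property names (`dfs.support.append`, `dfs.durable.sync`, `dfs.datanode.synconclose`) are assumed from the short names mentioned above, and which of them matters depends on the Hadoop release in use.

```python
def sync_warnings(hdfs_conf):
    """Return warnings for HDFS settings that may put write-ahead logs at risk.

    hdfs_conf is a plain dict of property name -> string value, e.g. parsed
    from hdfs-site.xml. Property names here are assumptions for illustration.
    """
    warnings = []
    # Flag append/sync properties only when they are explicitly disabled;
    # an unset property is left alone because defaults vary by release.
    for key in ("dfs.support.append", "dfs.durable.sync"):
        if hdfs_conf.get(key, "").strip().lower() == "false":
            warnings.append(key + " is explicitly disabled; sync may not be durable")
    # synconclose is treated as something the operator should turn on.
    if hdfs_conf.get("dfs.datanode.synconclose", "").strip().lower() != "true":
        warnings.append("dfs.datanode.synconclose is not set to true")
    return warnings
```

The point of returning a list rather than logging is that the same check could feed both a startup log message and whatever we end up writing in the docs.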

> I'm thinking we need something similar to what HBase has[1].
> My question is, where should I add this? The README seems like a good
> place, since it already talks about enabling durable sync. How about the
> user manual? Both?

Both is probably good. I don't think we have anything on Hadoop versions 
in the user manual (or the administration manual, if that's still a thing).

> 2) Should we document commons-configuration similar to commons-io?
> The README already has a section about how some older versions of Hadoop
> don't have commons-io. I think the versions given need to be tightened up
> given (1) above (since right now it implicitly refers to versions people
> should not be using).
> The only Hadoop distro I know of that both has proper append support and
> does not have commons-configuration is CDH3. In addition to being a
> vendor-specific version, it is no longer supported by said vendor.
> So would it be preferable to
>    2a) add a note after the commons-io section that gives similar
> instructions for adding commons-configuration?
>    2b) file a JIRA that points out that users on CDH3 won't have
> commons-configuration, document the workaround on said ticket, close it as Won't Fix
> The idea with the latter approach is that it would give searchers a chance
> to find the information and give us somewhere to point people, while not
> adding to our long-term documentation baggage. The downside is that this
> won't be as accessible to users, so it will be more painful for them (esp
> if they don't have regular internet access).

I'm not sure what's best to do here. 1.6 undid the provided scope on 
those dependencies because 1.5 was such a pain to deal with in this 
regard (at least that's how I remember it). Perhaps a Jira is a good 
reference point and we can link to the ticket which made that change in 
1.6. I doubt most users will find that on their own, but perhaps some 
might and it at least would keep us from having to repeat the same answer.

> -Sean
> [1]:
