incubator-cassandra-user mailing list archives

From Boris Solovyov <boris.solov...@gmail.com>
Subject Re: Nodetool doesn't shows two nodes
Date Mon, 18 Feb 2013 14:55:49 GMT
I don't think it is the cloud at all, and I am no newcomer to sysadmin work
(though I am relatively new to AWS). The mistake is clearly mine, but it is
also clearly an easy one to make -- so I assume a lot of other people make it
too. Yet the logs don't provide any guidance. Or is that another mistake on
my part, one that prevents the logs from helping me?

Another thing I noticed, by the way: the docs say you can clear out your
Cassandra data with "rm -rf /var/lib/cassandra/*", but it should really be
/var/lib/cassandra/*/*. In the RHEL setup, /var/lib/cassandra is owned by
root, so if you remove the three subdirectories under it, Cassandra can't
recreate the directories it needs to run. (I guess the docs assume you are
starting Cassandra by executing the binary directly, running it as root.) In
any case, if you then start the service and the directories don't exist,
Cassandra dies. And what is the log message? Something totally obscure that
has nothing to do with "my data directory doesn't exist and I can't create
it" :-D

Don't get me wrong, I am not complaining; so far things are going as well as
I expect when learning complex new open source software! I just want to point
out that while no software can have perfect documentation and perfect log
messages, there are probably 5% of problems/mistakes that account for 95% of
the time lost, and "firewall not open on port 7000" or "data directory not
there" seem to be among them, so helpful, specific log messages for those
would be a good idea. Now, that is my opinion -- what is yours? Should I file
feature requests? As a newcomer to Cassandra I don't want to just walk in
like a bull in a china shop and start telling everyone what is wrong and what
they should fix. To MAKE ME HAPPY :-D
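
By the way, for the "firewall not open on port 7000" case, a quick check like
this from one node tells you right away whether the other node's storage_port
is reachable (only a sketch; 203.0.113.20 is a placeholder for the other
node's address):

    # from node A: is node B's storage_port actually reachable?
    nc -zv 203.0.113.20 7000

    # same check for the SSL storage port, if you use it
    nc -zv 203.0.113.20 7001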


On Mon, Feb 18, 2013 at 9:44 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> These issues are more cloud specific than they are Cassandra specific.
> Cloud executives tell me in white papers that the cloud is awesome and you
> can fire all your sysadmins and network people and save money.
>
> This is what happens when you believe cloud executives and their white
> papers: you spend 10+ hours troubleshooting cloud networking problems.
>
> On Mon, Feb 18, 2013 at 9:12 AM, Boris Solovyov
> <boris.solovyov@gmail.com> wrote:
> > I think it is actually more of a problem that there were no error
> > messages or other indications of what went wrong in the setup where the
> > nodes couldn't contact each other. Should I file an issue report on this?
> > Clearly Cassandra must have tried to contact some IP on port 7000 and
> > failed. Why didn't it log that? It would have saved me about 10 hours :-P
> >
> >
> > On Sun, Feb 17, 2013 at 11:54 PM, Jared Biel
> > <jared.biel@bolderthinking.com> wrote:
> >>
> >> This is something that I found while using the multi-region snitch -
> >> it uses public IPs for communication. See the original ticket here:
> >> https://issues.apache.org/jira/browse/CASSANDRA-2452. It'd be nice if
> >> it used the private IPs to communicate with nodes that are in the same
> >> region as itself, but I do not believe this is the case. Be aware that
> >> you will be charged for external data transfer even for nodes in the
> >> same region, because the traffic will not fall under the free (same-AZ)
> >> or reduced (inter-AZ) tiers.
> >>
> >> If you continue using this snitch in the meantime, it is not
> >> necessary (or recommended) to have those ports open to 0.0.0.0/0.
> >> You'll simply need to add the public IPs of your C* servers to the
> >> correct security group(s) to allow access.
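> >>
> >> For example, something along these lines with the AWS CLI -- only a
> >> sketch; sg-12345678 and 203.0.113.10 are placeholders for your own
> >> security group ID and a node's public IP:
> >>
> >>     aws ec2 authorize-security-group-ingress \
> >>       --group-id sg-12345678 \
> >>       --protocol tcp --port 7000 \
> >>       --cidr 203.0.113.10/32
> >>
> >> (Repeat for each node's public IP, and for port 7001 if you use the SSL
> >> storage port.)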
> >>
> >> There's something else that's a little strange about the EC2 snitches:
> >> "us-east-1" is (incorrectly) represented as the datacenter "us-east".
> >> Other regions are recognized and named properly (us-west-2, for
> >> example). This is kind of covered in the ticket here:
> >> https://issues.apache.org/jira/browse/CASSANDRA-4026. I wish it could
> >> be fixed properly.
> >>
> >> Good luck!
> >>
> >>
> >> On 17 February 2013 16:16, Boris Solovyov <boris.solovyov@gmail.com>
> >> wrote:
> >> > OK, I got it. I realized that storage_port wasn't actually open
> >> > between the nodes, because it is using the public IP. (I did find this
> >> > information in the docs after looking more... it is in the section on
> >> > "Types of snitches." It explains everything I had found by trial and
> >> > error.)
> >> >
> >> > After opening port 7000 to all IP addresses, the cluster boots OK and
> >> > the two nodes see each other. Now I have the happy result. But my nodes
> >> > are wide open to the entire internet on port 7000. This is a serious
> >> > problem. It obviously can't be put into production.
> >> >
> >> > I definitely need cross-continent deployment. A single-AZ or
> >> > single-region deployment is not going to be enough. How do people solve
> >> > this in practice?
> >
> >
>
