cassandra-user mailing list archives

From Christian Decker <decker.christ...@gmail.com>
Subject Re: Rows missing after new node bootstrapped
Date Wed, 17 Nov 2010 21:56:05 GMT
On Tue, Nov 16, 2010 at 6:58 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> I'm pretty sure that "reading an index" and "using Pig" are not
> compatible right now. The M/R support that Pig builds on always does
> sequential-scan range queries.
>
Yes, it does: I have a specialized LoadFunc to read and load manually
maintained indices (pre-0.7 style), and it works like a charm as long as I
don't run nodetool loadbalance or add new nodes to the cluster.

>
> Can you see the missing rows if you do a normal get_slice query for them
> without Pig?
>
They are empty. I suspect the "eventual" in "eventual consistency" hit me
in the head: the empty rows are disappearing at an incredibly slow rate. I
guess it's repairing in the background, but it's taking forever
(100'000'000 rows in the cluster, 2 nodes added, and after 3 days it's still
not done migrating to the new nodes).

Could this actually be the case?

Regards,
Chris

B.T.W.: M/R and indices might mix well if we could just fetch the size of
the index; then we could create splits that each say "fetch from the index
starting at column n and fetch a maximum of m". Any plans to implement it?
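A minimal sketch of the split computation proposed above, assuming the
number of columns in the index row is known up front. The function name
`index_splits` and the (start, count) pair are hypothetical illustrations,
not part of Cassandra's Hadoop or Pig support:

```python
from typing import List, Tuple


def index_splits(index_size: int, max_per_split: int) -> List[Tuple[int, int]]:
    """Divide an index of `index_size` columns into (start, count) splits.

    Hypothetical helper: each split would tell one mapper to fetch at most
    `count` index columns beginning at column offset `start`.
    """
    if index_size < 0:
        raise ValueError("index_size must be non-negative")
    if max_per_split <= 0:
        raise ValueError("max_per_split must be positive")
    splits = []
    for start in range(0, index_size, max_per_split):
        # The last split may be shorter than max_per_split.
        count = min(max_per_split, index_size - start)
        splits.append((start, count))
    return splits


print(index_splits(10, 4))  # [(0, 4), (4, 4), (8, 2)]
```

One caveat: a Thrift slice predicate addresses columns by name (a start
column, a finish column and a count), not by position, so a real
implementation would first have to translate these offsets into start
column names, for example by paging through the index once.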

>
> On Mon, Nov 15, 2010 at 7:03 AM, Christian Decker
> <decker.christian@gmail.com> wrote:
> > I'm using tag cassandra-0.7.0-beta3. I don't see why I would need range
> > scans, since I perform a multi_get on the indexed keys.
> >
> > Regards,
> > Chris
> >
> > On Sun, Nov 14, 2010 at 9:51 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> Are you using a version with working range scans?
> >>
> >> On Sat, Nov 13, 2010 at 6:11 PM, Christian Decker
> >> <decker.christian@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I'm having some doubts about the current state of my cluster. I started
> >> > with one node, filled it with some 10 million rows, then flushed and
> >> > compacted the node. Then I ran a small Pig script that read an index and
> >> > fetched the matching rows; no problem up to this point. Now I add a new
> >> > node with AutoBootStrap turned on. It all seems to work: it chooses a
> >> > token to take over some of the first node's responsibilities, it seems
> >> > to transfer all the relevant data, and everything looks fine. Now if I
> >> > run the Pig script again it produces many empty rows, which leads me to
> >> > believe that these rows were read from the new node, which doesn't yet
> >> > have the corresponding data. This puzzles me, since I thought the
> >> > bootstrap would transfer the needed data. Will this eventually return to
> >> > giving me no empty rows, or have I done something terribly wrong?
> >> >
> >> > Regards,
> >> > Chris
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of Riptano, the source for professional Cassandra support
> >> http://riptano.com
> >
> >
>
>
>
>
