cassandra-user mailing list archives

From <SEAN_R_DUR...@homedepot.com>
Subject RE: Manual Indexing With Buckets
Date Fri, 24 Jul 2015 12:09:35 GMT
It is a bit hard to follow. Perhaps you could include your proposed schema (annotated with
your size predictions) to spur more discussion. To me, it sounds a bit convoluted. Why is
a "batch" so big (up to 100 million rows)? Is a row in the primary only associated with one
batch?


Sean Durity - Cassandra Admin, Big Data Team
To engage the team, create a request<https://portal.homedepot.com/sites/bigdata/SitePages/Big%20Data%20Engagement%20Request.aspx>

From: Anuj Wadehra [mailto:anujw_2003@yahoo.co.in]
Sent: Friday, July 24, 2015 3:57 AM
To: user@cassandra.apache.org
Subject: Re: Manual Indexing With Buckets

Can anyone take this one?

Thanks
Anuj


________________________________
From: "Anuj Wadehra" <anujw_2003@yahoo.co.in>
Date: Thu, 23 Jul, 2015 at 10:57 pm
Subject: Manual Indexing With Buckets
We have a primary table and need to search it by the batchid column, so we are creating
a manual index keyed by batch id. We use buckets to limit each row in the batch id index
table to 50 MB. Because batch size may vary drastically (one batch id may be associated
with 100k row keys in the primary table while another may be associated with 100 million
row keys), we are creating a metadata table to track the approximate amount of data inserted
for each batch in the primary table, so that the batch id index table has a dynamic number
of buckets/rows. As more data is inserted for a batch in the primary table, a new set of
10 buckets is added. At any point in time, clients write to the latest 10 buckets created
for a batch id index in round-robin fashion to avoid hotspots.
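
The bucket allocation described above can be sketched roughly as follows. This is only an in-memory illustration, not the poster's actual implementation: the class name `BatchBucketAllocator`, the `rows_per_set` threshold, and the dictionaries standing in for the metadata table are all assumptions made for the example. In a real deployment the counters would live in Cassandra and would only be approximate.

```python
BUCKETS_PER_SET = 10  # from the post: buckets are added in sets of 10


class BatchBucketAllocator:
    """Hypothetical in-memory stand-in for the metadata table."""

    def __init__(self, rows_per_set):
        # Assumption: number of row keys one set of 10 buckets should absorb
        # before a new set is opened (chosen so a bucket stays under ~50 MB).
        self.rows_per_set = rows_per_set
        self._approx_rows = {}  # batch_id -> approximate indexed row count
        self._next_slot = {}    # batch_id -> round-robin position (0..9)

    def bucket_for_write(self, batch_id):
        """Return the bucket number the next index entry should go to."""
        rows = self._approx_rows.get(batch_id, 0)
        slot = self._next_slot.get(batch_id, 0)
        latest_set = rows // self.rows_per_set  # index of the newest set of 10
        self._approx_rows[batch_id] = rows + 1
        self._next_slot[batch_id] = (slot + 1) % BUCKETS_PER_SET
        return latest_set * BUCKETS_PER_SET + slot

    def bucket_count(self, batch_id):
        """Upper bound on the buckets a reader has to scan for this batch."""
        rows = self._approx_rows.get(batch_id, 0)
        return (rows // self.rows_per_set + 1) * BUCKETS_PER_SET
```

A reader searching by batch id would then query buckets `0` through `bucket_count(batch_id) - 1` and merge the results.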

Comments are requested on the following:
1. Are there any suggestions on the above design?

2. What is the best approach for updating/deleting from the index table? When a row is manually
purged from the primary table, we don't know which of the x buckets created for its batch id
contains that row key.
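
One possible way to frame question 2, offered only as a sketch and not as something proposed in this thread: if the bucket number chosen at write time is stored alongside the row in the primary table, a purge can read it back and delete exactly one index entry instead of probing every bucket. The class `BucketedIndex` and its two dictionaries are hypothetical stand-ins for the primary and index tables.

```python
class BucketedIndex:
    """Hypothetical in-memory model of the primary table plus the manual index."""

    def __init__(self):
        # row_key -> (batch_id, bucket); stands in for the primary table,
        # with the bucket number stored as an extra column (an assumption).
        self.primary = {}
        # (batch_id, bucket) -> set of row keys; stands in for the index table.
        self.index = {}

    def insert(self, row_key, batch_id, bucket):
        """Write the primary row and its index entry, remembering the bucket."""
        self.primary[row_key] = (batch_id, bucket)
        self.index.setdefault((batch_id, bucket), set()).add(row_key)

    def purge(self, row_key):
        """Delete a row and its single index entry; return the bucket touched."""
        batch_id, bucket = self.primary.pop(row_key)
        self.index[(batch_id, bucket)].discard(row_key)
        return bucket
```

The trade-off is an extra read of the primary row before each purge, in exchange for not issuing deletes against every bucket of the batch.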

Thanks
Anuj


