incubator-cassandra-user mailing list archives

From Apoorva Gaurav <apoorva.gau...@myntra.com>
Subject Re: Read performance in map data type
Date Thu, 03 Apr 2014 07:44:39 GMT
client side socket limit : 64K
client side maximum connection per host : 8
read consistency level : Quorum


On Thu, Apr 3, 2014 at 12:59 PM, Shrikar archak <shrikar84@gmail.com> wrote:

> How about the client side socket limits? Cassandra client side maximum
> connection per host and read consistency level?
>
> ~Shrikar
>
>
> On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav <apoorva.gaurav@myntra.com> wrote:
>
>> At the client side we are seeing a latency of ~350ms. We are using
>> datastax driver 2.0.0 with the fetch size set to 500. These latencies
>> occur while reading rows having ~200 columns.
>>
>>
>> On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak <shrikar84@gmail.com> wrote:
>>
>>> Hi Apoorva,
>>> As per the cfhistogram there are some rows which have more than 75k
>>> columns and around 150k reads hit 2 SStables.
>>>
>>> Are you sure that you are seeing more than 500ms latency? The
>>> cfhistogram shows that the worst read performance was around 51ms,
>>> which looks reasonable with many reads hitting 2 sstables.
>>>
>>> Thanks,
>>> Shrikar
>>>
>>>
>>> On Wed, Apr 2, 2014 at 11:30 PM, Apoorva Gaurav <
>>> apoorva.gaurav@myntra.com> wrote:
>>>
>>>> Hello Shrikar,
>>>>
>>>> We are still facing the read latency issue; here is the histogram:
>>>> http://pastebin.com/yEvMuHYh
>>>>
>>>>
>>>> On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav <
>>>> apoorva.gaurav@myntra.com> wrote:
>>>>
>>>>> Hello Shrikar,
>>>>>
>>>>> Yes, the primary key is (studentID, subjectID). I had dropped the test
>>>>> table; I am recreating and populating it, after which I will share the
>>>>> cfhistogram. In such a case, is there any practical limit on the rows I
>>>>> should fetch? For example, should I do
>>>>>        select * from marks_table where studentID = ? limit 500;
>>>>> instead of doing
>>>>>        select * from marks_table where studentID = ?;
>>>>>
>>>>>
>>>>> On Sat, Mar 29, 2014 at 5:20 AM, Shrikar archak <shrikar84@gmail.com> wrote:
>>>>>
>>>>>> Hi Apoorva,
>>>>>>
>>>>>> I assume this is the table with studentId and subjectId as the primary
>>>>>> key columns, and not others like marks.
>>>>>>
>>>>>> create table marks_table(studentId int, subjectId int, marks int,
>>>>>> PRIMARY KEY(studentId,subjectId));
>>>>>>
>>>>>> Also could you give the cfhistogram stats?
>>>>>>
>>>>>> nodetool cfhistograms <your keyspace> marks_table;
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Shrikar
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav <
>>>>>> apoorva.gaurav@myntra.com> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> We've a schema which can be modeled as (studentID, subjectID, marks)
>>>>>>> where the combination of studentID and subjectID is unique. The number
>>>>>>> of studentIDs can go up to 100 million, and for each studentID we can
>>>>>>> have up to 10k subjectIDs.
>>>>>>>
>>>>>>> We are using Apache Cassandra 2.0.4 and datastax java driver
>>>>>>> 1.0.4. We are using a four node cluster, each node having 24 cores and
>>>>>>> 32GB memory. I'm sure that the machines are not underperformant, as on
>>>>>>> the same test bed we've consistently received <5ms response times for
>>>>>>> ~1b documents when queried via primary key.
>>>>>>>
>>>>>>> I've tried three approaches, all of which result in significant
>>>>>>> deterioration (>500 ms response time) in read query performance once
>>>>>>> the number of subjectIDs goes past ~100 for a studentID. The approaches
>>>>>>> are:
>>>>>>>
>>>>>>> 1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int,
>>>>>>> int>) and query by subjectID
>>>>>>>
>>>>>>> 2. model as (studentID int, subjectID int, marks int, PRIMARY
>>>>>>> KEY(studentID, subjectID)) and query as select * from marks_table
>>>>>>> where studentID = ?
>>>>>>>
>>>>>>> 3. model as (studentID int, subjectID int, marks int, PRIMARY
>>>>>>> KEY(studentID, subjectID)) and query as select * from marks_table
>>>>>>> where studentID = ? and subjectID in (?, ?, ?....?), the number of
>>>>>>> subjectIDs in the query being ~1K.
>>>>>>>
>>>>>>> What can be the bottlenecks? Would it be better to model this as
>>>>>>> (studentID int, subjct_marks_json text) and query by studentID?
>>>>>>>
>>>>>>> --
>>>>>>> Thanks & Regards,
>>>>>>> Apoorva
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Apoorva
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Apoorva
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Apoorva
>>
>
>


-- 
Thanks & Regards,
Apoorva
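
For approach 3 in the original message (an IN clause with ~1K subjectIDs),
one experiment that might be worth running is to split the subjectID list
into smaller IN clauses and compare latencies. The sketch below only builds
the CQL strings and bind values; it does not talk to Cassandra. The helper
names (chunked, build_in_queries) and the chunk size of 100 are assumptions
made for illustration, and nothing in this thread confirms that chunking
helps.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def build_in_queries(
    student_id: int,
    subject_ids: Sequence[int],
    chunk_size: int = 100,  # arbitrary value, chosen only for illustration
) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (cql, bind_values) pairs, one per chunk of subjectIDs.

    Table and column names follow the marks_table schema quoted above.
    """
    queries = []
    for chunk in chunked(subject_ids, chunk_size):
        placeholders = ", ".join("?" for _ in chunk)
        cql = (
            "SELECT * FROM marks_table "
            f"WHERE studentID = ? AND subjectID IN ({placeholders})"
        )
        queries.append((cql, (student_id, *chunk)))
    return queries
```

Each (cql, bind_values) pair could then be prepared and executed with
whichever driver is in use, and the per-chunk timings compared against the
single large IN query.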
