From: Deno Vichas
To: user@cassandra.apache.org
Date: Wed, 30 Nov 2011 18:34:22 -0800
Subject: Re: data modeling question

Here's what I ended up with; this seems to work for me.


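import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

import me.prettyprint.cassandra.utils.TimeUUIDUtils;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.RangeSlicesQuery;
import org.junit.Assert;
import org.junit.Test;

    // Members of the JUnit test class. _keyspace, _stringSerializer and _uuidSerializer are
    // fields defined elsewhere in that class (presumably a Hector Keyspace,
    // StringSerializer.get() and UUIDSerializer.get()).
    @Test
    public void readAndWriteSettingTTL() throws Exception {
        int ttl = 2; // column TTL, in seconds
        String columnFamily = "Quote";
        Set<String> symbols = new HashSet<String>(Arrays.asList("aapl", "goog", "ibm", "csco"));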

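        // One time-based UUID shared by every insert; it becomes the column name in each row.
        UUID timeUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

        // Queue one expiring column per symbol (row key), then send them as a single batch.
        Mutator<String> mutator = HFactory.createMutator(_keyspace, _stringSerializer);
        for (String symbol : symbols) {
            addInsertionToMutator(columnFamily, timeUUID, mutator, symbol, ttl);
        }
        mutator.execute();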

        // Scan every row (empty start and end keys) and read at most one column per row,
        // taken from the start of the row in comparator order (reversed = false).
        RangeSlicesQuery<String, UUID, String> rangeSlicesQuery = HFactory.createRangeSlicesQuery(
                _keyspace, _stringSerializer, _uuidSerializer, _stringSerializer);
        rangeSlicesQuery.setColumnFamily(columnFamily);
        rangeSlicesQuery.setKeys("", "");
        rangeSlicesQuery.setRange(null, null, false, 1);
        QueryResult<OrderedRows<String, UUID, String>> result = rangeSlicesQuery.execute();

        // Check the row count first, so an empty result fails with a clear message
        // rather than an IndexOutOfBoundsException on get(0).
        Assert.assertEquals("We should have 4 records", 4, result.get().getList().size());

        UUID uuid = result.get().getList().get(0).getColumnSlice().getColumns().get(0).getName();
        Assert.assertEquals("UUID should be the same", timeUUID, uuid);

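        // Wait past the 2-second TTL so the columns expire. A range scan can still return the
        // row keys (Cassandra's "range ghosts"), just with no live columns, hence the per-row check.
        Thread.sleep(5000);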

        QueryResult<OrderedRows<String, UUID, String>> result2 = rangeSlicesQuery.execute();
        for(Row<String, UUID, String> row : result2.get().getList()) {
            Assert.assertEquals("Expired columns should not be returned", 0, row.getColumnSlice().getColumns().size());
        }

    }

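    // Queues one column in the given symbol's row: name = the time UUID, value = "" (no quote
    // is stored in this test), expiring after ttl seconds.
    private void addInsertionToMutator(String columnFamily, UUID columnName, Mutator<String> mutator, String symbol, int ttl) {
        mutator.addInsertion(symbol, columnFamily, HFactory.createColumn(columnName, "", ttl, _uuidSerializer, _stringSerializer));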
    }


On 11/30/2011 1:56 PM, David McNelis wrote:
You wouldn't query for all the keys that have a column name x exactly. Instead, for a given sector, grab your list of symbols S. Then get the last column for each of those symbols (how you do that depends on the API) and test whether its date is within your threshold. If not, the symbol goes into your list of symbols to fetch.
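A rough, pure-Java sketch of that first approach (not Hector code). It assumes the newest column name for each symbol has already been read, for example with a slice query using reversed = true and a count of 1, and that column names are version-1 (time-based) UUIDs. `StaleSymbols`, `symbolsToFetch`, `latestColumn` and `timeUuid` are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not part of Hector or Cassandra.
class StaleSymbols {
    // 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Unix time in millis embedded in a version-1 UUID (timestamp() throws for other versions).
    static long unixMillis(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    // Hypothetical helper for building test column names: a version-1 UUID for a Unix time,
    // with the clock-sequence and node bits left at zero.
    static UUID timeUuid(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;
        long msb = ((ts & 0xFFFFFFFFL) << 32) | (((ts >>> 32) & 0xFFFFL) << 16)
                 | (1L << 12) | ((ts >>> 48) & 0x0FFFL);
        return new UUID(msb, 0x8000000000000000L); // IETF variant bits
    }

    /**
     * Symbols of one sector whose newest column is older than maxAgeMillis, or that have none.
     * @param sectorSymbols the list S read from the sector CF
     * @param latestColumn  symbol -> name of its newest Quote column (absent if the row is empty)
     */
    static List<String> symbolsToFetch(List<String> sectorSymbols, Map<String, UUID> latestColumn,
                                       long nowMillis, long maxAgeMillis) {
        List<String> toFetch = new ArrayList<>();
        for (String symbol : sectorSymbols) {
            UUID newest = latestColumn.get(symbol);
            if (newest == null || nowMillis - unixMillis(newest) > maxAgeMillis) {
                toFetch.add(symbol);
            }
        }
        return toFetch;
    }
}
```

Treating a missing column as stale also covers rows whose columns have expired through a TTL, as in the test above.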

Alternatively, you could iterate over the symbols, grabbing data where the date is between A and B. If you get an empty set (no columns returned), you need to re-pull that symbol. Does that make sense?
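The range-based alternative, sketched the same way as a hypothetical in-memory model rather than Hector code. Each row's columns are kept sorted by the time inside the timeuuid (assumed to mirror a TimeUUIDType comparator), so "columns between A and B" becomes a range view bounded by the smallest and largest UUIDs for those two instants, much like the start and finish column names given to a slice query. `WindowCheck`, `needsRepull`, `timeUuid` and the `MIN_LSB`/`MAX_LSB` bounds are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.UUID;

// Hypothetical helper, not part of Hector or Cassandra.
class WindowCheck {
    // 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Smallest and largest low halves that still carry the IETF variant bits (10xx...),
    // so the two bounds enclose every version-1 UUID with the same timestamp.
    static final long MIN_LSB = 0x8000000000000000L;
    static final long MAX_LSB = 0xBFFFFFFFFFFFFFFFL;

    // Comparator for a row's columns: embedded time first, then the full UUID as a tie-breaker.
    static int byTime(UUID a, UUID b) {
        int c = Long.compare(a.timestamp(), b.timestamp());
        return c != 0 ? c : a.compareTo(b);
    }

    // Version-1 UUID for a Unix time in millis with the given low half.
    static UUID timeUuid(long unixMillis, long lsb) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;
        long msb = ((ts & 0xFFFFFFFFL) << 32) | (((ts >>> 32) & 0xFFFFL) << 16)
                 | (1L << 12) | ((ts >>> 48) & 0x0FFFL);
        return new UUID(msb, lsb);
    }

    /**
     * Symbols with no column whose time falls in [fromMillis, toMillis] (fromMillis <= toMillis).
     * Each row map must be a TreeMap built with WindowCheck::byTime.
     */
    static List<String> needsRepull(Map<String, NavigableMap<UUID, String>> rows,
                                    List<String> symbols, long fromMillis, long toMillis) {
        UUID start = timeUuid(fromMillis, MIN_LSB);
        UUID finish = timeUuid(toMillis, MAX_LSB);
        List<String> result = new ArrayList<>();
        for (String symbol : symbols) {
            NavigableMap<UUID, String> columns = rows.get(symbol);
            // An empty slice (or a missing row) means nothing was fetched in the window.
            if (columns == null || columns.subMap(start, true, finish, true).isEmpty()) {
                result.add(symbol);
            }
        }
        return result;
    }
}
```

Both approaches still touch every symbol in the sector; the difference is only whether you read one column per row or a time-bounded slice.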

Either way, you end up hitting each individual symbol. Maybe someone else has a better idea of how to structure the data for that particular use case.

On Wed, Nov 30, 2011 at 3:45 PM, Deno Vichas <deno@syncopated.net> wrote:
With the Quote CF below, how would one query for all keys that have a column whose timeuuid name is later than x minutes ago? I need to find, by sector, all symbols that have not been fetched in the last x minutes. I know I can get the list of symbols by sector from my sector CF.

thanks,
deno


On 11/30/2011 1:07 PM, David McNelis wrote:

Then I would have a column family for quotes, with the symbol as the key, the timestamp as the column name, and the quote as the value:

quote : {
   key: symbol
   column names:  timeuuid
   column values:  quote at that time for that symbol
}
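To make that layout concrete, here is a small in-memory stand-in for the Quote CF, assuming the CF uses a time-ordered (TimeUUIDType-style) comparator: a map from symbol to a sorted map of timeuuid column names to quote values. Because columns are sorted by time, the newest quote is simply the row's last column. `QuoteModel`, `insert`, `latest` and the UUID helpers are hypothetical names, not Hector or Cassandra APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory model of the Quote CF:
// row key = symbol, column name = version-1 (time-based) UUID, column value = quote.
class QuoteModel {
    // 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    private final Map<String, NavigableMap<UUID, String>> rows = new HashMap<>();

    // Orders column names by the time embedded in the UUID, then by the full UUID,
    // so two columns written in the same instant do not overwrite each other.
    static int byTime(UUID a, UUID b) {
        int c = Long.compare(a.timestamp(), b.timestamp());
        return c != 0 ? c : a.compareTo(b);
    }

    // Builds a version-1 UUID for a Unix time in millis; 'discriminator' stands in
    // for the clock-sequence/node bits (an assumption made for this sketch).
    static UUID timeUuid(long unixMillis, long discriminator) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;
        long msb = ((ts & 0xFFFFFFFFL) << 32)          // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)     // time_mid
                 | (1L << 12)                          // version 1
                 | ((ts >>> 48) & 0x0FFFL);            // time_hi
        long lsb = 0x8000000000000000L | (discriminator & 0x3FFFFFFFFFFFFFFFL); // IETF variant
        return new UUID(msb, lsb);
    }

    // Unix time in millis recovered from a version-1 UUID.
    static long unixMillis(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    void insert(String symbol, UUID columnName, String quote) {
        NavigableMap<UUID, String> columns = rows.get(symbol);
        if (columns == null) {
            columns = new TreeMap<UUID, String>(QuoteModel::byTime);
            rows.put(symbol, columns);
        }
        columns.put(columnName, quote);
    }

    // The newest quote is the row's last column, or null if the row does not exist.
    Map.Entry<UUID, String> latest(String symbol) {
        NavigableMap<UUID, String> columns = rows.get(symbol);
        return columns == null ? null : columns.lastEntry();
    }
}
```

Inserts can arrive in any order; the sorted row still yields the most recent column via `lastEntry()`.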






--
David McNelis
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
c: 219.384.5143

A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.


