From: Andrew Mee
Date: Tue, 01 Sep 2009 11:02:27 +1000
To: user@couchdb.apache.org
Subject: Re: Best way to handle data? Advice wanted

Simon, yeah I thought of that - it will return all the documents and then I
can't use reduce within CouchDB - I would have to filter through all these
documents on the programming end. Not bad when you have 10 documents, but if
I had 5000 documents it would cause too much of a slowdown.

If I could do this reducing separately on the CouchDB side as an external
process, maybe - but even then it's not really an optimal way of finding the
data I need.

Regards
Andrew

On 31/08/09 22:28, Simon Metson wrote:
> Hi,
> As your timestamps are just ints they're nicely sortable. Make a
> view like:
>
> function(doc) {
>   emit(doc.timestamp, doc);
> }
>
> and query it like:
>
> http://localhost:5984/test/_design/timestamp/_view/sort?startkey=12344&endkey=12347
>
> which will give you documents with a timestamp between 12344 and 12347.
> Cheers
> Simon
>
> On 31 Aug 2009, at 01:30, Andrew Mee wrote:
>
>> I have been using CouchDB (trunk) for a couple of weeks now and while
>> I have a good grasp on the way it handles data, I am unsure of the best
>> way to store some of this data for retrieval purposes. I'll be honest
>> upfront: I come from an SQL background, so I'm still getting my head
>> around some of the concepts, and while I could have done this in SQL
>> without a worry, I like the idea of the object and schema-less storage
>> and the replication option that CouchDB has.
>>
>> Currently one of the document types I am storing is some time-tracking
>> data; it looks a little like this:
>>
>> {
>>   "_id": "t125142312603660",
>>   "_rev": "1-14095bf3c015575a4dc5ec3c7aea1234",
>>   "task": "blah",
>>   "cc": "HYD",
>>   "timestamp": 1251423124,
>>   "pc": "SPR",
>>   "duration": 5,
>>   "username": "andrew"
>> }
>>
>> The timestamp field is a UTC unix timestamp.
>>
>> I have been using views with map and reduce functionality to collate
>> this data into the data I want.
>>
>> However, my issue comes when I want to look only at data between
>> particular timestamps. I thought about collating it per day so it
>> looked like (the format is bad, I know!):
>>
>> key: ["2009-08-26", "HYD", "SPR", "blah"], value: 100
>>
>> where value is the summed value of the durations. But this doesn't
>> work for situations with different timezones.
>>
>> I thought that I might be able to use startkey_docid/endkey_docid and
>> use the timestamp as the docid to filter the documents used for the
>> reduce - but this doesn't seem to be the way it works. (Am I wrong??)
>>
>> I did think about using an external process, but that only works as
>> both a map and a reduce, not just a reduce option.
>>
>> I am interested to hear your thoughts on the best way to handle and
>> retrieve this data.
>>
>> Regards
>> Andrew M
>>
>
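
A minimal sketch of how the range query and the summing could be combined in
a single view, assuming a hypothetical design document _design/time with a
view named duration_by_time (the database name "test" is carried over from
Simon's example). The map emits the raw timestamp as the key and the duration
as the value; the reduce sums the values, so querying with startkey/endkey
returns the total duration for that range rather than the individual
documents:

  // map: key on the raw unix timestamp, value is the duration to be summed
  function(doc) {
    if (doc.timestamp && doc.duration) {
      emit(doc.timestamp, doc.duration);
    }
  }

  // reduce: sum the durations of the rows selected by the key range
  // (CouchDB's built-in "_sum" reduce would do the same job)
  function(keys, values, rereduce) {
    return sum(values);
  }

A query such as

  http://localhost:5984/test/_design/time/_view/duration_by_time?startkey=1251000000&endkey=1251500000

would then return a single reduced row containing the summed duration between
those two timestamps, so the filtering would not have to happen on the
programming end.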