lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Buzz measurement - Aggregate functions
Date Fri, 10 Oct 2008 09:40:32 GMT
Assuming your dates are indexed as YYYYMMDD terms and you want daily totals:

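        // Position the enumeration on the first term at or after date:20080101.
        Term startTerm = new Term("date", "20080101");
        TermEnum termEnum = indexReader.terms(startTerm);
        try
        {
            do
            {
                Term currentTerm = termEnum.term();
                // Stop when the terms run out or we move past the "date" field.
                if (currentTerm == null || !currentTerm.field().equals(startTerm.field()))
                {
                    break;
                }
                // docFreq() = number of docs holding this date term.
                System.out.println(currentTerm + " " + termEnum.docFreq());
            } while (termEnum.next());
        }
        finally
        {
            termEnum.close();
        }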

This should be plenty fast. Note that docFreq() still counts deleted docs, so if you need
to exclude them you'll need to look at using "TermDocs" in this loop (or optimize your
index in advance).
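
If you only want a bounded range (say August 2008, as in the original question), the same walk can stop at an end date. Here is a minimal sketch of that idea in plain Java with no Lucene dependency. The TreeMap is only a stand-in for the sorted "date" terms and their docFreq values, and the class name, method name, and counts are made up for illustration. It relies on YYYYMMDD strings sorting in chronological order.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class DailyTotals {

    /**
     * Walks a sorted term -> docFreq map from startDate to endDate (inclusive).
     * The map is a hypothetical stand-in for the "date" field's slice of the
     * term dictionary; it is not a Lucene class.
     */
    static Map<String, Integer> dailyTotals(NavigableMap<String, Integer> dateTerms,
                                            String startDate, String endDate) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        // tailMap(startDate, true) plays the role of indexReader.terms(startTerm):
        // it starts the walk at the first term >= startDate.
        for (Map.Entry<String, Integer> e : dateTerms.tailMap(startDate, true).entrySet()) {
            // YYYYMMDD strings sort chronologically, so a plain string
            // comparison is enough to detect the end of the range.
            if (e.getKey().compareTo(endDate) > 0) {
                break;
            }
            totals.put(e.getKey(), e.getValue());
        }
        return totals;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> dateTerms = new TreeMap<String, Integer>();
        dateTerms.put("20080731", 5);   // made-up counts, for illustration only
        dateTerms.put("20080801", 12);
        dateTerms.put("20080802", 7);
        dateTerms.put("20080901", 3);
        System.out.println(dailyTotals(dateTerms, "20080801", "20080831"));
    }
}
```

In the real loop the equivalent change would be one extra check on currentTerm.text() against the end date before printing.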

Cheers,
Mark



----- Original Message ----
From: Marcus Herou <marcus.herou@tailsweep.com>
To: java-user@lucene.apache.org
Sent: Friday, 10 October, 2008 10:12:35
Subject: Buzz measurement - Aggregate functions

Hi.

Does anyone have an idea of how I could create a query which finds the data
backing a trend graph where the date is on the X axis and num(docs) is on the
Y axis?

This is quite a common use case in "buzz" analysis, and currently I'm doing a
stupid query which iterates over the date range and queries Lucene for every
date. Not very fast and not very flexible.

More specifically, I want something like the query below, but I also need to
add a free-text query, and at that point I cannot use MySQL for performance
reasons. Any ideas?

--clip--
mysql> select count(id) as Y,publishDate as X from FeedItem where
publishDate between "2008-08-01" and "2008-08-31" group by DAY(publishDate)
order by publishDate asc;
+-------+---------------------+
| Y     | X                   |
+-------+---------------------+
| 26663 | 2008-08-01 00:00:00 |
| 22478 | 2008-08-02 00:00:00 |
| 25745 | 2008-08-03 00:00:00 |
| 30576 | 2008-08-04 00:00:00 |
| 31351 | 2008-08-05 00:00:00 |
| 31084 | 2008-08-06 00:00:00 |
| 31245 | 2008-08-07 00:00:00 |
| 29518 | 2008-08-08 00:00:00 |
| 26001 | 2008-08-09 00:00:00 |
| 28687 | 2008-08-10 00:00:00 |
| 32957 | 2008-08-11 00:00:00 |
| 33251 | 2008-08-12 00:00:00 |
| 33062 | 2008-08-13 00:00:00 |
| 33960 | 2008-08-14 00:00:00 |
| 31034 | 2008-08-15 00:00:00 |
| 26726 | 2008-08-16 00:00:00 |
| 27543 | 2008-08-17 00:00:00 |
| 36887 | 2008-08-18 00:00:00 |
| 35376 | 2008-08-19 00:00:00 |
| 34573 | 2008-08-20 00:00:00 |
| 33889 | 2008-08-21 00:00:00 |
| 30604 | 2008-08-22 00:00:00 |
| 26875 | 2008-08-23 00:00:00 |
| 27356 | 2008-08-24 00:00:00 |
| 33438 | 2008-08-25 00:00:00 |
| 33102 | 2008-08-26 00:00:00 |
| 31720 | 2008-08-27 00:00:00 |
| 26133 | 2008-08-28 00:00:00 |
| 22781 | 2008-08-29 00:00:00 |
| 20198 | 2008-08-30 00:00:00 |
|    20 | 2008-08-31 00:00:00 |
+-------+---------------------+


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/



      
