lucene-solr-user mailing list archives

From Peter Sturge <peter.stu...@gmail.com>
Subject Re: Solr Reporting
Date Thu, 23 Sep 2010 19:53:56 GMT
Yes, that makes sense. So it's more of a bulk data export requirement.
If the Excel data doesn't have to go out over the web, you could export
it to a local file (using a local SolrJ streamer), then publish that file,
which might save some external HTTP bandwidth if that's a concern.
We do this all the time using a local SolrJ client, so if you've got a
big data stream (e.g. an entire core), you don't
have to send it through your outward-facing web servers. Using a
replica to retrieve/export the data might be worth considering as
well.
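
As a rough illustration of the local export idea, here is a minimal
sketch in Python rather than SolrJ, using only the standard library. It
pages through a query with Solr's standard start/rows/wt=json parameters
and writes the matches to a CSV file that Excel can open. The function
names, field names and base URL are made up for illustration, and it
assumes single-valued fields and an index that is not changing while the
export runs.

```python
import csv
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, query, start, rows):
    """Fetch one page of results from Solr's /select handler as parsed JSON."""
    params = urllib.parse.urlencode(
        {"q": query, "start": start, "rows": rows, "wt": "json"}
    )
    with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)


def export_to_csv(out, fields, query, base_url, rows=1000, fetch=fetch_page):
    """Page through all matches and write them to `out` as CSV.

    `out` is any text file-like object; `fetch` can be swapped out for
    testing. Returns the number of documents written.
    """
    writer = csv.writer(out)
    writer.writerow(fields)  # header row
    start = 0
    written = 0
    while True:
        docs = fetch(base_url, query, start, rows)["response"]["docs"]
        if not docs:
            break  # an empty page means everything has been read
        for doc in docs:
            # Fields missing from a document become empty cells.
            writer.writerow([doc.get(name, "") for name in fields])
        written += len(docs)
        start += len(docs)
    return written
```

To use it, open a file with open("orders.csv", "w", newline="") and pass
it as `out`, along with a field list such as ["id", "title"] and a base
URL such as http://localhost:8983/solr (all hypothetical values). Fetching
in pages keeps memory use flat on the client, so 10-15000 records never
have to be held in RAM at once.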


On Thu, Sep 23, 2010 at 7:21 PM, Adeel Qureshi <adeelmahmood@gmail.com> wrote:
> Hi Peter
>
> I understand what you are saying, but I think you are thinking of reports
> more as graphs, analysis and summary kinds of data .. for my reports I do need
> to include all records that meet certain criteria .. e.g. a listing of
> all orders placed in the last 6 months .. now that could be 10000 orders and yes
> I will probably need a report that summarizes all that data, but at the same
> time .. I need all those 10000 records to be exported to an Excel file ..
> those are the reports that I am talking about ..
>
> and 30000 is probably a stretch .. it might be 10-15000 at the most but I
> guess it's still the same idea .. and yes I realize that it's a lot of data to
> be transferred over HTTP .. but that's exactly why I am asking for suggestions
> on how to do it .. I find it hard to believe that this is an unusual
> requirement .. I think most companies do reports that dump all records from
> databases into Excel files ..
>
> so again, to clarify, I definitely need reports that present statistics and
> averages, and yes I will be using facets and all kinds of stuff there, and I am
> not so concerned about those reports because, like you pointed out, for those
> reports there will be very little data transfer, but it's the full data dump
> reports that I am trying to figure out the best way to handle.
>
> Thanks for your help
> Adeel
>
>
>
> On Thu, Sep 23, 2010 at 11:43 AM, Peter Sturge <peter.sturge@gmail.com> wrote:
>
>> Hi,
>>
>> Are you going to generate a report with 30000 records in it? That will
>> be a very large report - will anyone really want to read through that?
>> If you want/need 'summary' reports - i.e. stats on the 30k records -
>> it is much more efficient to set up faceting and/or server-side
>> analysis to do this, rather than download
>> 30000 records to a client, then do statistical analysis on the result.
>> It will take a while to stream 30000 records over an HTTP connection,
>> and, if you're building, say, a PDF table for 30k records, that will
>> take some time as well.
>> Doing the analysis server-side and then sending just the results will work
>> better, if that fits your remit for reporting.
>>
>> Peter
>>
>>
>>
>> On Thu, Sep 23, 2010 at 4:14 PM, Adeel Qureshi <adeelmahmood@gmail.com>
>> wrote:
>> > Thank you for your suggestions .. makes sense, and I didn't know about the
>> > XsltResponseWriter .. that opens the door to all kinds of possibilities ..
>> > so it's great to know about that
>> >
>> > but before I go that route .. what about performance .. the Solr Wiki
>> > mentions that XSLT transformation isn't so bad in terms of memory usage,
>> > but I guess it's all relative to the amount of data and obviously system
>> > resources ..
>> >
>> > my data set will be around 15000 - 30000 records at the most .. I do have
>> > about 30-some fields, but all fields are either small strings (less than
>> > 500 chars) or dates, ints, booleans etc .. so should I be worried about
>> > performance problems while doing the XSLT transformations .. secondly, for
>> > reports I'll have to request Solr to send all 15000-some records at the
>> > same time to be entered in report output files .. is there a way to kind of
>> > stream that process .. well, I think Solr's native XML is already streamed
>> > to you, but it sounds like for the transformation it will have to load the
>> > whole thing in RAM ..
>> >
>> > and again what about SolrJ .. isn't that supposed to provide better
>> > performance since it's in Java .. well, I guess it shouldn't be much
>> > different since it also uses HTTP calls to communicate with Solr ..
>> >
>> > Thanks for your help
>> > Adeel
>> >
>> > On Thu, Sep 23, 2010 at 7:16 AM, kenf_nc <ken.foster@realestate.com>
>> wrote:
>> >
>> >>
>> >> keep in mind that the <str name="id"> paradigm isn't completely useless;
>> >> the str is a data type (string), and it can be int, float, double, date,
>> >> and others.
>> >> So to not lose any information you may want to do something like:
>> >>
>> >> <id type="int">123</id>
>> >> <title type="str">xyz</title>
>> >>
>> >> Which I agree makes more sense to me. The name of the field is more
>> >> important than its data type, but I don't want to lose track of the data
>> >> type.
>> >>
>> >> Ken
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/Solr-Reporting-tp1565271p1567604.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >
>>
>
