lucene-solr-user mailing list archives

From Stefan Matheis <matheis.ste...@googlemail.com>
Subject Re: SOLR for Log analysis feasibility
Date Tue, 30 Nov 2010 12:23:25 GMT
I know it's not Solr, but perhaps you should have a look at it:
http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

On Tue, Nov 30, 2010 at 12:58 PM, Peter Karich <peathal@yahoo.de> wrote:

> Take a look at this:
> http://vimeo.com/16102543
>
> For that amount of data it isn't that easy :-)
>
>
>> We are looking into building a reporting feature and investigating
>> solutions that will allow us to search through our logs for downloads,
>> searches and view history.
>>
>> Each log item is relatively small:
>>
>> download history
>>
>> <add>
>>        <doc>
>>                <field name="uuid">item123-v1</field>
>>                <field name="market">photography</field>
>>                <field name="name">item 1</field>
>>                <field name="userid">1</field>
>>                <field name="version">1</field>
>>                <field name="downloadType">hires</field>
>>                <field name="itemId">123</field>
>>                <field name="timestamp">2009-11-07T14:50:54Z</field>
>>        </doc>
>> </add>
>>
>> search history
>>
>> <add>
>>        <doc>
>>                <field name="uuid">1</field>
>>                <field name="query">brand assets</field>
>>                <field name="userid">1</field>
>>                <field name="timestamp">2009-11-07T14:50:54Z</field>
>>        </doc>
>> </add>
>>
>> view history
>>
>> <add>
>>        <doc>
>>                <field name="uuid">1</field>
>>                <field name="itemId">123</field>
>>                <field name="userid">1</field>
>>                <field name="timestamp">2009-11-07T14:50:54Z</field>
>>        </doc>
>> </add>
>>
>>
>> We reckon we could have around 10-30 million log records for each type
>> (downloads, searches, views), so roughly 70 million records in total, but
>> the solution obviously must scale higher.
>>
>> Concurrent users will be around 10-20 (relatively low).
>>
>> New logs will be imported as a batch overnight.
>>
>> Because we have some previous experience with SOLR, and because the
>> interface needs full-text search and filtering, we built a prototype using
>> SOLR 4.0. We used the new field collapsing feature in SOLR 4.0 to collapse
>> groups of records. For example, view history needs to collapse on itemId;
>> each row then shows how many times the item has been viewed, which is the
>> number of documents collapsed into that group.
>>
>> The solution needs to be schemaless, so that new fields can easily be
>> added to new documents, and it needs a powerful search interface; SOLR can
>> do both.
>>
>> QUESTIONS
>>
>> Our prototype is working as expected, but I'm unsure about the following:
>>
>> 1. Does anyone have experience using SOLR for log analysis?
>> 2. SOLR can scale, but at what point should I start considering sharding
>> the index? It should be fine with 100+ million records.
>> 3. We are using a nightly build of SOLR for the "field collapsing" feature.
>> Would it be possible to patch SOLR 1.4.1 with the SOLR-236 patch? Has
>> anyone used it in production?
>>
>> thanks
>>
>
>
> --
> http://jetwick.com twitter search prototype
>
>
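
For the overnight batch import described in the question, the update messages in the `<add><doc><field name="...">` layout quoted above could be generated from parsed log records. This is a minimal Python sketch, assuming the log lines have already been parsed into dicts; the function name `build_add_xml` and the sample record are hypothetical, and actually sending the result to Solr's XML update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(records):
    """Build a Solr <add> update message from a list of log records.

    Hypothetical helper: each record is a dict mapping field names to
    values, and every value is written as element text, following the
    <add><doc><field name="..."> layout of the quoted examples.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            # The field name becomes the "name" attribute.
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    # encoding="unicode" returns a str (no XML declaration) instead of bytes.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical view-history record shaped like the quoted example.
    view = {
        "uuid": "1",
        "itemId": "123",
        "userid": "1",
        "timestamp": "2009-11-07T14:50:54Z",
    }
    print(build_add_xml([view]))
```

Building the message with ElementTree rather than string concatenation means values containing `<` or `&` (for example in the `query` field of search history) are escaped automatically.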

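The view-history rows described in the question (one row per itemId, with the number of collapsed documents as the view count) amount to a group-by-and-count. The Python sketch below reproduces that on a handful of hypothetical view records, which could help sanity-check the numbers the prototype returns; `view_counts` and the sample data are illustrative only and are not part of Solr.

```python
from collections import Counter


def view_counts(view_records):
    """Count views per itemId, most-viewed first.

    Hypothetical helper mirroring what collapsing view history on itemId
    yields: one entry per item, with the number of documents collapsed
    into that entry as its view count.
    """
    counts = Counter(record["itemId"] for record in view_records)
    # most_common() orders by count, highest first; on Python 3.7+ ties
    # keep the order in which the itemIds were first seen.
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample data shaped like the quoted view-history docs.
    views = [
        {"uuid": "1", "itemId": "123", "userid": "1"},
        {"uuid": "2", "itemId": "456", "userid": "1"},
        {"uuid": "3", "itemId": "123", "userid": "2"},
    ]
    for item_id, count in view_counts(views):
        print(item_id, count)
```

When only the per-item counts are needed and not the grouped documents themselves, a facet on the same field (`facet.field=itemId`) reports comparable per-value counts for the matching documents; which fits better depends on what each report row has to display.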