lucene-solr-user mailing list archives

From JT <handyrems...@gmail.com>
Subject Re: Questions developing custom functionquery
Date Mon, 21 Oct 2013 18:49:06 GMT
I would agree the "right" way to do this is probably just add the
information I wish to sort on directly, as a date field or something like
that.

The issue is we currently have ~300m documents that are already indexed.
Not all of the fields have stored=true (for good reason: we maintain the
documents externally, about 7TB worth, and I didn't want to keep a second
copy of that 7TB in the index), so we cannot update these indexed values.


I was hoping to spend 2-3 days writing a custom query to avoid 2+ months of
indexing everything all over again.



So let me just ask this question: given my current situation, let's say you
had the following field:

<str name="resourcename">/path/to/file/month/day/year/file.txt</str>


I simply want to extract the month/day/year and sort based on that.

My current plan was to convert the month, day and year into the number of
seconds before right now and return that number. Sorting ascending on it
should then return the newest documents first.
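A minimal sketch of that plan in plain Java, leaving out the Solr
function-query wiring entirely. It assumes the date appears as four-digit
year, two-digit month and two-digit day directories (the layout of the
`2013/09/12` example quoted below, not month/day/year) and treats the date as
midnight UTC; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathDateAge {
    // Assumed layout: /YYYY/MM/DD/ somewhere inside the path.
    private static final Pattern DATE_DIRS =
            Pattern.compile("/(\\d{4})/(\\d{2})/(\\d{2})/");

    /**
     * Seconds between midnight UTC of the date embedded in the path and
     * {@code now}, or empty if the path contains no /YYYY/MM/DD/ run.
     * LocalDate.of throws DateTimeException for impossible dates (month 13).
     */
    static OptionalLong secondsBeforeNow(String resourceName, Instant now) {
        Matcher m = DATE_DIRS.matcher(resourceName);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        LocalDate date = LocalDate.of(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)));
        long epochSeconds = date.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
        return OptionalLong.of(now.getEpochSecond() - epochSeconds);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2013-10-21T00:00:00Z");
        System.out.println(secondsBeforeNow(
                "/some example/data/here/2013/09/12/testing.text", now));
    }
}
```

Sorting ascending on this "age" orders documents exactly as sorting descending
on the date's own epoch seconds would, and the latter does not depend on when
the query runs.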



-JT


On Fri, Oct 18, 2013 at 3:14 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : Field-Type: org.apache.solr.schema.TextField
>         ...
> : DocTermsIndexDocValues
> : <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-queries/4.3.0/org/apache/lucene/queries/function/docvalues/DocTermsIndexDocValues.java#DocTermsIndexDocValues>.
> : Calling "getVal()" on a DocTermsIndexDocValues does some really weird
> : stuff that I really don't understand.
>
> Your TextField is being analyzed in some way you haven't clarified, and
> the DocTermsIndexDocValues you get contains the details of each term in
> that TextField.
>
> : It's possible I'm going about this wrong and need to re-do my approach.
> : I'm just currently at a loss for what that approach is.
>
> Based on your initial goal, you are most certainly going about this in a
> much more complicated way than you need to...
>
> : > > > My goal is to be able to implement a custom sorting technique.
>
> : > > > Example: <str name="resname">/some example/data/here/2013/09/12/testing.text</str>
> : > > >
> : > > > I would like to do a custom sort based on this resname field.
> : > > > Basically, I would like to parse out that date there (2013/09/12)
> : > > > and sort on that date.
>
> You are going to be *MUCH* happier (both in terms of effort, and in terms
> of performance) if instead of writing a custom function to parse strings
> at query time when sorting, you implement the parsing logic when indexing
> the doc and index it up front as a date field that you can sort on.
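For the index-time route, a minimal sketch of the parsing step, assuming
documents pass through a Java client as a simple field map before being sent
to Solr. The target field `resname_dt` and the helper are hypothetical; the
value is written in the ISO-8601 UTC form (e.g. `2013-09-12T00:00:00Z`) that
Solr date fields accept.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexTimeDate {
    // Assumed layout: /YYYY/MM/DD/ directory components inside the path.
    private static final Pattern DATE_DIRS =
            Pattern.compile("/(\\d{4})/(\\d{2})/(\\d{2})/");

    /** Returns a copy of doc with the date from "resname" added as "resname_dt", if found. */
    static Map<String, Object> addDateField(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object resname = doc.get("resname");
        if (resname instanceof String) {
            Matcher m = DATE_DIRS.matcher((String) resname);
            if (m.find()) {
                LocalDate date = LocalDate.of(
                        Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)));
                // Instant.toString() always prints seconds and a trailing Z
                out.put("resname_dt",
                        date.atStartOfDay(ZoneOffset.UTC).toInstant().toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("resname", "/some example/data/here/2013/09/12/testing.text");
        System.out.println(addDateField(doc));
    }
}
```

Assuming `resname_dt` is declared as a date field in the schema, sorting then
becomes an ordinary `sort=resname_dt desc`, with no per-document string
parsing at query time.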
>
> I would suggest that something like CloneFieldUpdateProcessorFactory +
> RegexReplaceProcessorFactory could save you the work of implementing any
> custom logic -- but as Jack pointed out in SOLR-4864, it doesn't currently
> allow you to do capture group replacements (though maybe you could
> contribute a patch to fix that instead of writing completely custom code
> for yourself).
>
> Or maybe, as is, you could use RegexReplaceProcessorFactory to throw away
> non-digits, and then use ParseDateFieldUpdateProcessorFactory to get what
> you want?  (I'm not certain - I haven't played with
> ParseDateFieldUpdateProcessorFactory much.)
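A rough Java analogue of that two-step idea: delete every non-digit, then
parse the remaining digits as `yyyyMMdd`. It sketches the logic only, not the
processors' configuration, and it assumes the path contains no digits other
than the date; a filename such as `report2.txt` would leave an extra digit
and the parse would fail. The class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class DigitsOnlyDate {
    /** Strips non-digits and parses what is left as yyyyMMdd; empty on failure. */
    static Optional<LocalDate> parse(String resname) {
        String digits = resname.replaceAll("\\D", ""); // \D matches any non-digit
        try {
            return Optional.of(LocalDate.parse(digits, DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeParseException e) {
            // Too few or too many digits, or an impossible date such as 20131345
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("/some example/data/here/2013/09/12/testing.text"));
        System.out.println(parse("/some example/data/here/2013/09/12/report2.txt"));
    }
}
```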
>
> https://issues.apache.org/jira/browse/SOLR-4864
>
> https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
>
> https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
>
> https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>
>
>
> -Hoss
>
