lucene-solr-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
Date Fri, 11 Dec 2009 20:54:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789520#action_12789520 ]

Grant Ingersoll commented on SOLR-1131:
---------------------------------------

bq. I'm still -1 on the way this patch deals with the "optimization" issue. I'd like to see
evidence that it makes sense to not use split and trim.

My tests show it to be at least 7 times faster.  But this should be obvious from static analysis,
too.  First of all, String.split() uses a regex which then makes a pass through the underlying
character array.  Then, trim has to go back through and analyze the char array too, not to
mention the extra String creations.  The optimized version here makes one pass, works solely
at the char array level, and only has to do the substring, which I think can be optimized by
the JVM to be a copy on write.

{code}

  public void testDistPerf() throws Exception {
    String [] input = new String[1000000];
    Random random = new Random();
    for (int i = 0; i < input.length; i++){
      input[i] = random.nextInt() + ", " + random.nextInt();
    }
    String [] out = new String[2];
    long time = 0;
    long start = System.currentTimeMillis();
    for (int j = 0; j < 50; j++) {
      for (int i = 0; i < input.length; i++){
        split(input[i], out, 2);
      }
    }
    time = (System.currentTimeMillis() - start);
    System.out.println("Time: " + time);
    time = 0;
    start = System.currentTimeMillis();
    for (int j = 0; j < 50; j++) {
      for (int i = 0; i < input.length; i++){
        DistanceUtils.parsePoint(out, input[i], 2);
      }
    }
    time = (System.currentTimeMillis() - start);
    System.out.println("Time: " + time);
  }

  private String[] split(String externalVal, String[] out, int dimension) {
    out = externalVal.split(",");
    if (out.length != dimension) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "incompatible dimension (" + dimension +
              ") and values (" + externalVal + ").  Only " + out.length + " values specified");
    }
    for (int j = 0; j < out.length; j++) {
      out[j] = out[j].trim();
    }
    return out;
  }
{code}
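
For readers without the patch at hand, the one-pass approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (SinglePassSplitSketch, parsePointSketch) are hypothetical and this is not the actual DistanceUtils.parsePoint from the patch. It walks the input once, tracks where the trimmed token starts and ends, and calls substring once per value. Edge cases (for example, trailing commas) may behave differently from split() plus trim().

```java
// Hypothetical sketch of a single-pass "split on comma and trim" parser.
// NOT the DistanceUtils.parsePoint implementation from the SOLR-1131 patch.
final class SinglePassSplitSketch {

  static String[] parsePointSketch(String[] out, String externalVal, int dimension) {
    if (out == null || out.length != dimension) {
      out = new String[dimension];
    }
    int idx = 0;     // next slot to fill in out
    int start = -1;  // index of the first non-whitespace char of the current token (-1: none yet)
    int end = -1;    // index just past the last non-whitespace char of the current token
    int len = externalVal.length();

    // Iterate one position past the end so the final token is flushed
    // by the same branch that handles a comma.
    for (int i = 0; i <= len; i++) {
      char c = (i < len) ? externalVal.charAt(i) : ',';
      if (c == ',') {
        if (idx >= dimension) {
          throw new IllegalArgumentException("incompatible dimension (" + dimension
              + ") and values (" + externalVal + "): too many values specified");
        }
        // Only one substring per token; leading/trailing whitespace is already excluded.
        out[idx++] = (start < 0) ? "" : externalVal.substring(start, end);
        start = -1;
        end = -1;
      } else if (c > ' ') {
        // Same whitespace notion as String.trim(): chars <= ' ' are skipped at the edges.
        if (start < 0) {
          start = i;
        }
        end = i + 1;
      }
    }
    if (idx != dimension) {
      throw new IllegalArgumentException("incompatible dimension (" + dimension
          + ") and values (" + externalVal + ").  Only " + idx + " values specified");
    }
    return out;
  }
}
```

Passing a preallocated array of the right length, as the benchmark above does with out, lets the sketch reuse it instead of allocating a new array per call.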

> Allow a single field type to index multiple fields
> --------------------------------------------------
>
>                 Key: SOLR-1131
>                 URL: https://issues.apache.org/jira/browse/SOLR-1131
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt,
> SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch,
> SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
>
>
> In a few special cases, it makes sense for a single "field" (the concept) to be indexed
> as a set of Fields (Lucene Field).  Consider SOLR-773.  The concept "point" may be best indexed
> in a variety of ways:
>  * geohash (single Lucene field)
>  * lat field, lon field (two double fields)
>  * cartesian tiers (a series of fields with tokens to say if it exists within that region)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

