uima-dev mailing list archives

From "Mike Barborak (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-3017) Getting feature value from feature structure longer than expected
Date Wed, 03 Jul 2013 13:31:20 GMT

    [ https://issues.apache.org/jira/browse/UIMA-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698955#comment-13698955 ]

Mike Barborak commented on UIMA-3017:
-------------------------------------

By the way, I made that change and re-ran my tests and here is what I got. (The extra numbers
in test 4 are related to a comment on the mailing list by Eddie Epstein and show that the
test case is valid.)

Test 1. Get a string from an FS using non-JCas API
round 0 total time 1: 6.915995788s
round 1 total time 1: 6.75501513s
round 2 total time 1: 6.828053193s
round 3 total time 1: 7.109957222s
round 4 total time 1: 6.801362261s

Test 2. Get a string from a POJO via an internal HashMap<String, String>
round 0 total time 2: 0.971410607s
round 1 total time 2: 0.89698105s
round 2 total time 2: 0.923776195s
round 3 total time 2: 0.902767362s
round 4 total time 2: 0.942287453s

Test 3. Get a string from an FS using JCAS API
round 0 total time 3: 1.00207594s
round 1 total time 3: 0.984999156s
round 2 total time 3: 0.983165054s
round 3 total time 3: 0.984362084s
round 4 total time 3: 0.983855836s

Test 4. Get a member string from a POJO
100000000
round 0 total time 4: 0.110064841s
200000000
round 1 total time 4: 0.103566286s
300000000
round 2 total time 4: 0.103578861s
400000000
round 3 total time 4: 0.103654741s
500000000
round 4 total time 4: 0.103731586s
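
The running totals printed in Test 4 suggest that the loop's results were being consumed, so that the JIT could not discard the getter call as dead code. A minimal sketch of that kind of check follows. The Holder class and the countReads helper are hypothetical names chosen for illustration; the actual Test 4 source is not shown in this comment.

```java
// Sketch only: a read loop whose results feed a counter that is printed,
// so the getter call has an observable effect.
final class PojoReadSketch {

    /** Hypothetical POJO with a plain String member. */
    static final class Holder {
        private final String value;

        Holder(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }
    }

    /** Calls the getter `iterations` times and counts the non-null results. */
    static long countReads(Holder holder, long iterations) {
        long count = 0;
        for (long i = 0; i < iterations; i++) {
            if (holder.getValue() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Holder holder = new Holder("myString");
        long iterations = 100_000_000L;
        long total = 0;
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            total += countReads(holder, iterations);
            long end = System.nanoTime();
            // Printing the running total keeps the loop result live.
            System.out.println(total);
            System.out.println("round " + round + " total time 4: "
                    + ((end - start) / 1_000_000_000.0) + "s");
        }
    }
}
```

Even with the counter, the JIT may still simplify a loop like this quite aggressively, so a harness such as JMH is probably a more robust way to confirm the numbers.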

Test 5. Get a FS reference using the JCas API
round 0 total time 5: 11.147611949s
round 1 total time 5: 9.997106886s
round 2 total time 5: 10.313926093s
round 3 total time 5: 10.855920825s
round 4 total time 5: 10.088752304s

Test 6. Get a FS reference using the JCas API and caching jcasType.ll_cas.ll_getFSForRef in a HashMap
round 0 total time 6: 2.263428484s
round 1 total time 6: 2.261072311s
round 2 total time 6: 2.228876562s
round 3 total time 6: 2.229976547s
round 4 total time 6: 2.24389304s
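
The caching used in Test 6 can be pictured roughly as follows. This is a sketch under assumptions, not the code that produced the numbers above: RefCacheSketch and its resolver parameter are hypothetical, and the resolver simply stands in for the ll_getFSForRef call so that the example runs without UIMA on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Sketch only: memoizes the result of an int-keyed lookup in a HashMap.
final class RefCacheSketch<T> {

    private final Map<Integer, T> cache = new HashMap<>();

    // Stand-in for the expensive lookup (for example ll_getFSForRef).
    private final IntFunction<T> resolver;

    RefCacheSketch(IntFunction<T> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached object for `ref`, resolving it on first use. */
    T get(int ref) {
        T cached = cache.get(ref);
        if (cached == null) {
            cached = resolver.apply(ref);
            cache.put(ref, cached);
        }
        return cached;
    }

    /** Drops all cached entries. */
    void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        // Hypothetical resolver that fabricates a label for each reference.
        RefCacheSketch<String> cache = new RefCacheSketch<>(ref -> "fs@" + ref);
        System.out.println(cache.get(42));
        System.out.println(cache.get(42) == cache.get(42));
    }
}
```

A cache like this would presumably have to be cleared whenever the CAS is reset, since the references it is keyed on are only meaningful for the current CAS contents.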

It looks like test 3 got 15% faster, but I don't see much of an effect anywhere else.
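
To compare the tests per call rather than per round, each total can be divided by the iteration count. A small sketch, assuming every test ran the 100,000,000 iterations used in MyTest.java (the posted code confirms this only for tests 1 and 2). PerCallCost and nanosPerCall are hypothetical names.

```java
// Sketch only: converts a round's total time into nanoseconds per call.
final class PerCallCost {

    /** Total seconds for a round divided by the number of calls, in nanoseconds. */
    static double nanosPerCall(double totalSeconds, long iterations) {
        return totalSeconds * 1_000_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        long iterations = 100_000_000L; // assumed for all tests
        // Round 1 totals copied from the results above.
        System.out.println("test 1: " + nanosPerCall(6.75501513, iterations) + " ns/call");
        System.out.println("test 3: " + nanosPerCall(0.984999156, iterations) + " ns/call");
        System.out.println("test 4: " + nanosPerCall(0.103566286, iterations) + " ns/call");
    }
}
```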

Best,
Mike
                
> Getting feature value from feature structure longer than expected
> -----------------------------------------------------------------
>
>                 Key: UIMA-3017
>                 URL: https://issues.apache.org/jira/browse/UIMA-3017
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3
>         Environment: Linux x86_64
>            Reporter: Mike Barborak
>            Priority: Minor
>
> Should getting a value of a feature in a feature structure be fast? Intuitively, I would
> expect performance to be about the same as getting an entry from a Java HashMap or faster
> but in my experiments it seems to be 8 times slower. To solve my problem, I wrap my feature
> structures with caching Java code but it seems that there might be an opportunity to speed
> up UIMA generally.
> My test creates a CAS with a single feature structure in it. It sets a string feature
> in that feature structure and then simply gets the value of that feature in a tight loop.
> I compare that to an instance of a Java class that has an internal HashMap of strings to strings.
> In that case, a method is called on that instance to get an entry from the map in a very tight
> loop.
> I do 5 rounds of each of the loops. The total times for the rounds involving the CAS
> were:
> round 0 total time 1: 7.520104509s
> round 1 total time 1: 6.812214938s
> round 2 total time 1: 6.882752307s
> round 3 total time 1: 6.728515004s
> round 4 total time 1: 6.813674956s
> The total times for the rounds just using the Java class were:
> round 0 total time 2: 0.847296054s
> round 1 total time 2: 0.814570347s
> round 2 total time 2: 0.814399859s
> round 3 total time 2: 0.814189383s
> round 4 total time 2: 0.814979357s
> Here is my Java code:
> {code:title=MyTest.java}
> package test;
> import java.io.InputStream;
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.uima.UIMAFramework;
> import org.apache.uima.cas.CAS;
> import org.apache.uima.cas.Feature;
> import org.apache.uima.cas.FeatureStructure;
> import org.apache.uima.cas.Type;
> import org.apache.uima.resource.metadata.TypeSystemDescription;
> import org.apache.uima.util.CasCreationUtils;
> import org.apache.uima.util.XMLInputSource;
> public class MyTest {
>   
>   static class MyClass {
>     Map<String, String> myFeatures = new HashMap<String, String>();
>     
>     void setStringValue(String feature, String value) {
>       myFeatures.put(feature, value);
>     }
>     
>     String getStringValue(String feature) {
>       return myFeatures.get(feature);
>     }
>   }
>   
>   static public void main(String[] argv) throws Exception {
>     InputStream stream = MyTest.class.getClassLoader().getResourceAsStream("MyTypes.xml");
>     TypeSystemDescription typeSystemDescription = UIMAFramework.getXMLParser().parseTypeSystemDescription(new XMLInputSource(stream, null));
>     CAS cas = CasCreationUtils.createCas(typeSystemDescription, null, null);
>     Type myType = cas.getTypeSystem().getType("MyType");
>     FeatureStructure fs = cas.createFS(myType);
>     Feature myFeature = myType.getFeatureByBaseName("myFeature");
>     fs.setStringValue(myFeature, "myString");
>     cas.addFsToIndexes(fs);
>     
>     MyClass myInstance = new MyClass();
>     myInstance.setStringValue("myFeature2", "myString2");
>     
>     long iterations = 100000000;
>     double nanoSecsPerSec = 1000000000.0d;
>     
>     for (int round = 0; round < 5; round++) {
>       long start = System.nanoTime();
>       for (long i = 0; i < iterations; i++) {
>         fs.getStringValue(myFeature);
>       }
>       long end = System.nanoTime();
>       System.out.println("round " + round + " total time 1: " + ((end - start) / nanoSecsPerSec) + "s");
>     }
>       
>     for (int round = 0; round < 5; round++) {
>       long start = System.nanoTime();
>       for (long i = 0; i < iterations; i++) {
>         myInstance.getStringValue("myFeature2");
>       }
>       long end = System.nanoTime();
>       System.out.println("round " + round + " total time 2: " + ((end - start) / nanoSecsPerSec) + "s");
>     }
>   }
> }
> {code}
> Here is my type descriptor:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
>   <name>MyTypes</name>
>   <description/>
>   <version>1.0</version>
>   <vendor/>
>   <types>
>     <typeDescription>
>       <name>MyType</name>
>       <description/>
>       <supertypeName>uima.cas.TOP</supertypeName>
>       <features>
>         <featureDescription>
>           <name>myFeature</name>
>           <description></description>
>           <rangeTypeName>uima.cas.String</rangeTypeName>
>         </featureDescription>
>       </features>
>     </typeDescription>
>   </types>
> </typeSystemDescription>
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
