lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aditya Tripathi <aditya.tripa...@gmail.com>
Subject FST<CharRef> Util's - Will TopNSearcher work for non-weighted FST with CharSequence Ouputs as weight.
Date Sat, 26 Sep 2015 04:51:54 GMT
Hi,

Question:
Looking at the code I got slightly confused if TopNSearcher would work for
non weighted CharSequence Output FST. I am trying to use a scale up model
and accommodate many tenants on one machine and hence was not planning to
use Pair<Long,CharRef> output. It would have been great if the path output
could be considered as cost and TopNSearcher could use cost of the whole
Path while decding NO_OUTPUT arc. However it goes in infinte loop for the
code snippet given below.

Background: (X of XY problem)
I build an FST<CharRef> with input from one index field and output as
concatenation of two other stored fields.

I wanted to get suggestions from this FST using TopNSearcher search method.
As an experimental code I just wanted to see if shortestPaths would work on
this FST with CharSequence outputs as the cost.

It does not work. Goes in infinte loop.

I think (Haven't digged much though and not at all sure) the problem lies
in the fact that while finding minimum weight arc (no weight here - weight
is output) - the old path is not considered and only the current arc is
compared for NO_OUTPUT. And then it keeps copying this NO_OUTPUT arc back
into the current arc later. This spins it into an infinte loop.

In TopNSearcher search() method, the following line. Line 464 in
org/apache/lucene/util/fst/Util.java
if (comparator.compare(NO_OUTPUT, path.arc.output) == 0)

And then copying the NO_OUTPUT arc back to the current arc spoils the fun
here in line:490
path.arc.copyFrom
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.0/org/apache/lucene/util/fst/FST.java#FST.Arc.copyFrom%28org.apache.lucene.util.fst.FST.Arc%29>
(scratchArc
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.0/org/apache/lucene/util/fst/Util.java#Util.TopNSearcher.0scratchArc>
);

The sample code to reproduce this along with Sysout's for seeing how the
FST is formed is given below.

 (If I use PositiveIntOutputs it works fine. Commented lines.)



public static void main(String[] args) throws IOException {

        String inputValues[] = {"aafish4","abcat","abcmonkey6" , "abcdog",
"abcdogs"};
        long outputValues[] = {14,5, 16, 7, 12};
        CharsRef[] outputValuesString = {new CharsRef("pqrfish4"),new
CharsRef("pqcat"),new CharsRef("pqsmonkey6") ,new CharsRef( "pqrsdog"), new
CharsRef("pqrdogs")};

        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        CharSequenceOutputs outputsO = CharSequenceOutputs.getSingleton();
        //Builder<Long> builder = new Builder<Long>(INPUT_TYPE.BYTE1,
outputs);
        Builder<CharsRef> builder = new Builder<CharsRef>(INPUT_TYPE.BYTE4,
outputsO);
        BytesRef scratchBytes = new BytesRef();

        IntsRef scratchInts = new IntsRef();

        for (int i = 0; i < inputValues.length; i++) {
          scratchBytes.copyChars(inputValues[i]);
          //builder.add(Util.toIntsRef(scratchBytes, scratchInts),
outputValues[i]);
          builder.add(Util.toIntsRef(scratchBytes, scratchInts),
outputValuesString[i]);

        }
        //FST<Long> fst = builder.finish();
        FST<CharsRef> fst = builder.finish();
        Arc<CharsRef> arc;
        //Arc<Long> arc;

        //Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>());
        Arc<CharsRef> firstArc = fst.getFirstArc(new Arc<CharsRef>());
        arc = firstArc;
        System.out.println("firstArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

        BytesReader reader = fst.getBytesReader();

        Arc<CharsRef> firstTargetArc = fst.readFirstTargetArc(firstArc, new
Arc<CharsRef>(), reader );
        //Arc<Long> firstTargetArc = fst.readFirstTargetArc(firstArc, new
Arc<Long>(), reader );
        arc = firstTargetArc;
        System.out.println("firstTargetArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

        //Arc<Long> lastTargetArc = fst.readLastTargetArc(firstArc, new
Arc<Long>(), reader );
        Arc<CharsRef> lastTargetArc = fst.readLastTargetArc(firstArc, new
Arc<CharsRef>(), reader );
        arc = lastTargetArc;
        System.out.println("lastTargetArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

        //Arc<Long> nextArc = fst.readNextArc(firstTargetArc, reader);
        Arc<CharsRef> nextArc = fst.readNextArc(firstTargetArc, reader);
        arc = nextArc;
        System.out.println("nextArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);


System.out.println("****************************************************************************************************");
        int a=0;
        arc = firstTargetArc;
        while(true) {
            try {

            System.out.println("nextArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);
            arc = fst.readNextArc(arc, reader);
            }catch (Exception e) {
                break;
            }

        }

System.out.println("****************************************************************************************************");

        Comparator<CharsRef> comparator = new Comparator<CharsRef>() {
            public int compare(CharsRef left, CharsRef right) {
              return left.compareTo(right);
            }
          };

          /*Comparator<Long> comparator = new Comparator<Long>() {
              public int compare(Long left, Long right) {
                return left.compareTo(right);
              }
            }; */

          //MinResult<Long> paths[] = Util.shortestPaths(fst, firstArc, 0L,
comparator, 4, false);
         MinResult<CharsRef> paths[] = Util.shortestPaths(fst, firstArc,
new CharsRef(), comparator, 106, true);

      }


}

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message