lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: TermEnum - previous() method ?
Date Fri, 20 Jul 2007 14:21:24 GMT
You know, every time I see a solution like this, I have to smack
myself upside the head. As an old C programmer, I have a real
problem "wasting", say,  a few tens of megabytes just to make
my life easier. Of disk space. That costs virtually nothing. And
(as you say, assuming some constraints) doesn't change my
search response time. And won't cost the company time tracing
down bugs. Which will let me work on something *else* that's
might generate revenue.

Siiigggghhhh. Sometime I'll get into the modern world and actually
think of this kind of solution myself.

Erick

On 7/20/07, Will Johnson <wjohnson@getconnected.com> wrote:
>
> One other possibility I've used in the past:
>
> Store 2 fields, one with the normal characters, one with each character
> inversed, A->Z, Y->B, and so on.  As long as you have the function to go
> from the normal->reversed and reversed->normal you can iterate over both
> fields in a forward only mode and just reverse out the strings on the
> display side.
>
> This method makes a number of assumptions about index size constraints,
> character sets; ie ymmv.
>
> - will
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Friday, July 20, 2007 10:05 AM
> To: java-user@lucene.apache.org
> Subject: Re: TermEnum - previous() method ?
>
> There's nothing that I know of built in to Lucene that allows you
> to do anything like termenum.previous(), I'm pretty sure you
> have to roll your own.
>
> Some possibilities:
>
> 1> Depending upon how many terms in your index, you could
> just read all the authors into a Java collection of your choice
> at startup time and reference that list.
>
> 2> You could just iterate from the beginning of the list every
> time, storing the N elements somewhere and stop when
> you get to the ones you need. Do you have any evidence
> that this wouldn't work for you? I was surprised how quickly this
> worked.
>
> 3> A variant of <2> would be to choose some arbitrary term that's
> less than your target, skip to that one, and iterate forward. For
> instance,
> if you were looking for the terms previous to "Erickson", you could
> skip to "Eg", and iterate forward. You'd have to do some heuristics
> in case there was, say, only one author between "Eg" and "Erickson".
>
> 4> You could assemble a list of, say, every 100th term, either at
> startup time or as a permanent part of your index. When you had
> to do a previous, skip to the appropriate term and iterate forward.
> Remember that not all documents need to have the same fields,
> so you can have a "super special document" that contains some
> list like this.
>
> I'd recommend that you take some simple timings of iterating
> through the entire list to see if you need to get fancy or not. The
> critical piece of information you've omitted is any indication of
> how many terms you expect to have to iterate over. If it's
> 10,000 terms, the simple solution will almost certainly work for
> you. If it's 10,000,000,000 then you have to do some fancier
> dancing.
>
> Best
> Erick
>
> On 7/20/07, muraalee <muraalee@gmail.com> wrote:
> >
> >
> > Hi Mark,
> > Thanks for your inputs.
> >
> > >> I do wonder why you want a previous though? It sounds like you
> might be
> > >> better off heading down a different path...
> >
> > In our content, we have indexed Author as separate field. We want to
> > expose
> > a feature to  'browse' the Author list. They can type any author name
> and
> > do
> > a locate from there onwards.. While doing that they need to navigate
> > 'previous' & 'next' author list.
> >
> > e.g AU:Rowling
> >
> > sample code snippet:
> > TermEnum browseTermEnum = indexReader.terms(new Term("AU",
> "rowling"));
> > while(browseTermEnum.next()){
> >   System.out.println( browseTermEnum.term().text());
> > }
> >
> > Navigating the next 10 authors is straight foward becuase we do that
> by
> > calling browseTermEnum.next(), but we couldn't do previous. I did
> consider
> > the approach of all terms while navigation. But the problem is, when
> we do
> > a
> > lookup of term like 'rowling', we didn't iterate from the start of the
> > list,
> > hence we may not have the previous terms..
> >
> > Any thoughts ?
> >
> >
> > markrmiller wrote:
> > >
> > > I am not very familiar with the Lucene file formats, but I think
> that
> > > there is a lot of "this number tells you how far ahead to read" when
> > > enumerating terms. As you might guess, I think this lends toward
> reading
> > > the terms file forward. Not that an index couldn't point you into
> the
> > > terms index somehow (a meta index?). It would make just as much
> sense
> > (or
> > > more) to buffer all of the terms to allow a previous though.
> Depending
> > on
> > > your RAM and index size, this could be an option.
> > >
> > > I do wonder why you want a previous though? It sounds like you might
> be
> > > better off heading down a different path...
> > >
> > > - Mark
> > >
> > >
> > >
> > > muraalee wrote:
> > >>
> > >> Hi All,
> > >> I searched in this forum for anybody looking for need for
> previous()
> > >> method in TermEnum. I found only this link
> > >>
> >
> http://www.nabble.com/How-to-navigate-through-indexed-terms-tf28148.html
> #a189225
> > >>
> > >> Would it be possible to implement previous() method ? I know i am
> > asking
> > >> for quick solution here ;) Just i want to ensure if it not
> implemented,
> > >> there might be a reason. So i can consider alternates approaches to
> > >> implement similar feature..
> > >>
> > >> appreciate your thoughts...
> > >>
> > >> Thanks
> > >> Murali V
> > >>
> > >
> > >
> >
> > --
> > View this message in context:
> >
> http://www.nabble.com/TermEnum----previous%28%29-method---tf4107296.html
> #a11707160
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message