lucene-java-user mailing list archives

From Mead Lai <laiqi...@gmail.com>
Subject Re: About "join.search" in 3.4 version.
Date Fri, 21 Oct 2011 06:20:04 GMT
I have now created a filter by overriding "DocIdSet getDocIdSet(IndexReader
reader) throws IOException".
It works nicely, but I am worried about its efficiency.
The limit[] array would contain one hundred thousand article_id values (10,000),
and the search would fetch one thousand articles by querying keywords on the content.

       TopDocs topDocs = searcher.search(resultQuery, filter, 1000, sort);

Would this be slow and inefficient? Our system holds about one million
articles in total.
Thank you.

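 import java.io.IOException;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.index.TermDocs;
 import org.apache.lucene.search.DocIdSet;
 import org.apache.lucene.util.OpenBitSet;

 // Method of a Filter subclass (Lucene 3.4 API).
 @Override
 public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
  // One bit per document in this reader; a set bit means "allowed".
  final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
  String[] limit = new String[]{"id_11645", "id_11646"};
  int[] docs = new int[1];
  int[] freqs = new int[1];
  for (String id : limit) {
   if (id != null) {
    // Look up the document that carries this article id.
    TermDocs termDocs = reader.termDocs(new Term("id", id));
    try {
     int count = termDocs.read(docs, freqs);
     if (count == 1) {
      bits.set(docs[0]);
     }
    } finally {
     termDocs.close(); // release the enumerator after each lookup
    }
   }
  }
  return bits;
 }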
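
To reason about the cost, here is a minimal, Lucene-free sketch of what the filter above does. It is only an illustration under stated assumptions: the class IdFilterSketch, the map idToDoc (a stand-in for the term lookup on the "id" field), and the hit array are all hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdFilterSketch {
    // Build one bit per document and set the bit of every allowed article id.
    // idToDoc is a hypothetical stand-in for the index lookup on the "id" field.
    public static BitSet buildAllowed(Map<String, Integer> idToDoc, String[] limit, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : limit) {
            if (id == null) {
                continue;
            }
            Integer doc = idToDoc.get(id); // one lookup per id
            if (doc != null) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keep only the keyword hits whose bit is set, stopping after topN results.
    public static List<Integer> filterHits(int[] keywordHits, BitSet allowed, int topN) {
        List<Integer> out = new ArrayList<>();
        for (int doc : keywordHits) {
            if (out.size() == topN) {
                break;
            }
            if (allowed.get(doc)) { // one bit test per hit
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> idToDoc = new HashMap<>();
        idToDoc.put("id_11645", 3);
        idToDoc.put("id_11646", 7);
        String[] limit = {"id_11645", "id_11646", "id_missing"};
        BitSet allowed = buildAllowed(idToDoc, limit, 10);
        System.out.println(filterHits(new int[]{1, 3, 5, 7, 9}, allowed, 1000)); // prints [3, 7]
    }
}
```

In this sketch, building the set costs one lookup per id and checking a hit costs one bit test, so the id list length matters more than the total number of documents. If the same id list is reused across searches, caching the built set (for example with Lucene's CachingWrapperFilter) may be worth trying, though I have not measured it.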

Regards,
Mead


On Fri, Oct 21, 2011 at 9:06 AM, Mead Lai <laiqinyi@gmail.com> wrote:

> Thank you, Mike.
> Are you sure that Solr has implemented a 'join' function?
> I only skimmed through some introductory guides to Solr, and I am not sure about that.
> I appreciate your help very much.
>
> I figured out another way to handle this problem.
> Our system also keeps a copy of these articles and the records (about who
> edited each article and when) in the database,
> so I shall first query the database with the 'time range' condition,
> and then use a Lucene Filter to get the right results.
>
> SELECT DISTINCT article_ids FROM records r
> WHERE r.edit_date > '2011-09-23' and r.edit_date < '2011-10-19' and
> r.user_id='000000_editor_id'
> The article_ids will then be passed into Lucene to filter the search results.
>
> Although it's a little clumsy and stupid, it can work for this case.
>
> Regards,
> Mead
>
>
>   On Thu, Oct 20, 2011 at 6:44 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> I don't think the new join package in Lucene 3.4 will work for this
>> case; you need a more general join implementation, which e.g. Solr and
>> ElasticSearch have implemented.
>>
>> Generic join hasn't yet been factored out into Lucene (but I think it
>> really needs to be... any volunteers!?).
>>
>> Lucene's join package can handle use cases like nested documents or
>> parent/child, because it requires that you index a single primary row
>> AND all joined documents together as a single block of documents.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Oct 20, 2011 at 3:26 AM, Mead Lai <laiqinyi@gmail.com> wrote:
>> > Hello all,
>> >
>> > Now, I find there is an "org.apache.lucene.search.join" package in Lucene
>> > 3.4.
>> > But I found no demo for the "join" function in the source code package:
>> > "lucene-3.4.0-src.tar".
>> >
>> > Now I have some articles, which could be modified by editors, like this
>> > relationship:
>> >  an article : modify records = 1:n.
>> >
>> > Document of article: contains the text of the article.
>> > Document of records: article_id, name of editor, date_time (when it was
>> > modified).
>> > The search condition would be: keywords (to search articles), name of editor,
>> > and time range (start_time, end_time),
>> > which will find the articles that were modified by someone within a
>> > particular time range.
>> >
>> > E.g.: condition = from '2011-09-23' to '2011-10-19', editor: 'Alan',
>> > keyword: 'duck'.
>> > The results will include all articles that contain 'duck' and were edited
>> > by 'Alan' between '2011-09-23' and '2011-10-19'.
>> > My question is: could "org.apache.lucene.search.join" solve this case?
>> > If so, I would be grateful for an example or a clue.
>> >
>> > Thanks for your time.
>> >
>> > Regards,
>> > Mead Lai
>> >
>
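
As a rough illustration of the block layout Mike describes in the quoted reply, here is a small Lucene-free sketch. It assumes each article (parent) is indexed immediately after its modification records (children), so a matching record maps to the first parent document at or after it. The class BlockJoinSketch and the document numbers are hypothetical and only meant to show the idea, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BlockJoinSketch {
    // Map each matching child doc to its parent: the first parent bit at or
    // after the child. childHits is assumed to be in ascending doc order.
    public static List<Integer> parentsOf(int[] childHits, BitSet parentBits) {
        List<Integer> parents = new ArrayList<>();
        for (int child : childHits) {
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                continue; // no parent follows this child: not a well-formed block
            }
            // Several children of one block yield the same parent; report it once.
            if (parents.isEmpty() || parents.get(parents.size() - 1) != parent) {
                parents.add(parent);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Doc layout: records 0-1 + article 2, record 3 + article 4,
        // records 5-7 + article 8.
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(4);
        parentBits.set(8);
        int[] matchingRecords = {0, 1, 5, 7}; // e.g. records edited by 'Alan' in range
        System.out.println(parentsOf(matchingRecords, parentBits)); // prints [2, 8]
    }
}
```

The mapping only works because each block is contiguous, which is why the article and all of its records have to be indexed together as one block.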
