lucene-solr-user mailing list archives

From "Noble Paul നോബിള്‍ नोब्ळ्" <noble.p...@gmail.com>
Subject Re: Large Data Set Suggestions
Date Sat, 08 Nov 2008 12:39:45 GMT
I raised an issue to track this:
https://issues.apache.org/jira/browse/SOLR-842

On Sat, Nov 8, 2008 at 1:22 PM, Lance Norskog <goksron@gmail.com> wrote:
> In my DIH tests I ran a nested loop where the outer RSS feed gave a list of feeds, and
> the inner loop walked each feed. Some of the feeds were bogus, and the DIH loop
> immediately failed.
>
> It would be good to have at least "ignoreerrors=true" the way 'ant' does. This would
> be set inside each loop. Even better is standard programming language continue/break
> semantics. Example:
>
> Outer ignoreerrors=continue
>    Inner #1 ignoreerrors=ignore
>        processing loop
>    Inner #2 ignoreerrors=continue
>        processing loop
>    Inner #3 ignoreerrors=break
>        processing loop
>    Inner #4 ignoreerrors=break
>        processing loop
>
> After an error in an inner loop:
> Inner #1 continues to its next item.
> Inner #2 stops its loop but continues on to Inner #3
> Inner #3 stops its loop AND Inner #4 does not run.
>
>
> Inner #3 has to succeed. If it fails, this iteration of Outer fails, but Outer continues
> to its next item. Other cases: if Inner #2 is set to false, does Inner #3 get run?
> Perhaps instead of true and false the values could be ignore/break/continue, where
> "ignoreerrors=break" means that an error in Inner #2 would prevent Inner #3 from running.
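As a sketch of how this proposal might surface in a DIH data-config, with nested entities
standing in for the outer/inner loops above — the ignoreerrors attribute follows Lance's
proposal, and the URLs and variable names are placeholders, not released DIH syntax:

    <document>
      <!-- continue: on error, abandon this iteration but move on to the next feed -->
      <entity name="outer" processor="XPathEntityProcessor"
              url="http://example.com/feedlist.xml" forEach="/feeds/feed"
              ignoreerrors="continue">
        <!-- ignore: skip the bad record and keep going inside this loop -->
        <entity name="inner1" url="${outer.feedUrl}" ignoreerrors="ignore"/>
        <!-- continue: stop this loop, but still run the next sibling entity -->
        <entity name="inner2" url="${outer.feedUrl}" ignoreerrors="continue"/>
        <!-- break: stop this loop AND skip the remaining sibling entities -->
        <entity name="inner3" url="${outer.feedUrl}" ignoreerrors="break"/>
        <entity name="inner4" url="${outer.feedUrl}" ignoreerrors="break"/>
      </entity>
    </document>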
>
> Lance
>
> -----Original Message-----
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com]
> Sent: Thursday, November 06, 2008 8:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Large Data Set Suggestions
>
> Hi Lance,
> This is one area we left open in DIH. What is the best way to handle this? On error,
> should it give up or continue with the next record?
>
>
>
> On Fri, Nov 7, 2008 at 12:44 AM, Lance Norskog <goksron@gmail.com> wrote:
>> You can also do streaming XML upload for the XML-based indexing. This
>> can feed, say, 100k records in one XML file from a separate machine.
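For reference, such a streamed upload posts a single <add> element to Solr's /update
handler, carrying any number of <doc> elements in one request; the field names below are
placeholders for whatever the schema defines:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="title">first record</field>
      </doc>
      <doc>
        <field name="id">2</field>
        <field name="title">second record</field>
      </doc>
      <!-- ...one <doc> per record, e.g. 100k of them in one POST... -->
    </add>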
>>
>> All of these options ignore the case where there is an error in your
>> input records vs. the schema. DIH gives up on an error. Streaming
>> XML gives up on an error.
>>
>> Lance
>>
>> -----Original Message-----
>> From: Steven Anderson [mailto:sanderson@vsticorp.com]
>> Sent: Thursday, November 06, 2008 5:57 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Large Data Set Suggestions
>>
>>> In that case you may put the file in a mounted NFS directory or you
>>> can serve it out with an Apache server.
>>
>> That's one option, although someone else on the list mentioned that
>> performance was 10x slower in their NFS experience.
>>
>> Another option is to serve up the files via Apache and pull them via
>> DIH HTTP.
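A minimal sketch of that setup, assuming the stock HttpDataSource and XPathEntityProcessor
that ship with DIH; the host, path, and xpaths are placeholders:

    <dataConfig>
      <dataSource type="HttpDataSource" />
      <document>
        <!-- pull one exported XML file over HTTP and map each record to a doc -->
        <entity name="records" processor="XPathEntityProcessor"
                url="http://files.example.com/exports/records.xml"
                forEach="/records/record">
          <field column="id"    xpath="/records/record/id" />
          <field column="title" xpath="/records/record/title" />
        </entity>
      </document>
    </dataConfig>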
>>
>> Thankfully, there are lots of options, but we need to determine which
>> one will perform best.
>>
>> Thanks,
>>
>> A. Steven Anderson
>> 410-418-9908 VSTI
>> 443-790-4269 cell
>>
>>
>>
>>
>
>
>
> --
> --Noble Paul
>
>



-- 
--Noble Paul