incubator-blur-user mailing list archives

From Naresh Yadav <nyadav....@gmail.com>
Subject Re: Blur shell : command to view all data present in a table
Date Fri, 20 Dec 2013 10:53:05 GMT
Hi Aaron,

I still have a problem with data visibility. When I do a batch update of
17000 rows in batches of 1000, the data is not fully visible
for search. Solutions I tried:

1. Setting waitForVisiblity=true on every RowMutation in each
List<RowMutation>. Because of this, my load time increased from 1 minute
to 6 minutes. I did not touch the table-level
blur.shard.time.between.refreshs; it is at its default. This solution adds
minutes to the load, so I cannot adopt it.

2. Sleeping after all batches have finished. Just after the last
client.mutateBatch(List<RowMutation>) call I put the thread to sleep with
Thread.sleep(5000), and only after that does it return from my write API.
This solution is also not working.
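
For reference, the batching described above (17000 rows sent in batches of 1000) could be sketched roughly as follows. This is only an illustration: BatchWriter and writeInBatches are hypothetical names, and the sink callback stands in for whatever call actually sends a batch, such as the client.mutateBatch(List<RowMutation>) call mentioned above. Nothing here depends on the Blur API itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: splits a list of rows into fixed-size batches. */
final class BatchWriter {

    /**
     * Hands consecutive batches of at most batchSize rows to the sink.
     * The sink is an assumption standing in for the real write call.
     *
     * @return the number of batches that were sent
     */
    static <T> int writeInBatches(List<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so the sink never sees a view of the original list.
            sink.accept(new ArrayList<>(rows.subList(start, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 17_000; i++) {
            rows.add(i);
        }
        // The sink here only prints; a real one would send the batch to the server.
        int sent = writeInBatches(rows, 1_000,
                batch -> System.out.println("sending " + batch.size() + " rows"));
        System.out.println("batches sent: " + sent);
    }
}
```

The last batch may be smaller than batchSize when the row count is not an exact multiple.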

Please suggest a way to check, after the batch mutates, whether the Blur
updates are finished, so that I can put my thread to sleep and then check
that flag again. My API works in create/update mode, so I cannot build
that check in my own code.
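
To make the check-then-sleep idea concrete, here is a minimal sketch of the loop I have in mind. It is not Blur code: VisibilityPoller and awaitVisible are hypothetical names, and the visibleRows callback is an assumption standing in for whatever query would report how many rows are currently searchable. It also assumes the caller knows how many rows to expect, which may not hold in create/update mode.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper: polls a row count until it reaches a target or times out. */
final class VisibilityPoller {

    /**
     * @param visibleRows   assumed callback returning the number of rows a query sees now
     * @param expectedRows  number of rows the batches were supposed to make visible
     * @param pollMillis    pause between two checks
     * @param timeoutMillis give up after this long
     * @return true if the expected count was reached before the timeout
     */
    static boolean awaitVisible(LongSupplier visibleRows, long expectedRows,
                                long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (visibleRows.getAsLong() >= expectedRows) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before everything became visible
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated index: 100 more rows become visible on every check.
        long[] seen = {16_800};
        boolean ok = awaitVisible(() -> seen[0] += 100, 17_000, 10, 1_000);
        System.out.println("all rows visible: " + ok);
    }
}
```

Compared with a fixed Thread.sleep(5000), a loop like this returns as soon as the count is reached and reports a timeout instead of silently returning early.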

I want my write API to return only after Blur has completed all batch
updates (waitForVisiblity drastically increases the time, so is there any
alternative?).

Thanks
Naresh

On Mon, Dec 16, 2013 at 6:24 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> Sorry for taking so long to respond.
>
> On Fri, Dec 13, 2013 at 7:41 AM, Naresh Yadav <nyadav.ait@gmail.com>
> wrote:
>
> > Hi aaron,
> >
> > I am a little confused about the problem of immediate visibility of data.
> > In my case I need guaranteed immediate visibility of the index.
> >
>
> This is a normal behavior of Lucene based technologies (for the most part).
>  There is a certain amount of time after the data is posted to an index
> writer before the data can be searchable.  We are going to be trying to
> improve this behavior in 0.3, but more than likely there will always be
> some sort of delay.
>
>
> > I tried the flag on the RowMutation object called waitForVisiblity and
> > set it to true. Then the same program for inserting
> > 17000 rows started taking more than 5 minutes and did not even complete
> > fully; before, it took 1 minute. It starts throwing an
> > "All connections bad" exception after 5-6 minutes. If I run with
> > waitForVisiblity=false it works fine in a minute.
> >
>
> With only 17,000 rows I would possibly try using the batch update version
> of the mutate.  Depending on the size of your rows, potentially using batch
> sizes of 1,000.  As far as the exception goes, if you could send the
> stack trace back to the list we can try to fix/debug what's going on.  It
> could have already been fixed in the unreleased 0.2.2.
>
> I think that something like transactions would likely help in this
> situation.  Meaning:
>
> Load all your data.
> Commit (or Rollback)
> After commit everything is visible.
>
> I have been thinking about adding something like this to Blur for awhile,
> but with trying to get 0.2.2 production ready I haven't had time to work on
> new features.
>
>
>
> >
> > My second question is regarding backups. I tried creating a snapshot and
> > it succeeded. I was eager to know whether I can see it in the Windows
> > filesystem, copy it to another machine, and import it there into HDFS
> > (I found no command for this).
> >
>
> Snapshots merely freeze the index to a particular point in time and prevent
> those files from being deleted.  In a future release there will be a way to
> perform MapReduce over these snapshots, also you will be able to control
> the index data through snapshots, and perform backups.  As for now, unless
> you write some code to use them they aren't useful.
>
>
> >
> > Thanks
> > Naresh
> >
> >
> >
> > On Fri, Dec 6, 2013 at 6:33 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> >
> > > On Fri, Dec 6, 2013 at 7:54 AM, Naresh Yadav <nyadav.ait@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a few questions about Blur; please help me with these:
> > > >
> > > > 1. Is there a way I can see all rows of data in a Blur table? I did
> > > > not find any Blur shell command for it.
> > > >
> > >
> > > This will give you all the rows.
> > >
> > > query <tablename> *
> > >
> > >
> > > >
> > > > 2. Is deleting data possible with a where clause as a query (similar
> > > > to the query command)? I want to delete all data matching two column
> > > > values through the Blur shell.
> > > >
> > >
> > > Not yet.  https://issues.apache.org/jira/browse/BLUR-130
> > >
> > > This shouldn't be difficult to add.
> > >
> > >
> > > >
> > > > 3. After storing 17000 rows I ran queries to get each one, and that
> > > > returned only 16900 rows. After 5 minutes I ran the queries again
> > > > and they returned all 17000 rows. Is there a solution for this? In
> > > > my case, just after inserting data, I need to immediately run a
> > > > query over it.
> > >
> > > There is a delay on visibility of data within Blur.  I believe the
> > > default for a given table is 3 seconds; this can be configured by
> > > changing this setting:
> > >
> > > blur.shard.time.between.refreshs=3000
> > >
> > > In the table properties, or in the blur-site.properties file.
> > >
> > > Be aware that decreasing this time will also decrease the speed at
> > > which mutates can occur.  Also there is a flag on the RowMutation
> > > object called waitForVisiblity; if this is set to true the mutate
> > > command will not return until the data is searchable.  NOTE: This will
> > > slow things down!  So only do this if you have to wait.
> > >
> > > Hope this helps.
> > >
> > > Aaron
> > >
> > >
> > > >
> > > > Thanks,
> > > > Naresh
> > > >
> > >
> >
>
