Sunday, 15 February 2015

SQL Azure Premium tier is unavailable for more than a minute at a time and we're around 10-20% utilization, if that


We run a web service that receives 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. It serves many data feeds compiled from third-party web services, plus custom-generated images. Our service and code are mature; we have been running it for years, and good developers have put a lot of work into our service's code base.

We are migrating to Azure, and we are seeing some serious problems. For one, our Premium P1 SQL Azure database routinely becomes unavailable for 1-2 full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This happens several times a day. It happens less often since we switched from the Standard tier to the Premium tier, but we are nowhere near our DB's DTU capacity and we are still getting throttled hard far too often.

Our SQL Azure DB is Premium P1, and according to the new Azure portal our load is generally under 20%, with a few spikes reaching 50-75% each hour. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL database, and the new portal is at times very clearly wrong (our DB was not down for half an hour, as the graph suggests, but it was down for more than 2 full minutes):

[Image: SQL Azure usage]

Azure reports our DB's size as a bit more than 12 GB (in our own SQL Server installation, the DB is under 1 GB; that is another question: why is it reported as 12 GB on Azure?). We have done a lot of tuning over the last few years and have good indexes.

Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, and most of our various external service calls are asynchronous. DB access is still largely synchronous, but our heaviest queries are async. We make heavy use of in-memory and Redis caching. The most frequent use of our DB is writing 1-3 records for each request (those tables are queried only once every 10 minutes to check error levels).
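To make the retry schedule above concrete, here is a minimal Python sketch of that kind of backoff loop. It is an illustration under assumptions, not our actual DB library: `run_with_retry` and `TransientDbError` are hypothetical names, and it assumes one initial attempt, then one retry after each delay, with the error re-raised once the schedule is exhausted. The `sleep` parameter is injectable so the schedule can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

# Backoff schedule from the post: 2, 4, 8, 16, 32, then 48 seconds.
RETRY_DELAYS_SECONDS = (2, 4, 8, 16, 32, 48)


class TransientDbError(Exception):
    """Hypothetical stand-in for a transient connection or throttling error."""


def run_with_retry(
    operation: Callable[[], T],
    delays: Iterable[float] = RETRY_DELAYS_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, sleeping for the next delay after each transient failure.

    Assumed semantics: one initial attempt plus one retry per delay.
    If the final attempt also fails, its TransientDbError propagates.
    """
    for delay in delays:
        try:
            return operation()
        except TransientDbError:
            sleep(delay)  # wait before the next attempt
    # Last attempt after the final delay; any error now reaches the caller.
    return operation()


if __name__ == "__main__":
    total = sum(RETRY_DELAYS_SECONDS)
    print(f"Total wait before giving up: {total} s")
```

The delays add up to 110 seconds, so under these assumptions a database outage of roughly 2 minutes or more exhausts the whole schedule. And because most of our DB access is synchronous, the request thread is blocked for that entire time.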

Apart from the logging of those requests, there is really not much more to trim in our app's DB access code. We are nowhere near our DTU allocation on this database, and the server our DB is on has something like 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we are going to leave Azure.

Is this the best we can get?

Our DB stats show that we are nowhere near our resource limits, and at the Premium tier we should be guaranteed our DTU level second by second. But, again, we go more than a full, solid minute without being able to get a database connection. What is going on?

[Image: DB stats]

I can also say that whenever we experience one of these delays, our stats seem to be reset. The image above was taken a few minutes before a 1+ minute delay, and this one a few minutes after:

[Image: statistics reset]

We have been in touch with Azure's technical staff, and they confirmed that this is a bug in their platform that causes our database to go through multiple failovers a day. They said they will deploy fixes starting this week and continuing through next month.

Frankly, we are having trouble understanding how anyone can run a web service on Azure reliably. Our pool of Websites randomly goes down for a few minutes every month, taking our public sites down with it. If our cloud service returns too many 500 responses, something in front of it cuts off traffic and returns 502s (completely undocumented behavior, as far as we can tell). SQL Azure has very limited performance and is obviously not ready for prime time.

