Wednesday, December 14, 2005

System-wide slowdowns

When I talk to customers with performance problems, I ask these two questions:

1. Is everything slow, or is it just specific stuff like a form, a report or a job (Cary Millsap would call it a Business Function, I think :) )?
2. Has it always been slow or did it just start?
3. Do you know what changed?

If everything really IS slow, i.e. what I call a system-wide slowdown, then I still don't think you can find the answer quickly and reliably (and certainly not without pure luck combined with decades of experience) by looking at system-wide data such as 'sar', StatsPack, v$sysstat, v$system_event and such.

[With one exception, though: Most of the time, this situation is caused by running too many reports or other heavy jobs at the same time, i.e. saturating the CPUs with CPU-intensive, long-running jobs. Cary wrote about 'The Magic of 2' some years ago, and he'll 'kill' that topic during his Master Class in Copenhagen in January, so that'll be fun. Anyway, when you can establish that CPU usage is 100% (or 0% idle), it's mostly just because of month-end or similar situations where many reports have to be run.]

Back to the main topic: In the relatively few cases with system-wide slowdowns I've been involved in where it wasn't just because of too many 'batch' or 'CPU-intensive' jobs running simultaneously, it has been possible to find the reason for the general slowdown by 10046'ing something that was used widely, because whatever slows the system down will also impact this fellow.

To be safe, you might want to 10046 (perhaps it should be called Deep-six when it's level 12...) two or three such typical thingies.

The symptom of whatever is bothering the whole system will show up in the trace files.
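As a rough sketch of what "10046'ing" a session amounts to: the trace is switched on and off with ALTER SESSION SET EVENTS, and level 12 is simply level 4 (bind values) plus level 8 (wait events). The Python helper names below (`trace_on`, `trace_off`) are made up for illustration and only build the statement text; how you get the statement into the session of the widely used form or job is left open.

```python
# Classic event 10046 levels: 1 = basic SQL trace, 4 = + bind values,
# 8 = + wait events, 12 = binds and waits together (4 + 8).
BINDS = 4
WAITS = 8


def trace_on(level: int = BINDS | WAITS) -> str:
    """Return the statement that starts a 10046 trace in the current session."""
    if level not in (1, BINDS, WAITS, BINDS | WAITS):
        # Only the classic levels are handled by this sketch.
        raise ValueError(f"level not handled by this sketch: {level}")
    return (
        "ALTER SESSION SET EVENTS "
        f"'10046 trace name context forever, level {level}'"
    )


def trace_off() -> str:
    """Return the statement that stops the 10046 trace again."""
    return "ALTER SESSION SET EVENTS '10046 trace name context off'"


if __name__ == "__main__":
    print(trace_on())   # run this in the session you want to trace
    print(trace_off())  # run this when the slow operation has finished
```

The resulting trace file is what you would then read (or run through tkprof) to see where the time went.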

So, again, I don't think you can use the system-wide stuff for anything :-).

This should conclude my trilogy on this topic. 'Baselining' and 'StatsPack' were the first two.

13 Comments:

Blogger Rjamya said...

Mogens,

Maybe you should consider holding those Masters Classes in USA. Some of us will have a better chance of attending one then.

Raj

4:21 AM  
Blogger shrek said...

yes, hold them over here. besides, i have a place here in Austin with 300 beers to sample.;-)

5:56 AM  
Blogger David Aldridge said...

Earlier in the week we found that the cause of our own system-wide slowdown was that an administrator had incorrectly exited "sam" and it was consuming 99% of the CPU. It had notched up 31 hours of CPU time.

Nice.

6:00 AM  
Blogger Moans Nogood said...

Y'all (as they say down South) might be right there: We should do Master Classes in the USA instead of here. I also - again and again - have to listen to persistent rumors about the USA having many more people than Denmark. So where should we base Miracle US? My old hometown of Baton Rouge, perhaps?

3:10 PM  
Anonymous Raymond said...

Mogens,

I am sick and tired of flying to the US and Europe for these classes. It is time that you bring the road show to Australia.

3:15 PM  
Blogger Noons said...

I'll second that, Raymondo! Been too long since the last one all the way back in 2001 - or whatever it was: can't remember what I had for breakfast let alone what happened a few years ago!

Besides, this place is known for having a few beers as well...

On the overall system thing: a classic situation I still get from time to time is just the plain old "too much SQL being executed". A few years ago it was common to see apps designed with a discrete SQL behind every single field on the screen. With the obvious result: parsing madness, SQL overload.

It STILL is common to see this now, except it's behind every single attribute of an object in a bean - or something else, java is not the sole offender here.

And the worst offender: one discrete SQL execution for every single row needed!
Set-at-a-time processing, wazzat?

I won't name products and architectures as it may offend the listeners but it's obvious which I'm talking about.

Solutions? Not many, I'm afraid. But in these situations statspack may help: check out the most-executed SQL. If you've got a horrendous number of executions of very simple row-by-row statements, it's a good tip to start chasing the "architect" with a baseball bat.

4:45 PM  
Blogger Noons said...

Oh BTW: does anyone know what's going on with ixora.com.au? Haven't been able to get to the site for a while, everything alright with Steve Adams?

5:08 PM  
Blogger Roderick said...

I just like how you say you ask 2 questions and then list 3 :-). Of course, question #3 does get fun to ask because sometimes in the beginning the answer is, "Nothing's changed." And after it's solved, someone will say, "Oh? But, I didn't think that change would affect anything, so I didn't think it was worth mentioning."

There may be more exceptions than you think. There is still a growing "newbie" population out there repeating history. It usually takes a few minutes for an experienced person to look at a statspack and calculate some probability that the problem will likely be in the area of SQL optimization, not using binds, short-lived connections, bad design, etc. Then you can decide what direction to investigate first - e.g. 10046 or tkprof or looking outside the database altogether. And you need tons of experience to interpret that stuff too...

Isn't CPU usage a system-wide stat? :-)

Similar to Nuno, I actually saw one Web based application issuing one query for each field and radio button on the page and all the queries were against one column of the same table.

[e.g.
login
select empno from emp where rowid=...;
select deptno from emp where rowid = ...;
select hiredate ...
logout
]
They were all in the top executed SQL section of statspack with the same number of executions.
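To make the cost of that pattern concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for the real database (an assumption; the application in question ran against Oracle). The table contents, the `run` helper and the use of empno rather than rowid as the lookup key are all invented for illustration. It fetches the same three values once with one statement per field and once with a single statement, and counts the statements issued.

```python
import sqlite3

# In-memory stand-in for the emp table; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, hiredate TEXT)"
)
conn.execute("INSERT INTO emp VALUES (7369, 20, '1980-12-17')")

executed = []  # every statement sent through run() is recorded here


def run(sql, params=()):
    """Execute one statement, record it, and return the first row."""
    executed.append(sql)
    return conn.execute(sql, params).fetchone()


columns = ("empno", "deptno", "hiredate")

# Pattern from the comment above: one query per field on the page.
per_field = {}
for col in columns:
    per_field[col] = run(f"SELECT {col} FROM emp WHERE empno = ?", (7369,))[0]
per_field_count = len(executed)

# Alternative: fetch all the fields for the row in a single statement.
executed.clear()
row = run("SELECT empno, deptno, hiredate FROM emp WHERE empno = ?", (7369,))
single = dict(zip(columns, row))
single_count = len(executed)

if __name__ == "__main__":
    print("per-field statements:", per_field_count)
    print("single-query statements:", single_count)
    print("same data:", per_field == single)
```

Both approaches return identical data; the difference is only in how many statements have to be parsed and executed, which is what shows up in the top-executed SQL section.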

6:53 PM  
Blogger Doug Burns said...

Mogens,

"Cary wrote about 'The Magic of 2' some years ago, and he'll 'kill' that topic during his Master Class in Copenhagen in January, so that'll be fun."

That's fantastic news! So I'll be discussing a dead topic - http://tinyurl.com/b7mgl

I wish someone had said something ;-)

Cheers,

Doug

1:12 PM  
Anonymous Ram said...

"it has been possible to find the reason for the general slowdown by 10046'ing something that was used widely, because whatever slows the system down will also impact this fellow."

Mogens,

Can a form that is used often, e.g. a ticket availability form in an airline reservation system, be a good example of 'something that is used widely'?

Ram.

12:02 AM  
Blogger Moans Nogood said...

That would probably be an excellent example of just that, yes.

Mogens

1:13 PM  
Blogger Roger Snowden said...

Umm... aren't you supposed to check the hit ratios and throw more memory at the db until the ratios get better?

But seriously, do a Masters Class here in the states and I will do one of my famous standup routines. Might even discuss diagnostics or some such.

Maybe I'll talk Ricky Sanchez into showing up and ranting about something. You never know.

7:15 PM  
