Thursday, August 20, 2009

Replaying traces, Part 5

So I’m now testing my workload on a virtual 64 bit server, again hosted by VMWare, again with only 4 CPUs. And again I find more issues.

I ran this replay 3 times so far. All three times the replay appeared to have hung at the 99% mark, similar to the 8 CPU physical test. The only difference was that, for the final test, I didn’t save the results to disk. I did this because on the second test there appeared to be SQL activity that was suspended by the TRACEWRITE wait type. The drive it was writing to was the only virtual drive; all databases are on SAN storage.

While all three hung at the 99% mark after about 3 hours, each test had different results. The first test was showing transactions against the database throughout the trace (the black line. The blue line is % Processor Time, the yellow line is % Privilege Time, the red line is transactions/sec against tempdb). The Processor time showed activity from the Profiler after the 99% mark .Test 1
The second trace showed no transactions after the 99% mark. This is the one that showed the TRACEWRITE wait type. Notice that there are no transactions against the db after the 3 hour mark this time.Test 2  
Test 3 shows the same pattern as test 1. There were no wait times for tests 1 and 3.Test 3

There is one more issue with these three tests. After stopping the replays, I had to manually kill the Profiler process because profiler became unresponsive. The third test stayed alive and was stopping the process and I let it go. However the next morning I checked and it was still “stopping” after 13 hours. And the Profiler was still responding.

The more I run these replays the more confused I get. I’m going to try one more time. This run will be from a remote machine, I’m only going to capture the counters on the virtual server.

No comments: