Friday, July 24, 2009

Replaying traces, Part 3

I had a few surprises when I began replaying a production workload on a virtual server. I'm using the same workload I captured from a production server and replayed on a physical server in different scenarios. The first virtual test was on a 32-bit server hosted on a VMware box with CPU affinity turned on. A limitation of virtual servers (at least for now) is a maximum of 4 CPUs.

The first surprise was that the replay was faster than the 4 CPU test on a physical server: 3:07 virtual compared to 3:40 physical. The second surprise was that the total number of transactions appeared to have dropped.

I'll be looking into this to see what I'm missing, and I'll post more details soon.

Wednesday, July 22, 2009

Replaying traces, Part 2

When I left off, I was discussing using SQL Profiler to replay a trace taken on a production server on different test boxes with different configurations. The end goal is to see whether we can virtualize our production OLTP database servers.

Our next step was to replay a trace on the same server it was captured on. We couldn’t do this on a production server, so we used our testing server and assigned our developers to hit the database with our application for one hour. Afterwards we restored the databases to the state they were in before the capture and replayed the trace with the same options we use on the other server tests: multi-threading on, results not displayed, and server SPIDs not replayed. The replay took only 45 minutes. We probably didn’t capture a large enough workload, but we did rule out the overhead of running the replay itself as a factor when comparing against the performance of the production server at the time the workload was captured.

Back to testing. Since we can’t exactly duplicate the replay of the workload to match the production server, we decided that running the replay on the test server would be our new baseline since we could duplicate that for every test. Testing on the physical box showed the following:

          Production   16 CPU (new baseline)   4 CPU, HT ON   4 CPU, HT OFF
Time      1:07         2:37                    3:40           2:44
Avg CPU   17%          24%                     83%            77%

We expected the 4 CPU tests to take longer and use more CPU. We were surprised that the 4 CPU test with hyper-threading turned off performed better than the one with hyper-threading on. So we ran one more test, using all physical processors with hyper-threading turned off. This gave us 8 physical processors to work with. This test, though, was incomplete. We ran it twice, and both times the replay appeared to hang after completing 99%. Both times we had to kill the replay after 5 hours, so the counters we captured weren’t valid.
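To put the comparison in relative terms, here is a small sketch that converts the replay times from the table into minutes and expresses each one as a multiple of the new baseline. It assumes the times are in hours:minutes; the function and variable names are made up for illustration.

```python
def to_minutes(hmm: str) -> int:
    """Convert an 'h:mm' string to whole minutes (assumes hours:minutes)."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


# Replay times copied from the table above.
replay_times = {
    "16 CPU (new baseline)": "2:37",
    "4 CPU, HT ON": "3:40",
    "4 CPU, HT OFF": "2:44",
}

baseline = to_minutes(replay_times["16 CPU (new baseline)"])


def slowdown(name: str) -> float:
    """Replay time for a configuration as a multiple of the baseline."""
    return to_minutes(replay_times[name]) / baseline


for name in replay_times:
    minutes = to_minutes(replay_times[name])
    print(f"{name}: {minutes} min, {slowdown(name):.2f}x baseline")
```

Read this way, the 4 CPU run with hyper-threading on took roughly 40% longer than the baseline, while the run with hyper-threading off took only about 4% longer.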

I’ll cover testing on a virtual server in a future post.
