
Monday, August 16, 2010

What I learned today – CHECKDB and database snapshots

A few days ago I was sitting in my local coffee shop, getting a little light reading done. At work we're about to move from the Standard Edition of SQL 2005 to the Enterprise Edition, and I wanted to learn more about some features of database snapshots. Snapshots are basically just a read-only copy of a database taken at a specific point in time. Anyway, I was reading Chapter 3 – Databases and Database Files – in the excellent book Microsoft SQL Server 2008 Internals when the discussion turned to the space allocated for snapshots. That's when I read that DBCC CHECKDB actually runs against a snapshot of the database.

That caught me by surprise. Until then I had thought that CHECKDB ran against the database itself.

Now, I thought I was doing OK with protecting our databases. I faithfully run CHECKDB weekly against all databases and check for any errors. I set the CHECKSUM option and verify backups. I occasionally restore databases on a standby server to make sure there are no problems with the backups. And I monitor database and drive space pretty closely.

But I was flying blind. I had no idea how close I'd come to having CHECKDB fail because of a lack of space. So I started digging a little deeper into CHECKDB, and I was able to learn a little more about how it works with snapshots.

For anything concerning CHECKDB, your first resource should be Paul Randal (Blog | Twitter). Read his excellent series on CHECKDB From Every Angle to get a good understanding of what’s going on. And he also wrote Chapter 11 – DBCC Internals in Microsoft SQL Server 2008 Internals, a great book you should read anyway.   

CHECKDB needs a consistent view of the database to work with. In SQL 7 it acquired table locks. In SQL 2000 it analyzed the transaction log. Beginning with SQL 2005 it creates a hidden snapshot on the same volume as the database – you have no control over where it's placed. If you're running CHECKDB while your server has a heavy workload, the snapshot can run out of space and you'll get an error showing that CHECKDB didn't complete. If that happens you have a few options:

  • Create your own database snapshot on a drive that has enough space and run CHECKDB against that snapshot (a minimal example is sketched below). CHECKDB will know that it's running against a snapshot and won't create another one.
  • Start the SQL service in single user mode.
  • Put the database in read-only mode. 

Once CHECKDB has finished it will delete the snapshot if it created one.
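
Here's a minimal sketch of the first option. The database name, logical file name, and snapshot path are made up for the example, and remember that database snapshots require Enterprise Edition:

USE master;
GO

-- Create the snapshot on a drive with enough free space. MyDB and
-- MyDB_Data are placeholders; use your database name and its logical
-- data file name (check sys.database_files).
CREATE DATABASE MyDB_CheckSnap
ON ( NAME = MyDB_Data, FILENAME = 'E:\Snapshots\MyDB_Data.ss' )
AS SNAPSHOT OF MyDB;
GO

-- CHECKDB sees it's running against a snapshot and won't create its own.
DBCC CHECKDB ('MyDB_CheckSnap') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- Clean up; CHECKDB only deletes snapshots it created itself.
DROP DATABASE MyDB_CheckSnap;
GO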

But there are a few gotchas with CHECKDB creating the snapshot. First, it has to be created on an NTFS drive. Second, you can't use the WITH TABLOCK option, which tells CHECKDB to take locks instead of using a snapshot. And third, CHECKDB can't create a snapshot of tempdb – but why would you want one there anyway?

I'm sure I'm oversimplifying. I still need to finish Chapter 11.

One other thing I'm wondering: if CHECKDB can create a database snapshot in SQL 2005 regardless of the edition, can the Standard Edition create one through some hidden flag?

Monday, July 19, 2010

What I learned today – importing mixed data types from Excel

Recently a coworker came to me with an issue I'd never run into before. He's importing a large number of Excel spreadsheets into a SQL 2005 instance. One of the columns contains mixed data: 1, 2, 3, x, y, z, for example. What he found is that if the first 20 or so rows are strictly numeric, the connection treats the whole column as numeric, and the alphanumeric values are returned as NULLs. He's querying using OPENROWSET, but it looks like this happens in SSIS as well.

His research found that there are a few registry keys that come into play, plus another property he needed to set in his connection string. Adding "IMEX=1" allows importing of mixed data types based on the registry setting "ImportMixedTypes". The default value is "Text". One drawback to Text is that you're limited to 255 characters. The only other available value is "Majority Type". When you use Majority Type, some of the values are scanned and the data type with the most values is used.
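
For example, an OPENROWSET call along these lines picks up the setting (the file path and sheet name here are made up, and the server needs Ad Hoc Distributed Queries enabled):

-- IMEX=1 in the extended properties tells the Jet provider to honor
-- the ImportMixedTypes registry setting.
SELECT *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=C:\Imports\Sample.xls;HDR=YES;IMEX=1',
    'SELECT * FROM [Sheet1$]');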

The number of rows scanned is based on the "TypeGuessRows" registry setting. By default it's set to 8, but valid values are 0 to 16. If you set it to 0, all rows are sampled. Setting it to a value between 1 and 16 samples that many rows.

Both keys can be found under HKEY_LOCAL_MACHINE\Software\Microsoft\Jet\4.0\Engines\Excel.

Here are a few articles that explain the issue better.

http://jingyang.spaces.live.com/blog/cns!CC21A118B1B5250!318.entry?sa=781031802

http://www.instantpages.ltd.uk/ADODB_WP.htm

UPDATED 7/20/2010: You can read more about the actual issue here.

Thursday, April 15, 2010

What I learned tonight – tempdb size when SQL starts

I went to the April monthly meeting of the Chicago SQL Server User Group tonight to see Brad McGehee (Blog | Twitter). Brad's in town to present at SQLSaturday #31 in Chicago, and he also gave us a talk tonight on optimizing tempdb performance.

It was a good session because it went over tempdb and best practices. I already knew most of the information, but it's always good to hear it again as a refresher. And there's usually something I've taken for granted but was wrong about – tonight was no exception.

I knew that the data and log files for tempdb are recreated when the SQL service starts. But I thought that tempdb got its database options from the model database. For instance, if the model database is set at 2 MB for the data and 1 MB for the log, then tempdb would inherit those same settings. But I was wrong. As Brad mentioned tonight, tempdb starts out with 8 MB for the data file and 1 MB for the log file. On my laptop it's actually 512 KB, but maybe I heard him wrong.

Anyway, to properly resize tempdb you should run the ALTER DATABASE command…

USE master;
GO

ALTER DATABASE tempdb
MODIFY FILE ( NAME = 'tempdev', SIZE = 25600KB , FILEGROWTH = 5120KB );
GO

ALTER DATABASE tempdb
MODIFY FILE ( NAME = 'templog', SIZE = 2048KB , FILEGROWTH = 1024KB );
GO

…and tempdb will be correctly resized after every restart. And you've heard that you should create multiple tempdb files depending on the number of CPUs you have? Well, just do that at the same time. Here I'm renaming, moving, resizing, and adding a file to tempdb:

USE master;
GO

ALTER DATABASE tempdb
MODIFY FILE (NAME = 'tempdev', NEWNAME = 'tempdev1', FILENAME = 'C:\SQL2008\DATA\tempdev1.mdf');
GO

ALTER DATABASE tempdb
MODIFY FILE (NAME = 'templog', FILENAME = 'C:\SQL2008\DATA\templog.ldf');
GO

ALTER DATABASE tempdb
MODIFY FILE (NAME = 'tempdev1', SIZE = 25600KB , FILEGROWTH = 5120KB);
GO

ALTER DATABASE tempdb
MODIFY FILE (NAME = 'templog', SIZE = 2048KB , FILEGROWTH = 1024KB);
GO

ALTER DATABASE tempdb
ADD FILE (NAME = 'tempdev2',
FILENAME = 'C:\SQL2008\DATA\tempdev2.ndf',
SIZE = 25600KB , FILEGROWTH = 5120KB);
GO

The usual disclaimers go here. Don't use this script without understanding it, as it can screw up your server. And don't use it in production until you've tested it.

And I'll leave the discussion of how many tempdb files to create for another time.

Monday, January 18, 2010

What I learned today – the RELOG utility

At the Chicago SQL Server User Group meeting on 1/14/2010, John Jones of NetApp gave a talk on storage performance. It was an informative meeting. For instance, did you know that the latency of a 15K Fibre Channel drive is the same as a 15K SAS drive? But SAS drives are smaller; NetApp is able to put 24 drives in a 4U rack.

During the presentation, John mentioned the RELOG utility and how it can convert a perfmon counter log from the default .blg extension to a .csv file. I’d never heard of RELOG before so I did a little reading up on it. I googled RELOG and found lots of information. The sources I used for this article were Ask The Performance Team, Microsoft TechNet, and this blog post at My Torn Data Pages. And I found that RELOG can do much more than just reformat a file. It can also filter the original counter log to show data from a sample time slice or only pull out certain counters.

RELOG is a command-line utility. It's available on machines running Windows XP or higher, and it's located in the \system32 directory. If you don't have it you can get it from Microsoft downloads. Also, the utility isn't just for SQL counters; you can run it against any perfmon counter file.

Here’s a list of the switches that RELOG uses:

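The main ones (from relog /?; check your own version of Windows for the full help) are:

  • -a – append output to an existing binary file
  • -c <counter [counter ...]> – filter specific counters from the input log
  • -cf <filename> – filter the counters listed in a file
  • -f <CSV|TSV|BIN|SQL> – output file format
  • -t <value> – only write every nth record to the output
  • -o – output file path, or SQL database
  • -b <time> – begin time for the first record to write
  • -e <time> – end time for the last record to write
  • -config <filename> – settings file containing command options
  • -q – list the performance counters in the input file
  • -y – answer yes to all questions without prompting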

In my examples I'm using a benchmark counter log I captured from our production servers. It's already saved as a .csv file, so I don't need to convert it, but I certainly could if I needed to. Also, I'm typing RELOG in caps, though it's not necessary.

Using RELOG with just the filename of the counter log will give you the start and end time of the capture plus the number of samples in the file…

[Screenshot: output of RELOG SQL1_20100106.csv, showing the start and end times of the capture and the number of samples]

…and using the -q switch will list the counters used in the file.

[Screenshot: output of RELOG SQL1_20100106.csv -q, listing the counters in the file]

Adding the -o switch and specifying a file name will output the counter names to a separate file (RELOG SQL1_20100106.csv -q -o counters.txt).

So now that I know the times and the counters in my baseline, I can filter for the counters between 9 and 10 AM on 1/5 (RELOG SQL1_20100106.csv -b "1/5/2010 9:00:00 AM" -e "1/5/2010 10:00:00 AM" -f csv -o SQL1_filtered.csv). I can see that I've filtered out most of the sampling and now I've got the 237 samples I took between 9 and 10.

[Screenshot: the filtered output, showing the 237 samples between 9 and 10 AM]

And I can also filter by individual counters. Let's say I want to see the transactions per second against a specific database between 9 and 10 AM: RELOG SQL1_20100106.csv -b "1/5/2010 9:00:00 AM" -e "1/5/2010 10:00:00 AM" -c "\MSSQL$MISDB:Databases(tempdb)\Transactions/sec" -f csv -o SQL1_filtered2.csv. Running this, I see the same 237 samples returned, but if I open the file, I'll just see the counters I asked for:

[Screenshot: the output file containing only the Transactions/sec counter]

I can also use a wildcard to see the Transactions/sec counters against all databases: RELOG SQL1_20100106.csv -b "1/5/2010 9:00:00 AM" -e "1/5/2010 10:00:00 AM" -c "\MSSQL$MISDB:Databases(*)\Transactions/sec" -f csv -o SQL1_filtered3.csv.

I haven’t tried to save the results to a database yet. According to My Torn Data Pages, there’s a problem with the syntax cited by Microsoft TechNet. I want to test both scenarios, using a DSN and the Native Client.

Thursday, November 12, 2009

What I learned today – GID in sysusers table

I had a question from a colleague yesterday about the GID field in the sysusers table. He had a SQL 2000 database that he had moved to a SQL 2005 instance. When he did this, his third party application stopped working. When he investigated, he noticed that the app was using the GID field in sysusers to validate logins and set roles. After he restored the database on the new server the GID field was 0 for everyone. At this point no one could connect to the database through the app.

In SQL 2000, when you add a user to a database, the GID is 0 if they're only in the public role. If you add them to a database role, the GID becomes the UID of that role. I'm not sure what happens if you add a user to more than one database role; that's something I'll have to check out. In SQL 2005 this has changed: the GID is 0 for all database users, and it's not a field you can update.
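
Just as a sketch, a quick query against the SQL 2000 database shows the mapping (sysusers is a legacy system table, so this is for poking around, not for building on):

-- gid is 0 for users who are only in public; otherwise it holds
-- the uid of a role the user belongs to (SQL 2000 behavior).
SELECT name, uid, gid
FROM sysusers
WHERE islogin = 1;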

This is a good example of why you should never depend on the system tables being static. Microsoft always warns that they may be updated when you apply patches or when you upgrade.

Tuesday, October 27, 2009

What I learned today – Report Server Execution log package

My first attempt at a data warehouse is going to be collecting data from our report servers. In our environment we have two SQL 2008 instances that host the reports themselves, and they connect to four data centers for the report data. In the data centers we have a table that's populated with the parameters needed for each report: dates, locations, etc. The reports themselves only have two parameters: the data center where the data for the report lives (we're using dynamic connection strings) and a GUID that identifies the other needed parameters in the data center.

My goal was to build my warehouse from the four data center report parameter tables and the Execution Log tables on the report servers. The report server logs information each time a report is run: whether it was successful, the parameters for the report, the user, and more that would be helpful in debugging report performance. I wanted to be able to view summaries for each report – for instance, how often each report was run for each location during a specified date range, average report durations, or the number of aborted reports.
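
As a rough sketch of the kind of summary I'm after, something along these lines against the ExecutionLog2 view in the SQL 2008 ReportServer database should do it (the date range is just an example):

USE ReportServer;
GO

-- Run counts, average duration, and failed or aborted executions
-- per report for a sample date range.
SELECT ReportPath,
       COUNT(*) AS Executions,
       AVG(DATEDIFF(ms, TimeStart, TimeEnd)) AS AvgDurationMS,
       SUM(CASE WHEN Status <> 'rsSuccess' THEN 1 ELSE 0 END) AS FailedOrAborted
FROM ExecutionLog2
WHERE TimeStart >= '20091001' AND TimeStart < '20091101'
GROUP BY ReportPath
ORDER BY Executions DESC;
GO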

I was going to build an SSIS package to gather the data for me. Then I read an article in the November 2009 issue of SQL Server Magazine by Tyler Chessman that covered about half of what I want to do. In his article, SQL Server Reporting Services Questions Answered, Mr. Chessman describes sample reports from Microsoft that you can find on CodePlex. The reports will be interesting enough and I'll be able to use them, but the best part is that Microsoft has already created a package to extract the data from the report server execution log!

This post is meant to be an overview. I’ll post a review of the CodePlex samples soon, and I’ll start laying out my data warehouse design.

Wednesday, October 21, 2009

What I learned today – auto generate create table DDL in SSIS

I'm pretty new to SSIS, having only played around with it to see how things work. This is probably well known to SSIS developers, but I just ran across it accidentally. I'm currently working through Brian Knight's book "Knight's 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services" (I'll review the book when I'm finished). Lesson 15 is walking me through loading the contents of a flat file into a new table in a SQL database with the OLE DB Destination.

When I've done this before, I've always written a DDL statement to create the table. This time, I connected the Flat File Source, which has my sample data, to the OLE DB Destination, and when I clicked to create a new table, SSIS generated the CREATE TABLE script based on the definition of the flat file! The only thing I had to change was the table name, and even that would probably have been handled if I had renamed my destination first.
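
The generated script looked something like this. The column names and lengths below are hypothetical – SSIS builds them from the flat file's column metadata, and the default table name comes from the destination component's name:

-- Hypothetical example of the DDL SSIS generates; the real columns
-- come from the Flat File Connection Manager's metadata.
CREATE TABLE [OLE DB Destination] (
    [CustomerID] varchar(50),
    [OrderDate] varchar(50),
    [Amount] varchar(50)
);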

[Screenshots: the OLE DB Destination editor and the generated CREATE TABLE statement]

I didn’t think the table would be created until the task was run. But after clicking OK, the table was listed in my test database. I hadn’t even finished defining the destination!