Niraj Bhatt – Architect’s Blog

Ruminations on .NET, Architecture & Design

What is .LDF – SQL Server?

I was quite surprised to see search engines throwing very less information on this one, yet it’s a frequently asked question. LDF acronym stands for Log Data File (though could have been called – Transactional Log Data File). This is a must to have file, and is created at the same time the database file (.mdf) is created by SQL Server.

So why do you need it? For keeping .mdf consistent. To cut short I will copy these lines straight from wiki :
“SQL Server ensures that any change to the data is ACID-compliant, i.e., it uses transactions to ensure that any operation either totally completes or is undone if fails, but never leaves the database in an intermediate state. Using transactions, a sequence of actions can be grouped together, with the guarantee that either all actions will succeed or none will. SQL Server implements transactions using a write-ahead log. Any changes made to any page will update the in-memory cache of the page, simultaneously all the operations performed will be written to a log, along with the transaction ID which the operation was a part of. Each log entry is identified by an increasing Log Sequence Number (LSN) which ensure that no event overwrites another. SQL Server ensures that the log will be written onto the disc before the actual page is written back. This enables SQL Server to ensure integrity of the data, even if the system fails. If both the log and the page were written before the failure, the entire data is on persistent storage and integrity is ensured. If only the log was written (the page was either not written or not written completely), then the actions can be read from the log and repeated to restore integrity. If the log wasn’t written then integrity is also maintained although the database state remains unchanged as if the transaction never occurred. If it was only partially written, then the actions associated with the unfinished transaction are discarded. Since the log was only partially written, the page is guaranteed to have not been written, again ensuring data integrity. Removing the unfinished log entries effectively undoes the transaction. SQL Server ensures consistency between the log and the data every time an instance is restarted.”

Are there any other uses of .LDF? Yes they can be used for Log Shipping, a high availability solution.

What are the recommended optimizations for LDF file? LDF could be stored on a separate disk having a dedicated disk controller in high volume scenario. Growth of LDF should be managed as daily maintenance activity as it can easily eat up the available storage capacity. A 2 disk RAID 1 is also a good trade off for LDF is terms of performance vs. failure protection (if you are able to invest money on disks RAID 10 is good for MDF (faster read)).

Hope that helps you get started :) .

June 14, 2009 Posted by nirajrules | SQL SERVER | , | 2 Comments

Cost Based Optimization (CBO) vs. Rule Based Optimization (RBO)

These terms were brought up in a recent meeting. I decided to dig them out. These are the optimization strategies used by Database engines for executing a query or a stored procedure. They come into picture after a query or Stored Procedure is compiled and is just about to execute (most databases also cache these generated execution plans). Topic of optimization strategies & their differences can be huge one (guess one can write a book on that) but in this post I will try to keep things simple at a definition level. (An analogy here could be you want to travel from destination A to B, & you have several routes to pick up from.)

Rule Based Optimization: This is an old technique. Basically, the RBO used a set of rules to determine how to execute a query. E.g. If an index is available on a table, the RBO rules can be to always use that index (a RBO for our travel analogy can be avoid all routes with speed brakers). As it turns out that this is simpler to implement but not the best strategy always and can backfire. A Classic example of indexing a gender column is shown here in a similar post. RBO was supported in earlier versions of Oracle. (SQL Server supports table hints which in a way can be compared to RBO, as they force optimizer to follow certain path).

Cost Based Optimization: Motivation behind CBO is to come up with the cheapest execution plan available for each SQL statement. The cheapest plan is the one that will use the least amount of resources (CPU, Memory, I/O, etc.) to get the desired output (in relation to our travel analogy this can be Petrol, time, etc.). This can be a daunting task for DB engine as complex queries can thousands of possible execution paths, and selecting the best one can be quite expensive. For more information on CBO I suggest you go through “Inside MS SQL Server 2005 T-SQL Querying”. CBO is supported by most of databases including Oracle, SQL Server, etc.

(N.B. If you find execution plan selected by DB engine is not the optimal one you can try breaking your query into smaller chunks or changing the query logic)

As a programmer you should strive to ensure that cached query plans are used as much as possible. One of the techniques which can get you going is using parameterized queries & this turns out to be important even if you are using an O/R mapper like NHibernate as shown in this post.

I will look forward to hear your thoughts on above.

(P.S. TOAD from Quest is very useful tool if you want a deep dive into execution plans, just feed your query / SP to it and it will provide many alternatives plans suggesting optimizations & indexes).

June 10, 2009 Posted by nirajrules | Performance Tuning, SQL SERVER | | No Comments Yet

Snapshot vs. LogShipping vs. Mirroring vs. Replication vs. Failover Clustering

All these SQL SERVER terms were quite confusing for me. Luckily, I got to attend Vinod’s session last Saturday at BDOTNET’s UG meet. So I am going to jot down my understanding of them, & will look forward to read your comments on it.

1) Snapshot is a static read only picture of database at a given point of time. Snapshot is implemented by copying a Page (8KB for SQL SERVER) at a time. For e.g. assume you have a table in your DB, & you want to take a snapshot of it. You specify the physical coordinates for storing snapshot & when ever original table changes the affected rows are pushed first to the the snapshot & then changes happen to the DB. (N.B. There is also something called as Snapshot Isolation Level which is different from Database Snapshot).

Usage Scenario: You have a separate DB for report generation, and want to ensure that latest data for that is available. You can periodically take snapshot of your transactional database.

2) Log Shipping is an old technique available since SQL SERVER 2000. Here the transactional log (ldf) is transferred periodically to the standby server. If the active server goes down, the stand by server can be brought up by restoring all shipped logs.

Usage Scenario: You can cope up with a longer down time. You have limited investments in terms of shared storage, switches, etc.

3) Mirroring which was introduced with 2005 edition, works on top of Log Shipping. Main difference is the uptime for the standby server is quite less in mirroring. Standby server automatically becomes active in this case (through help of a broker server which is called as Witness in SQL SERVER parlance), without having to restore logs (actually logs are continuously merged in this scenario – no wonder it’s called Mirror :) ). Additional advantages of Mirroring include support at .NET Framework level (read no switching/routing code – requires ADO.NET 2.0 & higher) plus some new features like Auto Page Recovery introduced with SQL SERVER 2008.

Usage Scenario: You want very less down time and also a cost effective solution in terms of shared storage, switches, etc. Also you are targeting a single database which easily fits in your disks.

4) Replication is used mainly when data centers are distributed geographically. It is used to replicate data from local servers to the main server in the central data center. Important thing to note here is, there are no standby servers. The publisher & subscriber both are active.

Usage Scenario: A typical scenario involves syncing local / regional lookup servers for better performance with the main server in data center periodically, or sync with a remote site for disaster recovery.

5) Failover Clustering is a high availability option only (unlike others above which can be used for disaster recovery as well) used with clustering technology provided by hardware + OS. Here the data / databases don’t belong to either of servers, and in fact reside on shared external storage like SAN. Advantages of a SAN storage is large efficient hot pluggable disk storage.

You might want to look at the licensing options of SQL Server, various editions available and how they map to above features. You can find this information in detail here.

Hope that helps to some extent :) .

December 8, 2008 Posted by nirajrules | SQL SERVER | , , , , | 4 Comments