Niraj Bhatt – Architect's Blog

Ruminations on .NET, Architecture & Design

Tag Archives: SQL SERVER

Pivot, PivotTable and PowerPivot

OK, I am not a data guy. Or let me say I am not an Excel (MS-EXCEL) guy. I have always believed that if you are good at PowerPoint (along with Visio) you probably are an architect and if you are good at Excel you are most likely to be a manager 🙂 . But not knowing Excel well, has always given me a feeling of – “something is missing”. After all, it’s the same data, but the way few people present it, they just make it look so good. While it takes time to develop those skills, understanding Pivot and related features is usually a good step in that direction.

Excel primarily is a spreadsheet tool in which you can dump some data. At times though, data is not everything. For e.g. your retail transactional system captures the sales across product lines, but if a manager needs to predict his stocks for next fiscal based on historic sales, he got to make some sense out of that data. That’s where you got to Pivot the data. The word ‘Pivot’ as Wiki generally describes it – is center point of any rotational system. You have seen them in action while helping your kids on a seesaw. Pivot table on other hand is a data summarization tool found in spreadsheet softwares like Excel. If you have to bridge this tool and the word ‘Pivot’ – you can think you are bending your data to a specific viewpoint using functions like sort, total, average, count, etc. Let’s see it with help of a trivial example. Consider below the sample data in Excel. To bend it to monthly stock view you can click on Insert tab and insert a Pivot Table. This would give you an empty pivot table and a PivotTable field list. Drag and drop your fields to specific areas to bend your data to a viewpoint (N.B. you can have multiple fields for each area and control them via expand / collapse buttons and all your charting knowledge works as is with Pivot tables).

PowerPivot as the name suggests elevates the Pivot table feature to next level making it full-fledged self-service BI tool. An Excel Add-In, PowerPivot connects to almost any external data sources including SQL Server Analysis Services. Once fetched, data stays inside the workbook and using excellent compression techniques the file size on the disk is still relatively small. You can share your PowerPivot workbooks via SharePoint and configure refresh cycles for your workbook to ensure your team always makes their decisions based on recent data. The experience of pivoting though remains same as shown earlier.

Hope that helps 🙂 .

Setting Auto Commit Off SQL Server

In this post I will share a useful tip for SQL Server. While working with a live database, turning Auto Commit Off / Implicit transactions ON can be quite handy, especially for mitigating human errors. This is often the case when you make an unintentional change to your database (I have seen this even in tightly controlled environments) and your only options are to restore a database backup or rollback your changes. Rollback being relatively easy option can be put in place via implict transactions (though by no means it would mitigate all human errors). Implicit transactions can be turned on via simple T-SQL statement or it can be done across all sessions by modifying SQL Server Management Studio Options (Tools -> Options) as shown below



What is .LDF – SQL Server?

I was quite surprised to see search engines throwing very less information on this one, yet it’s a frequently asked question. LDF acronym stands for Log Data File (though could have been called – Transactional Log Data File). This is a must to have file, and is created at the same time the database file (.mdf) is created by SQL Server.

So why do you need it? For keeping .mdf consistent. To cut short I will copy these lines straight from wiki :
“SQL Server ensures that any change to the data is ACID-compliant, i.e., it uses transactions to ensure that any operation either totally completes or is undone if fails, but never leaves the database in an intermediate state. Using transactions, a sequence of actions can be grouped together, with the guarantee that either all actions will succeed or none will. SQL Server implements transactions using a write-ahead log. Any changes made to any page will update the in-memory cache of the page, simultaneously all the operations performed will be written to a log, along with the transaction ID which the operation was a part of. Each log entry is identified by an increasing Log Sequence Number (LSN) which ensure that no event overwrites another. SQL Server ensures that the log will be written onto the disc before the actual page is written back. This enables SQL Server to ensure integrity of the data, even if the system fails. If both the log and the page were written before the failure, the entire data is on persistent storage and integrity is ensured. If only the log was written (the page was either not written or not written completely), then the actions can be read from the log and repeated to restore integrity. If the log wasn’t written then integrity is also maintained although the database state remains unchanged as if the transaction never occurred. If it was only partially written, then the actions associated with the unfinished transaction are discarded. Since the log was only partially written, the page is guaranteed to have not been written, again ensuring data integrity. Removing the unfinished log entries effectively undoes the transaction. SQL Server ensures consistency between the log and the data every time an instance is restarted.”

Are there any other uses of .LDF? Yes they can be used for Log Shipping, a high availability solution.

What are the recommended optimizations for LDF file? LDF could be stored on a separate disk having a dedicated disk controller in high volume scenario. Growth of LDF should be managed as daily maintenance activity as it can easily eat up the available storage capacity. A 2 disk RAID 1 is also a good trade off for LDF is terms of performance vs. failure protection (if you are able to invest money on disks RAID 10 is good for MDF (faster read)).

Hope that helps you get started 🙂 .

Snapshot vs. LogShipping vs. Mirroring vs. Replication vs. Failover Clustering

All these SQL SERVER terms were quite confusing for me. Luckily, I got to attend Vinod’s session last Saturday at BDOTNET’s UG meet. So I am going to jot down my understanding of them, & will look forward to read your comments on it.

1) Snapshot is a static read only picture of database at a given point of time. Snapshot is implemented by copying a Page (8KB for SQL SERVER) at a time. For e.g. assume you have a table in your DB, & you want to take a snapshot of it. You specify the physical coordinates for storing snapshot & when ever original table changes the affected rows are pushed first to the the snapshot & then changes happen to the DB. (N.B. There is also something called as Snapshot Isolation Level which is different from Database Snapshot).

Usage Scenario: You have a separate DB for report generation, and want to ensure that latest data for that is available. You can periodically take snapshot of your transactional database.

2) Log Shipping is an old technique available since SQL SERVER 2000. Here the transactional log (ldf) is transferred periodically to the standby server. If the active server goes down, the stand by server can be brought up by restoring all shipped logs.

Usage Scenario: You can cope up with a longer down time. You have limited investments in terms of shared storage, switches, etc.

3) Mirroring which was introduced with 2005 edition, works on top of Log Shipping. Main difference is the uptime for the standby server is quite less in mirroring. Standby server automatically becomes active in this case (through help of a broker server which is called as Witness in SQL SERVER parlance), without having to restore logs (actually logs are continuously merged in this scenario – no wonder it’s called Mirror 🙂 ). Additional advantages of Mirroring include support at .NET Framework level (read no switching/routing code – requires ADO.NET 2.0 & higher) plus some new features like Auto Page Recovery introduced with SQL SERVER 2008.

Usage Scenario: You want very less down time and also a cost effective solution in terms of shared storage, switches, etc. Also you are targeting a single database which easily fits in your disks.

4) Replication is used mainly when data centers are distributed geographically. It is used to replicate data from local servers to the main server in the central data center. Important thing to note here is, there are no standby servers. The publisher & subscriber both are active.

Usage Scenario: A typical scenario involves syncing local / regional lookup servers for better performance with the main server in data center periodically, or sync with a remote site for disaster recovery.

5) Failover Clustering is a high availability option only (unlike others above which can be used for disaster recovery as well) used with clustering technology provided by hardware + OS. Here the data / databases don’t belong to either of servers, and in fact reside on shared external storage like SAN. Advantages of a SAN storage is large efficient hot pluggable disk storage. You might see DR options like Mirroring used quite frequently with failover clustering. Here’s a good article on adding geo redundancy to a failover cluster setup.

You might want to look at the licensing options of SQL Server, various editions available and how they map to above features. You can find this information in detail here.

Hope that helps to some extent 🙂 .