Cost Based Optimization (CBO) vs. Rule Based Optimization (RBO)
June 10, 2009
Posted by on
These terms were brought up in a recent meeting. I decided to dig them out. These are the optimization strategies used by Database engines for executing a query or a stored procedure. They come into picture after a query or Stored Procedure is compiled and is just about to execute (most databases also cache these generated execution plans). Topic of optimization strategies & their differences can be huge one (guess one can write a book on that) but in this post I will try to keep things simple at a definition level. (An analogy here could be you want to travel from destination A to B, & you have several routes to pick up from.)
Rule Based Optimization: This is an old technique. Basically, the RBO used a set of rules to determine how to execute a query. E.g. If an index is available on a table, the RBO rules can be to always use that index (a RBO for our travel analogy can be avoid all routes with speed brakers). As it turns out that this is simpler to implement but not the best strategy always and can backfire. A Classic example of indexing a gender column is shown here in a similar post. RBO was supported in earlier versions of Oracle. (SQL Server supports table hints which in a way can be compared to RBO, as they force optimizer to follow certain path).
Cost Based Optimization: Motivation behind CBO is to come up with the cheapest execution plan available for each SQL statement. The cheapest plan is the one that will use the least amount of resources (CPU, Memory, I/O, etc.) to get the desired output (in relation to our travel analogy this can be Petrol, time, etc.). This can be a daunting task for DB engine as complex queries can thousands of possible execution paths, and selecting the best one can be quite expensive. For more information on CBO I suggest you go through “Inside MS SQL Server 2005 T-SQL Querying”. CBO is supported by most of databases including Oracle, SQL Server, etc.
(N.B. If you find execution plan selected by DB engine is not the optimal one you can try breaking your query into smaller chunks or changing the query logic)
As a programmer you should strive to ensure that cached query plans are used as much as possible. One of the techniques which can get you going is using parameterized queries & this turns out to be important even if you are using an O/R mapper like NHibernate as shown in this post. A related topic with CBO is that of statistics. Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is more, as opposed to an index with non-unique values. Query optimizer uses these indexes in determining whether to choose an index or not while executing a query. Some situations under which you should update statistics a) If there is significant change in the key values in the index b) If a large amount of data in an indexed column has been added, changed, or removed or the table has been truncated using the TRUNCATE TABLE statement and then repopulated c) Database is upgraded from a previous version. One can use UPDATE STATISTICS / sp_updatestats to update statistics for a table or an index.
I will look forward to hear your thoughts on above.
(P.S. TOAD from Quest is very useful tool if you want a deep dive into execution plans, just feed your query / SP to it and it will provide many alternatives plans suggesting optimizations & indexes).