Performance Testing – Response vs. Latency vs. Throughput vs. Load vs. Scalability vs. Stress vs. Robustness
Normally I find quite a bit of ambiguity when people talk about performance tests; some restrict the term to response time, whereas others use it to cover the whole gamut of things they are testing or measuring. In this post, I will put across a few thoughts on how these terms differ. A lot depends on what you are trying to measure. The terms you will frequently hear in this arena are Response Time, Latency, Throughput, Load, Scalability, Stress, Robustness, etc. I will try to explain these terms below, also throwing some light on how you can measure them.
Response Time – The amount of time the system takes to process a request after it has received one. For instance, if you have an API and want to find how much time it takes to execute once invoked, you are in fact measuring response time. So how do we measure it? Simply use a Stopwatch (System.Diagnostics): start it before calling the API and stop it after the API returns. The duration of a single call is usually quite small, so a preferred practice is to call the API in a sequential loop, say 1000 times, and look at the aggregate, or to pass variable load to the API if possible (input/output sizes ranging from KBs to MBs to GBs, e.g. returning customer arrays of varied lengths).
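A minimal sketch of that measurement loop, written in Python for brevity (the same pattern applies with Stopwatch in .NET). The function get_customers is a hypothetical stand-in for the API under test, not something from a real library:

```python
import statistics
import time


def measure_response_time(call, iterations=1000):
    """Call `call` sequentially and return the per-call durations in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def get_customers(count=100):
    # Hypothetical stand-in for the API being measured.
    return [{"id": i, "name": f"customer-{i}"} for i in range(count)]


durations = measure_response_time(get_customers, iterations=1000)
summary = {
    "mean_ms": statistics.mean(durations) * 1000,
    "median_ms": statistics.median(durations) * 1000,
    "max_ms": max(durations) * 1000,
}
print(summary)
```

Looking at the median and the maximum alongside the mean helps spot occasional slow calls that an average alone would hide.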
Latency – In simplest terms, this is remote response time. For instance, say you want to invoke a web service or access a web page. Apart from the processing time needed on the server to handle your request, there is a delay involved in your request reaching the server. When we refer to latency, it’s that delay we are talking about. This becomes a big issue when a remote data center is hosting your service/page. Imagine your data center is in the US and you are accessing it from India. If ignored, latency can cause you to breach your SLAs. Though it’s quite difficult to improve latency, it’s important to measure it. How do we measure latency? There are some network simulation tools out there that can help you; one such tool can be found here.
Throughput – The number of transactions per second your application can handle (the motivation for, and result of, load testing). A typical enterprise application will have lots of users performing lots of different transactions. You should ensure that your application meets the capacity the enterprise requires before it hits production. Load testing is the solution for that. The strategy is to pick a mix of transactions (frequent, critical, and intensive) and see how many complete successfully within an acceptable time frame governed by your SLAs. How do you measure it? You normally need a high-end professional tool such as Visual Studio Team System (Load Testing feature). Of course, you can try to simulate load through custom-made applications/code, but in my experience custom code is good for testing response times, whereas writing custom code for load testing is too much work. A good load testing tool like VSTS lets you pick a mix of transactions, simulate network latency, incorporate user think times, set test iterations, etc. I would also strongly recommend keeping this testing as close as possible to the real world, with live data.
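To make the definition concrete, here is a toy sketch (in Python) of how throughput falls out of a load run: fire a mix of transactions concurrently, count the ones that finish within the SLA, and divide by the elapsed time. The transaction function, the 70/20/10 mix and the 50 ms SLA are all assumptions made up for illustration; this is nowhere near a replacement for a real load testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SLA_SECONDS = 0.05  # assumed acceptable time per transaction


def transaction(kind):
    """Hypothetical transaction: sleeps to mimic work, returns its duration."""
    work = {"frequent": 0.001, "critical": 0.002, "intensive": 0.005}[kind]
    start = time.perf_counter()
    time.sleep(work)
    return time.perf_counter() - start


# Assumed mix: mostly frequent transactions, fewer critical and intensive ones.
mix = ["frequent"] * 70 + ["critical"] * 20 + ["intensive"] * 10

started = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:  # 10 simulated users
    durations = list(pool.map(transaction, mix))
elapsed = time.perf_counter() - started

within_sla = sum(1 for d in durations if d <= SLA_SECONDS)
throughput = within_sla / elapsed
print(f"{within_sla}/{len(mix)} within SLA, {throughput:.1f} transactions/second")
```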
Scalability – A measure of how your system responds when additional hardware is added. Does it take on the increased load by making use of the added resources? This becomes quite important when you take into account the growth projections for your application. Here we have two options: scale vertically/up (a better machine) or horizontally/out (more machines); the latter is usually preferred. A challenge with scaling out is ensuring that your design doesn’t have any server affinity, so that a load balancer can distribute load across servers. Scalability can be measured with the help of load balancing tools, with a software/hardware NLB in place, verifying that the system is able to take on new load without any issues. One can monitor performance counters to see whether the actual request load has been balanced/shared across servers (I plan to cover NLB in a future post).
Stress testing – Many people confuse this with load testing or treat the two as the same. My take, which I have found easy to explain, is that if you find yourself running tests for more than 24 hours, you are doing a stress test. The motivation behind a stress test is to find out how easily your system can recover from overloaded (stressed) conditions. Does it limp back to normalcy or give up completely? Robustness, an attribute measured as part of stress testing, relates to long-running systems with almost negligible downtime. A simple example here could be a memory leak. Does your system release memory after working at peak load? Another: what happens if a disk fails due to constant heavy I/O load? Does your system lose data? Finding and addressing such concerns is the motivation behind stress testing.
I look forward to reading your thoughts on the above.
Cost Based Optimization (CBO) vs. Rule Based Optimization (RBO)
These terms were brought up in a recent meeting, so I decided to dig into them. They are the optimization strategies used by database engines for executing a query or a stored procedure. They come into the picture after a query or stored procedure is compiled and is just about to execute (most databases also cache the generated execution plans). The topic of optimization strategies and their differences can be a huge one (I guess one could write a book on it), but in this post I will try to keep things simple, at a definition level. (An analogy here could be that you want to travel from point A to point B and have several routes to pick from.)
Rule Based Optimization: This is an old technique. Basically, the RBO uses a set of rules to determine how to execute a query. E.g. if an index is available on a table, the RBO rule can be to always use that index (an RBO for our travel analogy could be to avoid all routes with speed breakers). As it turns out, this is simpler to implement but not always the best strategy, and it can backfire, for example when the index is on a column with very few distinct values. A classic example, indexing a gender column, is shown here in a similar post. RBO was supported in earlier versions of Oracle. (SQL Server supports table hints, which in a way can be compared to RBO, as they force the optimizer to follow a certain path.)
Cost Based Optimization: The motivation behind CBO is to come up with the cheapest execution plan available for each SQL statement. The cheapest plan is the one that uses the least amount of resources (CPU, memory, I/O, etc.) to get the desired output (in our travel analogy these could be petrol, time, etc.). This can be a daunting task for the DB engine, as complex queries can have thousands of possible execution paths, and selecting the best one can be quite expensive. For more information on CBO I suggest you go through “Inside MS SQL Server 2005 T-SQL Querying”. CBO is supported by most databases, including Oracle, SQL Server, etc.
(N.B. If you find that the execution plan selected by the DB engine is not the optimal one, you can try breaking your query into smaller chunks or changing the query logic.)
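The difference between the two strategies can be sketched with a toy model in Python. The cost formulas below are deliberately crude assumptions (sequential page reads for a scan, one random read per matching row for an index); real optimizers use far richer statistics, so treat this only as an illustration of "follow a rule" versus "compare estimated costs".

```python
def full_scan_cost(total_rows, rows_per_page=100):
    # Assumed cost: read every page of the table sequentially.
    return total_rows / rows_per_page


def index_lookup_cost(total_rows, selectivity):
    # Assumed cost: one random page read per matching row.
    return total_rows * selectivity


def rule_based_choice(has_index):
    # RBO-style rule: if an index exists, always use it.
    return "index" if has_index else "full_scan"


def cost_based_choice(total_rows, selectivity):
    # CBO-style choice: estimate each plan's cost and take the cheapest.
    costs = {
        "full_scan": full_scan_cost(total_rows),
        "index": index_lookup_cost(total_rows, selectivity),
    }
    return min(costs, key=costs.get)


rows = 1_000_000
# Gender column: roughly half the rows match, so the index is a poor choice.
print("gender filter:", rule_based_choice(True), "vs", cost_based_choice(rows, 0.5))
# Unique ID column: a single row matches, so the index wins.
print("id filter:", rule_based_choice(True), "vs", cost_based_choice(rows, 1 / rows))
```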
As a programmer, you should strive to ensure that cached query plans are used as much as possible. One technique that can get you going is using parameterized queries, and this turns out to be important even if you are using an O/R mapper like NHibernate, as shown in this post.
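A small sketch of why parameterization helps, using Python's built-in sqlite3 module as a stand-in database (the table and data are made up). With concatenation every value yields a different statement text, so a plan cache keyed on the text sees three different queries; with a parameter the text stays the same and only the bound value changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
titles = ["NHibernate", "Oracle", "O'Reilly"]
conn.executemany("INSERT INTO post (title) VALUES (?)", [(t,) for t in titles])

# Concatenation: one distinct statement text per value (and manual escaping).
concatenated = {
    "SELECT id FROM post WHERE title = '" + t.replace("'", "''") + "'"
    for t in titles
}

# Parameterized: a single statement text, reused for every value.
parameterized_sql = "SELECT id FROM post WHERE title = ?"
results = [conn.execute(parameterized_sql, (t,)).fetchall() for t in titles]

print(len(concatenated), "distinct statements when concatenating")
print("1 statement when parameterized, results:", results)
```

The parameterized form also avoids the escaping that the concatenated version needs for the title containing an apostrophe.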
I look forward to hearing your thoughts on the above.
(P.S. TOAD from Quest is a very useful tool if you want a deep dive into execution plans: just feed your query/SP to it and it will provide many alternative plans, suggesting optimizations and indexes.)
How Does eBay Do It?
I stumbled on this excellent presentation by Randy Shoup. A must watch.
NHibernate – Lessons Learned
As usual, doing a Hello World with a technology turns out to be quite simple. It’s only when you start getting into serious stuff that you run into issues and, of course, learn a lot. I am sharing a few of my learnings on NHibernate from the last month. The DB I am using is Oracle, which makes these observations more important, as there is a lot available out there on NHibernate but only with SQL Server.
1) Optimistic locking: NHibernate has an element called <version>, defined right after the ID, which can handle concurrency for you. I used a TIMESTAMP column in Oracle and mapped it to a DateTime property in my C# class.
<version column="ROWVERSION" type="DateTime" name="RowVersion" />
Now when you insert an entity containing a version as above, NHibernate automatically generates the timestamp for you, and on update it also does an automatic comparison. If you need to catch the exception arising from a conflict, it is StaleObjectStateException.
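What that automatic comparison amounts to can be sketched outside NHibernate, here with Python's built-in sqlite3 module. The assumptions: an integer row_version column instead of a timestamp, a made-up department table, and a StaleObjectError class standing in for StaleObjectStateException. The update only succeeds if the version read earlier is still the one in the row.

```python
import sqlite3


class StaleObjectError(Exception):
    """Hypothetical stand-in for NHibernate's StaleObjectStateException."""


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT, row_version INTEGER)"
)
conn.execute("INSERT INTO department VALUES (1, 'Physics', 1)")


def update_name(conn, dept_id, new_name, loaded_version):
    """Update the row only if nobody changed it since `loaded_version` was read."""
    cursor = conn.execute(
        "UPDATE department SET name = ?, row_version = row_version + 1 "
        "WHERE id = ? AND row_version = ?",
        (new_name, dept_id, loaded_version),
    )
    if cursor.rowcount == 0:
        raise StaleObjectError(f"department {dept_id} was modified by someone else")


# Two users loaded the row at version 1; the first writer wins.
update_name(conn, 1, "Applied Physics", loaded_version=1)
try:
    update_name(conn, 1, "Astrophysics", loaded_version=1)  # stale version
    conflict_detected = False
except StaleObjectError:
    conflict_detected = True

print(conflict_detected, conn.execute("SELECT name, row_version FROM department").fetchone())
```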
2) Cascade issues: A good description of what cascades are and why they exist can be found here, though there are still a few issues you will run into. Consider an entity called Class, which has a many-to-one relationship with Professor.
<many-to-one name="Professor" column="A_PROFESSORID" cascade="none" lazy="false" />
For unidirectional associations like many-to-one (there is no one-to-many back), it makes sense to assign "none" to cascade; otherwise you will see superfluous updates: whenever you modify the Class attributes/properties, NHibernate fires updates for Professor as well.
You still may not be able to get rid of superfluous updates completely, though. Let me put up one more scenario. I have a Department and I have a Class (one-to-many). For this you would keep cascade="all" for obvious reasons, as a save/update/delete on Department should also save/update/delete its Classes. But if you modify only a single property of the Department, you will still end up updating all the Classes. I hope such superfluous updates will be removed in upcoming versions.
In case you require a soft delete (which is the case for many applications that maintain an IsActive flag), you can set the cascade option to "save-update" in the parent (one-to-many).
3) Bidirectional / unidirectional navigational relationships / associations: If you want to define one-way-only traversal through NHibernate, you may not be quite successful. Say a department has classes (one-to-many). Now if your DB (domain) says that classes can’t stand on their own and must always belong to a department, then you are in a fix. You must provide a bidirectional reference so that whenever you create a new instance of a class, you can bind it to its Department.
E.g.
public virtual void AddUniversityClass(UniversityClass universityClass)
{
    if (_classes == null)
    {
        _classes = new ObservableList();
    }
    universityClass.Department = this; // Bidirectional: a Class has a Department and a Department has Classes
    _classes.Add(universityClass);
}
If you don’t do the above, your insert of Class will fail, as it would have a null foreign key for Department. It sounds obvious, but it looks more like the database structure affecting your object model (navigation), which you normally don’t want. Yet some may argue in favor of it, citing the benefits of explicit constraints. One way out could be making your navigation relationship (many-to-one, for instance) private.
4) For collections, don’t forget inverse="true": Why? I could have elaborated, but it’s already explained here. It tells NHibernate that the responsibility for updating the column value lies with the other side of the association (you just specify inverse="true" on the collection holding the <one-to-many> association).
5) Oracle supports batching through adonet.batch_size: This was confirmed to me by Ayende and Tomer. Batching groups the CRUD statements and sends them to the DB in one shot, giving a decent performance gain. The only thing I don’t like here is the attribute name. After all, if batching works for Oracle too, why call it adonet.batch_size? It was confusing for me, at least. Also, this works only with ODP.NET.
6) Use fetch="join" for one-to-many and many-to-one relationships: This is to avoid the select N + 1 problem. In my early days of programming I used to fall into this trap (limited knowledge of SQL). N.B. The outer-join attribute is deprecated.
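For anyone who has not met the select N + 1 problem, here is a sketch of it in plain SQL terms, using Python's built-in sqlite3 module with made-up department and class tables. Loading each department's classes lazily costs one query for the parents plus one per parent; a join fetches the same data in a single query, which is what fetch="join" asks NHibernate to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (id INTEGER PRIMARY KEY, title TEXT, department_id INTEGER);
    INSERT INTO department VALUES (1, 'Physics'), (2, 'Maths'), (3, 'History');
    INSERT INTO class VALUES (1, 'Optics', 1), (2, 'Algebra', 2), (3, 'Calculus', 2);
""")

query_log = []


def run(sql, params=()):
    """Execute a query and record it, so the number of round trips can be counted."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# N + 1: one query for the departments, then one more per department.
lazy = {}
for dept_id, name in run("SELECT id, name FROM department ORDER BY id"):
    rows = run("SELECT title FROM class WHERE department_id = ? ORDER BY id", (dept_id,))
    lazy[name] = [title for (title,) in rows]
n_plus_one_queries = len(query_log)

# Join: the same data in a single query.
query_log.clear()
joined = {}
for name, title in run(
    "SELECT d.name, c.title FROM department d "
    "LEFT JOIN class c ON c.department_id = d.id ORDER BY d.id, c.id"
):
    joined.setdefault(name, [])
    if title is not None:
        joined[name].append(title)
join_queries = len(query_log)

print(n_plus_one_queries, "queries lazily vs", join_queries, "with a join")
```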
7) Always use Criteria queries unless you need to do a projection: Criteria queries are easier to read, they leverage the mapping configuration (e.g. fetch="join"), and they handle parameters safely (developers tend to fall into concatenating parameters when using HQL). But alas, you can’t use Criteria queries everywhere. For instance, if you need to fetch only a few fields of a given entity (no associations, only selected properties for a lookup display), HQL is the only way out.
E.g.
string hqlQuery = "select new Department(D.ID, D.Name) from Department as D"; // Your class needs to have this constructor along with the default one
IQuery _query = Session.CreateQuery(hqlQuery);
IList _list = _query.List();
var observablelist = new ObservableList();
foreach (Department department in _list) { observablelist.Add(department); }
N.B. Projections are of utmost importance in NHibernate. Try to avoid bringing any unnecessary data to the client, even if you have to resort to untyped objects. You can use a reader-like loop to create your typed object graph and hand that graph back to the client.
8) Many-to-many doesn’t work with extra columns: Many-to-many is normally expressed in the database with the help of a third table. NHibernate doesn’t require a corresponding class for the third (mapping) table, but if that table contains additional columns other than the primary keys of the involved tables, NHibernate won’t be able to support them. The solution we found was to move those additional attributes from the third table to the involved tables.
9) Components are of good value: To be frank, I ignored components completely when I started with NHibernate. In fact, the whole concept of Entity vs. Value objects wasn’t clear to me even after going through DDD books. I mean, I got that Value objects don’t require an identity, but how does that affect my programming? But boy, I was wrong. I could have elaborated on this here, but I would request you to go through Hibernate in Action, section 3.5 (starting at pages 92-93).
10) unsaved-value is important: It specifies the identifier value that marks an instance as newly created (not yet saved). NHibernate uses it to distinguish between new and modified entities.
E.g. Session.SaveOrUpdate("Department", department); // unsaved-value is used here to determine whether to do an insert or an update. My ID mapping reads as below for Oracle:
<id name="ID" column="A_ID" unsaved-value="0">
  <generator class="Infrastructure.GenerateID, Infrastructure" /><!-- GenerateID is a class that implements the IIdentifierGenerator interface; in my case it returns a GUID. There is no identity column in Oracle! -->
</id>
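The decision SaveOrUpdate makes can be pictured with a tiny Python sketch. Everything here is hypothetical (a dict as the "table", a counter as the ID generator); it only mirrors the rule "ID equals the unsaved value means insert, anything else means update" and is not how NHibernate is implemented internally.

```python
import itertools

UNSAVED_VALUE = 0  # plays the role of unsaved-value="0" in the mapping
_next_id = itertools.count(1)  # hypothetical ID generator


def save_or_update(table, entity):
    """Insert when the ID still holds the unsaved value, otherwise update."""
    if entity["id"] == UNSAVED_VALUE:
        entity["id"] = next(_next_id)
        table[entity["id"]] = dict(entity)
        return "insert"
    table[entity["id"]] = dict(entity)
    return "update"


table = {}
department = {"id": 0, "name": "Physics"}
first = save_or_update(table, department)   # ID is still the unsaved value
department["name"] = "Applied Physics"
second = save_or_update(table, department)  # ID was assigned by the first call
print(first, second, table)
```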
11) NHibernate supports database views: You can have a mapping file point to a database view as well. I normally use this for summary displays (complex queries). Ensure you check performance before getting into it (especially when no where clauses are attached). Views typically won’t have a primary key column, but your mapping file still requires one for the ID element. So you can pick a column of your choice that is unique and make it the ID, removing it from the property mappings. You can point the generator class to native (an easy way to get away).
<id name="_year" column="YEAR" unsaved-value="0" access="field">
  <generator class="native" />
</id>
12) Parameterized queries: If you are using NHibernate, ideally you would talk to the underlying DB with Criteria queries, HQL or raw SQL. NHibernate supports safe parameter passing in all of them, as shown below:
a) Criteria Queries: ICriteria criteria = Session.CreateCriteria(typeof(Post), "P").Add(Expression.Eq("P.PostId", postId));
b) HQL or Raw SQL: s.CreateQuery("from Post p where p.Title = :title").SetString("title", "NHibernate").List();
You also need to add a line to the NHibernate configuration, as discussed here: <property name="prepare_sql">true</property>
13) NHProf: Finally, you should use NHProf to improve your understanding of the way NHibernate works.
I would appreciate your observations on the above points.