WS-Fed vs. SAML vs. OAuth vs. OpenID Connect

Identity protocols are more pervasive than ever. Almost every enterprise you come across will have an identity product incubated, tied to a specific identity protocol. While the initial idea behind these protocols was to help enterprise employees use a single set of credentials across applications, new use cases have shown up since then. In this post, I am going to provide a quick overview of the major protocols and the use cases they are trying to solve. Hope you will find it useful.

WS-Fed & SAML are the old boys in the market. Appearing in the early 2000s, they are widespread today; almost every major SSO COTS product supports one of these protocols. WS-Fed (WS-Federation) is a protocol from the WS-* family primarily supported by IBM & Microsoft, while SAML (Security Assertion Markup Language) was adopted by Computer Associates, Ping Identity and others for their SSO products. The premise with both WS-Fed and SAML is similar – decouple the applications (relying party / service provider) from the identity provider. This decoupling allows multiple applications to use a single identity provider through a predefined protocol, without caring about the implementation details of the identity provider per se.

For web applications, this works via a set of browser redirects and message exchanges. The user tries to access the web application, and the application redirects the user to the identity provider. The user authenticates, the identity provider issues a claims token and redirects the user back to the application. The application then validates the token (trust needs to be established out of band between the application and the IdP), authorizes access by asserting claims, and allows the user to reach protected resources. The token is then stored in a session cookie in the user's browser, ensuring the process doesn't have to be repeated for every request.
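
To make the redirect dance concrete, below is a minimal sketch of the relying party / service provider side in Python using Flask. It is illustrative only: the IdP URL is made up, and the two helper functions are hypothetical placeholders for whatever WS-Fed or SAML library you actually use.

```python
from flask import Flask, redirect, request, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"

IDP_SSO_URL = "https://idp.example.com/sso"   # assumed identity provider endpoint

def build_sign_in_url(sso_url):
    # Hypothetical helper: a real WS-Fed / SAML library would build a signed
    # sign-in request here (a WS-Fed wsignin1.0 query or a SAML AuthnRequest).
    return sso_url + "?wa=wsignin1.0&wtrealm=https://app.example.com/"

def parse_and_validate_token(raw_token):
    # Hypothetical helper: a real library would verify the token signature,
    # issuer, audience and validity window before returning the claims.
    raise NotImplementedError("plug in a WS-Fed / SAML library here")

@app.route("/protected")
def protected():
    if "claims" not in session:
        # Not authenticated yet: redirect the browser to the identity provider.
        return redirect(build_sign_in_url(IDP_SSO_URL))
    # Authorize access by asserting claims from the token issued by the IdP.
    if "employee" not in session["claims"].get("roles", []):
        return "Forbidden", 403
    return "Protected resource"

@app.route("/signin-callback", methods=["POST"])
def signin_callback():
    # The IdP posts the claims token back here; validate it against the trust
    # established out of band, then cache the claims in the session cookie.
    session["claims"] = parse_and_validate_token(request.form["token"])
    return redirect("/protected")
```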

At a high level there isn't much separating the flow of these two protocols, but they are different specifications, each with its own lingo. WS-Fed is perceived to be less complex and lightweight (certainly an exception for the WS-* family), while SAML, being more complex, is also perceived to be more secure. In the end you have to look at your ecosystem, including existing investments, partners, in-house expertise, etc., and determine which one will provide higher value. The diagram below, taken from Wikipedia, depicts the SAML flow.

640px-Saml2-browser-sso-redirect-post

OAuth (Open Standard for Authorization) has a different intent (the current version is OAuth 2.0). Its driving force isn't SSO but access delegation (a type of authorization). In simplest terms, it means giving your access to someone you trust, so that they can perform a job on your behalf – e.g. updating your status across Facebook, Twitter, Instagram, etc. with a single click. Your options are either to go to these sites manually, or to delegate your access to an app that can connect to these platforms and update the status on your behalf. The flow is pretty simple: you ask the application to update your status on Facebook, the app redirects you to Facebook, you authenticate yourself to Facebook, Facebook throws up a consent page stating you are about to give this app rights to update your status on your behalf, you agree, the app gets an opaque access token from Facebook, caches that access token, and sends the status update along with the access token to Facebook; Facebook validates the access token (easy in this case, as the token was issued by Facebook itself) and updates your status.
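
The same flow, seen from the client application's side, can be sketched in a few lines of Python with the requests library. Everything here is hypothetical – the endpoint URLs, client credentials, and scope name are placeholders; a real provider such as Facebook publishes its own endpoints and parameters.

```python
import requests

AUTHORIZE_URL = "https://provider.example.com/oauth/authorize"   # hypothetical
TOKEN_URL = "https://provider.example.com/oauth/token"           # hypothetical
CLIENT_ID = "my-status-app"
CLIENT_SECRET = "keep-me-secret"
REDIRECT_URI = "https://my-status-app.example.com/callback"

def authorization_redirect_url():
    # Step 1: the app sends the user's browser here; the provider shows the
    # consent page ("allow this app to update your status on your behalf?").
    return (f"{AUTHORIZE_URL}?response_type=code&client_id={CLIENT_ID}"
            f"&redirect_uri={REDIRECT_URI}&scope=publish_status")

def exchange_code_for_token(code):
    # Step 2: after consent, the provider redirects back with a short-lived
    # code, which the app exchanges for an opaque access token and caches.
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    })
    return resp.json()["access_token"]

def update_status(access_token, message):
    # Step 3: every API call carries the access token; the resource server
    # validates it before updating the status.
    requests.post("https://provider.example.com/api/status",
                  headers={"Authorization": f"Bearer {access_token}"},
                  json={"message": message})
```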

OAuth refers to the parties involved as the Client, Resource Owner (end user), Resource Server, and Authorization Server. Mapping these to our Facebook example: the Client is the application trying to do work on your behalf, the Resource Owner is you (you own the Facebook account), the Resource Server is Facebook (holding your account), and the Authorization Server is also Facebook (in our case Facebook issues the access token with which the client can update the status on your Facebook account). It is perfectly OK for the Resource Server and Authorization Server to be managed by separate entities; it just means more work to establish common ground for protocols and token formats. The screenshot below depicts the OAuth 2.0 protocol flow.

OAuth2

The web community liked the lightweight approach of OAuth. And hence the question came up – can OAuth do authentication as well, providing an alternative to heavyweight protocols like WS-Fed and SAML? Enter OpenID Connect, which is about adding authentication to OAuth. It aims at making the Authorization Server do more – i.e. issue not only an access token, but also an ID token. The ID token is a JWT (JSON Web Token) containing information about the authentication event, like when it occurred, etc., and also about the subject / user (the specification talks of a UserInfo Endpoint to obtain user details). Going back to the Facebook example, the client not only relies on Facebook to provide an opaque access token for status updates, but also receives an ID token, which the client can consume to validate that the user actually authenticated with Facebook. It can also fetch additional user details it needs via Facebook's UserInfo Endpoint. The diagram below, from the OpenID Connect spec, indicates the protocol flow.

OpenIDConnect
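
Since the ID token is just a signed JWT, validating it on the client is straightforward. Below is a minimal sketch using the PyJWT library (my choice for illustration; any JOSE library works). The issuer, client ID, and JWKS URL are hypothetical and would normally come from the provider's discovery document described next.

```python
import jwt
from jwt import PyJWKClient

ISSUER = "https://provider.example.com"        # hypothetical OpenID Provider
CLIENT_ID = "my-status-app"
JWKS_URI = ISSUER + "/jwks"                    # advertised via discovery

def validate_id_token(id_token):
    # Fetch the provider signing key that matches the token's key id (kid).
    signing_key = PyJWKClient(JWKS_URI).get_signing_key_from_jwt(id_token)
    # Verify signature, issuer and audience; the returned claims describe the
    # authentication event (subject, auth time, nonce, expiry, ...).
    return jwt.decode(id_token, signing_key.key, algorithms=["RS256"],
                      audience=CLIENT_ID, issuer=ISSUER)
```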

OP in the above diagram is the OpenID Provider. All OpenID Providers publish their discovery details via a JSON document found by concatenating the provider URL with /.well-known/openid-configuration. This document has all the provider details, including the Authorization, Token and UserInfo Endpoints. Let's see a quick example with a Microsoft offering called Azure Active Directory (Azure AD). Azure AD, being an OpenID Provider, has the OpenID configuration for its tenant demoad2.onmicrosoft.com available at https://login.microsoftonline.com/demoad2.onmicrosoft.com/.well-known/openid-configuration.
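
As a quick illustration, here is how you might read that discovery document in Python with the requests library; the keys shown are standard OpenID Connect metadata fields.

```python
import requests

issuer = "https://login.microsoftonline.com/demoad2.onmicrosoft.com"
config = requests.get(issuer + "/.well-known/openid-configuration").json()

print(config["authorization_endpoint"])
print(config["token_endpoint"])
print(config["userinfo_endpoint"])
print(config["jwks_uri"])
```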

Fairly digestible, isn’t it 🙂 ?

Evolution of software architecture

Software architecture has been an evolutionary discipline, starting with monolithic mainframes and moving to today's microservices. It's easier to understand these software architectures from an evolution standpoint than to try to grasp them independently. This post is about that evolution. Let's start with mainframes.

The mainframe era was one of expensive hardware: a powerful server capable of processing large numbers of instructions, with clients connecting to it via dumb terminals. This evolved as hardware became cheaper and dumb terminals gave way to smart terminals. These smart terminals had reasonable processing power, leading to client-server models. There are many variations of the client-server model around how much processing the client should do versus the server. For instance, the client could do all the processing, with the server just acting as a centralized data repository. The primary challenge with that approach was maintenance and pushing client-side updates to all the users. This led to browser-based clients, where the UI is essentially rendered by the server in response to an HTTP request from the browser.

Mainframe Client Server

As the server started taking on multiple responsibilities in this new world, like serving UI, processing transactions, and storing data, architects broke down the complexity by grouping these responsibilities into logical layers: UI Layer, Business Layer, Data Layer, etc. Specific products emerged to support these layers, like web servers, database servers, etc. Depending on the complexity, these layers were physically separated into tiers. The word tier indicates a physical separation, where the web server, database server, and business processing components run on their own machines.

3 Tier Architecture

With layers and tiers around, the next big question was how to structure them and what the ideal dependencies across these layers are, so that we can manage change better. Many architecture styles showed up as recommended practice, most notably Hexagonal (ports and adapters) architecture and Onion architecture. These styles aimed to support development approaches like Domain Driven Design (DDD), Test Driven Development (TDD), and Behavior Driven Development (BDD). The theme behind these styles and approaches is to isolate the business logic, the core of your system, from everything else. Not having your business logic depend on the UI, database, web services, etc. allows more participation from business teams, simplifies change management, minimizes dependencies, and makes the software easily testable.
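
Here is a tiny, made-up illustration of the ports-and-adapters idea in Python: the core business logic depends only on an abstract port, and the persistence technology plugs in as an adapter without the core knowing about it.

```python
from abc import ABC, abstractmethod

class AppointmentRepository(ABC):                 # port, owned by the core
    @abstractmethod
    def save(self, appointment: dict) -> None: ...

class BookAppointment:                            # core business logic
    def __init__(self, repository: AppointmentRepository):
        self.repository = repository

    def execute(self, patient: str, slot: str) -> dict:
        appointment = {"patient": patient, "slot": slot, "status": "booked"}
        self.repository.save(appointment)
        return appointment

class InMemoryAppointmentRepository(AppointmentRepository):   # adapter
    def __init__(self):
        self.items = []

    def save(self, appointment: dict) -> None:
        self.items.append(appointment)

# Tests use the in-memory adapter; production swaps in a SQL or NoSQL adapter
# implementing the same port, leaving the business logic untouched.
repo = InMemoryAppointmentRepository()
print(BookAppointment(repo).execute("Jane", "2024-01-01 09:00"))
```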

HexagonalOnion

The next challenge was scale. As compute became cheaper, technology became a way of life, causing disruption and challenging the status quo of established players across industries. The problems are different: we are no longer talking of apps that are internal to an organization, or mainframes where users are OK with longer wait times. We are talking of a global user base expecting sub-second response times. The simpler approaches to scale were better hardware (scale up) or more hardware (scale out). Better hardware is simple but expensive; more hardware is affordable but complex. More hardware means your app runs on multiple machines. That's great for stateless compute, but not so for storage, where the data is now distributed across machines. This distribution leads us to the famous CAP (Consistency, Availability and Partition tolerance) theorem. While there are many articles on CAP, it essentially boils down to this: network partitions are unavoidable and we have to accept them, which requires us to choose between availability and consistency. You can choose to be available and return stale data, becoming eventually consistent, or you can choose to be strongly consistent and give up on availability (i.e. return an error for missing data, e.g. when you read from a different node than the one where you wrote your data). Traditional database servers are consistent and available (CA), with no tolerance for partitions (a single active DB server catering to all requests). Then there are NoSQL databases with master-slave replication, configurable to support either strong consistency or eventual consistency.

CAP Theorem

Apart from scale challenges, today's technology systems often have to deal with contention, e.g. selecting an airline seat, or a high-discount product that everyone wants to buy on a Black Friday. As a multitude of users try to get access to the same piece of data, it leads to contention. Scaling can't solve this contention; it can only make it worse (imagine having multiple records of the same product inventory within your system). This led to specific architecture styles like CQRS (Command Query Responsibility Segregation) & Event Sourcing. CQRS, in simple terms, is about separating writes (commands) from reads (queries). With writes and reads having separate stores and models, both can be optimally designed. Write stores in such scenarios typically use Event Sourcing to capture entity state changes for each transaction. Those events are then played back to the read store, making writes and reads eventually consistent. Being eventually consistent has implications and needs to be worked through with the business to keep the customer experience intact. E.g. Banana Republic recently allowed me to order an item. They took my money and later, during fulfillment, realized they were out of stock (that is when things became eventually consistent). They refunded my money, sent me a sorry email, and gave me a 10% discount on my next purchase to keep me as a valued customer. As you can see, CQRS and Event Sourcing come with their own set of tradeoffs. They should be used wisely for specific scenarios rather than as an overarching style.
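
To make the idea concrete, below is a toy sketch of CQRS with event sourcing. Commands append events to a write store, and the events are projected into a separate read model; in a real system the projection runs asynchronously and the stores are persistent, so all the names and structures here are illustrative only.

```python
event_store = []     # write side: append-only list of events
read_model = {}      # read side: current quantity per product

def handle_reserve_item(product_id, qty):
    # Command: validate against the read model, then record the fact as an event.
    if read_model.get(product_id, 0) < qty:
        raise ValueError("out of stock")       # this is where contention surfaces
    event_store.append({"type": "ItemReserved", "product": product_id, "qty": qty})

def project(event):
    # Projection: played back (often asynchronously) to keep the read model
    # eventually consistent with the write side.
    if event["type"] == "StockAdded":
        read_model[event["product"]] = read_model.get(event["product"], 0) + event["qty"]
    elif event["type"] == "ItemReserved":
        read_model[event["product"]] -= event["qty"]

event_store.append({"type": "StockAdded", "product": "sku-1", "qty": 5})
for e in event_store:
    project(e)

handle_reserve_item("sku-1", 2)
project(event_store[-1])
print(read_model)    # {'sku-1': 3}
```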

CQRS

Armed with the above knowledge, you are now probably thinking: can we use these different architecture styles within a single system? For instance, have parts of your system use 2-tier, others use CQRS, and others use hexagonal architecture. While this might sound counterproductive, it actually isn't. I remember building a system for healthcare providers where every use case was so different – appointments, patient registration, health monitoring, etc. – that using a single architecture style across the system was definitely not helping us. Enter microservices. The microservices architecture style recommends breaking your system into a set of services. Each service can then be architected independently, scaled independently, and deployed independently. In a way, you are now dealing with vertical slices of your layers. Having these slices evolve independently allows you to adopt a style that's more befitting the slice in context. You might ask: while this makes sense for the architecture, won't you just have more infrastructure to provision, more units to deploy, and more stuff to manage? You are right, and what really makes microservices feasible is the agility ecosystem comprising cloud, DevOps, and continuous delivery, which brings automation and sophistication to your development processes.

Microservices

So does this evolution make sense? Are there any gaps in here which I could fill? As always, I will look forward to your comments.

Image Credits: Hexagonal Architecture, Microservices Architecture, 3-Tier architecture, client server architecture, CQRS architecture, Onion architecture, CAP Theorem

Windows Azure Portals and Access Levels

When you sign up for Windows Azure, you get a subscription and you are made the Service Administrator of that subscription.

Image

While this creates a simple access model, things do get a little complicated in an enterprise where users need various levels of access. This blog post will help you understand these access levels.

Enterprise Administrator
Enterprise Administrator has the ability to add or associate Accounts to the Enrollment and can view usage data across all Accounts. There is no limit to the number of Enterprise Administrators on an Enrollment.
Typical Audience: CIO, CTO, IT Director
URL to GO: https://ea.windowsazure.com

Account Owner
The Account Owner can add Subscriptions for their Account, update the Service Administrator and Co-Administrator for an individual Subscription, and view usage data for their Account. By default, all subscriptions are named 'Enterprise' on creation; you can edit the name post creation in the account portal. Under EA usage, only Account Owners can sign up for Preview features. The recommendation is to create accounts along functional, business, or geographic divisions, though creating a hierarchy of accounts can help larger organizations.
Typical Audience: Business Heads, IT Divisional Heads
URL to GO: https://account.windowsazure.com

Service Administrator
The Service Administrator and up to nine Co-Administrators per Subscription have the ability to access and manage Subscriptions and development projects within the Azure Management Portal. The Service Administrator does not have access to the Enterprise Portal unless they also hold one of the other two roles. It's recommended to create separate subscriptions for Development and Production, with Production having strictly restricted access.
Typical Audience: Project Manager, IT Operations
URL to GO: https://manage.windowsazure.com

Co-Administrators
Subscription co-administrators can perform all tasks that the service administrator for the subscription can perform. A co-administrator cannot remove the service administrator from a subscription. The service administrator and co-administrators for a subscription can add or remove co-administrators from the subscription.
Typical Audience: Test Manager, Technical Architect, Build Manager
URL to GO: https://manage.windowsazure.com

That’s it! With the above know-how you can create an EA setup like the one below.

Image

Hope this helps 🙂

Windows Azure vs. Force.com vs. Cloud Foundry

Below is a brief write-up of some personal views. Let me know your thoughts.

Windows Azure is the premier cloud offering from Microsoft. It has a comprehensive set of platform services ranging from IaaS to PaaS to SaaS. This is a great value proposition for many enterprises looking to migrate to the cloud in a phased manner: first move as-is with IaaS and then evolve to PaaS. In addition, Azure has deep integration across Microsoft products, including SharePoint, SQL Server, Dynamics CRM, TFS, etc. This translates to an aligned cloud roadmap, committed product support, and license portability. Though .NET is the primary development environment for the Azure platform, most Azure services are exposed as REST APIs, and there are Java, Ruby, and other SDKs available, which allows a variety of developers to easily leverage the platform. Azure also allows customers to spawn Linux VMs, though that's limited to the IaaS offering.

Force.com allows enterprises to extend Salesforce.com, the CRM from Salesforce. Instead of just providing SDKs and APIs, Salesforce has created Force.com as a PaaS platform, so that you focus only on building extensions; the rest is managed by Salesforce. Salesforce also provides a marketplace, 'AppExchange', where companies can sell these extensions to potential customers. Though Force.com offers an accelerated development platform (abstracting many programming aspects), programmers still need to learn the Apex programming language and related constructs. Some enterprises are considering Force.com as their de-facto programming platform, taking it beyond the world of CRM. It's important to understand that the applicability of Force.com for such scenarios is typically limited to transactional business applications. So, where should enterprises go when they need to develop custom applications with different programming stacks and custom frameworks? Salesforce's answer is Heroku. Heroku supports all the major programming platforms, including Ruby, Node.js, Java, etc., with the exception of .NET. Heroku uses Debian and Ubuntu as the base operating system.

Many enterprises today are hesitant about moving to a PaaS cloud, citing vendor lock-in. For instance, if they move to the Azure PaaS platform, their applications will run only on Azure, and they would have to remediate them to port to AWS. It would definitely be great to have a PaaS platform agnostic of a vendor. This is the idea behind the open source PaaS platform Cloud Foundry, an effort co-funded by VMware and EMC. VMware offers a hosted Cloud Foundry solution, with the underlying infrastructure being vCloud. Cloud Foundry supports various programming languages like Java, Ruby, Node.js, etc. and services like MySQL, MongoDB, and RabbitMQ, among others. VMware also offers vFabric, a PaaS platform focused on the Java Spring framework. vFabric is a product integrated with VMware infrastructure, providing a suite of offerings around runtime, data management, and operations. I feel the future of vFabric is likely to depend on industry adoption of Cloud Foundry (there is also another open source PaaS effort being carried out by Red Hat, called OpenShift).

Overview of VMware Cloud Platform

Continuing my discussion on major cloud platforms, in this post I will talk about VMware (a subsidiary of EMC), one of the companies that pioneered the era of virtualization. VMware's flagship product is ESX (vSphere being the product that bundles ESX with vCenter), a hypervisor that runs directly on the hardware (bare metal). As you would expect, VMware is a major player in the private cloud and data center space. It also has a public IaaS (Infrastructure as a Service) cloud offering and supports an open source PaaS platform (understandably, no SaaS offerings). Below is a quick overview of VMware's offerings.

Private Cloud – vCloud Suite is an end-to-end solution from VMware for creating and managing your own private cloud. The solution has two major components: Cloud Infrastructure and Cloud Management. Cloud Infrastructure components include VMware products like vSphere (the cloud OS controlling the underlying infrastructure) and vCloud Director (a multitenant self-service portal for provisioning VM instances based on vApp templates), while Cloud Management consists of operational products like vCenter (a centralized, extensible platform for managing infrastructure) among others. There are also vCloud SDKs available which you can use to customize the platform to specific business requirements. Also, with last year's acquisition of DynamicOps (now called vCloud Automation Center), VMware is extending its support to other hypervisors in the market. Other vendors like Microsoft are evolving similar offerings with Hyper-V, System Center, SPF, and Windows Azure Services. It's important to note, though, that quite a few enterprises operate a private-cloud-like setup using vSphere alone and build custom periphery around it as necessary.

Public Cloud – In case you don't have the budget to set up your own datacenter, or you are looking to build a hybrid approach that lets you cloud-burst for specific use cases, you can leverage VMware's vCloud Hybrid Service (aka vCHS). The benefit here is that migration and operations remain seamless, as you use the same tools (and seamlessly extend the same processes) that you were using for your in-house private cloud.

PaaS Cloud – VMware has a PaaS offering for private clouds called vFabric. The vFabric application platform contains various products focused on the Java Spring Framework stack. Architects can create a deployment topology for their multi-tier applications using drag and drop. Not only can they automate provisioning, they can also scale their applications in accordance with business demand. In addition, VMware is also funding an open source PaaS platform called Cloud Foundry (CF). The value proposition here is that you can move this platform to any IaaS vendor (vCloud, OpenStack, etc.), so when you switch between cloud vendors you don't have to modify your applications. This is contrary to other PaaS offerings, which are tied to the underlying infrastructure – e.g. an application ready for Azure PaaS would have to undergo remediation to be hosted on Google's PaaS. Also, being open source, you can customize the CF platform to suit your needs (there is a similar effort being carried out by Red Hat called OpenShift).

Finally, you might hear the term vBlock (or vBlock Systems) in the context of VMware. VCE (Virtual Computing Environment), the company that manufactures vBlock Systems, was formed by a collaboration of Cisco, EMC, and VMware. These vBlock racks contain Cisco servers & switches, EMC storage, and VMware virtualization. Quite a few service providers use vBlock to create their own set of cloud offerings and services.

Hope this helps!

Overview of Google Cloud Platform

In the next few posts, I will try to give a brief overview of the major cloud computing platforms. As I started writing this post, it reminded me of an incident. A few years back I was chatting with a Microsoft architect. He proudly told me that if Google were to shut down tomorrow, none of the enterprises would care. Well, since then things have changed. From a provider of a search engine, email, and a mobile platform (Android), Google has made its way into enterprises. To add another experience, recently I was visiting a Fortune customer and saw one of the account managers using Gmail. While my first reaction was that he shouldn't be checking his personal email at work (we were discussing something important), he was in fact replying to an official email. I learned from him that they were among the early adopters of Google Apps. With those anecdotes out of the way, below is a quick overview of the Google cloud platform.

Google Apps – You can think of Google Apps as a SaaS offering along the lines of Microsoft Office 365. It includes Gmail, Google Calendar, Docs, Sites, Videos, etc. The value proposition is that you can customize these services under your own domain name (i.e. white label). Google charges a per-user monthly fee for these services (this fee applies to Google Apps for Business; Google also offers a free version for educational institutions under the brand Google Apps for Education). In addition, Google has created a marketplace (Google Apps Marketplace), where organizations can buy third-party software (a partner ecosystem) that further extends Google Apps. As you would expect, Google also provides infrastructure and APIs for third-party software developers.

Google Compute Engine – GCE is the IaaS offering of Google. Interestingly, it offers sub-hour billing calculated at the minute level, with a minimum of 10 minutes. For now only Linux images / VMs are supported. Here's a Hello World to get started with GCE. Note that you need to set up your billing profile to get started with GCE.

Google App Engine – GAE is an ideal platform to create applications for the Google Apps Marketplace. It is a PaaS offering from Google, easy to scale as your traffic and data grow. Like Microsoft's Windows Azure Web Sites, you can serve your app from a custom domain or use a free name on the appspot.com domain. You can write your applications using Java, Python, PHP, or Go. You can download the respective SDKs from here, along with a plugin for Eclipse (the SDKs come with an emulator to simplify the development experience). With App Engine you are allowed to register up to 10 applications per account, and applications can use up to 1 GB of storage and enough CPU and bandwidth to support an application serving around 5 million page views a month at no cost. Developers can also use NoSQL (App Engine Datastore) and relational (Google Cloud SQL) stores for their application data. Google Cloud Storage, a similar offering to Windows Azure Blob Storage, allows you to store files and objects up to terabytes in size. App Engine also provides additional services such as URL Fetch, Mail, Memcache, Image Manipulation, etc. to help perform common application tasks.
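
For flavor, a complete App Engine handler was only a few lines on the Python runtime of that era; the sketch below uses the webapp2 framework bundled with the Python 2.7 SDK (newer runtimes let you use any WSGI framework instead).

```python
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Respond to GET / with plain text.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, App Engine!')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
```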

Google BigQuery – BigQuery is an analytics tool for querying massive datasets. All you need to do is move your dataset to Google's infrastructure. After that, you can query the data using SQL-like queries. These queries can be executed from a browser, from the command line, or even from your application by making calls to the BigQuery REST API (client libraries are available for Java, PHP, and Python).
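
As a quick taste, here is a sketch using the current google-cloud-bigquery Python client and one of Google's public sample datasets (the client library and project name are my assumptions; the same query can be issued through the REST API directly).

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")   # hypothetical project

query = """
    SELECT word, SUM(word_count) AS occurrences
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY occurrences DESC
    LIMIT 5
"""

# Runs the SQL-like query server-side and streams back only the result rows.
for row in client.query(query).result():
    print(row["word"], row["occurrences"])
```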

So, in a nutshell, these are the major offerings of the Google cloud platform, encompassing SaaS, PaaS, and IaaS. Google Apps appears to be the most widely used of all the offerings, with Google claiming more than 5 million businesses running on it.

Hope you found this overview useful.

RTO vs. RPO

When talking of IT service continuity planning, the IT aspect of business continuity planning, the terms RTO and RPO have become commonplace. While both terms can have different meanings depending on the context, for IT they largely represent the acceptable downtime and data loss when recovering IT operations to normal. Below is a brief overview.

RTO – Recovery Time Objective is the permissible system downtime after a breakdown event. If downtime exceeds this limit, it's bound to impact the business (most likely financially). RPO – Recovery Point Objective is the permissible window of data loss during a failover. Though RPO is an involved term, as a simple example consider an RPO limit of 2 hours set by company X: during a disaster event, when the secondary site is activated, the data loss (sync window) between primary and secondary shouldn't be more than 2 hours.

Normally, there isn't one RTO and RPO for a given organization; rather, they differ per service / system in context. Systems with aggressive RTO / RPO are costlier to run compared to ones with relaxed guidelines. Most enterprises mandate SLAs around RTO / RPO from their service providers. Also, if your primary focus is just databases, you can pick one of these approaches. Please leave your comments below with additional thoughts on this topic.