Niraj Bhatt – Architect's Blog

Ruminations on .NET, Architecture & Design

Category Archives: Analytics

Overview of Google Cloud Platform

In next few posts, I will try to give a brief overview of major Cloud Computing platforms. As I started writing this post, it reminded me of an incident. Few years back I was chatting with a Microsoft Architect. He proudly told me that if Google were to shut tomorrow, none of the enterprises would care about it. Well, since then things have changed. From a provider of search engine, email and mobile platform (Android), Google has made it ways into enterprises. To add another experience, recently I was visiting a fortune customer and saw one of the account managers using Gmail. While my first reaction was he shouldn’t be checking his personal emails at work (we were discussing something important), he, in fact, was replying to an official email. I learned from him that they were among the early adopters of Google Apps. With those interesting anecdotes, below is quick overview of Google cloud platform.

Google Apps – You can think of Google Apps as a SaaS offering more on the lines of Microsoft Office 365. It includes Gmail, Google Calendar, Docs, Sites, Videos, etc. Value proposition is – you can customize these services under a domain name (i.e. white label). Google charges per user monthly fee for these services (this fee is applicable to Google Apps for Business; Google also offers a free version for educational institutions under brand Google Apps for Education). In addition, Google has created a market place (Google Apps Marketplace), where organizations can buy third party software (partner ecosystem) which further extends Google Apps. As you would expect, Google also provides infrastructure and APIs for third party software developers.

Google Compute Engine – GCE is the IaaS offering of Google. Interestingly, it offers sub hour billing calculated at minute level with minimum of 10 minutes. For now only Linux images / VMs are supported. Here’s a Hello World to get started with GCE. Note that you need to setup your billing profile to get started with GCE.

Google App Engine – GAE is an ideal platform to create applications for Google Apps Marketplace. A PaaS offering from Google – easy to scale as your traffic and data grows. Like Microsoft’s Windows Azure Web Sites, you can serve your app from a custom domain or use a free name on appspot.com domain. You can write your applications using JAVA, Python, PHP or Go. You can download respective SDKs from here along with a plugin for Eclipse (SDKs come with an emulator to simplify development experience). With App Engine you are allowed to register up to 10 applications per account – and all applications can use up to 1 GB of storage and enough CPU and bandwidth to support an application serving around 5 million page views a month at no cost. Developers can also use NoSQL (App Engine Datastore) and relational (Google Cloud SQL) stores for storing their applications data. Google Cloud Storage a similar offering to Windows Azure Blob Storage, allows you to store files and objects up to terabytes in size. App Engine also provides additional services such as URL Fetch, Mail, Memcache, Image Manipulation, etc. to help perform common application tasks.

Google BigQuery – BigQuery is an analytic tool for querying massive datasets. All you need to do is move your dataset to Google’s infrastructure. After that, you can query data using SQL-like queries. These queries can be executed using a browser or command line or even from your application by making calls to BigQuery REST API (client libraries are available for Java, PHP and Python).

So, in a nutshell these are the major offerings of Google Cloud platform encompassing SaaS, PaaS and IaaS. Google Apps appears to be the most widely used of all offerings, with Google claiming more than 5 million businesses running on it.

Hope you found this overview useful.

Pivot, PivotTable and PowerPivot

OK, I am not a data guy. Or let me say I am not an Excel (MS-EXCEL) guy. I have always believed that if you are good at PowerPoint (along with Visio) you probably are an architect and if you are good at Excel you are most likely to be a manager 🙂 . But not knowing Excel well, has always given me a feeling of – “something is missing”. After all, it’s the same data, but the way few people present it, they just make it look so good. While it takes time to develop those skills, understanding Pivot and related features is usually a good step in that direction.

Excel primarily is a spreadsheet tool in which you can dump some data. At times though, data is not everything. For e.g. your retail transactional system captures the sales across product lines, but if a manager needs to predict his stocks for next fiscal based on historic sales, he got to make some sense out of that data. That’s where you got to Pivot the data. The word ‘Pivot’ as Wiki generally describes it – is center point of any rotational system. You have seen them in action while helping your kids on a seesaw. Pivot table on other hand is a data summarization tool found in spreadsheet softwares like Excel. If you have to bridge this tool and the word ‘Pivot’ – you can think you are bending your data to a specific viewpoint using functions like sort, total, average, count, etc. Let’s see it with help of a trivial example. Consider below the sample data in Excel. To bend it to monthly stock view you can click on Insert tab and insert a Pivot Table. This would give you an empty pivot table and a PivotTable field list. Drag and drop your fields to specific areas to bend your data to a viewpoint (N.B. you can have multiple fields for each area and control them via expand / collapse buttons and all your charting knowledge works as is with Pivot tables).

PowerPivot as the name suggests elevates the Pivot table feature to next level making it full-fledged self-service BI tool. An Excel Add-In, PowerPivot connects to almost any external data sources including SQL Server Analysis Services. Once fetched, data stays inside the workbook and using excellent compression techniques the file size on the disk is still relatively small. You can share your PowerPivot workbooks via SharePoint and configure refresh cycles for your workbook to ensure your team always makes their decisions based on recent data. The experience of pivoting though remains same as shown earlier.

Hope that helps 🙂 .