Wednesday, December 26, 2012

The changing face of software development

Change is inevitable, and nowhere is it more apparent than in the software technology space. We all adapt to these changes, which are the norm in software development. Here is a little retrospective look at the ways in which software development, as we know it, is changing.

Developers are writing less code
Increasingly, developers are being persuaded to write less code and instead use relevant, good quality, well tested frameworks and tools. The availability of several good quality open source and commercial frameworks has contributed to this trend.

Frameworks like Spring and Apache Commons are not only time tested, but quality metrics about them are also readily available in the public domain. Gone are the days when developers had to contend with heavyweight black-box frameworks of suspect and opaque quality, where they were at the mercy of the vendor's technical support.

Modern popular frameworks are also remarkably non-intrusive towards the applications we develop with them. No longer are we stuck coupling our applications to frameworks when we don't want to. This means that alternative frameworks can be used as drop-in replacements, and it becomes easier to segregate our application code and business logic, isolating it from changing technology.
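As a purely illustrative sketch (the TaxCalculator names below are hypothetical, not from any particular framework), keeping business logic in plain, framework-free classes behind a small interface is what makes this kind of drop-in replacement possible: the same POJO can be wired up by Spring, Guice, or a hand-rolled factory without changing a line of it.

```java
// Hypothetical example: business logic as a framework-free POJO behind an interface.
interface TaxCalculator {
    double taxFor(double amount);
}

// Pure business logic: no framework imports, trivially unit-testable,
// and usable unchanged under any dependency-injection container.
class FlatRateTaxCalculator implements TaxCalculator {
    private final double rate;

    FlatRateTaxCalculator(double rate) {
        this.rate = rate;
    }

    @Override
    public double taxFor(double amount) {
        return amount * rate;
    }
}

public class Demo {
    public static void main(String[] args) {
        // Any framework (or none) can supply this instance.
        TaxCalculator calc = new FlatRateTaxCalculator(0.25);
        System.out.println(calc.taxFor(200.0)); // prints 50.0
    }
}
```

Swapping containers then means changing only the wiring, never the business class itself.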

Right tool for the right job
Ever tried to take out a screw with a hammer? Painful, time consuming and not pretty, about sums it up. The same is true for technology: choosing the right technology and framework to suit your requirement was always important, but it has now become a critical success factor.

Knowing the features, flexibility, non-functional characteristics and learning curves of the plethora of frameworks available is one thing; having the good judgement to choose the right framework to suit the requirements is quite another. That judgement is a very much in-demand skill, and is now an implicit expectation from all good developers and teams.

In the application development space, the ability to write tonnes of code is no longer as lucrative as the ability to spot the right tools and frameworks to use during development.

Read-Understand-Apply cycle is shortening
While the Read-Understand-Apply cycle was always shorter and sharper for those in software development than in other professions, with rapidly changing software technology this cycle is getting even shorter.

Gone are the days when reasonable time frames for end-to-end proofs of concept were measured in months. With cloud infrastructure and platform services, instant provisioning and ready-made software stacks, pilots and prototypes are more about integration, and reasonable time frames for implementations have shrunk drastically.

Skills such as the ability to quickly understand new technologies and frameworks, and to integrate them in meaningful ways to realize business requirements, are very much in demand. Unfortunately, as with coding, there are no standard measures of how well a developer can understand, adopt and apply tools and technologies to build applications.

Leverage the internet wide knowledge bases
Forget social media based surveys; the most prolific example of crowd sourcing is the way developers ask for and get answers to all their technical queries on the internet, using community Q&A websites.

Developers are the ones who can best help other developers, and the internet is turning out to be an incredible platform for sharing, storing and looking up technical answers. Wikis, good old bulletin boards, discussion forums, email archives, you name it. Troubleshooting using the huge knowledge base called the internet is really changing the way we develop software.

Every problem encountered has a solution just a web search away. Before we get carried away: there are limitations and drawbacks to excessive use of such practices. But on the whole, the ability to use the internet knowledge base to our advantage when developing software is unarguably a very good skill to have.

I am often tempted to give candidates in developer interviews a topic they know little about, and ask them to produce meaningful information or a high level solution based on information available on the internet, just as an exercise to test their potential to use the vast internet knowledge base.

Listen to the end users
A very positive fallout of the widespread adoption of Agile development practices is that developers no longer rely on the requirements document alone. They have their sprint deliverables validated by product owners periodically. This leads to more frequent and involved interaction between end users and developers. Developers are now required to be more aware of business terms and vocabulary, business processes and business priorities. The communication skills required for these added responsibilities also need to be taken into account.

While we may be aware of most of these changes, they somehow never make it into the way we recruit new developers into our teams. Here is hoping we start using some of the points highlighted above when we are choosing teams for application development.

Please feel free to add to the list of points mentioned above; I would love to hear newer ideas and perspectives.

Yours truly,

Tuesday, October 23, 2012

Summary of use cases for using Hadoop in the enterprise

  • Need to analyze / summarize / query / store unstructured or semi-structured data. Examples:
    • logs
    • sensor data
    • emails
    • blogs
    • web content
    • DOCs / PDFs
    • images
    • videos
  • Ability to support multiple data sources that are producing very disparate and unstructured data
  • Rate at which data is generated is very high, continuous and unpredictable (say 1 TB per day or per cycle)
  • Data to be analyzed is massively distributed, e.g. logs
    • Not possible to intercept data being generated at single / known source
  • Using traditional ETL batch processes to summarize data is too time consuming or impractical or expensive
    • Moving all the big data to one storage area network (SAN) or ETL server becomes infeasible with big data volumes. 
    • Even if you can move the data, processing it is slow, limited to SAN bandwidth, and often fails to meet batch processing windows.
  • There is a need to run analytics on raw data
    • Queries that will be run on raw data are not determinate, and hence the criteria / parameters for summarizing the data are not known upfront
  • Huge amount of data needs to be retained on cheap commodity hardware
    • Using the expensive storage required by an RDBMS is not feasible
  • To be continued
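The summarize / query use cases above all share the same computational shape. As a purely illustrative sketch (this is plain in-memory Java, not the Hadoop API, and the "date url status" log layout is an assumption), here is the map/reduce pattern that Hadoop distributes across commodity nodes: a map step that extracts a key from each raw log line, and a reduce step that aggregates counts per key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: the map/reduce shape of a log summarization job,
// run in-memory. On a cluster, Hadoop runs this same pattern with the
// map and reduce steps distributed over many machines.
public class LogSummary {

    // "map" step: extract a key (here, the status code field) from a raw line,
    // assuming a hypothetical "date url status" layout
    static String mapKey(String logLine) {
        return logLine.split(" ")[2];
    }

    // "reduce" step: aggregate a count per key
    static Map<String, Integer> summarize(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(mapKey(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "2012-10-23 /home 200",
                "2012-10-23 /login 500",
                "2012-10-24 /home 200");
        System.out.println(summarize(logs)); // prints {200=2, 500=1}
    }
}
```

The point of Hadoop is that when `lines` is terabytes spread over many disks, this shape still works, because the map step runs where the data already lives.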

Friday, October 12, 2012

Demystifying AOP, Getting started with LTW (load time weaving)

Often, using AspectJ AOP, especially with LTW (load time weaving), is shrouded in mystery.
I thought I would write up a little note about getting started with AspectJ AOP LTW.

Here goes... all you need is a simple Eclipse Java project, as follows.

We would need:
  1. an aspect source file - MySimpleLoggerAspect.java
  2. a sample service which will get AOP-ed - SampleService.java
  3. a test class with a main method - Tester.java
  4. an AspectJ LTW related config file - META-INF\aop.xml
  5. aspectjrt-1.7.0.jar and aspectjweaver-1.7.0.jar on your project classpath

The logging aspect is a simple AspectJ aspect; for further details about writing aspects, please refer to the AspectJ documentation. The listing of the simple aspect in Java is below.

package com.ghag.rnd.aspects.ltw;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;

@Aspect
public class MySimpleLoggerAspect {

    // match the execution of any method
    @Pointcut("execution(* *(..))")
    public void myTraceCall() {
    }

    @Around("myTraceCall()")
    public Object myTrace(ProceedingJoinPoint joinPoint) throws Throwable {
        System.out.println("myTrace:before call "
                + joinPoint.getSignature().getDeclaringTypeName()
                + "." + joinPoint.getSignature().getName());
        Object retVal = joinPoint.proceed();
        System.out.println("myTrace:after call "
                + joinPoint.getSignature().getDeclaringTypeName()
                + "." + joinPoint.getSignature().getName() + " retval=" + retVal);
        return retVal;
    }
}

The sample service which will get AOP-ed is as easy as given below:
package com.ghag.rnd.aspects.sample;

public class SampleService {
    public String doService(String in) {
        System.out.println("inside doService");
        return in;
    }
}

The test class listing is just a few more lines of code:
package com.ghag.test;

import com.ghag.rnd.aspects.sample.SampleService;

public class Tester {
    public static void main(String[] args) {
        new SampleService().doService("Ganesh Ghag");
    }
}

And finally, the META-INF\aop.xml listing is simple and self explanatory, especially the package names ;-)

<aspectj>
    <aspects>
        <aspect name="com.ghag.rnd.aspects.ltw.MySimpleLoggerAspect" />
    </aspects>
    <!-- for weaving diagnostics, use: <weaver options="-verbose -debug -showWeaveInfo"> -->
    <weaver>
        <include within="com.ghag.rnd.aspects.sample.*" />
        <include within="com.ghag.rnd.aspects.ltw.*" />
    </weaver>
</aspectj>

Now when you run, just ensure you have the following parameter supplied as a JVM argument:
-javaagent:/your/dev/env/local/path/to/aspectjweaver-1.7.0.jar

That's it, folks! When you run, the SampleService call will get AOP-ed and give the following output:
myTrace:before call com.ghag.rnd.aspects.sample.SampleService.doService
inside doService
myTrace:after call com.ghag.rnd.aspects.sample.SampleService.doService retval=Ganesh Ghag

Getting started with AspectJ AOP with LTW is that easy, folks!

Monday, September 3, 2012

Relevance of Hadoop in Enterprise Application Development

RDBMS compared to Hadoop/MapReduce (taken from the book 'Hadoop: The Definitive Guide'):

data size - gigabytes                 v/s  petabytes
access    - interactive and batch     v/s  batch
updates   - read and write many times v/s  write once, read many times
structure - static schema             v/s  dynamic schema
integrity - high                      v/s  low
scaling   - nonlinear                 v/s  linear

Enterprise apps are generally characterized by data volumes in gigabytes rather than petabytes. They are required to be interactive: a user makes a request, and reports should instantly reflect it. Apps often work on transactional data (insert/update/delete), rather than write once, read many. Enterprise apps also need high data integrity.

In an enterprise app development scenario, most applications are in control of deciding whether the data they store will be structured or unstructured. Why would they choose to store data in unstructured form? If all applications within the enterprise store data in structured form, then when one enterprise app needs data from another, it can just access the structured data in that app through formal interfaces like direct DB access, SOAP, REST or a variety of other EAI options, with support for transactions and other sharing semantics.

Enterprise apps may contain data in the form of uploaded documents, etc., which can represent unstructured data. Using tools like Solr and Lucene, this unstructured data can be indexed, stored off and searched relatively easily.

Within an enterprise it would not make much sense to gather statistics from logs and the like, unless it is for purposes like finding feature usage through click-model analysis. But there too, batch processes can analyse logs periodically, and summary tables can very well store the results, instead of raw logs being retained for years and years and Hadoop based computations then being run on the raw data.

Most Hadoop use cases seem to exhibit the following characteristics:

  • the data under analysis is unstructured and not under the control of design, so traditional RDBMS / data warehousing / ETL alternatives to Hadoop cannot be employed
  • there is a need to retain the raw data as is, so a cheap, scalable, commodity-hardware based distributed file system is needed anyway
  • the data is of the order of terabytes or more, rather than gigabytes
Hadoop seems to be well suited for uses such as an internet search engine, wherein:
  • not only is there massive data of the order of terabytes and petabytes, but more importantly,
  • this data is produced at a very fast rate, so the conventional design of a batch process summarizing the data periodically is not feasible from an economic point of view
  • the massive rate at which data is generated is not under control or determinate
  • the sources which generate the data are massive in number and highly distributed
  • also, the data sources are unknown and cannot be intercepted; e.g. it would not be feasible to ask Twitter for an event every time there is a tweet from someone in the world

Most enterprise applications seem to exhibit the following characteristics:

  • An enterprise application produces and consumes data in various formats
  • The data produced by the application can be controlled and forced into a structured format to some extent; even documents uploaded by users can be indexed and made searchable
  • The rate at which data is produced is not internet scale, nor is it uncontrolled or indeterminate
  • Data sources are well known and can often be intercepted, e.g. you can ask another enterprise app to send out an async message, poll its database, or use some other notification mechanism at the source where the data is generated
  • The data consumed by an enterprise application is mostly from other enterprise apps, through well established EAI and notification mechanisms
  • If an enterprise needs huge data from external sources for analytics purposes, there can be ready-made tools and COTS applications which can mine public domain or external data and give summaries to the enterprise application. The enterprise application should not have to get that external data on board and write its own Hadoop jobs to process it. Hadoop based analysis tools should ideally find their way into integration with enterprise applications.
  • For example, if I were an automobile / finance company with a robust IT applications requirement, then to get marketing data I would not think of asking my IT to use Hadoop to get that data; instead, I would buy that data, or at least use tools for getting it, since that work is not specific to my enterprise. Contrast this to having SOAP or REST or EAI expertise in my IT, so that all apps in my enterprise can integrate.
I have not yet worked on any Hadoop based application, nor can I claim extensive experience in big data.
Still, I thought putting across my thoughts, based on experience in enterprise app development, might bring up questions related to Hadoop's relevance in the enterprise that are also bothering the minds of other enterprise architects.


Structured data is data that is organized into entities that have a predefined format, like XML docs or database tables; this is the realm of the RDBMS. Semi-structured data is data like spreadsheets, and unstructured data is data without any internal structure, like plain text or image data. MapReduce works well on data that is semi-structured or unstructured.

Some Hadoop use cases mentioned in the book:

  • using user generated track-listening data to produce different charts, like weekly charts for top tracks per country and per user; users listen to tracks using the service's own client or one of hundreds of third party client apps

  • producing daily and hourly summaries over large amounts of data
    • reports based on these summaries are used to drive engineering and non-engineering team product decisions; these summaries include reports on growth of users, page views, and average time spent on the site by users
    • providing performance numbers about ad campaigns that are run on Facebook
    • backend processing of site features like "likes" on people, applications, etc.
  • running ad-hoc jobs over historical data, for analysis for product team
  • as de-facto long term archival store for log datasets
  • to lookup log events by specific attributes, to maintain site integrity and protect users against spam bots
  • complementing existing data warehousing infrastructure, by storing unstructured data

Rackspace

  • Log processing
    • Hadoop is used to process logs that are generated by user interactions; the end result is Lucene indexes that customer support can query
  • to improve the scalability of an RDBMS we resort to sharding, which loses the ability to analyse the data as a whole; instead of sharding, Rackspace decided to use Hadoop, where scaling is linear and raw data can be processed in parallel, using the same algorithms for small, large or extremely large datasets

Hypothetical use cases

  • advertiser insights and performance
    • advertisers have to be provided standard aggregated stats about their ads
  • ad hoc analysis and product / features feedback
  • data analysis for 
    • websites
    • bio informatics
    • oil and other exploration companies

Monday, June 18, 2012

Challenges in using HTML5 based JavaScript frameworks

HTML5 by itself is not a framework, so we end up using HTML5 based frameworks, which are often also JavaScript based. But this has its own challenges.

Cross browser compliance is a moving target

A good, tricky part of any JavaScript framework goes into trying to attain cross browser compliance. This is such a tricky deal that JavaScript frameworks suffer major changes in moving from version x.y.1 to x.y.2. Given the dynamic nature of JavaScript, application teams are wary of upgrading to newer versions of a framework, and some prefer to "customize" the framework instead. Soon, 50% of the application team's resources turn into "JavaScript/framework specialists" whose sole job is to maintain and customize the framework. This drains application resources and makes the framework brittle, while the cross browser target keeps moving further and further away ;-)

Performance of ui in the browser

Due to complicated DOMs, JavaScript loading and execution, and indiscriminate use of Ajax (I once saw a page using 6 JS components, each issuing its own Ajax request), UI performance in the browser will be a major concern. Trying to build "slick" UIs, developers often incur 2-3 times the server side time in the browser itself.

Javascript challenges

JavaScript being a dynamic language, all issues and problems manifest at runtime; there is no compiler to help here.
It is oh-so-easy to override behaviour in existing frameworks, and there are so many ways to achieve the same result. As with all dynamic languages, power without responsibility on the part of developers can lead to code mess and undue complexity in JavaScript.

The regression testing nightmare

UIs are enough of a nightmare for regression testing; when you add a "flexible" dynamic language like JavaScript to the mix, things really turn nasty.

"Single Source Componentization" difficult to achieve

The tentacles of such frameworks' components are spread out over HTML markup, CSS, framework JavaScript and custom JavaScript; there is no single-source UI component (in the traditional sense).

Javascript a necessary evil

But everything said and done, JavaScript frameworks are a necessary evil for web apps characterized by the following:
UI requirements are "content-centric", meaning the UI is dictated by end users and will be changed and customized. Strictly component based frameworks will fragment and provide little reuse under such requests for high levels of customization, almost akin to a web-site UI.

At the other end of the spectrum are UIs in which end users don't have much of a say, e.g. operations or monitoring UIs, where arguably the requests for customization will not be that high, and the UI design can be dictated and fixed to a reasonable extent by fewer stakeholders. For such UIs, strictly component based frameworks (as opposed to content-centric frameworks) might be a good match.

Types of JavaScript based frameworks

Among the JavaScript frameworks, there are frameworks like:

  • jQuery UI, which decorates existing HTML markup and uses JavaScript over it to provide UI components
  • an emerging class of pure JavaScript frameworks which lend themselves to componentization and MVC, like Sencha Ext


Frameworks like GWT and Vaadin (call it GWT on steroids) are also viable alternatives, if you don't need too many UI customizations and thereby don't really need control over the HTML, CSS and JavaScript.

The server side interfaces

Choose a UI framework which plugs into your server side using REST; this will ensure that the server side need not change at all when you build mobile clients as alternative UIs. Additionally, security considerations will be required when you expose your servers over the internet so they can be accessed via your mobile clients.
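As a minimal sketch of this idea (using only the JDK's built-in com.sun.net.httpserver package, no framework; the /api/orders path and the JSON payload are illustrative assumptions), the same JSON endpoint can serve a browser UI and a mobile client alike, which is exactly what keeps the server side unchanged when new clients appear:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal REST-style endpoint: any client that speaks HTTP and JSON
// (browser JavaScript, mobile app, another service) can consume it.
public class RestSketch {
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/api/orders", exchange -> {
            // hypothetical hard-coded payload; a real app would query a service here
            byte[] body = "[{\"id\":1,\"status\":\"SHIPPED\"}]"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8080);
        System.out.println("listening on http://localhost:8080/api/orders");
    }
}
```

Because the contract is just HTTP plus JSON, swapping the browser UI for a mobile client touches nothing on this side; only authentication and transport security need revisiting when the endpoint is exposed over the internet.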