Tuesday, March 25, 2014

When and why to use NODE.JS



Node.js!

The new kid on the block has definitely been going mainstream. You must have heard buzzwords like async IO and server-side JavaScript associated with the Node hype. You may also have heard remarks from the sceptics: that you can't handle dynamic JavaScript with an immature development workforce, or that node.js is OK for hello-world apps but not necessarily ready for prime time in the enterprise application landscape.

Well, here's my attempt at clearing up the mist around node.js and related frameworks...

When to use Node.js

As its proponents have rightly pointed out, node.js is an excellent fit for DIRT-y applications. DIRT stands for Data Intensive, Real Time.

How do Java EE web apps work?

Let's start with a traditional Java EE web application hosted on, say, a Tomcat container. When 1000 users access the web application concurrently, the web container dedicates a thread to each user request. Each thread passes control through the application layers (view, controller, service, data access) until it finally hits a database or backend, gets the relevant data, and brings it back to the presentation tier, where it is formatted as an HTML page and sent back to the client browser as an HTTP response. The hundreds or thousands of threads running in the web container are what allow the application to scale. But each thread makes synchronous method calls, and the thread is held up until each call completes. The best way to illustrate this is a database call such as
ResultData data = personDao.updatePersonDetails(person);
(The above call blocks until the database has finished the update and returned its result.)

It is not uncommon for many threads in such an application to be waiting on IO, either disk IO or network IO (as in a database call). As a result, CPU utilization on the web tier is often low, in the region of 10%-20%, and most of the latency comes from threads waiting on IO.

Can the above latency be reduced?

Enter Node.js with its async io

At its simplest and most fundamental level, Node.js prefers non-blocking function calls, which it achieves through pervasive use of callbacks. For example, the blocking database call above can be rewritten as the following async call:
personDao.updatePersonDetails(person, function(err, numRowsAffected ){....} );
The above call asks the database driver to perform the update but does not wait for the result; execution continues with the next line. When the database update has been made, the callback (the anonymous function) is executed, receiving the error, if any, and the number of rows affected by the update.
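
Here's a rough sketch of that ordering which you can run as-is. Note that the personDao below is a hypothetical in-memory stand-in (no real database driver is involved) and setTimeout merely simulates the database round trip:

```javascript
// Hypothetical in-memory DAO; setTimeout stands in for the database round trip.
const personDao = {
  rows: { 1: { id: 1, name: 'Alice' } },
  updatePersonDetails(person, callback) {
    setTimeout(() => {
      if (!this.rows[person.id]) {
        return callback(new Error('no such person: ' + person.id));
      }
      this.rows[person.id] = person;
      callback(null, 1); // one row affected
    }, 50);
  }
};

const events = [];

personDao.updatePersonDetails({ id: 1, name: 'Alice B.' }, function (err, numRowsAffected) {
  // Runs later, once the simulated update has completed.
  events.push(err ? 'callback: error' : 'callback: ' + numRowsAffected + ' row(s) updated');
  console.log(events.join('\n'));
});

// Runs immediately; the call above did not block.
events.push('next line executed');
```

The "next line executed" entry is recorded before the callback entry, because the update call returns straight away and the callback only fires once the simulated IO is done.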

Though the difference between sync blocking functions and async functions with callbacks may seem trivial, once we adopt async non-blocking calls as the default way of programming, things soon start adding up.

Instead of dedicating one thread to each HTTP request and running thousands of such threads, node.js uses async functions that do their processing in small pieces and never hold up the single thread while waiting on IO. The node.js model scales better because the calls in the processing stack are non-blocking.
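
To get a feel for this, here is a small simulation, not a benchmark. The handleRequest and serveAll functions are made-up names for illustration, and each "request" is just a timer standing in for 100 ms of IO wait:

```javascript
// Simulated I/O: each "request" waits ioMs without blocking the thread.
function handleRequest(id, ioMs, done) {
  setTimeout(() => done(null, 'response ' + id), ioMs);
}

// Start `count` requests at once and report when all of them have finished.
function serveAll(count, ioMs, onFinished) {
  const started = Date.now();
  let pending = count;
  const responses = [];
  for (let id = 0; id < count; id++) {
    handleRequest(id, ioMs, (err, response) => {
      responses.push(response);
      if (--pending === 0) {
        onFinished(responses, Date.now() - started);
      }
    });
  }
}

serveAll(1000, 100, (responses, elapsedMs) => {
  console.log(responses.length + ' requests served in about ' + elapsedMs + ' ms');
});
```

Because the waits overlap on the one thread, the total should come out close to a single 100 ms wait rather than 1000 of them back to back; the exact figure will vary by machine.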

Where are the benchmarks?

Well, consider this: the Apache web server has traditionally used a multi-threaded (or multi-process) model, whereas nginx uses non-blocking IO. Several benchmarks have reported that nginx, with its non-blocking IO, shows roughly 20% better scalability than Apache. So even for serving static content, async IO tends to be the more scalable approach.

Can Node.js leverage multiple cores? Is it fault tolerant?

This was another valid criticism raised against the node.js platform in its early days. We need to understand that a node.js server has only a single thread available for the application's use (there may be others doing the server's internal housekeeping). Because of this limitation, node.js processing runs on a single core, and it is also susceptible to outages, since one uncaught exception on that thread can bring processing down for all clients. Thankfully, multiple node.js servers can be run on the same machine with a load balancer in front. Alternatively, the core "cluster" module can spread the work across cores, and process supervisors like "upstart" and "forever" can restart node.js servers automatically when they crash, which improves uptime.
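
As a rough sketch of the "cluster" idea, the following forks one worker per core and replaces any worker that dies. The startClusteredServer name and the port number are my own choices for illustration, and the function is deliberately not invoked here since it would keep running:

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: fork one worker per core and replace any worker that dies.
function startClusteredServer(port) {
  // Newer Node versions also expose this flag as cluster.isPrimary.
  if (cluster.isMaster) {
    os.cpus().forEach(() => cluster.fork());
    cluster.on('exit', (worker, code, signal) => {
      console.log('worker ' + worker.process.pid + ' exited; starting a replacement');
      cluster.fork();
    });
  } else {
    // Every worker listens on the same port; incoming connections are shared among them.
    http.createServer((req, res) => {
      res.end('handled by worker ' + process.pid + '\n');
    }).listen(port);
  }
}

// Not invoked here; call startClusteredServer(8000) at the end of the file to try it.
```

Each worker is still single threaded, but a crash now takes down only that worker's in-flight requests, and a replacement is forked straight away.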

Node.js for realtime apps

Out of the acronym DIRT, we have already seen how node.js is a good fit for "data intensive" applications. It's time to talk about the other category for which node.js is suitable, namely "realtime" applications.
Again, let's consider traditional web apps that need to show data which changes frequently at the backend. The traditional approach over HTTP is for browser-based JavaScript clients to "poll" the server, for example using "auto-refresh" tags. Apart from loading up the server, this approach is inefficient. Instead, newer full-duplex protocols like WebSocket let servers push content, messages, and notifications to client browsers whenever server events occur.

This event-driven approach to pushing from server to client fits well with node.js's async, non-blocking APIs, so there is excellent support for server push in node.js modules like socket.io. This makes node.js very suitable for web applications that require high interactivity, server push, and notifications.
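
The push idea can be sketched without any network code. The following uses Node's core "events" module as an in-process stand-in for a websocket connection. It is not socket.io's API, and the priceChanged event name and the connectClient helper are made up for illustration:

```javascript
const EventEmitter = require('events');

// Stand-in for the server side: emits an event whenever backend data changes.
const server = new EventEmitter();

// Hypothetical client: instead of polling, it registers once and gets notified.
function connectClient(name, inbox) {
  const listener = (update) => {
    inbox.push(name + ' received ' + update.symbol + '=' + update.price);
  };
  server.on('priceChanged', listener);
  // Return a disconnect function so the client can stop receiving pushes.
  return () => server.removeListener('priceChanged', listener);
}

const inbox = [];
const disconnectA = connectClient('clientA', inbox);
connectClient('clientB', inbox);

server.emit('priceChanged', { symbol: 'ACME', price: 101 }); // both clients notified
disconnectA();
server.emit('priceChanged', { symbol: 'ACME', price: 102 }); // only clientB notified

console.log(inbox);
```

The clients never ask whether anything has changed; they are simply called when it does, which is the same shape a websocket-based push takes on the server.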

What is node.js and how useful is it to build real life web apps from scratch?

Well, the node.js core is really a JavaScript runtime with a built-in HTTP server, a platform for hosting web apps. The core therefore does not include the higher-level features that are used directly in building a web application. To create a real-life web application, one needs a framework with many features: the ability to serve static content from the server's native file system, user authentication, session and cookie management, templatable views, a navigation and routing framework, binding of models and views, error handling, request body parsing, etc.
To provide all of the above, the core node.js needs to have relevant modules added on top of it.
Node.js comes with a robust package manager called npm (Node Package Manager), which helps in installing such node.js modules.
The most critical of these, which provides pluggable middleware, is a module called connect.
On top of connect is a module called express.js, which provides a full-fledged web application framework.
More custom requirements for the web app can be fulfilled by installing appropriate modules freely available via npm.
Similarly, templating engines like jade and ejs can be plugged into frameworks like express.js, thereby enriching the node.js ecosystem for your web application development.
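
To make "pluggable middleware" a little more concrete, here is a tiny middleware chain written in plain JavaScript. It is only in the spirit of connect, not its actual code, and createApp and its use/handle methods are names I picked for this sketch:

```javascript
// Illustrative only: a tiny middleware chain in the spirit of connect, not its real code.
function createApp() {
  const stack = [];
  return {
    use(middleware) {
      stack.push(middleware);
      return this; // allow chaining
    },
    handle(req, res) {
      let index = 0;
      function next(err) {
        if (err) {
          res.statusCode = 500;
          return res.end('error: ' + err.message);
        }
        const middleware = stack[index++];
        if (!middleware) {
          res.statusCode = 404;
          return res.end('not found');
        }
        middleware(req, res, next);
      }
      next();
    }
  };
}

const app = createApp()
  .use((req, res, next) => { req.receivedAt = Date.now(); next(); }) // e.g. request tagging
  .use((req, res, next) => {                                         // e.g. routing
    if (req.url === '/hello') return res.end('hello world');
    next();
  });

// With the core http module: require('http').createServer(app.handle).listen(3000);
```

Each middleware either finishes the response or calls next() to hand over to the following one, which is what lets features such as sessions, body parsing, and routing be plugged in independently.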

In subsequent articles, I will be walking through developing a basic node.js web application that performs CRUD (create, read, update, delete) operations against a MongoDB database. You will be pleasantly surprised how easy and elegant it is to develop web applications in node.js based frameworks.

Cheers!


2 comments:

Anonymous said...

Very useful overview of node.js in understandable language

Jaganlal said...

Neat and Simple buddy