Tuesday, January 14, 2014

Today we discuss PHP, the P in the LAMP stack. PHP is embedded in HTML; it is executed on the server and translated to HTML before the page reaches the client. PHP code is wrapped in <?php ... ?>, the short form <? ... ?>, or <script language="php"> ... </script> tags.
PHP works with virtually all web server software and most databases, and its configuration is handled through the php.ini file. Single-line comments start with # or //. Multi-line printing uses heredoc syntax with a label, such as print <<<END, or plain quoted strings. PHP is largely whitespace insensitive; variable names are case sensitive, while function names are not. The data types include integers, doubles, booleans, NULL, strings, and so on. Variable scope covers local variables, function parameters, and global and static variables. Like C, we have __LINE__, __FILE__, and __FUNCTION__, and loops are similar as well.
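A minimal sketch of these basics (the variable names are just for illustration):
<?php
# a single-line comment
// this also works for single-line comments
$name = "World";   // string
$count = 3;        // integer

// multi-line printing with a heredoc label
print <<<END
Hello $name, this block keeps its
line breaks until the END label.
END;

echo "Running line " . __LINE__ . " of " . __FILE__;
?>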
Arrays can be numeric, associative, or multidimensional. The dot operator is used to concatenate strings. fopen, fclose, fread, and filesize are some of the file functions (a short sketch follows the next example). Functions are declared with the function keyword, such as:
function writeMessage()
{
    echo "Hello";
}
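A short sketch tying together the arrays, string concatenation, and file functions above (data.txt is a hypothetical file):
<?php
// numeric and associative arrays
$numbers = array(1, 2, 3);
$person  = array("name" => "Alice", "role" => "admin");

// the dot operator concatenates strings
echo "User: " . $person["name"];

// read a whole file with fopen/filesize/fread/fclose
$path = "data.txt";
$handle = fopen($path, "r");
if ($handle !== false) {
    $contents = fread($handle, filesize($path));
    fclose($handle);
    echo $contents;
}
?>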
Reference passing can be done with the & operator in the function declaration (call-time pass-by-reference at the invocation is deprecated in modern PHP). Default values can also be set for function parameters.
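A minimal sketch of both features (the function name increment is made up):
<?php
// &$counter is passed by reference, so the caller's variable changes;
// $step has a default value and may be omitted at the call site
function increment(&$counter, $step = 1)
{
    $counter += $step;
}

$total = 0;
increment($total);     // uses the default step of 1
increment($total, 5);  // explicit step
echo $total;           // prints 6
?>
The longer listing that follows shows PHP in practice with the cURL extension.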
<?php
function cURLcheckBasicFunctions()
{
  // true only if every basic cURL function is available
  return function_exists("curl_init") &&
         function_exists("curl_setopt") &&
         function_exists("curl_exec") &&
         function_exists("curl_close");
}
// verify that the cURL extension is available
if( !cURLcheckBasicFunctions() ) print_r('UNAVAILABLE: cURL Basic Functions');
$apikey = 'your_api_key_here';
$clientId = 'your_client_id_here';
$clientSecret = 'your_client_secret_here';
$url = 'https://my.api.com?api_key='.$apikey;
$ch = curl_init($url);
$fields = array(
    'user' => urlencode('user'));
// url-ify the data for the POST
$fields_string = '';
foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; }
$fields_string = rtrim($fields_string, '&');  // rtrim returns a new string; the result must be assigned
curl_setopt($ch, CURLOPT_URL, $url);
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC ) ;
// curl_setopt($ch, CURLOPT_USERPWD, $credentials);
// curl_setopt($ch, CURLOPT_SSLVERSION, 3);
// WARNING: disabling SSL verification is insecure; do this only for testing
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
//execute post
if(curl_exec($ch) === false)
{
    echo 'Curl error: ' . curl_error($ch);
}
else
{
    echo 'Operation completed without any errors';
}
//close connection
curl_close($ch);
?>
The above is a sample REST call using cURL and PHP.



Monday, January 13, 2014

We continue our discussion on EJB in this post. We would like to talk about persistence.
Persistence can be via entity beans, or via Hibernate and other ORM tools; we will discuss both. Bean persistence can be managed in two ways: container-managed persistence (CMP) and bean-managed persistence (BMP). In CMP, the container manages the persistence of the bean; in BMP, the developer implements the lower-level persistence mechanism. Typically BMP is used when the limits of CMP have been exceeded.
In CMP, entity bean data is managed automatically by the container with a mechanism of its choosing. A container implemented on top of an RDBMS may manage persistence by storing each bean's data as a row in a table. Serialization may also be built in.
A persistence manager is used to separate the management of bean relationships from the container. The container manages security, transactions, and so on, while the persistence manager manages access to different databases via different containers. This architecture allows entity beans to become more portable across EJB vendors.
Entity beans are mapped to a database using a bean-persistence manager contract called the abstract persistence schema. The persistence manager implements and executes find method based on EJB QL.
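As a rough sketch, a CMP finder defined in the ejb-jar.xml deployment descriptor pairs a find method with an EJB QL statement; the Person bean and findByName method here are hypothetical:
<query>
  <query-method>
    <method-name>findByName</method-name>
    <method-params>
      <method-param>java.lang.String</method-param>
    </method-params>
  </query-method>
  <ejb-ql>SELECT OBJECT(p) FROM Person p WHERE p.name = ?1</ejb-ql>
</query>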
The persistence manager generates a mapping of the CMP objects to a persistent data store object. Persistent data stores can vary from relational databases to flat files to Enterprise Resource Planning (ERP) systems. The mapping is based on the information provided in the deployment descriptor and the bean's abstract persistence schema.
The CMP entity bean and the persistence manager use a contract to define bean to bean, bean to dependent and even dependent to dependent object relationships within an entity bean. When EJB is deployed, the persistence manager is used to generate an instantiated implementation of the EJB class and its dependents using the XML deployment descriptor and the bean class. The instantiated implementation will include the data access code that will read and write the state of the EJB to the database at runtime. The persistence manager also manages the state. The persistence manager generates subclasses for use with the container instead of the abstract classes defined by the bean provider.
CMP is great for database independence and container-specific features. The cons are that the developer is limited to container-supported algorithms, portability to other EJB containers can suffer, the developer has no access to the generated view, and the generated SQL is not always the most efficient.
BMP pros are that it is container independent, standards based (using the EJB and JDBC APIs), able to support nonstandard data types, flexible about validation logic, and able to take advantage of non-standard SQL features. The cons: it is database specific, requires knowledge of SQL, and takes longer to develop.
In this post, we continue our discussion on Java EJB.
The bean class methods must be declared as public and must not be final or static. The throws clause may define arbitrary application exceptions, and the bean class must not define the finalize() method.
Beans can be specified as stateless, stateful, or message-driven. Injecting a resource dependency into a variable or setter method is done by the EJB container. A PersistenceContext can also be injected, using the data type EntityManager and the corresponding imports. The entity can then be found using the EntityManager object:
Person p = em.find(Person.class, 10);
New entities can be added with em.persist(p);
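A minimal sketch of that injection (the Person entity and PersonService bean are assumptions for illustration):
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class PersonService {

    // the container injects the persistence context
    @PersistenceContext
    private EntityManager em;

    public Person load(int id) {
        return em.find(Person.class, id);   // e.g. em.find(Person.class, 10)
    }

    public void add(Person p) {
        em.persist(p);   // schedules the new entity for insertion
    }
}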
Specifying interceptors is optional. The @Interceptors annotation associates an interceptor class with the bean class; the method in the interceptor class annotated with @AroundInvoke then becomes the interceptor method for the bean class.
The @PostConstruct and @PreDestroy annotations are for notification immediately after dependency injection (before the first business method) and for releasing resources, respectively.
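A sketch of these annotations together (the class and method names are invented):
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.ejb.Stateless;
import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptors;
import javax.interceptor.InvocationContext;

public class AuditInterceptor {
    // wraps every business method of the bean it is attached to
    @AroundInvoke
    public Object audit(InvocationContext ctx) throws Exception {
        System.out.println("Calling " + ctx.getMethod().getName());
        return ctx.proceed();
    }
}

@Stateless
@Interceptors(AuditInterceptor.class)
public class GreetingBean {

    @PostConstruct
    void init() { /* runs after dependency injection, before the first business method */ }

    public String greet() { return "Hello"; }

    @PreDestroy
    void cleanup() { /* release resources here */ }
}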
EJB access can be secured with the @DeclareRoles, @RolesAllowed, @DenyAll, @PermitAll, and @RunAs annotations.
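For example (the bean and role names are hypothetical):
import javax.annotation.security.DeclareRoles;
import javax.annotation.security.DenyAll;
import javax.annotation.security.PermitAll;
import javax.annotation.security.RolesAllowed;
import javax.ejb.Stateless;

@Stateless
@DeclareRoles({"admin", "user"})
public class AccountBean {

    @RolesAllowed("admin")
    public void close(long accountId) { /* admins only */ }

    @PermitAll
    public double balance(long accountId) { return 0.0; }

    @DenyAll
    public void debugDump() { /* no caller may invoke this */ }
}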
Once the Java source code is compiled, deployment descriptors can be created or edited.
Exception handling can be done with try/catch blocks, and methods can declare the exceptions they throw.
There are several things to consider for a good Java EJB program. Some of these are:
Modular code
Design patterns
Layering
Performance
Security

Sunday, January 12, 2014

This blog post is on study material for Sun certification for J2EE:
Architecture considers the functions of components, their interfaces, and their interactions. The architecture specification forms the basis for the application design and implementation steps. This book mentions that flexible systems minimize the need to adapt by maximizing their range of normal situations. In a J2EE environment, there could be a JNDI (Java Naming and Directory Interface) agent that knows what system elements are present, where they are, and what services they offer.
Classes and interfaces for an Enterprise JavaBeans component include the following: the home (EJBHome) interface, the remote (EJBObject) interface, the XML deployment descriptor, the bean class, and the context objects. The home interface provides the lifecycle operations (create, remove, find) for an EJB. The JNDI agent is used by the client to locate an EJBHome object.
The remote interface provides access to the business methods within the EJB. An EJBObject represents a client view of the EJB and acts as a proxy for it: it exposes the application-related interfaces of the object but not the interfaces that allow the container to manage and control the object. The container implements state management, transaction control, security, and persistence services transparently. For each EJB instance there is a SessionContext or EntityContext object, which is used to coordinate transactions, security, persistence, and other system services.
// Service.java
package examples;

public interface Service {
    public void sayBeanExample();
}

// ServiceBean.java
package examples;

import javax.ejb.Remote;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import static javax.ejb.TransactionAttributeType.NEVER;
import javax.interceptor.ExcludeDefaultInterceptors;

@Stateless
@TransactionAttribute(NEVER)
@Remote({examples.Service.class})
@ExcludeDefaultInterceptors
public class ServiceBean implements Service
{
    public void sayBeanExample() {
        System.out.println("Hello From Service Bean!");
    }
}
The import statements bring in, for example, the metadata annotations and the InvocationContext that maintains state between interceptors. @Stateless specifies that the EJB is a stateless session bean (@Stateful would mark a stateful one), @Remote specifies the remote interface, and the @EJB annotation is used for dependency injection, for instance of the dependent "ServiceBean" stateless session bean. The @Interceptors and @ExcludeClassInterceptors annotations specify, respectively, that the bean is associated with an interceptor class and that the class-level interceptor methods should not fire for the annotated method. A @PreDestroy method is used for cleanup.
We also looked at a Node.js Backbone solution from the book JumpStart Node.js. The book continues with an example of real-time trades appearing in the browser. Here we add a function to store the exchange data every time it changes. The first task is to store the data; it is then transformed and sent to the client.
Instead of transmitting the entire data set, it is transformed, and when the client makes the initial request we transmit this transformed data. Client-side filters could change, so it is better to use templates. We can use jQuery's get function to retrieve the template and send an initial 'requestData' message to the server so that the initial data can be sent.
As before, we use initialize to call the render function. We iterate through all the models and render each row individually, with a separate view for the real-time trades. With the static template this is now easier to render than when a string was used, and with the data loaded it is easier to handle just the updates with a separate method.
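A rough client-side sketch of that flow, assuming Socket.IO on the wire and Underscore templates; the template URL, the 'requestData' and 'data' event names, and the row markup are assumptions rather than the book's exact code:
// fetch the static template, then ask the server for the initial data
$.get('/templates/trade.html', function(tmpl) {
  var template = _.template(tmpl);
  var socket = io.connect();

  socket.emit('requestData');   // server replies with the transformed data

  socket.on('data', function(trades) {
    trades.forEach(function(trade) {
      // one small view per row; later updates are handled the same way
      $('#trades').append(template(trade));
    });
  });
});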
Heroku can be used to deploy the Node.Js application.
Express.js supports both production and development settings. One of the differences between the two is the settings that handle errors: in development we want as much error information as possible, while in production we lock it down.
We also provide a catchall that handles any request not processed by prior routes.
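A minimal sketch of both ideas, assuming Express 4-style middleware (the handler bodies are illustrative):
var express = require('express');
var app = express();

// ... routes are registered here ...

// catchall for any request not matched by prior routes
app.use(function(req, res) {
  res.status(404).send('Not Found');
});

if (app.get('env') === 'development') {
  // development: expose as much error information as possible
  app.use(function(err, req, res, next) {
    res.status(500).send('<pre>' + err.stack + '</pre>');
  });
} else {
  // production: lock it down and reveal nothing to the client
  app.use(function(err, req, res, next) {
    res.status(500).send('Internal Server Error');
  });
}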
When it comes to hosting the application, there are several options available - such as IaaS, PaaS, or Amazon's EC2. However, the cost of all this convenience is the loss of control. In general, this is not a problem and the convenience is far worth it.
For those choosing to deploy on either a dedicated server or EC2, it is better to use an option that automatically restarts the application upon any crash or file change. node-supervisor helps in this regard, but for production it is better to use the forever package since it has minimal overhead.
Version control and Heroku deployment should go together so that we can roll back unstable changes. With incremental changes, 'git push heroku master' could then become a habit.
We did not cover Socket.IO and scoop.it.

 

Saturday, January 11, 2014

Today I'm going to read from a book called Jumpstart Node.js by Don Nguyen.
Just a quick reminder that Node.js is a platform for writing server-side applications. It achieves high throughput via non-blocking I/O and a single-threaded event loop. Node.js contains a built-in HTTP server library, so Apache or lighttpd is not required.
The book cites the application WordSquared as an introduction to applications in Node.js; it is an online, realtime, infinite game of Scrabble.
Node.js is available from GitHub or via a package manager.
On top of the HTTP server is a framework called Connect, which provides support for cookies, sessions, logging, and compression, to name a few. On top of Connect is Express, which adds support for routing, templates, and a view rendering engine.
Node.js is minimalistic. Access to web server files is provided via the fs module; Express and routing are available via the express and routes modules.
Node.js allows callback functions, and these are used widely since Node.js is asynchronous. For example:
setTimeout(function(){ console.log('example'); }, 500);
The line after this statement is executed immediately, while 'example' is printed only after the timeout.
Node.js picks up changes to code only on restart. This can become tedious after a while, so node-supervisor is installed to automatically restart the process upon changes to files.
MongoLab is a cloud based NoSQL provider and can come in useful for applications requiring a database.
Backbone is an MVC framework that can be combined with Node.js to provide a rich, realtime user interface.
To create a custom stock ticker, a filter on the code of the stock could be implemented. When the user submits a request, Backbone makes a request to the server-side API. The data is placed into a model on the client side; subsequent changes are made to the model, and bindings specify how those changes should be reflected in the user interface. To display the view, we have initialize and setVisibility functions. In the initialize function, a change in the model is bound to the setVisibility function; in the latter we query the model's properties and set the view accordingly. When the filtering is applied, the stock list is thus updated.
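A compact sketch of that pattern (the model attributes and element handling are invented for illustration):
var StockView = Backbone.View.extend({
  initialize: function() {
    // re-check visibility whenever the model changes
    this.model.on('change', this.setVisibility, this);
    this.render();
  },

  setVisibility: function() {
    // show the row only when the stock code matches the user's filter
    var filter = this.model.get('filter') || '';
    var match = this.model.get('code').indexOf(filter) !== -1;
    this.$el.toggle(match);
  },

  render: function() {
    this.$el.text(this.model.get('code') + ': ' + this.model.get('price'));
    return this;
  }
});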
In the previous post, we examined some tools on Linux for troubleshooting system issues. We continue our discussion with high CPU utilization issues. One approach is to read logs; another is to take a core dump and restart the process, and the ps and kill commands come in very useful for taking a dump. By logs we mean performance counter logs; on Linux these can come from the sar tool or the vmstat tool, both of which can run in sampling mode. The logs help identify which core in a multicore processor is utilized and whether there is any processor affinity in the code of the process running on that core.
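For reference, a few of the commands involved (myapp and <pid> are placeholders):
vmstat 1 10           # sample CPU and memory counters every second, ten times
sar -P ALL 1 5        # per-core utilization, to spot a hot core
ps aux | grep myapp   # find the process id
taskset -cp <pid>     # show which cores the process may run on
gcore <pid>           # take a core dump without killing the process
kill -ABRT <pid>      # or abort the process, producing a core dump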
The user workload, if present, is also important to analyze. High CPU utilization could be triggered by a workload. This is important to identify not merely because the workload gives insight into which component is being exercised, but also because it suggests how to reproduce the problem deterministically. Narrowing down the scope of the occurrence sheds a lot of light on the underlying issue with the application: knowing when the problem occurs, which components are likely affected, what's on the stack, and what frame is likely on the top of the stack. If there are deterministic steps to reproduce the problem, we can repeatedly trigger the situation for a better study. In such cases the frame and the source code, in terms of module, file, and line, can be identified, and a resolution can be found.
Memory utilization is also a very common issue, and there are two approaches here as well. One approach is to add instrumentation, either via the linker or via tracing, to see the call sequences and identify memory allocations. Another approach is to use external tools to capture stack traces at all allocations, so that the application's memory footprint shows which allocations have not been freed and the corresponding offending code. Heap allocations are generally tagged to identify memory corruption issues. These tools work on the principle that the tags at the beginning and end of an allocation are not expected to be overwritten by the process code, since the tool wraps each allocation with tags. Any write access to the tags is likely from memory-corrupting code, and a stack trace at such a time will point to the offending code path. This is very useful for all sizes of allocations and de-allocations.
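Tools such as valgrind implement this kind of tracking; a typical invocation (the binary name is a placeholder) is:
valgrind --leak-check=full --track-origins=yes ./myapp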
Leaks and corruptions are two different syndromes that need to be investigated and resolved differently.
In the case of leaks, a code path may continuously leak memory when invoked. Tagging all allocations and capturing the stack at each allocation, or reducing the scope to a few components and tracking the objects created by each component, can give insight into which object or allocation is missed. Corruption, on the other hand, is usually nondeterministic and can be caused by such things as timing issues, and the place of corruption may also be random. Hence it is important to identify, from the pattern of corruption, which component is likely involved and whether minimal instrumentation can be introduced to track all objects that have such a memory footprint.