Thursday, December 12, 2013

Continue from the book reading:
Here are some of the Apache tricks from the book:
Enabling mod_rewrite: This is an Apache module used for URL rewriting. It can keep query-string input safe, make URLs friendly to both search engines and users, and stop hot linking.
A RewriteRule is a configuration directive that provides a wide range of rewriting abilities, from simple redirects to complex multiple-rule pattern-matching substitutions.
A RewriteCond specifies a condition for a rule. HTTP header values, request variables, server variables, time variables and some special variables can all be used inside a RewriteCond. RewriteLog and RewriteLogLevel are directives that specify a log file where a running record of the internal rewrite processing is sent, and the verbosity of that log.
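As a sketch of how these directives fit together, hypothetical rules that block hot linking of images might look like the following (the example.com domain and the module path are placeholders):

```apache
# Load the module (path varies by distribution)
LoadModule rewrite_module modules/mod_rewrite.so

RewriteEngine On
# Act only when a referer is present and it is not our own site
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Forbid hotlinked image requests
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```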
URL spell checking is accomplished with the mod_speling module. With mod_speling, the Apache server tries to find the intent of the user and suggests a correction. CheckSpelling is the configuration directive to be used.
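A minimal sketch of turning this on (the module path is an assumption):

```apache
LoadModule speling_module modules/mod_speling.so
CheckSpelling On
```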
Content compression is achieved with the mod_deflate module. Since HTML and other resources are normally sent uncompressed, this feature allows us to transparently compress the page markup, stylesheets and other content and send it across the internet, where it is decompressed before rendering.
We use the AddOutputFilterByType directive to accomplish this. The DeflateCompressionLevel directive sets the level of compression used to shrink the files.
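A sketch of such a configuration (the MIME types and level are illustrative):

```apache
# Compress common text responses transparently
AddOutputFilterByType DEFLATE text/html text/plain text/css
# 1 is fastest, 9 gives the smallest output
DeflateCompressionLevel 6
```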
For databases, we create security zones using Apache and basic authentication. This is available via the mod_auth_mysql module. The configuration directives in this case are AuthName to name the security zone, AuthType set to Basic, AuthMySQLDB for the database name, AuthMySQLUser and AuthMySQLPassword for the credentials, AuthMySQLEnable set to On, and Require valid-user.
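A hypothetical security zone wired to these directives (the directory, database name and credentials are all placeholders):

```apache
<Directory "/var/www/members">
    AuthName "Members Area"
    AuthType Basic
    AuthMySQLEnable On
    AuthMySQLDB auth_db
    AuthMySQLUser auth_user
    AuthMySQLPassword secret
    Require valid-user
</Directory>
```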
The communication between client and server can be encrypted using SSL with mod_ssl, an Apache module that SSL-enables a website that Apache controls. A certificate for the server is generated with the OpenSSL toolkit, and a VirtualHost section referencing that certificate is then specified in the config file.
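A sketch of such a VirtualHost section (the host name and certificate paths are placeholders):

```apache
<VirtualHost *:443>
    ServerName www.example.com
    SSLEngine On
    SSLCertificateFile /etc/httpd/conf/server.crt
    SSLCertificateKeyFile /etc/httpd/conf/server.key
</VirtualHost>
```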
Apache can also be used as a file repository with WebDAV. WebDAV, which stands for Web-based Distributed Authoring and Versioning, enables Apache to let users treat an enabled directory as a remote directory or drive on their own computers.
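A hypothetical WebDAV-enabled directory might be configured like this (the paths and password file are placeholders):

```apache
DavLockDB /var/lib/dav/lockdb
<Location /repository>
    Dav On
    AuthName "WebDAV Repository"
    AuthType Basic
    AuthUserFile /etc/httpd/dav.passwd
    Require valid-user
</Location>
```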
Continuing from the book reading:
This book talks about code efficiency. Performance of code may depend on the machine configuration. For example, a little more memory can help tremendously with page swapping. To speed up a site, there are generally two categories of techniques, namely benchmarking and profiling. Benchmarking is experimenting to determine the best approach for something before implementing it for real. Profiling is experimenting on the real thing to see how well it performs. In benchmarking, a timer is started before the code under measurement executes and stopped right after it. This start-stop is repeated to accumulate the durations. Further, timers can be labeled so that they can be started and stopped independently. The PEAR benchmarking class provides this functionality with its Benchmark_Timer.
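A sketch of the start-stop pattern with PEAR's Benchmark_Timer (assumes the PEAR Benchmark package is installed; the loop is a stand-in for real work):

```php
<?php
require_once 'Benchmark/Timer.php';

$timer = new Benchmark_Timer();
$timer->start();
for ($i = 0; $i < 100000; $i++) { /* code under test */ }
$timer->setMarker('after-loop');   // labeled point between start and stop
$timer->stop();
$timer->display();                 // prints the accumulated durations
```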
Profiling can help you plan hardware improvements for existing code. Faster connections, a faster processor and more memory can tremendously improve actual performance.
Web server improvements will also help actual performance. Apache is highly configurable. Apache runs several httpd processes to handle requests. More processes can be started when the load becomes high, and the excess can subsequently be shut down. In some cases these processes may be shut down even when there is work to do, in order to reclaim memory that has been leaking. An httpd process may also crash, failing the request, but another process can take the same request again. The configuration options that control the number of httpd processes include MaxClients, MaxRequestsPerChild, StartServers and MinSpareServers. These are specified in the httpd.conf file.
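A sketch of these directives in httpd.conf (the values are illustrative, not recommendations):

```apache
StartServers          5     # processes launched at startup
MinSpareServers       5     # keep at least this many idle
MaxClients          150     # upper bound on concurrent httpd processes
MaxRequestsPerChild 1000    # recycle a child after this many requests to reclaim leaked memory
```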
With improvements in hardware, database and Apache, we can focus on code improvements next. PEAR coding standards come in useful here.
PEAR provides a caching framework. Static versions of pages can be generated and served instead of regenerating them each time. The source code for a page can be stored in a pre-parsed format that the Zend engine can readily execute. Even the browser cache can be used to improve the experience. The Last-Modified header is useful for determining the last time the content changed.
The Expires header determines how long the page should be considered valid. Clients can also send the If-Modified-Since header, which the server uses to decide between generating a full response or sending a 304 Not Modified response.
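A sketch of these headers in PHP (the content file and the one-hour Expires window are arbitrary choices):

```php
<?php
$lastModified = filemtime('page-source.html');  // hypothetical content file
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    header('HTTP/1.1 304 Not Modified');  // the client's copy is still good
    exit;
}
// otherwise fall through and generate the full response
```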
Furthermore, output buffering can be used to gather up the contents of the page and send it out all at once. This is done by calling ob_start() at the beginning of the HTML and ob_end_flush() at the end. ob_start('ob_gzhandler') can be used to make sure the buffered output is compressed.
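A minimal sketch of a buffered, compressed page:

```php
<?php
ob_start('ob_gzhandler');   // buffer output; gzip it if the client accepts gzip
echo '<html><body>';
echo 'page content';
echo '</body></html>';
ob_end_flush();             // send the whole buffer at once
```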
There are several caching engines available to choose from. Alternative PHP Cache (APC) is a PHP opcode caching engine that comes with PECL/PEAR. The Zend engine's caching has also become part of PECL. eAccelerator is another compiled-state PHP caching engine. JPCache is a memory/disk caching solution, a PHP library that stores the generated output of a PHP script to disk or in a SQL database instead of saving the compiled state. memcached, unlike the output caching of APC, eAccelerator and JPCache, relies on caching backend objects such as database result objects and data-model entities.

Wednesday, December 11, 2013

Summary of book reading on Professional LAMP web development:
This book is a good read even for those familiar with any part of the stack.
The changes introduced in PHP5 are listed as 1) objects are passed by reference by default, as opposed to being copied by value by default in PHP4.
2) In PHP4 we had set_error_handler() to define a user function for error handling. In PHP5, we have the same try-catch-throw semantics as in other languages.
3) There is a built-in Exception class and it has message, code, file and line information.
4) Interfaces have been added to PHP5 and classes can implement multiple interfaces.
5) The Standard PHP Library (SPL) in PHP5 has introduced a new set of classes and interfaces. The Iterator interface, along with DirectoryIterator, RecursiveIterator and ArrayIterator, is now available.
6) Constructors and destructors by way of __construct() and __destruct() have been added.
7) Access modifiers and the final, static and abstract keywords can be used to control your classes.
8) Instead of overload() in PHP4, the __get(), __set() and __call() interceptor methods are now built in.
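A small sketch of points 2 and 3, the try-catch-throw semantics and the built-in Exception class:

```php
<?php
function divide($a, $b)
{
    if ($b == 0) {
        throw new Exception('Division by zero');
    }
    return $a / $b;
}

try {
    echo divide(10, 0);
} catch (Exception $e) {
    // the built-in Exception carries message, code, file and line
    echo $e->getMessage() . ' at ' . $e->getFile() . ':' . $e->getLine();
}
```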
MySQL also has some advanced features. For example, we can query multiple tables, do full-text searching, control access, and analyze and maintain the database. Tools such as phpMyAdmin and the MySQL Administrator GUI are available for some of these chores. The query text generated by these tools can give a better understanding of what the tool or the application does with the database.
Apache comes with the following features: URL rewriting, URL spell checking, content compression, using MySQL with Apache, Apache and SSL, and Apache as a file repository.
Site security can be tightened with the following measures: controlling access, understanding website attacks, keeping the system current and updated, updating the PEAR and PECL packages installed with PHP, writing a cron job to do automatic updates, and reducing the likelihood of a register_globals exploit or SQL injection attack. register_globals exploits are mitigated by initializing all variables; SQL injection is avoided by validating and escaping user input.
 
Continuing from the previous post...
We read that the traffic data and services provided by Inrix overhaul the coverage, detail, availability and timeliness of traffic data and create avenues for applications where there existed none. For drivers, this translates to more options for planning trips and fewer blind spots. With more coverage there are more options, and with up-to-date information there is more predictability and time for an evasive response.
For automakers, the detail in the data enables multimodal routing. In addition, device, map and platform independence enables ubiquity without the need to get TMC licenses per geography.
Inrix's traffic can benefit public sector agencies as well. Transportation agencies can better manage road networks and study historical traffic patterns. Public safety agencies can give more detailed information on road conditions. Emergency management agencies can find better routes and respond to incidents faster.
Historical and present-day data can both be presented so they can be used to plan high-occupancy lanes, traffic signal patterns, road sensor deployment and road narrowing schemes. Overall, the timeliness and reliability of the data are improved, enabling a better driving experience and reduced costs. Car manufacturers become more competitive and agencies operate more safely.
Wide reachability across geographies and reduced mapping costs translate to savings.
Reducing traffic congestion and improving highway conditions now becomes easier. From short term management of incidents to long term planning of road networks, the insights from the traffic data can help. Furthermore, new avenues for technology providers and mobile applications are enabled.
The highlights of XD Traffic are:
- four million miles of real-time coverage, including one million miles of roads never covered before
- independence from devices, maps and geography
- resolution down to 250 meters, enabling greater detail in the data
- reach across 37 countries, including emerging markets
- improvements in traffic data on motorways, highways and arterials
- improvements in road closure reporting and traffic analysis
 

Tuesday, December 10, 2013

This post is from the white paper on the Inrix web site: Fueling future mobility with big data.
This paper talks about how high-quality traffic data and sophisticated analysis help get people around quickly and efficiently. High-quality traffic data helps in more than one way. First, it improves journey times by delivering more accurate data to the user and improving satisfaction. Second, traffic data is a layer that helps build applications for connected cars and smarter cities.
Mobile devices, applications and internet sites provide digital maps that improve navigation. Navigation technology has become so ubiquitous that it is no longer a differentiating factor and is expected as a built-in feature. Traffic data, however, is different from navigation, and there are several variables. First, coverage of all available roads, not just the busy ones, is a differentiating factor; missing routes limit drivers' choices and have frequently hurt the driver experience. Second, the exact location and dimensions of a traffic queue are critical to planning routes, and this level of detail is generally not available from most providers. Third, traffic data has been tied to specific maps and sometimes services, so making it available widely and consistently has been a challenge. Fourth, timely incident reports are critical for the driver to know about route changes or other impact, and there is an industry-wide latency in providing such data.
Inrix strives to improve traffic data on all of these fronts with its traffic data service, hoping to make an impact not only on driving but on how driving shapes city planning. The source of the traffic data is the crowd, and this is expected to grow rapidly with more penetration by applications and implementations. Coverage is improving: one million miles of road have been added where only three million miles were covered before. Moreover, the traffic can now be painted on any map, on any device and in several countries.
With the rising popularity of public transit, I'm excited to see improvements in bus traffic locally here on the east side.

Monday, December 9, 2013

I got the following interview question:

Using the following function signature, write a C# function that prints out every combination of indices using Console.WriteLine() whose values add up to a specified sum, n. Values of 0 should be ignored.


public void PrintSumCombinations(List<int> numbers, int n);


- It's okay to use additional private functions to implement the public function
- Be sure to print out the indices of numbers and not the values at those indices
- Don't worry too much about memory or CPU optimization; focus on correctness


To help clarify the problem, calling the function with the following input:


List<int> numbers = new List<int> { 1, 1, 2, 2, 4 };

PrintSumCombinations(numbers, 4);


Should result in the following console output (the ordering of the different lines isn’t important and may vary by implementation):


0 1 2 (i.e. numbers[0] + numbers[1] + numbers[2] = 1 + 1 + 2 = 4)

0 1 3

2 3

4

Here is my hint: generate the variations based on permutations, regardless of the content, then check each sequence for the expected sum.
// requires System.Linq for Sum(); candidate holds indices into numbers,
// since the problem asks for indices rather than values
public void Permute(List<int> numbers, List<int> candidate, bool[] used, int n)
{
    if (candidate.Sum(i => numbers[i]) == n)
    {
        candidate.ForEach(i => Console.Write(i + " "));
        Console.WriteLine();
    }
    for (int i = 0; i < numbers.Count; i++)
    {
        if (used[i] || numbers[i] == 0) continue; // values of 0 are ignored
        candidate.Add(i);
        used[i] = true;
        Permute(numbers, candidate, used, n);
        candidate.RemoveAt(candidate.Count - 1); // remove the last element, not the first matching value
        used[i] = false;
    }
}
 
For combinations, we could take different-length subsequences and permute them. There may be repetitions, but we process them just the same.

And here is another way to solve the problem
// A number paired with its original index, so results can report indices.
// candidate is pre-sized with placeholder entries before the first call.
public class IndexedNumber
{
    public int Number;
    public int Index;
}

public static void Combine(List<IndexedNumber> numbers, List<IndexedNumber> candidate, List<List<IndexedNumber>> sequences, int level, int start, int n)
{
    for (int i = start; i < numbers.Count; i++)
    {
        if (candidate.Contains(numbers[i]) == false)
        {
            candidate[level] = numbers[i];
            if (candidate.Sum(c => c.Number) == n)
                sequences.Add(new List<IndexedNumber>(candidate));
            if (i < numbers.Count - 1)
                Combine(numbers, candidate, sequences, level + 1, i + 1, n); // advance past i, not just start
            candidate[level] = new IndexedNumber() { Number = 0, Index = -1 };
        }
    }
}
 
 
Sample Maven test:
// modify pom.xml to include junit
import org.junit.Assert;
import org.junit.Test;

public class RectangleTest
{
    @Test
    public void test()
    {
        Rectangle sample = new Rectangle(1, 2);
        double area = sample.Area();
        Assert.assertEquals(2.0, area, 0.001);
    }
}

mvn test

interface IMeasurable
{
    double Area() throws IllegalAccessException;
}

abstract class Shape implements IMeasurable
{
    volatile double x;
    volatile double y;

    Shape(double a, double b)
    {
        x = a;
        y = b;
    }

    public double Area() throws IllegalAccessException
    {
        throw new IllegalAccessException();
    }
}

class Rectangle extends Shape
{
    Rectangle(double x, double y)
    {
        super(x, y);
    }

    final void PrintMe()
    {
        System.out.println("I'm a rectangle with length: " + x + " and breadth: " + y + " and area: " + Area());
    }

    public double Area()
    {
        return x * y;
    }
}