Friday, January 10, 2014

Troubleshooting high CPU usage in C++ programs on Linux requires the same discipline and order in debugging as it does anywhere else. A thread in the program could be hogging the CPU because it's spinning in a loop or because it's executing a computationally intensive routine. High CPU utilization does not always mean useful work is being done; the thread may simply be spinning.

The uptime tool in Linux reports the load averages for the last one, five and fifteen minutes. Using top, we can see which processes are the biggest contributors to the problem. Sar is yet another tool that can give more detailed information on CPU utilization trends. The data is also available offline for use with tools like isag to generate charts or graphs. The raw data for the sar tool is stored under /var/log/sa, where the various files represent the days of the respective month. The ps and pstree commands are also useful for system analysis; the ps -L option gives thread-level information. The mpstat command reports on each available CPU on a multiprocessor server, and global average activity across all CPUs is also reported. The KDE System Guard (KSysGuard) is the KDE task manager and performance monitor; it enables monitoring of local and remote hosts.

The vmstat tool provides information about processes, memory, paging, block I/O, traps and CPU activity. The vmstat command displays either average data or actual samples, and it reports CPU time in one of the following four categories:
us: time spent running non-kernel code (user time, including nice time)
sy: time spent running kernel code (system time)
id: time spent idle
wa: time spent waiting for IO
vmstat can also run in a sampling mode, reporting at a fixed interval for a given number of samples (for example, vmstat 5 10 prints ten reports five seconds apart).
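As a rough illustration of where these categories come from (a sketch written for this post, not how vmstat is actually implemented), the snippet below reads the aggregate cpu line of /proc/stat, whose fields are cumulative ticks of user, nice, system, idle and iowait time since boot; vmstat and sar report deltas of such counters over each sampling interval.

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // First line of /proc/stat looks like: "cpu  user nice system idle iowait irq softirq ..."
    std::ifstream stat("/proc/stat");
    std::string label;
    long long user = 0, nice = 0, system = 0, idle = 0, iowait = 0;
    stat >> label >> user >> nice >> system >> idle >> iowait;

    long long total = user + nice + system + idle + iowait;
    if (total == 0) return 1;

    // Rough since-boot percentages; the remaining fields (irq, softirq, steal) are ignored here.
    std::cout << "us " << 100.0 * (user + nice) / total
              << " sy " << 100.0 * system / total
              << " id " << 100.0 * idle / total
              << " wa " << 100.0 * iowait / total << std::endl;
    return 0;
}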
Yielding between long computations is a good programming practice, but to know where to add these calls to relieve the CPU, we first have to figure out which thread is the culprit.
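For instance, here is a minimal sketch of a spin-wait that yields; the flag and function names are made up for illustration.

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> ready{false};

// Hypothetical spin-wait: yielding inside the loop lets the scheduler run other
// threads instead of letting this one burn a full core while it waits.
void wait_for_ready()
{
    while (!ready.load(std::memory_order_acquire))
    {
        std::this_thread::yield();   // give up the rest of the time slice
    }
}

int main()
{
    std::thread waiter(wait_for_ready);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    ready.store(true, std::memory_order_release);
    waiter.join();
    return 0;
}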
 Valgrind is a useful tool for detecting memory leaks.
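As a small example of the kind of defect it reports, the deliberately leaky sketch below (the function is made up for illustration) shows up as definitely lost when run under valgrind --leak-check=full.

#include <cstring>

char* make_copy(const char* text)
{
    char* buffer = new char[std::strlen(text) + 1];   // allocated here
    std::strcpy(buffer, text);
    return buffer;                                     // caller is expected to delete[] it
}

int main()
{
    make_copy("leaked");   // return value discarded, so the buffer can never be freed
    return 0;
}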
The GNU C library also comes with built-in functionality to help detect memory issues; however, it does not log the call stacks of the allocations it tracks. There are also static code analysis tools that can detect many code issues much earlier.
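As a sketch of that glibc facility, mtrace() from <mcheck.h> logs allocations to the file named by the MALLOC_TRACE environment variable, and the accompanying mtrace script then flags blocks that were never freed; as noted above, it records caller addresses rather than full call stacks.

#include <cstdlib>
#include <mcheck.h>

int main()
{
    mtrace();                       // start tracing malloc/free in this process
    void* p = std::malloc(128);     // intentionally never freed; shows up in the trace
    (void)p;
    muntrace();                     // stop tracing
    return 0;
}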
Event monitoring software can accelerate software development and test cycles. Event monitoring data is usually machine data generated by IT systems. Such data enables real-time searches to gain insights into user experience, and dashboards with charts can then help analyze it. This data can be accessed over TCP, UDP and HTTP, and it can also be warehoused for analysis. Issues that recur frequently can be documented and searched more quickly when such data is available, leading to faster debugging and problem solving. For example, the data can be queried to identify errors in the logs, which could then be addressed remotely.
Machine data is massive and generated in streams. Being able to quickly navigate that volume to find the most relevant information for triaging issues is a differentiating factor for event monitoring software. Early warning notifications, a rules engine and trend detection are some of the features that not only enable rapid development and test by providing feedback on deployed software but also increase customer satisfaction as code is incrementally built and released.
Data is available to be collected, indexed, searched and reported on. Applications can target specific interests such as security, or correlations for building rules and alerts. The data is also varied: it comes from the network, from applications and from enterprise infrastructure. Powerful querying increases the usability of such data. For example, while security data may inform about known threats, the ability to include non-security user and machine data may add insight into unknown threats. Queries could also cover automated anomaly and outlier detection that help with understanding advanced threats. Queries over such key-value data can be written using Pig commands such as load/read, store/write, foreach/iterate, filter/predicate, group/cogroup, collect, join, order, distinct, union, split, stream, dump and limit. The depth and breadth of possibilities with event monitoring data seem endless. As more data becomes available and richer, more powerful analytical techniques emerge, which will help arm developers and operations engineers to better address the needs of the organization. Some of the differentiators of such software include a single platform, fast return on investment, the ability to use different data collectors, the use of non-traditional flat-file data stores, the ability to create and modify reports, the ability to create baselines and study changes, programmability to retrieve information as appropriate, and the ability to include compliance, security, fraud detection and so on. If applications are able to build on the event monitoring software, it will be evident from the number of applications written against it.

Thursday, January 9, 2014

A greedy wildcard matcher, where '*' matches any run of characters and each '?' requires at least one character to be skipped between the surrounding literals:

// requires System, System.Linq and System.Diagnostics
bool IsMatch(string input, string pattern)
{
    if (string.IsNullOrWhiteSpace(input) || string.IsNullOrWhiteSpace(pattern)) return false;

    // the literal segments of the pattern, i.e. everything between the wildcards
    var constants = pattern.Split(new char[] { '*', '?' }, StringSplitOptions.RemoveEmptyEntries);

    // position of each literal segment inside the pattern
    int start = -1;
    var index = constants.Select(x => { int s = pattern.IndexOf(x, start + 1); start = s; return s; }).ToList();

    int prev = -1;                    // where the previous literal ended in the input
    string wildcards = string.Empty;  // wildcards between the previous literal and this one

    for (int i = 0; i < index.Count; i++)
    {
        // the literal must occur after the previous match
        start = input.IndexOf(constants[i], prev + 1);

        // skip at least one input character per '?' wildcard
        int required = wildcards.Count(x => x == '?');
        while (start != -1 && start - prev - 1 < required)
        {
            start = input.IndexOf(constants[i], start + 1);
        }
        if (start == -1) return false;

        prev = start + constants[i].Length - 1;

        // collect the wildcards that sit between this literal and the next one
        int wildcardsStart = index[i] + constants[i].Length;
        int wildcardsLen = (i + 1 < index.Count ? index[i + 1] : pattern.Length) - wildcardsStart;
        wildcards = pattern.Substring(wildcardsStart, wildcardsLen);

        Debug.Assert(wildcards.All(x => x == '?' || x == '*'));
    }
    return true;
}
For example, the input "ABCDBDXYZ" matches the pattern "A*B?D*Z".
 
This is a read from the blog post by Julian James. iOS continuous integration builds can be set up with HockeyApp, OS X Mavericks, OS X Server and Xcode. These are installed first, and Xcode remote repositories point to source control. BotRuns are stored in their own folder. The project scheme can be edited to include pre-actions and post-actions. Bots can be specified with a schedule and a target. OCMock and OCHamcrest can be used for unit testing. Completion of the Archive post-action signals the availability of the latest build. Instruments can then be run with a JavaScript file to test the UI. Then, when the bot is run, it can show the number of tests passed.
JavaScript for UI automation can make use of the iOS UI Automation library reference. This has an object model for all the UI elements and document navigation. Workflows are automated using the different methods on these objects, such as UIANavigationBar. The target is obtained with UIATarget, the application is referred to with UIAApplication, pages are available via UIAPageIndicator, and so on. A variety of controls are available via UIAButton, UIAElement, UIAPicker, UIAPickerWheel, UIAPopover, UIASearchBar, UIASecureTextField, UIATabBar, and UIAWebView. UI elements can be organized and accessed with UIATableCell, UIATableGroup and UIATableView to reach the individual cells. UIATabGroup allows navigation between tabs.

Wednesday, January 8, 2014

A recap of Barry Wise's five progressive steps to database normalization.
We start out with an example where we store a user's name, company, company address, and personal urls - say url1 and url2.
The zero form is when all of this is in a single table and no normalization has occurred.
The first normal form is achieved by
1) eliminating repeating groups in individual tables
2) creating a separate table for each set of data
3) identifying each set of related data with a primary key
This yields a table where the user information is repeated for each url, so the limitation on the number of url fields is solved.
The Second normal form is achieved by
1) Creating separate tables for a set of values that apply to multiple records
2) Relating these tables with a foreign key
Basically, we break the url values into a separate table so we can add more in the future
The third normal form is achieved by
1) eliminating fields that do not depend on the key
Company name and address have nothing to do with the user id, so they are broken off into their own table
The fourth and higher normal forms depend on data relationships involving one-to-one, one-to-many and many-to-many.
The Fourth normal form is
1) In a many-to-many relationship, independent entities cannot be stored in the same table.
To relate many users to many urls, we define a url_relations table where the user id and url id are paired.
The next normal form is the Fifth normal form, which suggests that
1) It must be possible to reconstruct the original table from the tables into which it has been broken down; this is a way to check that the decomposition loses no information and introduces nothing new.
As always, remember that denormalization has its benefits as well.
Also, Litt's tips additionally mention the following:
1) create a table for each list. More than likely every list will have additional information
2) create non-meaningful identifiers.
This is to make sure that business rule changes do not affect the primary identifier 
Some T-SQL queries
SELECT t.name as tour_name, COUNT(*)
FROM Upfall u INNER JOIN trip t
on u.id = t.stop
GROUP BY t.name
HAVING COUNT(*) > 6
Aggregate functions - AVG(), MAX(), MIN(), MEDIAN(), COUNT(), STDEV(), SUM(), VARIANCE() (availability varies by database engine)

--Summarizing rows with rollup
SELECT t.name AS tour_name, c.name AS county_name, COUNT(*) AS falls_count
FROM upfall u INNER JOIN trip t
ON u.id = t.stop
INNER JOIN county c ON u.county_id = c.id
GROUP BY t.name, c.name WITH ROLLUP

SELECT t.name AS tour_name,
c.name AS county_name,
COUNT(*) AS falls_count,
GROUPING(t.name) AS n1, -- test null from cube
GROUPING(c.name) AS n2 -- test null from cube
FROM upfall u INNER JOIN trip t
ON u.id = t.stop
INNER JOIN county c
ON u.county_id = c.id
WHERE t.name = 'Munising'
GROUP BY t.name, c.name WITH CUBE

--RECURSIVE QUERIES
WITH recursiveGov
(level, id, parent_id, name, type) AS
(SELECT 1, parent.id, parent.parent_id, parent.name, parent.type
FROM gov_unit parent
WHERE parent.parent_id IS NULL
UNION ALL
SELECT parent.level + 1, child.id, child.parent_id, child.name, child.type
FROM recursiveGov parent, gov_unit child
WHERE child.parent_id = parent.id)
SELECT level, id, parent_id, name, type
FROM recursiveGov

CREATE TABLE COUNTRY(
ID int IDENTITY (1,1),
NAME varchar(15) NOT NULL,
CODE varchar(2) DEFAULT 'CA'
CONSTRAINT code_not_null NOT NULL
CONSTRAINT code_check
CHECK (CODE IN ('CA', 'US')),
indexed_name varchar(15),
CONSTRAINT country_pk
PRIMARY KEY(ID),
CONSTRAINT country_fk01
FOREIGN KEY (NAME, CODE)
REFERENCES parent_example (name, country),
CONSTRAINT country_u01
UNIQUE(NAME, CODE),
CONSTRAINT country_index_upper
CHECK(indexed_name = UPPER(NAME))
);