Saturday, July 13, 2013

Reports, XSLT, and load test runs.

XSLT transformation enables test results to be displayed. This is how we prepare the data for display or for mailing out to subscribers, and the resulting XHTML is easy to share. First we get the results from a trx file or from a stored procedure execution against a database; this gives us the data as XML or a dataset. Then we create the XSLT with the summary we would like to see. Note that Visual Studio has a default summary view and results view that you can open from a load test run using the "Open and Manage Results" button on the toolbar. This already converts the summary and the results to HTML that can be cut and pasted into any application that supports object linking and embedding. The views we create with XSLT merely define a customized view, using headings, rows, and columns to summarize the data.
The results from a trx file or a stored procedure execution need not be wrapped in HTML. They can be converted to an XML or Excel file with a load test plug-in. The plug-in simply has an event handler that is invoked at the end of the relevant test execution, and it can be written in C#.
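As an illustration of the summarization step such a plug-in performs, here is a sketch in Python over a deliberately simplified results document; the element and attribute names below are placeholders, not the real .trx schema, which is richer and uses XML namespaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified results document standing in for a .trx file.
SAMPLE = """
<TestRun>
  <Results>
    <UnitTestResult testName="Login" outcome="Passed" duration="00:00:01.20"/>
    <UnitTestResult testName="Search" outcome="Failed" duration="00:00:02.05"/>
    <UnitTestResult testName="Checkout" outcome="Passed" duration="00:00:00.80"/>
  </Results>
</TestRun>
"""

def summarize(xml_text):
    """Count outcomes -- the kind of summary an end-of-run plug-in
    might export to CSV or Excel instead of HTML."""
    root = ET.fromstring(xml_text)
    return Counter(r.get("outcome") for r in root.iter("UnitTestResult"))

summary = summarize(SAMPLE)
print(summary)  # Counter({'Passed': 2, 'Failed': 1})
```

The same traversal could emit rows to a CSV writer or an Excel library instead of printing.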
Likewise, the XSLT transform and database mail can be driven from a SQL stored procedure, so that new additions of test runs can trigger database mail. This also scales well to enterprise loads, where the runs and the results can be stored in the order of gigabytes. It is easier to design the HTML and the transforms using tools such as Report Manager and Word before moving them inside a stored procedure.
Reports can be generated for all kinds of runs. For performance testing, these runs are usually the load test, the stress test, and the capacity test. The load test determines the throughput required to support the anticipated peak production load, the adequacy of a hardware environment, and the adequacy of a load balancer. It also detects functionality errors under load and collects data for scalability and capacity planning. However, it is not designed to measure speed of response. The stress test determines whether data can be corrupted by overstressing the system, provides an estimate of how far beyond the target load an application can go before causing failures and errors in addition to slowness, allows establishing application monitoring triggers to warn of impending failures, and helps to plan which kinds of failures are most valuable to plan for. The capacity test provides information about how the workload can be handled to meet the business requirements, provides actual data that capacity planners can use to validate or enhance their models or predictions, and determines the current usage and capacity of the existing system as well as trends to aid in capacity planning. Note that in practice the most frequently used test is the smoke test, which is the initial run of the performance test to see whether your application can perform its operations under normal load. For all these runs, report generation and subscription is much the same.

Friday, July 12, 2013

Publishing load test results.

In Visual Studio, when we open a load test, we see an "Open and Manage Results" option in the toolbar. This brings up a dialog box that lists the results associated with the load test. Each of these results can be selected and opened. Opening a result brings up the summary view by default. This view can then be cut and pasted into the body of an e-mail for reporting. Alternatively, it can be exported to a file on a file share.

SQL Server Reporting Services (SSRS) Report Manager provides great functionality for designing custom reports. These reports can draw data using SQL queries, and they can also be delivered by e-mail subscription.

Team Foundation Server enables automation of a performance test cycle. The steps involved in a performance test cycle are as follows:
1. Understand the process and compliance criteria
2. Understand the system and the project plan
3. Identify performance acceptance criteria
4. Plan performance-testing activities
5. Design tests
6. Configure the test environment
7. Implement the test design
8. Execute the work items
9. Report results and archive data
10. Modify the plan and gain approval for modifications
11. Return to activity 5
12. Prepare the final report.

The first step involves getting buy-in on the performance testing before it begins and complying with standards, if any. The second step is to determine the use case scenarios and their priority. The third step is to determine the requirements and goals for performance testing as established with stakeholders, project documentation, usability studies, and competitive analysis; the goals should be articulated in a measurable way and recorded somewhere. Next, add the performance-testing work items to the project plan and schedule them accordingly; this planning is required to line up the activities ahead of time. Designing performance tests involves identifying usage scenarios and user variances and generating test data. Tests are designed based on real operations and data to produce more credible results and enhance the value of performance testing, and they include component-level testing. Next, configure the environments using load-generation and application-monitoring tools and isolated network environments, and ensure compatibility, all of which takes time. Test designs are then implemented to simulate a single or virtual user. Next, the work items are executed in the order of their priority, and their results are evaluated, recorded, and communicated, with the test plan adapted as needed. Results are then reported and archived; even runs that may not all be usable are sometimes archived with appropriate labels. After each testing phase, it is important to review the performance test plan: mark the items that have been completed and evaluated, and submit them for approval. Repeat the iterations. Finally, prepare a report to be submitted to the relevant stakeholders for acceptance.

Thursday, July 11, 2013

Mailing data and database objects best practices:
Data can be attached to mail in several different file formats including Excel, RTF, CSV, text, and HTML.
Data can also be included in the body of the mail.
You would need access to MAPI or SMTP to send mail
When sending a data access page, share the database so that users can interact with the page.
Create the data access page using UNC paths so that they are not mapped to local drives
Store the database and the page on the same server.
Publish from a trusted intranet security zone
Send a pointer instead of a copy of the HTML source code
For intranet users, UNC paths and domains alleviate security considerations, while the same mechanisms can be used to demand permissions from external users.
Always send the page to yourself and view the code before mailing others.
System-generated mail for periodic activities or alerts is common practice in most workplaces. There are several layers from which such mail can be generated. SQL Server has Database Mail, whose sp_send_dbmail stored procedure can send messages through an SMTP server; it needs to be enabled via the 'Database Mail XPs' server configuration option. sp_send_dbmail can be invoked directly from stored procedures, which are very close to the data.
SSRS is another layer from which well-formed reports can be mailed out; these are designed and sent via SSRS subscriptions. TFS or source control is another place that can send mail, and automated performance reports can also be sent out this way.
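As a minimal sketch of the SMTP route mentioned above, an HTML report body can be assembled with Python's standard email and smtplib modules. The addresses and host names here are placeholder assumptions, not values from any real deployment.

```python
import smtplib
from email.message import EmailMessage

# Build a message with a plain-text fallback plus an HTML alternative,
# the usual shape for mailed-out report bodies.
msg = EmailMessage()
msg["Subject"] = "Nightly load test summary"
msg["From"] = "reports@example.com"        # placeholder sender
msg["To"] = "subscribers@example.com"      # placeholder recipient list
msg.set_content("Your mail client does not display HTML.")
msg.add_alternative(
    "<html><body><h1>Load test summary</h1>"
    "<p>Passed: 42, Failed: 1</p></body></html>",
    subtype="html",
)

# Actually sending requires a reachable SMTP server, e.g.:
# with smtplib.SMTP("smtp.example.com") as s:
#     s.send_message(msg)
```

The commented-out send is the only environment-dependent part; everything above it is pure message construction.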

Tuesday, July 9, 2013

REST API: resource versus API throttling

REST APIs should have cost associated with the resources rather than with the APIs, because there is no limit on the number of calls made per API. If response sizes can be reduced with an inline filter, that directly translates to savings for both the sender and the receiver.
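One way to charge for resources rather than for calls is a token bucket whose debit is proportional to the cost of each response (for example, its size). This is an illustrative sketch, not the API of any particular gateway; the capacity and refill numbers are arbitrary.

```python
import time

class CostThrottle:
    """Token bucket that debits by resource cost (e.g. response bytes)
    rather than a flat charge per API call."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = CostThrottle(capacity=1000, refill_per_sec=100)
print(bucket.allow(cost=900))  # large unfiltered response: allowed once
print(bucket.allow(cost=900))  # immediately again: rejected
print(bucket.allow(cost=50))   # small, filtered response still fits
```

Because the debit tracks response size, a client that applies an inline filter is charged less and can make more calls within the same budget.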

Some of the performance degradations occur due to:
premature optimization
guessing
caching everything
fighting the framework

Performance can be improved with
1) finding the target baseline
2) knowing the current state
3) profiling to find bottlenecks
4) removing bottlenecks
5) repeating the above
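The profiling step (3) above can be as simple as running Python's cProfile over a representative workload and reading the cumulative times; a small self-contained sketch with a deliberately planted bottleneck:

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # Deliberate bottleneck: repeated string concatenation.
    s = ""
    for i in range(n):
        s += str(i)
    return s

def fast_join(n):
    # The fix once the bottleneck is found.
    return "".join(str(i) for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_concat(5000)
fast_join(5000)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)  # the top cumulative entries point at the functions to optimize
```

Removing the bottleneck, re-measuring against the baseline, and re-profiling is the repeat loop the list describes.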

Request distribution per hour, the most requested resources, the HTTP statuses returned, request durations, failed requests, and so on all help with the analysis. Server logs can carry all of this information, and tools that parse the logs for it help. Process id and memory usage can be added directly to the server logs. Server-side and client-side performance metrics help to isolate issues.
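A log-parsing tool of the kind described can be a few lines. This sketch assumes a deliberately simplified "hour status duration_ms path" line format; real IIS or nginx logs need a proper parser for their actual fields.

```python
from collections import Counter

# Hypothetical log lines in a simplified format: hour, status, duration, path.
LOG = """\
10 200 120 /api/items
10 200 80 /api/items
10 500 30 /api/orders
11 200 200 /api/items
11 404 15 /api/missing
"""

requests_per_hour = Counter()
status_counts = Counter()
durations = []
for line in LOG.splitlines():
    hour, status, duration_ms, path = line.split()
    requests_per_hour[hour] += 1
    status_counts[status] += 1
    durations.append(int(duration_ms))

print(requests_per_hour)                # Counter({'10': 3, '11': 2})
print(status_counts)                    # Counter({'200': 3, '500': 1, '404': 1})
print(sum(durations) / len(durations))  # 89.0
```

The same counters extend naturally to most-requested paths and failed-request ratios.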

Benchmarks are available for performance testing of APIs. A CDN should not matter in performance measurements. Use a static-file response as the baseline. Separate out I/O-bound and CPU-bound processes.

Courtesy: Rails performance best practices

Monday, July 8, 2013

Full text and semantic extraction in SQL Server 2012

Here are some sample queries for semantic extraction of key phrases and document similarity in SQL Server 2012.
DECLARE @Title nvarchar(255)
DECLARE @DocID int  -- use the actual type of the Documents key column
SET @Title = 'TestDoc.docx'

SELECT @DocID = DocumentID
FROM Documents
WHERE DocumentTitle = @Title

-- Finds the key phrases in a document.
SELECT @Title AS Title, keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
ORDER BY score DESC

-- Finds similar documents.
SELECT @Title AS SourceTitle, DocumentTitle AS MatchedTitle,
DocumentID, score
FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID)
INNER JOIN Documents ON DocumentID = matched_document_key
ORDER BY score DESC

-- Finds the key phrases that make two documents similar or related.
-- @SourceDocID, @MatchedDocID, @SourceTitle, and @MatchedTitle are assumed
-- to be set the same way as @DocID and @Title above.
SELECT @SourceTitle AS SourceTitle, @MatchedTitle AS MatchedTitle, keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(Documents, DocumentContent, @SourceDocID, DocumentContent, @MatchedDocID)
ORDER BY score DESC

You can use FileTables to store documents in SQL Server. These are special tables built on top of FILESTREAM. A FileTable enables an application to access files and documents as if they were stored in the file system, without requiring any changes to the application.

You can enable semantic search on columns by using a semantic index.
To create a semantic index when there is no full-text index:

CREATE FULLTEXT CATALOG ft AS DEFAULT
GO

CREATE UNIQUE INDEX ui_ukDescription
ON MyTable.Description(DescriptionID)
GO

CREATE FULLTEXT INDEX ON MyTable.Description
(Description LANGUAGE 1033 STATISTICAL_SEMANTICS)
KEY INDEX ui_ukDescription
WITH STOPLIST = SYSTEM
GO


Or add semantic indexing to a column that already has a full-text index:
ALTER FULLTEXT INDEX ON MyTable.Description
    ALTER COLUMN Description
        ADD Statistical_Semantics
    WITH NO POPULATION
GO

Sunday, July 7, 2013

Application Partition for DNS

Application partitions are user-defined partitions that have a custom replication scope. Domain controllers can be configured to host any application partition irrespective of their domains, so long as they are in the same forest. This decouples the DNS data and its replication from the domain context. You can now configure Active Directory to replicate only the DNS data between the domain controllers running the DNS service within a domain or forest.
The two built-in DNS application partitions are DomainDnsZones and ForestDnsZones, and these default partitions for the domain and the forest are created automatically. The System folder is the root-level folder used to store DNS data.
Aging and scavenging: as DNS records build up, some entries become stale when clients have changed their names or have moved, and these are difficult to maintain as the number of hosts increases. Therefore a process called scavenging was introduced in the Microsoft DNS server; it scans all the records in a zone and removes those that have not been refreshed within a certain time period. When clients register themselves with dynamic DNS, their registrations are set to be renewed every 24 hours by default. Windows DNS stores this timestamp as an attribute of the DNS record, and the timestamp is used during scavenging. Manually created records have their timestamp set to zero, so they are excluded from scavenging.
The "no-refresh interval" scavenging configuration option is used to limit unnecessary replication, because it defines how often the DNS server will accept a DNS registration refresh and update the DNS record; that is, how often the DNS server will propagate a timestamp refresh from the client to the directory or file system. Another option, the refresh interval, specifies how long the DNS server must wait following a refresh before a record becomes eligible for scavenging; this is typically seven days.
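The two intervals combine: a record becomes eligible for scavenging only after both the no-refresh interval and the refresh interval have elapsed since its timestamp, and a zero timestamp (a manual record) exempts it entirely. A sketch of that arithmetic, using the common seven-day values as assumptions rather than mandated defaults:

```python
from datetime import datetime, timedelta

NO_REFRESH = timedelta(days=7)  # refreshes are not written during this window
REFRESH = timedelta(days=7)     # record may be refreshed, not yet scavengeable

def eligible_for_scavenging(record_timestamp, now):
    """A record with no timestamp (a manual entry, timestamp zero) is never
    scavenged; otherwise it is eligible once both intervals have elapsed."""
    if record_timestamp is None:  # stand-in for the zero timestamp
        return False
    return now >= record_timestamp + NO_REFRESH + REFRESH

now = datetime(2013, 7, 7)
print(eligible_for_scavenging(datetime(2013, 6, 1), now))  # stale: True
print(eligible_for_scavenging(datetime(2013, 7, 1), now))  # recent: False
print(eligible_for_scavenging(None, now))                  # manual: False
```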