Cluster computing

Thursday, July 11, 2013

Mailing data and database objects best practices:
Data can be attached to mail in several file formats, including Excel, RTF, CSV, text and HTML.
Data can also be included in the body of the mail.
You need access to MAPI or SMTP to send mail.
When sending a data access page, share the database so that users can interact with the page.
Create the data access page using UNC paths so that it does not depend on locally mapped drives.
Store the database and the page on the same server.
Publish from a trusted intranet security zone.
Send a pointer to the page instead of a copy of the HTML source code.
For intranet users, UNC paths and domain accounts ease security concerns, and the same mechanisms can be used to require permissions from external users.
Always send the page to yourself and view the code before mailing others.
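
The attachment and body options in the list above can be sketched in a few lines of Python. This is only an illustration: the function names, the report.csv file name and the smtp.example.com host are assumptions, not part of any particular mail setup.

```python
import csv
import io
import smtplib
from email.message import EmailMessage


def build_report_mail(sender, recipient, subject, rows):
    """Build a mail that carries `rows` in the body and as a CSV attachment."""
    # Render the rows as CSV text in memory.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    csv_text = buffer.getvalue()

    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    # Small result sets can simply go in the body of the mail.
    message.set_content("Report data:\n\n" + csv_text)
    # The same data attached as a text/csv file (file name is hypothetical).
    message.add_attachment(csv_text, subtype="csv", filename="report.csv")
    return message


def send_mail(message, host="smtp.example.com", port=25):
    """Hand the message to an SMTP server (the host is a placeholder)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)
```

Following the last item in the list, build_report_mail can be called with your own address as the recipient first, so the result can be inspected before it goes to others.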

System-generated mails for periodic activities or alerts are common practice in most workplaces. There are several layers from which such mails can be generated. SQL Server has an extended stored procedure (xp) called xp_sendmail for sending messages; it works through MAPI, while the newer Database Mail procedure sp_send_dbmail sends messages to an SMTP server. Mail needs to be enabled via server configuration. The procedure can be invoked directly from stored procedures, which are very close to the data.
SSRS is another layer from which well-formed reports can be mailed out; these are designed in and sent from SSRS. TFS or another source control system is another place that can send mail. Automated performance reports can also be sent out this way.

Tuesday, July 9, 2013

REST API : resource versus API throttling

REST APIs should associate cost with the resources rather than with the API calls, because there is no limit on the number of calls made per API. If response sizes can be reduced with an inline filter, that translates directly into savings for both the sender and the receiver.
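
One way to attach cost to resources is a token bucket in which each resource charges its own number of units. The sketch below is an assumption about how that could look, not a prescribed design; the class name, the cost table and the injectable clock are all hypothetical.

```python
import time


class ResourceBudget:
    """Token bucket in which each resource has its own cost."""

    def __init__(self, capacity, refill_per_second, costs, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.costs = costs  # resource name -> cost units (hypothetical values)
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

    def allow(self, resource, item_count=1):
        """Charge cost * item_count; return True if the budget covers it."""
        self._refill()
        charge = self.costs.get(resource, 1) * item_count
        if charge > self.tokens:
            return False
        self.tokens -= charge
        return True
```

With a cost table such as {"report": 5, "ping": 1}, a client that filters its requests down to fewer or cheaper resources spends less of its budget, which is the saving the paragraph above describes.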

Some performance degradations are caused by:
premature optimization
guessing
caching everything
fighting the framework

Performance can be improved by:
1) finding the target baseline
2) knowing the current state
3) profiling to find bottlenecks
4) removing bottlenecks
5) repeating the above

Request distribution per hour, most requested resources, HTTP statuses returned, request duration, failed requests and so on all help with the analysis. Server logs can carry all this information, and tools that parse the logs for it could help. Process ID and memory usage can be added directly to the server logs. Server-side and client-side performance metrics help to isolate issues.
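
A small log parser is enough to get these numbers. The sketch below assumes a hypothetical log layout of "timestamp method path status seconds" per line and treats 5xx statuses as failed requests; real server logs will need their own parsing.

```python
from collections import Counter, defaultdict
from datetime import datetime


def summarize(log_lines):
    """Summarize request logs in the assumed 5-field layout."""
    per_hour = Counter()
    most_requested = Counter()
    statuses = Counter()
    durations = defaultdict(list)
    for line in log_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip lines that do not match the assumed layout
        stamp, _method, path, status, seconds = fields
        hour = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").hour
        per_hour[hour] += 1
        most_requested[path] += 1
        statuses[status] += 1
        durations[path].append(float(seconds))
    # Assumption: a request counts as failed when its status is 5xx.
    failed = sum(n for status, n in statuses.items() if status.startswith("5"))
    return {
        "per_hour": dict(per_hour),
        "most_requested": most_requested.most_common(3),
        "statuses": dict(statuses),
        "failed": failed,
        "mean_duration": {p: sum(v) / len(v) for p, v in durations.items()},
    }
```

Passing an open log file to summarize works as well, since it only iterates over lines.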

Benchmarks are available for performance testing of APIs. A CDN should not matter in performance measurements. Use a static file response as the baseline. Separate I/O-bound and CPU-bound processes.

Courtesy: Rails performance best practices

Monday, July 8, 2013

Full text and semantic extraction in SQL Server 2012

Here are some sample queries for semantic extraction of keyphrases in SQL Server 2012.
-- Assumed declarations; adjust the types to match the Documents table.
DECLARE @Title NVARCHAR(260), @DocID INT

SET @Title = 'TestDoc.docx'

SELECT @DocID = DocumentID
FROM Documents
WHERE DocumentTitle = @Title

-- Finds the keyphrases in a document.
SELECT @Title AS Title, keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
ORDER BY score DESC

-- Finds similar documents.
SELECT @Title AS SourceTitle, DocumentTitle AS MatchedTitle,
DocumentID, score
FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID)
INNER JOIN Documents ON DocumentID = matched_document_key
ORDER BY score DESC

-- Finds keyphrases that make documents similar or related.
-- @SourceTitle, @MatchedTitle, @SourceDocID and @MatchedDocID are assumed to be
-- declared and set in the same way as @Title and @DocID above.
SELECT @SourceTitle AS SourceTitle, @MatchedTitle AS MatchedTitle, keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(Documents, DocumentContent, @SourceDocID, DocumentContent, @MatchedDocID)
ORDER BY score DESC

You can use FileTables to store documents in SQL Server. These are special tables built on top of FILESTREAM.
A FileTable enables applications to access files and documents as if they were stored in the file system, without
requiring any changes to the application.

You can enable semantic search on columns by using a semantic index.
To create a semantic index when there is no full-text index:

CREATE FULLTEXT CATALOG ft as DEFAULT
GO

CREATE UNIQUE INDEX ui_ukDescription
ON MyTable.Description(DescriptionID)
GO

-- KEY INDEX takes the name of the unique index created above, not a column name.
CREATE FULLTEXT INDEX ON MyTable.Description
(Description LANGUAGE 1033 STATISTICAL_SEMANTICS)
KEY INDEX ui_ukDescription
WITH STOPLIST = SYSTEM
GO

Or, to add semantic indexing to a column that already has a full-text index:
ALTER FULLTEXT INDEX ON MyTable.Description
ALTER COLUMN Description
ADD Statistical_Semantics
WITH NO POPULATION
GO

Sunday, July 7, 2013

Application Partition for DNS

Application partitions are user-defined partitions that have a custom replication scope. Domain controllers can be configured to host any application partition irrespective of their domain, so long as they are in the same forest. This decouples the DNS data and its replication from the domain context. You can configure AD to replicate only the DNS data between the domain controllers running the DNS service within a domain or forest.
The default application partitions are DomainDnsZones and ForestDnsZones, which are created automatically for the domain and the forest. The System folder is the root-level folder for storing DNS data.
Aging and scavenging
As DNS records build up, some entries become stale because clients have changed their names or moved. These are difficult to maintain as the number of hosts increases. The Microsoft DNS server therefore provides a process called scavenging, which scans all the records in a zone and removes those that have not been refreshed within a certain time period. When clients register themselves with dynamic DNS, their registrations are set to be renewed every 24 hours by default. Windows DNS stores the time of the registration as a timestamp attribute of the DNS record, and scavenging uses it. Manually created records have their timestamps set to zero, so they are excluded from scavenging.
The "no-refresh interval" scavenging option limits unnecessary replication: it defines how often the DNS server will accept a DNS registration refresh and update the DNS record. This is how often the DNS server propagates a timestamp refresh from the client to the directory or file system. Another option, the refresh interval, specifies how long the DNS server must wait following a refresh before a record becomes eligible for scavenging; this is typically seven days.
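
The two intervals can be sketched as simple date arithmetic. The code below assumes the common reading in which the refresh interval starts once the no-refresh interval has ended, uses seven days for both, and represents a manual record (timestamp of zero) as None; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed settings: seven days for each interval.
NO_REFRESH_INTERVAL = timedelta(days=7)
REFRESH_INTERVAL = timedelta(days=7)


def accepts_refresh(timestamp, now, no_refresh=NO_REFRESH_INTERVAL):
    """True once the no-refresh interval after the last timestamp has passed."""
    return now >= timestamp + no_refresh


def is_stale(timestamp, now, no_refresh=NO_REFRESH_INTERVAL, refresh=REFRESH_INTERVAL):
    """True if a record is eligible for scavenging at time `now`."""
    if timestamp is None:
        return False  # manual record: excluded from scavenging
    return now > timestamp + no_refresh + refresh
```

Under these assumptions, a record stamped on July 1 ignores refreshes until July 8 and becomes eligible for scavenging after July 15 if no refresh arrives.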

Friday, July 5, 2013

Active directory conditional forwarding

Active Directory has a feature whereby one or more IP addresses can be specified to which name resolution queries not handled by the local DNS server are forwarded. The conditional forwarder definitions are also replicated via Active Directory. Together with the forward and reverse lookup zones in Active Directory, these can be set via the DNS MMC management console. DNS servers are usually primary or secondary in nature. The primary stores all the records of the zone, and the secondary gets the contents of its zone from the primary. Updates can be pushed from the primary to the secondary, or the secondary may pull them periodically or on demand. All updates have to be made on the primary. Either type of server can resolve name queries that come from hosts for its zones. The contents of the zone file can also be stored in Active Directory in a hierarchical structure. The DNS structure can then be replicated among all DCs of the domain, and each DC holds a writable copy of the DNS data. When DNS is integrated with Active Directory, the DNS objects stored in the directory can be updated on any DC via LDAP operations, or through DDNS against DCs that act as DNS servers.
The DNS "island" issue sometimes occurs due to improper configuration. AD requires proper DNS resolution to replicate changes, and when using integrated DNS, the DC replicates DNS changes through AD replication. This is the classic chicken-and-egg problem. If a DC configured as a name server points to itself and its IP address changes, the DNS records will be updated successfully locally, but other DCs cannot resolve this DC's IP address unless they point to it. This causes replication to fail and effectively renders the DC with the changed IP address an island to itself. This can be avoided when the forest root domain controllers that are the name servers are configured to point at root servers other than themselves.
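
The selection of a forwarder for a query can be sketched as a suffix lookup. This is a simplified model rather than the DNS server's actual implementation: the function name and the table layout (lowercase domain suffix mapped to a list of forwarder IP addresses) are assumptions, and the most specific matching suffix is taken to win.

```python
def pick_forwarders(query_name, conditional_forwarders, default_forwarders=()):
    """Return the forwarder IPs for a query name.

    `conditional_forwarders` maps a lowercase domain suffix without a
    trailing dot to a list of IP addresses (hypothetical layout).
    """
    labels = query_name.rstrip(".").lower().split(".")
    # Try the most specific suffix first, then progressively shorter ones.
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in conditional_forwarders:
            return list(conditional_forwarders[suffix])
    # No conditional forwarder matched: fall back to the default forwarders.
    return list(default_forwarders)
```

For example, with entries for both corp.example.com and example.com, a query for host.corp.example.com goes to the forwarders listed for corp.example.com.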

Wednesday, July 3, 2013

Active Directory delegation options

You can set which servers are authoritative for the Active Directory-related zones. These could be the DNS servers or the domain controllers. A straightforward option is to delegate DNS namespaces to domain controllers and allow them to host the DNS zones. The decision depends on the push and pull between the AD and DNS teams, the initial setup and configuration of the zones, the support and maintenance of the zones, and integration issues with existing administration software and practices. The first factor is about the autonomy and management of the records when they are on existing DNS servers as opposed to the domain controllers. The initial population of the AD resource records can be burdensome. DNS servers may need to be configured to allow the domain controllers to perform DDNS updates. DNS administrators will need to configure DDNS so that domain controllers can update only certain zones, in order to mitigate the security risk of allowing domain controllers to update any DNS record on the server. Support and maintenance are minimal with DDNS. By delegating the AD DNS zones, clients can still point to the same DNS servers as before, so integration is easier.
Standalone Active Directory is useful for creating isolated test or lab networks. To set up such an environment, the DNS service is installed on a DC in the forest, the DNS zones for the AD domains are added, and the DNS server is then configured to forward unresolved queries to one or more of the existing corporate DNS servers. The primary DNS server for all clients in the forest is pointed to the DC.
Active Directory-integrated DNS zones are used when the AD DNS zones are hosted on domain controllers. DNS servers are usually primary or secondary. The primary server holds the data for the zone in a file on the host and reads the entries from there. There is usually only one primary server. A secondary server gets the contents of its zone from the primary, which is authoritative for the zone. The contents of the secondary's file are then updated periodically.
Background loading of DNS zones is a relatively new feature. Previously, the DNS Server service would not become available until it had finished loading all of the zones it hosted from AD, which can take quite some time. The DNS server no longer waits until every zone is loaded; instead it loads the zones in the background and makes each zone available for queries and updates.