Cluster computing

Monday, July 8, 2013

Full text and semantic extraction in SQL Server 2012

Here are some sample queries for semantic extraction of keyphrases in SQL Server 2012.
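-- Assumed declarations; match @DocID to the type of the DocumentID key column.
DECLARE @Title NVARCHAR(255), @DocID INT
SET @Title = 'TestDoc.docx'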

SELECT @DocID = DocumentID
FROM Documents
WHERE DocumentTitle = @Title

-- Finds the keyphrases in a document.
SELECT @Title AS Title, keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
ORDER BY score DESC

-- Finds similar documents.
SELECT @Title as SourceTitle, DocumentTitle as MatchedTitle,
DocumentID, score
FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID)
INNER JOIN Documents ON DocumentID = matched_document_key
ORDER BY score DESC

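-- Finds keyphrases that make documents similar or related.
-- @SourceTitle, @MatchedTitle, @SourceDocID and @MatchedDocID are assumed to be declared and set in the same way as @Title and @DocID.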
SELECT @SourceTitle as SourceTitle, @MatchedTitle as MatchedTitle, keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(Documents, DocumentContent, @SourceDocID, DocumentContent, @MatchedDocID)
ORDER BY score DESC

You can use FileTables to store documents in SQL Server. These are special tables built on top of FILESTREAM.
A FileTable enables an application to access files and documents as if they were stored in the file system, without
requiring any changes to the application.

You can enable semantic search on columns by using a semantic index.
To create a semantic index when there is no full-text index:

CREATE FULLTEXT CATALOG ft as DEFAULT
GO

CREATE UNIQUE INDEX ui_ukDescription
ON MyTable.Description(DescriptionID)
GO

CREATE FULLTEXT INDEX ON MyTable.Description
(Description LANGUAGE 1033 STATISTICAL_SEMANTICS)
KEY INDEX ui_ukDescription
WITH STOPLIST = SYSTEM
GO

Or, to add semantic indexing to a table that already has a full-text index:
ALTER FULLTEXT INDEX ON MyTable.Description
ALTER COLUMN Description
ADD Statistical_Semantics
WITH NO POPULATION
GO

Sunday, July 7, 2013

Application Partition for DNS

Application partitions are user-defined partitions that have a custom replication scope. Domain controllers can be configured to host any application partition irrespective of their domain, so long as they are in the same forest. This decouples the DNS data and its replication from the domain context. You can now configure AD to replicate only the DNS data between the domain controllers running the DNS service within a domain or forest.
Besides custom partitions, there are two default ones, DomainDnsZones and ForestDnsZones, which are created automatically for the domain and the forest. The System folder is the root-level folder used to store DNS data.
Aging and scavenging
As DNS records build up, some of the entries become stale because clients have changed their names or have moved. These are difficult to maintain as the number of hosts increases. The Microsoft DNS server therefore has a process called scavenging that scans all the records in a zone and removes those that have not been refreshed within a certain time period. When clients register themselves with dynamic DNS, their registrations are set to be renewed every 24 hours by default. Windows DNS stores the time of the refresh as a timestamp attribute of the DNS record and uses it for scavenging. Manual record entries have their timestamps set to zero, so they are excluded from scavenging.
The "no-refresh interval" scavenging option is used to limit unnecessary replication: it defines how long after a refresh the DNS server declines further timestamp-only refreshes, and therefore how often a client's timestamp refresh is propagated to the directory or file system. Another option, the "refresh interval", specifies how long after the no-refresh interval ends the server waits for a refresh before the record becomes eligible for scavenging. Each interval is typically seven days.
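The interplay of the two intervals can be sketched in a few lines of Python. This is only an illustration, not the DNS server's actual code: the function names are made up for this post, both intervals are assumed to be seven days, and a record is assumed to become scavengeable once the no-refresh interval plus the refresh interval have passed since its last timestamp.

```python
from datetime import datetime, timedelta

# Assumed values for this sketch: both intervals are seven days.
NO_REFRESH = timedelta(days=7)
REFRESH = timedelta(days=7)

def accepts_refresh(timestamp, now, no_refresh=NO_REFRESH):
    """True once the no-refresh interval has passed, so a timestamp-only
    refresh from the client would be written (and replicated)."""
    return now >= timestamp + no_refresh

def is_scavengeable(timestamp, now, no_refresh=NO_REFRESH, refresh=REFRESH):
    """True when the record was not refreshed during its refresh window.
    A timestamp of None stands for a manual (zero-timestamp) record."""
    if timestamp is None:
        return False
    return now >= timestamp + no_refresh + refresh

stamp = datetime(2013, 7, 1)
print(accepts_refresh(stamp, datetime(2013, 7, 5)))   # False: still in the no-refresh window
print(is_scavengeable(stamp, datetime(2013, 7, 10)))  # False: refresh window still open
print(is_scavengeable(stamp, datetime(2013, 7, 16)))  # True: both intervals have elapsed
```

Passing None as the timestamp models a manually created record, which is never scavenged.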

Friday, July 5, 2013

Active directory conditional forwarding

Active Directory has a feature whereby one or more IP addresses can be specified to which name resolutions not handled by the local DNS server are forwarded. The conditional forwarder definitions are also replicated via Active Directory. Together with the forward and reverse lookup zones in Active Directory, these can be set via the DNS MMC management console. DNS servers are usually primary or secondary in nature. The primary stores all the records of the zone, and the secondary gets the contents of its zone from the primary. Updates can flow from the primary to the secondary, or the secondary may pull the updates periodically or on demand. All updates have to be made on the primary. Either type of server can resolve name queries that come from hosts for the zones. The contents of the zone file can also be stored in Active Directory in a hierarchical structure. The DNS structure can then be replicated among all DCs of the domain; each DC holds a writeable copy of the DNS data. When DNS is integrated with Active Directory, the DNS objects stored there can be updated on any DC via LDAP operations, or through DDNS against DCs that act as DNS servers.
The DNS "island" issue sometimes occurs due to improper configuration. AD requires proper DNS resolution to replicate changes, and when using integrated DNS, the DC replicates DNS changes through AD replication. This is the classic chicken-and-egg problem. If a DC configured as a name server points to itself and its IP address changes, the DNS records will be updated successfully on that DC, but other DCs cannot resolve this DC's new IP address unless they point to it. This causes replication to fail and effectively renders the DC with the changed IP address an island unto itself. This can be avoided when the forest root domain controllers that are name servers are configured to point at root servers other than themselves.
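To make the forwarding decision concrete, here is a small Python sketch of how a server might pick forwarders for a query. It is a simplification under stated assumptions: the function name, the domain names and the addresses are hypothetical, and the rule that the longest matching domain suffix wins is how conditional forwarding is generally described rather than something taken from the server's code.

```python
def pick_forwarders(query_name, conditional, default):
    """Return the forwarder IPs for the longest matching domain suffix,
    or the default forwarders when no conditional entry matches."""
    labels = query_name.rstrip(".").lower().split(".")
    # Walk from the full name down to the last label, so the first hit
    # is the most specific (longest) suffix.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in conditional:
            return conditional[suffix]
    return default

# Hypothetical forwarder table; addresses come from documentation ranges.
conditional = {
    "partner.example": ["192.0.2.10"],
    "lab.partner.example": ["192.0.2.20"],
}
default = ["198.51.100.1"]

print(pick_forwarders("host.lab.partner.example", conditional, default))
print(pick_forwarders("www.partner.example.", conditional, default))
print(pick_forwarders("www.other.example", conditional, default))
```

The three calls select the lab forwarder, the partner forwarder and the default forwarder respectively.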
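A quick way to spot the risky configuration is to check whether a DC's DNS client settings list only the DC itself. The Python sketch below assumes you have already collected the DC's own addresses and its configured DNS servers by some other means; the function name and the sample addresses are made up for illustration.

```python
def points_only_at_itself(own_addresses, dns_client_servers):
    """True when every DNS server configured on the DC is the DC itself
    (including loopback), which is the setup prone to becoming an island."""
    own = set(own_addresses) | {"127.0.0.1", "::1"}
    return bool(dns_client_servers) and all(s in own for s in dns_client_servers)

# Hypothetical DC at 192.0.2.5.
print(points_only_at_itself(["192.0.2.5"], ["192.0.2.5", "127.0.0.1"]))  # True
print(points_only_at_itself(["192.0.2.5"], ["192.0.2.6", "192.0.2.5"]))  # False
```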

Wednesday, July 3, 2013

Active Directory delegation options

You can set which servers are authoritative for the Active Directory-related zones. These could be the existing DNS servers or the domain controllers. A straightforward option is to delegate the DNS namespaces to the domain controllers and allow them to host the DNS zones. The decision depends on the push and pull between the AD and DNS teams, the initial setup and configuration of the zones, the support and maintenance of the zones, and integration issues with existing administration software and practices. The first factor is about the autonomy and management of the records when they live on existing DNS servers as opposed to the domain controllers. The initial population of the AD resource records can be burdensome. DNS servers may need to be configured to allow the domain controllers to perform DDNS updates. DNS administrators will need to configure DDNS so that domain controllers can update only certain zones, in order to mitigate the security risk of allowing domain controllers to update any DNS record on the server. Support and maintenance are minimal with DDNS. When the AD DNS zones are delegated, clients can still point to the same DNS servers as before, so integration is easier.
A standalone Active Directory is useful for creating isolated test or lab networks. To set up such an environment, the DNS service is installed on a DC in the forest, the DNS zones for the AD domains are added, and then the DNS server is configured to forward unresolved queries to one or more of the existing corporate DNS servers. The primary DNS server for all clients in the forest is pointed to the DC.
Active Directory-integrated DNS zones are used when the AD DNS zones are hosted on DCs. DNS servers are usually primary or secondary. The primary server holds the data for the zone in a file on the host and reads the entries from there. There is usually only one primary server. A secondary server gets the contents of its zone from the primary, which is authoritative for the zone. The contents of the secondary's file are then updated periodically.
Background loading of DNS zones is a relatively new feature. Prior to that, the DNS Server service would not become available until it had finished loading all of the zones it hosted from AD, which can take quite some time. The DNS server now no longer waits until every zone is loaded; instead it loads them in the background and makes each zone available for queries and updates as it is loaded.

Resource records used by Active Directory

When a DC is promoted into a domain, the default resource records are populated in the netlogon.dns file under the system root directory. The records in that file are laid out as follows. The first record is for the domain itself and lists the name of the domain, the type of the record, the IP address and the weight. Each DC attempts to register an A record for its IP address for the domain it is in, similar to this record. Next comes an alias, or canonical name (CNAME), record. It is composed of the GUID for the server, which is an alias for the server itself. DCs use this record if they know the GUID of a server and want to determine its IP address. If the DC is a global catalog server, there is another A record. The remaining records are of type SRV, which specifies the location of servers that should be used for specific protocols. These records allow you to remap the port numbers for individual protocols or the priority in which certain servers are used.
Sites that do not have a domain controller located within them can be covered by domain controllers in other sites that have site links defined. This is called automatic site coverage. The DC adds site-specific records for each site it covers, so that it can handle queries for clients in that site. To see the list of sites covered by a particular DC, the NLTest command can be run. Automatic site coverage can be toggled on or off with a registry value on the domain controllers.
These records can be queried for information such as:
all the global catalogs in a forest or particular site
all Kerberos servers in a domain or a particular site
all domain controllers in a domain or a particular site
the PDC emulator for a domain.
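As a rough illustration of the list above, the following Python sketch builds the SRV record names a client would query for each case. The naming pattern is the DC locator convention as I understand it and is not spelled out in this post; the function name, the example domain and the site name are hypothetical.

```python
def locator_srv_names(domain, forest, site=None):
    """Build the SRV names for the lookups listed above.
    When a site is given, the site-specific names are returned instead."""
    site_part = f"{site}._sites." if site else ""
    return {
        "global_catalogs": f"_ldap._tcp.{site_part}gc._msdcs.{forest}",
        "kerberos_servers": f"_kerberos._tcp.{site_part}{domain}",
        "domain_controllers": f"_ldap._tcp.{site_part}dc._msdcs.{domain}",
        # The PDC emulator is per domain, so it has no site-specific form.
        "pdc_emulator": f"_ldap._tcp.pdc._msdcs.{domain}",
    }

for purpose, name in locator_srv_names("corp.example", "corp.example", site="Seattle").items():
    print(purpose, name)
```

Each name can then be looked up with an SRV query, for example with nslookup after setting the query type to SRV.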
For domain controllers that should be dedicated to an application like Microsoft Exchange and should therefore not publish certain records, there are two options for controlling SRV record registration: the DnsAvoidRegisterRecords registry entry can be used, or the Net Logon settings in the administrative templates of Group Policy can be applied to the domain controllers.

Tuesday, July 2, 2013

NDIS drivers

Protocol drivers write packets onto the wire using a network adapter. Network adapter vendors write proprietary drivers for their hardware, and there can be a large number of these. So that protocol drivers do not need to know the nuances of every network adapter, Windows provides the Network Driver Interface Specification (NDIS). Network adapter drivers are in turn expected to conform to NDIS; those that do are called NDIS miniport drivers.
The NDIS library implements the boundary that exists between the NDIS drivers and the Transport Driver Interface (TDI) transports.
The NDIS library helps NDIS driver clients format the commands they send to NDIS drivers. NDIS drivers interface with the library to receive requests and send back responses. So NDIS IRPs are intercepted by the library at the NDIS protocol interface and forwarded to the NDIS intermediate driver and then to the NDIS miniport driver, before being sent to the hardware abstraction layer (HAL).
The NDIS library was designed not just to provide NDIS boundary helper routines but also to provide an entire execution environment, so that the driver code can be moved between client and server. So NDIS drivers do not accept and process IRPs themselves; instead the library translates IRPs into calls into the NDIS driver. NDIS drivers do not have to handle re-entrancy, as the library guarantees that a request will be allowed to complete before new requests are issued. This lets the NDIS driver avoid synchronization, which grows complex on multiprocessors.
On the other hand, this serialization can hamper scalability, so in subsequent versions drivers can indicate to the NDIS library that they do not want to be serialized. In such cases the NDIS library forwards requests as fast as the IRPs arrive, and the NDIS driver is expected to queue and manage multiple simultaneous requests. Other features include reporting whether the network medium is active. TCP/IP task offloading allows tasks such as packet checksums and IPsec to be offloaded to the network adapter hardware behind a miniport. Fast packet forwarding allows incoming packets that are not destined for the host to be forwarded without processing them. Wake-on-LAN introduces power management capabilities. Connection-oriented NDIS allows NDIS drivers to manage connection-oriented media. The functions on the interfaces used by NDIS drivers to interface with the network adapter hardware translate directly to corresponding functions in the HAL.

Monday, July 1, 2013

Dynamic DNS

DDNS is a method for clients to send requests to a DNS server to add or delete resource records in a zone. Prior to DDNS, records were updated either directly in a text-based zone file or via a vendor-supported GUI, such as the Windows DNS MMC snap-in. Active Directory takes full advantage of DDNS to ease the maintenance of resource records.
DNSSec was introduced to secure dynamic updates using public key-based methods. Microsoft's approach to providing secure dynamic updates is to use access control lists in AD, since the zones store their DNS data in AD. By default, authenticated computers in a forest can make new entries in a zone. This enables an authenticated user or computer to add personal computers to the network directly.
The GlobalNames zone (GNZ) was introduced to ease migration from WINS. WINS uses short names, as opposed to DNS, which uses hierarchical names. DNS does support short names through the DNS suffix search order on clients: the DNS resolver on the client attempts to resolve the short name by appending each defined DNS suffix, one at a time, in the order listed. In a large organization with numerous DNS namespaces, this list of suffixes can be quite long. Since such lookups can be time-consuming, the list is difficult to maintain, and short-name resolution causes a significant increase in network traffic, the GlobalNames zone was introduced in Windows Server 2008. GNZ supports short-name resolution without requiring a suffix search list on the client. Any client that supports DNS resolution can use the GlobalNames zone functionality without additional configuration. A Windows Server 2008 DNS server will first try to resolve the queried name in the local zone and, if that fails, will then try to resolve it in the GlobalNames zone. The caveat is that the names are registered statically rather than dynamically, so the zone needs to be maintained by hand. GNZ is useful for IPv6 deployments. CNAME records are placed in the GlobalNames zone and aliased to the records for the specific server name in the relevant forward lookup zone.
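The cost of suffix searching is easy to see in a small Python sketch. It is only a model of the behaviour described above: the function names are invented, the lookup is a plain dictionary standing in for a real DNS query, and the names and address are hypothetical.

```python
def candidate_names(short_name, suffixes):
    """List the FQDNs a resolver would try, in order, for a single-label name."""
    return [f"{short_name}.{suffix}" for suffix in suffixes]

def resolve_short_name(short_name, suffixes, lookup):
    """Try each candidate in turn; return (fqdn, address, queries_sent)."""
    for attempts, fqdn in enumerate(candidate_names(short_name, suffixes), start=1):
        address = lookup(fqdn)
        if address is not None:
            return fqdn, address, attempts
    return None, None, len(suffixes)

# Stand-in for DNS: a dictionary of known records (hypothetical data).
records = {"intranet.corp.example": "192.0.2.50"}
suffixes = ["emea.corp.example", "apac.corp.example", "corp.example"]

print(resolve_short_name("intranet", suffixes, records.get))
# ('intranet.corp.example', '192.0.2.50', 3)
```

The name is found only on the third query, and a name that does not exist costs one query per suffix. Avoiding that per-suffix cost is the motivation for a single zone of short names.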