Monday, January 31, 2022

Sovereign clouds continued…

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on sovereign clouds. This article talks about Government Community Cloud.

The difference between the Commercial, GCC, and GCC High Microsoft 365 environments is important for correctly aligning with the compliance needs of a business. Commercial Microsoft 365 is the standard Microsoft 365 cloud used by enterprise, academic, and even home Office 365 tenants. It has the most features and tools, global availability, and the lowest prices. Since it is the default choice among the clouds, everyone qualifies and there are no validations. Some security and compliance requirements can be met here using tools like Enterprise Mobility and Security, Intune, Compliance Center, Cloud App Security, Azure Information Protection, and the Advanced Threat Protection tools. Some compliance frameworks can also be met in the commercial cloud, including HIPAA, NIST 800-53, PCI-DSS, GDPR, and CCPA, but not government or defense compliance, because the cloud shares a global infrastructure and workforce. Even some FedRAMP government compliance can be met in the commercial cloud, but it will have to be heavily augmented with existing tools and will require finding and patching gaps.

The Government Community Cloud (GCC) is a government-focused copy of the commercial environment. It has many of the same features as the commercial cloud, but its datacenters are within the continental United States. Compliance frameworks that can be met in GCC include DFARS 252.204-7012, DoD SRG Level 2, FBI CJIS, and FedRAMP High. It is still insufficient for ITAR, EAR, Controlled Unclassified Information, and Controlled Defense Information handling because the identity component and network that GCC resides on are part of Azure Commercial and the workforce is not restricted to US citizens. That said, GCC does have additional employee background checks, such as verification of US citizenship, verification of seven-year employment history, verification of highest degree attained, a seven-year criminal record check, validation against the Department of the Treasury list of groups, the Department of Commerce list of individuals, and the Department of State list, and a criminal history and fingerprint background check.

The DoD cloud kicks it up a notch and is only usable for Department of Defense purposes and by federal contractors who meet its stringent cybersecurity and compliance requirements. GCC High is a copy of the DoD cloud, but it exists in its own sovereign environment. GCC High does not have feature parity with the commercial cloud, but it does support calling and audio conferencing. Features are added to the GCC High cloud only when they have passed the federal approval process, when a dedicated staff is available that has passed the DoD IT-2 adjudication, and when the features do not have an inherent design that defeats the purpose of this cloud.

Applications can continue to use modern authentication in the Azure Government cloud but not in GCC High. The identity authority can be Azure AD Public or Azure AD Government.


Sunday, January 30, 2022

Sovereign clouds


This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on cloud protection. This article talks about sovereign clouds.  

Public clouds are general-purpose compute for all industries and commerce. Most of the service portfolio from the public cloud providers is made available in the public cloud for general acceptance. Some services are also supported in the sovereign clouds. This article discusses the role and purpose of sovereign clouds. Let us begin with a few examples: 1) the US Government clouds (GCC), 2) the China cloud, and 3) the Office 365 GCC High or US DoD cloud. Clearly, organizations must evaluate which cloud is right for them. The differences between them mostly align with compliance. The Commercial, GCC, and GCC High Microsoft 365 environments must protect their controlled and unclassified data. These clouds offer enclosures within which the data resides and never leaves that boundary. They meet sovereignty and compliance requirements with geographical boundaries for the physical resources such as datacenters. The individual national clouds and the global Azure cloud are separate cloud instances. Each instance is separate from the others and has its own environment and endpoints. Cloud-specific endpoints can leverage the same OAuth 2.0 protocol and OpenID Connect to work with the Azure Portal, but even the identities must remain contained within that cloud. There is a separate Azure Portal for each one of these clouds. For example, the portal for Azure Government is https://portal.azure.us and the portal for the China national cloud is https://portal.azure.cn.

The Azure Active Directory and the Tenants are self-contained within these clouds. The corresponding Azure AD authentication endpoints are https://login.microsoftonline.us and https://login.partner.microsoftonline.cn respectively.

The regions within these clouds in which to provision Azure resources also come with unique names that are not shared with regions in any of the other clouds. Since these environments are distinct, the registering of applications, the acquiring of tokens, and the calls to services such as the Graph API are also different.
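As a quick illustration of how the cloud instance changes the endpoints a client talks to, the Az PowerShell module can list the built-in cloud environments and sign in against a specific one. This is only a sketch; the environment names are the module's built-in values, and the account and subscription are assumed.

# List the known cloud instances and their endpoints (Resource Manager URL, Azure AD authority, etc.)
Get-AzEnvironment | Select-Object Name, ResourceManagerUrl, ActiveDirectoryAuthority

# Sign in against the US Government cloud instead of the global Azure cloud
Connect-AzAccount -Environment AzureUSGovernment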

Identity models will change with the application and location of identity. There are three types: On-Premises identity, Cloud identity and Hybrid identity 

The On-premises identity belongs to the Active Directory hosted on-premises that most customers already use today.

Cloud identities originate, exist and are managed only in the Azure AD within each cloud.

Hybrid identities originate as on-premises identities but become hybrid through directory synchronization to Azure AD. After directory synchronization, they exist both on-premises and in the cloud, which gives the model its name.

Azure Government applications can use Azure Government identities but can also use Azure AD public identities to authenticate to an application hosted in Azure Government. This is facilitated by the choice of Azure AD Public or the Azure AD Government.


Saturday, January 29, 2022

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on controlled folder access. This article talks about cloud protection.

Cloud protection is part of the next-generation portfolio of technologies in Microsoft Defender Antivirus that provides near-instant automated protection against new and emerging threats and vulnerabilities. The definitions are kept up to date in the cloud, but their role does not stop there. The Microsoft Intelligent Security Graph includes large sets of interconnected data as well as powerful artificial intelligence systems driven by advanced machine learning models. It works together with Microsoft Defender Antivirus to deliver accurate, real-time intelligent protection.

Cloud protection consists of the following features:

-          Checking against metadata in the cloud

-          Cloud protection and sample submission

-          Tamper protection enforcement

-          Block at first sight

-          Emergency signature updates

-          Endpoint detection and response in block mode

-          Attack surface reduction rules

-          Indicators of compromise (IoCs)

These are enabled by default. If for any reason they get turned off, the organization can enforce turning them back on using Windows Management Instrumentation (WMI), Group Policy, PowerShell, or MDM configuration service providers.
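Should cloud protection or sample submission ever be turned off, a minimal PowerShell sketch of re-enabling them with the built-in Defender cmdlets might look like the following; the consent level chosen here is just an example.

# Turn cloud-delivered protection (MAPS) back on at the advanced reporting level
Set-MpPreference -MAPSReporting Advanced

# Allow automatic submission of safe samples to the cloud protection service
Set-MpPreference -SubmitSamplesConsent SendSafeSamples

# Verify the current settings
Get-MpPreference | Select-Object MAPSReporting, SubmitSamplesConsent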

Fixes for threats and vulnerabilities are delivered in real time with Microsoft Defender Antivirus cloud protection, instead of waiting for the next definition update.

5 billion threats on devices are caught every month. Microsoft Defender Antivirus does this under the hood. It uses multiple engines to detect and stop a wide range of threats and attacker techniques at multiple points, providing industry-leading detection and blocking capabilities. Many of these engines are local to the client. If a threat is unknown, the metadata or the file itself is sent to the cloud service. The cloud service is built to be accurate, real-time, and intelligent. While trained models can be hosted anywhere, they run efficiently in the cloud, with inputs and predictions transferred between the client and the cloud. Threats are both common and sophisticated, and some are even designed to slip through protection. The earliest detection of a threat is necessary to ensure that not even a single endpoint is affected. With the models hosted in the cloud, protection is enriched and made more efficient, and the latest strains of malware and attack methods are continuously incorporated into the engines.

These cloud-based engines include:

-          Metadata-based ML engine – stacked classifiers evaluate file types, features, sender-specific signatures, and even the files themselves, and the results from these models are combined into a real-time verdict that allows or blocks files pre-execution.

-          Behavior-based ML engine – suspicious behavior sequences and advanced attack techniques are monitored to trigger analysis. The techniques span the attack chain, from exploits, elevation, and persistence all the way through to lateral movement and data exfiltration.

-          AMSI-paired ML engine – pairs of client-side and cloud-side models perform advanced analysis of scripting behavior pre- and post-execution to catch advanced threats like fileless and in-memory attacks.

-          File-classification ML engine – deep neural networks examine full file contents. Suspicious files are held from running and submitted to the cloud protection service for classification. The predictions determine whether the file should be allowed or blocked from execution.

-          Detonation-based ML engine – suspicious files are detonated in a sandbox so that classifiers can analyze the observed behaviors and block attacks.

-          Reputation ML engine – domain-expert reputation sources and models from across Microsoft are used to block threats that are linked to malicious URLs, domains, emails, and files.

-          Smart rules engine – expert-written smart rules identify threats based on researcher expertise and collective knowledge of threats.

 

These technologies are industry-recognized and have a proven record of customer satisfaction.

 

 

Friday, January 28, 2022

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on controlled folder access. This article talks about customization.

Controlled folder access helps protect valuable data from malicious apps and threats, such as ransomware. There are four ways to customize this control which include:

1) Protecting additional folders

2) Adding applications that should be allowed to access protected folders.

3) Allowing signed executable files to access protected folders.

4) Customizing the notification

Controlled folder access applies to system folders and default locations, and those defaults cannot be changed or removed. Adding other folders can be helpful in cases where the default location has changed, and the list can also include mapped network drives. Environment variables and wildcards are also supported. These folders can be specified from the Windows Security app, with Group Policy, or with PowerShell. MDM configuration service providers can also be used to protect additional folders.
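A minimal PowerShell sketch of adding such a folder with the Defender cmdlets; the path is purely illustrative.

# Protect an additional folder with controlled folder access
Add-MpPreference -ControlledFolderAccessProtectedFolders "D:\Finance\Reports"

# Review the full list of protected folders
(Get-MpPreference).ControlledFolderAccessProtectedFolders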

Specific applications can also be allowed to make changes to controlled folders. It is write access to files in protected folders that this feature controls. Allowing an application can be useful if it must override controlled folder access for a legitimate reason. An application is specified by its location; if the location changes, it is no longer trusted and cannot override controlled folder access. Application exceptions can also be specified via the Windows Security app, Group Policy, PowerShell, or MDM configuration service providers.
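A companion PowerShell sketch for allowing a specific application to write to protected folders; the executable path is a placeholder.

# Trust a specific application so it can modify files inside protected folders
Add-MpPreference -ControlledFolderAccessAllowedApplications "C:\Apps\LegacyBackup\backup.exe"

# List the currently allowed applications
(Get-MpPreference).ControlledFolderAccessAllowedApplications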

When a rule is triggered and an application or file is blocked, the alert notifications can be customized in Microsoft Defender for Endpoint. Notifications can be in the form of emails to a group of individuals. If role-based access control is used, recipients will only receive notifications based on the device groups that were configured in the notification rule.

Signed executable files can be allowed to access protected folders. We use indicators based on certificates for scenarios where we write rules for attack surface reduction and controlled folder access but need to permit signed applications by adding their certificates to the allow list. Indicators can also be used to block signed applications from running.

Rules can also be suppressed to avoid alerts and notifications that are noisy. A suppression rule will display status, scope, action, number of matching alerts, created by and date when the rule was created.

 

 

Thursday, January 27, 2022

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussions on controlled folder access. In this article, we review the access control lists for Azure Data Lake Storage Gen 2.

The access control model in Azure Data Lake Storage Gen 2 supports both Azure role-based access control (Azure RBAC) and POSIX-like access control lists (ACLs). Shared key and SAS authorization grant access to a user without requiring them to have an identity in Azure Active Directory (Azure AD). When these are used, Azure RBAC and ACLs have no effect. Only when there is an identity in Azure AD can Azure RBAC and ACLs be used.

Azure RBAC and ACLs both require the user (or application) to have an identity in Azure AD. Azure RBAC grants broad, sweeping access to storage account data, such as read or write access to all the data in a storage account, while ACLs grant privileges at a finer level, such as write access to a specific directory or file. The evaluation of RBAC and ACLs for authorization decisions is not static. The access control lists and policy resolution artifacts are static, but the evaluation of an identity and its permissions for an identity context is dynamic. It can even involve composition and inheritance, and it allows dynamic assignment of users to roles. So, when a person leaves one organization for another, a scheduled background job can revoke the privileges and perform cleanup.
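As a minimal sketch of the finer-grained side, the Az.Storage cmdlets can grant an ACL entry on a single directory; the storage account name, filesystem, path, and Azure AD object ID below are assumptions.

# Connect to the storage account with the signed-in Azure AD identity
$ctx = New-AzStorageContext -StorageAccountName "mydatalake" -UseConnectedAccount

# Read the existing ACL on the directory, add rwx for a specific Azure AD object ID (placeholder), and write it back
$item = Get-AzDataLakeGen2Item -Context $ctx -FileSystem "raw" -Path "sales/2022"
$acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -EntityId "<object-id>" -Permission "rwx" -InputObject $item.ACL
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "raw" -Path "sales/2022" -Acl $acl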

Users are mapped via policies to roles, and they are granted different levels of access to different resources. The permissions can vary across owner_read, owner_write, owner_delete, group_read, group_write, group_delete, other_read, other_write, and other_delete, each of which carries a state of granted or revoked. The purpose of specifying privileges this way is that we only need to grant on a need basis. Role-based access control (RBAC) facilitates the principle of least privilege. A higher-privilege role such as the domain administrator need not be used to work with Azure AD Connect or for deployment purposes; a deployment operator is sufficient in this regard.

Role-based access control also enforces the most restrictive permission set, so a general ability to read can be taken away for specific cases.

When it comes to securing key vaults and storage accounts, access policies are the technique usually resorted to. By contrast, role-based access control is lower maintenance. There is no need to keep adding and removing access policies on the key vault, where they end up being transient even though they are persisted. Instead, role-based access control for Key Vault requires zero touch and automatically flows to all items in the vault.

An access control list consists of several entries called access control entries (ACEs). It can have zero or more ACEs. Each ACE controls or monitors access to an object by a specified trustee. There are six types of ACEs, three of which are general purpose and applicable to all objects, while the other three are object-specific ACEs. Every ACE has a security identifier that identifies the trustee, an access mask that specifies the access rights, a flag that indicates the ACE type, and a set of bit flags that determine whether child containers or objects can inherit the ACE from the primary object to which the ACL is attached. The general-purpose ACEs include an access-denied ACE, which is used in a discretionary access control list; an access-allowed ACE, which is used to grant access rights to a trustee; and a system-audit ACE, which is used in a system access control list. The object-specific ACEs carry an object type GUID that identifies one of the following: a type of child object, a property set or property, an extended right, or a validated write.

The Active Directory contains two policy objects: a centralized authorization policy (CAP) and a centralized authorization policy rule (CAPR). These policies are based on expressions of claims and resource attributes. A CAPR targets specific resources and articulates the access control needed to satisfy a condition. CAPs apply to an organization where the set of resources to which they apply can be called out; a CAP is a collection of CAPRs that can be applied together. A user folder will have a specific CAP comprising several CAPRs, and there will be a similar new CAP for assignment to, say, a finance folder.

 

 

Tuesday, January 25, 2022

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on private connectivity. This article focuses on controlled folder access.

Controlled folder access helps protect valuable data from malicious apps and threats, such as ransomware. It protects data by checking applications against a list of known, trusted applications. Controlled folder access can be turned on using the Windows Security app, Microsoft Endpoint Configuration Manager, or Intune. Microsoft Defender for Endpoint gives detailed reporting into controlled folder access events and blocks, which forms part of the usual alert investigation scenarios. It works by only allowing trusted applications to access the protected folders that are specified when this access is configured. Apps that are not in the trusted list are prevented from making any changes to files inside protected folders. Applications can be added manually to the trusted list using Configuration Manager or Intune. Additional actions can be performed from the Microsoft 365 Defender portal.

Controlled folder access is important for preventing tampering with files. Ransomware encrypts files so that they cannot be used. When this access is enabled, unauthorized usages pop up as notifications. The notification can be customized with company details and contact information. Rules can be enabled individually to customize what criteria the feature monitors. The protected folders include common system folders, including boot sectors, and additional user folders. Applications can be given access to protected folders. Audit mode can be used to evaluate how controlled folder access would impact the organization.

Evaluating attack surface reduction in the environment hinges on audit mode. In audit mode, we can enable attack surface reduction rules, exploit protection, network protection, and controlled folder access without enforcement. It lets us see a record of what would have happened if the feature had been enabled. Audit mode can be enabled when testing how the features will work. Since it does not interfere with business operations, this mode facilitates the study of suspicious file modifications over a certain period. The features won't block or prevent applications, scripts, or files from being modified, but all those events will be recorded in the Windows event log. With audit mode, we can review the event log to see what effect the feature would have had if it had been enabled. Defender can help get details for each event, which is especially helpful for investigating attack surface reduction rules. It lets us investigate issues as part of the alert timeline and investigation scenarios. Audit mode can be configured with Group Policy, PowerShell, or configuration service providers.
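A minimal PowerShell sketch of that workflow: controlled folder access is first placed in audit mode, then switched to enforcement once the recorded events have been reviewed.

# Evaluate impact first: record would-be blocks in the event log without enforcing them
Set-MpPreference -EnableControlledFolderAccess AuditMode

# After reviewing the audited events, switch to full enforcement
Set-MpPreference -EnableControlledFolderAccess Enabled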

When the audit applies to all events, controlled folder access can be enabled in audit mode and the corresponding events can be viewed. When the audit applies to individual rules, the attack surface reduction rules can be tested and the results viewed on the attack surface reduction rules reporting page. When the audit applies to individual mitigations, exploit protection can be enabled and the corresponding events viewed. Custom views can be exported and imported, and the events described in these scenarios can also be saved as XML.

 

 

Monday, January 24, 2022

Predicate push-down for OData clients (continued)...

 

Predicates are expected to evaluate the same way regardless of which layer they are implemented in. If we have a set of predicates and they are separated by an OR clause as opposed to an AND clause, then we will have a result set from each predicate, and the results of each predicate may involve the same records. If we filter based on one predicate and we also allow matches based on another predicate, the two result sets may then be merged into one so that the result can be returned to the caller. The result sets may have duplicates, so the merge may have to return only the distinct elements. This can easily be done by comparing the unique identifiers of each record in the result set.

 

The selection of the result is required prior to determining the section that needs to be returned to the user. This section is determined by the start and offset pair in the enumeration of the results. If the queries remain the same over time and the requests only vary in the paging parameters, then we can even cache the result and return only the paged section. The API can persist the predicate and result sets in a cache so that subsequent paging calls return consistent responses. This can even be done as part of predicate evaluation by simply passing the well-known limit and offset parameters directly in the SQL query. In the enumerator, we do this with Skip and Take. The OData client calls with client-driven paging using the $skip and $top query options.
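As a small illustration of client-driven paging, the same $filter, $orderby, $top, and $skip options can be exercised directly against the service from PowerShell; the endpoint and entity set here are hypothetical.

# Request the third page of 100 resources whose name starts with 'A' (hypothetical endpoint)
$uri = 'https://example.org/odata/Resource?$filter=startswith(Name,''A'')&$orderby=Name&$top=100&$skip=200'
$page = Invoke-RestMethod -Uri $uri

# The OData payload returns the entities in the "value" array
$page.value | Select-Object -First 5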

When the technology involved merely wants to expose the database to the web as popularly used with OData albeit incorrectly, then each SQL object is exposed directly over the web API as a resource. Some queries are difficult to write in OData as opposed to others. For example,

oDataClient.Resource.Where(x => x.Name.GetHashCode() % ParallelWorkersCount == WorkerIndex).ToList()

will not achieve the desired partition of a lengthy list of resources for faster, efficient parallel data access

and must be rewritten as something like:

oDataClient.Resource.Where(x => x.Name.StartsWith("A")).ToList()

:

oDataClient.Resource.Where(x => x.Name.StartsWith("Z")).ToList()

The system query options of interest here are $filter, $select, $orderby, $count, $top, and $expand, where the last one helps with joins. Although a great deal of parity can be achieved between SQL and OData with the help of these query options, the REST interface is not a replacement for the analytical queries possible with language options such as those available from U-SQL, LINQ, or Kusto. Those have their own place higher up in the stack at the business or application logic layer, but at the lower levels close to the database, where a web interface separates concerns between the stored data and its access, these primitives present both a challenge and an opportunity.

Let us look at how an OData service is written. We begin with a database that can be accessed with a connection string and that stores data in the form of tables for entities. A web project with an entity data model is then written to prepare a data model from the database. The web project can be implemented with SOAP-based WCF or with REST-based web APIs and Entity Framework. Each API is added by creating an association between the entity and the API. Taking the example of WCF further, since it provides terminology for all parts of the service, a type is specified with the base DataService, and in an InitializeService method, config.SetEntitySetAccessRule is specified. Then the JSONPSupportBehaviour attribute is added to the service class so that end users can get the data in the well-known format that makes it readable. The service definition, say http://<odata-endpoint>/service.svc, can be requested in JSON or XML format to allow clients to build applications using those objects representing entities. The observation here is that it uses a data model which is not limited to SQL databases, so the problem is isolated away from the database and narrowed down to the operations over the data model. In fact, OData has never been about just exposing the database on the web. We choose which entities are accessed over the web, and we can expand the reach with the OASIS standard. OASIS is a global consortium that drives the development, convergence, and adoption of web standards. Another observation is that we need not even use the Entity Framework for the data model. Some experts argue that OData's main use case is the create, update, and delete of entities over the web and that querying should be facilitated by APIs from web services where rich programmability already exists for writing queries. While it is true that there are language-based options in the compute layer formed by the web services, the exposure remains a common theme to the REST API design for both the REST API over a service and the REST API over a database. The filter predicate used in those APIs will eventually be pushed into the data persistence layer. In our case, we chose an example of a GetHashCode() operator that is language-based rather than a notion known to the database. As demonstrated with the SQL statement example above, the addition of a hash to an entity involves adding a computed column attribute to its persistence. Once that is available, the predicate can automatically be pushed into the database for maximum performance and scalability.

The manifestation of data to support simpler queries and their execution is not purely a technical challenge. The boundary between data and compute is complicated by claims to ownership, responsibilities, and jurisdictions. In fact, clients writing OData applications are often forced to work without any changes to the master data. At this point, there are two options for these applications. The first involves translating the queries to ones that can work on existing data, such as the example shown above. The second involves scoping down the size of the data retrieved with techniques such as incremental update polling, paging, and sorting, and then performing the complex query operations in memory on that limited set of data. Both these options are sufficient to alleviate the problem encountered.

The strategic problem for the case with the data being large and the queries being arbitrarily complex for OData clients can be resolved with the help of a partition function and the use of a scatter-gather processing by the clients themselves. This can be compared to the partition that is part of the URI path qualifier for REST interfaces to the CosmosDB store.

OData also provides the ability to batch requests. The HTTP specification must be followed when sending a response. A new batch handler must be created and passed when mapping the route for the OData service, which enables batching and response consolidation.

 

Sunday, January 23, 2022

Predicate push-down for OData clients:


Abstract:

Data access is an important operational consideration for application performance, but it is often not given enough attention on architecture diagrams. The trouble with data access is that it is often depicted by a straight-line arrow on the data path diagrams between a source and a destination. But the size of data and the queries that can be run over the data might result in vast temporal and spatial spread of bytes transferred and incur varying processing delays. When it is overlooked, it might bring about additional architectural components such as background processors, polling mechanisms and redundant technology stacks for fast path. This article discusses some of the challenges and remediations as it pertains to OData which exposes data to the web.

Description:

The resolution for improving the performance of queries mostly involves pushing predicates down into the database, and even further into the query execution and optimization layer within the database, so that the optimizer has a chance to determine the best query plan.

 

In the absence of a database, the work of query execution must be emulated outside it, and this is still not likely to be efficient and consistent in all cases, simply because it involves an enumeration-based data structure that lives only in the web-service layer.

 

On the other hand, the database is closest to the storage, and it indexes and organizes the records so that they are looked up more efficiently. Query plans can be compared and the most efficient one chosen. Having an in-memory, iteration-only data structure will only limit us and will not scale to the size of the data when the query processing is handled at the service layer rather than at the data layer.

 

Predicates are expected to evaluate the same way regardless of which layer they are implemented in. If we have a set of predicates and they are separated by an OR clause as opposed to an AND clause, then we will have a result set from each predicate, and the results of each predicate may involve the same records. If we filter based on one predicate and we also allow matches based on another predicate, the two result sets may then be merged into one so that the result can be returned to the caller. The result sets may have duplicates, so the merge may have to return only the distinct elements. This can easily be done by comparing the unique identifiers of each record in the result set.

 

Saturday, January 22, 2022

Azure private connectivity for Key Vaults and storage accounts:

The private connectivity for Azure resources might be surprisingly hard to guarantee if the following instructions are not followed. These include:

1. Private Link and VNet Integration monitoring: The purpose of private connectivity is to prevent data exfiltration. Private links provide in-depth protection against the threat. 

2. To privately connect to a service, create an endpoint.

3. To privately expose a service, create a private link service or a private resource. The existing service must be behind a load balancer.

4. When we create a private endpoint, it must have the same region as the vnet from which the connections originate.

5. The same virtual network can be added to the resource along with the subnet from which the connections originate. Trusted Microsoft services are already allowed to bypass this firewall.

6. When a private endpoint is added, the following connections can no longer be made

a. Connections required from clients and workstations that are part of the organization or home office or via Hypernet. This can be mitigated by adding an entry via the DNS maintained by the organization.

b. Applications and services that were honored based on service tags. There is no inbuilt support for service tags in the firewall configuration.

c. 3rd party solutions that are using any custom script or tooling to access the resource.

7. Item 6a can be addressed by adding the DNS records to the organizational DNS. The name records can be looked up using the DIG web interface, and they might look like the following.

myvault.azure.net@8.8.4.4 (Default):

myvault.azure.net. 60 IN CNAME data-prod-wu2.vaultcore.azure.net.

data-prod-wu2.vaultcore.azure.net. 44 IN CNAME data-prod-wu2-region.vaultcore.azure.net.

data-prod-wu2-region.vaultcore.azure.net. 44 IN CNAME azkms-prod-wu2-b.trafficmanager.net.

azkms-prod-wu2-b.trafficmanager.net. 10 IN A 52.151.47.4

azkms-prod-wu2-b.trafficmanager.net. 10 IN A 51.143.6.21

azkms-prod-wu2-b.trafficmanager.net. 10 IN A 52.158.236.253

Here the CNAME record from above is used to add a new CNAME record to the DNS of the organization, with the zone specified as that of the private link, privatelink.vaultcore.azure.net, and the FQDN as data-prod-wu2.vaultcore.azure.net in this case. Alternatively, an A record with the corresponding IP address can also be used; a scripted sketch of this record creation follows at the end of this list.

8. For any connections that were excluded from switching to private connectivity, their originating IP addresses can be added to the firewall exception.

9. Disabling public access is an option that can guarantee no public access to the resource, but the steps above will help if that option is found to be too restrictive or unacceptable.
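Here is the sketch referenced in item 7, assuming the private link zone is hosted as an Azure Private DNS zone rather than an on-premises DNS server; the resource group, record name, and private IP address are illustrative.

# Look up the current public resolution chain (the DIG-style check)
Resolve-DnsName myvault.azure.net

# Add an A record for the vault in the private link zone, pointing at the private endpoint IP
New-AzPrivateDnsRecordSet -ResourceGroupName "dns-rg" -ZoneName "privatelink.vaultcore.azure.net" -Name "myvault" -RecordType A -Ttl 60 -PrivateDnsRecords (New-AzPrivateDnsRecordConfig -Ipv4Address "10.1.2.4")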



Friday, January 21, 2022

 

Public versus private connectivity for Azure Resources:

Azure resources are globally unique. They can be reached from anywhere on the internet by name or by their public IP addresses. A variety of clients such as mobile phones, personal laptops, and remote departments can connect to them. As cloud resources, they bring the best practices for deployment, availability, and service-level agreements.

Azure cloud resources are also used internally by organizations as their Information Technology resources. They must be protected from external access. One way to secure these resources from undesirable access is to use private connectivity.

Private connectivity avoids all relays over the internet. Azure resources no longer need public IP addresses in this case and can be reached on their private IP addresses. All the firewall rules for port restriction against public IP address access go away. The result is cleaner, lower-latency network access that is both safe and secure.

When the Azure resource is a storage account or a key vault, the access is not straightforward because there could be many applications and services, on-premises or remote, that use them. In these cases, the consuming services may have their own virtual networks. When we visit the portal to review the storage account or the key vault, its networking section shows the options to disable internet access. When this option is selected, the consuming services can still reach the resource if they are on the same virtual network, but all external access will be prohibited. This is helpful for eliminating internet-based access to these resources. A less restrictive option is to list all the virtual networks from which accesses may originate, so that the resources are no longer accessible from anywhere else, including the internet, except from those virtual networks. These consuming services on the registered virtual networks can still access the resource over its public endpoint, and their public connectivity is not disrupted.
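A minimal sketch of that less restrictive option with the Az cmdlets, assuming the subnet already has the Microsoft.Storage service endpoint enabled; all resource names are illustrative.

# Get the subnet from which connections should be allowed
$vnet = Get-AzVirtualNetwork -ResourceGroupName "net-rg" -Name "app-vnet"
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "app-subnet"

# Allow that subnet on the storage account and deny everything else by default
Add-AzStorageAccountNetworkRule -ResourceGroupName "data-rg" -Name "mystorageacct" -VirtualNetworkResourceId $subnet.Id
Update-AzStorageAccountNetworkRuleSet -ResourceGroupName "data-rg" -Name "mystorageacct" -DefaultAction Deny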

Another option is to dedicate a private endpoint or link to the resource so that the private connectivity is established. This option is helpful for both on-premises and cloud services that use these resources. Their usage is narrowed down to just these resources.

When the services and the resources are in different regions, they must have VNet peering because VNets do not stretch between regions. Peering helps services and resources connect across virtual networks.

Finally, remote clients that want to access the Azure resources and cannot avoid the internet can do so over a VPN. In this case, a VPN gateway and point-to-site connectivity are required.

Thursday, January 20, 2022

 

Data Import and Export from a database:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on Azure DNS, which is a full-fledged, generally available service. This article focuses on data import and export from a database.

Data migration is a necessity when archiving, when moving from one platform to another, and for redundancy. Since databases have relational data, import and export of data must consider the schema. When only the schema needs to be transferred, all the SQL objects can be serialized as XML, and this is contained in a single DACPAC file. A data-tier application is a logical database management entity that defines all the SQL Server objects. With the data included alongside the schema, the exported format is a BACPAC file. It can be stored in Azure Blob Storage or on-premises.

The advantage of using data-tier application formats is that the DACPAC enables a DBA to list and validate behaviors across different sources and targets. It offers a chance to determine failures and data loss in the case of, say, an upgrade. DAC tools can provide an upgrade plan; a script-driven exercise does not offer that visibility. DAC also supports versioning to help the developer who authors it and the DBA who uses it to maintain and manage the database lineage through its lifecycle. The primary use case for a DACPAC is the propagation of an existing database through development, test, and production environments, or in the reverse direction. A BACPAC is used to move a source database to a new database, even on another server.

For an export to be transactionally consistent, there must be no write activity occurring during the export, or the data must be exported from a copy that is transactionally consistent. If the BACPAC file is stored in Azure Blob Storage, there is a maximum size limit of 200 GB. It cannot be exported to Azure premium storage, to storage behind a firewall, or to immutable storage. The fully qualified file name must be limited to 128 characters and must exclude specific special characters. If the operation exceeds 20 hours, it may be canceled. The user must be a member of the dbmanager role or be assigned CREATE DATABASE permissions.

Azure SQL Managed Instance does not support exporting a database to a BACPAC file using the portal or PowerShell. Instead, SqlPackage or SQL Server Management Studio must be used. This does not mean that there is no such functionality in the portal or PowerShell. For example, New-AzSqlDatabaseExport and New-AzSqlDatabaseImport are available, but large database exports or imports take a long time and may fail for many reasons. The SqlPackage utility is best for scale and performance. The Data Migration Service can migrate a database from a SQL Server to an Azure SQL database. The bcp utility ships with SQL Server and can also be used to bulk copy data in and out.
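A minimal PowerShell sketch of such an export with New-AzSqlDatabaseExport; the resource names are placeholders, $storageKey holds the storage account key, and $adminPassword is a SecureString.

# Kick off the asynchronous export of the database to a BACPAC blob ($storageKey and $adminPassword are placeholders)
$export = New-AzSqlDatabaseExport -ResourceGroupName "sql-rg" -ServerName "myserver" -DatabaseName "mydb" -StorageKeyType "StorageAccessKey" -StorageKey $storageKey -StorageUri "https://mystorage.blob.core.windows.net/backups/mydb.bacpac" -AdministratorLogin "sqladmin" -AdministratorLoginPassword $adminPassword

# Poll the long-running operation for completion
Get-AzSqlDatabaseImportExportStatus -OperationStatusLink $export.OperationStatusLink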

Like export, import speed can be maximized by providing more and faster resources and by scaling up the database and compute size during the import process. It can be scaled down after the import is successful. SqlPackage, the portal, and PowerShell help with import just as they do with export.

The DAC operations that are supported include extract, deploy, register, unregister, and upgrade. A BACPAC supports primarily import and export operations.

 

Wednesday, January 19, 2022

 

This is a continuation of the sample queries written for Azure Public Cloud for diagnostic purposes. The topic was introduced in this article earlier.

Sample Kusto queries:

1)      When log entries do not have function names, scopes or duration of calls:

source

| where description contains "<string-before-scope-of-execution>"

| project SessionId, StartTime=timestamp

| join (source

| where description contains "<string-after-scope-of-execution>"

| project StopTime=timestamp, SessionId)

on SessionId

| project SessionId, StartTime, StopTime, duration = StopTime - StartTime

 

| summarize count() by duration=bin(duration/1s, 10)

 

| sort by duration asc

 

| render barchart

 

2)      Since the duration column is also relevant to other queries later

source | extend duration = endTime - sourceTime

 

3)      When the log entries do not have an exact match for a literal:

source

| filter EventText like "NotifyPerformanceCounters" 

| extend Tenant = extract("tenantName=([^,]+),", 1, EventText)

 

4)      If we wanted to use regular expressions on EventText:

source

| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "valid=" valid ", releaseTime=" releaseTime:date ")" *

| where valid in~ ("true", "false")

5)      If we wanted to read signin logs:

source                            

| evaluate bag_unpack(LocationDetails)

| where RiskLevelDuringSignIn == 'none'

   and TimeGenerated >= ago(7d)

| summarize Count = count() by city

| sort by Count desc

| take 5

        6) If we wanted to time bucket the log entries:

source

| where Modified > ago(7d)

| summarize count() by bin(Modified, 1h)

| render columnchart

 

        7) If we wanted to derive just the ids:

source

| where Modified > ago(7d)

| project  Id

 

        8) If we wanted to find the equivalent of a SQL query, we could use an example as follows:

             EXPLAIN

SELECT COUNT_BIG(*) as C FROM StormEvents

Which results in

StormEvents

| summarize C = count()

| project C

 

This works even for wild card characters such as:

EXPLAIN

SELECT * from dependencies where type like 'Azure%'

Which results in

dependencies

| where type startswith "Azure"

 

And for extensibility as follows:

EXPLAIN

SELECT operationName as Name, AVG(duration) as AvgD FROM dependencies GROUP BY name

Which results in

dependencies

| summarize AvgD = avg(duration) by Name = operationName

     9)  If we wanted to process JSONPath, there are KQL functions that process dynamic objects. For example:

datatable(input:dynamic)

[

dynamic({'key1': 123, 'key2': 'abc'}),

dynamic({'key1': 456, 'key3': 'fgh'}),

]

| extend result = bag_remove_keys(input, dynamic(['key2', 'key4']))

The above can also be written with JSONPath notation:

| extend result = bag_remove_keys(input, dynamic(['$.key1']))

which results in

{'key2': 'abc'}

 

 

 

 

 

 

Tuesday, January 18, 2022

 Disaster recovery using Azure DNS and Traffic Manager:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent networking discussions on Azure DNS, which is a full-fledged, generally available service.

This article focuses on disaster recovery using Azure DNS and Azure Traffic Manager. The purpose of disaster recovery is to revive functionality after a severe loss for the application. The level of revival may be graded as unavailable, partially available, or fully available. A multi-region architecture provides some fault tolerance and resiliency against application or infrastructure failure by facilitating a failover. Region redundancy helps achieve failover and high availability, but the approaches for disaster recovery vary from business to business. The following are some of the options.

-          Active/passive with cold standby: In this failover solution, the VMs and other applications that run in standby mode are not active until there is a need for failover. Backups, VM images, and Resource Manager templates continue to be replicated, usually to a different region. This is cost-effective but takes time to complete a failover.

-          The active/passive with pilot light failover solution sets up a standby environment with minimal configuration. The setup has only the necessary services running to support a minimum, critical set of applications. The scenario can execute only minimal functionality, but it can scale up and launch more services to take the bulk of the production load if a failover occurs. Data mirroring can be set up with a site-to-site VPN.

-          The active/passive with warm standby solution is set up such that it can take a base load and initiate scaling until all instances are up and running. The solution isn't scaled to take the full production workload, but it is functional. It is an enhancement over the previous approach, though short of a full-blown one.

Two requirements that come from this planning deserve callouts. First, a deployment mechanism must be used to replicate instances, data, and configurations between the primary and standby environments. The recovery can be done natively or with third-party services. Second, a solution must be developed to divert network/web traffic from the primary site to the secondary site. This type of disaster recovery can be achieved via Azure DNS, Traffic Manager for DNS, or third-party global load balancers.

The Azure DNS manual failover solution for disaster recovery uses the standard DNS mechanism to fail over to the backup site. It assumes that both the primary and the secondary endpoints have static IP addresses that don't change often, that an Azure DNS zone exists for both the primary and secondary sites, and that the TTL is at or below the RTO SLA set in the organization. Since the DNS server is outside the failover or disaster zone, it does not get impacted by any downtime. The user is merely required to make a flip. The solution is scripted, and the low TTL set against the zone ensures that no resolver around the world caches it for long periods. For cold standby and pilot light, since some pre-warming activity is involved, enough time must be given before making the flip. The use of Azure Traffic Manager automates this flip when both the primary and the secondary have a full deployment complete with cloud services and a synchronized database. Traffic Manager routes new requests to the secondary region on service disruption. By virtue of its built-in probes for various types of health checks, Azure Traffic Manager falls back to its rules engine to perform the failover.
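A minimal PowerShell sketch of the Traffic Manager side of that setup, using priority routing so that traffic shifts to the secondary endpoint only when the primary's health probe fails; all names and targets are illustrative.

# Create a priority-routed Traffic Manager profile with an HTTPS health probe
New-AzTrafficManagerProfile -Name "app-failover" -ResourceGroupName "dr-rg" -TrafficRoutingMethod Priority -RelativeDnsName "app-failover-demo" -Ttl 30 -MonitorProtocol HTTPS -MonitorPort 443 -MonitorPath "/health"

# Primary region endpoint takes all traffic while healthy
New-AzTrafficManagerEndpoint -Name "primary" -ProfileName "app-failover" -ResourceGroupName "dr-rg" -Type ExternalEndpoints -Target "primary.contoso.com" -EndpointStatus Enabled -Priority 1

# Secondary region endpoint receives traffic only when the primary probe fails
New-AzTrafficManagerEndpoint -Name "secondary" -ProfileName "app-failover" -ResourceGroupName "dr-rg" -Type ExternalEndpoints -Target "secondary.contoso.com" -EndpointStatus Enabled -Priority 2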