Sunday, October 16, 2022

 

This article focuses on some of the best practices for working with workflows that deploy services. The tenets are:

·        Reusability – many of the activities in the global library of activities for one workflow can and will be reused by another. Few workflows should need task-specific logic that is not covered by the global collection of activities. There should be no difference between an activity as it appears during bootstrapping and its invocation during redeployment/rehosting in the new environment; only the parameter values change.

·        Dependencies – many of the dependencies will be implicit because they originate from system components and service information. A workflow might additionally specify dependencies via the standard way in which workflows declare them. These will be added on a case-by-case basis for tenants, since they add overhead to other services, many of which are standalone. Implicit dependencies can be articulated in the format specified by the components involved.

·        Splitting – workflows are written for on-demand invocation from the web interface or by the system, so there might be more than one for a specific deployment scenario. It is best to include both the bootstrapping and the redeploy in the main workflow for the specific scenario; they will be mutually exclusive during their respective phases and must remain idempotent.

·        Idempotency – all workflow steps and activities should be idempotent. If conditionals are involved, they must be part of activities (see the sketch after this list). Signaling and receiving notifications from dependent workflows, if any, must be specifically called out.

·        Bootstrapping – this phase is common to many services and usually requires at least a cluster/set of servers to be made ready. There might be activities that require the service stamp to be deployed even if it is not yet configured, along with activities that do one-time preparation such as getting secrets. Until the VIPs are ready, the redeployment cannot be kicked off. Bootstrapping might involve preparations for both primary and secondary where applicable.

·        Redeployment or rehosting – this phase involves configuration, since the bootstrapping is usually for a stamp and this stage converts it into a deployment for a service. Since it involves reconfiguration, it can apply to both primary and secondary and is typically done inside the new cloud. It is best to parameterize as much as possible.

·        Naming convention – though workflows can have any names inside the package that the owning teams upload, it is best to follow a convention for the specific scenario of one workflow calling another. Standalone single workflows do not have this problem. Even when there are many workflows, a prefix/suffix might be helpful. This applies to both workflows and activities.

·        System workflow – requiring separate workflows for bootstrap and redeployment via a system-defined workflow, so that the system can inject system-defined activities between bootstrap and redeploy, is a nice-to-have, but the less intrusion into service deployment the better. This calls on the service to do its own tracking by passing parameter values between workflows and activities. A standard need not be specified for this, and it can be left to the discretion of the services.
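
As an illustration of the idempotency tenet, here is a minimal Python sketch of an activity that keeps its conditional logic inside the activity so that re-running it converges to the same state. The secret store is an in-memory stand-in and all names are hypothetical, not part of any particular workflow engine.

import typing


class FakeVault:
    """In-memory stand-in for a secret store; illustrative only."""

    def __init__(self) -> None:
        self.secrets: typing.Dict[str, str] = {}

    def get_secret(self, name: str) -> typing.Optional[str]:
        return self.secrets.get(name)

    def set_secret(self, name: str, value: str) -> None:
        self.secrets[name] = value


def ensure_secret(vault: FakeVault, name: str, value: str) -> str:
    # Idempotent activity: check current state first so that a re-run
    # (for example, during a retried bootstrap) is a no-op.
    if vault.get_secret(name) == value:
        return "unchanged"
    vault.set_secret(name, value)
    return "updated"


vault = FakeVault()
assert ensure_secret(vault, "db-password", "s3cret") == "updated"
assert ensure_secret(vault, "db-password", "s3cret") == "unchanged"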

The above list is not intended to be complete but focuses on practices that have worked well.


 

Saturday, October 15, 2022

 

This section reviews some of the documentation relevant to the AZ-305 certification.

 

1.       Multiple tenants – enable access for developers of one tenant in another 

A.      A trust relationship must be set up between the DC receiving the request and the DC in the domain of the requesting account. Forest trusts help to manage segmented AD DS infrastructures and support access to resources and other objects. Forest trust relationships are transitive and can be one-way or two-way. A federation is a collection of domains that have established trust.

2.       How to set up single tenancy, and which operations are restricted for single-tenant auth?

A.      This is required when the traditional approach of restricting access by domain names or IP addresses does not work, as with SaaS apps or shared domain names. With tenant restrictions from Azure AD and SSO for the applications used, access can be controlled.

3.       Identity protection versus monitoring, specifically services and purposes 

A.      Both Security Center and Azure Sentinel can be used for security, but the former helps to collect, prevent, and detect via analytics, while the latter helps to detect via hunting, investigate via incidents, and respond via automation.

4.       What identity protection will protect from bot attacks?

A.      Azure AD Identity Protection protects from bot attacks. There are three key reports that administrators use for investigations in Identity Protection:

a.       Risky users

b.       Risky sign-ins

c.       Risk detections

5.       On-premises integration with Azure AD so that the on-premises experience is not broken

A.      There are two ways to do this:

1.       Use Azure AD to create an Active Directory domain in the cloud and connect it to the on-premises Active Directory domain. Azure AD Connect integrates the on-premises directories with Azure AD.

2.       Extend the existing on-premises Active Directory infrastructure to Azure, by deploying a VM in Azure that runs AD DS as a Domain Controller. This architecture is more common when the on-premises network and the Azure virtual network (VNet) are connected by a VPN or ExpressRoute connection. Several variations are possible:

a.       A domain is created in Azure and joined to the on-premises AD forest.

b.       A separate forest is created in Azure that is trusted by domains in the on-premises forest.

c.       An Active Directory Federation Services (AD FS) deployment is replicated to Azure.

 

6.       Order of setting up service resources and tasks for AD integration with on-premises.

A.      This includes Active Directory, Active Directory Domain Services, and Active Directory Federation Services (AD FS).

7.       Conditional Access policies versus Azure policies – when to use what?

A.      Azure AD Conditional Access can be used to author conditions, such as turning off password authentication for legacy applications based on date/time or other such criteria.

B.      Azure Policy is a default-allow and explicit-deny system focused on resource properties, both during deployment and for already existing resources. It supports cloud governance with compliance.

8.       Can a blueprint be used to force a hierarchy of resources specific to a region?

A.      Azure Blueprints can be used to assign policies governing how resource templates are deployed, which can affect multiple resources, and it helps adhere to an organization’s standards, patterns, and best practices. A blueprint can consist of one or more artifacts, including policy assignments, role assignments, resource groups, and resource templates.

9.        Limits of resources and subscriptions? Can a tenant have more than one subscription? 

A.      When we run a single instance of a resource, the service limits, subscription limits, and quotas apply. When these limits are encountered, the shared resources must be scaled out. A tenant can have more than one subscription, but each subscription is associated with a single tenant.

 

10.   Do we need availability zone redundancy or geo-redundancy? 

A.      Some tradeoffs are based on cost (availability-zone redundancy is free, an additional region is not) and overhead (deploying to additional regions implies additional instances that may need to be monitored). Read-only separation is possible only in the case of geo-redundancy.

11.   Azure SQL Managed Instance – appropriateness over elastic pools and higher compute

A.      Each elastic pool is contained within a single logical server. Database names must be unique within a pool, so multiple geo-secondaries cannot share the same pool.

12.   How many databases per tenant?  

A.      Each tenant gets a tenant database dedicated to storing the company’s business data. The knowledge about the shared application is then stored in a dedicated application database.

13.   How to perform migration of applications from on-premises to Azure – choosing the appropriate database instance, service, and SKU

A.      The four phases of migration are: phase 1 – discover and scope, phase 2 – classify and plan, phase 3 – plan migration and testing, and phase 4 – manage and gain insight.

B.      The first phase is the process of creating an inventory of all applications in the ecosystem. They fall into three categories: those that can be migrated, those that cannot, and those marked for deprecation.

C.       The second phase involves detailing the apps within those categories by criticality, usage, and lifespan. It prioritizes the applications for migration and plans a pilot.

D.      The third phase involves planning the migration and testing, communicating changes, migrating applications, and transitioning users.

E.       The fourth phase involves managing end-user and admin experiences and gaining insight into application and user behavior.

F.       These four phases transition the application experience from old to new smoothly. Migrating from an earlier version of Windows to a later one, or switching from one SKU to another, is possible.

14.   Will the elastic pool scale or is it better to go with higher compute for certain workloads? 

A.      An elastic pool must have sufficient resources to accommodate a database. Elastic pools share compute resources between several databases on the same server, which helps achieve performance elasticity for each database. Sharing provisioned resources across databases reduces their unit costs, and there are built-in protections against noisy-neighbor problems. The architectural approach must meet the levels of scale expected from the system.

B.      Higher compute boosts the performance of a single database.

15.   How do we set up geo-recovery, geo-replication, and geo-failover for restricted RPO and RTO?

A.      There is usually a delay between when a backup is taken and when it is geo-restored, and the restored database can be up to one hour behind the original. Geo-restore relies on automatically created geo-replicated backups with a recovery point objective (RPO) of up to 1 hour and an estimated recovery time objective (RTO) of up to 12 hours. It does not guarantee that the target region will have the capacity to restore the database after a regional outage, because a sharp increase in demand is likely, so it is mostly used for small databases. Business continuity for larger databases is ensured via auto-failover groups, which have a much lower RPO and RTO and guaranteed capacity.

16.   How to proceed with database migration from on-premises to cloud? 

A.      Geo-replication can also be used for database migration with minimal downtime, and for application upgrades by creating an extra secondary as a fail-back copy. An end-to-end recovery requires recovery of all components and dependent services; all components must be resilient to the same failures and become available within the recovery time objective of the application. Designing cloud solutions for disaster recovery includes scenarios that use two Azure regions for business continuity with minimal downtime, use regions with maximum data preservation, or replicate an application to different geographies to follow demand.

17.   How can virtual networks help with securing tenants and connecting on-premises networks?

A.      Virtual networks allow name resolution to be set up. How a name resolves to an IP address depends on whether there is a single instance or many instances of the multitenant application. For example, a CNAME for the custom domain of a tenant might point to a multi-part subdomain belonging to the multitenant application’s solution provider. Since the provider might want to set up routing to multiple instances, it might have a CNAME record that maps the subdomain for an individual instance to that instance, and an A record for that specific instance that points to an IP address in the provider’s deployment. This chain of records resolves requests for the custom domain to the IP address of one instance among the many deployed by the provider, as sketched below. Virtual networks also extend to on-premises networks.
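
The record chain above can be illustrated with a small Python sketch; all of the domain names and the IP address below are hypothetical.

records = {
    ("shop.fabrikam.example", "CNAME"): "fabrikam.tenants.provider.example",
    ("fabrikam.tenants.provider.example", "CNAME"): "instance2.provider.example",
    ("instance2.provider.example", "A"): "203.0.113.10",
}

def resolve(name):
    # Follow CNAME records until an A record yields an IP address.
    while (name, "A") not in records:
        name = records[(name, "CNAME")]
    return records[(name, "A")]

print(resolve("shop.fabrikam.example"))  # -> 203.0.113.10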

18.   What is the order of connecting a service instance privately to the enterprise application? 

A.      Network features such as private endpoints and disabled public network access can greatly reduce the attack surface of an organization’s data platform. The simplest solution is to host a jumpbox on the virtual network of the data management landing zone and connect to the data services through private endpoints. Azure Bastion can be a more secure alternative; it connects to a target VM subnet protected by an NSG.

19.   How to expose nested virtual network access to the internet? Is there a gateway involved? 

A.      Network Watcher can be used to view the topology of an Azure virtual network and to monitor Azure VPN gateways. The Get-AzureRmVirtualNetworkGatewayConnection PowerShell cmdlet can be used to retrieve the connection details. If two virtual networks are linked, one of them must have a gateway to the internet.

20.   How to use a load balancer with the virtual network or for access to an application? 

A.      An example deployment requires a virtual network interface for each VM, an internet-facing load balancer, two load-balancing rules, an availability set, and, say, two VMs.

21.   When to use VMSS for certain migration scenarios? Do we run into specific scaling limits for peak load? 

A.      Scale sets support up to 1,000 VM instances for standard marketplace images and custom images shared through the Azure Compute Gallery. If a scale set is created using a managed image, the limit is 600 VM instances. VMSS makes it easy to create and manage VM instances, provides high availability and application resiliency, and allows applications to be scaled automatically as resource demand changes.

22.   When to use VMs instead of VMSS?  Will it affect availability across regions? Can the VMSS be spread across regions? 

A.      VMs and VMSS are bound to regions. A regional scale set uses placement groups, which act as an implicit availability set with five fault domains and five update domains. Scale sets of more than 100 VMs span multiple placement groups.

23.   Will the VMSS require private endpoints when enterprise services are hosted?

A.      Private endpoints can be created for a service on a virtual network; VMSS only deploys the compute.

24.   What is the minimum number of instances, 2 or 4, when paired regions are involved for a certain deployment scenario?

A.      The resources double for paired regions. The minimum for one region can be taken as one of each resource, so the paired deployment needs two.

25.   How many logging and monitoring namespaces for a multitenant application?

A.      Only one, shared by all the tenants of the multitenant application, as sketched below.
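
A minimal sketch of this single-namespace approach, assuming the tenant identity is carried as a field on every record rather than as a separate namespace per tenant; the names are illustrative.

import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("multitenant-app")

def log_event(tenant_id, message, **fields):
    # One logging namespace for all tenants; the tenant is a queryable
    # dimension on each record.
    logger.info(json.dumps({"tenant": tenant_id, "message": message, **fields}))

log_event("fabrikam", "login", user="alice")
log_event("contoso", "login", user="bob")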

 

26.   What cloud services will be used for collecting and analyzing IoT traffic from edges? 

A.      Azure IoT Hub connects, monitors, and controls billions of IoT assets. Azure Time Series Insights can help to explore and gain insights from time-series IoT data in real time.

B.      CosmosDB and Function Apps can be used for custom processing. Azure Event Hubs can receive and process millions of events per second for stream processing.

 

27.   How will we scale resources for edge traffic? What databases are best suited for certain data? 

A.      Time-series data can be analyzed with Azure Time Series Insights.

B.      Streaming data can be processed with Azure Event Hubs and Function Apps.

 

28.   Will a time-series database or a Cosmos document store be preferred for a certain application and its workload?

A.      IoT traffic is best collected by Azure Event Hubs and analyzed via Time Series Insights. A document store provides many capabilities for documents, including SQL queries. It is also general-purpose, scales quite well, and can be deployed with separation of read-only and read-write instances.

 

29.   What will be the order of services and namespace creations for creating a reporting dashboard for a specific purpose? 

A.      A data ingestion service, a data collection store, and a reporting stack, in that order. Variations depend on the type of data and analysis.

 

30.   When is a container registry prepared and does it need access to the internet and public registries? 

A.      If a registry is accessed over the internet, the client must be able to reach it over public network access. By default, a registry instance allows access to its public endpoints from all networks, but it can limit access to selected networks or IP addresses.

 

31.   Will container instances be preferred to Azure Functions? When is the latter better suited?

A.      With Functions, the function is the unit of work, whereas in a container instance the entire container contains the unit of work. So Azure Functions start and end based on event triggers, whereas the microservices in containers run all the time.

 

32.   What are the scaling limits for either of them or which is better suited for hosting APIs?

A.      By virtue of the triggering functionality, Functions suffer from cold starts for HTTP invocations, although they scale very well to the volume of IoT traffic. A Container App is better suited to hosting APIs.

Friday, October 14, 2022

 

This article talks about the organization of data, particularly keys and values.

 

Configuration data is stored as key-values, which are a simple and flexible representation of application settings. Keys serve as identifiers to retrieve the corresponding values.

Application configuration in a multitenant solution, particularly in B2B systems, applies to multiple accounts, usually several of them, running on the same system. These settings are not secrets and are not sensitive. They are distinct from data pertaining to individual users, such as user profiles, which can number in the hundreds or thousands. These configurations are also edited by multiple teams: not just the owning team or its development team, but also technical support and other staff members.

A classic example of using a configuration key is to set a default language for an enterprise account. All the users of that account will see this language when they log in.

When this configuration data is maintained in a table, there are rows corresponding to the default language for each enterprise account that can be queried with SQL, and the configuration store becomes a database. The drawbacks to this approach are that 1) an audit solution needs to be bolted onto the database, otherwise direct edits quickly become unmanageable; 2) rollback is more difficult than if the data were in files; 3) it is not easy for everyone to see who made the last change; 4) it does not support a hierarchy within an account, such as departments; and 5) the table proliferates for as many environments as there are.
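
A minimal sketch of this table-based store, using an in-memory SQLite database; the account and key names are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (account TEXT, key TEXT, value TEXT)")
conn.execute("INSERT INTO config VALUES ('fabrikam', 'default_language', 'en-US')")

# Query the default language for one enterprise account.
row = conn.execute(
    "SELECT value FROM config WHERE account = ? AND key = ?",
    ("fabrikam", "default_language"),
).fetchone()
print(row[0])  # en-US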

The most common approach to storing configuration keys and values is one that facilitates hierarchy. This is best done in a folder/file layout that permits visibility into who changed what, along with authenticated access and sharing across all environments, which points to a source code control system such as git. When the configuration is checked into source control, some best practices continue to apply specifically to configuration key-values.
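
For example, a hypothetical folder/file layout in such a repository might separate environments and accounts (all names illustrative):

config/
  prod/
    fabrikam/
      settings.json
    contoso/
      settings.json
  test/
    fabrikam/
      settings.json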

A best practice is to organize keys into hierarchical namespaces by using a character delimiter. A convention is not mandated across tenants, but it helps. Keys, regardless of their format, must be treated as whole identifiers; when parsing is avoided, it is easier not to break any usages. Data used within application frameworks might dictate specific naming schemes for key-values. A combined size limit of 10 KB usually applies to a key-value.

Key namespaces must be easy to read, with proper use of delimiters to denote hierarchical information, and they must be easy to manage. A key-name hierarchy must represent logical groups of configuration data and should be easy to query using pattern matching, as sketched below.
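
A small sketch of hierarchical key names and a pattern-matching query, using ":" as the delimiter; the key names and values are illustrative.

from fnmatch import fnmatch

settings = {
    "App1:Service1:Endpoint": "https://service1.example.com",
    "App1:Service1:Timeout": "30",
    "App1:Service2:Endpoint": "https://service2.example.com",
}

def query(pattern):
    # Pattern matching over the key namespace, e.g. "App1:Service1:*".
    return {k: v for k, v in settings.items() if fnmatch(k, pattern)}

print(query("App1:Service1:*"))  # both Service1 keys, nothing else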

When there is the luxury of using a dedicated configuration service, key-values can optionally have a label attribute. Labels are used to differentiate key-values with the same key. No labels are associated initially, and key-values can be referenced without a label. A common use of labels is to denote environments for the same key. Labels can also be used to create versions, and it is possible to jump forward or fall back between versions of a key. Values depend on content type, with Unicode as the most popular form. MIME types are also applicable for feature flags, Key Vault references, and JSON key-values.
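
A sketch of labels on key-values, assuming a store keyed on (key, label) pairs where None stands for "no label"; the store layout is an assumption for illustration.

store = {
    ("App1:Endpoint", None): "https://localhost",
    ("App1:Endpoint", "test"): "https://test.example.com",
    ("App1:Endpoint", "prod"): "https://prod.example.com",
}

def get(key, label=None):
    # The same key resolves to a different value per label (environment).
    return store[(key, label)]

assert get("App1:Endpoint") == "https://localhost"
assert get("App1:Endpoint", "prod") == "https://prod.example.com"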

Key-values can be soft-deleted, which means that they can be recovered. Soft delete acts as a safeguard both in the case where a deleted app configuration store must be recovered within the retention period and in the case where an app configuration store is permanently deleted. A soft-deleted store is retained for a short time known as the retention period; when it elapses, the store is deleted permanently. Key-values can also be purged before the retention period expires. Permission to read and purge deleted stores is granted to the owner and contributor roles by default.
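
A minimal sketch of soft delete with a retention period; the seven-day retention below is an assumption for illustration.

import datetime

RETENTION = datetime.timedelta(days=7)  # illustrative retention period

store = {"App1:Endpoint": "https://prod.example.com"}
deleted = {}  # key -> (value, time of deletion)

def soft_delete(key):
    # The key disappears from the store but stays recoverable.
    deleted[key] = (store.pop(key), datetime.datetime.utcnow())

def recover(key):
    value, deleted_at = deleted.pop(key)
    if datetime.datetime.utcnow() - deleted_at > RETENTION:
        # Past the retention period the key-value is purged for good.
        raise KeyError(f"{key} has passed the retention period")
    store[key] = value

soft_delete("App1:Endpoint")
recover("App1:Endpoint")  # succeeds within the retention period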

JSON content type is preferable over other formats for key-values because it provides simpler data management, enhanced data export, and native support in the App Configuration provider. If content is edited directly in a file system, YAML might be terser. When configuration data changes, Event Grid can be used to receive data change notifications. These events can trigger web hooks, Azure Functions, Azure Storage queues, or any other event handler. Typically, a resource group and a message endpoint are created to subscribe to the topic.
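
A sketch of a handler for such a change notification; the event shape follows the general Event Grid schema, and the specific event type string is an assumption for illustration.

def handle_event(event: dict):
    # React to a key-value change event delivered by Event Grid
    # (e.g., to a web hook or an Azure Function).
    if event.get("eventType", "").endswith("KeyValueModified"):
        data = event.get("data", {})
        print(f"configuration changed: key={data.get('key')} label={data.get('label')}")

handle_event({
    "eventType": "Microsoft.AppConfiguration.KeyValueModified",  # assumed type
    "data": {"key": "App1:Endpoint", "label": "prod"},
})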