Monday, October 17, 2022

Workflows (continued)

 
Comparisons to Deis Workflow:
Deis Workflow is a platform-as-a-service (PaaS) that adds a developer-friendly layer to any Kubernetes cluster so that applications can be deployed and managed easily. Kubernetes itself evolved as an industry effort from native Linux container support in the operating system and can be considered a step toward a truly container-centric development environment. Containers decouple applications from infrastructure, which separates dev from ops.
Containers made PaaS practical: they package code in isolation, which lets applications run independently of the host. Early PaaS containers, however, were not open source; they were proprietary to each PaaS. This pushed the model toward development-centric container frameworks where applications could be written with their own dependencies packaged alongside the code.
Let us look at the components of Deis Workflow:
The workflow manager – checks your cluster for the latest stable components and flags any components that are missing or out of date. It is essentially a workflow doctor, providing first aid to a Kubernetes cluster that requires servicing.
The monitoring subsystem consists of three components – Telegraf, InfluxDB, and Grafana. The first is a metrics-collection agent that runs using the DaemonSet API. The second is a database that stores the metrics collected by the first. The third is a graphing application, which natively supports the second as a data source and provides a robust engine for creating dashboards on top of time-series data (a query sketch follows this component list).
The logging subsystem, which consists of two components – one that handles log shipping and one that maintains a ring buffer of application logs.
The router component, which is based on Nginx and routes inbound HTTP/HTTPS traffic to applications. A cloud load balancer is typically provisioned in front of it automatically.
The registry component which holds the application images generated from the builder component. 
The object storage component, where data that needs to be stored is persisted. This is generally off-cluster object storage.
Slugrunner is the component responsible for executing buildpack-based applications. The controller sends the slug location, from which Slugrunner downloads the application slug and launches the application.
The builder component is the workhorse that builds your code after it is pushed from source control.
The database component, which holds most of the platform state. It is typically a relational database. The backup files are pushed to object storage, so data is not lost between backups and database restarts.
The controller, which serves as the HTTP endpoint for the overall platform so that the CLI and SDK plugins can be utilized.
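To make the monitoring subsystem concrete, here is a minimal query sketch in Python, assuming the influxdb 1.x client library and Telegraf's standard cpu measurement; the host and database names are placeholders, not Workflow defaults.

    # A minimal sketch, assuming the influxdb 1.x Python client and Telegraf's
    # standard "cpu" measurement; host and database names are placeholders.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="<influxdb-host>", port=8086, database="<metrics-db>")

    # Average per-minute user CPU over the last hour - the same series a
    # Grafana dashboard would chart from the InfluxDB data source.
    result = client.query(
        "SELECT mean(usage_user) FROM cpu WHERE time > now() - 1h GROUP BY time(1m)")
    print(list(result.get_points()))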
Deis Workflow is more than just an application deployment workflow, unlike Cloud Foundry. It performs application rollbacks, supports zero-downtime app migrations at the router level, and provides scheduler tag support that determines which nodes workloads are scheduled on. Moreover, it runs on Kubernetes, so other workloads can run on Kubernetes alongside these workflows. Workflow components carry a “deis-” prefix and run in their own namespace, which tells them apart from other Kubernetes workloads; they provide building, logging, release and rollback, authentication, and routing functionality, all exposed via a REST API. This is a layer distinct from Kubernetes itself: while Deis provides workflows, Kubernetes provides orchestration and scheduling.
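Because the components carry the “deis-” prefix and run in their own namespace, enumerating the platform layer takes a single call with the official Kubernetes Python client. A minimal sketch, assuming kubeconfig access and the conventional “deis” namespace:

    # A minimal sketch using the official Kubernetes Python client; assumes
    # kubeconfig access and that Workflow was installed into the "deis" namespace.
    from kubernetes import client, config

    config.load_kube_config()  # reads ~/.kube/config
    v1 = client.CoreV1Api()

    # Workflow components (deis-router, deis-controller, ...) live apart from
    # other workloads, so filtering by namespace lists the whole platform layer.
    for pod in v1.list_namespaced_pod("deis").items:
        print(pod.metadata.name, pod.status.phase)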


Comparison to Azure DevOps:
Some tenets for organization from ADO have parallels in Workflow management systems:
·        Projects can be added to support different business units 
·        Within a project, teams can be added 
·        Repositories and branches can be added for a team 
·        Agents, agent pools, and deployment pools to support continuous integration and deployment 
·        Many users can be managed using Azure Active Directory. 
 
Conclusion:
The separation of workflows from resources, together with a built-to-scale design, is a pattern that makes both workflows and resources equally affordable to customers.

Sunday, October 16, 2022

 

This article focuses on some of the best practices for working with workflows that deploy services.  The tenets are: 

·        Reusability – many of the activities from the library of activities for one workflow can and will be reused for another. Very few workflows should need tasks that are not covered by the global collection of activities. There should not be any difference between an activity that appears in bootstrapping and its invocation during redeployment/rehosting in the new environment; only the parameter values will change. 

·        Dependencies – many of the dependencies will be implicit, as they originate from system components and services information. A workflow might additionally specify dependencies in the standard way that workflows indicate dependencies. These will be on a case-by-case basis for tenants, since dependency tracking adds overhead for other services, many of which are standalone. Implicit dependencies can be articulated in the format specified by the components involved. 

·        Splitting – Workflows are written for on-demand invocation from the web interface or by the system, so there might be more than one for a specific deployment scenario. It is best to include both the bootstrapping and the redeploy in the main workflow for the specific scenario, but they will be mutually exclusive during their respective phases and remain idempotent. 

·        Idempotency – All workflow steps and activities should be idempotent. If there are conditionals involved, they must be part of activities. Signaling and receiving notifications from dependent workflows, if any, must be specifically called out (a minimal sketch follows this list). 

·        Bootstrapping – This phase is common to many services and usually requires at least a cluster/set of servers to be made ready, but there might be activities that require the service stamp to be deployed even if it is not configured, along with the necessary activities for one-time preparation such as getting secrets. Until the VIPs are ready, the redeployment cannot be kicked off. Bootstrapping might involve preparations for both primary and secondary where applicable. 

·        Redeployment or rehosting – This phase involves configuration: the bootstrapping is usually for a stamp, and this stage converts it into a deployment for a service. Since it involves reconfiguration, it can apply to both primary and secondary and is typically done inside the new cloud. It is best to parameterize as much as possible. 

·        Naming convention – Though workflows can have any names inside the package that the owning teams upload, it is best to follow a convention for the specific scenario of one workflow calling another. Standalone single workflows do not have this problem. Even in the case when there are many workflows, a prefix/suffix might be helpful. This applies to both workflows and activities. 

·        System workflow – Requiring separate workflows for bootstrap and redeployment via a system-defined workflow, so that the system can inject system-defined activities between bootstrap and redeploy, is a nice-to-have, but the less intrusion into service deployment the better. This calls on the service to do its own tracking by passing parameter values between workflows and activities. A standard need not be specified for this, and it can be left to the discretion of the services. 

The above list is not intended to be complete but focuses on the practices that have worked well.
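To illustrate the idempotency tenet, here is a minimal Python sketch; the in-memory _clusters dictionary is a hypothetical stand-in for whatever state store the target environment provides.

    # A minimal idempotency sketch; _clusters is a hypothetical stand-in for
    # the environment's state store, not part of any real workflow SDK.
    _clusters: dict[str, dict] = {}

    def ensure_cluster(name: str, size: int) -> dict:
        """Create the cluster only if it does not already exist, so the
        activity converges on re-runs instead of failing or duplicating."""
        if name in _clusters:  # already bootstrapped: no-op
            return _clusters[name]
        _clusters[name] = {"name": name, "size": size}
        return _clusters[name]

    # Running the activity twice yields the same state - the idempotency tenet.
    assert ensure_cluster("stamp-1", 3) == ensure_cluster("stamp-1", 3)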


 

Saturday, October 15, 2022

 

 This section refers to some of the documentation for the AZ-305 certification.

 

1.       Multiple tenants – enable access for developers of one tenant in another 

A.      A trust relationship must be set up between the DC receiving the request and the DC in the domain of the requesting account. Forest trusts help to manage a segmented AD DS infrastructure and support access to resources and other objects. Forest trusts are transitive and can be one-way or two-way. A federation is a collection of domains that have established trust.

2.       How to set up single tenancy, and which operations are restricted for single-tenant auth? 

A.      This is required when the traditional approach of restricting access to domain names or IP addresses does not work for SaaS apps or for shared domain names. With tenant restrictions from Azure AD and SSO for the applications used, access can be controlled.
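Mechanically, tenant restrictions rely on an egress proxy injecting two documented headers on traffic bound for the Azure AD sign-in endpoints. A minimal sketch of that injection; the tenant values below are placeholders:

    # A minimal sketch of the header injection behind Azure AD tenant
    # restrictions; the proxy adds these headers to requests bound for
    # login.microsoftonline.com. The values below are placeholders.
    PERMITTED_TENANTS = "contoso.onmicrosoft.com"
    POLICY_TENANT_ID = "<directory-id-that-set-the-policy>"

    def add_tenant_restriction_headers(headers: dict) -> dict:
        headers["Restrict-Access-To-Tenants"] = PERMITTED_TENANTS
        headers["Restrict-Access-Context"] = POLICY_TENANT_ID
        return headers

    print(add_tenant_restriction_headers({"Host": "login.microsoftonline.com"}))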

3.       Identity protection versus monitoring, specifically services and purposes 

A.      Both Security Center and Azure Sentinel can be used for security, but the former helps to collect, prevent, and detect via analytics, while the latter helps to detect via hunting, investigate via incidents, and respond via automation. 

4.       What identity protection will protect from bot attack? 
A.      Azure AD Identity Protection protects from bot attacks. There are three key reports that administrators use for investigations in Identity Protection:

a.       Risky users

b.       Risky sign-ins

c.       Risk detections

5.       On-premises integration with Azure AD so that on-premises experience is not broken 

There are two ways to do this:

1.       Use Azure AD to create an Active Directory domain in the cloud and connect it to the on-premises Active Directory domain. Azure AD Connect integrates the on-premises directories with Azure AD.

2.       Extend the existing on-premises Active Directory infrastructure to Azure, by deploying a VM in Azure that runs AD DS as a Domain Controller. This architecture is more common when the on-premises network and the Azure virtual network (VNet) are connected by a VPN or ExpressRoute connection. Several variations are possible:

a.       A domain is created in Azure, and it is joined to the on-premises AD forest.

b.       A separate forest is created in Azure that is trusted by domains in the on-premises forest.

c.       An Active Directory Federation Services (AD FS) deployment is replicated to Azure.

 

6.       Order of setting up service resources and tasks for AD integration with on-premises.

A.      This includes Active Directory, Active Directory Domain Services, AD Federation Services.

7.       Conditional access policies versus Azure policies – when to use what? 

A.      Azure AD Conditional Access can help author conditions, such as when password authentication must be turned off for legacy applications, based on DateTime or other such criteria. 

B.      A policy is a default allow and explicit deny system focused on resource properties during deployment and for already existing resources. It supports cloud governance with compliance.  

8.       Can a blueprint be used to force hierarchy of resources specific to region? 

A.      Azure Blueprints can be used to assign policies that govern how resource templates are deployed, which can affect multiple resources; this helps adhere to an organization’s standards, patterns, and best practices. A blueprint can consist of one or more policies, and it can also package artifacts such as role assignments, resource groups, and ARM templates. 

9.        Limits of resources and subscriptions? Can a tenant have more than one subscription? 

A.      When we run a single instance of a resource, the service limits, subscription limits, and quotas apply. When these limits are encountered, the shared resources must be scaled out. A tenant can have more than one subscription, but each subscription trusts exactly one tenant. 

 

10.   Do we need availability zone redundancy or geo-redundancy? 

A.      Some tradeoffs based on cost (zone redundancy is free, an additional region is not), overhead (deploying to additional regions implies additional instances that may need to be monitored), and capability (read-only separation is possible only in the case of geo-redundancy).

11.   Azure SQL managed instances – appropriateness over elastic pools and higher compute 

A.      Each elastic pool is contained within a single logical server. Database names must be unique within a pool, so multiple geo-secondaries of the same database cannot share the same pool.

12.   How many databases per tenant?  

A.      A tenant database is dedicated to storing the company’s business data. The knowledge about the shared application is then stored in a dedicated application database.

13.   How to perform migration of applications from on-premises to Azure – choose appropriate database instance, service and SKU 

A.      The four phases of migration include phase 1 – discover and scope, phase 2 – classify and plan, phase 3 – plan migration and testing, and phase 4 – manage and gain insight.

B.      The first phase is the process of creating an inventory of all applications in the ecosystem. They fall into three categories: those that can be migrated, those that cannot be migrated, and those marked for deprecation.

C.       The second phase involves detailing the apps within the categories by criticality, usage, and lifespan. It prioritizes the applications for migration and plans a pilot. 

D.      The third phase involves planning the migration and testing it by communicating changes, migrating applications, and transitioning users.

E.       The fourth phase involves managing and gaining insight by managing end-user and admin experiences and gaining insight into application and user behavior.

F.       These four phases transition the application experience from old to new smoothly. Migrating from an earlier version of Windows to a later one, or switching from one SKU to another, is possible.

14.   Will the elastic pool scale or is it better to go with higher compute for certain workloads? 

A.      An elastic pool must have sufficient resources to accommodate a database. Elastic pools share compute resources between several databases on the same server, which helps achieve performance elasticity for each database. The sharing of provisioned resources across databases reduces their unit costs, and there are built-in protections against noisy-neighbor problems. The architectural approach must meet the levels of scale expected from the system (see the sketch below).

B.      Higher Compute boosts the performance for a database.
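A minimal sketch with the azure-mgmt-sql package, listing the elastic pools on a logical server and the databases in each to judge headroom; subscription, resource group, and server names are placeholders:

    # A minimal sketch with azure-mgmt-sql; subscription, resource group, and
    # server names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.sql import SqlManagementClient

    client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Each elastic pool lives on a single logical server; listing its databases
    # helps judge whether the pool has headroom or a database needs dedicated
    # (higher) compute instead.
    for pool in client.elastic_pools.list_by_server("my-rg", "my-sql-server"):
        dbs = client.databases.list_by_elastic_pool("my-rg", "my-sql-server", pool.name)
        print(pool.name, pool.sku.name, [db.name for db in dbs])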

15.   How do we setup geo-recovery, geo-replication, and geo-failover for restricted MTTR and RTO? 

A.      There is usually a delay between when a backup is taken and when it is geo-restored, and the restored database can be up to one hour behind the original database. Geo-restore relies on automatically created geo-replicated backups with a recovery point objective (RPO) of up to 1 hour and an estimated recovery time objective (RTO) of up to 12 hours. It does not guarantee that the target region will have the capacity to restore the database after a regional outage, because a sharp increase in demand is likely, so it is mostly used for small databases. Business continuity for larger databases is ensured via auto-failover groups, which have a much lower RPO and RTO, and the capacity is guaranteed.

16.   How to proceed with database migration from on-premises to cloud? 

A.      Geo-replication can also be used for database migration with minimum downtime, and for application upgrades by creating an extra secondary as a fail-back copy during the upgrade. An end-to-end recovery requires recovery of all components and dependent services; all components must be resilient to the same failures and must become available within the recovery time objective of the application. Designing cloud solutions for disaster recovery includes scenarios that use two Azure regions for business continuity with minimal downtime, use regions with maximum data preservation, or replicate an application to different geographies to follow demand.

17.   How can virtual networks help with securing tenants and connecting on-premises networks?

A.      Virtual networks allow name resolution to be set up. The name resolution to an IP address depends on whether there is a single instance or many instances of the multitenant application. For example, a CNAME for the custom domain of a tenant might have a value pointing to a multi-part subdomain of the multitenant application solution provider. Since this provider might want to set up proper routing to multiple instances, they might have a CNAME record for the subdomain of each individual instance to route to that instance. They will also have an A record for that specific instance pointing to the IP address in the provider’s domain. This chain of records resolves requests for the custom domain to the IP address of the right instance among the many deployed by the provider (the sketch below traces such a chain). Virtual networks also extend to on-premises.
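A minimal sketch with the dnspython package tracing such a CNAME chain; every domain name here is a hypothetical placeholder, not a real record:

    # A minimal sketch with dnspython; all domain names are hypothetical.
    import dns.resolver

    name = "app.tenant-custom-domain.com"  # the tenant's custom domain
    while True:
        try:
            answer = dns.resolver.resolve(name, "CNAME")
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            break
        name = str(answer[0].target).rstrip(".")
        print("CNAME ->", name)  # e.g. instance7.provider.example

    # The chain ends at an A record holding the instance's IP address.
    for rdata in dns.resolver.resolve(name, "A"):
        print("A ->", rdata.address)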

18.   What is the order of connecting a service instance privately to the enterprise application? 

A.      Network features such as private endpoints and disabled public network access can greatly reduce the attack surface of an organization’s data platform. The simplest solution is to host a jumpbox on the virtual network of the data management landing zone to connect to the data services through private endpoints. Azure Bastion could be a more secure alternative; it connects to a target VM subnet governed by an NSG.

19.   How to expose nested virtual network access to the internet? Is there a gateway involved? 

A.      Network Watcher can be used to view the topology of an Azure virtual network and to monitor Azure VPN gateways. The Get-AzureRmVirtualNetworkGatewayConnection PowerShell cmdlet can be used to retrieve the connection details. If two virtual networks are linked, one of them must have a gateway to the internet.

20.   How to use a load balancer with the virtual network or for access to an application? 

A.      For an example deployment, a virtual network interface for each VM, an internet-facing load balancer, two load-balancing rules, an availability set, and, say, two VMs are required.

21.   When to use VMSS for certain migration scenarios? Do we run into specific scaling limits for peak load? 

A.      Scale sets support up to 1,000 VM instances for standard marketplace images and custom images through the Azure Compute Gallery. If a scale set is created using a managed image, the limit is 600 VM instances. VMSS makes it easy to create and manage VM instances, provides high availability and application resiliency, and allows applications to be scaled automatically as resource demand changes.
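A minimal sketch with azure-mgmt-compute counting a scale set’s instances against the documented ceilings; subscription, resource group, and scale-set names are placeholders:

    # A minimal sketch with azure-mgmt-compute; names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Count current instances against the limits above (1,000 for marketplace/
    # gallery images, 600 for managed images).
    instances = list(client.virtual_machine_scale_set_vms.list("my-rg", "my-vmss"))
    print(f"{len(instances)} instances in the scale set")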

22.   When to use VMs instead of VMSS?  Will it affect availability across regions? Can the VMSS be spread across regions? 

A.      VMs and VMSS are bound to regions. A regional scale set uses placement groups, which act as an implicit availability set with five fault domains and five update domains. Scale sets of more than 100 VMs span multiple placement groups.

23.   Will the VMSS require private endpoints when enterprise services are hosted? 

A.      Private endpoints can be created for a service on a virtual network; VMSS deploys the compute.

24.   What is the minimum number of instances – 2 or 4 – when paired regions are involved for a certain deployment scenario? 

A.      Resources double for paired regions. The minimum number for one region can be taken as 1 of each resource.

25.   How many logging and monitoring namespaces for a multitenant application? 

A.      Only one, for all the tenants of the multitenant application.

 

26.   What cloud services will be used for collecting and analyzing IoT traffic from edges? 

A.      Azure IoT Hub connects, monitors, and controls billions of IoT assets. Azure Time Series Insights can help explore and gain insights from time-series IoT data in real time.

B.      Cosmos DB and Function Apps can be used for custom processing. Azure Event Hubs can receive and process millions of events per second for stream processing (a producer sketch follows).
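A minimal sketch with the azure-eventhub SDK sending one edge reading into an event hub for downstream stream processing; the connection string and hub name are placeholders:

    # A minimal sketch with the azure-eventhub SDK; the connection string and
    # hub name are placeholders.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        conn_str="<event-hubs-connection-string>", eventhub_name="iot-telemetry")

    # Batch and send a single JSON reading from an edge device.
    with producer:
        batch = producer.create_batch()
        batch.add(EventData('{"deviceId": "edge-01", "temperature": 21.7}'))
        producer.send_batch(batch)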

 

27.   How will we scale resources for edge traffic? What databases are best suited for certain data? 

A.      Time-series data can be analyzed with Azure Time Series Insights.

B.      Streaming data can be processed with Azure Event Hubs and Function Apps.

 

28.   Will a time-series database or a cosmos document store be preferred to certain application and its workload? 

A.      IoT traffic is best collected by Azure Event Hubs and analyzed via Time Series Insights. A document store provides many capabilities for documents, including SQL queries (see the sketch below). It is also general-purpose and scales quite well, and it can be deployed with separation of read-only and read-write instances.
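A minimal sketch with the azure-cosmos SDK running a parameterized SQL query over documents; the endpoint, key, database, and container names are placeholders:

    # A minimal sketch with the azure-cosmos SDK; endpoint, key, database, and
    # container names are placeholders.
    from azure.cosmos import CosmosClient

    client = CosmosClient("<account-endpoint>", credential="<account-key>")
    container = client.get_database_client("iot").get_container_client("readings")

    # Document stores accept SQL-like queries - one of the capabilities noted above.
    for doc in container.query_items(
            query="SELECT c.deviceId, c.temperature FROM c WHERE c.temperature > @t",
            parameters=[{"name": "@t", "value": 30}],
            enable_cross_partition_query=True):
        print(doc)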

 

29.   What will be the order of services and namespace creations for creating a reporting dashboard for a specific purpose? 

A.      A data ingestion service, a data collection store, and a reporting stack, in that order. Variations depend on the type of data and analysis.

 

30.   When is a container registry prepared and does it need access to the internet and public registries? 

A.      If a registry is accessed over the internet, the client must confirm that the registry allows public network access. By default, a registry instance allows access to public registry endpoints from all networks, but it can limit access to selected networks or IP addresses.
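A minimal sketch with azure-mgmt-containerregistry checking whether a registry currently allows public network access; subscription, resource group, and registry names are placeholders:

    # A minimal sketch with azure-mgmt-containerregistry; names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.containerregistry import ContainerRegistryManagementClient

    client = ContainerRegistryManagementClient(
        DefaultAzureCredential(), "<subscription-id>")

    registry = client.registries.get("my-rg", "myregistry")
    print(registry.public_network_access)  # "Enabled" (the default) or "Disabled"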

 

31.   Will the container instances be preferred to azure functions? when is the latter better suited? 

A.      The function is the unit of work, whereas in a container instance the entire container contains the unit of work. So, Azure Functions start and end based on event triggers, whereas the microservices in containers run all the time (a minimal function sketch follows).
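A minimal sketch of an HTTP-triggered Azure Function (Python v1 programming model, paired with an HttpTrigger binding in function.json): the code runs only when a request arrives, unlike a container that serves continuously.

    # A minimal HTTP-triggered Azure Function sketch (Python v1 model); the
    # trigger binding lives in an accompanying function.json.
    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        name = req.params.get("name", "world")  # read a query-string parameter
        return func.HttpResponse(f"Hello, {name}!", status_code=200)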

 

32.   What are the scaling limits for either of them or which is better suited for hosting APIs?

A.      By virtue of their trigger-based invocation, Functions suffer from cold starts on HTTP invocations, although they scale very well to the volume of IoT traffic. A Container App is better suited to hosting APIs.