Saturday, March 18, 2023

 

Linux Kernel:

The Linux kernel is the core of the Linux operating system: a comparatively small, specialized body of code that interacts directly with the hardware. Its responsibilities include process management, process scheduling, system calls, interrupt handling, bottom halves, kernel synchronization and its techniques, memory management, and the process address space.

A process is a program in execution on the processor. Threads are the objects of activity within the process, and the kernel schedules individual threads. Linux does not differentiate between threads and processes; a multithreaded program is represented as multiple processes that share resources. A process is created with the fork call, which returns twice: once in the child and once in the parent. Conceptually, all the resources of the parent are duplicated for the child at fork time. When exec is called, a new address space is loaded for the process.

The Linux kernel maintains a doubly linked list of task structures, one per process; these process descriptors hold all the information about a process. The size of the process descriptor depends on the architecture of the machine; on 32-bit machines it is about 1.7KB. The task structure is reached through the per-process kernel stack. A process kernel stack spans a range from a high memory address down to a low memory address; the stack grows from the high address toward the low address, and its top is tracked by the stack pointer. The thread_info structure is stored at the low-address end of this stack and holds a pointer to the task_struct. It exists to conserve memory: storing the full 1.7KB descriptor in a 4KB stack would use up a lot of it, whereas a pointer is a redirection to the actual data structure and occupies only a tiny space. The PID identifies a process among thousands; the maximum number of processes in Linux can be set via /proc/sys/kernel/pid_max. The current macro points to the task_struct of the currently executing task.

A process can be in different states. The first state entered when the process is forked is the ready state. When the scheduler dispatches the task to run, it enters the running state, and when the task exits, it is terminated. A task can switch between running and ready many times, and it can also sleep in an intermediary waiting (interruptible) state, in which it sleeps on a wait queue for a specific event. When the event occurs, the task is woken up and placed back on the run queue. The state is visible in the task_struct of every process, and the kernel provides the set_task_state API to manipulate the state of the current process.

Process context is the context in which the kernel executes on behalf of a process; it is entered via a system call. The current macro is not valid in interrupt context. The init process is the first process created, and it forks the other user-space processes; the /etc/inittab entries describe the processes and daemons to create, and a process tree organizes the processes. Copy-on-write is a technique that copies a page of the address space only when the child (or parent) writes to it; until then, all readers share a single instance. The set of resources, such as virtual memory, the file system, and signals, that parent and child share is determined by the flags to the clone system call, which fork invokes underneath. When the page tables do not need to be copied at all, for instance because the child calls exec immediately, vfork can be used instead of fork. Kernel threads run only within the kernel and do not have an associated process address space; flush is an example of a kernel thread. The ps -ef command lists kernel threads along with ordinary processes (kernel threads appear in square brackets).
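As a hedged illustration of the fork/exec behavior described above, here is a minimal Python sketch (the os module exposes the same fork, exec, and wait calls the text discusses; the program being exec'ed is an arbitrary choice):

import os

pid = os.fork()  # fork returns twice: 0 in the child, the child's PID in the parent
if pid == 0:
    # Child: exec replaces this address space with a new program image.
    os.execvp("echo", ["echo", "hello from the child"])
else:
    # Parent: wait collects the child's exit status.
    _, status = os.waitpid(pid, 0)
    print("child", pid, "exited with code", os.WEXITSTATUS(status))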
All the work done at the time of fork is reversed at process exit. The process descriptor is removed only when all references to it are dropped. A zombie is a process that is no longer in the running state but whose process descriptor still lingers, because its parent has not yet collected the exit status. A parent that exits before its child leaves the child parentless; the kernel then provides the child with a new parent.
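The zombie window is easy to observe: until the parent waits on the child, the exited child keeps its process descriptor. A small sketch under the same assumptions as above:

import os, time

pid = os.fork()
if pid == 0:
    os._exit(0)        # child exits immediately
else:
    time.sleep(2)      # while the parent sleeps, the child shows as a zombie (state Z in ps)
    os.waitpid(pid, 0) # reaping releases the child's process descriptor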

Friday, March 17, 2023

 

SQL Schema


Table: Books

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| book_id        | int     |
| name           | varchar |
| available_from | date    |
+----------------+---------+
book_id is the primary key of this table.

Table: Orders

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| order_id       | int     |
| book_id        | int     |
| quantity       | int     |
| dispatch_date  | date    |
+----------------+---------+
order_id is the primary key of this table.
book_id is a foreign key to the Books table.

 

Write an SQL query that reports the books that have sold less than 10 copies in the last year, excluding books that have been available for less than one month from today. Assume today is 2019-06-23.

Return the result table in any order.

The query result format is in the following example.

 

Example 1:

Input:
Books table:
+---------+--------------------+----------------+
| book_id | name               | available_from |
+---------+--------------------+----------------+
| 1       | "Kalila And Demna" | 2010-01-01     |
| 2       | "28 Letters"       | 2012-05-12     |
| 3       | "The Hobbit"       | 2019-06-10     |
| 4       | "13 Reasons Why"   | 2019-06-01     |
| 5       | "The Hunger Games" | 2008-09-21     |
+---------+--------------------+----------------+
Orders table:
+----------+---------+----------+---------------+
| order_id | book_id | quantity | dispatch_date |
+----------+---------+----------+---------------+
| 1        | 1       | 2        | 2018-07-26    |
| 2        | 1       | 1        | 2018-11-05    |
| 3        | 3       | 8        | 2019-06-11    |
| 4        | 4       | 6        | 2019-06-05    |
| 5        | 4       | 5        | 2019-06-20    |
| 6        | 5       | 9        | 2009-02-02    |
| 7        | 5       | 8        | 2010-04-13    |
+----------+---------+----------+---------------+
Output:
+-----------+--------------------+
| book_id   | name               |
+-----------+--------------------+
| 1         | "Kalila And Demna" |
| 2         | "28 Letters"       |
| 5         | "The Hunger Games" |
+-----------+--------------------+

 

 

Keeping the dispatch-date filter in the join condition (rather than the WHERE clause) preserves books with no recent orders as NULL rows, which then count as zero copies sold:

SELECT b.book_id, b.name
FROM Books b
LEFT JOIN Orders o
  ON o.book_id = b.book_id
  -- only orders dispatched within the last year count toward copies sold
  AND o.dispatch_date >= DATEADD(year, -1, '2019-06-23')
-- exclude books available for less than one month from today
WHERE b.available_from < DATEADD(month, -1, '2019-06-23')
GROUP BY b.book_id, b.name
HAVING COALESCE(SUM(o.quantity), 0) < 10;

 


Case 1

Input

Books =

| book_id | name | available_from |
| ------- | ---------------- | -------------- |
| 1 | Kalila And Demna | 2010-01-01 |
| 2 | 28 Letters | 2012-05-12 |
| 3 | The Hobbit | 2019-06-10 |
| 4 | 13 Reasons Why | 2019-06-01 |
| 5 | The Hunger Games | 2008-09-21 |

Orders =

| order_id | book_id | quantity | dispatch_date |
| -------- | ------- | -------- | ------------- |
| 1 | 1 | 2 | 2018-07-26 |
| 2 | 1 | 1 | 2018-11-05 |
| 3 | 3 | 8 | 2019-06-11 |
| 4 | 4 | 6 | 2019-06-05 |
| 5 | 4 | 5 | 2019-06-20 |
| 6 | 5 | 9 | 2009-02-02 |
| 7 | 5 | 8 | 2010-04-13 |

Output

| book_id | name |
| ------- | ---------------- |
| 2 | 28 Letters |
| 1 | Kalila And Demna |
| 5 | The Hunger Games |

Expected

| book_id | name |
| ------- | ---------------- |
| 1 | Kalila And Demna |
| 2 | 28 Letters |
| 5 | The Hunger Games |

(The output rows match the expected set; since the result may be returned in any order, the ordering difference is accepted.)

 

Thursday, March 16, 2023

Improvements to Azure for application modernization purposes:

As fears of a global slowdown grip the tech industry, organizations planning their digital transformation must do more with less. This essay suggests two improvements to the Azure public cloud. First, a tool that can extract the interfaces from legacy application source code and stage them for a microservice transformation. Second, the rollout of a pre-assembled, pre-configured set of Azure resources that makes it easy to deploy various kinds of applications.

Azure already offers significant innovations as cost-effective differentiation from its nearest competitor, and these two improvements would help organizations with a charter for cloud adoption make the leap.

Azure claims savings of up to 54% over running applications on-premises and 35% over running them on AWS, per its media reports. Streamlined operations, simplified administration, and proximity are additional benefits. Built-in tools in Visual Studio and MSSQL ease migrations of applications and databases respectively. The differentiating Azure features behind such savings are the Hybrid Benefit and the TCO calculator. The Hybrid Benefit is a licensing offer that eases migration to Azure by applying existing Windows Server and SQL Server licenses, and Linux subscriptions, toward Azure costs. Additionally, services like Azure Arc make it possible to use Azure Kubernetes Service on-premises, and Azure Stack offers a hyperconverged clustering solution for running virtualized workloads on-premises, which makes it easy to consolidate aging infrastructure and connect to Azure for cloud services. The TCO calculator helps identify the cost areas that affect current applications today, such as server hardware, software licenses, electricity, and labor. It recommends a set of equivalent Azure services that will support the applications and helps create a customized business case to justify migration to Azure. All it takes is three steps: enter a few details about the current infrastructure, review the assumptions, and receive a summary with supporting analysis.

The feature asked for here is analysis of legacy applications that explains how to convert them to a microservice architecture. An extractor that generates a KDM model from legacy application code can be automated by understanding the interfaces and whether they are candidates for segregation into microservices. Dedicated parsers can perform this code-to-model transformation. The restructuring phase derives an enriched, technology-independent conceptual specification of the legacy system in a KDM knowledge model from the information stored in the models generated in the previous phase. KDM is an OMG standard and can involve up to four layers: the infrastructure layer, the program elements layer, the resource layer, and the abstractions layer, each dedicated to a particular application viewpoint. Forward engineering then moves from these high-level abstractions, by means of transformational techniques, to automatically obtain a representation on a new platform, such as microservices, or as constructs in a programming language, such as interfaces and classes. Even the user interface can be forward-engineered into a rich single-page application, with a new representation describing the organization and positioning of components. Segregating interfaces into microservices is easier with well-known patterns such as model-view-controller. Data access, profiling, and instrumentation or bookkeeping at the interface level can also add useful information for organizing interfaces into candidate microservices. Software measurement metamodels have played a significant role in forward engineering.
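A toy sketch of the very first pass such an extractor might take, in Python (illustrative only: the source tree name and the regular expression are assumptions, and a real extractor would use a dedicated parser to build the KDM model rather than a regex):

import re
from pathlib import Path

# Rough first pass: list declared interfaces in C# sources as candidates
# for segregation into microservices.
INTERFACE_RE = re.compile(r"\binterface\s+(\w+)")

def candidate_interfaces(src_root):
    names = set()
    for path in Path(src_root).rglob("*.cs"):
        names.update(INTERFACE_RE.findall(path.read_text(errors="ignore")))
    return sorted(names)

print(candidate_interfaces("legacy-app"))  # hypothetical source tree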

The second ask concerns the deployment of hosts and resources native to the cloud. Azure could provide dedicated blueprints that include policies and templates for hosting a specific type of modernized application. Microservices, for instance, require load balancers, caches, containers, ingress, and data connections; a modernized application is more easily deployed when developers do not have to author the infrastructure and can rely on pre-assembled resources specific to their needs. Azure Service Fabric has been a veritable resource for this purpose, but the ask here is for a blueprint that also establishes the size of the stamp.

Together, these asks amount to a full-service offering for application migration and modernization teams.


Wednesday, March 15, 2023

 

Problem 1: Triangle Judgement

SQL Schema

Table: Triangle

+-------------+------+
| Column Name | Type |
+-------------+------+
| x           | int  |
| y           | int  |
| z           | int  |
+-------------+------+
(x, y, z) is the primary key column for this table.
Each row of this table contains the lengths of three line segments.

 

Write an SQL query to report, for every set of three line segments, whether they can form a triangle.

Return the result table in any order.

The query result format is in the following example.

 

Example 1:

Input:
Triangle table:
+----+----+----+
| x  | y  | z  |
+----+----+----+
| 13 | 15 | 30 |
| 10 | 20 | 15 |
+----+----+----+
Output:
+----+----+----+----------+
| x  | y  | z  | triangle |
+----+----+----+----------+
| 13 | 15 | 30 | No       |
| 10 | 20 | 15 | Yes      |
+----+----+----+----------+

 

SELECT x, y, z,
  CASE
    WHEN x + y > z AND y + z > x AND x + z > y THEN 'Yes'
    ELSE 'No'
  END AS `triangle`
FROM Triangle;


Case 1

Input

Triangle =

| x | y | z |
| -- | -- | -- |
| 13 | 15 | 30 |
| 10 | 20 | 15 |

Output

| x | y | z | triangle |
| -- | -- | -- | -------- |
| 13 | 15 | 30 | No |
| 10 | 20 | 15 | Yes |

 

Tuesday, March 14, 2023

Shrinking budgets pose a tremendous challenge to organizations' digital transformation initiatives and cloud adoption roadmaps. Technology decision makers must decide what to do with the legacy applications that proliferated before the pandemic. There are three main choices: maintain the status quo and do nothing, migrate and modernize the applications to a modern cloud-based environment, or rewrite and replace them. The last might be tempting given the capabilities introduced by both AWS and Azure and a refreshed knowledge base about the application to be transformed, but both clouds have also driven down lift-and-shift costs.

As a specific example, significant cost savings can be achieved just by migrating legacy ASP.NET applications from on-premises to the cloud. Traditional .NET applications are well poised for migration by virtue of the .NET runtime on which they run. Azure claims savings of up to 54% over running applications on-premises and 35% over running them on AWS, per its media reports. Streamlined operations, simplified administration, and proximity are additional benefits. Built-in tools in Visual Studio and MSSQL ease migrations of applications and databases respectively.

One of the key differences between migrations to either public cloud is Azure's Hybrid Benefit offering. The Hybrid Benefit is a licensing offer that eases migration to Azure by applying existing Windows Server and SQL Server licenses, and Linux subscriptions, toward Azure costs, which can realize substantial savings. Additionally, services like Azure Arc make it possible to use Azure Kubernetes Service on-premises, and Azure Stack offers a hyperconverged clustering solution for running virtualized workloads on-premises, which makes it easy to consolidate aging infrastructure and connect to Azure for cloud services.

Another difference between migrations to either public cloud is Azure's Total Cost of Ownership (TCO) calculator. The TCO calculator helps identify the cost areas that affect current applications today, such as server hardware, software licenses, electricity, and labor. It recommends a set of equivalent Azure services that will support the applications. The analysis shows each cost area with an estimate of on-premises spending versus spending in Azure; several cost categories shrink or disappear entirely when workloads move to the cloud. Finally, it helps create a customized business case to justify migration to Azure. All it takes is three steps: enter a few details about the current infrastructure, review the assumptions, and receive a summary with supporting analysis.

The only limitation an organization faces is self-imposed. Organizations and large company departments may be averse to letting employees grow their cloud budget beyond, say, a thousand dollars a month. That is not the only gap. Business owners observe that existing channels of supply and demand are becoming savvy competitors to the cloud, while architects do not truly enforce practices that keep overall cloud computing expenses under a limit. Employees and resource users are secured by role-based access control, but the privilege to manage subscriptions is granted to those users in a way that allows them to escalate costs disproportionately.

When this is overcome, the benefits outweigh the costs and apprehension.

Monday, March 13, 2023

Continuous event analysis for Fleet Management software:

Use case: Continuous events from fleet management operations involve data such as geospatial telemetry from driverless vehicles for geospatial analytics, weblogs for clickstream analytics, and point-of-sale data for inventory control. Real-time fleet management of a station involves routing incoming vehicles through the station and scheduling their departures with the objective of optimizing punctuality and regularity of the transit service. The purpose is to develop an automated vehicle traffic control system. The scheduling problem is modeled as a bicriteria job-shop scheduling problem with additional constraints, with two objective functions in lexicographical order: first, minimization of tardiness/earliness, and second, headway optimization. The problem is solved in two steps: a heuristic builds a feasible solution by considering the first objective function, and then the regularity is optimized (a toy sketch follows below). This approach also works well on simulated data for a station. This article investigates the use of a data pipeline and cloud-native resources for the management of a fleet.
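A minimal sketch of the two-step idea under heavy simplifying assumptions (single track, a fixed minimum headway, made-up vehicle data; the real formulation is a constrained job-shop problem that this toy ignores):

MIN_HEADWAY = 2  # assumed minimum minutes between consecutive departures

vehicles = [("A", 0, 5), ("B", 1, 6), ("C", 4, 14)]  # (id, ready, due) -- made-up

# Step 1: a feasible schedule with low tardiness/earliness, via earliest-due-date order.
plan, prev = [], None
for vid, ready, due in sorted(vehicles, key=lambda v: v[2]):
    depart = ready if prev is None else max(ready, prev + MIN_HEADWAY)
    plan.append([vid, ready, due, depart])
    prev = depart

# Step 2: regularity -- nudge interior departures toward even headways within their slack.
# (A real implementation would also cap delays by due times to preserve objective one.)
first, last = plan[0][3], plan[-1][3]
target = (last - first) / (len(plan) - 1)
for i in range(1, len(plan) - 1):
    lo = max(plan[i][1], plan[i - 1][3] + MIN_HEADWAY)
    hi = plan[i + 1][3] - MIN_HEADWAY
    plan[i][3] = min(max(first + i * target, lo), hi)

for vid, ready, due, depart in plan:
    print(vid, "departs at minute", depart, "(due", str(due) + ")")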

Implementing a data pipeline:

The example taken here refers to the Azure public cloud for specific products and principles, but equivalent resources from any public cloud can be used. There is a point of ingestion from data sources, typically via Azure Event Hubs, IoT Hub, or Blob storage. Event ordering options and time windows (such as tumbling or hopping windows) can be suitably adjusted to perform aggregations. The language of query is SQL, and it can be extended with JavaScript or C# user-defined functions. Queries written in SQL are easy to apply to filtering, sorting, and aggregation. Open-source stream analytics software such as Apache Flink also provides SQL-like querying in addition to the structured query operations familiar from collections and per-event processing methods.

The topology between ingestion and delivery is handled by the stream analytics service, while allowing extensions with the help of reference data stores, Azure Functions, and real-time scoring via machine learning services. Event Hubs, Azure Blob storage, and IoT Hubs collect data on the ingestion side, and results are distributed after analysis via alerts and notifications, dynamic dashboarding, data warehousing, and storage/archival. The fan-out of data to different services is itself a value addition, but the ability to transform events into processed events also opens more possibilities for downstream usages, including reporting and visualization.

As with other services in the Azure portfolio, a data pipeline comes with standard deployment via Azure Resource Manager templates, health monitoring via Azure Monitor, billing usages that can help drive down costs, and various programmability options such as SDKs, REST-based APIs, command-line interfaces, and PowerShell automation. It can be offered as a fully managed PaaS offering, so the infrastructure and workflow initializers need not be set up by hand for most deployments. It can also run directly in the cloud, rather than on infrastructure such as Kubernetes hosted in the cloud, and scale to many events with relatively low latency. Such a cloud-native continuous-event fleet management service can be not only production ready but also reliable in mission-critical deployments. Security and compliance are not sacrificed for the sake of performance, in keeping with the best practices for cloud resources.
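To make the windowing idea concrete, here is a small tumbling-window aggregation sketch in plain Python (the event shape and the 60-second window length are assumptions; a managed stream analytics service would express the same thing in SQL over the event stream):

from collections import defaultdict

# Each event: (epoch_seconds, vehicle_id, speed_kmh) -- made-up telemetry shape.
events = [(0, "v1", 40), (15, "v2", 55), (70, "v1", 42), (95, "v1", 44)]
WINDOW = 60  # tumbling window length in seconds

# Tumbling windows partition time into fixed, non-overlapping intervals;
# assign each event to its window and aggregate per (window, vehicle).
sums = defaultdict(lambda: [0, 0])  # (window_start, vehicle) -> [total, count]
for ts, vid, speed in events:
    window_start = (ts // WINDOW) * WINDOW
    sums[(window_start, vid)][0] += speed
    sums[(window_start, vid)][1] += 1

for (start, vid), (total, count) in sorted(sums.items()):
    print(f"window [{start}, {start + WINDOW}): {vid} avg speed {total / count:.1f}")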

Sunday, March 12, 2023

MySQL managed instance in the cloud:

Organizations planning to switch to the cloud often find a suite of small-scale monitoring applications that need to be migrated. These are small applications that typically persist state in a MySQL backend store. The choices they have include re-host, re-platform, and re-architect.

Monitoring applications are usually written with the intent to monitor resources continuously, catch issues before they become a bottleneck for operations, understand what is going on and why, and prepare contingency plans beforehand.

A simple sample monitoring application deployed to a public cloud usually has the following topology. It has entry points via a load balancer for the frontend, which is accessible over the internet to customers, and a CLI/Cloud Shell for administrators. These entry points reach resources deployed within a VNet that spans the web tier and the data tier. There can be load balancers in front of those tiers because they help spread out traffic for high availability and low latency. The data tier might consist of a flexible MySQL server with a read replica.
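A hedged sketch of how such an application might split traffic between the primary and its read replica (host names, credentials, and the metrics table are placeholders; uses the mysql-connector-python package):

import mysql.connector

# Writes go to the primary; reads go to the replica. Hosts are placeholders.
primary = mysql.connector.connect(host="mysql-primary.example.net", user="app",
                                  password="...", database="monitoring")
replica = mysql.connector.connect(host="mysql-replica.example.net", user="app",
                                  password="...", database="monitoring")

def record_metric(name, value):
    cur = primary.cursor()
    cur.execute("INSERT INTO metrics (name, value) VALUES (%s, %s)", (name, value))
    primary.commit()

def latest_metric(name):
    cur = replica.cursor()
    cur.execute("SELECT value FROM metrics WHERE name = %s ORDER BY id DESC LIMIT 1",
                (name,))
    row = cur.fetchone()
    return row[0] if row else None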

When we modernize an existing application, we can ease our move to the cloud and realize the full promise of cloud technology. With a cloud-native microservice approach, the scalability and flexibility inherent to the cloud can be taken advantage of. Modernizing applications as cloud-native enables them to run concurrently and connect seamlessly with existing investments, removing barriers that prohibit productivity and integration.

One of the tenets of modernizing is "build once and deploy on any cloud." The process begins with assessing the existing application, then building quickly, automating deployments for productivity, and running and consistently managing the modernized application.

Identifying which applications can be readily moved to the cloud platform and which require refactoring is the first step, because the treatments for lift-and-shift and refactoring are quite different. Leveraging containers as the foundation for applications and services is another aspect.

Automating deployments with a DevOps pipeline makes them quick and reliable. A common management approach that consolidates operations across all applications ensures faster problem resolution.

When application readiness is assessed, there are four tracks of investigation: cloud migration, cost reduction, agile delivery, and innovation. In the build phase of build-deploy-run, these result in VMs in the cloud for migration purposes, or in containers for repackaging, re-platforming, and refactoring respectively. In the deploy phase, the VMs are handled by migration accelerators while the containers are handled by modern DevOps pipelines. In the run phase, the modern application runtimes for containers likewise differ from the common operations on VMs that the migration path uses. Finally, migration yields a relocated (and still complex) traditional application, while modernization yields a traditional application via repackaging, a cloud-ready application via re-platforming, or a cloud-native application via refactoring.