Tuesday, November 12, 2024

Among the control methods for UAV swarms, dynamic formation change holds the most promise for morphing from one virtual structure to another. When there is no outside influence or data-driven flight management, the swarm pilot can articulate the next virtual structure more easily.

It is usually helpful to plan two or three virtual structures in advance so that a UAV swarm can seamlessly morph from one holding position to another. These macro and micro movements can even be delegated to humans and to the UAV swarm respectively: given initial and final positions, the autonomous UAVs can make tactical moves efficiently, while humans can generate the overall workflow even in the absence of a three-dimensional GPS-based map.

Virtual structures can even be synthesized from images with object detection and appropriate scaling, so they are not necessarily input by humans. In a perfect world, UAV swarms launch from a packed formation, take positions in a matrix in the air, and then morph from one position to another based on the signals they receive.
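
As a rough illustration of that image-to-structure path, the following Python sketch (the structure_from_mask helper and all parameter names are hypothetical) converts a binary detection mask into scaled 3D waypoints; a real pipeline would obtain the mask from an object detector and would likely space the drones evenly along the contour:

import numpy as np

def structure_from_mask(mask, num_drones, scale=1.0, altitude=20.0):
    # mask: 2D array whose nonzero pixels mark the detected shape,
    # assumed to come from an upstream object-detection step.
    ys, xs = np.nonzero(mask)                 # pixel coordinates of the shape
    if len(xs) < num_drones:
        raise ValueError("mask has fewer pixels than drones")
    idx = np.linspace(0, len(xs) - 1, num_drones).astype(int)
    pts = np.stack([xs[idx], ys[idx]], axis=1).astype(float)
    pts -= pts.mean(axis=0)                   # center the shape on the origin
    pts *= scale                              # pixels -> meters
    z = np.full((num_drones, 1), altitude)    # fly the shape at one altitude
    return np.hstack([pts, z])                # (num_drones, 3) waypoints

# Example: a hollow square mask and 12 drones
mask = np.zeros((50, 50))
mask[10, 10:40] = mask[39, 10:40] = 1
mask[10:40, 10] = mask[10:40, 39] = 1
print(structure_from_mask(mask, 12, scale=0.5))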

There are several morphing algorithms that reduce the distances the drones travel between their initial and final positions during a transition between virtual structures. These include but are not limited to:

1. Thin-plate splines (TPS): this algorithm adapts the mapping to minimize deformation of the swarm’s formation while avoiding obstacles. It uses a non-rigid mapping function to reduce lag caused by maneuvers.

2. Non-rigid mapping functions: these reduce the lag caused by maneuvers, making the swarm more responsive and energy efficient.

3. Distributed assignment and optimization protocols: these enable UAV swarms to construct and reconfigure formations dynamically as the number of UAVs changes.

4. Consensus-based algorithms: these allow UAVs to agree on parameters such as position, velocity, or direction, ensuring cohesive movement as a unit.

5. Leader-follower method: a designated leader UAV guides the formation, with the other UAVs following its path.

The essential idea behind the transition can be listed as the following steps:

1. Select control points at random

2. Create a grid and use TPS to interpolate values on this grid

3. Visualize the original control points and the interpolated surface.

A sample Python implementation might look like this:

import numpy as np
from scipy.interpolate import Rbf
import matplotlib.pyplot as plt

# Define the control points
x = np.random.rand(10) * 10
y = np.random.rand(10) * 10
z = np.sin(x) + np.cos(y)

# Create the TPS interpolator
tps = Rbf(x, y, z, function='thin_plate')

# Define a grid for interpolation
x_grid, y_grid = np.meshgrid(np.linspace(0, 10, 100), np.linspace(0, 10, 100))
z_grid = tps(x_grid, y_grid)

# Plot the original points and the TPS interpolated surface
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, color='red', label='Control Points')
ax.plot_surface(x_grid, y_grid, z_grid, cmap='viridis', alpha=0.6)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
ax.legend()
plt.show()

Reference: previous post

Monday, November 11, 2024

In infrastructure engineering, the control plane and the data plane serve different purposes, and engineers generally want to manage only entities that are finite, bounded, and open to management and monitoring. When the count far exceeds what can be managed, it is better to separate resources from data. For example, when several drones must be inventoried and managed for interaction with cloud services, it is not necessary to create a pseudo-resource representing each drone. Instead, a composite cloud resource representing a management object can be created for the fleet, and almost all of the drones can be kept as data in a corresponding database maintained by that object. Let us go deeper into this example of controlling UAV swarm movement via cloud resources.
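
A minimal sketch of that separation, with a local SQLite table standing in for the managed database and with all class, table, and column names being hypothetical:

import sqlite3

class DroneFleetResource:
    # One cloud-facing management object for the whole fleet; the
    # individual drones live as rows in a database, not as cloud resources.
    def __init__(self, fleet_id, db_path=":memory:"):
        self.fleet_id = fleet_id
        self.db = sqlite3.connect(db_path)
        self.db.execute("""CREATE TABLE IF NOT EXISTS drones (
            drone_id TEXT PRIMARY KEY, model TEXT, status TEXT,
            last_lat REAL, last_lon REAL, last_alt REAL)""")

    def register(self, drone_id, model):
        self.db.execute("INSERT OR REPLACE INTO drones VALUES (?,?,?,?,?,?)",
                        (drone_id, model, "idle", None, None, None))

    def report_telemetry(self, drone_id, lat, lon, alt, status):
        self.db.execute("""UPDATE drones SET status=?, last_lat=?,
                           last_lon=?, last_alt=? WHERE drone_id=?""",
                        (status, lat, lon, alt, drone_id))

    def inventory(self):
        return self.db.execute("SELECT drone_id, status FROM drones").fetchall()

# Example usage
fleet = DroneFleetResource("fleet-001")
fleet.register("uav-1", "quad-x")
fleet.report_telemetry("uav-1", 47.6, -122.3, 30.0, "in-flight")
print(fleet.inventory())

The cloud provider then sees exactly one resource per fleet, while membership and telemetry scale as ordinary rows.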

First, an overview of the control methods is necessary. There are several: leader-follower, virtual structure, behavior-based, consensus-based, artificial potential field, and advanced AI-based methods (such as artificial neural networks and deep reinforcement learning).

Each approach has advantages and limitations: conventional methods offer reliability and simplicity, while AI-based strategies provide adaptability and sophisticated optimization capabilities.

There is a critical need for innovative solutions and interdisciplinary approaches that combine conventional and AI methods to overcome existing challenges and fully exploit the potential of UAV swarms. The infrastructure and solution-accelerator stacks must therefore allow switching from one AI model to another, or even changing direction from one control strategy to another.

Real-world applications for UAV swarms are not only present; they are the future. This case study is therefore justified by wide-ranging applications across fields such as military affairs, agriculture, search and rescue operations, environmental monitoring, and delivery services.

Now, for a little more detail on the control methods, so that we can select one to leverage for a cloud representation. These control methods include:

Leader-Follower Method: This method involves a designated leader UAV guiding the formation, with other UAVs following its path. It's simple and effective but can be limited by the leader's capabilities.

Virtual Structure Method: UAVs maintain relative positions to a virtual structure, which moves according to the desired formation. This method is flexible but requires precise control algorithms.

Behavior-Based Method: UAVs follow simple rules based on their interactions with neighboring UAVs, mimicking natural swarm behaviors. This method is robust but can be unpredictable in complex scenarios.

Consensus-Based Method: UAVs communicate and reach a consensus on their positions to form the desired shape. This method is reliable and scalable but can be slow in large swarms.

Artificial Potential Field Method: UAVs are guided by virtual forces that attract them to the desired formation and repel them from obstacles. This method is intuitive but can suffer from local minima issues (a sketch follows this list).

Artificial Neural Networks (ANN): ANN-based methods use machine learning to adaptively control UAV formations. These methods are highly adaptable but require significant computational resources.

Deep Reinforcement Learning (DRL): DRL-based methods use advanced AI techniques to optimize UAV swarm control. These methods are highly sophisticated and can handle complex environments but are computationally intensive.
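
To make one of these concrete, here is a minimal, hypothetical sketch of the artificial potential field method mentioned above; the gains, radius, and step limit are made-up tuning values, and the local-minima weakness noted above shows up whenever attraction and repulsion cancel:

import numpy as np

def potential_field_step(pos, goal, obstacles,
                         k_att=0.5, k_rep=50.0, r0=5.0, dt=0.1, max_step=0.5):
    # One integration step: attraction toward the goal plus
    # repulsion from every obstacle within radius r0.
    force = k_att * (goal - pos)
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 1e-6 < d < r0:
            force += k_rep * (1.0 / d - 1.0 / r0) / d**2 * (pos - obs) / d
    step = dt * force
    n = np.linalg.norm(step)
    if n > max_step:               # clip the step to keep the simulation stable
        step *= max_step / n
    return pos + step

pos = np.array([0.0, 0.0, 10.0])
goal = np.array([20.0, 0.0, 10.0])
obstacles = [np.array([10.0, 0.5, 10.0])]
for _ in range(500):
    pos = potential_field_step(pos, goal, obstacles)
print(pos)  # should approach the goal while skirting the obstacle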

Out of these, the virtual structure method inherently leverages both the drones' ability to find appropriate positions on the virtual structure and their ability to limit their movements while reaching their final position and orientation.

Some specific examples and details include:

Example 1: Circular Formation

Scenario: UAVs need to form a circular pattern.

Method: A virtual structure in the shape of a circle is defined. Each UAV maintains a fixed distance from this virtual circle, effectively forming a circular formation around it.

Advantages: This method is simple and intuitive, making it easy to implement and control.

Example 2: Line Formation

Scenario: UAVs need to form a straight line.

Method: A virtual structure in the shape of a line is defined. Each UAV maintains a fixed distance from this virtual line, forming a straight line formation.

Advantages: This method is effective for tasks requiring linear arrangements, such as search and rescue operations.

Example 3: Complex Shapes

Scenario: UAVs need to form complex shapes like a star or polygon.

Method: A virtual structure in the desired complex shape is defined. Each UAV maintains a fixed distance from this virtual structure, forming the complex shape.

Advantages: This method allows for the creation of intricate formations, useful in tasks requiring precise positioning.

Example 4: Dynamic Formation Changes

Scenario: UAVs need to change formations dynamically during a mission.

Method: The virtual structure is updated in real-time according to the mission requirements, and UAVs adjust their positions accordingly.

Advantages: This method provides flexibility and adaptability, essential for dynamic and unpredictable environments. A sketch of how target slots for these formations might be generated follows.
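
Here is a hedged sketch of how the virtual structures behind Examples 1 and 2 might be generated as target slots (the function names and parameters are illustrative, not a standard API); for Example 4, the same generators would simply be re-invoked with new parameters and the swarm re-assigned to the fresh slots:

import numpy as np

def circle_targets(n, center, radius, altitude):
    # Evenly spaced slots on a virtual circle (Example 1)
    a = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.stack([center[0] + radius * np.cos(a),
                     center[1] + radius * np.sin(a),
                     np.full(n, altitude)], axis=1)

def line_targets(n, start, spacing, altitude):
    # Evenly spaced slots on a virtual line (Example 2)
    i = np.arange(n)
    return np.stack([start[0] + i * spacing,
                     np.full(n, start[1]),
                     np.full(n, altitude)], axis=1)

print(circle_targets(6, (0, 0), 10.0, 25.0))
print(line_targets(6, (0, 0), 4.0, 25.0))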


Sunday, November 10, 2024

Chief executives are increasingly demanding that their technology investments, including data and AI, work harder and deliver more value to their organizations. Generative AI offers additional tools to achieve this, but it adds complexity to the challenge. CIOs must ensure their data infrastructure is robust enough to cope with the enormous data processing demands and governance challenges posed by these advances. Technology leaders see this challenge as an opportunity for AI to deliver considerable growth to their organizations, on both the top and bottom lines. While many business leaders have stated in public that it is especially important for AI projects to help reduce costs, they also say that it is important that these projects enable new revenue generation. Gartner forecasts worldwide IT spending to grow by 4.3% in 2023 and 8.8% in 2024, with most of that growth concentrated in the software category, which includes spending on data and AI. AI-driven efficiency gains promise business growth, with 81% expecting a gain greater than 25% and 33% believing it could exceed 50%. CDOs and CTOs echo this sentiment: “If we can automate our core processes with the help of self-learning algorithms, we’ll be able to move much faster and do more with the same amount of people,” and “Ultimately for us it will mean automation at scale and at speed.”

Organizations are increasingly prioritizing AI projects due to economic uncertainty and the increasing popularity of generative AI. Technology leaders are focusing on longer-range projects that will have significant impact on the company, rather than pursuing short-term projects. Given the rapid pace at which use cases and proofs of concept are coming at businesses, it is crucial to apply existing metrics and frameworks rather than create new ones. The key is to ensure that the right projects are prioritized, considering the expected business impact, complexity, and cost of scaling.

Data infrastructure and AI systems are becoming increasingly intertwined due to the enormous demands placed on data collection, processing, storage, and analysis. As financial services companies like Razorpay grow, their data infrastructure needs to be modernized to accommodate the growing volume of payments and the need for efficient storage. Advances in AI capabilities, such as generative AI, have increased the urgency to modernize legacy data architectures. Generative AI and the LLMs that support it will multiply workload demands on data systems and make tasks more complex. The implications of generative AI for data architecture include feeding unstructured data into models, storing long-term data in ways conducive to AI consumption, and putting adequate security around models. Organizations supporting LLMs need a flexible, scalable, and efficient data infrastructure. Many have claimed success with the adoption of Lakehouse architecture, combining features of data warehouse and data lake architecture. This architecture helps scale responsibly, with a good balance of cost versus performance. As one data leader observed “I can now organize my law enforcement data, I can organize my airline checkpoint data, I can organize my rail data, and I can organize my inspection data. And I can at the same time make correlations and glean understandings from all of that data, separate and together.”

Data silos are a significant challenge for data and technology executives, as they result from the disparate approaches taken by different parts of organizations to store and protect data. The proliferation of data, analytics, and AI systems has added complexity, resulting in a myriad of platforms, vast amounts of data duplication, and often separate governance models. Most organizations employ fewer than 10 data and AI systems, but the proliferation is most extensive in the largest ones. To simplify, organizations aim to consolidate the number of platforms they use and seamlessly connect data across the enterprise. Companies like Starbucks are centralizing data by building cloud-centric, domain-specific data hubs, while GM's data and analytics team is focusing on reusable technologies to simplify infrastructure and avoid duplication. Additionally, organizations need space to innovate, which can be achieved by having functions that manage data and those that involve greenfield exploration.

#codingexercise: CodingExercise-11-10-2024.docx


Saturday, November 9, 2024

One of the fundamentals of parallel processing in computer science is the separation of tasks per worker to reduce contention. When the worker is an autonomous drone with minimal coordination with other members of its fleet, an independent task might look like installing a set of solar panels, in an industry estimated at 239 GW of global solar-powered renewable energy in 2023. That estimate was a 45% increase over the previous year. As the industry expands, drones are employed for their speed. Drones aid in every stage of a plant’s lifecycle from planning to maintenance. They can assist in topographic surveys during planning, monitor construction progress, conduct commissioning inspections, and perform routine asset inspections for operations and maintenance. Drone data collection is not only comprehensive and expedited but also accurate.

During planning for solar panels, drones can conduct aerial surveys to assess topography, suitability, and potential obstacles; create accurate 3D maps to aid in designing and optimizing solar farm layouts; and analyze shading patterns to optimize panel placement and maximize energy production. During construction, drones provide visual updates on construction progress and track and manage the inventory of equipment, tools, and materials on-site. During maintenance, drones can perform close-up inspections of solar panels to identify defects, damage, or dirt buildup; monitor equipment for wear and tear; detect hot spots in panels with thermal imaging; identify and manage vegetation growth that might reduce the efficiency of solar panels; and enhance security by patrolling the perimeter and alerting to unauthorized access.

When drones become autonomous, these activities go to the next level. The dependency on human pilots has always been a limitation on the frequency of flights. Autonomous drones, on the other hand, boost efficiency, shorten fault detection times, and optimize outcomes during O&M site visits. Finally, they help increase the power output yield of solar farms. The sophistication of the drones in terms of hardware and software increases from remote-controlled drones to autonomous drones. Field engineers might suggest the selection of an appropriate drone, the position of docking stations, the payload such as a thermal camera, and other capabilities. A drone data platform that seamlessly facilitates data capture, ensures safe flight operations with minimal human intervention, prioritizes data security, and meets compliance requirements becomes essential at this stage. Finally, this platform must also support integration with third-party data processing and analytics applications and reporting stacks that publish various charts and graphs. As usual, a separation between data processing and data analytics helps, just as much as a unified layer for programmability and user interaction with API, SDK, UI, and CLI. While the platform can be sold separately as a product, leveraging a cloud-based SaaS service reduces the cost on the edge.

There is still another improvement possible: the formation of dynamic squadrons, consensus protocols, and distributed processing with hash stores. While existing applications already improve IoT data streaming at the edge and cloud processing via stream stores and analytics, with the simplicity of SQL-based querying and programmability, a cloud service that installs and operates a deployment stamp with a solution accelerator, as a citizen resource of a public cloud, helps bring in the best practices of storage engineering and data engineering while enabling businesses to stay focused.


Friday, November 8, 2024

This is a summary of the book “The Leader’s Guide to Managing Risk” by risk consultant K. Scott Griffith, published by HarperCollins Leadership in 2023. He served as American Airlines’ chief safety officer, and he draws on that experience to mitigate and manage risk. People are focused on success, and they set their minds to surviving and thriving despite risk. But when things go awry, risk management is the only thing that can prevent some consequences, provided we understand the underlying causes of bad outcomes. Sustained success over time demonstrates reliability, and risk management teaches reliability and resilience. Organizational reliability takes this to the next level with a combination of human and systemic factors. Global risk has the largest scope; it can be daunting, but it can be met. People and organizations can both achieve sustained success.

Reliability is essential for businesses to demonstrate their worth to shareholders and consumers. High performance must be sustainable over time; reliability is not a matter of perfect performance, since failure is inevitable. Leaders must manage risk based on solid science and understand that companies are groups of people working within systems.

Reliability theory teaches that events always emerge within a probability distribution, and leaders must pay attention to each level of operations to perceive and understand risks. They must also be aware of their personal risk tolerance, which is the level of risk they are willing to tolerate.

To manage risk effectively, leaders must develop "risk intelligence," which is the ability to perceive a risk's severity and potential harmful results. This intelligence is derived from their experience and history with their organization but is subject to psychological and cognitive limitations. Leaders must also be aware of their personal risk tolerance, which should be rational and fact-based.

A system's design determines its reliability, and leaders must understand how it works both when functioning properly and when it fails. Systems come in various forms, from large systems like transportation infrastructure to small personal ones. To predict and manage system performance, leaders must understand the influences that determine the results systems produce. Human reliability maintains system reliability, and systems should optimize core functions while heeding less important functions. Systems can become unreliable due to factors other than flaws or design limitations, such as lack of maintenance, user overload, environmental factors, or human incompetence.

System design is crucial for maintaining system functionality, resilience, and trust. Engineers and designers can create obstacles, redundancies, and recovery mechanisms. Organizations should prioritize training their leaders to manage human and operational risk factors. Human performance is predictable, and organizations can manage and minimize human failures by understanding how performance works and what motivates and shapes it. Inclusive, collaborative work encourages reliable performance.

Organizational reliability is achieved through a combination of human and systemic factors, including effective leadership and various internal and external factors. Leaders must manage risk by inspiring employees to work towards achieving the organization's goals, balancing competing priorities, and understanding the risk of error. They should ensure the company has reliable systems, appropriate resources, and effective working methods.

Leaders can predict human and organizational reliability by assessing future risks through methods like investigations, audits, inspections, employee reports, and predictive risk modeling. These methods have limitations and are subject to human interpretation bias. Predictive risk analysis helps leaders anticipate future risks and mitigate or prevent them, such as preventing accidents or errors in hospitals or transportation systems.

Global risks, such as climate change, can be mitigated by individuals and organizations. Nuclear capabilities can deter war, but cooperation among nations is necessary to address and mitigate such risks. Even parenting involves addressing risks, such as toxic household chemicals and car accidents.

Overall, understanding and managing these risks is crucial for maintaining organizational reliability.

Between 1995 and 2005, the aviation industry in the United States reduced fatal airline accidents by 78%. This improvement was not due to a single policy but rather the Commercial Aviation Safety Team, which examined various aviation risk dimensions. To manage risk optimally, organizations should follow a "Sequence of Reliability" framework, which starts with understanding the implications of risk and moves to managing systems, organizations, and employees. The best approach varies depending on the specific risk faced.


Thursday, November 7, 2024

This is a comparison of the technologies behind a driver platform and a drone fleet platform, in terms of their shared focus and their differences.

Tenet 1: Good infrastructure enables iterations.

Whether it is a modular algorithm or a microservice, one component might need to be replaced with another or used alongside it; good infrastructure enables their rapid development.

Tenet 2: a modular approach works best for delivering quality safety-critical AI systems and iterative improvements. Isolation is the key enabler of quality control, and nothing enables it better than modular design.

Tenet 3: spatial-temporal fusion increases robustness and decreases uncertainty. Whether on the ground or in the air, and irrespective of the capabilities of the unit, space and time provide holistic information.

Tenet 4: vision language models enable deep reasoning. The queries can be better articulated, and the responses have higher precision and recall.

The methodology for both involves:

Learn everything but never require perfection, because it is easy to add to the vector but difficult to be perfect about the selection.

Resiliency reigns where there are no single neural paths to failure. This is the equivalent of no single point of failure.

Multi-modality first, so that when multiple sensors fail or the data is incomplete, decisions can still be made.

Interpretability and partitioning, so that data can be acted on without requiring the whole state all the time.

Generalization so that the same model can operate in different contexts.

Surface scoping with bounded validations, which helps keep costs in check.

Now for the differences. The capabilities of the individual units aside, the platform treats them as assets in a catalog, and while some of them might be autonomous, the platform facilitates both federated and delegated decision support. The catalog empowers multi-dimensional organization just as MDMs do; the data store supports spatial and temporal data, vector data, and metadata filtering, both at rest and in transit. The portfolio of services available on the inventory is similar to that of an e-commerce store, with support for order processing from different points of sale.
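
A hedged sketch of one such catalog entry (every field name here is hypothetical) combining spatial-temporal state, a vector embedding, and filterable metadata:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AssetRecord:
    # Catalog entry for one fleet unit: where it was, when, an
    # embedding for vector search, and free-form metadata for filtering.
    asset_id: str
    kind: str                          # e.g. "uav" or "vehicle"
    position: List[float]              # (x, y, z) at the last report
    timestamp: float                   # epoch seconds of the last report
    embedding: List[float] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

catalog: Dict[str, AssetRecord] = {}
catalog["uav-7"] = AssetRecord("uav-7", "uav", [12.0, 4.5, 30.0], 1731000000.0,
                               metadata={"squadron": "A", "payload": "thermal"})

# MDM-style metadata filtering without touching the units themselves
thermal = [a.asset_id for a in catalog.values()
           if a.metadata.get("payload") == "thermal"]
print(thermal)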

Reference:

Context for this article: DroneInformaionManagement.docx

#codingexercise: CodingExercise-11-07-2024.docx


Wednesday, November 6, 2024

Find all k-distant indices in an array.

You are given a 0-indexed integer array nums and two integers key and k. A k-distant index is an index i of nums for which there exists at least one index j such that |i - j| <= k and nums[j] == key.

Return a list of all k-distant indices sorted in increasing order.

Example 1:

Input: nums = [3,4,9,1,3,9,5], key = 9, k = 1

Output: [1,2,3,4,5,6]

Explanation: Here, nums[2] == key and nums[5] == key.

- For index 0, |0 - 2| > k and |0 - 5| > k, so there is no j where |0 - j| <= k and nums[j] == key. Thus, 0 is not a k-distant index.

- For index 1, |1 - 2| <= k and nums[2] == key, so 1 is a k-distant index.

- For index 2, |2 - 2| <= k and nums[2] == key, so 2 is a k-distant index.

- For index 3, |3 - 2| <= k and nums[2] == key, so 3 is a k-distant index.

- For index 4, |4 - 5| <= k and nums[5] == key, so 4 is a k-distant index.

- For index 5, |5 - 5| <= k and nums[5] == key, so 5 is a k-distant index.

- For index 6, |6 - 5| <= k and nums[5] == key, so 6 is a k-distant index.

Thus, we return [1,2,3,4,5,6] which is sorted in increasing order.

Example 2:

Input: nums = [2,2,2,2,2], key = 2, k = 2

Output: [0,1,2,3,4]

Explanation: For all indices i in nums, there exists some index j such that |i - j| <= k and nums[j] == key, so every index is a k-distant index.

Hence, we return [0,1,2,3,4].

Constraints:

• 1 <= nums.length <= 1000

• 1 <= nums[i] <= 1000

• key is an integer from the array nums.

• 1 <= k <= nums.length

import java.util.ArrayList;
import java.util.List;

class Solution {
    public List<Integer> findKDistantIndices(int[] nums, int key, int k) {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < nums.length; i++) {
            if (nums[i] == key) {
                // every in-range index j with |i - j| <= k is a k-distant index
                for (int j = i - k; j <= i + k; j++) {
                    if (j >= 0 && j < nums.length && !indices.contains(j)) {
                        indices.add(j);
                    }
                }
            }
        }
        return indices;
    }
}

nums = [1,0], key = 1, k = 1 => [0,1]

nums = [1,0,1], key = 1, k = 1 => [0,1,2]

nums = [1,0,1,0,0,0,1], key = 1, k = 1 => [0,1,2,3,5,6]

nums = [1,0,1,0,0,0,1], key = 1, k = 2 => [0,1,2,3,4,5,6]


Tuesday, November 5, 2024

This is an extension of the use cases for a drone management software platform in the cloud, the DFCS.

Autonomous drone fleet movement has been discussed with the help of self-organizing maps, as implemented in https://github.com/raja0034/som4drones, but sometimes fleet movement must be controlled precisely, especially at high velocity, in large numbers, or at high density, when the potential for congestion or collision is high. For example, at high velocity, drones and missiles might fly similarly. A centralized algorithm to control their movement for the safe arrival of all units can be more convenient and cheaper for the units, provided there is continuous connectivity during flight. One such algorithm is the virtual structure method.

 The virtual structure method for UAV swarm movement is a control strategy where the swarm is organized into a predefined formation, often resembling a geometric shape. Instead of relying on a single leader UAV, the swarm is controlled as if it were a single rigid body or virtual structure. Each UAV maintains its position relative to this virtual structure, ensuring cohesive and coordinated movement. The steps include:

Virtual Structure Definition: A virtual structure, such as a line, triangle, or more complex shape, is defined. This structure acts as a reference for the UAVs' positions.

Relative Positioning: Each UAV maintains its position relative to the virtual structure, rather than following a specific leader UAV. This means that if one UAV moves, the others adjust their positions to maintain the formation.

Coordination and Control: The UAVs use local communication and control algorithms to ensure they stay in their designated positions within the virtual structure. This can involve adjusting speed, direction, and altitude based on the positions of neighboring UAVs.

Fault Tolerance: Since the control does not rely on a single leader, the swarm can be more resilient to failures. If one UAV fails, the others can still maintain the formation.

 A sample implementation where each UAV follows a leader to maintain a formation might appear as follows:

import numpy as np

class UAV:
    def __init__(self, position):
        self.position = np.array(position)

class Swarm:
    def __init__(self, leader_position, formation_pattern):
        self.leader = UAV(leader_position)
        self.formation_pattern = formation_pattern  # offsets from the leader
        self.uavs = [UAV(self.leader.position + offset) for offset in formation_pattern]

    def update_leader_position(self, new_position):
        self.leader.position = np.array(new_position)
        self.update_uav_positions()

    def update_uav_positions(self):
        # each UAV keeps its stored offset relative to the leader's new position
        for i, uav in enumerate(self.uavs):
            uav.position = self.leader.position + self.formation_pattern[i]

    def get_positions(self):
        return [uav.position for uav in self.uavs]

# Example usage
num_uavs = 5
leader_position = [0, 0, 0]  # initial leader position
formation_pattern = [np.array([i * 2, 0, 0]) for i in range(num_uavs)]  # line formation
swarm = Swarm(leader_position, formation_pattern)

# Update leader position
swarm.update_leader_position([5, 5, 5])

# Get updated positions of all UAVs
print("Updated UAV positions:")
for pos in swarm.get_positions():
    print(pos)



Sunday, November 3, 2024

 There are N points (numbered from 0 to N−1) on a plane. Each point is colored either red ('R') or green ('G'). The K-th point is located at coordinates (X[K], Y[K]) and its color is colors[K]. No point lies on coordinates (0, 0).

We want to draw a circle centered on coordinates (0, 0), such that the number of red points and green points inside the circle is equal. What is the maximum number of points that can lie inside such a circle? Note that it is always possible to draw a circle with no points inside.

Write a function that, given two arrays of integers X, Y and a string colors, returns an integer specifying the maximum number of points inside a circle containing an equal number of red points and green points.

Examples:

1. Given X = [4, 0, 2, −2], Y = [4, 1, 2, −3] and colors = "RGRR", your function should return 2. The circle contains points (0, 1) and (2, 2), but not points (−2, −3) and (4, 4).

class Solution {
    public int solution(int[] X, int[] Y, String colors) {
        // find the maximum squared distance from the origin
        double max = 0;
        int count = 0;
        for (int i = 0; i < X.length; i++) {
            double dist = X[i] * X[i] + Y[i] * Y[i];
            if (dist > max) {
                max = dist;
            }
        }
        // shrink a candidate radius and count reds and greens inside it
        for (double i = Math.sqrt(max) + 1; i > 0; i -= 0.1) {
            int r = 0;
            int g = 0;
            for (int j = 0; j < colors.length(); j++) {
                if (Math.sqrt(X[j] * X[j] + Y[j] * Y[j]) > i) {
                    continue; // point lies outside the candidate circle
                }
                if (colors.charAt(j) == 'R') {
                    r++;
                } else {
                    g++;
                }
            }
            if (r == g && r > 0) {
                count = Math.max(count, r * 2);
            }
        }
        return count;
    }
}

Compilation successful.

Example test: ([4, 0, 2, -2], [4, 1, 2, -3], 'RGRR')

OK

Example test: ([1, 1, -1, -1], [1, -1, 1, -1], 'RGRG')

OK

Example test: ([1, 0, 0], [0, 1, -1], 'GGR')

OK

Example test: ([5, -5, 5], [1, -1, -3], 'GRG')

OK

Example test: ([3000, -3000, 4100, -4100, -3000], [5000, -5000, 4100, -4100, 5000], 'RRGRG')

OK


Saturday, November 2, 2024

A previous article discussed ETL: its modernization, a new take on old issues and resolutions, and efficiency and scalability. This section discusses the bigger picture into which this fits.

In terms of infrastructure for data engineering projects, customers usually start with a roadmap that progressively builds a more mature data function. One approach to drawing this roadmap, which experts observe repeated across deployment stamps, involves building a data stack in distinct stages, with a stack for every phase of the journey. While needs, level of sophistication, maturity of solutions, and budget determine the shape these stacks take, the four phases are more or less distinct and repeated across these endeavors: starters, growth, machine learning, and real-time. Customers begin with a starter stack whose essential function is to collect the data, often by implementing a drain. A unified data layer at this stage significantly reduces engineering bottlenecks. The second stage is the growth stack, which solves the proliferation of data destinations and independent silos by centralizing data into a warehouse that also becomes a single source of truth for analytics. When this matures, customers want to move beyond historical analytics into predictive analytics. At this stage, a data lake and a machine learning toolset come in handy to leverage unstructured data and mitigate problems proactively. The next and final frontier overcomes a challenge of the current stack: that it is impossible to deliver personalized experiences in real time.

In this way, organizations solve the point-to-point integration problem by implementing a unified, event-based integration layer in the starter stack. Then, when the needs become a little more sophisticated and downstream teams (and management) must answer harder questions and act on all of the data, they centralize both clickstream data and relational data to build a full picture of the customer and their journey. After solving these challenges by implementing a cloud data warehouse as the single source of truth for all customer data, and then using reverse ETL pipelines to activate that data, organizations gear up for the next stage. As the business grows, optimization requires moving from historical analytics to predictive analytics, including incorporating unstructured data into the analysis. To accomplish that, organizations implement the ML stack, which includes a data lake (for unstructured data) and a basic machine learning toolset that can generate predictive outputs like churn scores. Finally, these outputs are put to use by sending them through the warehouse and reverse ETL pipelines, making them available as data points in downstream systems, including the CRM, for customer touchpoints.

#codingexercise: CodingExercise-11-02-2024.docx