Sunday, March 16, 2025

 Emerging trends:

Constructing an incremental “knowledge base” of a landscape from drone imagery merges ideas from simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and semantic segmentation. Incremental SLAM and 3D reconstruction is demonstrated in the ORB-SLAM2 paper by Mur-Artal and Tardos (2017), where a 3D map is built by estimating camera poses and reconstructing scene geometry from monocular, stereo, or RGB-D inputs. Such a SLAM framework can also be extended by fusing in semantic cues to enrich the resulting map with object and scene labels. The idea of including semantic information in 3D reconstruction is demonstrated by SemanticFusion (McCormac et al., ICRA 2017), where a convolutional neural network (CNN) performs semantic segmentation while the system fuses the resulting labels into a surfel-based 3D map, transforming a purely geometric reconstruction into a semantically rich representation of the scene. SemanticFusion labels parts of the scene, turning a raw point cloud or mesh into a knowledge base where objects, surfaces, and even relationships can be recognized and queried.

SfM, on the other hand, stitches multi-view data into a consistent 3D model, a technique particularly relevant to drone applications. Incremental SfM pipelines can populate information about a 3D space as data arrives in the pipeline, and the drones can “walk the grid” around an area of interest to make sure sufficient data is captured to build the 3D model from 0 to 100%; the progress can even be tracked. A semantic layer is not part of SfM processing itself, but semantic segmentation or object detection can be layered independently over the purely geometric data. Layering on additional modules for, say, object detection, region classification, or even reasoning over scene changes makes it possible to start with basic geometric layouts and optionally build toward a comprehensive knowledge base.
Algorithms that crunch this sensor data, whether images or LiDAR, must operate in real time rather than on periodic batch analysis. They can, however, be dedicated to specific domains such as urban monitoring, agricultural surveying, or environmental monitoring for additional context-specific knowledge.
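As a hedged sketch of how semantic labels might be fused over an incremental geometric map to form a queryable knowledge base: the class, method names, and voxel hashing below are hypothetical illustrations in the spirit of SemanticFusion's label fusion, not code from any of the cited systems.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accumulate per-observation semantic labels (e.g., from a
// CNN segmentation) into a voxelized map, then query the dominant label.
public class SemanticVoxelMap {
    // voxel key -> (label -> accumulated confidence)
    private final Map<Long, Map<String, Double>> voxels = new HashMap<>();
    private final double voxelSize;

    public SemanticVoxelMap(double voxelSize) { this.voxelSize = voxelSize; }

    // Pack the three voxel indices into one long key (21 bits each; a toy
    // hashing scheme that can collide for very large coordinates).
    private long key(double x, double y, double z) {
        long ix = (long) Math.floor(x / voxelSize);
        long iy = (long) Math.floor(y / voxelSize);
        long iz = (long) Math.floor(z / voxelSize);
        return ((ix & 0x1FFFFF) << 42) | ((iy & 0x1FFFFF) << 21) | (iz & 0x1FFFFF);
    }

    // Incrementally fuse one labeled observation into the map.
    public void fuse(double x, double y, double z, String label, double confidence) {
        voxels.computeIfAbsent(key(x, y, z), k -> new HashMap<>())
              .merge(label, confidence, Double::sum);
    }

    // Query the map like a knowledge base: most confident label at a position.
    public String labelAt(double x, double y, double z) {
        Map<String, Double> m = voxels.get(key(x, y, z));
        if (m == null) return "unknown";
        return m.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    public static void main(String[] args) {
        SemanticVoxelMap map = new SemanticVoxelMap(0.5);
        map.fuse(1.1, 2.0, 0.1, "road", 0.6);
        map.fuse(1.2, 2.1, 0.2, "road", 0.7);
        map.fuse(1.2, 2.1, 0.2, "car", 0.3);
        System.out.println(map.labelAt(1.15, 2.05, 0.15)); // road
    }
}
```

Repeated observations of the same voxel strengthen its label, which is the incremental behavior the entry describes.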


Saturday, March 15, 2025

 Count the number of array slices in an array of random integers such that the difference between the min and the max values in the slice is <= k

public static int getCountOfSlicesWithMinMaxDiffLEk(int[] A, int k) {
    int N = A.length;
    // Monotonic queues (backed by arrays) for the sliding-window max and min,
    // with the positions of each stored element.
    int[] maxQ = new int[N + 1];
    int[] posmaxQ = new int[N + 1];
    int[] minQ = new int[N + 1];
    int[] posminQ = new int[N + 1];
    int firstMax = 0, lastMax = -1;
    int firstMin = 0, lastMin = -1;
    int j = 0;
    long result = 0; // long so the cap below works without int overflow
    for (int i = 0; i < N; i++) {
        // Extend the window [i, j) for as long as max - min <= k.
        while (j < N) {
            // Keep maxQ decreasing: drop tail entries <= A[j].
            while (lastMax >= firstMax && maxQ[lastMax] <= A[j]) {
                lastMax -= 1;
            }
            lastMax += 1;
            maxQ[lastMax] = A[j];
            posmaxQ[lastMax] = j;
            // Keep minQ increasing: drop tail entries >= A[j].
            while (lastMin >= firstMin && minQ[lastMin] >= A[j]) {
                lastMin -= 1;
            }
            lastMin += 1;
            minQ[lastMin] = A[j];
            posminQ[lastMin] = j;
            if (maxQ[firstMax] - minQ[firstMin] <= k) {
                j += 1;
            } else {
                break;
            }
        }
        System.out.println("result=" + result + " i=" + i + " j=" + j);
        result += (j - i); // every slice starting at i and ending before j qualifies
        if (result >= Integer.MAX_VALUE) {
            result = Integer.MAX_VALUE;
        }
        // Evict queue fronts that leave the window as i advances.
        if (lastMin >= firstMin && posminQ[firstMin] == i) {
            firstMin += 1;
        }
        if (lastMax >= firstMax && posmaxQ[firstMax] == i) {
            firstMax += 1;
        }
    }
    return (int) result;
}

Example: A = [3, 5, 7, 6, 3], k = 2

result=0 i=0 j=2

result=2 i=1 j=4

result=5 i=2 j=4

result=7 i=3 j=4

result=8 i=4 j=5

Returns 9

#Paper95: https://1drv.ms/w/c/d609fb70e39b65c8/EbV65i2MZFtLj1P7JUDLiCMBIyiNuEfc1O1kx47__m_bgg?e=E5lNAk



Friday, March 14, 2025

 This is a summary of the book “Be the Unicorn” written by William Vanderbloemen and published by Harper Collins in 2023. It talks about 12 data-driven habits that separate the best leaders from the rest, to the point that they stand out as unicorns. These soft skills are presented with intriguing, teachable case studies and include:

  1. Move quickly to seize opportunities

  2. To get ahead, be authentic

  3. Build the agility to welcome new opportunities

  4. Become a problem solver

  5. Learn to anticipate the future

  6. Be over-prepared

  7. Develop self-awareness

  8. Stay curious

  9. Establish a network of connections

  10. Learn to be likable

  11. Focus to maximize productivity

  12. Live a purpose-driven life

The author is the founder of an executive search firm, and his list of 12 soft skills comes out of his experience working with others. Unicorns are distinctive, admired, and respected individuals who stand out as the spark that lights up a room. To become a unicorn, one must master these talents.

Moving quickly to seize opportunities is crucial, as it allows one to stand out from peers and become irreplaceable. Responding to challenges and opportunities with alacrity is essential. Being authentic is crucial in the search engine era, as it helps build trusting relationships with others. 

Building agility is essential to welcome new opportunities. Ursula Burns, a chemical engineering graduate who rose to CEO of Xerox, exemplifies the value of agility: she was adaptable and willing to accept change for what it was.

When individuals stretch their mindset and horizons with these soft skills, they will become natural with their newfound abilities. 

Problem-solving is essential for success in any business. Companies will pay for and retain solutions to problems, as they are everywhere. Kevin Plank, a college football player, started Under Armour to solve athletes' sweating problem by researching synthetic fabrics. To position yourself advantageously, focus on understanding the problem and developing a profitable plan. 

Anticipate the future by understanding the environment and making predictions. Marc Benioff co-founded Salesforce, an online business and sales-management system that customers can use without purchasing hardware or software. Understanding the environment is crucial for making predictions and staying ahead of competitors.

Being over-prepared is also essential for success. John Wooden, the UCLA basketball coach, over-prepared his players down to preventing blisters, leading to 10 national championships. Leaders face constant shocks; the over-prepared are better equipped for crises, while the under-prepared are never ready for one. Over-preparation builds agility and makes success more likely in any situation.

To achieve success in life, one must develop self-awareness, stay curious, establish a network of connections, and learn to be likable. Self-awareness involves being honest with oneself and understanding one's strengths and weaknesses. This leads to increased creativity, effectiveness in jobs, better relationships, leadership, and promotion. To increase self-awareness, practice humility, be patient, and trust others. 

Curiosity is a key factor in success, as it drives creativity and knowledge. To increase curiosity, ask questions and listen carefully to the answers. Networking with people in your field helps you learn about job openings and become a favored applicant for promotions.

Likeability is a key characteristic of successful job applicants, as it trumps competency. To achieve likability, talk less and listen more, building empathy and showing people that you put others first. By doing so, people will be drawn to you, and you will contribute to their success.

Richard Branson, a successful entrepreneur, exemplifies the importance of focus and organization in maximizing productivity. Technology allows us to work anytime, anywhere, and with anyone, but it can make it difficult to focus on tasks. To maximize productivity, measure your own and your company's productivity by noting important goals achieved during specific work periods. Live a purpose-driven life, like Reshma Saujani, who founded Girls Who Code, by identifying your "why" and finding a workplace where you can address it meaningfully every day. As Canadian Prime Minister Justin Trudeau said, "The pace of change has never been this fast, yet it will never be this slow again." Nurturing these leadership skills, as enduring as "The Lady and the Unicorn" tapestries, can lead to a strong, unique, and victorious life.


Thursday, March 13, 2025

 GitOps is as much a part of Azure infrastructure engineering as anything native to the public cloud. The convenience that git repositories and associated workflows provide is not specific to the public clouds and can span sovereign clouds and on-premises environments. In this regard, a few methods need to be called out.

Assuming a workflow file has been authored in the .github/workflows folder of a repository, it acts very much like an automation script that can be shared and re-used in different workflows, and triggering it is not restricted to GitHub's own interface. All you need is a personal access token. For example,

curl -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: token YOUR_PERSONAL_ACCESS_TOKEN" \
  https://api.github.com/repos/OWNER/REPO/actions/workflows/WORKFLOW_ID/dispatches \
  -d '{"ref":"main"}'

The advantages of GitOps include auditing similar to that of the public cloud: every run of the workflow is recorded, including who triggered it and when. Secrets and variables allow the workflow to be parameterized, and this can be done with another request just prior to the run. For example,

curl -X PUT \
  -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: token YOUR_PERSONAL_ACCESS_TOKEN" \
  https://api.github.com/repos/OWNER/REPO/actions/secrets/SECRET_NAME \
  -d '{"encrypted_value": "NEW_ENCRYPTED_VALUE", "key_id": "KEY_ID"}'

This helps particularly when the owner or owning organization of the repository has policies in place requiring any change to the repository's files to go through pull requests with manual approvals.

GitOps, therefore, provides version tracking and file-sharing convenience that can be packaged to run with dedicated accounts having fine-grained access to only the resources the workflow file is meant to act upon.

A user interface, or any floating UI component in a portal, can also use GitOps as a backend instead of acting on the actual resource, thereby providing convenience in how an automation is run.
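As an illustration, the same workflow_dispatch call can be issued from application code rather than curl. The sketch below builds the request with Java's built-in java.net.http types; OWNER, REPO, WORKFLOW_ID, and the token remain placeholders exactly as in the curl example.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build the GitHub workflow_dispatch request from Java.
// Sending it with HttpClient.newHttpClient().send(req, ...) would trigger the
// run; a successful dispatch returns HTTP 204 with an empty body.
public class WorkflowDispatcher {
    public static HttpRequest buildDispatchRequest(String owner, String repo,
                                                   String workflowId, String token,
                                                   String ref) {
        String url = "https://api.github.com/repos/" + owner + "/" + repo
                + "/actions/workflows/" + workflowId + "/dispatches";
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/vnd.github.v3+json")
                .header("Authorization", "token " + token)
                .POST(HttpRequest.BodyPublishers.ofString("{\"ref\":\"" + ref + "\"}"))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildDispatchRequest("OWNER", "REPO", "WORKFLOW_ID",
                "YOUR_PERSONAL_ACCESS_TOKEN", "main");
        // Print rather than send, since the identifiers above are placeholders.
        System.out.println(req.method() + " " + req.uri());
    }
}
```

A UI component could call buildDispatchRequest behind a button, which is the "GitOps as a backend" pattern described above.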

Reference: Previous articles (IaCResolutionsPart261.docx: https://1drv.ms/w/c/d609fb70e39b65c8/EZtfWh6GSp5ElYh8itkjwBkBXexIdT-xGnqwQqcrQZk-cQ?e=DwZ2BI).


Wednesday, March 12, 2025

 

Location queries

Location is a data type. It can be represented either as a point or a polygon, and each helps with answering questions such as finding the top three stores nearest a geographic point, or the stores within a region. Since it is a data type, some standardization is available. SQL Server defines not one but two data types for specifying location: the geography data type and the geometry data type. The geography data type stores ellipsoidal data such as GPS latitude and longitude, while the geometry data type stores data in a Euclidean (flat) coordinate system. The point and the polygon are examples of the geography data type. Both the geography and the geometry data types must reference a spatial reference system, and since there are many such systems, each instance must be associated with a specific one. This is done with a parameter called the Spatial Reference Identifier, or SRID for short. SRID 4326 is the well-known GPS coordinate system that gives information in the form of latitude/longitude. Translating an address to a latitude/longitude/SRID tuple is supported by built-in functions that drill down progressively from the overall coordinate span.

A table such as ZipCode could have an identifier, code, state, boundary, and center point with the help of these two data types. The boundary is the polygon formed by the zip, and the center point is the central location within it. Distances between stores and their membership in a zip can be calculated from this center point. The geography data type also lets us perform clustering analytics, which answers questions such as the number of stores or restaurants satisfying a certain spatial condition and/or matching certain attributes. These are implemented using R-Tree data structures that support such clustering techniques. The geometry data type supports operations such as area and distance because it translates to coordinates.

The operations performed with these data types include the distance between two geography objects, a method to determine a range around a point (such as a buffer or margin), and the intersection of two geographic locations. Other methods supported by these data types include contains, overlaps, touches, and within.
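A hedged sketch of the nearest-stores query from application code follows. The dbo.Stores table, its columns, and the connection details are hypothetical; geography::Point and STDistance are SQL Server built-ins, and SRID 4326 is the GPS coordinate system discussed above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch (hypothetical schema dbo.Stores(StoreId, Name, GeoLocation geography)):
// top 3 stores nearest a latitude/longitude point, ordered by distance in meters.
public class NearestStores {
    public static final String QUERY =
        "SELECT TOP 3 StoreId, Name, " +
        "GeoLocation.STDistance(geography::Point(?, ?, 4326)) AS Meters " +
        "FROM dbo.Stores ORDER BY Meters";

    public static void printNearest(Connection conn, double lat, double lon)
            throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setDouble(1, lat);  // geography::Point takes (Lat, Long, SRID)
            ps.setDouble(2, lon);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("Name") + " "
                            + rs.getDouble("Meters") + " m");
                }
            }
        }
    }

    public static void main(String[] args) {
        // No live database here; show the statement that would be issued.
        System.out.println(QUERY);
    }
}
```

Against a real connection, printNearest(conn, 47.6062, -122.3321) would list the three closest stores to that point.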

A note about the use of these data types now follows. One approach is to store the coordinates in a separate table whose primary key is the pair of latitude and longitude, declared unique so that a pair does not repeat. Such an approach is questionable because the uniqueness constraint for locations has a maintenance overhead. For example, two locations could refer to the same point, and then unreferenced rows might need to be cleaned up. Locations also change ownership: store A could take over a location previously owned by store B while B never updates its record. Moreover, stores can undergo renames or conversions. Thus, it may be better to keep the spatial data, in a repeatable way, alongside the information about the location. Also, these data types do not participate in set operations. That is easy to do with collections and enumerables in the programming language of choice, and usually consists of four steps: initialization of the answer, accumulation called for each row, merge called when combining the results of parallel workers, and returning the answer on termination. These steps resemble a map-reduce algorithm.

These data types and operations are improved with the help of a spatial index. Spatial indexes continue to be like indexes on other data types and are stored using B-Trees. Since a B-Tree is an ordinary one-dimensional index, the two-dimensional spatial data is reduced in dimension by means of tessellation, which divides the area into small subareas and records the subareas that intersect each spatial instance. For example, with the geography data type, the entire globe is divided into hemispheres and each hemisphere is projected onto a plane. When a given geography instance covers one or more subsections, or tiles, the spatial index has an entry for each such tile that is covered.
The geometry data type has its own rectangular coordinate system that you define and can use to specify the boundaries, or ‘bounding box’, that the spatial index covers. Visualizers support overlays with spatial data, which is popular with mapping applications that superimpose information over the map with the help of transparent layers. An example is Azure Maps with GeoFence as described here1.
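The tessellation idea can be sketched independently of any database. The toy grid class below (all names hypothetical, not SQL Server's actual tessellation scheme) shows how a bounding box divided into tiles reduces two-dimensional extents to one-dimensional tile keys that an ordinary B-Tree can index.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: divide a bounding box into an n x n grid of tiles and map points and
// rectangles to the tile keys they touch, the way a spatial index records the
// subareas intersecting each spatial instance.
public class SpatialTessellation {
    private final double minX, minY, maxX, maxY;
    private final int n; // the grid is n x n tiles

    public SpatialTessellation(double minX, double minY,
                               double maxX, double maxY, int n) {
        this.minX = minX; this.minY = minY;
        this.maxX = maxX; this.maxY = maxY;
        this.n = n;
    }

    private int tileX(double x) {
        int t = (int) Math.floor((x - minX) / (maxX - minX) * n);
        return Math.min(n - 1, Math.max(0, t)); // clamp edges into the grid
    }

    private int tileY(double y) {
        int t = (int) Math.floor((y - minY) / (maxY - minY) * n);
        return Math.min(n - 1, Math.max(0, t));
    }

    // One-dimensional key for a point: this is the dimension reduction.
    public long tileKey(double x, double y) {
        return (long) tileY(y) * n + tileX(x);
    }

    // All tile keys an axis-aligned bounding box intersects: one entry each.
    public List<Long> tilesIntersecting(double x0, double y0,
                                        double x1, double y1) {
        List<Long> keys = new ArrayList<>();
        for (int ty = tileY(y0); ty <= tileY(y1); ty++)
            for (int tx = tileX(x0); tx <= tileX(x1); tx++)
                keys.add((long) ty * n + tx);
        return keys;
    }

    public static void main(String[] args) {
        SpatialTessellation grid = new SpatialTessellation(0, 0, 100, 100, 10);
        System.out.println(grid.tileKey(12, 18));                   // 11
        System.out.println(grid.tilesIntersecting(12, 12, 27, 18)); // [11, 12]
    }
}
```

A rectangle spanning two tiles yields two index entries, mirroring how a geography instance covering several tiles gets one spatial index entry per tile.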

References:

1.      Azure Maps and GeoFence: https://1drv.ms/w/s!Ashlm-Nw-wnWhKoMrJB6VrX06DHN-g?e=dWJIgv


Monday, March 10, 2025

 This is a different approach from the generic one described in an earlier article1 for UAV swarm flight path management in uncharted territory using waypoint selection and trajectory smoothing. There are two key differentiating characteristics of UAV swarms that can provide additional insights. One is that the UAV swarm is not necessarily a one-follows-another sequence through waypoints and trajectories. In fact, the formation is dynamic, and the center of the UAV swarm has its own position, altitude, velocity, and distance from references, distinct from those of the leading and trailing units. Another is that the units themselves are capable of carrying a variety of sensors, such as LiDAR, whose data can be stored and queried in the cloud to build a knowledge base of the landscape via data processing systems similar to the image-capturing ones discussed earlier2. The commercial cloud-software-based solution serving UAV swarms3 does not limit the types and capabilities of the drones or the storing and querying of the sensor data. The generic approach treated each unit of the UAV swarm as a coordinate, but the approach outlined below makes use of both these factors.

The center of the UAV swarm, regardless of its formation and changes to formation when following trajectories through waypoints, can be tracked in addition to each unit of the swarm. Scatter plots of position vs. time, altitude vs. time, speed vs. time, distance from a reference point vs. time, inter-UAV distance vs. time, and velocity components vs. time are elongated, which lends itself to correlation. Therefore, regression can be used to predict various aspects of the flight path of the center of the UAV swarm. Regression is very useful for calculating a linear relationship between a dependent and an independent variable and then using that relationship for prediction. This technique is suitable for specific aspects of the flight path of the center of the swarm, which does not always have to be the same drone. One advantage of this family of algorithms is that it is highly flexible, taking many kinds of input, and supports several different analytical tasks. Use of MicrosoftML's rxFastTrees is recommended for this purpose.
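As a minimal sketch of the regression idea (a from-scratch illustration, not the rxFastTrees implementation), a closed-form least-squares fit of center position against time might look like the following, with the sample data made up for demonstration.

```java
// Sketch: fit y = slope * t + intercept by ordinary least squares and use it
// to predict the swarm center's position at a future time.
public class CenterTrack {
    // Returns { slope, intercept } minimizing the sum of squared errors.
    public static double[] fitLine(double[] t, double[] y) {
        int n = t.length;
        double st = 0, sy = 0, stt = 0, sty = 0;
        for (int i = 0; i < n; i++) {
            st += t[i]; sy += y[i];
            stt += t[i] * t[i]; sty += t[i] * y[i];
        }
        double slope = (n * sty - st * sy) / (n * stt - st * st);
        double intercept = (sy - slope * st) / n;
        return new double[] { slope, intercept };
    }

    public static double predict(double[] model, double t) {
        return model[0] * t + model[1];
    }

    public static void main(String[] args) {
        // Hypothetical center-of-swarm northing (meters) sampled once per second.
        double[] t = { 0, 1, 2, 3, 4 };
        double[] y = { 100.0, 102.1, 103.9, 106.0, 108.1 };
        double[] model = fitLine(t, y);
        System.out.println("predicted position at t=5: " + predict(model, 5));
    }
}
```

The elongated scatter plots described above are exactly the regime where such a fit has predictive value; the same template applies to altitude, speed, or inter-UAV distance against time.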

Gradient boosting in rxFastTrees works with several loss functions, including squared loss. The algorithm for least-squares boosting can be written as:

1. Set the initial approximation.

2. For a set of successive increments, or boosts, each based on the preceding iterations, do:

3. Calculate the new residuals.

4. Find the line of search by aggregating and minimizing the residuals.

5. Perform the boost along the line of search.

6. Repeat steps 3, 4, and 5 for each boost in step 2.
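The steps above can be sketched as a tiny least-squares booster. The single-split "stump" used as the line of search below is an illustrative stand-in for the full regression trees that rxFastTrees fits; all names are hypothetical.

```java
import java.util.Arrays;

// Sketch of least-squares gradient boosting: start from the mean (step 1),
// then repeatedly fit a stump to the residuals and take a shrunken step.
public class LeastSquaresBoosting {
    // Step 4 helper: best stump { threshold, leftMean, rightMean } minimizing
    // squared error against the residuals r.
    static double[] bestStump(double[] x, double[] r) {
        double[] xs = x.clone();
        Arrays.sort(xs);
        double bestErr = Double.POSITIVE_INFINITY;
        double[] best = { xs[0], 0, 0 };
        for (int i = 0; i + 1 < xs.length; i++) {
            double thr = (xs[i] + xs[i + 1]) / 2;
            double sl = 0, sr = 0;
            int nl = 0, nr = 0;
            for (int j = 0; j < x.length; j++) {
                if (x[j] <= thr) { sl += r[j]; nl++; } else { sr += r[j]; nr++; }
            }
            double ml = nl > 0 ? sl / nl : 0, mr = nr > 0 ? sr / nr : 0;
            double err = 0;
            for (int j = 0; j < x.length; j++) {
                double p = x[j] <= thr ? ml : mr;
                err += (r[j] - p) * (r[j] - p);
            }
            if (err < bestErr) { bestErr = err; best = new double[] { thr, ml, mr }; }
        }
        return best;
    }

    static double stumpValue(double[] s, double x) { return x <= s[0] ? s[1] : s[2]; }

    public static double[] fitAndPredict(double[] x, double[] y, double[] query,
                                         int boosts, double learningRate) {
        double init = 0;
        for (double v : y) init += v;
        init /= y.length;                          // step 1: initial approximation
        double[] pred = new double[y.length];
        Arrays.fill(pred, init);
        double[] out = new double[query.length];
        Arrays.fill(out, init);
        for (int m = 0; m < boosts; m++) {         // step 2: successive boosts
            double[] r = new double[y.length];     // step 3: new residuals
            for (int i = 0; i < y.length; i++) r[i] = y[i] - pred[i];
            double[] s = bestStump(x, r);          // step 4: line of search
            for (int i = 0; i < y.length; i++)     // step 5: boost along it
                pred[i] += learningRate * stumpValue(s, x[i]);
            for (int q = 0; q < query.length; q++)
                out[q] += learningRate * stumpValue(s, query[q]);
        }                                          // step 6: repeat
        return out;
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 1, 1, 3, 3 };
        double[] q = fitAndPredict(x, y, new double[] { 1.5, 3.5 }, 50, 0.5);
        System.out.println(Arrays.toString(q)); // approaches [1.0, 3.0]
    }
}
```

Each boost shrinks the residuals, so the predictions converge toward the targets, which is the behavior the numbered steps describe.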


Sunday, March 9, 2025

 The ability to store and query LiDAR data exemplifies the extremes in storage and computing requirements. LiDAR uses laser light to measure distances and create highly accurate 3D maps. It captures millions of data points per second, providing detailed information about the environment around each unit of the UAV swarm. Selecting appropriate software for processing LiDAR data is crucial. Some cloud-based tools and frameworks include PDAL (Point Data Abstraction Library), PCL (Point Cloud Library), Open3D, Entwine, and TerraSolid. These tools can help build LiDAR processing pipelines. LiDAR data processing includes data ingestion, preprocessing, classification, feature extraction, and analysis and visualization. Each step requires significant computational resources, especially for large datasets. Cloud-based storage solutions offer scalability, flexibility, cost-effectiveness, accessibility, speed, and seamless collaboration, making them ideal for handling massive LiDAR datasets. Cloud storage providers also offer advanced security features and robust backup mechanisms.
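As a hedged sketch of one preprocessing step from such a pipeline, the code below implements voxel-grid downsampling from scratch: it thins a dense cloud by replacing all points in a voxel with their centroid. Libraries like PDAL and Open3D provide production-grade versions of this; the class here is purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: voxel-grid downsampling of a point cloud, a common preprocessing
// step to tame millions of points per second before classification.
public class VoxelDownsample {
    // points is an array of {x, y, z}; returns one centroid per occupied voxel.
    public static double[][] downsample(double[][] points, double voxel) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // key -> {sx, sy, sz, count}
        for (double[] p : points) {
            String key = Math.floor(p[0] / voxel) + ":"
                       + Math.floor(p[1] / voxel) + ":"
                       + Math.floor(p[2] / voxel);
            double[] a = acc.computeIfAbsent(key, k -> new double[4]);
            a[0] += p[0]; a[1] += p[1]; a[2] += p[2]; a[3] += 1;
        }
        double[][] out = new double[acc.size()][3];
        int i = 0;
        for (double[] a : acc.values()) {
            out[i][0] = a[0] / a[3];
            out[i][1] = a[1] / a[3];
            out[i][2] = a[2] / a[3];
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] cloud = {
            { 0.10, 0.10, 0.10 }, { 0.20, 0.20, 0.20 }, // same 1 m voxel
            { 5.00, 5.00, 5.00 }                        // a different voxel
        };
        double[][] thin = downsample(cloud, 1.0);
        System.out.println(thin.length + " points after downsampling"); // 2
    }
}
```

Coarser voxel sizes trade spatial detail for throughput, which is the storage-versus-compute tension this entry opens with.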

Over ten LiDAR companies are public in the US, with many key players in Europe and Asia. Competition among manufacturers is driving down prices, making LiDAR feasible for various markets. Standardization and regulation have not come about, leading to frustration with varied specifications. Open standard data formats are essential for flexibility and efficiency. Integrating multiple sensors adds calibration and synchronization challenges, and raw point cloud data is complex to interpret without expert help. Complexity in processing raw data remains a challenge for real-time applications due to the millions of data points captured per second. The amount of 3D data is rapidly increasing, complicating real-time interpretation. Most advancements in computer vision focus on 2D data, making 3D LiDAR processing complex. Real-time LiDAR applications need actionable insights rather than just raw data. Manufacturers focus on technical specifications, but practical applications require problem-solving insights. Integrating LiDAR into applications is challenging and can lead to costly mistakes. In fact, existing solutions favor a LiDAR-agnostic strategy to support a wide range of sensors. Software solutions that leverage a LiDAR software processor expedite real-time application development and insights. Comprehensive features from the software processor, and a RESTful API for easy integration, come in handy for automations and infrastructure deployments. LiDAR solutions are expected to provide state-of-the-art, anonymized spatial intelligence data for improved UAV operations.

Deep learning shows promising results for processing tasks associated with UAV-based image and LiDAR data, especially in improving classification, object detection, and semantic segmentation in remote sensing. There are challenges in using deep learning for such imagery due to the difficulty of acquiring sufficient labeled samples. Convolutional neural networks (CNNs) with object detection are the most common approach. This highlights the importance of real-time processing, domain adaptation, and few-shot learning as potentially emerging technologies. Deep learning architecture complexity must be reduced while maintaining accuracy, and quantization techniques can reduce memory requirements for deep learning models. Domain adaptation and transfer learning are essential for addressing UAV imagery, and generative adversarial networks (GANs) are a promising approach for aligning source and target domains. Attention mechanisms improve feature extraction in high-resolution remote sensing images, enhancing tasks like segmentation and object detection. Contrastive loss holds promise for improving model performance. As yet, certain photogrammetric processing steps, such as dense point cloud generation and orthomosaic creation, remain unexplored for UAV imagery. With more labeled datasets, deep learning networks can become more generalized for UAV swarms.
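As an illustration of the quantization point, a minimal symmetric int8 weight quantizer might look like the following sketch. It is not tied to any particular framework; real toolchains add per-channel scales and calibration, but the memory arithmetic (4 bytes per float down to 1 byte per weight) is the same.

```java
// Sketch: symmetric linear quantization w ≈ scale * q with q in [-127, 127],
// the kind of technique that reduces deep-learning memory footprints.
public class Int8Quantizer {
    // Quantize weights to int8; the single shared scale is written to scaleOut[0].
    public static byte[] quantize(float[] w, float[] scaleOut) {
        float maxAbs = 0f;
        for (float v : w) maxAbs = Math.max(maxAbs, Math.abs(v));
        float scale = maxAbs == 0f ? 1f : maxAbs / 127f; // guard all-zero input
        scaleOut[0] = scale;
        byte[] q = new byte[w.length];
        for (int i = 0; i < w.length; i++) q[i] = (byte) Math.round(w[i] / scale);
        return q;
    }

    public static float[] dequantize(byte[] q, float scale) {
        float[] w = new float[q.length];
        for (int i = 0; i < q.length; i++) w[i] = q[i] * scale;
        return w;
    }

    public static void main(String[] args) {
        float[] weights = { 0.50f, -0.25f, 0.10f, -1.00f };
        float[] scale = new float[1];
        byte[] q = quantize(weights, scale);
        float[] back = dequantize(q, scale[0]);
        // 4x smaller storage; per-weight rounding error is bounded by the scale.
        for (int i = 0; i < weights.length; i++)
            System.out.println(weights[i] + " -> " + back[i]);
    }
}
```

The rounding error per weight stays within half a quantization step, which is why accuracy can often be maintained while the model shrinks.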