Monday, April 27, 2026

 Azure Web App Logging

An Azure Web App can log in two broad ways: locally on the app host for quick troubleshooting, or externally through Azure Monitor diagnostic settings for longer-lived and downstream analytics use. The best choice depends on what you are optimizing for: speed and simplicity, or durability, integration, and centralized operations.

Logging options

Local logging writes logs to the App Service file system, where you can download them or access them over FTPS. This is the lightest-weight option for development and short investigations, and App Service supports an FTPS-only mode so you can avoid plain FTP. If you use file-system logging, a common optimization is to keep the retention period at 0 days and the size quota around 35 MB, so the quota caps the log footprint and you do not accumulate unnecessary storage or incur avoidable cost on the app resource.

Diagnostic settings send logs to a Storage account, an Event Hub, or a Log Analytics workspace. This is the better fit when you need centralized retention, querying, or forwarding to operational tools such as Splunk through Event Hub or another ingestion pipeline, but it can generate meaningful storage and ingestion volume depending on how verbose the selected log categories are.
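As a concrete illustration, a diagnostic setting is just a small resource attached to the web app. The following Java sketch creates one against the Azure Resource Manager REST API with java.net.http; the resource IDs, the bearer token, and the api-version are placeholders and assumptions to verify against the current Azure Monitor REST reference before use.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateDiagnosticSetting {
    public static void main(String[] args) throws Exception {
        // Placeholder identifiers; substitute real subscription, resource group, app, and workspace names.
        String webAppId = "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Web/sites/<app>";
        String workspaceId = "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<law>";
        String token = "<bearer token obtained elsewhere, e.g. via azure-identity>";
        String apiVersion = "2021-05-01-preview"; // assumption: confirm against the diagnosticSettings REST reference

        // Route only the categories that are actually needed (HTTP and console logs here).
        String body = """
            { "properties": {
                "workspaceId": "%s",
                "logs": [
                  { "category": "AppServiceHTTPLogs", "enabled": true },
                  { "category": "AppServiceConsoleLogs", "enabled": true }
                ] } }""".formatted(workspaceId);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://management.azure.com" + webAppId
                        + "/providers/Microsoft.Insights/diagnosticSettings/send-to-workspace?api-version=" + apiVersion))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}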

Practical trade-offs

Local file-system logging is usually faster to access and easier for developers because the logs sit close to the app and can be pulled immediately. The downside is that it is not designed for long-term retention or enterprise-scale observability, and the footprint should be kept intentionally small so it does not compete with the app for space or create unnecessary overhead.

Diagnostic settings are better for compliance, analytics, and cross-team access because they move data out of the app into durable Azure services. The trade-off is cost and volume: app logs, HTTP logs, and platform logs can grow quickly, and sending all categories to Storage or Event Hub increases both ingestion and downstream processing costs, especially if a SIEM such as Splunk also charges for indexed volume.

Blob storage option

Sending logs to Azure Blob Storage is often the middle ground between local-only logs and a full streaming pipeline. Compared with keeping logs on the app host, blob storage gives you better retention, easier central access, and stronger separation of duties; compared with Event Hub, it is simpler and usually cheaper for archive-style retention, but less suitable for real-time operational forwarding.

From a security perspective, blob storage is preferable when you want to restrict access with managed identities, RBAC, and private networking rather than exposing the app host file system or broadly granting FTPS access. In general, the more external the log destination, the better your control plane story becomes, but the more important it is to secure identities, network paths, and storage permissions.

Cost impact

When logging is turned on for all log types, the monthly cost increases in two places: the App Service side and the destination side. On the app side, local logging can consume file-system quota and operational overhead, while external logging can add Azure Monitor, Storage, Event Hub, and downstream SIEM costs; in practice, the biggest cost driver is usually log volume rather than the mere act of enabling logging.

A full “everything on” configuration can become expensive if verbose application logs, HTTP logs, and platform diagnostics are all emitted continuously. The right way to manage cost is to limit categories to what is actually needed, reduce verbosity in production, and set retention policies that match the business need instead of defaulting to indefinite collection.
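To make the volume point concrete, a back-of-the-envelope estimate shows how quickly gigabytes per day dominate the bill; every rate below is a hypothetical placeholder rather than actual Azure or SIEM pricing.

public class LogCostEstimate {
    public static void main(String[] args) {
        double gbPerDay = 5.0;                  // assumed log volume across all enabled categories
        double ingestPricePerGb = 2.50;         // hypothetical Log Analytics ingestion rate
        double retentionPricePerGbMonth = 0.10; // hypothetical retention rate beyond any free window
        double siemPricePerGb = 1.80;           // hypothetical downstream SIEM indexing rate
        int retainedMonths = 3;

        double monthlyGb = gbPerDay * 30;
        double ingestion = monthlyGb * ingestPricePerGb;
        double retention = monthlyGb * retainedMonths * retentionPricePerGbMonth;
        double siem = monthlyGb * siemPricePerGb;

        // Dropping a category or halving verbosity scales every term below linearly.
        System.out.printf("ingestion=%.2f retention=%.2f siem=%.2f total=%.2f per month%n",
                ingestion, retention, siem, ingestion + retention + siem);
    }
}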

Premium tier considerations

If the app service plan is upgraded to the lowest Premium tier, turning on logging through diagnostic settings is generally a better production pattern than relying only on local file logging. Premium gives more headroom for performance-sensitive workloads, but logging still adds CPU, I/O, and network overhead, especially if the destination is remote and every write must be exported out of the app path.

The main security concern is not the Premium tier itself, but the expanded data flow: logs may contain request paths, headers, identifiers, or exception details, so access to the destination must be tightly limited. The main performance concern is bursty log generation, which can increase latency if the app spends too much time serializing and exporting log data rather than serving requests.

Dev and ops access

A good pattern is to optimize for both developer and operational needs by splitting access modes. Developers can use local logs or near-real-time access for low-latency troubleshooting and faster iteration, while operations teams consume the same data centrally with read-only access, least privilege, and controlled retention in Storage, Event Hub, or a SIEM pipeline.

This reduces friction because developers get interactive access without waiting on a downstream pipeline, while operations gets governed, durable visibility with auditability and restricted permissions. In practice, that usually means keeping local logs small and temporary, and pushing only the logs needed for production observability into centralized destinations.

Recommendations

Azure’s general direction for App Service logging is to use local logs for short-lived troubleshooting, diagnostic settings for durable monitoring, and secure transport and access controls for anything beyond the app host. FTP access should be restricted to FTPS-only or disabled when not needed, detailed error pages should not be exposed to clients in production, and logging categories should be scoped narrowly to reduce cost and noise.

A popular policy posture is:

• Keep local file-system logs small, temporary, and developer-focused.

• Use diagnostic settings for production retention and centralized monitoring.

• Route only necessary categories to Storage or Event Hub.

• Restrict destination access with least privilege and private connectivity where possible.

• Treat log content as sensitive operational data and control retention accordingly.

Sunday, April 26, 2026

 Continued from previous article 


Some replicas are asynchronous by nature and are called observers. They do not participate in the in-sync replica set or become the partition leader, but they can restore availability to the partition and allow producers to produce data again. Connected clusters might span distinct geographic regions and usually involve linking between the clusters. Linking is an extension of the replica fetching protocol that is inherent to a single cluster. A link contains all the connection information necessary for the destination cluster to connect to the source cluster. A topic on the destination cluster that fetches data over the cluster link is called a mirror topic. A mirror topic may keep the same name or use a prefixed name, and it can sync configurations and consumer offsets as well as access control lists while remaining a byte-for-byte copy of the source.

Managed broker services extend the value delivered to the business beyond standalone broker deployments by automating cluster sizing, over-provisioning, failover design, and infrastructure management. They typically raise availability to a 99.99% uptime service-level agreement. Often they involve a replicator, a worker that executes a connector and its tasks to coordinate data streaming between source and destination broker clusters. A replicator has a source consumer that consumes records from the source cluster and passes them to the Connect framework. The Connect framework has a built-in producer that then produces these records to the destination cluster. It might also have dedicated clients to propagate overall metadata updates to the destination cluster.

In a geographically distributed replication for business continuity and disaster recovery, the primary region has the active cluster that the producers and consumers write to and read from, and the secondary region has read-only clusters with replicated topics for read-only consumers. It is also possible to configure two clusters to replicate to each other so that both of them have their own sets of producers and consumers, but even in these cases, the replicated topic on either side will only have read-only consumers. Fan-in and fan-out are other possible arrangements for such replication.

Disaster recovery almost always occurs with a failover from the primary active cluster to a secondary cluster. When disaster strikes, the maximum amount of data that can be lost after a recovery, usually measured in terms of time, is minimized by virtue of this replication; this is referred to as the Recovery Point Objective. The targeted duration until the service level is restored to the expectations of the business process is referred to as the Recovery Time Objective. The recovery brings the system back to operational mode. Cost, business requirements, use cases, and regulatory and compliance requirements mandate this replication, and the considerations made for the data in motion during replication often stand out as best practice for the overall solution.
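As a small illustration with hypothetical timestamps, the data-loss window and the outage duration after a failover can be checked against agreed RPO and RTO targets:

import java.time.Duration;
import java.time.Instant;

public class RecoveryObjectives {
    public static void main(String[] args) {
        Duration rpoTarget = Duration.ofMinutes(5);   // agreed Recovery Point Objective
        Duration rtoTarget = Duration.ofMinutes(30);  // agreed Recovery Time Objective

        // Hypothetical incident timeline.
        Instant lastReplicated = Instant.parse("2026-04-26T10:57:40Z"); // last event replicated to the secondary
        Instant failure = Instant.parse("2026-04-26T11:00:00Z");        // primary cluster lost
        Instant serviceRestored = Instant.parse("2026-04-26T11:22:00Z");// consumers reattached to the secondary

        Duration dataLoss = Duration.between(lastReplicated, failure);
        Duration outage = Duration.between(failure, serviceRestored);

        System.out.println("Data loss " + dataLoss + ", within RPO? " + (dataLoss.compareTo(rpoTarget) <= 0));
        System.out.println("Outage " + outage + ", within RTO? " + (outage.compareTo(rtoTarget) <= 0));
    }
}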

One of the toughest challenges in data engineering has been the diversity of stacks, platforms, products, and logic, to the detriment of smooth operations, business continuity, and disaster recovery. The problem stems from the dichotomy between assets and debt: when developers invest heavily in, say, a SQL edge product, they later find a greater debt in moving to an open-source stack because data operations proliferate and there is very little curating. That is why planning for all the Ops considerations is just as necessary at design time as the feature itself.


#codingexercise: CodingExercise-04-26-2026.docx

Saturday, April 25, 2026

 (Continued from previous article)

When these IoT resources are shared, the isolation model, the impact on scaling performance, state management, and security of the IoT resources become complex. Scaling resources helps meet the changing demand from the growing number of consumers and the increase in the amount of traffic. We might need to increase the capacity of the resources to maintain an acceptable performance rate. Scaling depends on the number of producers and consumers, payload size, partition count, egress request rate, and usage of IoT Hub capture, schema registry, and other advanced features. When additional IoT capacity is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient failures from requests. When the number of active users falls or there is a decrease in traffic, the IoT resources can be released to reduce costs. Data isolation depends on the scope of isolation. When the storage for IoT data is a relational database server, the IoT solution can still make use of IoT Hub. Varying levels and scopes of sharing of IoT resources demand simplicity from the architecture. Patterns such as the deployment stamp pattern, the IoT resource consolidation pattern, and the dedicated IoT resources pattern help to optimize operational cost and management with little or no impact on usage.

Edge computing relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain ordering between events and to support retries and dead-letter queues. The storage for the data must follow data partitioning guidance, where the partitions can be managed and accessed separately. Horizontal, vertical, and functional partitioning strategies must be suitably applied; the sketch below illustrates the horizontal case. In the analytics space, a typical scenario is to build solutions that integrate data from many IoT devices into a comprehensive data analysis architecture to improve and automate decision making.
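As a small sketch of the horizontal strategy, events can be routed to one of N partitions by hashing a device identifier so that each partition can be managed and accessed separately; the partition count and the identifier format here are assumptions.

public class DevicePartitioner {
    private final int partitionCount;

    public DevicePartitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    public int partitionFor(String deviceId) {
        // Math.floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(deviceId.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        DevicePartitioner partitioner = new DevicePartitioner(16);
        System.out.println(partitioner.partitionFor("sensor-000123"));
    }
}

Keyed routing like this also preserves per-device ordering when the broker guarantees ordering within a partition.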

Event Hubs, blob storage, and IoT hubs can collect data on the ingestion side, while the results are distributed after analysis via alerts and notifications, dynamic dashboarding, data warehousing, and storage/archival. The fan-out of data to different services is itself a value addition, but the ability to transform events into processed events also generates more possibilities for downstream usages, including reporting and visualizations.

One of the main considerations for data pipelines involving ingestion capabilities for IoT scale data is the business continuity and disaster recovery scenario. This is achieved with replication.  A broker stores messages in a topic which is a logical group of one or more partitions. The broker guarantees message ordering within a partition and provides a persistent log-based storage layer where the append-only logs inherently guarantee message ordering. By deploying brokers over more than one cluster, geo-replication is introduced to address disaster recovery strategies.

Each partition is associated with an append-only log, so messages appended to the log are ordered by time and carry important offsets: the first available offset in the log, the high watermark (the offset of the last message that was successfully written and committed to the log by the brokers), and the log end offset (where the most recent message was written), which can be ahead of the high watermark. When a broker goes down, durability and availability must be addressed with replicas. Each partition has multiple replicas that are evenly distributed, but one replica is elected as the leader and the rest are followers. The leader is where all the produce and consume requests go, and followers replicate the writes from the leader.
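These offsets can be read directly from the brokers with the standard Java consumer client; the sketch below assumes a reachable cluster at localhost:9092 and an existing topic named telemetry.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetInspector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "offset-inspector");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("telemetry").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            Map<TopicPartition, Long> first = consumer.beginningOffsets(partitions); // first available offset
            Map<TopicPartition, Long> end = consumer.endOffsets(partitions);         // end offset visible to consumers (the high watermark)
            for (TopicPartition tp : partitions) {
                System.out.printf("%s first=%d end=%d%n", tp, first.get(tp), end.get(tp));
            }
        }
    }
}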

A pull-based replication model is the norm for brokers, where dedicated fetcher threads periodically pull data between broker pairs. Each replica is a byte-for-byte copy of the others, which makes this replication offset preserving. The number of replicas is determined by the replication factor. The leader maintains a list called the in-sync replica (ISR) set, and messages are committed by the leader only after all replicas in the ISR set have replicated the message. Global availability demands that brokers are deployed in different deployment modes. Two popular deployment modes are 1) a single cluster stretched across multiple data centers or regions and 2) a federation of connected clusters.
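Replication factor and the ISR commit rule are ordinary topic and producer settings. A minimal sketch, assuming a three-broker cluster at localhost:9092 and a hypothetical topic name, creates a replicated topic and then produces with acks=all so each write waits for the in-sync replicas.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReplicatedTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(adminProps)) {
            // 6 partitions, replication factor 3, and at least 2 replicas in sync before a write commits.
            NewTopic topic = new NewTopic("telemetry-replicated", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("acks", "all"); // wait for all replicas in the ISR set
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("telemetry-replicated", "vehicle-42",
                    "{\"lat\":47.6,\"lon\":-122.3}")).get();
        }
    }
}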


Thursday, April 23, 2026

 Data in motion – IoT solution and data replication

The transition of data from edge sensors to the cloud is a data engineering pattern that does not always get a proper resolution with the boilerplate event-driven architectural design proposed by the public clouds, because much of the fine-tuning is left to the choice of the resources, event hubs, and infrastructure involved in the streaming of events. This article explores the design and data-in-motion considerations for an IoT solution, beginning with an introduction to the public cloud proposed design, the choices between products, and the considerations for the handling and tuning of distributed, real-time data streaming systems, with particular emphasis on data replication for business continuity and disaster recovery. A sample use case is continuous event processing for geospatial analytics in fleet management, where the data can include web logs from driverless vehicles.

Event Driven architecture consists of event producers and consumers. Event producers are those that generate a stream of events and event consumers are ones that listen for events. The right choice of architectural style plays a big role in the total cost of ownership for a solution involving events.

The scale out can be adjusted to suit the demands of the workload and the events can be responded to in real time. Producers and consumers are isolated from one another. IoT requires events to be ingested at very high volumes. The producer-consumer design has scope for a high degree of parallelism since the consumers are run independently and in parallel, but they are tightly coupled to the events. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.

Some of the benefits of this architecture include the following: The publishers and subscribers are decoupled. There are no point-to-point integrations. It's easy to add new consumers to the system. Consumers can respond to events immediately as they arrive. They are highly scalable and distributed. There are subsystems that have independent views of the event stream.

Some of the challenges faced with this architecture include the following: Event loss is tolerated by default, so guaranteed delivery poses a challenge, and IoT traffic mandates guaranteed delivery. Some scenarios also require events to be processed in exactly the order they arrive. Each consumer type typically runs in multiple instances for resiliency and scalability, which poses a challenge if the processing logic is not idempotent or the events must be processed in order.
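One common mitigation is to make each consumer idempotent by de-duplicating on an event identifier, so a redelivered event has no additional effect; the identifier format and the in-memory store below are assumptions for illustration.

import java.util.HashSet;
import java.util.Set;

public class IdempotentHandler {
    // In production this set would live in a durable store shared by consumer instances.
    private final Set<String> seenEventIds = new HashSet<>();

    public void handle(String eventId, Runnable effect) {
        if (seenEventIds.add(eventId)) {
            effect.run(); // first delivery: apply the effect; redelivery: silently skip
        }
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        handler.handle("evt-001", () -> System.out.println("processed evt-001"));
        handler.handle("evt-001", () -> System.out.println("processed evt-001")); // skipped
    }
}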

The benefits and the challenges suggest some of these best practices. Events should be lean and mean and not bloated. Services should share only IDs and/or a timestamp. Large data transfer between services is an antipattern. Loosely coupled event driven systems are best.
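For example, a lean event can carry just an identifier and a timestamp, and consumers that need more detail look the entity up by its ID; the record below is only an illustration of that shape.

import java.time.Instant;

public record VehicleEvent(String vehicleId, Instant observedAt) {
    public static void main(String[] args) {
        VehicleEvent event = new VehicleEvent("vehicle-42", Instant.now());
        // Consumers fetch the full vehicle record by ID instead of receiving a large payload on every event.
        System.out.println(event);
    }
}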

IoT Solutions can be proposed either with an event driven stack involving open-source technologies or via a dedicated and optimized storage product such as a relational engine that is geared towards edge computing. Either way capabilities to stream, process and analyze data are expected by modern IoT applications. IoT systems vary in flavor and size. Not all IoT systems have the same certifications or capabilities.


Wednesday, April 22, 2026

 Derived metrics in observability pipelines for Inflection signatures

If we assume an immovable, straight-down (nadir) camera with no pitch, yaw, roll, or zoom, the geometry of the problem simplifies in a way that is almost ideal for defining observability metrics. The drone’s motion is now the primary source of variation across frames: translation along straight edges, and a change in translation direction at corners. That means we can design metrics that are explicitly sensitive to changes in planar motion and scene displacement while being largely invariant to viewpoint distortions. Those metrics can be computed per frame or per short window, aggregated over time, and then reintroduced into the observability pipeline as custom events that act as “inflection hints” for downstream agents.

The starting point is to treat each frame as a node in a temporal sequence with associated observability features. With a nadir camera, the dominant effect of motion is a shift of the ground texture in the image plane. Along a straight edge, this shift is approximately constant in direction and magnitude (modulo speed variations), while at a corner, the direction of shift changes. We can capture this with a simple but powerful family of metrics based on inter-frame displacement.

For each pair of consecutive frames, we compute a dense or block-based optical flow field and summarize it into a mean flow vector and a dispersion measure. The mean flow magnitude reflects how fast the ground is moving under the camera; the mean flow direction reflects the direction of travel. The dispersion (e.g., standard deviation of flow vectors) reflects local inconsistencies due to parallax, moving objects, or noise. Over straight edges, we expect the mean flow direction to be stable and the dispersion to be relatively low and slowly varying. At corners, the mean direction will rotate over a short sequence of frames, and dispersion may spike as the motion field transitions. This gives us three basic observability metrics per frame or per window: average flow magnitude, average flow direction, and flow dispersion. These can be logged as metrics in the observability pipeline and then aggregated over sliding windows to produce higher-level signals: direction stability (e.g., variance of direction over the last N frames), magnitude stability, and dispersion anomalies.

Because the camera is fixed in orientation, we can also exploit frame differencing and spatial alignment more aggressively. For example, we can compute a global translational alignment between consecutive frames using phase correlation or template matching. The resulting translation vector is a robust proxy for the drone’s planar motion. Again, along straight edges, the translation vector’s direction is stable; at corners, it rotates. The
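A minimal Java sketch of those per-frame summaries, assuming the block flow vectors (dx, dy) have already been produced by an upstream optical-flow or phase-correlation step:

public class FlowFrameMetrics {

    // Returns {mean magnitude, mean direction in radians, dispersion} for one frame pair.
    public static double[] summarize(double[][] flow) {
        double sumX = 0, sumY = 0, sumMag = 0;
        for (double[] v : flow) {
            sumX += v[0];
            sumY += v[1];
            sumMag += Math.hypot(v[0], v[1]);
        }
        int n = flow.length;
        double meanX = sumX / n, meanY = sumY / n;
        double meanDirection = Math.atan2(meanY, meanX);
        // Dispersion: root-mean-square deviation of the flow vectors from the mean vector.
        double sumSq = 0;
        for (double[] v : flow) {
            double dx = v[0] - meanX, dy = v[1] - meanY;
            sumSq += dx * dx + dy * dy;
        }
        return new double[] { sumMag / n, meanDirection, Math.sqrt(sumSq / n) };
    }

    public static void main(String[] args) {
        // Hypothetical block flow for a frame pair captured along a straight edge.
        double[][] flow = { {2.0, 0.1}, {2.1, 0.0}, {1.9, -0.1}, {2.0, 0.05} };
        double[] m = summarize(flow);
        System.out.printf("magnitude=%.2f direction=%.2f rad dispersion=%.2f%n", m[0], m[1], m[2]);
    }
}

Aggregating the direction term over a sliding window (for example, its variance over the last N frames) then yields the direction-stability signal described above.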

Tuesday, April 21, 2026

 Smallest stable index:

You are given an integer array nums of length n and an integer k.

For each index i, define its instability score as max(nums[0..i]) - min(nums[i..n - 1]).

In other words:

• max(nums[0..i]) is the largest value among the elements from index 0 to index i.

• min(nums[i..n - 1]) is the smallest value among the elements from index i to index n - 1.

An index i is called stable if its instability score is less than or equal to k.

Return the smallest stable index. If no such index exists, return -1.

Example 1:

Input: nums = [5,0,1,4], k = 3

Output: 3

Explanation:

• At index 0: The maximum in [5] is 5, and the minimum in [5, 0, 1, 4] is 0, so the instability score is 5 - 0 = 5.

• At index 1: The maximum in [5, 0] is 5, and the minimum in [0, 1, 4] is 0, so the instability score is 5 - 0 = 5.

• At index 2: The maximum in [5, 0, 1] is 5, and the minimum in [1, 4] is 1, so the instability score is 5 - 1 = 4.

• At index 3: The maximum in [5, 0, 1, 4] is 5, and the minimum in [4] is 4, so the instability score is 5 - 4 = 1.

• This is the first index with an instability score less than or equal to k = 3. Thus, the answer is 3.

Example 2:

Input: nums = [3,2,1], k = 1

Output: -1

Explanation:

• At index 0, the instability score is 3 - 1 = 2.

• At index 1, the instability score is 3 - 1 = 2.

• At index 2, the instability score is 3 - 1 = 2.

• None of these values is less than or equal to k = 1, so the answer is -1.

Example 3:

Input: nums = [0], k = 0

Output: 0

Explanation:

At index 0, the instability score is 0 - 0 = 0, which is less than or equal to k = 0. Therefore, the answer is 0.

Constraints:

• 1 <= nums.length <= 100

• 0 <= nums[i] <= 10^9

• 0 <= k <= 10^9

class Solution {

    public int firstStableIndex(int[] nums, int k) {

        long[] scores = new long[nums.length];

        for (int i = 0; i < nums.length; i++) {

            int max = Integer.MIN_VALUE;

            int min = Integer.MAX_VALUE;

            for (int j = 0; j <= i; j++) {

                if (nums[j] > max) {

                    max = nums[j];

                }

            }

            for (int j = i; j < nums.length; j++) {

                if (nums[j] < min) {

                    min = nums[j];

                }

            }

            // System.out.println("max="+max+"&min="+min);

            scores[i] = (long) max - min;

        }

        int first_stable_index = -1;

        for (int i = 0; i < scores.length; i++) {

            if (scores[i] <= k) {

                first_stable_index = i;

                break;

            }

        }

        return first_stable_index;

    }

}
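The constraints here are small enough for the quadratic scan above. For larger inputs, a hedged O(n) variant precomputes prefix maxima and suffix minima so that each instability score is available in constant time:

class SolutionLinear {
    public int firstStableIndex(int[] nums, int k) {
        int n = nums.length;
        int[] prefixMax = new int[n]; // prefixMax[i] = max(nums[0..i])
        int[] suffixMin = new int[n]; // suffixMin[i] = min(nums[i..n-1])
        prefixMax[0] = nums[0];
        for (int i = 1; i < n; i++) {
            prefixMax[i] = Math.max(prefixMax[i - 1], nums[i]);
        }
        suffixMin[n - 1] = nums[n - 1];
        for (int i = n - 2; i >= 0; i--) {
            suffixMin[i] = Math.min(suffixMin[i + 1], nums[i]);
        }
        for (int i = 0; i < n; i++) {
            if ((long) prefixMax[i] - suffixMin[i] <= k) {
                return i; // first index whose instability score is within k
            }
        }
        return -1;
    }
}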

Test cases:

Case 1:

Input

nums =

[5,0,1,4]

k =

3

Output

3

Expected

3

Case 2:

Input

nums =

[3,2,1]

k =

1

Output

-1

Expected

-1

Case 3:

Input

nums =

[0]

k =

0

Output

0

Expected

0

#Codingexercise: Codingexercise-04-21-2026.docx

Today's article: Derived Metrics 

Sunday, April 19, 2026

 Longest Balanced Substring After One Swap

You are given a binary string s consisting only of characters '0' and '1'.

A string is balanced if it contains an equal number of '0's and '1's.

You can perform at most one swap between any two characters in s. Then, you select a balanced substring from s.

Return an integer representing the maximum length of the balanced substring you can select.

Example 1:

Input: s = "100001"

Output: 4

Explanation:

• Swap "100001". The string becomes "101000".

• Select the substring "101000", which is balanced because it has two '0's and two '1's.

Example 2:

Input: s = "111"

Output: 0

Explanation:

• Choose not to perform any swaps.

• Select the empty substring, which is balanced because it has zero '0's and zero '1's.

Constraints:

• 1 <= s.length <= 10^5

• s consists only of the characters '0' and '1'

class Solution {

    public int longestBalanced(String s) {

        int max = 0;

        for (int i = 0; i < s.length(); i++) {

            for (int j = i+1; j < s.length(); j++) {

                int count0 = 0;

                int count1 = 0;

                for (int k = i; k <= j; k++) {

                    if (s.charAt(k) == '1') {

                        count1++;

                    } else {

                        count0++;

                    }

                }

                if (count0 == count1) {

                    // Window is already balanced; no swap is needed.

                    if ((j - i + 1) > max) { max = j - i + 1; }

                } else if (Math.abs(count0 - count1) == 2) {

                    // Swapping a minority-type character from outside the window with a

                    // majority character inside it balances the window without changing its length.

                    char needed = (count0 < count1) ? '0' : '1';

                    boolean found = false;

                    for (int m = 0; m < i && !found; m++) {

                        if (s.charAt(m) == needed) { found = true; }

                    }

                    for (int n = j + 1; n < s.length() && !found; n++) {

                        if (s.charAt(n) == needed) { found = true; }

                    }

                    if (found && (j - i + 1) > max) {

                        max = j - i + 1;

                    }

                }

            }

        }

        return max;

    }

}
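The nested counting loop above is cubic in the string length, which is slow for the stated bound of 10^5 characters. A hedged quadratic sketch keeps the window counts incrementally and uses the same observation: a window can be balanced by one swap if it is already balanced, or if its imbalance is exactly two and a character of the minority type exists outside it.

class SolutionQuadratic {
    public int longestBalanced(String s) {
        int n = s.length();
        int totalOnes = 0;
        for (int i = 0; i < n; i++) {
            if (s.charAt(i) == '1') { totalOnes++; }
        }
        int totalZeros = n - totalOnes;
        int max = 0;
        for (int i = 0; i < n; i++) {
            int ones = 0, zeros = 0;
            for (int j = i; j < n; j++) {
                if (s.charAt(j) == '1') { ones++; } else { zeros++; }
                int len = j - i + 1;
                if (ones == zeros) {
                    max = Math.max(max, len);
                } else if (ones - zeros == 2 && totalZeros - zeros >= 1) {
                    max = Math.max(max, len); // swap a '0' from outside with a '1' inside
                } else if (zeros - ones == 2 && totalOnes - ones >= 1) {
                    max = Math.max(max, len); // swap a '1' from outside with a '0' inside
                }
            }
        }
        return max;
    }
}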

Test cases:

Case 1:

Input

s =

"100001"

Output

4

Expected

4

Case 2:

Input

s =

"111"

Output

0

Expected

0