Wednesday, December 31, 2025

 This is a summary of a book titled “Developing the Leader Within You 2.0” written by John C. Maxwell and published by Harper Collins in 2018. In this book, he explores the essential qualities and practices that define effective leadership, drawing on decades of experience and a wealth of illustrative case histories. He starts by saying that leadership is not merely a matter of position or seniority, nor is it an innate trait reserved for a select few. Instead, he argues, leadership is a set of skills and character traits that anyone can develop through intentional effort and self-reflection. He emphasizes that the journey to becoming a great leader is transformative, promising to enhance effectiveness, reduce weaknesses, lighten workloads, and multiply one’s impact on others.

Maxwell acknowledges that many potential leaders hesitate to pursue growth, often held back by limiting beliefs. Some may think they are not “born leaders,” or that a title or years of experience will automatically confer leadership status. Others postpone their development, waiting for an official appointment before investing in themselves. Maxwell counters these misconceptions with the wisdom of John Wooden, who cautioned that preparation must precede opportunity. The message is clear: leadership development is a proactive endeavor, and the time to start is now.

He asserts that effective leadership rests on the mastery of ten fundamental capabilities. The first is influence, which he describes as the cornerstone of leadership. Influence is earned through respect and manifests in various forms, from positional authority to the ability to inspire and develop others. Maxwell illustrates the five levels of leadership, ranging from the basic authority of a position to the pinnacle of influence achieved through personal excellence and the development of others. He shares personal anecdotes, such as the lasting impact of a teacher’s encouragement, to demonstrate how influence can ripple through countless lives. Maxwell’s mantra, “Leadership is influence,” underscores the importance of cultivating authentic authority.

Judgment is the second capability, and Maxwell reframes time management as the art of setting priorities. Everyone receives the same twenty-four hours each day, but leaders distinguish themselves by choosing how to spend that time wisely. He encourages self-analysis to identify what matters most, advocating for proactive decision-making and the mature acceptance that not everything can be accomplished. Prioritization, he suggests, is the key to productivity and fulfillment.

Character forms the ethical foundation of leadership. Maxwell notes that leading oneself is often the greatest challenge, requiring ongoing self-examination and the courage to reshape one’s own behavior. He draws on the example of Pope Francis, who warns leaders to avoid common pitfalls such as arrogance, busyness, inflexibility, and lack of gratitude. Authenticity, humility, and gratitude are vital, while rivalry, hypocrisy, and indifference erode trust and effectiveness.

Change management is another critical skill. Maxwell recounts the story of Lou Holtz, a football coach who transformed losing teams into champions by embracing change and inspiring others to do the same. Change, Maxwell observes, is often accompanied by emotional turmoil and resistance, but leaders must help others see the benefits that outweigh the losses. The ability to guide teams through transitions is a hallmark of agile leadership.

Problem-solving is presented as an opportunity rather than a burden. Maxwell cites M. Scott Peck’s insight that accepting life’s difficulties makes them easier to overcome. Leaders, he notes, are perpetually navigating crises, and their effectiveness depends on viewing challenges as chances for growth and innovation.

Attitude is another defining trait. Maxwell highlights the importance of positivity, tenacity, and hope, noting that followers often mirror the disposition of their leaders. He quotes Charles Swindoll, who places attitude above education, wealth, and circumstance. A leader’s outlook shapes the culture and morale of the entire team.

Servant leadership is a core value for Maxwell, shaped by his own journey as a church pastor. Initially focused on personal achievement, he was transformed by the philosophy of Zig Ziglar, who taught that helping others achieve their goals leads to mutual success. Maxwell now champions the idea that serving others is the essence of true leadership.

Vision is essential for providing teams with purpose and direction. Without vision, Maxwell warns, teams lose energy and focus, becoming fragmented and disengaged. A leader’s ability to articulate a compelling future inspires commitment and elevates ordinary work to extraordinary levels.

Self-control is the discipline required to lead oneself before leading others. Maxwell invokes Harry S. Truman’s belief that self-mastery is the first victory. Leaders must travel inward, cultivating self-discipline, because followers will not trust someone who lacks control.

Personal growth is the ongoing process of expanding one’s abilities and expertise. Maxwell shares his tradition of reflecting on lessons learned at each decade of life, emphasizing that growth requires a willingness to surrender comfort and embrace change. The pursuit of personal development leads to greater influence, decisiveness, discipline, and positivity, ultimately shaping a more complete leader and person.

Throughout this book, Maxwell weaves together practical advice, personal stories, and timeless wisdom to create a compelling guide for anyone seeking to unlock their leadership potential. The book’s message is both empowering and challenging: leadership is within reach for those willing to invest in themselves, embrace growth, and serve others. By mastering these ten capabilities, individuals can transform not only their own lives but also the lives of those they lead.


Tuesday, December 30, 2025

 Vision‑LLM versus agentic retrieval: Which is better?

In aerial drone image analytics, vision‑LLMs and agentic retrieval are starting to look less like competing paradigms and more like different gradients of the same idea: how much of our “intelligence” lives in a single multimodal model, and how much is distributed across specialized tools that the model orchestrates. The most recent geospatial benchmarks make that trade‑off very concrete.

Geo3DVQA is a good anchor for understanding what raw vision‑LLMs can and cannot do for remote sensing. It evaluates ten state‑of‑the‑art vision‑language models on 3D geospatial reasoning tasks using only RGB aerial imagery—no LiDAR, no multispectral inputs, just the kind of data we get at scale. The benchmark spans 110k question–answer pairs across 16 task categories and three levels of complexity, from single‑feature questions (“What is the dominant land cover here?”) to multi‑feature reasoning (“Are the taller buildings concentrated closer to the river?”) and application‑level spatial analysis (“Is this neighborhood at high risk for heat‑island effects?”). When we look at the performance, the story is sobering. General‑purpose frontier models like GPT‑4o and Gemini‑2.5‑Flash manage only 28.6% and 33.0% accuracy respectively on this benchmark. A domain‑adapted Qwen2.5‑VL‑7B, fine‑tuned on geospatial data, jumped to 49.6%, gaining 24.8 percentage points over its base configuration. That’s a big relative gain, but it’s still far from the kind of reliability we want if the output is going to drive asset inspections, risk scoring, or regulatory reporting.

Those numbers capture the core reality of pure vision‑LLM usage in drone analytics today. If our task is open‑ended visual understanding—describing scenes, answering flexible questions, triaging imagery, or accelerating human review—these models already add real value. They compress rich spatial structure into text in a way that is incredibly convenient for analysts and downstream systems. But when the task requires precise, height‑aware reasoning, consistent semantics across large areas, or application‑grade spatial analysis, even the best general models underperform without heavy domain adaptation. In other words, “just ask the VLM” is powerful for exploration but fragile for anything that must be consistently correct at scale.

Agentic retrieval frameworks approach the same problem from the opposite direction. Instead of relying on a single, monolithic vision‑LLM to do perception, memory, and planning all at once, they treat the model as one decision‑making component in a multi‑agent system—one that can call out to external tools, databases, and specialized models when needed. UAV‑CodeAgents is a clear example in the UAV domain. It uses a ReAct‑style architecture where multiple agents collaboratively interpret satellite imagery and high‑level natural language instructions, then generate executable UAV missions. The system includes a vision‑grounded pixel‑pointing mechanism that lets the agents refer to precise locations on the map, and a reactive thinking loop so they can iteratively revise goals as new observations arrive. In large‑scale mission planning scenarios for industrial and environmental fire detection, UAV‑CodeAgents achieves a 93% mission success rate, with an average mission creation time of 96.96 seconds. The authors show that lowering the decoding temperature to 0.5 improves planning reliability and reduces execution time, and that fine‑tuning Qwen2.5‑VL‑7B on 9,000 annotated satellite images strengthens spatial grounding.
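
To make that division of labor concrete, here is a minimal, generic sketch of a ReAct-style loop around a VLM; the vlm_step interface and the tool registry are illustrative assumptions, not the UAV-CodeAgents API:

# Generic ReAct-style loop sketch; `vlm_step` and the tool registry are hypothetical.
def react_mission_loop(vlm_step, tools, goal, image, max_steps=10):
    """Alternate VLM reasoning with grounded tool calls until a mission plan is emitted."""
    observations = [f"goal: {goal}"]
    for _ in range(max_steps):
        # The VLM sees the image plus the running trace and proposes the next action.
        thought, action, args = vlm_step(image=image, trace=observations)
        observations.append(f"thought: {thought}")
        if action == "emit_mission":
            return args  # structured waypoints or mission code
        # Every other action is grounded in an external tool (maps, detectors, planners).
        result = tools[action](**args)
        observations.append(f"{action} -> {result}")
    return None  # no plan within budget; the caller can relax constraints or escalate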

What’s striking here is that the system’s effectiveness comes from the interplay between the vision‑LLM and the agentic scaffold around it. The VLM is not directly “flying the drone” or making all decisions. Instead, it interprets images, reasons in language, and chooses when to act—e.g., calling tools, updating waypoints, or revising mission plans. The agentic layer enforces structure: we have explicit mission goals, world representation, constraints, and action APIs. As a result, the same underlying multimodal model that might only reach 30–50% accuracy on a free‑form VQA benchmark can, when harnessed in this way, support end‑to‑end mission plans that succeed more than 90% of the time in the evaluated scenarios. The retrieval part—pulling in maps, prior detections, environmental context, or historical missions—is implicit in that architecture: the agents are constantly grounding their decisions in external data sources rather than relying solely on the VLM’s internal weights.

If we put Geo3DVQA and UAV‑CodeAgents side by side, we get a quantitative feel for the trade‑off. Raw vision‑LLMs, even frontier‑scale ones, struggle to exceed 30–33% accuracy on complex 3D geospatial reasoning with RGB imagery, whereas a domain‑adapted 7B model can reach 50%. That’s good enough for “co‑pilot”‑style assistance but not for autonomous decision making. Meanwhile, an agentic system that embeds a comparable VLM inside a multi‑agent ReAct framework, and couples it to grounded tools and explicit mission representations, can deliver around 93% mission success in its target domain, with sub‑two‑minute planning times. The exact numbers are not directly comparable—Geo3DVQA is a question‑answer benchmark, UAV‑CodeAgents is mission generation—but they point in the same direction: the more we offload structure, memory, and control to an agentic retrieval layer, the more we can extract robust, end‑to‑end performance from imperfect vision‑LLMs.

For aerial drone image analytics specifically—change detection, object‑of‑interest search, compliance checks, risk scoring—the practical implications are clear. A pure vision‑LLM approach is ideal when we want to sit an analyst in front of a scene and let them ask free‑form questions: “What seems unusual here?”, “Where are the access points?”, “Which rooftops look suitable for solar?” The model’s strengths in semantic abstraction and natural language reasoning shine in those settings, and benchmarks like Geo3DVQA suggest that domain‑tuned models will keep getting better. But as soon as we care about consistency across thousands of scenes, strict thresholds, or compositional queries over time and space, we want those questions to be mediated by an agentic retrieval system that explicitly tracks objects, events, geospatial layers, and past decisions. In that world, the vision‑LLM is mostly a perception‑and‑intent module: it turns raw pixels and human queries into structured facts and goals, which the agents then reconcile against a retrieval layer made of maps, catalogs, and traditional analytics.

The research frontier is moving in two complementary directions. On the vision‑LLM side, Geo3DVQA highlights the need for models that can infer 3D structure and environmental attributes from RGB alone and shows that domain‑specific fine‑tuning can double performance relative to general models. We can expect a wave of remote‑sensing‑tuned VLMs that push accuracy beyond 50% on multi‑step geospatial reasoning tasks and start to integrate external cues like DEMs, climate data, and building footprints in more principled ways. On the agentic retrieval side, UAV‑CodeAgents demonstrates that multi‑agent ReAct frameworks, with explicit grounding and tool calls, can already achieve high mission success in constrained scenarios. The next step is to standardize benchmarks for these systems: not just asking whether the VLM answered the question correctly, but whether the full agentic pipeline produced safe, efficient, and explainable decisions on real drone missions.

What is missing—and where there is room for genuinely new work—is a unified evaluation that holds everything constant except the degree of “agentic scaffolding.” Imagine taking the same aerial datasets, the same base VLM, and comparing three regimes: the VLM answering questions directly; the VLM augmented with retrieval over a geospatial database but no explicit agency; and a fully agentic, multi‑tool system that uses the VLM only as a reasoning and perception kernel. We could measure not only accuracy and latency, but also mission success, human trust, error recoverability, and the ease with which analysts can audit and refine decisions. Geo3DVQA provides the template for rigorous perception‑level benchmarking; UAV‑CodeAgents sketches how to evaluate mission‑level performance in an agentic system. The next wave of work will connect those two levels, and the most interesting findings will not be “VLMs versus agentic retrieval,” but how to architect their combination so that drone analytics pipelines are both more powerful and more controllable than either paradigm alone.
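
A skeleton of such a study might look like the following; the regime names, metric set, and the vlm_answer, geo_db, and agent_pipeline hooks are assumptions for illustration only:

import time

def evaluate(regime, tasks):
    # regime: callable(task_dict) -> answer; each task carries 'query', 'image', and ground-truth 'truth'.
    correct, latencies = 0, []
    for task in tasks:
        start = time.time()
        answer = regime(task)
        latencies.append(time.time() - start)
        correct += int(answer == task["truth"])
    return {"accuracy": correct / len(tasks), "mean_latency_s": sum(latencies) / len(latencies)}

# The three regimes hold the base VLM constant and vary only the scaffolding around it:
# regimes = {
#     "vlm_only": lambda t: vlm_answer(t["image"], t["query"]),
#     "vlm_plus_retrieval": lambda t: vlm_answer(t["image"], t["query"], context=geo_db.lookup(t)),
#     "full_agentic": lambda t: agent_pipeline(t),  # tools, memory, explicit mission state
# }
# results = {name: evaluate(fn, tasks) for name, fn in regimes.items()}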


Monday, December 29, 2025

 Vision‑LLM chat interface versus objects‑in‑scenes catalog plus SQL: which is better?

The clearest quantitative comparison between a language-model-based querying interface and a traditional SQL workflow comes from Ipeirotis and Zheng’s 2025 user study on natural language interfaces for databases (NLIDBs). They compare SQL‑LLM, a modern NL2SQL system built on Seek AI, with Snowflake’s native SQL interface in a controlled lab setting with 20 participants and 12 realistic analytics tasks per participant. The results are surprisingly decisive: the NL2SQL interface reduces mean task completion time from 629 seconds to 418 seconds, a 10–30% speedup depending on task, with a statistically significant difference (p = 0.036). At the same time, task accuracy rises from 50% to 75% (p = 0.002). Participants also reformulate queries less often, recover from errors 30–40 seconds faster, and report lower frustration. Behavioral analysis shows that, when the NLIDB is well‑designed, users actually adopt more structured, schema‑aware querying strategies over time, rather than treating the system as a vague natural language oracle.

If this is mapped to the data analytics world, SQL‑LLM is essentially “LLM chat front‑end that emits structured queries”; Snowflake is the canonical structured interface. So, at least in the textual domain, a chat interface tightly coupled to a correct, inspectable execution layer can be both faster and more accurate than a traditional SQL UI for mixed‑skill users. The result is not just that “chat is nicer,” but that it materially shifts the error profile: users spend less time fighting syntax and more time converging on the right question.

On the visual analytics side, Martins and colleagues provide a 2025 systematic review, “Talking to Data,” which synthesizes the rise of conversational agents for visual analytics and natural-language-to-visualization (NL2VIS) workflows. They survey LLM‑based agents that let users ask questions like “Show me a time series of daily incidents by district and highlight outliers” and receive automatically generated charts and dashboards. Across the systems they review, the primary benefit is consistent: conversational interfaces dramatically lower the barrier to entry for non‑technical users and accelerate first‑insights exploration for everyone. Users no longer need to know which chart type, which field, or which filter to apply; instead, they iteratively describe intent in language. The review notes an acceleration of research after 2022 and highlights common architectural patterns such as multi‑agent reasoning (one agent for intent parsing, another for code generation, another for validation), context‑aware prompting, and automatic code generation backends that produce SQL or visualization scripts under the hood.

But the same review is blunt about the downsides. LLM‑driven visual analytics systems suffer from prompt brittleness, hallucinated insights, and inconsistent performance across domains. In other words, they shine in “getting started” and in ideation but can be fragile in the long tail of complex or ambiguous queries. This is precisely where a structured objects‑in‑scenes catalog plus SQL (or structured filters) tends to dominate: once a user knows what she wants, a faceted object browser with composable filters and explicit SQL conditions is precise, auditable, and predictable. The current research consensus is not that conversational agents replace structured interfaces, but that they act as an outer, more human‑friendly layer wrapped around a rigorous, structured core.

The vision‑specific side is less mature, but a consistent pattern is emerging in recent work on LLM‑assisted visual analytics agents. Zhao and colleagues’ ProactiveVA framework implements an LLM‑powered UI agent that monitors user interactions with a visual analytics system and offers context‑aware suggestions proactively, rather than only on demand. Instead of just answering queries, the agent watches when users get “stuck” in complex visual tools and intervenes with suggestions: alternative views, drill‑downs, parameter changes. They implement the agent in two different visual analytics systems and evaluate it through algorithmic evaluation as well as user and expert studies, showing that proactive assistance can help users navigate complexity more effectively. Although ProactiveVA is not focused purely on vision‑language object querying, it illustrates the same interaction pattern likely to emerge in vision‑LLM settings: the agent lives on top of a rich, structured tool (our object catalog, filters, metrics) and orchestrates interactions, rather than replacing the underlying structure.

If one projects the NLIDB and NL2VIS findings into a vision‑LLM setting where the underlying data is an objects‑in‑scenes catalog indexed by SQL, a few hypotheses are well‑supported by existing evidence, even if not yet directly tested for aerial or scene‑level vision. First, a vision‑LLM chat interface that translates “natural” questions like “Show me all intersections with at least three trucks and a pedestrian within 10 meters in the last 5 minutes” into structured queries over a scene catalog will almost certainly improve accessibility and time‑to‑first‑answer for non‑SQL users, mirroring the 10–30% time savings and 25‑point accuracy gains seen in NLIDB studies. Second, the same studies suggest that, with appropriate feedback—showing the generated SQL, visualizing filters, allowing users to refine them—users begin to internalize the schema and move toward more structured mental models over time, rather than staying in a purely “chatty” mode. Third, NL2VIS work indicates that conversational interfaces excel at exploration, hypothesis generation, and “what’s interesting here?” tasks, while deterministic structured interfaces excel at confirmatory analysis and compliance‑grade reproducibility.
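
As an illustration of the first hypothesis, a grounded chat layer would translate that question into an explicit, inspectable query over the catalog. The sketch below assumes a hypothetical detections table (object class, frame timestamp, intersection id, geometry) and a PostGIS-style ST_Distance function; neither is a published benchmark schema:

# Hypothetical NL -> SQL translation target over an objects-in-scenes catalog.
question = ("Show me all intersections with at least three trucks and a pedestrian "
            "within 10 meters in the last 5 minutes")

generated_sql = """
SELECT t.intersection_id
FROM detections t
JOIN detections p
  ON p.intersection_id = t.intersection_id
 AND p.object_class = 'pedestrian'
 AND ST_Distance(p.geom, t.geom) <= 10
WHERE t.object_class = 'truck'
  AND t.frame_ts >= NOW() - INTERVAL '5 minutes'
GROUP BY t.intersection_id
HAVING COUNT(DISTINCT t.detection_id) >= 3;
"""
# Showing `generated_sql` to the user before execution is what lets them audit,
# refine, and gradually internalize the schema, as the NLIDB studies observed.
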

At the same time, all the pain points NL2VIS and NLIDB researchers describe will be amplified in vision‑LLM workflows. Hallucinations in vision‑language models mean that a chat interface might confidently describe patterns or objects that are not actually present in the underlying catalog, unless the system is architected so that the LLM can only reason over ground‑truth detections and metadata, not raw pixels. Schema ambiguity becomes more complicated, because the same visual concept (say, “truck near crosswalk”) may correspond to multiple object categories, spatial predicates, and temporal windows in the catalog. The review by Martins et al. emphasizes that robust systems increasingly rely on multi‑stage pipelines and explicit grounding: one module to resolve user intent, another to generate executable code, and another to validate results against the data and, if necessary, ask follow‑up questions. That is roughly the architecture we would want for trustworthy vision‑LLM interfaces as well.

Upcoming research directions in the literature line up nicely with the gap we are pointing at. Martins et al. explicitly call for more systematic user studies that compare conversational agents to traditional visual analytics tools, focusing not only on accuracy and time, but also on trust, learnability, and long‑term workflow integration. They highlight the need for standardized benchmarks for conversational visual analytics—essentially the NL2SQL benchmarks, but for NL2VIS and related tasks. ProactiveVA, meanwhile, opens the door to agentic systems that do more than answer questions: they monitor interaction logs, predict when the user needs help, and suggest next steps in an interpretable, controllable way. Extending such agents to vision‑centric workflows, where the agent can propose new filters or views on top of an objects‑in‑scenes catalog, is a natural next step.

What is still missing, and where there is clear space for original work, is an end‑to‑end, quantitative comparison between three modes on the same vision dataset: first, a pure objects‑in‑scenes catalog with SQL or GUI filters; second, a vision‑LLM chat interface that only describes scenes but does not drive structured queries; and third, a hybrid system where the chat interface is grounded in the catalog and always produces explicit, inspectable queries. The database and visual analytics communities have now shown that the hybrid pattern—LLM chat front‑end, structured execution back‑end—can deliver significant gains in speed, accuracy, and user satisfaction over traditional interfaces alone. Vision‑centric systems are just starting to catch up. If we frame our Drone Video Sensing Applications work as “bringing the NLIDB/NL2VIS playbook into multimodal, scene‑level analytics” and design a user study with metrics analogous to Ipeirotis and Zheng’s, we would not just be building a product interface; we would be writing one of the first concrete answers to the question we are asking.


Sunday, December 28, 2025

 Vision-LLMs within the context of an aerial drone image analytics framework 

Recent advances in multimodal large language models (vision‑LLMs) have begun to reshape the methodological landscape of aerial and remote‑sensing analytics. Models such as PaliGemma, RemoteCLIP, GeoChat, and LLaVA represent distinct but converging trajectories in visual–linguistic reasoning, each offering capabilities that can be strategically integrated into an end‑to‑end drone image analytics framework. Their emergence coincides with the increasing availability of high‑resolution drone imagery, the maturation of cloud‑scale inference infrastructure, and the growing demand for explainable, instruction‑following geospatial models. Together, these trends suggest a new generation of analytics pipelines that combine classical computer vision with grounded multimodal reasoning. 

PaliGemma, developed within the Gemma ecosystem, exemplifies a general‑purpose multimodal model capable of image captioning, segmentation, and zero‑shot object detection. The official Keras‑based inference notebooks demonstrate how PaliGemmaCausalLM can be loaded, provided with image tensors, and prompted for tasks such as referring‑expression segmentation and object detection. These examples illustrate a flexible architecture that can be adapted to drone imagery, particularly for tasks requiring contextual reasoning—such as describing anomalous structures, identifying land‑use transitions, or generating natural‑language summaries of flight‑level observations. While PaliGemma is not explicitly trained on remote‑sensing corpora, its generalization performance on high‑resolution imagery suggests that domain‑adapted fine‑tuning, as shown in the finetuning notebooks, could yield strong performance on aerial datasets.
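
A minimal inference sketch in that style is shown below; the keras_hub preset name, the task-prefix prompt, and the images/prompts input keys are assumptions based on the public notebooks and may need adjusting to the exact release being used:

# Sketch only: preset name, prompt prefix, and input keys are assumptions, not verified API.
import numpy as np
import keras_hub

pali_gemma = keras_hub.models.PaliGemmaCausalLM.from_preset("pali_gemma_3b_mix_224")

# One aerial frame resized to the model's expected resolution (224x224x3 for this preset).
image = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")  # placeholder tensor
prompt = "caption en\n"  # PaliGemma-style task-prefix prompt

caption = pali_gemma.generate({"images": image, "prompts": [prompt]})
print(caption)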

RemoteCLIP, by contrast, is explicitly optimized for remote‑sensing tasks. Its training on large‑scale satellite and aerial datasets enables robust zero‑shot classification and retrieval performance, outperforming baseline CLIP models on RSICD and other benchmarks by significant margins. The publicly available Python demo illustrates how RemoteCLIP checkpoints can be downloaded from Hugging Face, loaded via open_clip, and used to compute text–image similarity for remote‑sensing queries. This capability is particularly relevant for drone analytics pipelines that require rapid semantic retrieval—for example, identifying all frames containing runways, construction sites, or agricultural patterns without requiring task‑specific training. RemoteCLIP’s performance gains on remote‑sensing benchmarks make it a strong candidate for embedding‑level components of our framework, such as indexing, clustering, and semantic search.
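
A retrieval-oriented sketch along the lines of that demo follows; the Hugging Face repo id, checkpoint filename, and the example frame path are assumptions and may need adjusting:

# Sketch of text-image similarity with RemoteCLIP via open_clip; repo id and filename are assumptions.
import torch
import open_clip
from PIL import Image
from huggingface_hub import hf_hub_download

model_name = "ViT-B-32"
ckpt = hf_hub_download("chendelong/RemoteCLIP", f"RemoteCLIP-{model_name}.pt")  # assumed repo/file

model, _, preprocess = open_clip.create_model_and_transforms(model_name)
model.load_state_dict(torch.load(ckpt, map_location="cpu"))
tokenizer = open_clip.get_tokenizer(model_name)
model.eval()

image = preprocess(Image.open("frame_000123.jpg")).unsqueeze(0)  # one drone frame (assumed path)
texts = tokenizer(["a runway", "a construction site", "an agricultural field"])

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(texts)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    similarity = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(similarity)  # probability-like scores over the three text queries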

GeoChat extends the LLaVA‑style architecture into a grounded remote‑sensing domain, offering region‑level reasoning, visual question answering, and referring‑object detection tailored to high‑resolution imagery. The GeoChat demo codebase provides a full Python pipeline for loading pretrained models, processing images, and generating multimodal conversational outputs. Unlike general‑purpose models, GeoChat is explicitly trained on remote‑sensing instruction‑following datasets, enabling it to interpret complex spatial relationships, describe land‑use categories, and reason about object interactions in aerial scenes. This makes GeoChat particularly suitable for mission‑critical drone workflows such as damage assessment, environmental monitoring, and infrastructure inspection, where interpretability and grounded reasoning are essential.

LLaVA, one of the earliest widely adopted vision‑LLMs, remains a strong baseline for multimodal reasoning. Python examples using vLLM demonstrate how LLaVA‑1.5 can be loaded and prompted with images to generate descriptive outputs. Although not domain‑specialized, LLaVA’s broad adoption and extensive community tooling make it a practical choice for prototyping drone‑analytics tasks such as captioning, anomaly explanation, or operator‑assistive interfaces. Its availability across multiple cloud providers—including Azure’s model catalog and open‑source inference runtimes—further enhances its deployability.
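
A prototyping sketch in that vein, assuming vLLM's multimodal generate interface, the llava-hf checkpoint id, and an example frame path:

# Sketch of LLaVA-1.5 captioning via vLLM; model id, prompt template, and image path are assumptions.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="llava-hf/llava-1.5-7b-hf")
sampling = SamplingParams(temperature=0.2, max_tokens=128)

image = Image.open("frame_000123.jpg")
prompt = "USER: <image>\nDescribe any anomalies visible in this aerial scene. ASSISTANT:"

outputs = llm.generate({"prompt": prompt, "multi_modal_data": {"image": image}}, sampling)
print(outputs[0].outputs[0].text)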

Across benchmarks, RemoteCLIP and GeoChat generally outperform general‑purpose models on remote‑sensing tasks, particularly in zero‑shot classification, region grounding, and high‑resolution reasoning. PaliGemma and LLaVA, while more generalist, benefit from larger ecosystems, more mature tooling, and broader cloud redistribution. Azure, AWS, and GCP increasingly support these models through managed inference endpoints, containerized deployments, and GPU‑accelerated runtimes, enabling scalable integration into drone‑analytics pipelines. Industry adoption is strongest for CLIP‑derived models in geospatial indexing and for LLaVA‑style models in operator‑assistive interfaces, while GeoChat is gaining traction in research and early‑stage deployments for environmental monitoring and disaster response. 

Within our aerial drone analytics framework, these models can be positioned as complementary components: RemoteCLIP for embedding‑level retrieval and semantic indexing; PaliGemma for captioning, segmentation, and general multimodal reasoning; GeoChat for grounded geospatial interpretation; and LLaVA for prototyping and operator‑facing interfaces. Their integration would enable a hybrid pipeline capable of both high‑throughput automated analysis and interactive, human‑in‑the‑loop reasoning. 

Future research directions include domain‑adaptive fine‑tuning of PaliGemma and LLaVA on drone‑specific corpora, cross‑model ensemble methods that combine RemoteCLIP embeddings with GeoChat reasoning, and the development of multimodal agents capable of autonomously triaging drone imagery, generating structured reports, and interacting with downstream geospatial databases. Additionally, exploring Azure‑native optimizations—such as ONNX‑runtime quantization, Triton‑based inference, and vector‑search integration with Azure AI Search—could yield substantial performance gains for large‑scale deployments. These directions align naturally with our broader goal of constructing a reproducible, benchmark‑driven, cloud‑scalable analytics framework for next‑generation aerial intelligence. 

Saturday, December 27, 2025

 This is a summary of the book titled “The 9% Edge: The life-changing secrets to create more revenue for your business and more freedom for yourself” written by Candy Valentino and published by Wiley in 2024. In the world of entrepreneurship, the odds are daunting. According to acclaimed business consultant Candy Valentino, only 9% of businesses manage to generate enough profit to sustain themselves year after year. The vast majority—91%—fail to survive their first decade. In her book, she sets out to demystify what sets this rare group apart, offering a practical guide filled with action plans and checklists for those determined to beat the odds.

There is no universal formula for business success. Instead, enduring companies are built by founders who create a “one size that fits you” business, not by chasing after a mythical “one size fits all” solution. These entrepreneurs don’t rely on silver-bullet strategies or overnight miracles. Instead, they cultivate skills, talents, and disciplines that allow them to thrive where others falter. They focus relentlessly on earning revenue, building profits, and expanding their customer base, all while maintaining the quality of their products or services and finding ways to overcome the inevitable obstacles that arise.

A recurring theme in Valentino’s narrative is the importance of vigilance and discipline. She urges entrepreneurs to avoid waste, keep a close eye on their numbers, and be wary of the hazards that can derail a business. Achieving the coveted 9% edge requires digging beneath the surface, rooting out inefficiencies, and maintaining robust financial and operational systems.

Valentino also challenges the popular notion that passion alone should drive business decisions. While passion can be a powerful motivator, she argues that it’s far more important to build a business model that aligns with your current and future goals and is congruent with your authentic self. Blindly following passion, she warns, can lead to narrow-mindedness and missed opportunities. Instead, entrepreneurs should strive for a balance between businesslike practices and personal fulfillment.

The book identifies several “deadly land mines” that threaten new businesses. One is “Self-Employed Sabotage,” where founders become so involved in every aspect of their business that they stifle growth and scalability. Valentino advises learning to delegate and automate as soon as possible. Another is the “Passion Paradox,” where overconfidence and tunnel vision prevent entrepreneurs from seeking sound advice. “The Shiny Object Syndrome” describes the tendency to become distracted by lofty goals at the expense of practical realities, while “Controlitis” and “Competency Disease” refer to the dangers of micromanagement and the refusal to trust others. The antidote to these pitfalls is to build replicable systems, invest in team development, and delegate with confidence.

Financial literacy is another cornerstone of long-term success. Valentino emphasizes the importance of tracking key metrics, especially EBITDA—earnings before interest, taxes, depreciation, and amortization—which provides a clear picture of a business’s core value. Even if selling the company is a distant prospect, focusing on EBITDA is essential for steady growth. She also highlights the emotional and complex nature of business exits, reminding readers that the journey is rarely as simple as the celebratory handshake photo suggests.

To build EBITDA and ensure financial health, Valentino recommends a series of tactical steps: increase revenue by expanding your customer base and offerings, optimize for profit by streamlining operations and cutting costs, strengthen customer relationships, diversify your products or services, make strategic investments in technology and people, and manage debt wisely. She notes that acquiring new customers is always more expensive than retaining existing ones, so maximizing revenue from current customers is crucial.

Valentino points to companies like McDonald’s as examples of enduring success, citing their ability to adapt to changing markets, price strategically, and focus on efficient administration. She explains how businesses can boost their average order value (AOV) through upselling, bundling, cross-selling, volume discounts, loyalty programs, and personalized recommendations. Understanding customer behavior and tailoring marketing strategies accordingly is key to increasing AOV and, by extension, profitability.

Valentino asserts that long-term business success requires resolve and determination. It’s not about having the perfect background, degree, or mentor, but about persistence and fortitude. She encourages entrepreneurs to look beyond personal gain and contribute to the welfare of others, finding fulfillment in making a positive impact on their communities. By consistently applying the principles of the 9% edge, business owners can achieve steady sales, solid revenues, and meaningful profits—while also discovering a deeper sense of purpose.

#codingexercise: Codingexercise-12-27-2025.docx

Friday, December 26, 2025

 Solving Jumble:

# Find a common word that is a jumble of the letters RGYUEN

Solution:

import java.util.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
 public static void main (String[] args) throws java.lang.Exception
 {
  String a = "RGYUEN";
  StringBuilder b = new StringBuilder();
  List<String> permutations = new ArrayList<String>();
  boolean[] used = new boolean[a.length()];
  permute(a, b, used, 0, a.length()-1, permutations);
  for (int i = 0; i < permutations.size(); i++) {
   if (isValid(permutations.get(i))) {
    System.out.println(permutations.get(i));
   }
  }
 }

 // Backtracking over all 6! orderings of the letters.
 public static void permute(String a, StringBuilder b, boolean[] used, int start, int end, List<String> permutations) {
  if (b.length() == end - start + 1) {
   permutations.add(b.toString());
   return;
  }
  for (int i = start; i <= end; i++) {
   if (used[i]) continue;
   used[i] = true;
   b.append(a.charAt(i));
   permute(a, b, used, start, end, permutations);
   b.deleteCharAt(b.length()-1);
   used[i] = false;
  }
 }

 // Loose positional heuristic in place of a dictionary lookup: it narrows the 720
 // permutations to a short list that can be scanned for the actual English word.
 public static boolean isValid(String s) {
  return s.charAt(0) == 'G' || s.charAt(2) == 'Y' || s.charAt(5) == 'R';
 }
}

Answer: Gurney

#codingexercise: CodingExercise-12-26-2025.docx 

#booksummary: BookSummary401.docx

Thursday, December 25, 2025

 This is a continuation of the article from the day before yesterday on benchmark cases for aerial drone image analytics:

Case 3: types of hazards detected:

• Cars Parked in Bicycle Lanes: Vehicles obstructing dedicated bike lanes can force cyclists into traffic.

• Pedestrians Crossing Intersections: Pedestrians may cross at intersections unpredictably, sometimes against traffic signals.

• Car Crossings at Intersections: Vehicles turning or crossing at intersections can pose risks to cyclists and pedestrians.

• Improperly Marked Crosswalks: Lack of clear signage or faded markings can lead to confusion for pedestrians and drivers.

• Construction Zones: Temporary constructions can create obstacles and require detours, increasing risks.

• Poor Visibility Areas: Curves or poorly lit areas can reduce visibility for both cyclists and drivers.

• Cyclists Riding on Sidewalks: In some areas, cyclists riding on sidewalks can surprise pedestrians.

• Vehicle Door Zones: Cyclists are at risk from opened car doors when riding near parked vehicles.

• Inadequate Lighting: Poorly lit areas can make it difficult for drivers to see cyclists and pedestrians.

• Obstructed Views: Trees, signs, or buildings may block sightlines at intersections.

• Weather Conditions: Rain, snow, or ice can affect road conditions and visibility.

• Bicycle Infrastructure: Inadequate or poorly designed bike paths can create hazardous situations.



Wednesday, December 24, 2025

 This is a summary of a book titled “Buyable: your guide to building a self-managing, fast-growing, and a high-profit business” written and self-published by Steve Preda in 2021. The ultimate dream for many founders is not just to build a thriving business, but to one day profitably cash out—reaping the rewards of years of hard work. Yet, as Steve Preda reveals in his book, this dream is elusive for most. The reality is stark: only a small fraction of business owners manage to sell their companies for the value they desire. The reason? Too many entrepreneurs become so absorbed in the daily grind and the relentless pursuit of profit that they neglect to plan for the eventual sale of their business.

Preda prescribes a set of management blueprints: to maximize the value of your business and keep your options open for the future, you must build a “buyable” company. A buyable business is not just profitable and growing—it is structured, predictable, and operates with processes that can be replicated by others. Such a company is attractive to buyers because it offers stability, regular cash flows, and the promise of continued success even after the founder steps away. In contrast, businesses that are overly dependent on their founders or lack clear systems are often deemed “unbuyable,” and their owners may struggle to find buyers willing to pay a premium.

The statistics are sobering: most business owners face long odds—just a one-in-ten chance—of selling their company for the price they want. However, those who proactively “groom” their businesses for sale can achieve prices 30% to 50% higher than those who do not prepare. The key is to start with the end in mind, making strategic decisions that enhance the company’s marketability from the outset.

Preda outlines three primary paths to building a buyable business. The first is creative entrepreneurship, where founders launch independent ventures, often learning through trial and error. This route is rewarding but risky, with a steep learning curve—statistics show that 90% of startups don’t survive to their tenth year. The second path is franchise ownership, which offers a turnkey operation with proven systems but less room for innovation and a share of profits going to the parent company. The third, and perhaps most strategic, is to follow a tested management blueprint—leveraging the collective wisdom of business experts to build a company that is both independent and scalable.

He lists seven foundational management pillars: culture, structure, vision, strategy, execution, process, and alignment. A strong culture unites employees around a shared purpose, while a clear structure ensures accountability and smart decision-making. Vision gives the company direction, inspired by Maslow’s hierarchy of needs, motivating people to strive for higher goals once the basics are met. Strategy involves defining the company’s mission and understanding customer needs, while execution is about setting objectives and achieving measurable results—exemplified by Andy Grove’s leadership at Intel. Process design, as advocated by Frederick Winslow Taylor, ensures that operations are systematic and knowledge is passed on efficiently. Finally, alignment—championed by Jim Collins—ensures that everyone in the organization is moving in the same direction, preventing chaos and maximizing effectiveness.

To help entrepreneurs put these pillars into practice, Preda introduces ten leading management blueprints, each distilled from successful business books and real-world experience. These include Michael Gerber’s “E-Myth,” which urges founders to work on their business, not just in it; Jack Stack’s “The Great Game of Business,” which gamifies operations to engage employees; Verne Harnish’s “Rockefeller Habits,” which emphasizes priorities, data, and regular meetings; Gino Wickman’s “Entrepreneurial Operating System (EOS),” which focuses on vision, people, and execution; and several others, each offering practical frameworks for building a resilient, scalable company.

Preda’s narrative is one of proactive leadership. Savvy founders begin with the end in mind, understanding that selling a business is a process that can take 12 to 18 months. They know their “magic number”—the profit they need from a sale—and they prepare meticulously, maintaining records, building loyal customers, and strengthening contractual relationships. They seek out strategic buyers who can benefit from synergies, and they surround themselves with experienced advisors. In contrast, reactive founders who fail to plan may find themselves unable to sell or forced to accept far less than their business is worth.

#codingexercise: CodingExercise-12-24-2025.docx

Tuesday, December 23, 2025

 The following are distance measurement studies from the DVSA benchmarks:

1. Determine scale of each frame.

2. Identification of the largest built-up structure encountered during the drone tour

3. Estimating the size of that largest built-up structure

4. Identification of the largest free space encountered during the drone tour

5. Estimating the size of that largest free space.

6. Identifying the count of similar sized structures within a scene.

7. Identifying the count of objects that can occupy a given free space

8. Identifying the distance between two points of interest across disparate frames, such as the length traversed by the drone in a specific direction prior to a turning point.

9. Total distance covered prior to revisits to a point of interest.

Methodology:

1. Scale (e.g., 1:100, meaning 1 unit in the image = 100 units in real life). This needs to be determined only once.

a. Each frame has a location and timestamp before it is vectorized and stored in the vector store along with its insights on objects and bounding boxes. Therefore, scale resolution can be achieved in a few ways:

i. Using Ground Sample Distance (GSD), e.g., 2 cm/pixel, which expresses how much ground distance each pixel covers; a smaller GSD captures finer detail.

1. With GSD either known beforehand or computed as (Flight Altitude x Sensor Dimension) / (Focal Length x Image Dimension), return the scale as the inversion of the GSD (see the sketch after the Haversine example below).

ii. Using well-known objects or landmarks:

1. Given the bounding box of a well-known object in the frame, say an intermediate sedan or a known landmark, compute the scale as a representative fraction of pixel length to actual ground length, such as that of a semi-trailer.

2. Width of road: Given the width of the road in pixels and the ground distance from a city record or google maps, we can determine the scale.

iii. Using GPS co-ordinates:

1. Using overall tour:

a. get the overall tour bounding box width and height in terms of latitude and longitude by computing (min Latitude, min Longitude, max Latitude, max Longitude)

b. Calculate the fraction of the tour area covered by the current frame:

c. Proportionately distribute the height to width given the frame width and height, or take the square root of (fw x fh) / (tw x th)

d. Emit the scale

2. Using GPS co-ordinates of two points in the same frame:

a. Take two points in the frame such as one pertaining to the center of the frame given by the drone and another found from Google Maps and compute the actual distance using Haversine Formula.

height_m = haversine(lat_min, lon_min, lat_max, lon_min)

width_m = haversine(lat_min, lon_min, lat_min, lon_max)

Note: Since every frame has a GPS co-ordinate to begin with, to find another GPS co-ordinate in the same frame, detect, clip and vectorize an object in that frame and find it in Google Maps of the scene at the latitude and longitude to get its GPS co-ordinates. The Haversine formula can then be used to compute the actual distance, while the pixel width gives the image-based distance.

b. Emit the scale

For example:

from math import radians, cos, sin, asin, sqrt

# Step 1: Haversine function to compute distances in meters

def haversine(lat1, lon1, lat2, lon2):

    R = 6371000 # Earth's radius in meters

    dlat = radians(lat2 - lat1)

    dlon = radians(lon2 - lon1)

    a = sin(dlat/2)**2 + cos(radians(lat1))*cos(radians(lat2))*sin(dlon/2)**2

    c = 2 * asin(sqrt(a))

    return R * c

# Bounding rectangle corners (nearest and farthest)

lat_min, lon_min = 42.37043, -71.12165

lat_max, lon_max = 42.37125, -71.11733

# Compute east-west (width) and north-south (height) ground distances, in meters

height_m = haversine(lat_min, lon_min, lat_max, lon_min)

width_m = haversine(lat_min, lon_min, lat_min, lon_max)

# Step 2: Area in square meters

area_m2 = width_m * height_m

# Step 3: Convert to square feet (1 m = 3.28084 ft)

area_ft2 = area_m2 * (3.28084 ** 2)

# Step 4: Convert to square miles (1 sq mile = 27,878,400 sq ft)

area_miles2 = area_ft2 / 27878400

print(f"Ground area covered: {area_miles2:.6f} square miles")
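
For the GSD-based route in step i above, a minimal sketch looks like the following; the altitude, sensor width, focal length, and image width are illustrative assumptions, not values from any specific drone:

def gsd_cm_per_pixel(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    # GSD = (flight altitude x sensor dimension) / (focal length x image dimension)
    # Altitude in meters, sensor width and focal length in millimeters -> cm per pixel.
    return (altitude_m * 100.0 * sensor_width_mm) / (focal_length_mm * image_width_px)

# Illustrative values only (roughly a small-sensor mapping camera at 100 m altitude).
gsd = gsd_cm_per_pixel(altitude_m=100, sensor_width_mm=13.2, focal_length_mm=8.8, image_width_px=5472)

# "Scale as the inversion of GSD": one pixel in the image corresponds to `gsd` cm on the ground.
print(f"GSD: {gsd:.2f} cm/pixel -> scale of roughly 1 pixel : {gsd:.2f} cm")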

2. Largest built-up find:

a. The bounding boxes of all detected objects in a scene gives the area of each

b. sort and filter these to include only the buildings

c. Return the top most from descending order

3. Largest built-up area:

a. Using 2. Find the bounding box of the corresponding object in the scene and calculate width and height

b. With the scale computed from 1. And the width and height from previous step, calculate the area as width x scale x height x scale

4. Largest free-space find:

a. If the detected objects are tagged as one of park, street intersection, courtyard, parking lot, transit center, grass, pavement, lake, river etc, pick the largest one as shown from 2. Above

b. Use color histogram based analysis to classify land cover

5. Largest free-space area:

a. If the free space is in one of the detected objects, then its bounding box and scale gives the largest free space area

b. Otherwise get the color histogram and proportionately divide the area of the scene for the chosen color

6. Count of objects in a scene can be done with trained models or clustering and hdbscan

7. Given the object size is found by bounding box and scale and the free space is given by its bounding box and scale, this is just a simple multiple

8. Distance calculation across disparate frames is easy to do with the GPS co-ordinates of each frame (which are given) and a Haversine computation. The trick is to find the nearest and the furthest frames from the scene catalog; either a ground truth such as Google Maps or Geodnet can be relied upon, or preferably turning-point frames can be identified from the video and correlated with timestamps and the velocity of the drone to find the displacement in that direction.

9. Accumulating the above over all directions traversed by the drone gives the total distance covered, or alternatively the speed of the drone x (flight time – hover time).

Operators for the logic above become re-usable and should be curated into a library within the DVSA application or framework, as sketched below. Improvements to object detection and counting in a scene can be accomplished by better training and fine-tuning of the corresponding models.
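
As a sketch of what such a curated operator library could look like (the function names, document fields, and HDBSCAN parameters are illustrative assumptions, not part of any published framework):

import numpy as np
import hdbscan  # pip install hdbscan

def bbox_area_m2(bbox_px, meters_per_pixel):
    # Ground area of a bounding box (x1, y1, x2, y2) given the scale in meters per pixel.
    x1, y1, x2, y2 = bbox_px
    return abs(x2 - x1) * meters_per_pixel * abs(y2 - y1) * meters_per_pixel

def largest_by_tag(detections, tag):
    # Steps 2 and 4: pick the largest detection whose tags include the requested category.
    tagged = [d for d in detections if tag in d["tags"]]
    return max(tagged, key=lambda d: bbox_area_m2(d["bbox"], 1.0), default=None)

def count_fit(free_space_area_m2, object_area_m2):
    # Step 7: how many objects of a given footprint fit into a free space.
    return int(free_space_area_m2 // object_area_m2) if object_area_m2 > 0 else 0

def count_objects(embeddings, min_cluster_size=5):
    # Step 6: cluster object embeddings with HDBSCAN and count the non-noise clusters.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(np.asarray(embeddings))
    return len(set(labels) - {-1})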


Monday, December 22, 2025

As drones evolve toward higher levels of autonomy, the need for contextual intelligence—beyond raw sensor fusion and rule-based planning—becomes increasingly critical. While these drones excel in structured environments using LiDAR, radar, and HD maps, they often lack the semantic depth and temporal foresight that a vision-driven analytics layer can provide. This is where our drone-based video sensing architecture, enriched by importance sampling, online overlays, and agentic retrieval, offers transformative potential: a contextual copilot that augments autonomy with memory, judgment, and adaptive feedback. As a non-invasive overlay over existing drone operations and platforms, this architecture brings down cost substantially through a dual approach: it makes on-board enhancements unnecessary by providing parallel, often uncontested capabilities in the overlay plane, and it runs on commodity and cloud infrastructure.

Drones operate with modular autonomy stacks: perception, localization, prediction, planning, and control. These modules rely heavily on real-time sensor input and preloaded maps, which can falter in dynamic or degraded conditions—poor visibility, occlusions, or unexpected traffic behavior. Our system introduces a complementary layer: a selective sampling engine that curates high-value video frames from vehicle-mounted or aerial cameras, forming a spatiotemporal catalog of environmental states and trajectory outcomes. This catalog becomes a living memory of the tour, encoding not just what was seen, but how the drone responded and what alternatives existed.

By applying importance sampling, our copilot prioritizes frames with semantic richness—intersections, merges, pedestrian zones, or adverse weather—creating a dense vector space of contextually significant moments. These vectors are indexed by time, location, and scenario type, enabling retrospective analysis and predictive planning. For example, if a drone needs to calculate the distance to a detour waypoint, the copilot can retrieve prior frames with similar geometry, overlay ground data, and suggest trajectory adjustments based on historical success rates.
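
A minimal sketch of that importance-sampling step, with assumed tag names and scoring weights purely for illustration:

import numpy as np

def importance_score(frame, weights=None):
    # Score a frame by how many semantically rich cues it carries ("tags" is an assumed field).
    weights = weights or {"intersection": 3.0, "pedestrian": 2.5, "merge": 2.0, "adverse_weather": 1.5}
    return sum(weights.get(tag, 0.1) for tag in frame["tags"])

def sample_frames(frames, k, seed=0):
    # Importance sampling: semantically richer frames are more likely to enter the catalog.
    rng = np.random.default_rng(seed)
    scores = np.array([importance_score(f) for f in frames], dtype=float)
    probs = scores / scores.sum()
    idx = rng.choice(len(frames), size=min(k, len(frames)), replace=False, p=probs)
    return [frames[i] for i in idx]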

This retrieval is powered by agentic query framing, where the copilot interprets system or user intent—“What’s the safest merge strategy here?” or “How did similar vehicles handle this turn during rain?”—and matches it against cataloged vectors and online traffic feeds. The result is a semantic response, not just a path: a recommendation grounded in prior information, enriched by real-time data, and tailored to current conditions.

Our analytics framework respects both autonomous and non-autonomous drone or swarm architectures, acting as a non-invasive overlay that feeds contextual insights into the planning module. It does not replace the planner—it informs it, offering scores, grounded preferences, and fallback strategies when primary sensors degrade.

Moreover, our system’s integration with online maps and traffic information allows for enriched drone video sensing applications. By leveraging a standard 100m-high point of reference for aerial images, adjusted from online satellite maps of urban scenes, we detect objects beyond what custom models are trained for. In addition, the use of catalogued objects, ground truth, and commodity models for analysis keeps this cost-effective. With our architecture offering a plug-and-play intelligence layer, drones can evolve from perceive-and-plan to remember, compare, and adapt, which is aligned with the future of agentic mobility.


Sunday, December 21, 2025

 Absolute Difference Between Maximum and Minimum K Elements

You are given an integer array nums and an integer k.

Find the absolute difference between:

the sum of the k largest elements in the array; and

the sum of the k smallest elements in the array.

Return an integer denoting this difference.

Example 1:

Input: nums = [5,2,2,4], k = 2

Output: 5

Explanation:

The k = 2 largest elements are 4 and 5. Their sum is 4 + 5 = 9.

The k = 2 smallest elements are 2 and 2. Their sum is 2 + 2 = 4.

The absolute difference is abs(9 - 4) = 5.

Example 2:

Input: nums = [100], k = 1

Output: 0

Explanation:

The largest element is 100.

The smallest element is 100.

The absolute difference is abs(100 - 100) = 0.

Constraints:

1 <= n == nums.length <= 100

1 <= nums[i] <= 100

1 <= k <= n

import java.util.Comparator;
import java.util.stream.IntStream;

class Solution {
    public int absDifference(int[] nums, int k) {
        // Sort a copy in descending order.
        int[] sortedNums = IntStream.of(nums)
                                    .boxed()
                                    .sorted(Comparator.reverseOrder())
                                    .mapToInt(Integer::intValue)
                                    .toArray();
        long max = 0;
        long min = 0;
        // Sum of the k largest elements (front of the descending array).
        for (int i = 0; i < k; i++) {
            max += sortedNums[i];
        }
        // Sum of the k smallest elements (back of the descending array).
        for (int i = nums.length - 1; i >= nums.length - k; i--) {
            min += sortedNums[i];
        }
        return (int) Math.abs(max - min);
    }
}

994 / 994 testcases passed


Saturday, December 20, 2025

 Many of the drone vision analytics queries are about objects located in a scene. For example, a search for a “parking garage” in a scene should yield a result with a clipped image showing the garage.  

As a multimodal search, this does not always return the correct answer, but a few techniques can help. This article lists them.

  1. When the scenes are vectorized frame by frame, they could also be analyzed to detect as many objects as possible along with their bounding boxes and saved with the scenes as documents with id, vector, captions, title, location, bounding box and tags. 

  2. The search over these accumulated scenes and objects can make use of various search options to narrow down the search. For example: 

  a. Create a vector from the text: 

from azure.search.documents.models import VectorizableTextQuery, QueryType, QueryCaptionType, QueryAnswerType 

# dest_search_client is an azure.search.documents.SearchClient configured for the destination index 

search_text = "parking garage" 

vector_query = VectorizableTextQuery(text=search_text, exhaustive=True, k_nearest_neighbors=50, fields="vector", weight=0.5) 

results = dest_search_client.search( 

    search_text=search_text, 

    vector_queries=[vector_query], 

    query_type=QueryType.SEMANTIC, 

    select=["id", "description","vector"], 

    filter = f"description ne null and search.ismatch('{search_text}', 'description')", 

    semantic_configuration_name="mysemantic", 

    query_caption=QueryCaptionType.EXTRACTIVE, 

    query_answer=QueryAnswerType.EXTRACTIVE, 

    top=10, 

) 

  b. Use a semantic configuration: it leverages the text-based content in fields such as title, description, and tags for keyword and semantic search. 

  c. Specify Hierarchical Navigable Small World (HNSW) search or exhaustive KNN search as appropriate. HNSW offers high accuracy and low latency but might miss some neighbors, while exhaustive KNN scores all neighbors at a higher cost; with large datasets, HNSW usually performs better. 

  d. Filter the results: you can always leverage the text associated with the images to narrow down your results. 

  3. Even if the match is not at the top of the list, the ten retrieved results can be treated as vectors and used in a subsequent clustering step to find the centroid, as shown in the sketch below. 
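
A minimal sketch of that post-processing step, assuming the "vector" field of the top ten results has already been collected into a list:

import numpy as np
from sklearn.cluster import KMeans

def centroid_of_results(result_vectors, n_clusters=2):
    # result_vectors: the "vector" field of the top-10 documents returned by the search.
    X = np.asarray(result_vectors, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Use the centroid of the largest cluster as the consensus embedding for re-querying.
    largest = np.bincount(km.labels_).argmax()
    return km.cluster_centers_[largest]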

These are some of the tips to make the results of a multimodal search more deterministic and of higher quality (for example, when graded for relevance on a scale of 1 to 5).