Wednesday, December 25, 2024

 

AI for security

How AI is used for security research and vulnerability management depends as much on human expertise as on managing the risks of AI deployments and keeping them trustworthy for everyone. As the industry struggles to catch up with AI improvements, AI-based tools are often pitched against one another, which exposes a significant number of defects and brings the quality of the tools into question. LLM-as-a-judge is one such example of evaluating AI models for security. Among the risks organizations face, the most notable are GenAI, supply chain/third parties, phishing, exploited vulnerabilities, insider threats, and nation-state actors, in that order. While there is growing confidence in managing risks in AI deployments, GenAI's ranking as a top concern reflects its widespread use. There are no standards yet, but there is a growing perception that AI legislation will help enhance safety and security. Most organizations are already reaping benefits of GenAI in their operations, so the ability to defend against AI threats is catching up. The high-tech sector has a deep understanding of the challenges in securing this emerging technology, while other industry sectors are more concerned about the reputational risks of AI.

Safety and security flaws in AI products are being addressed with a practice called AI Red Teaming, where organizations invite security researchers for an external, unbiased review. This is highly effective, and the benefits of cross-company expertise are valuable. The inventory of AI assets must be actively managed to make the most of this best practice. Organizations engaged in AI Red Teaming have found a common pattern in the vulnerabilities reported against AI applications: a simple majority fall into the category of AI safety defects, with the remainder comprising business logic errors, prompt injection, training data poisoning, and sensitive information disclosure. Unlike traditional security vulnerabilities, AI safety issues are harder to gate at reporting time and present a different risk profile, which might explain why this category dominates the others.

Companies without AI and automation face longer response times and higher breach costs. Copilots have gained popularity among security engineers across the board for the wealth of information they put at engineers' fingertips. Agents, monitors, alerts, and dashboards are sought after by those savvy enough to leverage automation and continuous monitoring. Budgets for AI projects, including security, are growing, with specific investments deepening while others are pulled back after less successful ventures or initial experiments.

AI-powered vulnerability scanners, for example, quickly identify potential weak points in a system, and AI is also helpful for reporting. Streamlining processes and report writing with AI saves a great deal of time: the details are still included, the tone is appropriate, and the review is often easy. This lets security experts focus on the more complex and nuanced aspects of security testing.
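
As a rough sketch of how such AI-assisted report writing might look, the snippet below assumes the OpenAI Python client and a hypothetical scanner finding; the model name, prompt, and finding are illustrative rather than any specific tool's output.

# Sketch: draft a vulnerability report section from raw scanner output.
# Assumes the OpenAI Python client; the finding dict and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

finding = {
    "title": "Reflected XSS in /search",
    "severity": "High",
    "evidence": "payload <script>alert(1)</script> echoed unescaped in the response",
}

prompt = (
    "Write a concise vulnerability report section with Impact, Reproduction Steps, "
    f"and Remediation for the following finding: {finding}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,      # keep drafts consistent so reviewer comparison stays easy
)
print(response.choices[0].message.content)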

#codingexercise: CodingExercise-12-25-2024.docx


Tuesday, December 24, 2024

 The best security programs are built around a defense-in-depth strategy. In order to continually strengthen every layer of their security posture, organizations must ensure continuous vulnerability detection throughout the SDLC, maximizing coverage from the earliest stages of development through deployment and beyond. The layered approach is not just helpful to visualize each step of the process but also stands out as a critical element on its own. Findings from one layer can inform and refine the effectiveness of the others. When put in a loop, insights become actionable for continuous vulnerability detection. An iterative process ensures that the security strategy is always evolving, becoming more robust and adaptive over time.

A vulnerability disclosure program is an effective response and benefits from rapid deployment. Source code review helps with code security and audit and must be continuously incorporated into automations and integrations. Programmatic, on-demand penetration tests support pentest-as-a-service and go well with direct researcher access. Testing AI for safety and security can be achieved with AI Red Teaming, which also helps build intelligence and analytics. Time-bound offensive testing in the form of challenges goes well with payment management. Continuous offensive testing can be incorporated into bug bounty programs and followed up with enhanced security controls.

Everyone looks for a return while reducing risk, but scaling the security program across multiple lines of business is a challenge. Security is not a specialized discipline to be handled exclusively by a central security team, although earmarked efforts do help; it is a culture to be fostered among everyone. Specific events such as security assessments for product releases, bug bounties for continuous testing, and a mechanism for third-party security researchers to submit vulnerabilities help significantly with external engagements.

As with all tracking of defects, certain timeless principles hold true. Information workers must be able to log into a platform portal, receive a notification when a bug is reported, and take remedial actions that can be tracked. As long as there is a workflow that information workers can contribute to, there is no single point of failure, and the state of the work items progresses continuously. The capabilities of ITSM, ITBM, ITOM, and CMDB are useful for security vulnerabilities as well. These acronyms denote techniques for situations such as the following (a minimal workflow sketch appears after the list):

1. If you have a desired state you want to transition to, use a workflow.

2. If you have a problem, open a service ticket.

3. If you want orchestration and subscribe to events, use events monitoring and alerts.

4. If you want a logical model of inventory, use a configuration management database.
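
The sketch below models item 1 as code: a vulnerability work item with a small desired-state workflow. The states, transitions, and ticket fields are illustrative, not any particular ITSM product's schema.

# Sketch: a vulnerability work item with a simple desired-state workflow.
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    REPORTED = "reported"
    TRIAGED = "triaged"
    IN_REMEDIATION = "in_remediation"
    VERIFIED = "verified"
    CLOSED = "closed"

# Allowed transitions between states (assumed for illustration).
ALLOWED = {
    State.REPORTED: {State.TRIAGED},
    State.TRIAGED: {State.IN_REMEDIATION, State.CLOSED},
    State.IN_REMEDIATION: {State.VERIFIED},
    State.VERIFIED: {State.CLOSED},
    State.CLOSED: set(),
}

@dataclass
class VulnTicket:
    title: str
    state: State = State.REPORTED
    history: list = field(default_factory=list)

    def transition(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state} to {target}")
        self.history.append((self.state, target))  # auditable trail of remedial actions
        self.state = target

ticket = VulnTicket("XSS on login page")
ticket.transition(State.TRIAGED)
ticket.transition(State.IN_REMEDIATION)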

Lastly, training, and self-paced learning programs as well as campaigns during company events and executive endorsements help as always.


Monday, December 23, 2024

This is a summary of the book titled “Superconvergence” written by Jamie Metzl and published by Hachette Book Group USA in 2024. The author is a biotech expert who explores the future in light of advances in biotechnology, genetics, and AI. He covers the gamut of healthcare, industry, and private life, and the hopes and threats that people can expect. He reminds us that the good of people and the planet depends on how we embrace our responsibilities. These advances give us unprecedented power, and we will use them to shape more sophisticated and personalized experiences for ourselves. Biotechnology will be indispensable for feeding the world, and even industrial farming will change. A global circular economy is emerging in which we source our materials from plants and then recycle or reuse them. Among the threats, we can count DIY biohacking and antibiotic-resistant superbugs.

Advances in AI, genetics, and biotechnology are giving humanity "godlike" powers, transforming the relationship between humans and the biological world. Scientists have made progress in synthesizing life, with researchers collaborating across universities to synthesize the 16 chromosomes found in baker's yeast and insert these genes into living cells. They are also working on creating new amino acid sequences and implanting them into cells to produce new proteins. AI will reshape health care, providing more sophisticated, personalized prevention and treatment. In the future, healthcare practitioners, individuals, and AI systems will work together to improve health outcomes. AI tools will enable healthcare providers to offer individualized care based on a patient's electronic health record and other health and biological data. AI systems could also suggest individualized medical treatments and health interventions, such as personalized medications and cancer vaccines. Expanding health care with AI could have significant positive effects on health and longevity, but its risks must be mitigated to avoid negative outcomes like privacy violations and false positives.

Biotechnology is set to become indispensable for feeding the world; humans have been genetically modifying agricultural plants since ancient times. As the global population continues to grow, it is crucial to consider sustainable agriculture methods that use less land and water. Researchers predict that consumption of domesticated crops will rise 50% by 2050, which would require converting vast stretches of wild areas into farmland and would worsen water use and climate change impacts. Bioengineered crops and synthetic microbial soil communities could help feed future humans while reducing dependence on chemical fertilizers. Industrial farming will need to be transformed to meet the growing demand for meat, with consumption of land-based animals projected to rise from 340 billion kilograms today to 460 to 570 billion kilograms by 2050. Alternative solutions include genetically engineered cows, salmon, and pigs, including animals that can tolerate extreme temperatures. Lab-grown meat, pioneered by scientist Mark Post, may not seem strange in the future, and companies are exploring innovative ways to grow cultivated meat at scale.

The future of sustainable grocery stores will likely see an increase in plant-based meat alternatives, reducing climate footprints and providing affordable nutrition. A global circular bioeconomy is emerging, where manufacturers source materials from plants and recycle or reuse them. This shift away from extractive capitalism toward a circular bioeconomy involves bioengineered plant materials replacing fossil fuels and valuing waste products as raw materials. Investing in biotechnology can boost the global economy and help nations transition from fossil fuels to sustainable biofuels. Researchers are bioengineering biofuels through the genetic engineering of plants, for example using the enzyme CelA, which facilitates the breakdown of plant matter into simple sugars. The UK government, China, India, and many African countries are developing strategies for the bioeconomy. However, mitigating potential risks and navigating the challenges of embracing a new economic model is crucial.

The rapid development of biotech, genetics, and AI has the potential to make the world worse if not carefully managed. The rise of "do-it-yourself" biology and AI modeling tools has led to individuals sharing knowledge and tools, creating "biohacker spaces." Society must prepare for potential harms and mistakes, and a global effort is needed to minimize potential harms. The interconnectedness of all people and the health of our planet must be acknowledged. The OneShared.World movement, which promotes the democratic expression of common humanity, is a step in the right direction. Uniting humanity in the coming years is crucial.


Sunday, December 22, 2024

AI safety and security are distinct concerns. AI safety focuses on preventing AI systems from generating harmful content, from instructions for creating weapons to offensive language and inappropriate images, and from demonstrating bias, misrepresentations, and hallucinations. It aims to ensure responsible use of AI and adherence to ethical standards. AI security involves testing AI systems with the goal of preventing bad actors from abusing AI to compromise the confidentiality, integrity, or availability of the systems the AI is hosted in.

These are significant concerns that have driven organizations to seek third-party testing. More than half of the AI vulnerability reports submitted pertain to AI safety. These issues often have a lower barrier to entry for valid reporting and present a different risk profile compared to traditional security vulnerabilities. Bounties for AI safety are lower than those for AI security, but the volume of reports is higher, and AI safety is one of the top five reported vulnerability categories, with business logic errors, prompt injection, sensitive information disclosure, and training data poisoning constituting the remainder. Prompt injection is a new threat: despite being dismissed by some as merely a way to get the model to say something it shouldn't, it can be used by attackers to disclose a victim's entire chat history, files, and objects, which leads to far greater security risks.

The recommendations for AI safety and security align with the well-recognized charter of a defense-in-depth strategy, with a fortified security posture at every layer and continuous vulnerability testing throughout the software development lifecycle. These include establishing continuous testing, evaluation, verification, and validation throughout the AI model lifecycle; providing regular executive metrics and updates on AI model functionality, security, reliability, and robustness; and regularly scanning and updating the underlying infrastructure and software for vulnerabilities. There are also geographical compliance requirements from mandates issued by country, state, or government-specific agencies; in these cases, regulations might exist around specific AI features, such as facial recognition and employment-related systems. Organizations do well when they establish an AI governance framework outlining roles, responsibilities, and ethical considerations, including incident response planning and risk management. All employees and users of AI can be trained on ethics, responsibilities, legal issues, AI security risks, and best practices, including warranty, license, and copyright. Ultimately, this strives to establish a culture of open and transparent communication on the organization's use of predictive or generative AI.


Saturday, December 21, 2024

 

Security and Vulnerability Management in Infrastructure Engineering has become more specialized than ever. In this article, we explore some of the contemporary practices and emerging trends.

Cyberthreats have grown more alarming as enterprise technology capabilities grow, and with AI deployments and AI-powered threat actors now mainstream, the digital threat landscape is growing and changing faster than ever. Just a few years ago, before chatbots and copilots gained popularity, organizations were contending with OWASP risks for web-based applications and connectivity from mobile applications. The OWASP Top 10 document identifies the most critical security risks to web applications. With AI being so data-voracious and models so lightweight that they can be hosted even in a browser on a mobile device, researchers are discovering new impacts every day. A defense-in-depth strategy, with a fortified security posture at every layer and continuous vulnerability testing throughout the software development lifecycle, has become the mainstream response to these threats.

Human-powered, AI-enabled security testing remains vital where vulnerability scanners fall short. The security researcher community has played a phenomenal role in this area, constantly upgrading their skills, delivering ongoing value, and even gaining the trust of risk-averse organizations. Companies in turn are defining their vulnerability reporting programs and bug bounty awards in compliance with the Department of Justice safe harbor guidelines.

Researchers and security experts are both aware that Generative AI is one of the most significant risks impacting organizations today, particularly with regard to securing data integrity. As they upskill their AI prowess, AI testing engagements are taking shape. The top vulnerability reported to bug bounty programs is cross-site scripting (XSS), and for penetration testing it is misconfiguration. Researchers usually target bug bounty programs, which focus more on real-world attack vectors, while security experts target penetration testing, which uncovers more systemic and architectural vulnerabilities. High-end security initiatives by organizations have well-defined engagements with these information workers, usually involving a broad scope and a select team of trusted researchers. The results speak for themselves, with over 30% of valid vulnerability submissions rated high or critical.

With the recent impact of CrowdStrike's software causing Windows machines all across the world to fail, and companies like Delta Airlines suing for five hundred million dollars, the effort to reduce common vulnerabilities in production has never been emphasized more. Companies that are technologically savvy receive far fewer reports for OWASP Top 10 security risks than the industry average.


Friday, December 20, 2024

 

Many of the BigTech companies maintain datacenters for their private use. Changes to data center infrastructure to support AI workloads are not covered in as much detail as they are for the public cloud. Advances in machine learning, increased computational power, and the increased availability of and reliance on data are not limited to the public cloud. This article dives into the changes in so-called “on-premises” infrastructure.

AI has been integrated into a diverse range of industries with key applications standing out in each such as personalized learning experiences in Education, autonomous vehicle navigation in Automotive, predictive maintenance in manufacturing, design optimization in architecture, fraud detection in finance, demand forecasting in retail, improved diagnostics, and monitoring in healthcare, and natural language processing in Technology.

Most AI deployments can be split into training and inference. Inference depends on training, and while it can cater to mobile and edge computing with new data, training places heavy demands on both computational power and data ingestion. Lately, data ingestion itself has become so computationally expensive that the hardware has needed to evolve. Servers are increasingly being upgraded with GPUs, TPUs, or NPUs alongside traditional CPUs. Graphics processing units (GPUs) were initially developed for high-quality video graphics but are now used for high volumes of parallel computing tasks. A Neural Processing Unit provides specialized hardware accelerators. A Tensor Processing Unit is a specialized application-specific integrated circuit that increases the efficiency of AI workloads. The more intensive the algorithm or the higher the throughput of data ingestion, the greater the computational demand. This results in about 400 watts of power consumption per server and about 10 kW for high-end performance servers. Data center cooling with traditional fans no longer suffices, and liquid cooling is called for.
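
As a rough back-of-the-envelope sketch of what those per-server figures mean at rack level, the snippet below assumes a 20-server rack and that nearly all electrical power becomes heat the cooling plant must remove; those assumptions are illustrative, not vendor data.

# Rough rack power and cooling estimate; rack size is assumed, wattages are from the text.
servers_per_rack = 20
watts_general = 400        # typical server, per the text
watts_high_end = 10_000    # high-end performance server, per the text

rack_kw_general = servers_per_rack * watts_general / 1000
rack_kw_high_end = servers_per_rack * watts_high_end / 1000

print(f"general-purpose rack: ~{rack_kw_general:.0f} kW")   # ~8 kW, still air-coolable
print(f"high-end AI rack:     ~{rack_kw_high_end:.0f} kW")  # ~200 kW, pushes toward liquid cooling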

Enterprises that want to introduce AI to their on-premises IT infrastructure quickly discover the same ballooning costs they feared with the public cloud, and those costs come not just from real estate and energy consumption but also from more hardware and software inventory. While traditional on-premises data was housed in dedicated storage systems, the rise of AI workloads poses more DevOps requirements and challenges that can only be addressed by an exploding number of data and code pipelines. Colocation of data is a significant challenge both in terms of networking and the volume of data in transit. The lakehouse architecture is finding popularity on-premises as well. Large-scale colocation sites are becoming more cost-effective than on-site datacenter improvements. The gray area between workload and infrastructure for AI versus consolidation in datacenters is still being explored. Colocation datacenters are gaining popularity because they offer strong physical security, adequate renewable power, scalability as density increases, low-latency connectivity options, dynamic cooling technology, and backup and disaster recovery options. They are even dubbed built-to-suit datacenters. Hyperconvergence in these datacenters is yet to be realized, but there are significant improvements in terms of redesigned rack placement and in-rack cooling efficiency. These efficiencies drive up the mean time between failures.

There is talk of even greater efficiency being required for quantum computing, which will drive up demands on computational power, data colocation, networks capable of supporting billions of transactions per hour, and cooling efficiency. Sustainability is also an important perspective, driven and sponsored by both the market and leadership. Hot air from the datacenter, for instance, finds new applications for comfort heating, as do solar cells and battery storage.

The only constant seems to be the demand on the people for understanding the infrastructure behind high-performance computation.

Previous article: IaCResolutionsPart219.docx

 



Wednesday, December 18, 2024

 From the previous article, RAG-based chatbots can be evaluated by an LLM-as-a-judge that agrees with human grading on over 80% of judgements if the following practices are maintained: use a 1-5 grading scale; use GPT-3.5 to save costs when you have one grading example per score; and use GPT-4 as the LLM judge when you have no examples to illustrate the grading rules. Emitting the metrics for correctness, comprehensiveness, and readability provides justification that becomes valuable. Whether we use GPT-4, GPT-3.5, or human judgement, the composite scores can be used to tell results apart quantitatively.

Together with the data and the evaluations, a chatbot can be successfully made ready for production, but that is not all: data and ML pipelines are also required. The full orchestration workflow involves a data ingestion pipeline, data science, and data analysis. The lakehouse paradigm combines key capabilities of data lakes and data warehouses, and it expedites the creation of a pipeline from a few weeks to one week at most. Using shared data personas on the same lakehouse reduces overhead and improves handoffs across disciplines.

Let us explore these stages of the workflow. Data ingestion is best served by a medallion architecture. The raw data is stored in a Bronze table; it is then enriched, and a Delta Lake Silver table is created. This makes it easy to reprocess the data, as Delta guarantees atomicity with every operation. Schema is enforced, and constraints ensure data quality. Filtering out unnecessary data also makes the Silver table more refined.
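
A minimal PySpark sketch of that Bronze-to-Silver step might look like the following; the paths, table names, schema, and quality filter are assumptions for illustration, not the actual pipeline.

# Sketch: medallion ingestion with Delta tables (paths, names, and schema are illustrative).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw documents as-is.
raw = spark.read.json("/mnt/landing/chat_docs/")
raw.write.format("delta").mode("append").saveAsTable("bronze_chat_docs")

# Silver: enrich, enforce basic quality constraints, and drop unneeded columns.
silver = (
    spark.table("bronze_chat_docs")
    .filter(F.col("text").isNotNull() & (F.length("text") > 20))  # simple quality gate
    .withColumn("ingested_at", F.current_timestamp())
    .select("doc_id", "text", "source", "ingested_at")
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver_chat_docs")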

The data science portion consists of three major parts: exploratory data analysis, a chatbot model, and an LLM-as-a-judge model. Model lifecycle management, including experimentation, reproducibility, deployment, and a central model registry, can be accomplished with open-source platforms like MLflow, which offers both programmatic APIs and a user interface. With model lineage, such as the experiment and run that produced the model, its versioning and stage transitions from staging to production or archiving, and annotations, a complete picture is available in the data science step of the workflow.
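
A small MLflow sketch of that lifecycle, with placeholder experiment, parameter, and model names, and a trivial stand-in model in place of the real chatbot or judge:

# Sketch: track a run and register the model with MLflow (names and values are placeholders).
import mlflow
import mlflow.pyfunc

class EchoModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input  # stand-in for the real chatbot/judge pipeline

mlflow.set_experiment("/chatbot-eval")

with mlflow.start_run(run_name="rag-chatbot-v1") as run:
    mlflow.log_param("llm", "gpt-3.5")
    mlflow.log_param("grading_scale", "1-5")
    mlflow.log_metric("judge_agreement", 0.82)
    mlflow.pyfunc.log_model(artifact_path="model", python_model=EchoModel())

# Promote the logged model into the central registry, from which stage transitions are managed.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "rag_chatbot")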

Business Intelligence in terms of correlation, dashboards and reports can be facilitated with analytical SQL-like queries and materialized data views. For each query, different views of the data can be studied for the best representation which can then be directly promoted to dashboards, and this cuts down on redundancies and cost. Even a library like Plotly is sufficient to get a comprehensive and appealing visualization of the report. The data analysis is inherently a read-only system and independent of all activities for model training and testing.
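
For the reporting layer, even a few lines of Plotly suffice; the dataframe columns and scores below are assumed for illustration.

# Sketch: visualize composite judge scores per grader (data is assumed).
import pandas as pd
import plotly.express as px

scores = pd.DataFrame({
    "grader": ["gpt-3.5", "gpt-4", "human"],
    "composite_score": [3.9, 4.4, 4.5],
})

fig = px.bar(scores, x="grader", y="composite_score",
             title="Composite judge scores by grader")
fig.show()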

A robust data environment with smooth ingestion, processing, and retrieval for the whole team, data collection and cleaning pipelines deployed using Delta tables, model development and deployment organized with MLflow, and powerful analytical dashboards complete the gamut for launching the project.


Tuesday, December 17, 2024

 

Constant evaluation and monitoring of deployed large language models and generative AI applications are important because both the data and the environment might vary. There can be shifts in performance, accuracy or even the emergence of biases. Continuous monitoring helps with early detection and prompt responses, which in turn makes the models’ outputs relevant, appropriate, and effective. Benchmarks help to evaluate models but the variations in results can be large. This stems from a lack of ground truth. For example, it is difficult to evaluate summarization models  based on traditional NLP metrics such as BLEU, ROUGE etc. because summaries generated might have completely different words or word order. Comprehensive evaluation standards are elusive for LLMs and reliance on human judgment can be costly and time-consuming. The novel trend of “LLMs as a judge” still leaves unanswered questions about reflecting human preferences in terms of correctness, readability and comprehensiveness of the answers, reliability and reusability on different metrics, use of different grading scales by different frameworks and the applicability of the same evaluation metric across diverse use cases.

Since chatbots are common applications of LLMs, an example of evaluating a chatbot follows. The underlying principle in such a chatbot is Retrieval Augmented Generation (RAG), which is quickly becoming the industry standard for developing chatbots. As with all LLM and AI models, it is only as effective as its data, which in this case is the vector store, aka the knowledge base. The LLM could be a newer GPT-3.5 or GPT-4 to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge. Evaluating the quality of chatbot responses must take into account both the knowledge base and the model involved. LLM-as-a-judge fits this bill for automated evaluation but, as noted earlier, it may not be on par with human grading, might require several auto-evaluation samples, and may respond differently to different chatbot prompts. Slight variations in the prompt or problem can drastically affect its performance.

RAG-based chatbots can be evaluated by an LLM-as-a-judge that agrees with human grading on over 80% of judgements if the following practices are maintained: use a 1-5 grading scale; use GPT-3.5 to save costs when you have one grading example per score; and use GPT-4 as the LLM judge when you have no examples to illustrate the grading rules.

The initial evaluation dataset can be formed from, say, 100 chatbot prompts together with context from the domain in the form of (chunks of) documents that are relevant to each question, selected based on, say, an F-score. Using the evaluation dataset, different language models can be used to generate answers, stored as question-context-answer triples in a dataset called "answer sheets". Then, given the answer sheets, various LLMs can be used to generate grades and the reasoning for those grades. Each grade can be a composite score, with correctness carrying most of the weight and comprehensiveness and readability splitting the remaining weight equally. A good choice of hyperparameters is equally applicable to the LLM-as-a-judge; this could include a low temperature, say 0.1, to ensure reproducibility, single-answer grading instead of pairwise comparison, chain-of-thought prompting to let the LLM reason about the grading process before giving the final score, and examples in the grading rubric for each score value on each of the three factors. Factors that are difficult to measure quantitatively include helpfulness, depth, and creativity. Emitting the metrics for correctness, comprehensiveness, and readability provides justification that becomes valuable. Whether we use GPT-4, GPT-3.5, or human judgement, the composite scores can be used to tell results apart quantitatively. The overall workflow for creating the LLM-as-a-judge is similar to that of the chatbots themselves: data preparation, indexing relevant data, information retrieval, and response generation.
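
A minimal sketch of such a composite grade, with correctness carrying most of the weight and the remainder split equally; the exact 0.6/0.2/0.2 split is an assumption for illustration.

# Sketch: composite grade from per-factor judge scores on a 1-5 scale.
# The 0.6/0.2/0.2 split is an assumed example of "correctness carries most of the weight".
def composite_score(correctness: float, comprehensiveness: float, readability: float) -> float:
    return 0.6 * correctness + 0.2 * comprehensiveness + 0.2 * readability

print(composite_score(correctness=5, comprehensiveness=4, readability=4))  # 4.6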

Monday, December 16, 2024

 Large Language Models (LLMs) have the potential to significantly improve organizations' workforce and customer experiences. By addressing tasks that currently occupy 60%-70% of employees' time, LLMs can significantly reduce the time spent on background research, data analysis, and document writing. Additionally, these technologies can significantly reduce the time for new workers to achieve full productivity. However, organizations must first rethink the management of unstructured information assets and mitigate issues of bias and accuracy. This is why many organizations are focusing on internal applications, where a limited scope provides opportunities for better information access and human oversight. These applications, aligned with core capabilities already within the organization, have the potential to deliver real and immediate value while LLMs and their supporting technologies continue to evolve and mature. Examples of applications include automated analysis of product reviews, inventory management, education, financial services, travel and hospitality, healthcare and life sciences, insurance, technology and manufacturing, and media and entertainment.

The use of structured data in GenAI applications can enhance their quality such as in the case of a travel planning chatbot. Such an application would use a vector search and feature-and-function serving building blocks to serve personalized user preferences and budget and hotel information often involving agents for programmatic access to external data sources. To access data and functions as real-time endpoints, federated and universal access control could be used. Models can be exposed as Python functions to compute features on-demand. Such functions can be registered with a catalog for access control and encoded in a directed acyclic graph to compute and serve features as a REST endpoint.
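
As a minimal sketch of exposing an on-demand feature function behind a REST endpoint, the snippet below uses FastAPI rather than any particular catalog or serving product; the feature logic, field names, and endpoint path are illustrative.

# Sketch: an on-demand feature function served as a REST endpoint (FastAPI assumed).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TripRequest(BaseModel):
    nightly_budget: float
    nights: int
    loyalty_tier: str

def trip_affordability(nightly_budget: float, nights: int, loyalty_tier: str) -> float:
    # Toy feature: total trip budget adjusted by an assumed loyalty discount.
    discount = {"gold": 0.9, "silver": 0.95}.get(loyalty_tier, 1.0)
    return nightly_budget * nights * discount

@app.post("/features/trip_affordability")
def serve(req: TripRequest) -> dict:
    return {"trip_affordability": trip_affordability(req.nightly_budget, req.nights, req.loyalty_tier)}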

To serve structured data to real-time AI applications, precomputed data needs to be deployed to operational databases, such as DynamoDB and Cosmos DB as in the case of AWS and Azure public clouds respectively. Synchronization of precomputed features to a low-latency data format is required. Fine-tuning a foundation model allows for more deeply personalized models, requiring an underlying architecture to ensure secure and accurate data access.
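
A small sketch of syncing precomputed features into an operational store, here DynamoDB via boto3; the table name and item shape are assumptions.

# Sketch: push precomputed features to DynamoDB for low-latency lookup (names assumed).
import boto3

table = boto3.resource("dynamodb").Table("user_travel_features")

precomputed = [
    {"user_id": "u-123", "preferred_city": "Lisbon", "avg_nightly_budget": 180},
    {"user_id": "u-456", "preferred_city": "Kyoto", "avg_nightly_budget": 240},
]

with table.batch_writer() as batch:  # batches the writes under the hood
    for item in precomputed:
        batch.put_item(Item=item)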

Most organizations do well with an intelligence platform that helps with model fine-tuning, registration for access control, secure and efficient data sharing across platforms, clouds, and regions for faster distribution worldwide, and optimized LLM serving for improved performance. Such a platform should be chosen so that it provides simple infrastructure for fine-tuning models, ensures traceability from models to datasets, and enables higher throughput and lower latency compared to traditional LLM serving methods.

Software-as-a-service LLMs, aka SaaS LLMs, are far more costly than those developed and hosted using foundation models in workspaces, either on-premises or in the cloud, because they need to address all use cases, including a general chatbot. That generality incurs cost. For a more specific use case, a much smaller prompt suffices, and the model can also be fine-tuned by baking the instructions and expected output structure into the model itself. Inference costs also rise with the number of input and output tokens, and in the case of SaaS services they are charged per token. Specific use case models can even be implemented by 2 engineers in 1 month with a few thousand dollars of compute for training and experimentation, tested by 4 human evaluators against an initial set of evaluation examples.
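
A back-of-the-envelope sketch of that cost difference follows; every price, token count, and volume below is an assumption for illustration only, not a quoted rate.

# Rough cost comparison for a narrow use case; all prices and volumes are assumed.
requests_per_month = 1_000_000

# SaaS LLM: long general-purpose prompt, billed per token (assumed rate).
saas_tokens_per_request = 1_500 + 300          # input + output
saas_price_per_1k_tokens = 0.01
saas_monthly = requests_per_month * saas_tokens_per_request / 1000 * saas_price_per_1k_tokens

# Fine-tuned in-house model: short prompt, flat (assumed) hosting cost, no per-token billing.
tuned_hosting_monthly = 3_000
tuned_monthly = tuned_hosting_monthly

print(f"SaaS:     ${saas_monthly:,.0f}/month")   # $18,000 under these assumptions
print(f"In-house: ${tuned_monthly:,.0f}/month")  # $3,000 under these assumptions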

SaaS LLMs could be a matter of convenience. Developing a model from scratch often involves significant commitment both in terms of data and computational resources such as pre-training. Unlike fine-tuning, pre-training is a process of training a language model on a large corpus of data without using any prior knowledge or weights from an existing model. This scenario makes sense when the data is quite different from what off-the-shelf LLMs are trained on or where the domain is rather specialized when compared to everyday language or there must be full control over training data in terms of security, privacy, fit and finish for the model’s foundational knowledge base or when there are business justifications to avoid available LLMs altogether.

Organizations must plan for the significant commitment and sophisticated tooling this requires. Libraries like PyTorch FSDP and DeepSpeed are needed for their distributed training capabilities when pretraining an LLM from scratch. Large-scale data preprocessing is required and involves distributed frameworks and infrastructure that can handle scale in data engineering. Training of an LLM cannot commence without a set of optimal hyperparameters. Since training involves high costs from long-running GPU jobs, resource utilization must be maximized. The training runs might also be so long that GPU failures become more likely than under normal load. Close monitoring of the training process is essential, and saving model checkpoints regularly and evaluating on validation sets act as safeguards.

Constant evaluation and monitoring of deployed large language models and generative AI applications are important because both the data and the environment might vary. There can be shifts in performance, accuracy, or even the emergence of biases. Continuous monitoring helps with early detection and prompt response, which in turn keeps the models' outputs relevant, appropriate, and effective. Benchmarks help to evaluate models, but the variation in results can be large. This stems from a lack of ground truth. For example, it is difficult to evaluate summarization models based on traditional NLP metrics such as BLEU and ROUGE because generated summaries might have completely different words or word order. Comprehensive evaluation standards are elusive for LLMs, and reliance on human judgment can be costly and time-consuming. The novel trend of "LLMs as a judge" still leaves unanswered questions about reflecting human preferences in terms of correctness, readability, and comprehensiveness of the answers; reliability and reusability across different metrics; the use of different grading scales by different frameworks; and the applicability of the same evaluation metric across diverse use cases.

Finally, the system must be simplified for use with model serving to manage, govern and access models via unified endpoints to handle specific LLM requests.


Sunday, December 15, 2024

This is a summary of the book titled “The Learning Mindset: Combining Human Competencies with Technology to Thrive” written by Katja Schipperheijn and published by Kogan Page in 2024. The title refers to continuous learning and adaptability, which the author argues is beneficial across age, gender, and demography. She dispels myths about learning capacity and suggests that visionary leaders, whom she calls “LearnScapers”, will combine human competencies such as curiosity, empathy, critical thinking, and a learning mindset with advancing AI, which is catalyzing the rate of innovation across domains. Engaged citizens maintain a learning mindset, and it distinguishes them in the workplace, which in turn benefits the organization's growth and culture. Combined with strategic thinking and social learning, the status quo can be challenged, and a culture of trust, respect, and communication can be cultivated.

Artificial intelligence (AI) presents a challenge for people, companies, and governments, and adopting a learning and growth mindset is crucial for surviving and thriving in this ever-faster-changing world. Learning encompasses personal, cultural, social, and experiential influences, and managing emotions can optimize learning potential. Differentiating between training and learning, such as unconscious learning, can boost adaptability and creativity. Mental complexity and motivation can increase with age, and focusing on developing a growth mindset can enrich personal and collaborative learning experiences. As AI evolves, people must enforce human-centric values and push for bias-free algorithms. Ensuring accountability, transparency, and security in AI applications is essential to foster trust and uphold fundamental human rights. The evolving legal environment presents challenges in creating frameworks for AI use, particularly regarding privacy and ethical standards. To remain relevant in rapidly changing fields, workers must shift from a single-skill focus to a multi-skilled approach, fostering curiosity and self-directed learning. Learning extends beyond formal education, encompassing digital worlds where informal skills flourish.

In the era of AI, unique competencies like curiosity, critical thinking, and empathy will become more important in the workplace. These competencies complement a learning mindset, encouraging resilience and a positive outlook on challenges and innovation. Consilience, an interdisciplinary approach that combines insights from various fields, fosters creativity and problem-solving. A learning mindset promotes adaptability, open-mindedness, and continuous improvement, breaking down disciplinary barriers.

Adopting a learning influencer role involves setting an example and encouraging a culture of trust and feedback. Embracing diverse perspectives and challenging the status quo can unlock creative potential within teams. Implementing initiatives like hackathons and open dialogue can foster a dynamic workforce ready to tackle future challenges collaboratively.

Optimizing team efficiency through strategic communication and social learning can prevent frustration and improve team dynamics. Efficient time management and identifying team member expertise are essential in rapidly changing environments. A supportive environment helps build trust and learn from one another, enhancing cohesion. Nielsen's "1-9-90 rule" can help engage team members by identifying different participation levels.

To foster a learning culture, leaders should challenge the status quo, balance innovation with effective team management, and create a culture that encourages collaboration and autonomy. Influential learning leaders combine inspiration, authenticity, empathy, and communication to foster strong team dynamics, trust, respect, and communication. They establish clear goals, embrace open dialogue, and empower team members to contribute ideas and take ownership of their roles. They view problems as opportunities and encourage an experimental mindset, using strategic frameworks like design thinking and scenario planning. As a "LearnScaper" leader, they build an ecosystem that encourages creativity and human-AI interactions, prioritizing the integration of humans and machines. By fostering a learning mindset, leaders can ensure adequate information flow within the organization and nurture employees' personal growth and adaptation.


Friday, December 13, 2024

 Constant evaluation and monitoring of deployed large language models and generative AI applications are important because both the data and the environment might vary. There can be shifts in performance, accuracy or even the emergence of biases. Continuous monitoring helps with early detection and prompt responses, which in turn makes the models’ outputs relevant, appropriate, and effective. Benchmarks help to evaluate models but the variations in results can be large. This stems from a lack of ground truth. For example, it is difficult to evaluate summarization models based on traditional NLP metrics such as BLEU, ROUGE etc. because summaries generated might have completely different words or word order. Comprehensive evaluation standards are elusive for LLMs and reliance on human judgment can be costly and time-consuming. The novel trend of “LLMs as a judge” still leaves unanswered questions about reflecting human preferences in terms of correctness, readability and comprehensiveness of the answers, reliability and reusability on different metrics, use of different grading scales by different frameworks and the applicability of the same evaluation metric across diverse use cases.

Since chatbots are common applications of LLMs, an example of evaluating a chatbot follows. The underlying principle in such a chatbot is Retrieval Augmented Generation (RAG), which is quickly becoming the industry standard for developing chatbots. As with all LLM and AI models, it is only as effective as its data, which in this case is the vector store, aka the knowledge base. The LLM could be a newer GPT-3.5 or GPT-4 to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge. Evaluating the quality of chatbot responses must take into account both the knowledge base and the model involved. LLM-as-a-judge fits this bill for automated evaluation but, as noted earlier, it may not be on par with human grading, might require several auto-evaluation samples, and may respond differently to different chatbot prompts. Slight variations in the prompt or problem can drastically affect its performance.

RAG-based chatbots can be evaluated by an LLM-as-a-judge that agrees with human grading on over 80% of judgements if the following practices are maintained: use a 1-5 grading scale; use GPT-3.5 to save costs when you have one grading example per score; and use GPT-4 as the LLM judge when you have no examples to illustrate the grading rules.

The initial evaluation dataset can be formed from, say, 100 chatbot prompts together with context from the domain in the form of (chunks of) documents that are relevant to each question, selected based on, say, an F-score. Using the evaluation dataset, different language models can be used to generate answers, stored as question-context-answer triples in a dataset called "answer sheets". Then, given the answer sheets, various LLMs can be used to generate grades and the reasoning for those grades. Each grade can be a composite score, with correctness carrying most of the weight and comprehensiveness and readability splitting the remaining weight equally. A good choice of hyperparameters is equally applicable to the LLM-as-a-judge; this could include a low temperature, say 0.1, to ensure reproducibility, single-answer grading instead of pairwise comparison, chain-of-thought prompting to let the LLM reason about the grading process before giving the final score, and examples in the grading rubric for each score value on each of the three factors. Factors that are difficult to measure quantitatively include helpfulness, depth, and creativity. Emitting the metrics for correctness, comprehensiveness, and readability provides justification that becomes valuable. Whether we use GPT-4, GPT-3.5, or human judgement, the composite scores can be used to tell results apart quantitatively. The overall workflow for creating the LLM-as-a-judge is similar to that of the chatbots themselves: data preparation, indexing relevant data, information retrieval, and response generation.
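
A minimal sketch of a single judge call with those hyperparameter choices (low temperature, single-answer grading, chain-of-thought reasoning before the score); the prompt wording and model name are assumptions, not a prescribed rubric.

# Sketch: single-answer grading with an LLM judge (prompt and model name are assumed).
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a chatbot answer on a 1-5 scale for correctness,
comprehensiveness, and readability. First reason step by step about the grading,
then output one line per factor in the form 'factor: score'.

Question: {question}
Context: {context}
Answer: {answer}
"""

def judge(question: str, context: str, answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.1,  # low temperature for reproducible grades
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, context=context, answer=answer)}],
    )
    return response.choices[0].message.content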


Thursday, December 12, 2024

 

Software-as-a-service LLMs, aka SaaS LLMs, are far more costly than those developed and hosted using foundation models in workspaces, either on-premises or in the cloud, because they need to address all use cases, including a general chatbot. That generality incurs cost. For a more specific use case, a much smaller prompt suffices, and the model can also be fine-tuned by baking the instructions and expected output structure into the model itself. Inference costs also rise with the number of input and output tokens, and in the case of SaaS services they are charged per token. Specific use case models can even be implemented by 2 engineers in 1 month with a few thousand dollars of compute for training and experimentation, tested by 4 human evaluators against an initial set of evaluation examples.

SaaS LLMs could be a matter of convenience. Developing a model from scratch often involves significant commitment both in terms of data and computational resources such as pre-training. Unlike fine-tuning, pre-training is a process of training a language model on a large corpus of data without using any prior knowledge or weights from an existing model. This scenario makes sense when the data is quite different from what off-the-shelf LLMs are trained on or where the domain is rather specialized when compared to everyday language or there must be full control over training data in terms of security, privacy, fit and finish for the model’s foundational knowledge base or when there are business justifications to avoid available LLMs altogether.

Organizations must plan for the significant commitment and sophisticated tooling this requires. Libraries like PyTorch FSDP and DeepSpeed are needed for their distributed training capabilities when pretraining an LLM from scratch. Large-scale data preprocessing is required and involves distributed frameworks and infrastructure that can handle scale in data engineering. Training of an LLM cannot commence without a set of optimal hyperparameters. Since training involves high costs from long-running GPU jobs, resource utilization must be maximized. The training runs might also be so long that GPU failures become more likely than under normal load. Close monitoring of the training process is essential, and saving model checkpoints regularly and evaluating on validation sets act as safeguards.
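
A minimal PyTorch sketch of that checkpoint-and-validate safeguard; the cadence, paths, and the evaluate helper are assumptions for illustration.

# Sketch: periodic checkpointing during a long pre-training run (cadence and path assumed).
import torch

def save_checkpoint(step, model, optimizer, path="/checkpoints"):
    torch.save(
        {"step": step,
         "model_state": model.state_dict(),
         "optimizer_state": optimizer.state_dict()},
        f"{path}/step_{step:08d}.pt",
    )

CHECKPOINT_EVERY = 1_000  # steps; assumed cadence

# Inside the training loop:
#   if step % CHECKPOINT_EVERY == 0:
#       save_checkpoint(step, model, optimizer)
#       val_loss = evaluate(model, validation_loader)  # hypothetical validation helper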

Constant evaluation and monitoring of deployed large language models and generative AI applications are important because both the data and the environment might vary. There can be shifts in performance, accuracy, or even the emergence of biases. Continuous monitoring helps with early detection and prompt response, which in turn keeps the models' outputs relevant, appropriate, and effective. Benchmarks help to evaluate models, but the variation in results can be large. This stems from a lack of ground truth. For example, it is difficult to evaluate summarization models based on traditional NLP metrics such as BLEU and ROUGE because generated summaries might have completely different words or word order. Comprehensive evaluation standards are elusive for LLMs, and reliance on human judgment can be costly and time-consuming. The novel trend of "LLMs as a judge" still leaves unanswered questions about reflecting human preferences in terms of correctness, readability, and comprehensiveness of the answers; reliability and reusability across different metrics; the use of different grading scales by different frameworks; and the applicability of the same evaluation metric across diverse use cases.

Finally, the system must be simplified for use with model serving to manage, govern and access models via unified endpoints to handle specific LLM requests.

Wednesday, December 11, 2024

 continued from previous post...

A fine-grained mixture-of-experts (MoE) architecture typically works better than any single model. Inference efficiency and model quality are typically in tension: bigger models typically reach higher quality, but smaller models are more efficient for inference. Using MoE architecture makes it possible to attain better tradeoffs between model quality and inference efficiency than dense models typically achieve.
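
A toy PyTorch sketch of the mixture-of-experts idea, with a learned gate routing each token to its top-k experts; the layer sizes and routing loop are illustrative, not a production implementation.

# Sketch: a tiny top-k mixture-of-experts layer (sizes and routing are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model, d_hidden, n_experts, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token to its top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = SimpleMoE(d_model=64, d_hidden=256, n_experts=8, top_k=2)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts run per token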

Companies in the foundational stages of adopting generative AI technology often lack a clear strategy, use cases, and access to data scientists. To start, companies can use off-the-shelf large language models (LLMs) to experiment with AI tools and workflows. This allows employees to craft specialized prompts and workflows, helping leaders understand their strengths and weaknesses. LLMs can also be used as a judge to evaluate responses in practical applications, such as sifting through product reviews.

Large Language Models (LLMs) have the potential to significantly improve organizations' workforce and customer experiences. By addressing tasks that currently occupy 60%-70% of employees' time, LLMs can significantly reduce the time spent on background research, data analysis, and document writing. Additionally, these technologies can significantly reduce the time for new workers to achieve full productivity. However, organizations must first rethink the management of unstructured information assets and mitigate issues of bias and accuracy. This is why many organizations are focusing on internal applications, where a limited scope provides opportunities for better information access and human oversight. These applications, aligned with core capabilities already within the organization, have the potential to deliver real and immediate value while LLMs and their supporting technologies continue to evolve and mature. Examples of applications include automated analysis of product reviews, inventory management, education, financial services, travel and hospitality, healthcare and life sciences, insurance, technology and manufacturing, and media and entertainment.

The use of structured data in GenAI applications can enhance their quality such as in the case of a travel planning chatbot. Such an application would use  a vector search and feature-and-function serving building blocks to serve personalized user preferences and budget and hotel information often involving agents for programmatic access to external data sources. To access data and functions as real-time endpoints, federated and universal access control could be used. Models can be exposed as Python functions to compute features on-demand. Such functions can be registered with a catalog for access control and encoded in a directed acyclic graph to compute and serve features as a REST endpoint.

To serve structured data to real-time AI applications, precomputed data needs to be deployed to operational databases, such as DynamoDB and Cosmos DB as in the case of AWS and Azure public clouds respectively. Synchronization of precomputed features to a low-latency data format is required. Fine-tuning a foundation model allows for more deeply personalized models, requiring an underlying architecture to ensure secure and accurate data access.

Most organizations do well with an intelligence platform that helps with model fine-tuning, registration for access control, secure and efficient data sharing across platforms, clouds, and regions for faster distribution worldwide, and optimized LLM serving for improved performance. Such a platform should be chosen so that it provides simple infrastructure for fine-tuning models, ensures traceability from models to datasets, and enables higher throughput and lower latency compared to traditional LLM serving methods.


Tuesday, December 10, 2024

There is a growing need for dynamic, dependable, and repeatable infrastructure as the scope of deployment expands from a small footprint to cloud scale. With emerging technologies like Generative AI, the best practices for cloud deployment have not matured enough to create playbooks. Generative Artificial Intelligence (AI) refers to a subset of AI algorithms and models that can generate new and original content, such as images, text, music, or even entire virtual worlds. Unlike other AI models that rely on pre-existing data to make predictions or classifications, generative AI models create new content based on patterns and information they have learned from training data. Many organizations continue to face challenges in deploying these applications at production quality. The AI output must be accurate, governed, and safe.

Data infrastructure trends that have become popular in the wake of Generative AI include data lakehouses, which bring out the best of data lakes and data warehouses by allowing for both storage and processing; vector databases for both storing and querying vectors; and the ecosystem of ETL, data pipelines, and connectors facilitating input and output of data at scale, even supporting real-time ingestion. In terms of infrastructure for data engineering projects, customers usually get started on a roadmap that progressively builds a more mature data function. One approach to drawing this roadmap, which experts observe repeated across deployment stamps, involves building a data stack in distinct stages, with a stack for every phase of the journey. While needs, level of sophistication, maturity of solutions, and budget determine the shape these stacks take, the four phases are more or less distinct and repeated across these endeavors: starter, growth, machine learning, and real-time. Customers begin with a starter stack whose essential function is to collect the data, often by implementing a data drain; a unified data layer at this stage significantly reduces engineering bottlenecks. The second stage is the growth stack, which solves the problem of proliferating data destinations and independent silos by centralizing data into a warehouse that also becomes the single source of truth for analytics. When this matures, customers want to move beyond historical analytics into predictive analytics; at this stage, a data lake and a machine learning toolset come in handy to leverage unstructured data and mitigate problems proactively. The next and final frontier addresses a limitation of the current stack: it cannot deliver personalized experiences in real time.

Even though it is a shifting landscape, today's AI models are largely language models, and some serve as the foundation for layers of increasingly complex techniques and purposes. Foundation models commonly refer to large language models that have been trained over extensive datasets to be generally good at some task (chat, instruction following, code generation, etc.), and they largely fall into two categories: proprietary (such as Phi, GPT-3.5, and Gemini) and open source (such as Llama2-70B and DBRX). DBRX, popular through the Databricks platform that is ubiquitously found on different public clouds, is a transformer-based decoder large language model trained using next-token prediction. There are benchmarks available to evaluate foundation models.

Many end-to-end LLM training pipelines are becoming more compute-efficient. This efficiency is the result of a number of improvements, including better architectures, network changes, better optimizations, better tokenization and, last but not least, better pre-training data, which has a substantial impact on model quality.
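
As a back-of-the-envelope illustration of why tokenization and pre-training data shape training cost, the sketch below uses the commonly cited approximation of training FLOPs ≈ 6 × parameters × tokens. The model size and token counts are assumed values for illustration only.

public class TrainingComputeEstimate {
    public static void main(String[] args) {
        double params = 7e9;                    // hypothetical 7B-parameter model
        double tokensBaseline = 2e12;           // hypothetical 2T training tokens
        double tokensBetterTokenizer = 1.7e12;  // same corpus, ~15% fewer tokens (assumed)

        double flopsBaseline = 6 * params * tokensBaseline;
        double flopsImproved = 6 * params * tokensBetterTokenizer;

        // Fewer tokens for the same corpus translate directly into proportionally less compute.
        System.out.printf("baseline: %.2e FLOPs, improved: %.2e FLOPs (%.0f%% less)%n",
                flopsBaseline, flopsImproved,
                100 * (1 - flopsImproved / flopsBaseline));
    }
}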


Monday, December 9, 2024

 There are N points (numbered from 0 to N−1) on a plane. Each point is colored either red ('R') or green ('G'). The K-th point is located at coordinates (X[K], Y[K]) and its color is colors[K]. No point lies on coordinates (0, 0).

We want to draw a circle centered on coordinates (0, 0), such that the number of red points and green points inside the circle is equal. What is the maximum number of points that can lie inside such a circle? Note that it is always possible to draw a circle with no points inside.

Write a function that, given two arrays of integers X, Y and a string colors, returns an integer specifying the maximum number of points inside a circle containing an equal number of red points and green points.

Examples:

1. Given X = [4, 0, 2, −2], Y = [4, 1, 2, −3] and colors = "RGRR", your function should return 2. The circle contains points (0, 1) and (2, 2), but not points (−2, −3) and (4, 4).

import java.util.Arrays;

class Solution {

    public int solution(int[] X, int[] Y, String colors) {
        int n = X.length;
        // Pair each point's squared distance from the origin with its color (1 = red, 0 = green).
        long[][] points = new long[n][2];
        for (int i = 0; i < n; i++) {
            points[i][0] = (long) X[i] * X[i] + (long) Y[i] * Y[i];
            points[i][1] = colors.charAt(i) == 'R' ? 1 : 0;
        }
        // Sort by distance so every candidate radius is considered exactly once,
        // instead of sweeping radii in fixed 0.1 steps, which can miss close boundaries.
        Arrays.sort(points, (a, b) -> Long.compare(a[0], b[0]));

        int red = 0, green = 0, best = 0;
        for (int i = 0; i < n; i++) {
            if (points[i][1] == 1) { red++; } else { green++; }
            // Points at the same distance can only be inside the circle together,
            // so evaluate the counts only once the whole tie group has been added.
            boolean tieWithNext = (i + 1 < n) && points[i + 1][0] == points[i][0];
            if (!tieWithNext && red == green) {
                best = Math.max(best, red + green);
            }
        }
        return best;
    }

}

Compilation successful.

Example test: ([4, 0, 2, -2], [4, 1, 2, -3], 'RGRR')

OK

Example test: ([1, 1, -1, -1], [1, -1, 1, -1], 'RGRG')

OK

Example test: ([1, 0, 0], [0, 1, -1], 'GGR')

OK

Example test: ([5, -5, 5], [1, -1, -3], 'GRG')

OK

Example test: ([3000, -3000, 4100, -4100, -3000], [5000, -5000, 4100, -4100, 5000], 'RRGRG')

OK


Sunday, December 8, 2024

 This is a summary of the book “Stolen Focus: Why You Can’t Pay Attention and How to Think Deeply Again” written by Johann Hari and published by Crown in 2022. The author struggled with addiction to electronic devices and information overload, so he escaped to Cape Cod without internet-enabled devices and embraced the “cold turkey” method. His time there gave him insights into focus that most others may not have experienced in this manner. He examines the issues surrounding humanity's struggle to focus and proposes individual as well as societal solutions to the attention crisis. He suggests that this valuable commodity can be reclaimed by entering a “flow state”. Modern-day ailments such as sleep deprivation are societal issues and they impede our ability to focus. Letting our minds wander can help us regain focus. Big Tech steals our data, focus and attention but subtly passes the blame back to individuals via privacy notices. Other modern-day factors, including a poor diet, exposure to environmental pollutants, and chemical imbalances, also destroy focus. Even medications do not address the predicament we find ourselves in.

Humanity is facing an attention crisis due to the overwhelming amount of information available to us. A study by Sune Lehmann found that people's collective focus had been declining before the internet age, but the internet has accelerated its decline. The flood of information hinders the brain's ability to filter out irrelevant information and makes it less likely to understand complex topics. Multitasking is a myth, as it is actually "task-switching," which impairs focus in four ways: "the switch cost effect," "the screw-up effect," "the creativity drain," and "the diminished memory effect." To regain attention, individuals should enter a "flow state" and engage in activities that promote their well-being. Social media companies, like Instagram, use rewards to steal attention, as users engage with platforms to accumulate rewards, such as "hearts and likes," that represent social validation.

Psychologist Mihaly Csikszentmihalyi's work on the "flow state" suggests that finding a clear, meaningful goal can redirect personal focus. However, the popularity of reading books is decreasing due to the rise of social media and the need for shorter, bite-sized messages. To enter a flow state, choose a goal that is neither too far beyond one's abilities nor too easy.

Sleep deprivation is a societal issue, with people's sleep duration decreasing by 20% in the past century. Sleep is essential for the brain, as it removes metabolic waste and helps process waking-life emotions. To combat sleep deprivation, avoid chemically inducing sleep, avoid blue light-emitting screens, and limit exposure to artificial light at night.

Letting the mind wander can help regain focus by activating the "default-mode network" region of the brain, which helps make sense of the world, engage in creative problem-solving, and enable mental time travel. Daydreaming may seem like distracted thinking, but this kind of mind-wandering is a remedy, if only a temporary one, for lost focus.

Tristan Harris and Aza Raskin, two tech experts, have raised concerns about the ethics of social media platforms. Harris, who became Google's first "design ethicist," questioned the ethics of designing distractions to increase user engagement, which corroded human thinking. Raskin, creator of the "infinite scroll," calculated that his invention wasted enough user time to equate to 200,000 human life spans every day. Both Harris and Raskin left Google when they realized the company had no intention of changing its behavior meaningfully, as doing so would harm its bottom line. They share concerns about Big Tech's "surveillance capitalism" and how it impedes not only individuals' attention but also society's collective attention, as evidenced by Facebook's algorithm promoting fascist and Nazi groups in Germany. Big Tech nevertheless shifts the blame onto individuals: the industry's rhetoric claims that consumers can train themselves to cut back on online time, and when those attempts fail, individuals blame themselves.

The modern Western diet and exposure to pollutants erode focus. The diet drives a cycle of blood sugar spikes and dips, causing a lack of energy and an inability to focus. Studies have shown that cutting artificial preservatives and additives from kids' diets can improve their focus by up to 50%. Exposure to pollutants, such as pesticides and flame-retardants, can damage the brain's neurons. Systemic change is necessary to address these issues.

Medicating people with ADHD often fails to target the root cause of their focus problems. A growing body of evidence suggests that in 70%-80% of sufferers, ADHD symptoms are a product of the patient's environment rather than a biological disorder. Overprescribing ADHD medications to children can be risky, as they can be addictive, stunt growth and cause heart problems.

To protect focus and channel it towards solving global challenges, it is time to lobby for societal changes, such as a ban on surveillance capitalism, subscription or public-ownership models for social media sites, and a four-day work week.

Saturday, December 7, 2024

 Problem statement: Given a wire grid of size N * N, with N-1 horizontal edges and N-1 vertical edges along the X and Y axes respectively, and a wire burning out at every instant in the order given by three arrays A, B and C, such that the wire that burns at instant T is  

(A[T], B[T] + 1), if C[T] = 0 or 
(A[T] + 1, B[T]), if C[T] = 1 

Determine the instant after which the circuit is broken  

import java.util.Arrays;
import java.util.stream.Collectors;

public class WireBurnout {

    // Returns true while node (0,0) can still reach node (N-1,N-1) over intact wires.
    public static boolean checkConnections(int[] h, int[] v, int N) {
        boolean[][] visited = new boolean[N][N];
        dfs(h, v, visited, 0, 0);
        return visited[N-1][N-1];
    }

    // Depth-first search over the grid; h and v hold 1 for intact wires and 0 for burnt ones.
    public static void dfs(int[] h, int[] v, boolean[][] visited, int i, int j) {
        int N = visited.length;
        if (i >= 0 && j >= 0 && i < N && j < N && !visited[i][j]) {
            visited[i][j] = true;
            if (v[i * (N-1) + j] == 1) {
                dfs(h, v, visited, i, j+1);
            }
            if (h[i * (N-1) + j] == 1) {
                dfs(h, v, visited, i+1, j);
            }
            if (i > 0 && h[(i-1) * (N-1) + j] == 1) {
                dfs(h, v, visited, i-1, j);
            }
            if (j > 0 && h[i * (N-1) + (j-1)] == 1) {
                dfs(h, v, visited, i, j-1);
            }
        }
    }

    // Burns wires one instant at a time and returns the first instant after which
    // the two corners are disconnected, or -1 if the circuit never breaks.
    public static int burnout(int N, int[] A, int[] B, int[] C) {
        int[] h = new int[N*N];
        int[] v = new int[N*N];
        for (int i = 0; i < N*N; i++) { h[i] = 1; v[i] = 1; }
        // mark the wires that would extend past the right and bottom edges as absent
        for (int i = 0; i < N; i++) {
            h[(i * N) + N - 1] = 0;
            v[(N-1) * N + i] = 0;
        }
        System.out.println(printArray(h));
        System.out.println(printArray(v));
        for (int i = 0; i < A.length; i++) {
            if (C[i] == 0) {
                v[A[i] * (N-1) + B[i]] = 0;
            } else {
                h[A[i] * (N-1) + B[i]] = 0;
            }
            if (!checkConnections(h, v, N)) {
                return i+1;
            }
        }
        return -1;
    }

    // Renders a wire array as a space-separated string for the trace output.
    public static String printArray(int[] a) {
        return Arrays.stream(a).mapToObj(Integer::toString).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        int[] A = new int[9];
        int[] B = new int[9];
        int[] C = new int[9];
        A[0] = 0;    B[0] = 0;    C[0] = 0;
        A[1] = 1;    B[1] = 1;    C[1] = 1;
        A[2] = 1;    B[2] = 1;    C[2] = 0;
        A[3] = 2;    B[3] = 1;    C[3] = 0;
        A[4] = 3;    B[4] = 2;    C[4] = 0;
        A[5] = 2;    B[5] = 2;    C[5] = 1;
        A[6] = 1;    B[6] = 3;    C[6] = 1;
        A[7] = 0;    B[7] = 1;    C[7] = 0;
        A[8] = 0;    B[8] = 0;    C[8] = 1;
        System.out.println(burnout(9, A, B, C));
    }
}

1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0  

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0  

8 
Alternatively, 

    // Alternative: burn the first t wires in one pass, then test connectivity once.
    public static boolean burnWiresAtT(int N, int[] A, int[] B, int[] C, int t) {
        int[] h = new int[N*N];
        int[] v = new int[N*N];
        for (int i = 0; i < N*N; i++) { h[i] = 1; v[i] = 1; }
        for (int i = 0; i < N; i++) {
            h[(i * N) + N - 1] = 0;
            v[(N-1) * N + i] = 0;
        }
        System.out.println(printArray(h));
        System.out.println(printArray(v));
        for (int i = 0; i < t; i++) {
            if (C[i] == 0) {
                v[A[i] * (N-1) + B[i]] = 0;
            } else {
                h[A[i] * (N-1) + B[i]] = 0;
            }
        }
        return checkConnections(h, v, N);
    }

    // Binary search for the first instant t at which the circuit is broken.
    // Connectivity is monotone in t, so O(log T) connectivity checks suffice
    // instead of one check per burn event.
    public static int binarySearch(int N, int[] A, int[] B, int[] C, int start, int end) {
        if (start == end) {
            if (!burnWiresAtT(N, A, B, C, end)) {
                return end;
            }
            return -1;
        } else {
            int mid = (start + end) / 2;
            if (burnWiresAtT(N, A, B, C, mid)) {
                return binarySearch(N, A, B, C, mid + 1, end);
            } else {
                return binarySearch(N, A, B, C, start, mid);
            }
        }
    }

1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0  

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0  

8