Sunday, December 15, 2024

This is a summary of the book “The Learning Mindset: Combining Human Competencies with Technology to Thrive,” written by Katja Schipperheijn and published by Kogan Page in 2024. The title refers to continuous learning and adaptability, which the author argues benefit people across ages, genders, and demographics. She dispels myths about learning capacity and suggests that visionary leaders, whom she calls “LearnScapers,” will combine human competencies such as curiosity, empathy, critical thinking, and a learning mindset with advancing AI, which is accelerating innovation across domains. Engaged citizens who maintain a learning mindset distinguish themselves in the workplace, which in turn strengthens the organization’s growth and culture. Combined with strategic thinking and social learning, this mindset allows the status quo to be challenged and a culture of trust, respect, and communication to be cultivated.

Artificial intelligence (AI) presents a challenge for people, companies, and governments, and adopting a learning and growth mindset is crucial to survive and thrive in this ever-faster-changing world. Learning encompasses personal, cultural, social, and experiential influences, and managing emotions can optimize learning potential. Differentiating between training and learning, and recognizing forms such as unconscious learning, can boost adaptability and creativity. Mental complexity and motivation can increase with age, and focusing on developing a growth mindset can enrich personal and collaborative learning experiences. As AI evolves, people must enforce human-centric values and push for bias-free algorithms. Ensuring accountability, transparency, and security in AI applications is essential to foster trust and uphold fundamental human rights. The evolving legal environment presents challenges in creating frameworks for AI use, particularly regarding privacy and ethical standards. To remain relevant in rapidly changing fields, workers must shift from a single-skill focus to a multi-skilled approach, fostering curiosity and self-directed learning. Learning extends beyond formal education, encompassing digital worlds where informal skills flourish.

In the era of AI, unique competencies like curiosity, critical thinking, and empathy will become more important in the workplace. These competencies complement a learning mindset, encouraging resilience and a positive outlook on challenges and innovation. Consilience, an interdisciplinary approach that combines insights from various fields, fosters creativity and problem-solving. A learning mindset promotes adaptability, open-mindedness, and continuous improvement, breaking down disciplinary barriers.

Adopting a learning influencer role involves setting an example and encouraging a culture of trust and feedback. Embracing diverse perspectives and challenging the status quo can unlock creative potential within teams. Implementing initiatives like hackathons and open dialogue can foster a dynamic workforce ready to tackle future challenges collaboratively.

Optimizing team efficiency through strategic communication and social learning can prevent frustration and improve team dynamics. Efficient time management and identifying team member expertise are essential in rapidly changing environments. A supportive environment helps build trust and learn from one another, enhancing cohesion. Nielsen's "1-9-90 rule" can help engage team members by identifying different participation levels.

To foster a learning culture, leaders should challenge the status quo, balance innovation with effective team management, and create a culture that encourages collaboration and autonomy. Influential learning leaders combine inspiration, authenticity, empathy, and communication to foster strong team dynamics built on trust and respect. They establish clear goals, embrace open dialogue, and empower team members to contribute ideas and take ownership of their roles. They view problems as opportunities and encourage an experimental mindset, using strategic frameworks like design thinking and scenario planning. As “LearnScaper” leaders, they build an ecosystem that encourages creativity and human-AI interaction, prioritizing the integration of humans and machines. By fostering a learning mindset, leaders can ensure adequate information flow within the organization and nurture employees’ personal growth and adaptation.


Friday, December 13, 2024

Constant evaluation and monitoring of deployed large language models and generative AI applications are important because both the data and the environment might vary. There can be shifts in performance, accuracy, or even the emergence of biases. Continuous monitoring helps with early detection and prompt responses, which in turn keeps the models’ outputs relevant, appropriate, and effective. Benchmarks help to evaluate models, but the variation in results can be large. This stems from a lack of ground truth. For example, it is difficult to evaluate summarization models with traditional NLP metrics such as BLEU and ROUGE, because generated summaries might use completely different words or word order. Comprehensive evaluation standards remain elusive for LLMs, and reliance on human judgment can be costly and time-consuming. The novel trend of “LLMs as a judge” still leaves unanswered questions: whether the judge reflects human preferences for correctness, readability, and comprehensiveness of answers; how reliable and reusable it is across different metrics; how to reconcile the different grading scales used by different frameworks; and whether the same evaluation metric applies across diverse use cases.
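To make the word-overlap limitation concrete, here is a minimal sketch in Python of a unigram-recall score in the spirit of ROUGE-1 (the sentences are made up for illustration): a faithful paraphrase that shares almost no words with the reference scores low, even though it is a perfectly good summary.

# Minimal sketch: unigram recall in the spirit of ROUGE-1.
# The sentences below are illustrative, not from any benchmark.
def unigram_recall(reference: str, candidate: str) -> float:
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    overlap = sum(1 for t in ref_tokens if t in cand_tokens)
    return overlap / len(ref_tokens)

reference = "the company reported record quarterly revenue driven by cloud growth"
paraphrase = "strong demand for its cloud business pushed sales to an all-time high"

# A faithful paraphrase with different wording scores low, which is why
# word-overlap metrics are a poor fit for judging summarization quality.
print(unigram_recall(reference, paraphrase))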

Since chatbots are common applications of LLMs, an example of evaluating a chatbot now follows. The underlying principle in such a chatbot is Retrieval-Augmented Generation (RAG), which is quickly becoming the industry standard for developing chatbots. As with all LLM and AI models, it is only as effective as its data, which in this case is the vector store, aka the knowledge base. The LLM could be a newer model such as GPT-3.5 or GPT-4 to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge. Evaluating the quality of chatbot responses must take into account both the knowledge base and the model involved. LLM-as-a-judge fits this bill for automated evaluation, but as noted earlier, it may not be on par with human grading, might require several auto-evaluation samples, and may respond differently to different chatbot prompts. Slight variations in the prompt or problem can drastically affect its performance.
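As a minimal, self-contained sketch of the RAG flow described above (the retrieval here is a toy bag-of-words search and the generation step is stubbed out; a real system would call an embedding model, a vector database, and an LLM such as GPT-3.5 or GPT-4):

# Toy sketch of Retrieval-Augmented Generation: retrieve the most relevant
# chunks from a small in-memory "knowledge base", then build a grounded prompt.
from collections import Counter
import math

KNOWLEDGE_BASE = [
    "Our support hours are 9am to 5pm on weekdays.",
    "Refunds are processed within five business days.",
    "The premium plan includes priority support.",
]

def similarity(a: str, b: str) -> float:
    # Bag-of-words cosine similarity stands in for embedding similarity.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    return sorted(KNOWLEDGE_BASE, key=lambda chunk: similarity(question, chunk), reverse=True)[:k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return prompt  # a real chatbot would send this grounded prompt to the LLM

print(answer("How long do refunds take?"))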

RAG-based chatbots evaluated with LLM-as-a-judge can agree with human grading on over 80% of judgements if a few practices are maintained: use a 1-5 grading scale; use GPT-3.5 as the judge to save costs when you have one grading example per score; and use GPT-4 as the judge when you have no examples to help it understand the grading rules.
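A sketch of what such a judge prompt might look like, assuming a 1-5 correctness scale with abbreviated few-shot grading examples (the wording and examples are illustrative, not taken from any particular framework):

JUDGE_PROMPT = """You are grading a chatbot answer against the provided context.
Score correctness on a 1-5 scale.

Example of score 1: the answer contradicts the context.
Example of score 3: the answer is partially correct but omits key details.
Example of score 5: the answer is fully supported by the context.
(one example per score value works best; abbreviated here)

Context: {context}
Question: {question}
Answer: {answer}

Think step by step about how well the answer matches the context,
then output a line of the form "Score: <1-5>"."""

print(JUDGE_PROMPT.format(context="...", question="...", answer="..."))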

The initial evaluation dataset can be formed from, say, 100 chatbot prompts together with domain context in the form of (chunks of) documents that are relevant to each question, selected based on, say, F-score. Using the evaluation dataset, different language models can be used to generate answers, stored as question-context-answer triples in a dataset called “answer sheets”. Then, given the answer sheets, various LLMs can be used to generate grades and the reasoning behind them. Each grade can be a composite score in which correctness carries most of the weight and comprehensiveness and readability split the remaining weight equally. A good choice of hyperparameters is equally applicable to LLM-as-a-judge, and this could include a low temperature of, say, 0.1 to ensure reproducibility, single-answer grading instead of pairwise comparison, chain-of-thought prompting to let the LLM reason about the grading process before giving the final score, and grading examples for each score value on each of the three factors. Factors that are difficult to measure quantitatively include helpfulness, depth, creativity, and so on. Emitting per-factor metrics for correctness, comprehensiveness, and readability, together with the judge’s reasoning, provides justification that becomes valuable. Whether we use GPT-4, GPT-3.5, or human judgement, the composite scores can be used to tell results apart quantitatively. The overall workflow for creating an LLM-as-a-judge is also similar to that of the chatbots themselves: data preparation, indexing relevant data, information retrieval, and response generation.
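A minimal sketch of such a composite score, assuming an illustrative 60/20/20 weighting (the exact weights are a modelling choice, not prescribed above):

# Composite grade from per-factor scores on a 1-5 scale.
# Weights are illustrative: correctness dominates, the rest is split evenly.
WEIGHTS = {"correctness": 0.6, "comprehensiveness": 0.2, "readability": 0.2}

def composite_score(scores: dict[str, float]) -> float:
    return sum(WEIGHTS[factor] * scores[factor] for factor in WEIGHTS)

# Example: a correct but somewhat terse answer.
print(composite_score({"correctness": 5, "comprehensiveness": 3, "readability": 4}))  # 4.4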


Thursday, December 12, 2024

 

Software-as-a-service LLMs, aka SaaS LLMs, are considerably more costly than models developed and hosted from foundation models in your own workspaces, whether on-premises or in the cloud, because they need to address all use cases, including a general-purpose chatbot. That generality incurs cost. For a more specific use case, a much smaller prompt suffices, and the model can also be fine-tuned by baking the instructions and expected output structure into the model itself. Inference costs also rise with the number of input and output tokens, and SaaS services charge per token. A specific use-case model can even be implemented by 2 engineers in 1 month with a few thousand dollars of compute for training and experimentation, tested by 4 human evaluators and an initial set of evaluation examples.
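As a back-of-the-envelope sketch of the per-token pricing model (the prices below are placeholders, not any vendor's actual rates), shorter use-case-specific prompts translate directly into lower inference cost:

# Hypothetical per-token pricing; real rates vary by vendor and model.
PRICE_PER_1K_INPUT = 0.003   # dollars per 1,000 input tokens (placeholder)
PRICE_PER_1K_OUTPUT = 0.006  # dollars per 1,000 output tokens (placeholder)

def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return requests_per_day * days * per_request

# A long general-purpose prompt vs. a short use-case-specific (or fine-tuned) prompt.
print(monthly_cost(10_000, input_tokens=3_000, output_tokens=500))  # 3600.0
print(monthly_cost(10_000, input_tokens=300, output_tokens=500))    # 1170.0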

SaaS LLMs could be a matter of convenience. Developing a model from scratch often involves significant commitment in terms of both data and computational resources, such as pre-training. Unlike fine-tuning, pre-training is the process of training a language model on a large corpus of data without using any prior knowledge or weights from an existing model. This scenario makes sense when the data is quite different from what off-the-shelf LLMs are trained on, when the domain is highly specialized compared to everyday language, when there must be full control over the training data in terms of security, privacy, and fit for the model’s foundational knowledge base, or when there are business justifications to avoid available LLMs altogether.

Organizations must plan for the significant commitment and sophisticated tooling this requires. Libraries like PyTorch FSDP and DeepSpeed are required for their distributed training capabilities when pre-training an LLM from scratch. Large-scale data preprocessing is required and involves distributed frameworks and infrastructure that can handle scale in data engineering. Training of an LLM cannot commence without a set of optimal hyperparameters. Since training involves high costs from long-running GPU jobs, resource utilization must be maximized. Training runs can also be very long, which makes GPU failures more likely than under normal load. Close monitoring of the training process is essential. Saving model checkpoints regularly and evaluating on validation sets act as safeguards.
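A heavily simplified sketch of FSDP-based pre-training with periodic checkpointing (the model, data loader, and paths are placeholders; a real run also needs a launcher such as torchrun, data preprocessing, and careful hyperparameter tuning):

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def train(model: torch.nn.Module, data_loader, steps: int, ckpt_every: int = 1000):
    dist.init_process_group("nccl")                 # one process per GPU, launched via torchrun
    device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
    model = FSDP(model.to(device))                  # shard parameters, gradients, optimizer state
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step, (inputs, targets) in enumerate(data_loader):
        if step >= steps:
            break
        logits = model(inputs.to(device))
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.to(device).view(-1)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % ckpt_every == 0:
            # Periodic checkpoints guard against failures during long GPU runs.
            # (A production run would use FSDP's state-dict utilities to gather shards.)
            state = model.state_dict()
            if dist.get_rank() == 0:
                torch.save({"step": step, "model": state}, f"ckpt_{step}.pt")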

Constant evaluation and monitoring of deployed large language models and generative AI applications are important because both the data and the environment might vary. There can be shifts in performance, accuracy, or even the emergence of biases. Continuous monitoring helps with early detection and prompt responses, which in turn keeps the models’ outputs relevant, appropriate, and effective. Benchmarks help to evaluate models, but the variation in results can be large. This stems from a lack of ground truth. For example, it is difficult to evaluate summarization models with traditional NLP metrics such as BLEU and ROUGE, because generated summaries might use completely different words or word order. Comprehensive evaluation standards remain elusive for LLMs, and reliance on human judgment can be costly and time-consuming. The novel trend of “LLMs as a judge” still leaves unanswered questions about reflecting human preferences in terms of correctness, readability, and comprehensiveness of the answers, reliability and reusability across different metrics, the use of different grading scales by different frameworks, and the applicability of the same evaluation metric across diverse use cases.

Finally, the system must be simplified with model serving, so that models can be managed, governed, and accessed via unified endpoints that handle specific LLM requests.
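A minimal sketch of what calling such a unified serving endpoint might look like (the URL, token, and payload shape are placeholders; each serving platform defines its own request format):

import requests  # assumes the requests package is installed

SERVING_URL = "https://example.com/serving-endpoints/my-chatbot/invocations"  # placeholder

def query_endpoint(prompt: str):
    response = requests.post(
        SERVING_URL,
        headers={"Authorization": "Bearer <token>"},   # placeholder credential
        json={"inputs": [{"prompt": prompt}]},          # payload shape varies by platform
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# query_endpoint("Summarize our refund policy.")  # requires a real endpoint and token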

Wednesday, December 11, 2024

 continued from previous post...

A fine-grained mixture-of-experts (MoE) architecture typically works better than a single dense model of comparable cost. Inference efficiency and model quality are usually in tension: bigger models reach higher quality, but smaller models are more efficient to serve. Using an MoE architecture makes it possible to attain better tradeoffs between model quality and inference efficiency than dense models typically achieve.
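A toy sketch of the routing idea behind a fine-grained MoE layer (the sizes and top-k choice are illustrative; real implementations add load-balancing losses and efficient expert-parallel execution):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Each token is routed to its top-k experts; only those experts run for it."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, chosen = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e      # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(ToyMoELayer()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])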

Companies in the foundational stages of adopting generative AI technology often lack a clear strategy, use cases, and access to data scientists. To start, companies can use off-the-shelf large language models (LLMs) to experiment with AI tools and workflows. This allows employees to craft specialized prompts and workflows, helping leaders understand their strengths and weaknesses. LLMs can also be used as judges to evaluate responses in practical applications, such as sifting through product reviews.

Large Language Models (LLMs) have the potential to significantly improve organizations' workforce and customer experiences. By addressing tasks that currently occupy 60%-70% of employees' time, LLMs can significantly reduce the time spent on background research, data analysis, and document writing. Additionally, these technologies can significantly reduce the time for new workers to achieve full productivity. However, organizations must first rethink the management of unstructured information assets and mitigate issues of bias and accuracy. This is why many organizations are focusing on internal applications, where a limited scope provides opportunities for better information access and human oversight. These applications, aligned with core capabilities already within the organization, have the potential to deliver real and immediate value while LLMs and their supporting technologies continue to evolve and mature. Examples of applications include automated analysis of product reviews, inventory management, education, financial services, travel and hospitality, healthcare and life sciences, insurance, technology and manufacturing, and media and entertainment.

The use of structured data can enhance the quality of GenAI applications, as in the case of a travel planning chatbot. Such an application would use vector search and feature-and-function serving as building blocks to serve personalized user preferences along with budget and hotel information, often involving agents for programmatic access to external data sources. To expose data and functions as real-time endpoints, federated and universal access control could be used. On-demand feature computations can be expressed as Python functions, registered with a catalog for access control, and encoded in a directed acyclic graph that computes and serves features behind a REST endpoint.
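A small sketch of the on-demand feature idea, using a plain Python function and a toy registry in place of a real catalog and serving layer (the names and the registry itself are illustrative):

# Toy stand-ins for a feature catalog and an on-demand feature function.
FEATURE_REGISTRY = {}

def register(name):
    """Record a feature function under a catalog-style name (illustrative only)."""
    def wrap(fn):
        FEATURE_REGISTRY[name] = fn
        return fn
    return wrap

@register("travel.remaining_budget")
def remaining_budget(total_budget: float, hotel_price: float, nights: int) -> float:
    """Computed on demand at request time from user preferences and hotel data."""
    return total_budget - hotel_price * nights

# A serving layer would look the function up by name and call it per request.
fn = FEATURE_REGISTRY["travel.remaining_budget"]
print(fn(total_budget=2000.0, hotel_price=150.0, nights=4))  # 1400.0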

To serve structured data to real-time AI applications, precomputed data needs to be deployed to operational databases, such as DynamoDB on AWS and Cosmos DB on Azure. Synchronization of precomputed features to a low-latency data format is required. Fine-tuning a foundation model allows for more deeply personalized models, which in turn requires an underlying architecture that ensures secure and accurate data access.
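For example, syncing one precomputed feature row to DynamoDB might look roughly like this (the table name, key, and attributes are placeholders, and running it requires AWS credentials and an existing table; Cosmos DB would use its own SDK):

import boto3  # AWS SDK for Python

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("user_travel_features")  # placeholder table name

# Write one precomputed feature row keyed by user, for low-latency lookup at inference time.
table.put_item(
    Item={
        "user_id": "u-123",              # placeholder key
        "preferred_city": "Lisbon",
        "avg_nightly_budget": 150,
        "loyalty_tier": "gold",
    }
)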

Most organizations do well with an intelligence platform that helps with model fine-tuning, registration for access control, secure and efficient data sharing across different platforms, clouds, and regions for faster distribution worldwide, and optimized LLM serving for improved performance. Such a platform should be chosen for simple fine-tuning infrastructure, traceability from models back to datasets, and better throughput and latency than traditional LLM serving methods.


Tuesday, December 10, 2024

There is a growing need for dynamic, dependable, and repeatable infrastructure as the scope of deployment expands from a small footprint to cloud scale. With emerging technologies like generative AI, the best practices for cloud deployment have not matured enough to create playbooks. Generative artificial intelligence (AI) refers to a subset of AI algorithms and models that can generate new and original content, such as images, text, music, or even entire virtual worlds. Unlike other AI models that rely on pre-existing data to make predictions or classifications, generative AI models create new content based on patterns and information they have learned from training data. Many organizations continue to face challenges in deploying these applications at production quality. The AI output must be accurate, governed, and safe.

Data infrastructure trends that have become popular in the wake of generative AI include data lakehouses, which bring out the best of data lakes and data warehouses by allowing for both storage and processing; vector databases for both storing and querying vectors; and the ecosystem of ETL, data pipelines, and connectors that facilitates input and output of data at scale and even supports real-time ingestion.

In terms of infrastructure for data engineering projects, customers usually get started on a roadmap that progressively builds a more mature data function. One approach for drawing this roadmap, which experts observe repeated across deployment stamps, involves building a data stack in distinct stages, with a stack for every phase of the journey. While needs, level of sophistication, maturity of solutions, and budget determine the shape these stacks take, the four phases are more or less distinct and repeated across these endeavors: starter, growth, machine learning, and real time. Customers begin with a starter stack, where the essential function is to collect the data, often by implementing a drain. A unified data layer at this stage significantly reduces engineering bottlenecks. The second stage is the growth stack, which solves the problem of proliferating data destinations and independent silos by centralizing data into a warehouse, which also becomes a single source of truth for analytics. When this matures, customers want to move beyond historical analytics and into predictive analytics. At this stage, a data lake and a machine learning toolset come in handy to leverage unstructured data and mitigate problems proactively. The next and final frontier overcomes a limitation of the current stack: that it is impossible to deliver personalized experiences in real time.

Even though it is a shifting landscape, the AI models are largely language models, and some serve as the foundation for layers of increasingly complex techniques and purposes. Foundation models commonly refer to large language models that have been trained over extensive datasets to be generally good at some task (chat, instruction following, code generation, etc.), and they largely fall into two categories: proprietary (such as GPT-3.5 and Gemini) and open source (such as Phi, Llama2-70B, and DBRX). DBRX, popular through the Databricks platform available across the public clouds, is a transformer-based decoder large language model trained using next-token prediction. There are benchmarks available to evaluate foundation models.
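A compact sketch of the next-token prediction objective these decoder models are trained on (the tiny embedding-plus-linear "model" below is a stand-in for a real transformer):

import torch
import torch.nn.functional as F

vocab_size, d_model = 100, 32
# Stand-in for a decoder-only transformer: embed tokens, project back to the vocabulary.
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))   # a batch with one sequence of 16 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t

logits = lm_head(embed(inputs))                  # (1, 15, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss)  # this scalar is what pre-training minimizes over a huge corpus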

Many end-to-end LLM training pipelines are becoming more compute-efficient. This efficiency is the result of a number of improvements, including better architectures, network changes, better optimizations, better tokenization and, last but not least, better pre-training data, which has a substantial impact on model quality.


Monday, December 9, 2024

 There are N points (numbered from 0 to N−1) on a plane. Each point is colored either red ('R') or green ('G'). The K-th point is located at coordinates (X[K], Y[K]) and its color is colors[K]. No point lies on coordinates (0, 0).

We want to draw a circle centered on coordinates (0, 0), such that the number of red points and green points inside the circle is equal. What is the maximum number of points that can lie inside such a circle? Note that it is always possible to draw a circle with no points inside.

Write a function that, given two arrays of integers X, Y and a string colors, returns an integer specifying the maximum number of points inside a circle containing an equal number of red points and green points.

Examples:

1. Given X = [4, 0, 2, −2], Y = [4, 1, 2, −3] and colors = "RGRR", your function should return 2. The circle contains points (0, 1) and (2, 2), but not points (−2, −3) and (4, 4).

import java.util.Arrays;

class Solution {

    public int solution(int[] X, int[] Y, String colors) {

        int n = X.length;

        // Squared distance from the origin for each point (no point lies at (0, 0)).
        final long[] dist = new long[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            dist[i] = (long) X[i] * X[i] + (long) Y[i] * Y[i];
            order[i] = i;
        }

        // Visit points from the closest to the farthest.
        Arrays.sort(order, (a, b) -> Long.compare(dist[a], dist[b]));

        int red = 0, green = 0, best = 0;
        for (int i = 0; i < n; i++) {
            int p = order[i];
            if (colors.charAt(p) == 'R') {
                red++;
            } else {
                green++;
            }
            // A circle boundary can only usefully sit between two distinct distances,
            // so only check the balance when the next point is strictly farther away.
            boolean boundary = (i == n - 1) || dist[order[i + 1]] > dist[p];
            if (boundary && red == green) {
                best = Math.max(best, red + green);
            }
        }

        return best;

    }

}

Compilation successful.

Example test: ([4, 0, 2, -2], [4, 1, 2, -3], 'RGRR')

OK

Example test: ([1, 1, -1, -1], [1, -1, 1, -1], 'RGRG')

OK

Example test: ([1, 0, 0], [0, 1, -1], 'GGR')

OK

Example test: ([5, -5, 5], [1, -1, -3], 'GRG')

OK

Example test: ([3000, -3000, 4100, -4100, -3000], [5000, -5000, 4100, -4100, 5000], 'RRGRG')

OK


Sunday, December 8, 2024

This is a summary of the book “Stolen Focus: Why You Can’t Pay Attention and How to Think Deeply Again,” written by Johann Hari and published by Crown in 2022. The author struggled with addiction to electronic devices and information overload, so he escaped to Cape Cod without internet-enabled devices and embraced the “cold turkey” method. His time away gave him insights into focus that most people never experience in this manner. He examines the issues surrounding humans’ struggle to focus and proposes individual as well as societal solutions to the attention crisis. He suggests that this valuable commodity can be reclaimed by entering a “flow state”. Some modern-day ailments, such as sleep deprivation, are societal issues, and they impede our ability to focus. Letting our mind wander can help us regain focus. Big Tech steals our data, focus, and attention, but somehow subtly passes the blame back to individuals via privacy notices. Poor diet, exposure to environmental pollutants, and chemical imbalances also destroy focus. Even medications do not address the predicament we find ourselves in.

Humanity is facing an attention crisis due to the overwhelming amount of information available to us. A study by Sune Lehmann found that people's collective focus had been declining before the internet age, but the internet has accelerated its decline. The flood of information hinders the brain's ability to filter out irrelevant information and makes it less likely to understand complex topics. Multitasking is a myth, as it is actually "task-switching," which impairs focus in four ways: "the switch cost effect," "the screw-up effect," "the creativity drain," and "the diminished memory effect." To regain attention, individuals should enter a "flow state" and engage in activities that promote their well-being. Social media companies, like Instagram, use rewards to steal attention, as users engage with platforms to accumulate rewards, such as "hearts and likes," that represent social validation.

Psychologist Mihaly Csikszentmihalyi's work on the "flow state" suggests that finding a clear, meaningful goal can redirect personal focus. However, the popularity of reading books is decreasing due to the rise of social media and the need for shorter, bite-sized messages. To enter a flow state, choose a goal that is neither too far beyond one's abilities nor too easy.

Sleep deprivation is a societal issue, with people's sleep duration decreasing by 20% in the past century. Sleep is essential for the brain, as it removes metabolic waste and helps process waking-life emotions. To combat sleep deprivation, avoid chemically inducing sleep, avoid blue light-emitting screens, and limit exposure to artificial light at night.

Letting the mind wander can help regain focus by activating the "default-mode network" region of the brain, which helps make sense of the world, engage in creative problem-solving, and enable mental time travel. Daydreaming may seem like distracted thinking, but it is just a temporary solution to the problem of lost focus and mind-wandering.

Tristan Harris and Aza Raskin, two tech experts, have raised concerns about the ethics of social media platforms. Harris, who became Google's first "design ethicist," questioned the ethics of designing distractions to increase user engagement, which corroded human thinking. Raskin, creator of the "infinite scroll," calculated that his invention wasted enough user time to equate to 200,000 human life spans every day. Both Harris and Raskin left Google when they realized the company had no intention of changing its behavior meaningfully, as doing so would harm its bottom line. They share concerns about Big Tech's "surveillance capitalism" and how it impedes not only individuals' attention but also society's collective attention. Big Tech shifts the blame onto individuals, as evidenced by Facebook's algorithm promoting fascism and Nazi groups in Germany. The tech industry's rhetoric claims that consumers can train themselves to cut back on online time, but when those attempts fail, individuals will blame themselves.

The modern Western diet and exposure to pollutants erode focus, leading to a cycle of blood sugar spikes and dips that causes a lack of energy and an inability to focus. Studies have shown that cutting artificial preservatives and additives from kids' diets can improve their focus by up to 50%. Exposure to pollutants, such as pesticides and flame-retardants, can damage the brain's neurons. Systemic change is necessary to address these issues.

Medicating people with ADHD often fails to target the root cause of their focus problems. A growing body of evidence suggests that for 70%-80% of sufferers, ADHD is a product of the patient's environment rather than a biological disorder. Overprescribing ADHD medications to children can be risky, as they can be addictive, stunt growth, and cause heart problems.

To protect focus and channel it towards solving global challenges, it is time to lobby for societal changes, such as a ban on surveillance capitalism, subscription or public-ownership models for social media sites, and a shorter, four-day work week.