Continued from the previous post...
A fine-grained mixture-of-experts (MoE) architecture typically works better than a dense model of comparable inference cost. Inference efficiency and model quality are usually in tension: larger models reach higher quality, but smaller models are cheaper to serve. Because an MoE model activates only a small subset of its expert subnetworks for each input, it can attain better tradeoffs between model quality and inference efficiency than dense models achieve.
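To make the idea concrete, here is a minimal sketch of an MoE layer with top-k routing, written in PyTorch. The layer sizes, expert count, and routing depth are illustrative choices, not values from any particular production model; the point is that each token only passes through its top-k experts rather than the full parameter set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal fine-grained mixture-of-experts layer with top-k routing."""
    def __init__(self, d_model=512, n_experts=16, top_k=2, d_hidden=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                             # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e              # tokens routed to expert e in this slot
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])  # only selected experts run per token
        return out
```

Only top_k of the n_experts run for any given token, which is why quality can grow with total parameter count while per-token compute stays close to that of a much smaller dense model.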
Companies in the foundational stages of adopting generative AI technology often lack a clear strategy, well-defined use cases, and access to data scientists. To start, companies can use off-the-shelf large language models (LLMs) to experiment with AI tools and workflows. This lets employees craft specialized prompts and workflows and helps leaders understand where these models are strong and where they fall short. LLMs can also be used as judges to evaluate responses in practical applications, such as sifting through product reviews.
LLMs have the potential to significantly improve organizations' workforce and customer experiences. By addressing tasks that currently occupy 60%-70% of employees' time, they can cut the hours spent on background research, data analysis, and document writing, and they can shorten the time it takes new workers to reach full productivity. However, organizations must first rethink how they manage unstructured information assets and mitigate issues of bias and accuracy. This is why many organizations are focusing on internal applications, where a limited scope provides opportunities for better information access and human oversight. These applications, aligned with core capabilities already within the organization, can deliver real and immediate value while LLMs and their supporting technologies continue to evolve and mature. Examples include automated analysis of product reviews and inventory management, with use cases spanning education, financial services, travel and hospitality, healthcare and life sciences, insurance, technology and manufacturing, and media and entertainment.
The use of structured data in GenAI applications can enhance their quality; a travel planning chatbot is a good example. Such an application would combine vector search and feature-and-function serving building blocks to serve personalized user preferences along with budget and hotel information, often relying on agents for programmatic access to external data sources. Federated and universal access control can be used to expose data and functions as real-time endpoints. Models can be exposed as Python functions that compute features on demand; such functions can be registered with a catalog for access control and encoded in a directed acyclic graph that computes and serves features as a REST endpoint.
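As a rough illustration of the last step, here is a sketch of an on-demand feature function exposed behind a REST endpoint with FastAPI. The function name, scoring logic, and route are hypothetical; in a governed platform the function would be registered in a catalog and executed as one node of a feature DAG rather than served directly like this.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical on-demand feature function; a catalog registration step would
# normally sit in front of this so access control and lineage apply.
def hotel_budget_fit(nightly_rate: float, user_budget: float) -> float:
    """Score how well a hotel's nightly rate fits the traveler's budget (0-1)."""
    if user_budget <= 0:
        return 0.0
    return max(0.0, min(1.0, user_budget / max(nightly_rate, 1e-6)))

class ScoreRequest(BaseModel):
    nightly_rate: float
    user_budget: float

@app.post("/features/hotel_budget_fit")
def serve_feature(req: ScoreRequest):
    # In a full system this handler would be one step in a DAG that joins
    # precomputed user preferences with live hotel data before scoring.
    return {"hotel_budget_fit": hotel_budget_fit(req.nightly_rate, req.user_budget)}
```

The chatbot (or an agent acting on its behalf) can then call this endpoint at request time instead of recomputing the feature inside the prompt.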
To serve structured data to real-time AI applications, precomputed features need to be deployed to operational databases, such as DynamoDB on AWS or Cosmos DB on Azure, which means synchronizing those features into a low-latency data format. Fine-tuning a foundation model allows for more deeply personalized models, but it requires an underlying architecture that ensures secure and accurate data access.
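A minimal sketch of that synchronization step, assuming an AWS deployment with boto3 and an existing DynamoDB table named user_travel_features keyed on user_id (the table name, schema, and sample rows are illustrative):

```python
import boto3

# Assumption: a DynamoDB table "user_travel_features" with partition key "user_id"
# has already been created; names and attributes here are placeholders.
table = boto3.resource("dynamodb").Table("user_travel_features")

def publish_features(rows):
    """Copy precomputed (offline) features into the low-latency operational store."""
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item={
                "user_id": row["user_id"],
                "preferred_city": row["preferred_city"],
                # DynamoDB does not accept Python floats, so store the number as a string.
                "avg_nightly_budget": str(row["avg_nightly_budget"]),
            })

publish_features([
    {"user_id": "u123", "preferred_city": "Lisbon", "avg_nightly_budget": 180},
])
```

A scheduled job running this after each batch feature computation keeps the operational copy fresh enough for request-time lookups.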
Most organizations do well with an intelligence platform that supports model fine-tuning, model registration for access control, secure and efficient data sharing across platforms, clouds, and regions for faster worldwide distribution, and optimized LLM serving for improved performance. Such a platform should offer simple infrastructure for fine-tuning models, ensure traceability from models back to the datasets they were trained on, and deliver better throughput and latency than traditional LLM serving methods.
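One way to get the traceability and registration pieces is experiment tracking plus a model registry; the sketch below uses MLflow, assuming a tracking server with a registry backend is available. The experiment name, parameters, and the placeholder model class are hypothetical, standing in for a real fine-tuned LLM.

```python
import mlflow
import mlflow.pyfunc

class PlaceholderModel(mlflow.pyfunc.PythonModel):
    """Stand-in for the fine-tuned LLM; real code would wrap the tuned weights."""
    def predict(self, context, model_input):
        return model_input

mlflow.set_experiment("travel-assistant-finetune")
with mlflow.start_run() as run:
    # Traceability: record which base checkpoint and dataset produced this model.
    mlflow.log_param("base_model", "base-llm-7b")
    mlflow.log_param("train_dataset", "travel_reviews_v3")
    mlflow.log_metric("eval_loss", 1.27)
    mlflow.pyfunc.log_model("model", python_model=PlaceholderModel())

# Register the logged model under a name so registry-level access control
# and sharing policies apply before it is promoted to serving.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "travel-assistant-llm")
```

With the run parameters captured alongside the registered model, every served version can be traced back to its base model and training data.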