As the previous article showed, an LLM-as-a-judge can agree with human grading on over 80% of judgements for RAG-based chatbots if a few practices are maintained: use a 1-5 grading scale; use GPT-3.5 as the judge to save costs when you have one grading example per score; and use GPT-4 as the judge when you have no examples to illustrate the grading rules. Emitting per-metric scores for correctness, comprehensiveness, and readability provides valuable justification for each grade. Whether the judge is GPT-4, GPT-3.5, or a human, the composite scores can be used to tell results apart quantitatively.
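To make the recipe concrete, here is a minimal sketch of such a judge call, assuming the OpenAI chat-completions Python client; the rubric text and the one-example-per-score anchors are illustrative, not the exact prompts from the previous article.

```python
# A minimal LLM-as-a-judge sketch; assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

# Illustrative rubric with one grading example (anchor) per score.
RUBRIC = """You grade a chatbot answer on a 1-5 scale for
correctness, comprehensiveness, and readability.
Score anchors:
1: wrong or irrelevant
2: partially relevant but misses the question
3: correct but incomplete
4: correct and mostly complete
5: correct, complete, and easy to read
Return the three scores with a one-sentence justification each."""

def judge(question: str, answer: str, model: str = "gpt-3.5-turbo") -> str:
    """Grade one question/answer pair. Pass model='gpt-4' when no
    per-score examples are available to anchor the grading rules."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        temperature=0,  # deterministic grading
    )
    return response.choices[0].message.content
```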
Data and evaluations alone do not make a chatbot production-ready; data and ML pipelines are also required. The full orchestration workflow spans three stages: data ingestion, data science, and data analysis. The lakehouse paradigm, which combines the key capabilities of data lakes and data warehouses, can shorten pipeline construction from a few weeks to a week at most, and having all data personas work on the same lakehouse reduces overhead and smooths handoffs between disciplines.
Let us explore each stage of the workflow. Data ingestion is best served by a medallion architecture: raw data lands in a Bronze table and is then cleaned and enriched into a Delta Lake Silver table. Because every Delta operation is atomic, data can be reprocessed safely at any point. Schema enforcement and constraints guard data quality, and filtering out unnecessary records keeps the Silver table refined, as in the sketch below.
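A minimal Bronze-to-Silver sketch in PySpark, assuming a Spark session configured with Delta Lake support; the paths, column names, and the quality filter are illustrative.

```python
# Bronze-to-Silver medallion ingestion sketch (PySpark + Delta Lake).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingestion").getOrCreate()

# Bronze: land the raw documents as-is so any step can be replayed.
raw = spark.read.json("/data/raw/docs/")
raw.write.format("delta").mode("append").save("/lake/bronze/docs")

# Silver: drop records that fail basic quality constraints and keep
# only the columns the chatbot needs.
bronze = spark.read.format("delta").load("/lake/bronze/docs")
silver = (
    bronze
    .filter(F.col("text").isNotNull() & (F.length("text") > 50))
    .select("doc_id", "title", "text", "source_url")
)
silver.write.format("delta").mode("overwrite").save("/lake/silver/docs")
```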
The data science stage consists of three major parts: exploratory data analysis, the chatbot model, and the LLM-as-a-judge model. Model lifecycle management, including experimentation, reproducibility, deployment, and a central model registry, can be handled with an open-source platform like MLflow, which offers both APIs and a user interface. With model lineage (the experiment and run that produced each model), versioning, stage transitions from Staging to Production or Archived, and annotations, MLflow gives a complete picture of the data science step of the workflow, as sketched below.
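A minimal sketch of MLflow tracking and registration; the experiment name, parameters, metric value, and registered model name are illustrative, and the stand-in model exists only to demonstrate the lifecycle steps.

```python
# MLflow tracking and model-registry sketch.
import mlflow
from mlflow.pyfunc import PythonModel

class EchoChatbot(PythonModel):
    """Stand-in model so the lifecycle steps can be demonstrated."""
    def predict(self, context, model_input):
        return model_input

mlflow.set_experiment("rag-chatbot-eval")

with mlflow.start_run(run_name="judge-gpt35-few-shot"):
    mlflow.log_param("judge_model", "gpt-3.5-turbo")
    mlflow.log_param("grading_scale", "1-5")
    mlflow.log_metric("human_agreement", 0.82)  # illustrative value

    # Registering the model creates a registry entry with versioning
    # and stage transitions (Staging -> Production -> Archived).
    mlflow.pyfunc.log_model(
        artifact_path="chatbot",
        python_model=EchoChatbot(),
        registered_model_name="rag-chatbot",
    )
```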
Business intelligence, whether correlations, dashboards, or reports, is served by analytical SQL queries and materialized data views. For each query, different views of the data can be compared and the best representation promoted directly to a dashboard, which cuts down on redundancy and cost. Even a library like Plotly is sufficient for comprehensive, appealing visualizations, as in the sketch below. Data analysis is inherently read-only and independent of all model training and testing activity.
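A minimal read-only analysis sketch, reusing the `spark` session from the ingestion example: a SQL view over an assumed Silver table of evaluation scores feeds a Plotly chart; the table path and column names are illustrative.

```python
# Read-only analysis sketch: SQL view -> Plotly dashboard chart.
import plotly.express as px

# A view for the dashboard query; nothing is written back to the
# tables used for model training or testing.
spark.sql("""
    CREATE OR REPLACE TEMP VIEW eval_summary AS
    SELECT judge_model,
           AVG(correctness)       AS avg_correctness,
           AVG(comprehensiveness) AS avg_comprehensiveness,
           AVG(readability)       AS avg_readability
    FROM delta.`/lake/silver/eval_scores`
    GROUP BY judge_model
""")

summary = spark.sql("SELECT * FROM eval_summary").toPandas()
fig = px.bar(
    summary,
    x="judge_model",
    y=["avg_correctness", "avg_comprehensiveness", "avg_readability"],
    barmode="group",
    title="Judge scores by model",
)
fig.show()
```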
Together, a robust data environment with smooth ingestion, processing, and retrieval for the whole team; data collection and cleaning pipelines deployed on Delta tables; model development and deployment organized with MLflow; and powerful analytical dashboards complete the gamut for launching the project.