Scores
Scores in the OuterBox AI Observability platform provide a systematic way to measure the quality and effectiveness of your P3 AI Web Assistant's responses. They help you quantify performance, track improvements over time, and identify areas that need attention.
What Are Scores?
Scores are evaluation metrics that can be applied to traces, sessions, or specific observations. They allow you to:
- Measure the quality of your assistant's responses
- Track user satisfaction
- Evaluate technical performance
- Compare effectiveness across different types of inquiries
Think of scores as your quality measurement system, providing objective metrics to evaluate how well your assistant is performing.
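For a concrete sense of what a score looks like when attached to a trace, session, or observation, here is a minimal sketch. The `Score` record and its field names (`target_id`, `name`, `value`, `source`) are illustrative assumptions for this example, not the platform's actual schema or SDK.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Score:
    """Hypothetical score record attached to a trace, session, or observation.

    Field names are illustrative assumptions, not the platform's actual schema.
    """
    target_id: str                  # ID of the trace, session, or observation being scored
    name: str                       # what is being measured, e.g. "user_satisfaction"
    value: Union[float, str, bool]  # numeric, categorical, or boolean value
    source: str = "user_feedback"   # where the score came from (see "Sources of Score Data")

# Example: a 1-5 satisfaction rating attached to a single trace
satisfaction = Score(target_id="trace_abc123", name="user_satisfaction", value=4.0)
```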
Types of Scores
The OuterBox AI Observability platform supports several types of scores:
Numeric Scores
Quantitative ratings on a scale, such as:
- User satisfaction ratings (1-5 stars)
- Response accuracy percentages (0-100%)
- Technical performance metrics (response time in seconds)
Categorical Scores
Qualitative assessments using predefined categories, such as:
- Response quality (Excellent, Good, Fair, Poor)
- Resolution status (Resolved, Partially Resolved, Unresolved)
- User intent classification (Product Inquiry, Technical Support, Pricing Question)
Boolean Scores
Simple yes/no evaluations, such as:
- Whether the query was answered correctly
- Whether the user received the information they needed
- Whether a follow-up was required
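The three score types map naturally onto different value representations. The sketch below reuses the hypothetical `Score` record from above and shows one illustrative example of each type; the specific score names and category labels are assumptions, not platform defaults.

```python
# Numeric: a 1-5 star satisfaction rating
numeric_score = Score(target_id="trace_abc123", name="user_satisfaction", value=4.0)

# Categorical: resolution status drawn from a predefined set of labels
categorical_score = Score(
    target_id="trace_abc123",
    name="resolution_status",
    value="Partially Resolved",  # one of: Resolved, Partially Resolved, Unresolved
)

# Boolean: was the query answered correctly?
boolean_score = Score(target_id="trace_abc123", name="answered_correctly", value=True)
```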
How to Use Scores Effectively
Accessing Scores
- Navigate to the "Scores" section in the left sidebar
- View aggregated score data across all interactions
- Filter scores by type, time period, or other attributes
Analyzing Score Data
When examining scores, you can:
- Track average scores over time to measure improvement
- Compare scores across different types of inquiries
- Identify patterns in low or high scores
- Correlate scores with specific product areas or question types
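If you can export score data from the platform, this kind of analysis is easy to reproduce outside the UI. The sketch below assumes a CSV export with `timestamp`, `name`, `value`, and `inquiry_type` columns; the export format and column names are assumptions for this example.

```python
import pandas as pd

# Assumed export format: one row per numeric score, with a timestamp,
# the score name, its value, and the type of inquiry it relates to.
scores = pd.read_csv("scores_export.csv", parse_dates=["timestamp"])

satisfaction = scores[scores["name"] == "user_satisfaction"]

# Track average satisfaction over time (weekly) to measure improvement
weekly_avg = satisfaction.resample("W", on="timestamp")["value"].mean()

# Compare average satisfaction across different types of inquiries
by_inquiry = satisfaction.groupby("inquiry_type")["value"].mean().sort_values()

print(weekly_avg.tail())
print(by_inquiry)
```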
Common Use Cases for Scores
- Quality Monitoring: Track the overall quality of your assistant's responses
- Improvement Measurement: Quantify the impact of training or configuration changes
- Problem Area Identification: Pinpoint specific topics or question types where your assistant struggles
- User Satisfaction Tracking: Monitor how satisfied users are with the assistance they receive
Sources of Score Data
Scores can come from several sources:
User Feedback
Direct ratings or feedback provided by users after interacting with your assistant
Human Evaluation
Manual review and scoring of conversations by your team members
Automated Evaluation
Algorithmic assessment of response quality based on predefined criteria
System Metrics
Technical performance measurements like response time or resource usage
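As a hedged illustration of automated evaluation, the sketch below scores a response against one simple predefined criterion (coverage of expected keywords) and emits it using the hypothetical `Score` record from earlier. The criterion, function name, and field names are assumptions for this example, not built-in platform behavior.

```python
def evaluate_response(trace_id: str, response: str, expected_keywords: list[str]) -> Score:
    """Toy automated evaluator: percentage of expected keywords found in the response.

    Keyword coverage is only an illustrative criterion; real automated evaluation
    would apply whatever predefined criteria your team has agreed on.
    """
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    coverage = hits / len(expected_keywords) if expected_keywords else 0.0
    return Score(
        target_id=trace_id,
        name="keyword_coverage",
        value=round(coverage * 100, 1),  # expressed as a 0-100% metric
        source="automated_evaluation",
    )

# Example usage
score = evaluate_response(
    "trace_abc123",
    "Our premium plan includes priority support and a 30-day trial.",
    ["premium plan", "priority support", "trial"],
)
```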
Best Practices for Working with Scores
- Define Clear Criteria: Establish consistent standards for what constitutes good or poor performance
- Collect Diverse Metrics: Use a combination of user feedback, human evaluation, and automated metrics
- Track Trends Over Time: Focus on patterns and changes rather than individual scores
- Act on Insights: Use score data to guide specific improvements to your assistant
Setting Up Score Dashboards
To get the most value from scores:
- Create custom dashboards focusing on your most important score metrics
- Set up regular reports to track score trends over time
- Establish thresholds or targets for key performance indicators
- Configure alerts for when scores fall below acceptable levels
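How dashboards, reports, and alerts are configured depends on the platform's UI, but the underlying threshold check is simple. The sketch below, using the same assumed CSV export as the analysis example above, flags a weekly average that falls below an example target.

```python
import pandas as pd

SATISFACTION_TARGET = 4.0  # example threshold; set targets that match your own KPIs

# Same assumed export format as in the earlier analysis sketch
scores = pd.read_csv("scores_export.csv", parse_dates=["timestamp"])
satisfaction = scores[scores["name"] == "user_satisfaction"]
weekly_avg = satisfaction.resample("W", on="timestamp")["value"].mean()

latest = weekly_avg.iloc[-1]
if latest < SATISFACTION_TARGET:
    # In practice this is where you would trigger an alert (email, Slack, webhook, ...)
    print(f"Alert: weekly average satisfaction {latest:.2f} is below target {SATISFACTION_TARGET}")
```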
Scores provide the quantitative foundation for measuring and improving your P3 AI Web Assistant's performance, helping you deliver consistently high-quality experiences to your users.