Question Collections, Monitoring, and Testing Best Practices
Seamless workflow:
- Identify issues: Use the monitoring dashboard to spot problematic classifications
- Investigate scope: Click "View All Questions" to see comprehensive question history
- Select related questions: Use filters and multi-select to identify affected questions
- Create test collections: Group questions into themed collections for follow-up testing
- Validate fixes: Use collections for systematic verification after issue resolution
Example scenario:
- Monitoring shows 15 questions classified as "Authentication Issues"
- Click into the classification to see all affected questions
- Multi-select the questions that should be resolved by your upcoming fix
- Create "Auth Fix Validation - March 2024" collection
- Schedule regular tests on this collection to verify the fix
Advanced Filtering for Collection Building
Groups-based collections:
- Filter questions by user groups (admin, UAT, training groups)
- Create collections specific to user privilege levels
- Test how different user types experience your assistant
Time-based collections:
- Use date filtering to focus on recent issues
- Build collections from specific incident time periods
- Compare question patterns before and after changes
Classification-based collections:
- Build collections from questions with specific classifications
- Create validation suites for particular issue types
- Organize testing around functional areas or problem categories
Collection Organization Best Practices
Naming Conventions
Descriptive, purposeful names:
- ✅ "Authentication Issues - March 2024"
- ✅ "Post-Login-Fix Validation"
- ✅ "UAT Group Regression Tests"
- ❌ "Test Collection 1"
- ❌ "Random Questions"
Include context that identifies:
- Issue type or functional area
- Time period or version relevance
- User group or testing scope
- Purpose (validation, regression, exploration)
Strategic Collection Types
Issue-Specific Collections:
- Group questions by the type of problem they represent
- Useful for focused testing after fixes
- Easy to schedule for regular regression testing
User-Journey Collections:
- Organize questions that represent complete user workflows
- Test end-to-end experiences across your assistant
- Validate that complex interactions work as expected
Validation Collections:
- Questions specifically chosen to verify fixes or improvements
- Pre and post-fix comparison sets
- Critical path testing for important functionality
Exploratory Collections:
- Questions that represent edge cases or unusual requests
- Help identify new potential issues
- Support ongoing assistant improvement efforts
Context-Aware Testing
When leveraging the full context preservation feature:
- Test both standalone questions and context-dependent follow-ups (see the sketch after this list)
- Verify that context is properly maintained across collection executions
- Use this feature to build comprehensive conversation test suites
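As a hedged illustration of that distinction (the question text and structure below are invented for the example, not taken from the product), a conversation test suite might pair a standalone opening question with follow-ups that only make sense given the earlier turns:

```python
# Illustrative only: a simple way to represent a conversation test case where
# later questions depend on the context established by earlier turns.
conversation_suite = [
    {
        "name": "Sales drill-down",
        "turns": [
            "How did sales perform in March 2024?",       # standalone question
            "How does that compare to the prior month?",  # depends on turn 1
            "Break that down by region.",                 # depends on turns 1-2
        ],
    },
]
```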
Testing Multi-Turn Conversations
With full context preservation, you can now effectively test complex conversation flows:
Steps to Implement
- Engage in a multi-turn conversation with the assistant
- Add follow-up questions from deep in the conversation to test collections
- Run tests to verify context-dependent responses work correctly
Benefits
- Ensures conversational continuity
- Tests real-world usage patterns
- Validates context retention
Building Test Suites from Monitoring
Leverage monitoring insights to create targeted test collections:
Steps to Implement
- Identify problematic questions or patterns in monitoring
- Use multi-select to gather related questions
- Create focused test collections for specific issues
- Run regular tests to validate fixes
Benefits
- Proactive issue detection
- Systematic validation of improvements
- Organized approach to quality assurance
Using Prompts to Build Custom Evaluations
Associating Evaluation Prompts to Question Collections
When an evaluation prompt is associated with a question collection, any variables defined within the prompt are automatically added to each question in that collection.
For example, suppose you want to verify that time periods are selected correctly in a set of questions. You could create an evaluation prompt that asserts expectations about the results. In this prompt, the expected values would be defined as a variable. Once the prompt is associated with the question collection, that variable becomes available for each question, allowing you to specify different expected values per question.
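As a sketch of what that might look like (the variable name expected_time_period and the placeholder substitution below are assumptions, not a documented prompt format), each question in the collection would supply its own value for the variable:

```python
# Illustrative only: the variable name "expected_time_period" and the
# {placeholder} substitution are assumptions, not a documented prompt format.
EVALUATION_PROMPT = (
    "Check the answer below. It should filter the data to the time period "
    "'{expected_time_period}'. Return a JSON object with an 'explanation' "
    "string and a 'pass' boolean."
)

# Once the prompt is associated with the collection, each question can supply
# its own value for the variable.
expected_values = {
    "How did sales trend last quarter?": "last quarter",
    "Show revenue for March 2024": "March 2024",
}

for question, period in expected_values.items():
    print(EVALUATION_PROMPT.format(expected_time_period=period))
```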
Expected Output
The final output returned from the prompt must be a JSON object that matches the following schema:
- explanation: string
- pass: boolean
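For example, a prompt run might produce results shaped like the following (minimal illustrations of the required shape, not output copied from the product):

```python
import json

# Minimal examples of the required shape: an explanation string and a pass boolean.
passing_result = {"explanation": "The answer filtered to March 2024 as expected.", "pass": True}
failing_result = {"explanation": "The answer used the full year instead of March 2024.", "pass": False}

print(json.dumps(passing_result))
print(json.dumps(failing_result))
```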
Available Context
The full chat_entry object is available to assert against. This includes answers, visualizations, timing, and more. See the AnswerRocket SDK for further documentation.
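As a rough sketch of how an assertion against chat_entry could be structured and mapped to the required output schema (treating chat_entry as a dict and reading an "answer_text" field are assumptions made for illustration; the actual structure is documented in the AnswerRocket SDK):

```python
# Hypothetical sketch: treating chat_entry as a dict and reading an
# "answer_text" key are assumptions; see the AnswerRocket SDK docs for the
# actual chat_entry structure.
def evaluate(chat_entry: dict, expected_time_period: str) -> dict:
    answer_text = str(chat_entry.get("answer_text", ""))
    passed = expected_time_period.lower() in answer_text.lower()
    return {
        "explanation": f"Expected time period '{expected_time_period}' "
                       f"{'was' if passed else 'was not'} referenced in the answer.",
        "pass": passed,
    }

# Example usage with a stand-in chat_entry object.
print(evaluate({"answer_text": "Sales for March 2024 grew 12%."}, "March 2024"))
```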