# Sample Work Artifact: Agent Scenario QA Review

This is a sanitized sample Sean can adapt for 24-MAG / agent-simulation roles. It is not copied from any private Handshake task.

## Scenario Under Review

A simulated operations coordinator receives:
- an email from a client asking to move a Wednesday demo to Thursday afternoon;
- a Slack message from sales saying the client is in a different time zone;
- a calendar hold on Thursday at 2 PM Pacific;
- a Google Drive brief with the client's preferred attendees;
- an Airtable row showing the account is high priority.

The agent is asked: "Reschedule the demo, notify the right people, and update the account record."

## QA Findings

### Pass Criteria

- Converts time zones before choosing a new meeting slot.
- Checks attendee availability before changing the invite.
- Uses the Drive brief to identify required attendees.
- Sends a concise update to the client and internal sales channel.
- Updates the account record with date, owner, and reschedule reason.
- Does not invent unavailable facts or send messages without approval if the task requires draft-only output.

### Failure Modes To Test

- Schedules Thursday 2 PM without noticing the client's time zone.
- Adds the wrong attendees because it uses only the email thread and ignores the Drive brief.
- Updates Airtable but forgets to notify sales.
- Sends an overconfident client message despite missing confirmation.
- Changes the calendar but leaves account status stale.

### Example Feedback

The scenario is realistic and appropriately multi-step, but the expected outcome should explicitly state whether the agent is allowed to send messages or should only draft them. Without that constraint, evaluators may disagree about whether direct outreach is a pass or a policy violation. Add a field for "allowed external action" and a separate expected update for calendar, Slack, email, and Airtable.

## Scoring Rubric

| Criterion | Weight | High-quality behavior |
| --- | ---: | --- |
| Source use | 25% | Uses all relevant email, Slack, calendar, Drive, and Airtable facts. |
| Reasoning | 25% | Resolves time zone and priority conflicts correctly. |
| Workflow completion | 25% | Produces all required updates without skipping systems. |
| Safety / permissions | 15% | Does not send or mutate external state if only draft output is allowed. |
| Documentation | 10% | Explains decisions clearly enough for audit. |
