The challenge
Nova Retail, a D2C fashion brand doing roughly ₹60 crore ARR across their own site, Amazon, Flipkart, and Myntra, was drowning in operational overhead. A 14-person support team was handling 2,400+ tickets per week — order status queries, return requests, sizing questions, refund follow-ups, abandoned cart outreach, marketplace complaint resolution.
The math was breaking:
- Average first response time: 9 hours
- CSAT trending down from 4.4 to 3.7 over two quarters
- Support cost was growing faster than revenue
- Senior agents were spending 60% of their time on queries a well-trained intern could answer
- Every marketplace (Amazon, Flipkart, Myntra) had its own ticketing system, its own SLAs, and its own penalty structure for missed response times
Their previous attempt — a rule-based chatbot — handled 11% of queries and actively annoyed the other 89%.
Our approach
Phase 1: Workflow archaeology (Week 1-2)
Before designing any agent, we reverse-engineered what the humans actually did. We sat with Nova's support team for two weeks and traced 400 real tickets end-to-end:
- What information did the agent need, and where did they get it?
- What tools did they touch (Shopify admin, Unicommerce, courier APIs, marketplace seller panels, Gorgias)?
- What decisions required judgment vs. followed a repeatable pattern?
- Where did escalations actually come from?
We mapped the result into a decision tree. 73% of tickets followed deterministic paths. 19% needed light judgment. 8% genuinely required a human.
That 73% became the scope for agent automation.
Phase 2: Agent architecture (Week 2-5)
We did not build one monolithic agent. We built a small team of specialists, each with a narrow scope and strict guardrails:
- Triage Agent — Receives every incoming ticket (email, WhatsApp, Instagram DM, marketplace message), classifies intent, and routes to the right specialist. Runs on Claude Sonnet 4.6, with intent classification fine-tuned against 8,000 historical Nova tickets.
- Order Status Agent — Handles "where is my order" queries. Has read access to Shopify, Unicommerce, and four courier APIs. Can proactively detect stuck shipments and reach out before the customer asks.
- Returns Agent — Walks customers through the return flow, validates return eligibility against SKU-specific policies, generates pickup requests, and updates the ledger. Full closed-loop — customer message to courier dispatch in under 90 seconds.
- Sizing Agent — Answers sizing queries with access to the product fit database and the customer's previous order history. Trained on actual fit complaints so it knows which SKUs run small and which run large.
- Cart Recovery Agent — Runs asynchronously. Reviews abandoned carts hourly, segments by customer value and likely objection, and dispatches personalized WhatsApp and email follow-ups. Generates its own copy per segment rather than using templates.
- Marketplace Agent — Speaks four dialects: Amazon, Flipkart, Myntra, and Ajio. Knows each platform's SLA windows, penalty structures, and acceptable response patterns. Prevents the weekly marketplace fines that were quietly costing Nova ₹3-4 lakhs a month.
- Escalation Agent — The one that knows what it doesn't know. Its whole job is deciding when to hand off to a human, with the complete context package already assembled.
Agents communicate through a shared state layer (Postgres + Redis) rather than calling each other directly. Every agent action is logged with the full reasoning trace.
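The handoff-via-shared-state pattern can be sketched as follows. This is an illustrative sketch only: it uses sqlite3 and an in-process deque as stand-ins for Postgres and Redis, and every table, queue, and field name here is hypothetical rather than Nova's actual schema.

```python
import json
import sqlite3
import uuid
from collections import deque
from datetime import datetime, timezone

# Stand-ins for the real infrastructure: sqlite3 in place of Postgres,
# an in-process deque per specialist in place of a Redis list.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE agent_actions (
    id TEXT PRIMARY KEY, ticket_id TEXT, agent TEXT,
    action TEXT, reasoning TEXT, payload TEXT, at TEXT)""")

queues = {name: deque() for name in
          ("order_status", "returns", "sizing", "marketplace", "escalation")}

def log_action(ticket_id, agent, action, reasoning, payload):
    """Every agent action lands in the audit table with its reasoning trace."""
    db.execute("INSERT INTO agent_actions VALUES (?, ?, ?, ?, ?, ?, ?)",
               (str(uuid.uuid4()), ticket_id, agent, action, reasoning,
                json.dumps(payload), datetime.now(timezone.utc).isoformat()))

def hand_off(ticket_id, from_agent, to_queue, context):
    """Agents never call each other directly; they enqueue work with full context."""
    log_action(ticket_id, from_agent, f"route:{to_queue}",
               context["reasoning"], context)
    queues[to_queue].append({"ticket_id": ticket_id, **context})

# Example: triage routes a "where is my order" ticket to the Order Status Agent.
hand_off("T-1042", "triage", "order_status",
         {"intent": "order_status",
          "reasoning": "customer asks for tracking update",
          "channel": "whatsapp"})
```

The point of the indirection is that the audit log and the work queue are written in the same step, so no agent action can reach another agent without leaving a trace.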
Phase 3: Guardrails and safety (Week 4-7)
Letting autonomous agents act on customer operations is high-stakes. We built four layers of containment:
- Action whitelisting — Every tool call is explicitly scoped. The Returns Agent can create a return; it cannot issue a refund. Refunds above ₹2,000 require human confirmation. Refunds above ₹10,000 require a second human approval.
- Anomaly detection — A separate monitoring agent watches for unusual patterns (one customer getting three refunds, the same address across many orders, sudden spike in return requests). Flags for review, never auto-blocks.
- Tone and brand guardrails — Every outgoing customer message runs through a guardrail check for tone, factual grounding, and brand voice before sending. About 4% of drafts are auto-rewritten.
- Full audit trail — Every agent decision is logged with its reasoning, the tools it called, and the data it saw. Full traceability for compliance and debugging.
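The whitelist and refund-gating layer can be sketched like this. The agent names, tool names, and mapping are hypothetical; only the ₹2,000 and ₹10,000 thresholds come from the case study.

```python
# Hypothetical per-agent tool whitelist. Anything not listed is a hard block
# before the call ever reaches a real system.
ALLOWED_TOOLS = {
    "returns_agent": {"create_return", "schedule_pickup", "update_ledger"},
    "order_status_agent": {"read_order", "read_shipment"},
}

def authorize_tool_call(agent: str, tool: str) -> bool:
    """True only if the tool is in the agent's explicit whitelist."""
    return tool in ALLOWED_TOOLS.get(agent, set())

def refund_approvals_required(amount_inr: int) -> int:
    """Refund gating: 0 = autonomous, 1 = human confirmation,
    2 = second human approval (thresholds from the case study)."""
    if amount_inr > 10_000:
        return 2
    if amount_inr > 2_000:
        return 1
    return 0
```

Note that the Returns Agent's whitelist deliberately omits any refund tool, so "cannot issue a refund" is enforced structurally, not by prompting.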
Phase 4: Rollout and continuous learning (Week 6-12)
We rolled out in supervised mode first — agents drafted responses, humans approved them. Once an agent class sustained a 98%+ approval rate for two weeks, we flipped it to autonomous mode.
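The flip-to-autonomous gate is simple enough to sketch. This is an assumed formalization of the rule above (per-day approval rates, a 14-day window, a 98% threshold); the actual gating logic is not specified beyond those numbers.

```python
from datetime import date, timedelta

def ready_for_autonomy(daily_approval_rates, window_days=14, threshold=0.98):
    """daily_approval_rates maps date -> fraction of agent drafts humans
    approved unchanged. An agent class flips to autonomous mode only after
    every day in the trailing window meets the threshold."""
    latest = max(daily_approval_rates)
    window = [latest - timedelta(days=d) for d in range(window_days)]
    # A missing day counts as 0.0, so gaps in coverage block the flip.
    return all(daily_approval_rates.get(day, 0.0) >= threshold for day in window)
```

Treating a missing day as a failure is a deliberately conservative choice: an agent that simply received no review traffic should not graduate by default.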
Every week we review:
- Every escalation (why did the agent hand off? Should it have handled this?)
- Every customer complaint about an agent interaction
- A random sample of 50 agent conversations
- Drift metrics — is the agent's language pattern changing?
Findings feed back into the prompt library and the evaluation suite.
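Two pieces of the weekly review loop above lend themselves to a sketch: sampling conversations for human review, and a crude language-drift check. Both functions and the metric itself are illustrative assumptions; the case study says only that "language pattern" drift is tracked.

```python
import random

def weekly_review_sample(conversations, k=50, seed=None):
    """Uniform random sample of agent conversations for human review."""
    rng = random.Random(seed)
    return rng.sample(conversations, min(k, len(conversations)))

def drift_score(baseline_counts, current_counts):
    """Crude lexical-drift check: L1 distance between the normalized
    token-frequency distributions of agent replies, week over week.
    0.0 means identical distributions; 2.0 means fully disjoint vocabularies."""
    vocab = set(baseline_counts) | set(current_counts)
    b_total = sum(baseline_counts.values()) or 1
    c_total = sum(current_counts.values()) or 1
    return sum(abs(baseline_counts.get(w, 0) / b_total
                   - current_counts.get(w, 0) / c_total)
               for w in vocab)
```

In a setup like this, a drift score above a chosen threshold would flag the agent's prompt and evaluation suite for review rather than trigger any automatic change.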
The results
Six months post-launch:
- Tickets handled end-to-end by agents: 73% (up from the 11% rule-based chatbot)
- Average first response time: 9 hours → 42 seconds
- CSAT: 3.7 → 4.6, and rising
- Abandoned cart recovery: ₹1.8 crore recovered in 6 months (0.3% → 7.2% recovery rate)
- Marketplace SLA compliance: 67% → 99.1%; marketplace penalties down from ₹3.8L/month to under ₹20K
- Support team size: unchanged at 14 people — but they now handle the complex cases, upsell, and VIP relationships. Two were promoted to ops analyst roles working on agent performance.
- Support cost per order: down 64%
- Agent-driven revenue: ₹2.3 crore attributable (cart recovery + upsell during conversations + prevented cancellations)
Critically, there was not a single incident of an agent taking a high-impact wrong action in six months of production. The guardrails held.
Key insight
The temptation with AI agents is to build one agent that does everything. That is the wrong shape. Real leverage comes from a team of narrow agents, each with a clear scope, strict tool access, and an explicit handoff protocol. Narrow agents are easier to evaluate, easier to debug, and easier to improve. They also fail more gracefully — when one agent misbehaves, the blast radius is contained.
The second lesson: spend more time on guardrails than on the agents themselves. Anyone can prompt an LLM to be helpful. The hard work is making sure it stays helpful when it's running 24/7 with access to real customer data and real financial actions.
Client feedback
"We expected automation. What Agix built is an operations team that happens to be made of software. The agents understand our brand, our policies, and our customers better than most humans we've hired. More importantly, they know when to step back." — Founder & CEO, Nova Retail