A project I was working on just fell apart last Tuesday

I spent three months training a small language model to sort customer emails for a local business. It was working well in tests, at about 85% accuracy. Then on Tuesday, we gave it a real batch of 500 emails from the past week. It completely failed to understand sarcasm and complaints, mislabeling nearly 40% of them as positive feedback. I've spent the rest of the week fixing that mess. Has anyone found a good way to teach an AI model to catch tone, or is it still a major weak spot?

3 comments

umaanderson · 3mo ago

That 85% test accuracy was probably on clean, labeled data, not messy real emails.

david739 · 3mo ago

Why do you think my inbox is such a mess?

mark_cooper · 2mo ago

@umaanderson is right. Your 85% tests were probably just routing, not tone.
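
One rough way to check this is to hand-label a small sample of the real emails and break accuracy out by label instead of looking at a single overall number. The sketch below assumes three label names ("positive", "complaint", "sarcasm") and uses a made-up six-email sample; neither comes from the original post, and `per_label_accuracy` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict


def per_label_accuracy(true_labels, predicted_labels):
    """Return {label: share of emails with that true label that were predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical hand-labeled sample (illustrative only, not data from the post).
truth = ["positive", "positive", "positive", "complaint", "complaint", "sarcasm"]
preds = ["positive", "positive", "positive", "positive", "complaint", "positive"]

# Single overall number, which can look acceptable on its own.
overall = sum(t == p for t, p in zip(truth, preds)) / len(truth)

# Per-label breakdown, which shows where the model actually fails.
by_label = per_label_accuracy(truth, preds)

print(f"overall: {overall:.2f}")
for label, acc in by_label.items():
    print(f"{label}: {acc:.2f}")
```

If the tone-heavy labels score far below the overall figure, the headline accuracy was mostly measuring the easy cases.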