When Amazon rolled out Rufus, its AI shopping assistant trained on the entire Amazon catalog, it didn't launch in one market. It launched in nine at once. And almost immediately, the research team started running into cultural landmines that no amount of model tuning could fix on its own.
In India, customers didn't realize Rufus was an AI. They thought it was "a shopkeeper sitting behind a computer." In Japan, customers told researchers that Rufus wasn't polite enough. In France, informal phrasing was read as disrespectful. In Italy, customers wanted more emojis and more animation. Amazon's Niketa Jhaveri presented on stage at Quirk's Chicago 2026 event last week, covering how her team navigates these challenges. She made a compelling case for what the research function actually looks like in an AI-native company.
Niketa's clearest framing was also her most quotable: "We are not only designing for AI, but also designing with AI."
Rufus is the product. But Rufus is also a research subject, one that sees 90 million daily Amazon users interacting with it across languages, cultures, product catalogs and expectations. Researchers are designing the experience Rufus delivers and shaping the intelligence behind it.
That shift reframes what research is for. In Niketa's words: "Research doesn't just inform AI. It teaches it. Every study, every insight, shapes how AI learns from people: their behavior, ethics and intent."
The part of the talk that drew the biggest reactions was the list of things Amazon's team caught only because they tested in nine markets in parallel.
A rice cooker in the U.S. is not the same product as a rice cooker in Japan: different hardware, different use case, different language around it. A friendly, casual tone in Rufus read as cold in Japan and disrespectful in France. Italy wanted more expressiveness and visual motion. India needed Rufus to more clearly signal that it was AI.
None of those are technical failures of the model. They're human-system failures, gaps that only show up when you observe real people in real markets. "Scaling AI responsibility isn't just a technical problem," Niketa said. "It's actually a human system problem."
It's also why she described the research function as three things at once: "the connector and the calibrator and a co-pilot — ensuring global consistency while respecting local nuances."
Niketa walked through Amazon's current research workflow: classic stages (problem framing, customer insight, qual and quant, prototype, continuous feedback, synthesis) stitched together with modern tools. What's different isn't the stages, it's the speed.
The old way, she said, was "weeks of synthesizing insights, sitting with the spreadsheet, one market at a time. By the time you actually have your insights, it's already old." The new way keeps up with product managers shipping the next Rufus feature tomorrow. But she was clear-eyed: "We still have to have human judgment in the picture to make sure the AI is actually spitting out the right analysis."
To translate all that into something product teams can use, her team runs five frameworks in parallel: conversation patterns, product features, sentiment, customer pain points, and localization. Her favorite is pain points. "The why behind the behavior," she called it, which is the part teams most often underweight.
The researcher's role is evolving: no longer a simple observer running usability tests, the researcher is now a strategic decision maker in the organization.
Niketa's talk said the quiet part out loud. Every cultural miss Amazon caught across its nine markets was really a data miss. Without high-quality research from real consumers in real markets, those gaps would have been baked into the model, and no amount of post-launch tuning would have surfaced them.
These AI models behave exactly as the inputs shape them to, which means the biggest lever on AI performance isn't another algorithm, it's the consumer truth going in. Messy, real, lived human signal from the right people in the right moments is what separates AI that represents consumers from AI that flattens them into an average.
That's the work insights teams own now. Whether you're training a model, calibrating one, or buying AI-powered tools off the shelf, the question underneath everything is the same: how confident are you in the consumer data doing the teaching?
Niketa put it best: "The future of AI innovation won't be defined by algorithms alone, but by the questions researchers ask."