Grok 4.1


You can access Grok models on Definable AI.

We're thrilled to announce that Grok 4.1 is now available on Definable AI. The latest model from xAI brings major improvements to real-world usability, with especially strong performance in creative, emotional, and collaborative interactions. Grok 4.1 is remarkably perceptive to nuanced intent, engaging to talk to, and keeps a coherent personality, all while preserving the razor-sharp intelligence and reliability that define the Grok family.

Advanced Training for Human-Centric AI

To achieve these gains, xAI used the same large-scale reinforcement learning infrastructure that powered Grok 4, applying it to optimize style, personality, helpfulness, and alignment. To optimize for reward signals that cannot be verified automatically, xAI developed new methods that use frontier agentic reasoning models as reward models, which evaluate and iterate on responses autonomously and at scale.
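The post does not describe xAI's reward pipeline in detail. As a rough illustration of the general idea of using a model as a grader for qualities that cannot be checked automatically, the toy sketch below scores a group of candidate responses with a judge function and normalizes the scores within the group. The group-relative normalization is a common scheme swapped in for illustration, not xAI's stated method, and `judge_rewards` and `toy_judge` are hypothetical names; in practice the judge would be a reasoning model, not a length heuristic.

```python
from statistics import mean, pstdev
from typing import Callable, List


def judge_rewards(prompt: str, candidates: List[str],
                  judge: Callable[[str, str], float]) -> List[float]:
    """Score each candidate with a judge, then normalize within the group.

    `judge` is a hypothetical stand-in for a reasoning model used as a
    reward model; it returns a scalar quality score for (prompt, response).
    """
    raw = [judge(prompt, c) for c in candidates]
    mu = mean(raw)
    sigma = pstdev(raw)
    if sigma == 0:
        # Every candidate scored the same, so the group carries no preference signal.
        return [0.0 for _ in raw]
    # Group-relative reward: above-average responses get positive values.
    return [(r - mu) / sigma for r in raw]


def toy_judge(prompt: str, response: str) -> float:
    # Illustration only: this toy judge simply prefers longer answers.
    return float(len(response))


rewards = judge_rewards(
    "Comfort a friend",
    ["ok", "I'm sorry.", "That sounds hard; I'm here."],
    toy_judge,
)
print(rewards)
```

Normalizing within each group means the policy is rewarded for being better than its own alternatives on the same prompt, rather than for matching an absolute judge scale.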

Proven Performance Through Silent Rollout

Silent Rollout Period: November 1–14, 2025

xAI conducted a progressive silent rollout of preliminary Grok 4.1 builds, serving them to a growing share of production traffic across grok.com, X, and the mobile apps. Throughout the two-week period, xAI ran continuous blind pairwise evaluations on live traffic.
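xAI has not published how traffic was allocated during the rollout. One common way to run a progressive rollout is deterministic hash bucketing, sketched below under that assumption: each user maps to a stable bucket, and the rollout percentage is raised over time. The salt, the `in_rollout` function, and the ramp percentages are all hypothetical.

```python
import hashlib


def in_rollout(user_id: str, rollout_percent: float,
               salt: str = "grok-4.1-preview") -> bool:
    """Deterministically decide whether a user is in the rollout cohort.

    Hashing salt + user_id gives each user a stable bucket in [0, 10000),
    so raising rollout_percent only adds users; nobody already enrolled drops out.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < rollout_percent * 100  # e.g. 5.0% -> buckets 0..499


# Hypothetical ramp: widen exposure step by step over the rollout window.
for pct in (1.0, 5.0, 25.0):
    share = sum(in_rollout(f"user-{i}", pct) for i in range(10_000)) / 10_000
    print(f"{pct:>5.1f}% target -> {share:.2%} of sample enrolled")
```

Because buckets are fixed per user, each ramp step is a strict superset of the previous one, which keeps individual users' experience consistent while exposure grows.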

Results: Compared to the previous production model, Grok 4.1 was preferred 64.78% of the time.
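The post reports only the headline preference rate, not the number of comparisons or how ties were handled. As an illustration, a blind pairwise win rate and its uncertainty could be computed as below; excluding ties and using a 95% Wilson score interval are assumptions, and `win_rate` and `wilson_interval` are hypothetical helpers.

```python
import math
from typing import List, Tuple


def win_rate(outcomes: List[str]) -> float:
    """Share of decisive comparisons won by the candidate model.

    Each outcome is 'win', 'loss', or 'tie' from the candidate's point of view;
    ties are excluded (an assumption, since the post does not say).
    """
    wins = outcomes.count("win")
    losses = outcomes.count("loss")
    if wins + losses == 0:
        raise ValueError("no decisive comparisons")
    return wins / (wins + losses)


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half
```

Given the actual number of decisive comparisons, `wilson_interval` would show how tightly a figure like 64.78% is pinned down.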

State-of-the-Art General Capability

Grok 4.1, now accessible on Definable AI, sets a new high mark in blind human preference evaluations.

In LMArena's Text Arena, Grok 4.1 Thinking (code name: quasarflux) holds the #1 overall position at 1483 Elo, a commanding 31-point margin over the highest-ranked non-xAI model. Grok 4.1 in non-reasoning mode (code name: tensor), which uses zero thinking tokens to respond immediately, ranks #2 at 1465 Elo. Remarkably, non-thinking Grok 4.1 outranks the full-reasoning configuration of every other model on the public leaderboard. This is a substantial leap from Grok 4, which ranked #33 overall.
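Elo gaps translate into expected head-to-head preference rates. Assuming the standard logistic Elo formula with base 10 and a 400-point scale, the sketch below converts the gaps quoted above into win probabilities; `expected_preference` is a hypothetical helper name.

```python
def expected_preference(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 31-point margin over the highest-ranked non-xAI model.
print(f"{expected_preference(1483, 1483 - 31):.3f}")  # prints 0.544
# Grok 4.1 Thinking (1483) vs. non-thinking Grok 4.1 (1465).
print(f"{expected_preference(1483, 1465):.3f}")  # prints 0.526
```

Under this model, a 31-point lead means the leader would be expected to win roughly 54% of head-to-head votes against that model.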

Emotional Intelligence Excellence

To assess progress in personality and interpersonal capabilities, Grok 4.1 was evaluated on EQ-Bench3, an LLM-judged benchmark of active emotional intelligence: understanding, insight, empathy, and interpersonal skill. The test consists of 45 challenging roleplay scenarios, most built from pre-written prompts spanning 3 turns. The benchmark grades model responses against multiple criteria and runs pairwise comparisons to compute a normalized Elo rating for leaderboard ranking.

Scores were computed with the official benchmark repository using default sampling parameters, the prescribed judge (Claude 3.7 Sonnet), and no system prompt, following the benchmark's standard protocol.
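To show how pairwise judge verdicts can become a leaderboard rating, here is a minimal sketch using plain sequential Elo updates, with normalization done by pinning a reference model to a fixed score. Both choices are assumptions swapped in for illustration; EQ-Bench3's own solver and normalization may differ, and `elo_ratings` and `normalize` are hypothetical names.

```python
from typing import Dict, List, Tuple


def elo_ratings(matches: List[Tuple[str, str, float]], k: float = 16.0,
                start: float = 1500.0, passes: int = 20) -> Dict[str, float]:
    """Fit ratings from pairwise judge verdicts with repeated Elo updates.

    Each match is (model_a, model_b, score_a): 1.0 if the judge preferred A,
    0.0 if it preferred B, 0.5 for a tie. Repeated passes let ratings settle,
    though match order still has some effect.
    """
    ratings: Dict[str, float] = {}
    for _ in range(passes):
        for a, b, score_a in matches:
            ra = ratings.setdefault(a, start)
            rb = ratings.setdefault(b, start)
            expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
            delta = k * (score_a - expected_a)
            ratings[a] = ra + delta  # zero-sum: A gains what B loses
            ratings[b] = rb - delta
    return ratings


def normalize(ratings: Dict[str, float], anchor: str,
              anchor_value: float = 1500.0) -> Dict[str, float]:
    """Shift all ratings so the anchor model sits at anchor_value."""
    shift = anchor_value - ratings[anchor]
    return {model: r + shift for model, r in ratings.items()}


matches = [("A", "B", 1.0), ("A", "B", 1.0), ("A", "B", 1.0), ("A", "B", 0.0)]
print(normalize(elo_ratings(matches), anchor="B"))
```

Anchoring to a reference model keeps scores comparable across runs, since raw Elo values only have meaning relative to each other.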

[EQ-Bench3 performance data showing Grok 4.1's superior emotional intelligence scores]

Here's an example of how Grok 4.1 responds to an emotional prompt:

[Example demonstration of Grok 4.1's emotionally intelligent response]

Creative Writing Mastery

Grok 4.1 was also measured on the Creative Writing v3 benchmark, in which models respond to 32 distinct writing prompts over 3 iterations. As with EQ-Bench3, scoring combines rubric grades with a normalized Elo derived from pairwise model battles.
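With 32 prompts run 3 times each, a model produces 32 × 3 = 96 rubric-graded responses. The sketch below shows one way those grades could be rolled into a single rubric score, assuming a simple mean per prompt and then across prompts; the benchmark's actual aggregation may differ, and `rubric_score` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List

N_PROMPTS = 32
N_ITERATIONS = 3


def rubric_score(scores: Dict[str, List[float]]) -> float:
    """Average rubric grades per prompt across iterations, then across prompts.

    `scores` maps a prompt id to its per-iteration rubric grades.
    """
    if len(scores) != N_PROMPTS:
        raise ValueError(f"expected {N_PROMPTS} prompts, got {len(scores)}")
    per_prompt = []
    for prompt_id, runs in scores.items():
        if len(runs) != N_ITERATIONS:
            raise ValueError(f"{prompt_id}: expected {N_ITERATIONS} runs, got {len(runs)}")
        per_prompt.append(mean(runs))  # smooths sampling variance per prompt
    return mean(per_prompt)
```

Keeping the per-prompt means also makes it easy to spot prompts where a model's output quality varies widely between iterations.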

[Creative Writing v3 benchmark results]

Here are examples of how Grok 4.1 responds to creative writing prompts:

Reduced Hallucinations for Greater Reliability