Measuring the occupational impact of generative AI with 200,000 Microsoft Bing Copilot conversations


What 200,000 Copilot chats reveal about work

Drawing on 200,000 anonymized U.S. conversations with Microsoft Bing Copilot in 2024, this study measures where generative AI meaningfully assists or performs real work activities. It introduces an AI applicability score that blends activity coverage, task success, and scope to compare occupations, revealing strong alignment with knowledge work and communication-heavy roles. Results broadly validate prior forecasts while highlighting that assistance is widespread, direct performance is narrower, and downstream economic effects remain uncertain.

Key points

  • The analysis covers 200,000 anonymized Microsoft Bing Copilot conversations (U.S., Jan–Sep 2024) split into Copilot-Uniform (~100k) and Copilot-Thumbs (~100k) datasets.
  • Work activities are classified with O*NET 29.0 Intermediate Work Activities (332 IWAs) using a GPT-4o pipeline validated against human annotators.
  • An AI applicability score combines coverage (≥0.05% activity share threshold), task completion (LLM classifier correlated with thumbs feedback at r > 0.75), and scope of impact (six-point Likert).
  • Users most often seek information gathering and writing; AI most often responds by advising, teaching, explaining, and providing information.
  • In 40% of conversations, user goals and AI actions are disjoint, underscoring a split between human execution and AI coaching.
  • Common IWAs earn 50%+ positive feedback; writing, researching, and purchasing fare well, while data analysis and visual design lag.
  • Applicability is highest for knowledge and communication roles; Interpreters and Translators lead with 98% overlap, followed by writers/editors, sales, customer service, programming, and clerical roles.
  • Physical, machinery, and manual roles (e.g., nursing assistants, plant operators, dishwashers, roofers) show the lowest applicability for LLM-style tools.
  • The score correlates with Eloundou et al.’s E1 predictions at r = 0.73 by occupation and r = 0.91 by SOC major group, with notable divergences (e.g., higher-than-expected impact for market research analysts and CNC tool programmers).
  • Socioeconomic links are modest: applicability weakly correlates with wages (employment-weighted r = 0.07) and is higher for bachelor’s-degree jobs (mean 0.27 vs. 0.19), especially on the AI action side.
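
To make the idea of an applicability score concrete, here is a minimal sketch of how coverage, completion, and scope could be blended per occupation. It is an illustration under stated assumptions, not the study's actual formula: the multiplicative form, the normalization of weights, the rescaling of scope to the 0..1 range, and every name (`ActivityStats`, `applicability`, the example activities and numbers) are hypothetical. Only the 0.05% coverage threshold comes from the summary above.

```python
from dataclasses import dataclass

# 0.05% activity share threshold, as stated in the key points above.
COVERAGE_THRESHOLD = 0.0005


@dataclass
class ActivityStats:
    """Per-activity measurements. All field names are hypothetical."""
    share: float       # fraction of conversations involving this activity
    completion: float  # estimated task completion rate, in 0..1
    scope: float       # scope-of-impact rating, assumed rescaled to 0..1


def applicability(weights: dict[str, float],
                  stats: dict[str, ActivityStats]) -> float:
    """Blend coverage, completion, and scope into one occupation score.

    `weights` maps each activity of an occupation to its relative
    importance (an assumed input). Activities below the coverage
    threshold, or missing from `stats`, contribute nothing.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for activity, weight in weights.items():
        s = stats.get(activity)
        if s is None or s.share < COVERAGE_THRESHOLD:
            continue  # not covered often enough to count
        score += (weight / total) * s.completion * s.scope
    return score


# Made-up numbers for a made-up occupation, for illustration only.
example_weights = {"translate text": 0.6, "edit documents": 0.3, "operate vehicles": 0.1}
example_stats = {
    "translate text": ActivityStats(share=0.02, completion=0.8, scope=0.5),
    "edit documents": ActivityStats(share=0.01, completion=0.5, scope=0.5),
    "operate vehicles": ActivityStats(share=0.0001, completion=0.9, scope=0.9),
}
example_score = applicability(example_weights, example_stats)
print(round(example_score, 3))
```

In this toy case the vehicle activity falls below the coverage threshold and is ignored, so the score comes only from the two covered activities (roughly 0.315). A multiplicative blend like this means an activity must be frequent, successful, and broad in scope at the same time to move the score, which matches the summary's point that assistance is widespread while direct performance is narrower.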

Takeaway

If your job leans on gathering, writing, or explaining information, congratulations—you’ve basically been training Copilot for it. Start by offloading research, drafting, and customer responses, then iterate with clear prompts, thumbs feedback, and a reality check on scope (AI makes a great coach, not always a closer). If you’re in hands-on or machine-heavy roles, don’t hold your breath for chatbot magic—focus on adjacent digital tasks and communication workflows; it’s not glamorous, but neither is manually summarizing emails for the hundredth time.
