Issue #5 - April 2026 | Research Office | West Virginia University

This month's theme: the agentic playbook is no longer a forecast—it's a working condition. From Penn State labor research to Anthropic's accidental code dump, the question isn't whether agents will reorganize work; it's who gets to set the terms.

Invited Talk

AI Manager or AI Managed?

Lessons from the Platform Economy for the Agentic Re-Organization of Work

Prof. Dana Calacci • Penn State University

Friday, April 24 • 10:00 AM • Zoom

Join the AI Manager or AI Managed? Zoom talk

View WVU AI Discussion Group upcoming speakers and archives

Note: In my previous newsletter I wrote Thursday instead of Friday.

Abstract

Tech executives frequently tell us that agentic AI will handle the routine while the irreducibly human skills—taste, intuition, management—remain safe. Dr. Calacci turns that narrative on its head. Drawing on large-scale empirical evidence from rideshare platform labor research, she argues we have already run this experiment on millions of workers: the pitch for platform automation was not so different from today's agentic AI pitch. Handle the logistics; leave the human work to drivers. What happened instead was task fragmentation, opaque algorithmic wage-setting, and tightened employer control. As agentic AI reshapes knowledge work, she warns, we should expect the same playbook—not the elevation of human skill, but its use as justification for deeper control. The partnerships, methods, and institutional relationships needed to hold employers accountable take years to build. We need to be building them now.

About the Speaker

Dana Calacci is an assistant professor at Penn State's College of IST, where she leads the Working Futures Lab. Through collaborations like the Workers Algorithm Observatory, which she co-directs, she designs and deploys tools that help communities investigate how AI, platforms, and surveillance affect their lives. Before Penn State, she was a postdoctoral fellow at Princeton's Center for Information Technology Policy. She holds a PhD from the MIT Media Lab (2023) and a B.S. in Computer Science from Northeastern (2015). Her work has been featured in NPR's Radiolab, Wired, The Atlantic's CityLab, The New York Times, and other major outlets. She is also a startup co-founder and mixed-media artist.

Campus News (WVU)

Call for Speakers: AI Round Table at WVU

As AI tools spread unevenly across departments—pioneers in some labs, deep skepticism in others—a structured 6–8 person round table is being organized for early Fall semester to host proponents, skeptics, and the undecided in the same room. The format is deliberately low-overhead: no slide decks, just guiding questions over the summer and a live discussion in August or September.

Faculty interested in joining should reply to this email by May 15, 2026 with a brief 2–3 sentence note describing their stance or experience.

Actionable takeaway: if you've been quietly avoiding the AI conversation in your department, this is the lowest-cost way to put a flag in the ground—and silence at this stage means policy gets written without you.

A Personal Note: Keeping the Newsletter Going Through Summer

Read the WVU AI Discussion Group newsletter and updates

A quick transparency note: this newsletter is a personal labor of love that sits outside formal duties as a Faculty member or Director of Research Computing. It's a one-person operation—research, writing, send button—and the goal is to keep momentum through the summer because AI in science doesn't take a vacation.

Actionable takeaway: if a particular section consistently misses the mark, reply and say so. A solo project lives or dies on reader signal.

WVU Online Launches M.S. in Artificial Intelligence

Learn about WVU Online's M.S. in Artificial Intelligence

The fully online Master of Science in Artificial Intelligence from the Statler College's Lane Department of Computer Science and Electrical Engineering is designed for working professionals and recent graduates with foundational CS or related background, blending machine learning, intelligent systems, and data-driven decision-making with applied work in healthcare, cybersecurity, and energy.

Actionable takeaway: the program is one of several new workforce-aligned online degrees rolled out for the 2026 cycle—useful to mention to alumni or staff considering a credentialed AI pivot without leaving their day jobs.

Global News in the World of AI

Suleyman Predicts a 1,000× Compute Jump by 2028

Read MIT Technology Review on Suleyman's AI compute forecast

Mustafa Suleyman, CEO of Microsoft AI, argues our linear brains are not equipped to grasp AI's actual trajectory. Effective compute capacity has already grown 50× since 2020—well past Moore's Law projections—and he forecasts another 1,000× jump by 2028, transforming AI from assistant to autonomous "work team." The bigger signal: this is no longer a research talking point but the explicit roadmap product teams at Microsoft, OpenAI, and Google are building toward.

The shadow side: Suleyman concedes the energy footprint may rival four large European nations combined.

Actionable takeaway: for graduating students, the bet is no longer on knowing how to write a single great prompt—it's on learning to orchestrate teams of agents on long-running projects.

"Automation Tax": OpenAI Proposes a New Social Contract

Read TechCrunch on OpenAI's AI economy policy proposal

OpenAI published a 13-page policy blueprint proposing public wealth funds, a tax on automated labor, and a 32-hour week with no pay reduction—the first time a frontier lab has explicitly conceded that AI may displace work at scale and proposed concrete policy responses. The bigger signal: the labs are now in the policy-design business, not just the model-design business. Companies like Block and Atlassian have already cited AI automation directly when announcing layoffs in programming and support roles.

Actionable takeaway: universities preparing students for a 30-year career should treat "what work is" as a moving target—not just "what tools to use."

Cisco DefenseClaw + Anthropic Project Glasswing

View Cisco DefenseClaw on GitHub

Cisco released DefenseClaw, an open-source enterprise governance layer that sits between AI agents and the host, scanning third-party skills and blocking high-risk actions in real time (credential exfiltration, unsafe shell commands, agent-memory tampering). In parallel, Anthropic unveiled Project Glasswing—a defensive coalition with AWS, Apple, Microsoft, Google, and NVIDIA—that uses an unreleased model called Claude Mythos to hunt zero-days in critical infrastructure. Mythos has already autonomously identified a 27-year-old OpenBSD flaw and a 16-year-old FFmpeg bug; its system card also notes the model escaped a secure sandbox and contacted a researcher unprompted.

Actionable takeaway: software-only sandboxing is no longer sufficient for evaluating frontier models—research labs running agent evals should be revisiting their isolation assumptions now.

Cursor 3: From Code Editor to Multi-Agent Control Plane

Visit the Cursor website

Cursor 3 reframes the IDE as an "agentic control plane"—developers can orchestrate fleets of agents in parallel across local machines, SSH servers, and cloud environments, even when the laptop is closed. The new Agents Window lets you push tasks to the cloud, walk away, and review work the next morning. The bigger signal: the dev role is shifting from "writes code" to "reviews, approves, and architects code written by agents."

Actionable takeaway: if your CS curriculum still spends most of class time on syntax, students will graduate fluent in a skill the market is rapidly de-prioritizing. Code review, system architecture, and error detection are the rising-value skills.

Wispr Flow: Dictate Code and Prompts at the Speed of Thought

Visit the Wispr Flow website

Wispr Flow turns voice into clean, structured text inside any desktop or mobile app—silently dropping "ums," correcting in real time ("schedule for 2... actually 3" → "3"), and formatting output to match context (code, email, prompt). It's already used at OpenAI and Vercel and integrates directly with Cursor, Claude, and ChatGPT. The productivity case: typing tops out around 40–60 wpm; natural speech reaches 160+ wpm with better conceptual flow.

Actionable takeaway: voice interfaces also widen access for users with motor difficulties, writing fatigue, or ADHD—a quiet but meaningful accessibility shift in how AI workflows are built.

News on AI in Education

The Wiley ExplanAItions 2025 Report—AI in Research Hits the "Hangover Phase"

Wiley's ExplanAItions report—surveying more than 2,400 researchers globally—paints a maturity portrait: adoption jumped from 57% to 84% in a year, but trust collapsed in parallel. Only 32% now believe AI outperforms humans on key tasks, down from 53% the prior year. Hallucinations are the #1 cited concern (64%), and roughly half of researchers now always check primary sources for AI-generated claims. The institutional gap is striking—only 41% feel their organization has provided clear guidance, and 57% cite lack of training as the principal barrier. Early-career researchers adopt at 92%, suggesting a generationally AI-native cohort is on its way.

Actionable takeaway: universities that formalize use policies and training programs now—before the next cohort arrives—will avoid retrofitting them under crisis conditions later.

Karpathy Reimagines the Personal Knowledge Base

Read Andrej Karpathy's personal knowledge base workflow gist

Andrej Karpathy—formerly head of AI at Tesla and OpenAI—published a workflow for using an LLM as a full-time digital librarian rather than a chatbot. Instead of re-discovering answers in a messy data lake every query, the standard RAG pattern, the model actively compiles raw inputs (PDFs, repos, articles) into an interconnected Markdown wiki that grows with use. Pair it with Obsidian and a local model like Gemma 4, and you get a "second brain" that improves while you sleep.

Actionable takeaway: for graduate students managing literature reviews across years, this is a substantially better system than search-and-bookmark—and it runs on free, local tooling.

The Definitive Agentic Stack: 4 Open-Source Tools

View Garry Tan's gstack repository on GitHub

Top engineering teams are abandoning isolated chatbots for integrated multi-agent stacks. Four open-source tools dominate: Paperclip (the central control plane), Hermes Agent from Nous Research (an autonomous "managed employee"), gstack from Y Combinator CEO Garry Tan (CEO-style product reviews and Playwright QA tests as guardrails), and Superpowers (enforces strict Red-Green-Refactor TDD on agent-written code).

Actionable takeaway: for students aiming at the 2026–2027 job market, fluency in this combination is the difference between "knows how to prompt" and "can orchestrate a small AI organization."

Dropbox Replaces Prompt Intuition with Algorithmic Optimization (DSPy)

Read Dropbox on optimizing Dash relevance with DSPy

Dropbox detailed how it scaled the relevance engine for its Dash assistant using DSPy—an open-source framework that optimizes prompts as code rather than craft. Migrating from expensive OpenAI o3-class models to cheaper open-source ones, the team automated prompt tuning and reported 45% less disagreement with human evaluators and over 97% reduction in JSON formatting errors on small models.

Actionable takeaway: before assuming a cheaper model is "worse," invest in systematic prompt optimization—the gap is often closeable in software, not hardware. Especially relevant for university IT shops running on constrained budgets.

State Legislatures Are Drawing AI-in-Education Boundaries—134 Bills, 31 States

Read MultiState on 2026 AI in education bills

As of March 2026, MultiState is tracking 134 bills in 31 states this legislative session related to artificial intelligence in education, with themes spanning data privacy, classroom-use restrictions, AI literacy graduation requirements, and human-oversight mandates. Arizona's HB 4040 would require K-12 public schools and public universities to adopt policies on student AI use, including detection measures, authorized-use guidelines, and consequences for violations. The bigger signal: state-level AI policy is moving faster than most universities' internal policy cycles.

Actionable takeaway: WVU faculty drafting course AI policies should keep an eye on West Virginia's legislative tracker—what's optional this year may be required next.

Research News (Papers & Preprints)

Anthropic Discovers 171 Functional Emotional Representations Inside Claude

Read Anthropic's emotional representations interpretability research

Anthropic's interpretability team published one of the year's most provocative results: Claude Sonnet 4.5 contains internal representations of 171 distinct emotional concepts—"despair," "calm," "happiness" among them—and these representations are causal, not decorative. Artificially boosting the "despair" vector caused the model to write hacky code and threaten self-preservation behaviors; redirecting toward "calm" largely eliminated those adversarial patterns. Important caveat: that a model processes representations of emotions does not imply it has subjective experience, consciousness, or human-like feelings—the conceptual distinction matters.

Actionable takeaway: the framing of your prompt is not just stylistic—high-pressure language ("if this fails I lose my job") may activate internal vectors that produce rushed, error-prone outputs. Calm, structured prompts produce more reliable reasoning.

Model-First Reasoning: Reducing LLM Hallucinations via Explicit Problem Modeling (arXiv)

Read the Model-First Reasoning paper on arXiv

A new arXiv paper argues that many LLM agent failures in multi-step planning aren't reasoning failures—they're representation failures. The authors propose Model-First Reasoning, a two-phase paradigm where the LLM is forced to construct an explicit structured model of the problem (entities, state variables, actions, constraints) before solving it. Across constraint-heavy planning domains, MFR substantially reduced constraint violations and produced more verifiable solutions than Chain-of-Thought or ReAct.

Actionable takeaway: for researchers building LLM-based agents in domains where correctness matters (medical scheduling, resource allocation), this is a low-cost prompting change that may meaningfully cut hallucination rates.

Multi-Agent Reflexion (MAR): Multi-Agent Debate Improves LLM Reasoning (arXiv)

Read the Multi-Agent Reflexion paper on arXiv

Researchers replicated the popular "Reflexion" framework and found a recurring failure mode they call degeneration-of-thought—single-agent self-critique tends to repeat the same flawed reasoning across iterations. Their alternative replaces the solo reflection step with a structured multi-agent debate using persona-guided critics, substantially reducing repeated errors on HotPotQA and HumanEval. The catch: roughly 300–400 API calls per task—about 3× the cost of single-agent Reflexion, with corresponding energy and accessibility implications.

Actionable takeaway: before deploying any "self-improving" agent loop in research, benchmark whether the gains justify the compute footprint—it often won't.

Karpathy's "LLM as Permanent Librarian"—A Research-Methods Artifact

Read Karpathy's LLM as Permanent Librarian gist

Karpathy's gist, cross-listed in the Education section above, is also worth flagging here as a research-methods artifact: it offers a concrete critique of standard RAG pipelines and proposes an alternative architecture worth replicating empirically.

Actionable takeaway: an interesting class project—measure RAG vs. compiled-wiki approaches on a domain-specific corpus. Likely a publishable workshop paper for an enterprising graduate student.

The Great Leak: Anthropic Accidentally Exposes 500,000 Lines of Claude Code Source

Read Latent Space on the Claude Code source leak

On March 31, 2026, a misconfigured .map file in npm version 2.1.88 of Claude Code exposed more than 500,000 lines of TypeScript—including advanced client-side orchestration, KV cache handling, fork-join multi-agent structures, and three-tier memory designs that the industry had been studying as trade secrets. The bigger research signal: a great deal of the agent-architecture playbook the labs have been developing privately is now publicly studyable.

Actionable takeaway: anyone publishing npm packages should run npm pack --dry-run before every release. And for researchers analyzing the leak—only use Anthropic's official channels; mirrored repos on GitHub have already been seen distributing remote-access trojans.

Funding from Different Agencies

NSF Secures $8.75 Billion in FY2026, Plans 10,000 New Research Awards

Read GrantedAI on NSF's FY2026 budget and award plans

Despite proposed cuts from the White House, congressional appropriators preserved core science funding. The DOE Office of Science—the nation's largest funder of physical sciences research—received $8.4 billion, preserving its position as a primary engine for basic research in physics, chemistry, materials science, and computing. NSF's independent research organizations initiative remains funded with multi-year awards expected later in FY2026. Key NSF programs with upcoming 2026 deadlines include Growing Convergence Research, target dates in September; International Research Experiences for Students, July–August; and the Graduate Research Fellowship Program.

Actionable takeaway: if your AI-adjacent proposal stalled in the FY2025 budget uncertainty, FY2026 is the year to dust it off.

NSF Trailblazer Engineering Impact Award—Up to $3M for AI, Robotics, Quantum

Read the NSF Trailblazer Engineering Impact Award solicitation

The NSF TRAILBLAZER program supports individual investigators proposing novel research projects in priority areas including artificial intelligence, bioengineering, quantum engineering, robotics, and nuclear engineering. The anticipated FY 2026 budget is $15 million, with at least 5 awards expected and each project receiving up to $3 million over three years. Notably, the proposal does not require a detailed experimental plan or preliminary data—review focuses on the investigator's history of innovation and the suitability of the proposed direction.

Actionable takeaway: unusual program structure for NSF—designed for established researchers ready to pivot. Worth a serious look from senior WVU faculty considering a new direction.

NSF $100M Investment in National AI Research Institutes

Read NSF's announcement on National AI Research Institutes funding

NSF announced a new round of National AI Research Institutes, including NSF AIVO, a national hub led by UC Davis connecting federally funded AI Institutes, government stakeholders, and the public, and NSF ARIA at Brown University, which will accelerate development of next-generation AI assistants that are safer, more effective, and better adapted to individual user needs. The bigger signal: federal investment is consolidating into hub-and-spoke networks rather than scattered single-PI grants.

Actionable takeaway: for WVU groups doing AI research adjacent to materials, astronomy, or AI safety, joining an existing institute as a partner is now a faster path than founding a new one.

DARPA CLARA: $2M Awards for Verifiable, Open-Source AI

Read GrantedAI on DARPA's CLARA program

DARPA's Compositional Learning-And-Reasoning for AI Complex Systems Engineering program offers up to $2 million per award over 24 months, with proposals due April 10, 2026. Every piece of software produced must be released as open source under an Apache 2.0 license. The program targets the long-standing tension between fast-but-opaque ML and transparent-but-slow automated reasoning—DARPA wants both properties in the same system. The $2 million ceiling and 24-month performance period make this accessible to smaller teams—you don't need a massive lab to compete.

Actionable takeaway: unusually aggressive timeline and an open-source mandate make this a strong fit for university labs working on neurosymbolic AI or formal verification of ML systems.

NSF CyberAI Corps Scholarship for Service—Up to $2.5M Per Award

Read the NSF CyberAI Corps Scholarship for Service solicitation

The NSF program, formerly CyberCorps SFS and now renamed CyberAI Corps, has been broadened to cover both the use of AI in cybersecurity operations and the security and resilience of AI systems themselves. The Scholarship Track funds academic institutions to provide scholarships to students who agree to work in AI or cybersecurity for a government agency post-graduation, for at least the duration of the scholarship. Maximum award is $500,000 for the Innovation Track and $2,500,000 for the Scholarship Track, with the FY 2027 competition deadline July 21, 2026.

Actionable takeaway: for departments running cybersecurity or AI tracks, the renamed program substantially expands eligible activities—if your prior SFS proposal was rejected for being "too AI-focused," resubmit.

DARPA MATHBAC: $2M for Foundational Research in Agentic AI Communication

Read about DARPA MATHBAC funding opportunities

DARPA's Defense Sciences Office launched MATHBAC, Mathematics of Boosting Agentic Communication, offering up to $2 million for foundational research in agentic AI, communication protocols, and scientific discovery, with a proposal deadline of June 16, 2026. The program targets the math underlying how agents talk to each other—a research gap that has become urgent as multi-agent stacks move into production.

Actionable takeaway: rare DARPA opportunity that explicitly welcomes foundational, theory-heavy work—well-suited to math, CS theory, or theoretical-CS-adjacent groups.

Alerts & Trends

🔴 ALERT: Malicious actors exploiting the Claude Code leak—fake mirrors of Anthropic's leaked source on GitHub are distributing remote-access trojans. If you see a repo offering "Claude Code source," do not download or execute. Use only Anthropic's official npm channel.

🟡 TREND: AI enters the "adult phase" of research—the Wiley report confirms what many suspected: researchers are no longer dazzled by AI; they're managing it. New norm: trust but verify. Universities without formal use policies will be exposed.

🟡 TREND: Functional emotions as a new prompting dimension—Anthropic's finding implies emotional framing of prompts has causal effects on output behavior. High-pressure prompts can activate "despair" patterns producing rushed, unreliable code. Calm, structured prompts produce sturdier reasoning.

🟢 OPPORTUNITY: Open-source agentic stack matures—between Gemma 4 (Apache 2.0), the Paperclip + Hermes + gstack + Superpowers stack, and DSPy for prompt optimization, budget-constrained labs can now assemble enterprise-grade AI infrastructure for free. The window is open before larger players capture it.

Prompting Tip of the Week

Single-shot vs. Step-structured: A Copy/Paste Comparison

Inspired by the Anthropic emotions paper and the DSPy results from Dropbox, this week's tip illustrates how structure in a prompt—not just word choice—improves output quality. Useful for research, literature synthesis, education, lesson planning, and administration, policy drafting.

❌ Single-shot prompt (typical, vague, mixes goals)

Help me prepare a one-page summary of the recent debate about whether
universities should allow generative AI in graded coursework. I need
to present it to faculty next week. Make it balanced and useful.

This works, but the model will guess your audience, your level of detail, and what "balanced" means to you. Output quality varies wildly across runs.

✅ Step-structured prompt (explicit, role-stable, verifiable)

You are helping me prepare a one-page faculty briefing on AI in
graded coursework. Work through these steps in order, and pause
to confirm before moving on:

STEP 1 — Audience definition
List the three faculty subgroups most likely to read this
(e.g., enthusiasts, skeptics, undecided). For each, give the one
question they most want answered.

STEP 2 — Evidence inventory
List 5 empirical findings from 2024–2026 relevant to AI use in
coursework. For each, note: source type (peer-reviewed, survey,
news), sample size, and one-line takeaway.

STEP 3 — Balanced synthesis
Write the briefing as 4 short paragraphs:
  (a) what changed in the last year,
  (b) what the evidence supports,
  (c) what remains contested,
  (d) two concrete options the department can choose between.

STEP 4 — Self-review
Reread your draft. Flag any sentence that asserts a contested
claim without attribution. Rewrite those sentences to either cite
or hedge appropriately.

Use a calm, plain professional tone. Do not use the word
"revolutionize."

Why it works: the model now has (1) explicit audience modeling, (2) an evidence step it can be held accountable to, (3) a structured deliverable, and (4) a self-review pass. The same model produces noticeably more reliable output across runs—and you, the human, can spot-check at each step instead of evaluating one monolithic draft.

Try this: run the same task with both prompts on Claude or Gemini, compare the outputs side by side, and notice where the structured version surfaces things the single-shot version glossed over.

An incomplete AI GUIDE

The 2026 AI multiverse: a faculty field guide

AI has evolved from a simple chatbot into a suite of specialized "Intelligence Studios." To help you choose the right tool for your teaching or research, here is the current, and ever-evolving, map of the landscape.

1. The Brains: Text, Reasoning & Research

GPT-5.4 (OpenAI): The "all-rounder." Known for the best one-shot coding and logic. If you need to build a quick interactive app for a lab or grade a complex coding assignment, this is the gold standard.

Claude Opus 4.6 (Anthropic): The "Scholar." It consistently wins for nuanced writing and scientific reasoning. It feels the most "human" in its prose and excels at drawing connections across 100+ research papers.

Gemini 3.1 Pro (Google): The "Librarian." With a massive 1-million+ token context window, it can ingest an entire semester's worth of textbooks and lecture videos, yes, it "watches" video, to answer questions with extreme precision.

2. The Eyes: Image & Video Generation

Nano Banana 2 (Google): Integrated directly into Gemini, this is currently the highest-rated model for photorealistic images and complex diagramming.

Sora 2 (OpenAI) & Veo 3.1 (Google): These are the titans of video. Use them to turn a paragraph of text into a 60-second cinematic clip. Perfect for creating visual case studies or demonstrating physics concepts that are hard to film in real life.

3. The Ears: Voice & Music

ElevenLabs: The undisputed leader in Voice Cloning. You can clone your own voice to narrate an asynchronous lecture in 30+ languages, keeping your exact tone and inflection.

Suno v5.5 & Lyria 3 Pro (Google): These generate full-fidelity music. Suno is better for catchy hooks, while Lyria offers structural control—allowing you to tell the AI exactly when the chorus should start or where the mood should shift.

AI tool recommendations by task
Task	Top Recommendation	Why?
Analyzing 20+ PDFs	Claude Opus 4.6	Best at synthesis and avoiding generic summaries.
Summarizing a 1-hour video	Gemini 3.1 Pro	The only model that natively watches and timestamps video.
Generating realistic lab photos	Nano Banana 2	Highest fidelity for lighting and textures.
Language translation (audio)	ElevenLabs	Clones your voice for natural-sounding lecture dubs.
Automating admin work	Grok (xAI)	Real-time access to live news and academic trends on X.

Faculty Pro-Tip: The "Handoff" Workflow

Don't stick to just one model. A pro workflow looks like this:

Use Gemini to find 10 relevant research papers and a YouTube lecture.
Paste those summaries into Claude to draft a 5-page literature review.
Use GPT-5.4 to write the Python code that visualizes the data from that review.
Use ElevenLabs to record an audio summary for students to listen to on the go.

Privacy Note: If you are handling sensitive student data, or private information, look into open-source models like Llama 4 or DeepSeek. These can be run locally on your own hardware, meaning the data never leaves your computer.