ArXiv Grows Up

May 16, 2026

by Nicola Jones

The preprint server's steward Greg Morrisett on why arXiv is becoming an independent non-profit after 35 years, and how it aims to tackle AI slop.

This July, the world's largest preprint site, arXiv, will graduate to become an independent non-profit. The organization is currently recruiting for a CEO, with an advertised salary of US$300,000. Back in 1991, physicist Paul Ginsparg invented the archive to share electronic versions of early works in physics, first as an email server and then online. The arXiv service, originally based at the Los Alamos National Laboratory, moved with Ginsparg to Cornell University in 2001. Today it is hosted by Cornell Tech, a university division based in New York City, with funding from a variety of sources including the Simons Foundation, a New York-based science charity.

As with other preprint servers, arXiv doesn't peer review submitted articles (although there are moderators who carry out light vetting), instead aiming to get ideas out freely and quickly. Over the years, arXiv has grown exponentially to hold millions of works, mainly in computer science, math and physics but also in related fields like economics and computational biology. Many journals allow or even encourage posting on arXiv prior to peer-reviewed publication, letting researchers lay early claim to their ideas, get feedback and bypass paywalls.

The 'move' won't see arXiv change physical location, and users shouldn't notice much of a difference, at least at first. Nicola Jones speaks to computer scientist Greg Morrisett, the head of Cornell Tech and steward of arXiv, who plans to stay on arXiv's board after the change, about the organization's plans for the future.

Why is arXiv transitioning to an independent non-profit?

This is something that has been kicking around for more than a decade. One of the biggest reasons is just that universities are not really set up to run software services like arXiv; it can be more nimble on its own. A second reason is around funding opportunities. There are places that might be more willing to write a check to a specific independent non-profit than to Cornell generally.

Cornell Tech is an innovation campus here in New York City that was set up to incubate startups and send them out. We see this as a natural evolution for yet another startup. It's just 30-something years in the making instead of one or two years.

So, it's like a baby bird fledging the nest?

That's what we're hoping for.

What's the current financial situation?

It's roughly a $6-million-a-year operation. For the last couple of years, it's had a deficit that we've been able to make up, either through funding, through my office, or through reserves. We've done some fundraising around the new organization to enable it to launch in the black.

Right now, there's a special project to modernize the computing infrastructure that arXiv runs on. The software that currently powers all this is creaky. It sounds like it shouldn't be that hard, but it actually is tricky. But, who knows, maybe we'll use AI to re-code everything and it'll be easy.

Is arXiv open to the possibility of future user fees, sponsored content or advertisements to raise funds?

No, we're not expecting to introduce anything like that. Our intention is to keep it as open as we possibly can. But you know, on the other hand, we do need to have the funding to support the operations.

How has arXiv grown over the years?

This year we will pass 3 million total articles on arXiv; to put that in perspective, in 2022 we had 2 million. The growth looks exponential. Computer science and in particular certain AI-related subfields are the fastest growing. In 2025 we added 284,000 articles. This year to date, we've already added 64,000 articles.

We don't have the right analytic tools to say much about our user base. We're still a staff of just 27 people, managing millions of articles.

Have retractions also increased?

We don't actually allow for retractions. If you post something to arXiv, the understanding is that it'll be up there, period. There are a few exceptions: if we get a legal takedown notice from somebody, for example for violating copyright or something like that, then we'll take it down. But otherwise, the norm is to post a revised article.

Has arXiv's mission statement changed over the years? Do you have one?

I bet we do. I'm looking it up.

Ha. This is why you need a CEO.

Here we go (reading): "To provide an open research sharing platform where scholars share and discover new, relevant, and emerging science, and establish their contribution to advancing research."

Are there any common misperceptions about the site you'd love to set straight?

I want to make sure that people, for example in the press, understand that it's not reviewed. Many times somebody will cite a paper and take it as gospel; unfortunately, appearing on arXiv can have that imprimatur of being official in some way. We do have volunteer moderators. It's up to the moderator communities to decide what the standards are for acceptance. But to a first approximation, if it looks and smells like a scientific paper, we're likely to accept it.

On the other hand, many people think we'll take anything. We don't. For example, we often get homework assignments from students, the occasional cold fusion paper or 'I've solved P = NP' paper, where someone genuinely thinks they have a solid paper; sometimes it's someone affiliated with a university. We have the option for the moderators to reject a paper if it doesn't really satisfy the constraints that we're looking for.

Earlier this year, arXiv required that all papers have a full English translation. What has been the response to that rule?

Well, of course, there are plenty of people around the world who wish that we would publish papers in many different languages. One of the challenges we have, though, is just a technical one, and so we decided to standardize on English. It's also the case that third-party translation tools have become much more effective, and so that made it easier to make that decision. We're not necessarily happy with it.

I know that French mathematicians, in particular, were annoyed. Have you seen them move to another site, like HAL (Hyper Articles en Ligne), which allows articles in languages other than English?

I haven't seen that, but we haven't necessarily been tracking it. Our analytics are not very strong. But other communities spring up, including well-established ones like bioRxiv or medRxiv. There are other open-source repositories all around the world; we're not trying to have a lock on anything. There's plenty of room. Although it'd be great if there were a unified search that allowed you to look across all these preprint repositories.

Is arXiv taking any steps to help build equity in science?

This is part of the reason we're spinning out, too. We've always been accused of being very Western, American-centric. Even the deadline for publication is set in the US Eastern time zone. Little issues like that. We have an advisory council to help with these tensions. Our general policy around not retracting papers creates difficulties when someone changes their name, for example for reasons of marriage or gender transition or anything else. In some forums people want double-blind submissions, and if you put a paper on arXiv first you may violate that. There are a lot of challenges like that, which deserve more attention. We have a very small staff, so we don't really have the resources to support lots of fine-grained changes.

In January, arXiv took a step to help clamp down on AI slop: new authors now need an endorsement from an already-published author in their field. How have people responded, and is it helping?

We've had a handful of complaints, but overall the community seems to have accepted that change. It also seems to be effective, in that it appears to have reduced the number of papers we're rejecting.

In May, in a post on X, arXiv's Thomas G. Dietterich highlighted some aspects of arXiv's Code of Conduct pertaining to the use of generative AI tools. If papers contain inappropriate content generated by generative AI, authors may face a one-year ban, and their subsequent submissions will need to be accepted by a peer-reviewed journal before being posted to arXiv. What are your future plans for tackling the threats posed by AI?

We want to set up a technical team to tackle the challenges here. I certainly use AI to help draft things. The challenge is to draw the line between what's permissible and what's not permissible. It used to be easy to spot-check papers to see if they look reasonable, even though some (non-AI) tools like MathGEN existed to make fake ones. Today, you know, you can generate a very plausible, reasonable paper from Claude or ChatGPT pretty easily.

Why do people do that—intentionally submit made-up papers?

There are plenty of predatory journals and conferences that will happily accept papers. And there are citation rings, where groups of authors will cite one another to try to boost their rankings. Some of that just comes from the way that we've been lazy about evaluating science, by rewarding citation counts or h-indices. It's become so metricized that there's a strong incentive to publish or perish. Posting on arXiv can help pave the way to some of that. And then there's an emerging class of people who I think are very well-meaning, who maybe aren't at a university or are working in a field very far from their own expertise, who think they've stumbled upon something great, write it up and submit it with all good intentions, but nevertheless it's not a sound paper. AI tools make it much easier for that person to submit something.

I think a lot of people use AI out of laziness, to write up a summary or background section for a paper, for example.

I think you're right, and there's probably equal laziness on the reviewing side.

We have searched for common chatbot prompts, like "give this paper a high rating," and found them in a handful of papers that had been posted to arXiv.

What did you do with those?

We leave them up. That's the default policy for arXiv. We don't want to be in the business of adjudicating. But we also don't want to just flood the world with a bunch of bogus papers. It's a balancing act.

It has become so common to post to arXiv that news stories in some fields now almost exclusively refer to preprints; Grigori Perelman famously posted his proof of the Poincaré conjecture only on arXiv. Do you think the lack of peer review on these results poses a problem for science or society?

If you go back in time, the key mathematicians would send paper drafts to each other all the time. And then, back in the day, a university like Cornell would publish technical reports in their libraries that were not peer reviewed, that were emerging science. If you were lucky and you got to visit Ithaca, then you could go to the library and see these technical reports. Paul's brilliance was just trying to take that idea and open it up to the world. The intention was to share early results to push the science forward. It occupies a different niche in the ecosystem than a peer reviewed paper, but it still has a very good function to it.

I remember, for example, during covid, one of the first analyses of the covid virus genomic sequence appeared in a paper on arXiv, and that's a good example of the kind of rapid sharing and dissemination where you don't want to wait around for peer review. But I'm sure, as a journalist, you can appreciate the bad side: a very sexy-sounding paper that is not accurate or not scientifically valid gets posted to arXiv, and somehow people jump on it and treat it as if it had passed the peer-review hurdle. There was a paper on room-temperature superconductivity that immediately sparked all kinds of debate. Right away, a handful of papers were written to show why that paper was flawed. That is the way science should work.

What is the future of arXiv; how do you expect it to change?

Science in general has to step back in the age of AI and decide how we are vetting work, and how we judge one another. The old ways are threatened.

We already have some pretty fancy tools to look for plagiarism and missing references, that sort of thing, and we're going to have to step up those automation efforts, because human effort won't scale with the AI generation effort. That's a given.

You can overlay virtual journals across arXiv; arXiv is a dissemination platform but not a judgemental one. You can layer on a stamp saying an article has been reviewed in some capacity. (Editor's note: see examples in mathematics, quantum and astrophysics.) If I were starting a new journal in computing science, I'd be very tempted to do that.

You can also imagine the inclusion of data and programs. Or you could imagine it expanding into further fields. The new CEO will have their hands full.

Lead image: Greg Morrisett, Jack and Rilla Neafsey Dean and Vice Provost, Cornell Tech.
