After late '22 I noticed a surge in superlatives like "vital", "essential", "crucial", and "pivotal" in essays and cover letters from students. ChatGPT uses words like these whenever you ask it to write an essay, and it's a dead giveaway. It's ruined the legitimate uses of the words.
Pick one of the Google Scholar results from the article and you'll find something awful like this (the start of the second paper): "The carriage of goods under international commercial law is a complex and essential aspect of international trade."
> surge in superlatives like "vital", "essential", "crucial", and "pivotal"
This is very interesting. It could also be an effect of Grammarly, which suggests these replacements for the generic "important." Perhaps it's a combination of several effects.
My guess is that ChatGPT does it because RLHF rewards strongly stated opinions, since that's what humans prefer. It's a kind of "sycophantic behavior" that researchers have observed in these models (https://arxiv.org/abs/2310.13548).
To be fair, replacing "important" with some less generic synonym is like one of the first tips in any Copy Editing for Dummies book, and definitely suggested by Grammarly, for example.
Any word can become generic when overused. As more people get access to Grammarly and ChatGPT, more of them will apply the same recommendations, turning previously rare words into generics.
Choose your words carefully; if you're lucky, you might find some that fit only you.
Thus, therefore, to provide an example thereof, one may replace all instances of "important" or "essential" with "splendid", "brilliant", "magnificent", "gorgeous", or "sophisticated".
—"The carriage of goods under international commercial law is a sophisticated and gorgeous aspect of international trade."
Heh, or maybe in some general sense, everything is essential, crucial, vital, constantly pivoting, etc., and these are the best words to use. Some day ChatGPT will tell me "I told you so!"
I've been interacting with ChatGPT and other LLMs so much I've already caught myself getting influenced by their word choice, in the same way I might start subconsciously adopting the vocabulary of a friend. Food for thought.
"Superlative" can mean not just the grammatical category that includes "*est" words, but also, more informally, adjectives that live at an extreme, as "critical" does within the concept of "importance".
How many of these are in serious journals that actually are read by other scientists and how many of these are in low-quality predatory journals that simply help unsuccessful scientists inflate their publication numbers for a fee?
Looking at the search results, none of these are in "real" science journals. This is just yet another way in which predatory journals are terrible for science and academia (and its public perception), but the existence of garbage journals has been a problem for 2 decades (or more?).
On the bright side, productivity at respected institutions has not been significantly hampered by predatory journals. On the other hand, academia as a whole has obviously failed to teach incoming members that this is basically fraud (partially due to the publish-or-perish culture that is now even endorsed by governments).
When I looked at the examples, very few seemed to be fully ChatGPT generated.
That being said, most scientists are very smart and recognize the power of ChatGPT. They are most likely heavily relying on it when producing summaries, abstracts, hypotheses, etc., and because they can and will edit it you won't be able to tell from words alone.
ChatGPT is an equalizer between native English speakers and scientists speaking English as a second language.
But both groups will use it heavily, and nobody can foresee what its net effect on science will be.
Having had that experience, I can tell you that it takes weeks or months to write up results as a paper and go through the most boring steps of formatting and organizing them into antiquated presentation formats: Intro/Discussion/Results, dumbass citations, labeling, backward wordings, etc.
What should be a five-paragraph discovery on a blog has to take the form of a multipage, heavily formatted document that contains enormous amounts of unrelated and useless information.
With ChatGPT, you can do it in a day. If you are not using ChatGPT, you lose out.
Note that "false positives" may appear in the Google Scholar results for such queries. For example, a high-profile AI researcher shared a screenshot of https://scholar.google.com/scholar?as_sdt=0%2C5&as_ylo=2023&... (on X/Twitter: https://twitter.com/MelMitchell1/status/1768422636944499133).
The second entry is an ironically written abstract of a talk and the author clearly (and consciously) plays with the cliché. Obviously, most of the fans of the person who shared the screenshot to thousands of followers will not notice this.
Is the adjacency in Scholar limited to intra-sentence fragments? It'd be problematic if not, certainly. Here is another example of the shibboleth: when we use "certainly" here. Is this a query hit?
I'm fine with and approve the usage of LLMs in academia, as long as they provide genuine value and something new to the field. These tools should be embraced when they can augment human intellect.
However, I draw a firm line at using them to generate complete academic works or nonsensical content, as that undermines the integrity of research and renders it devoid of originality. LLMs should serve as invaluable assistants to free up scholars for higher-order analysis, not as replacements for human ingenuity.
You may as well say that you only want dynamite to be used for nonviolent purposes, like demolishing condemned buildings. It doesn't work that way, which is exactly what AI ethicists and researchers have been warning about - AI should be regulated, because once it's in somebody's hands, you don't control how they use it, and the repercussions could be much wider than people imagine.
IMO academic papers are too long. Decorum dictates that a core of real, novel content must be surrounded by tons of fluff. Most people don't ever read that fluff (some do, and it's not totally useless, but not to most people).
It doesn't actually bother me if some of that fluff is ChatGPT-generated, provided the author actually read it and accepts the autogenerated content.
I'm having flashbacks to writing my Master's thesis. The experimental part was done in two weeks, then I spent a week describing the results, then a few months going from 15 pages to 80 pages and at least 30 citations, with the latter being surprisingly difficult because of the uniqueness of my research topic.
Buckle up, because we're not far away from autonomous research agents. It'll start with computational analysis, plot generation and creation of data discovery documents, but soon models will learn how to request simple physical experiments (sequencing some DNA, mass spec a sample, etc) and plan research based around the outcome.
The problem here is not necessarily the use of ChatGPT at all. It's that academic idiom requires content-free linguistic noise for acceptance, and people are quite naturally turning to mechanistic routes to automate away the drudgery.
I know a guy who is currently at a higher technical school. He talks very openly about how he and everyone in his class use ChatGPT for all the bloat they have to write.
Students at universities do that now. And they get passed. It exposes the whole charade that much of higher education has become.
The idea behind writing a lot has been that it forces students to think systematically and thereby learn. But they are not typically interested in that; they just need a diploma in order to get a job. Before computers they copied books verbatim, and since plagiarism software they paraphrase books. And now ChatGPT removes the middle man - the student. It doesn't make much sense either way. The source of the problem is really the conflation of higher learning with vocational training.
It's a constant crawl of the literature that uses hand-written phrase detectors. Many of the phrases it looks for are caused by using spinners on plagiarised text, like this one:
"At first, the input picture is transformed into the ruddy, green, and blue organize, and the clamor within the green band is evacuated using a median channel"
Here "noise" has been turned into "clamor". Nonsense text like this is easy to find in journals published by well known brands, in this case Springer Nature. A search for "profound learning" (deep learning) yields over 2000 results:
It's crazy how broken the entire publication and review process is that this stuff gets through without anyone noticing, some of it even in well-recognized journals.
It's basically organised crime. It's public money funding scientific publishing, and a huge part of the industry is just here to scam public money.
The industry is ripe for public prosecution when the article's submitter didn't read it, the reviewer didn't read it, and neither did the publisher, yet it is published in a journal with a subscription price in the thousands, paid by publicly funded libraries.
So many students at every level of higher education, probably even starting from high school, are now using ChatGPT. Does this make education a crime?
Until this point in history there has been a reasonable correlation between the quality of writing and the "quality" of the underlying work in most text. If effort was put into the writing (and into learning to write), that in itself used to be a decent indicator that effort and skill were also put into the data/ideas that the text conveyed.
That rule no longer works. People are still going to rely on it for a while though, and I'm worried it's going to break some stuff over the next few years.
I think probably the solution here is to stop using publishing in peer reviewed journals as a job requirement for academics. Even before ChatGPT, academic publishing was already falling victim to Goodhart's law. Remove the perverse incentives and it should cut down on this by quite a lot.
Putting aside the well-known issues with academic publication that this problem highlights, another aspect of modern publications more generally that LLMs are going to disrupt is word bloat. Too many words.
For example, many academic papers are pages and pages of background, discussion, citations, etc. Very often the actual new material can fit on an index card + 1 figure. I understand why all the background is needed, in the abstract, but maybe this traditional way of thinking about incremental academic thought needs to shift in light of the reality of LLMs: LLMs will be used to write needless fluff. Indeed, it's what they're good at.
I'm a patent attorney and I can say that the same thing applies there. Patents are enormous documents filled with legalese, background, description, vague, conditional language, etc. There are good legal reasons for that, but again, often the actual novel invention can fit on an index card.
For that reason, it's self-evident to me that my job will be replaced, at least to some significant extent, by LLMs. But why not just cut out all the needless words? I could imagine patent law that added a "parsimony" requirement to patent filings. Examiners could reject filings that contained extraneous words. Then maybe patent lawyers would still be needed: to write the shortest possible patent that still contained the novelty.
This is just a toy thought experiment, not a serious proposal. My point is that LLMs expose as trivially bankrupt some aspects of academic and patent writing (and probably other places: marketing copy, some nonfiction essay-style writing, etc.).
Maybe LLMs will force us to embrace or even require brevity and conciseness to ensure that human beings stay in the loop.
The upside of all this is that professionals are communicating their points better. There are lots of examples of researchers from non-English-speaking countries using LLMs to write exposition sections for English-language journals. This lowers the communication friction across the language barrier, and it lowers the implicit bias and penalization when people read a (technically correct and useful) paper that has weak grammar and is confusing to parse.
These aren't purely bad things! I wonder what future social conventions for this kind of thing will look like.
I agree that LLMs can be used as a tool by non-native English speakers.
But the examples cited in the article are indefensible. The authors of the paper were either so careless that they did not notice the "certainly, here is…" preamble made no sense in a paper, or they have such a poor grasp of English that they are unable to comprehend what they copy-pasted. Either way, this suggests that nothing in the paper can be relied upon.
Perhaps two people who don't fluently speak the same language can communicate better. But that's not really the bar, is it? These are articles written on behalf of fluent speakers for other fluent speakers to read. Is it improving THEIR "communication"?
I'd say by definition the answer is no, since it's no longer humans communicating with each other at all.
Most of the first results have Cyrillic letters in the authors' names. I think these folks perhaps aren't fluent in English and think "Certainly, here is a summary of…" is a normal thing people would say. They probably see it as a monologue flourish like "Secondly, here is a summary…", and don't realize it is actually a subservient conversational reply, a dead giveaway of an LLM chat.
I don't have time to properly go through the entire list, of course, but the first paper that I've opened seemed to use ChatGPT to format a table of values, presumably not in tabular format originally. It didn't _seem_ like any of the ideas or conclusions were GPT-created.
I'm perfectly fine with this usage, although it is very sloppy if the authors and reviewers didn't catch the "certainly" in the final text.
You can ask it for code that will convert your input into tabulated output. Easier to check the logic of the code than all of the values in the table, then just plug your data into it.
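As a rough illustration (hypothetical column names and data, not taken from any of the papers), the kind of code you might ask for instead of the table itself:

    def to_latex_table(rows, columns):
        """Render a list of dicts as a LaTeX tabular block."""
        header = " & ".join(columns) + r" \\ \hline"
        body = [" & ".join(str(row[c]) for c in columns) + r" \\" for row in rows]
        return "\n".join(
            [r"\begin{tabular}{" + "l" * len(columns) + "}", header]
            + body
            + [r"\end{tabular}"]
        )

    # Hypothetical measurements; plug the real data in here rather than
    # pasting a model-generated table into the manuscript.
    data = [
        {"group": "A", "value_1h": 18.2, "value_4h": 16.9},
        {"group": "B", "value_1h": 21.4, "value_4h": 19.7},
    ]
    print(to_latex_table(data, ["group", "value_1h", "value_4h"]))

You check a dozen lines of logic once instead of proofreading every cell the model might have hallucinated.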
First hit, in the keywords section (Ключевые слова): "methodology. Certainly / here is the English translation of the provided text"
Second hit: "Certainly! Here is a review of the literature on the carriage of goods under international commercial law:"
Third hit: "Certainly, here is the tabular representation of intraocular pressure (IOP) at 1 and 4 hours for Group X and Group Y:"
Fourth hit: "Certainly, here is a list of both Geo-natural and geosynthetic materials commonly used in civil engineering and environmental applications."
Fifth hit: "Certainly, here is a concise summary of the provided sections:"
Sixth hit: "Certainly, here is a list of various types of polymers:"
Seventh hit: <not found in paper or abstract>
Eighth hit: "Certainly, here is the essential information provided for reference to the proceedings of the conference on the topic of university book production:"
Ninth hit: "Certainly, here is the methodology for conducting a literature review on the attitudes of healthcare workers toward COVID-19 vaccination:"
Tenth hit: <paid access>
Eleventh hit: "Furthermore, to confirm the advantages of the new model, Certainly! Here is the rephrased sentence with improved language quality and clarity:"
I don't think that all results contain such wording, and surely daghamm's statement can't be disproven by listing these. But there is something that (also) rubs me the wrong way about these sentences getting into research papers.
Re-reading Simon's post, the only word that I think others might find contentious is "huge". Would it be better if it read that the search expression turns up "a number" of papers, without a qualifier?
If you actually take the time to go and read through them, you can see that the writing pattern is clearly different from that of ChatGPT, unlike the recent ones:
- "[...] Certainly here is a vast and important project for research."
Versus:
- "Certainly! Here is the text with spaces added after each word:"
Another way to look at the problem of LLM-generated garbage papers is that it doesn't actually change anything, because serious researchers in a given field already didn't trust work coming from unknown labs in shady journals. Researchers need to earn a reputation first by doing all the usual things like going to conferences (to build personal relationships), publishing reproducible results in real journals, making an actual impact on important problems, etc. Basically, LLM papers are just a new contribution to the already existing background noise.
Humans will use LLMs to produce long texts from a brief description, and then other humans will use LLMs to make a summary of that text. Maybe eventually we can make this communication more effective.
Hopefully this leads to a shorter communication form for presenting new results. No need for each paper to have a half-assed intro/review of the field, since now even the authors skip that part.
Leave reviews for quality papers by top experts in the area.
Damning. What is research even becoming at this stage? I hope open peer review is the silver bullet here, since the paid journals are clearly struggling under the deluge of AI spam.
It's not that they're struggling, it's that by and large they don't care. They're making money from publishing scientific papers (or whatever they can be called), and that's all that matters.
No. Usually THE AUTHOR pays the journal to publish anything. However, professors need publications for their CV (the number of published articles is a very important metric in the academic world), to acquire new funds ("give us money to do research on topic X; look at how much we published on this related topic before, we clearly know what we're doing"), or to advance in their career (as a function of the former two points). Students need publications as part of their graduation requirements (in a lot of fields, you are required to have N publications in a peer-reviewed venue to get your PhD).
So in both cases, you don't make money directly from publications, but from the prestige they bring and the credibility they're associated with ("this guy must be an expert on X, because they published on the topic").
Understandably, concerns about the quality of LLM-generated content are valid. However, it's crucial to recognize that academic research goes through rigorous processes before publication, including oversight by advisors and peer review. These layers of scrutiny help ensure that any content, whether initially drafted with the aid of AI like GPT-4 or not, meets the high standards of accuracy, reliability, and originality expected in scholarly work. AI tools are used as aids in the research process, not replacements for human intellect, and the responsibility for the final output always rests with the human authors. Hence, the academic community already has strong mechanisms in place to filter out inaccuracies or "bullshit," ensuring that the integrity of published research is maintained.
How can someone be intelligent enough to submit a research paper but stupid enough to fail at taking the most basic steps to hide their AI plagiarism?
What do we even do about something like this? Is there an organization that has the power to go through this list and blacklist each of the authors from future publications? Our society should have a zero tolerance policy for drivel like this. It cheapens the entire institution.
I wouldn't call it plagiarism, but not even reading the output of the LLM before pasting it into your paper shows a lack of professionalism and indicates that the data in that table might as well be the hallucination of an LLM. Not exactly a good look.
I disagree. LLMs fundamentally change the situation: They make it easier to "write" low quality publications and they make it harder to immediately recognize that a publication is low quality.
Before LLMs, one of the main deterrents of low quality work was that the effort required to write a low quality paper is within an order of magnitude compared to the effort for a high quality paper. Using LLMs, however, I can "write" hundreds of low quality papers (that look okay-ish on the surface) in the time it takes me to write a single high quality paper.
I also don't think that researchers publishing high quality papers will benefit that much from LLMs. I tried getting ChatGPT to reproduce an argument I made in my PhD thesis and it took me many tries before ChatGPT even produced something without factual errors. As for padding: Papers should ideally not contain any padding. If a section in your paper is so devoid of actual information that an LLM can write it, you should probably remove it altogether.
> Using LLMs, however, I can "write" hundreds of low quality papers (that look okay-ish on the surface) in the time it takes me to write a single high quality paper.
I've been thinking lately that one could probably leverage the same technology to evaluate the quality of research, possibly as a post-hoc verification.
> I also don't think that researchers publishing high quality papers will benefit that much from LLMs.
Oh, but they do! If nothing else, LaTeX formatting is tedious.
> Papers should ideally not contain any padding. If a section in your paper is so devoid of actual information that an LLM can write it, you should probably remove it altogether.
It is mostly correct, except for perhaps the abstract and some parts that provide structure for the paper (Outline, maybe Conclusion, ...). However, you can always seed an LLM with the bits of information you want explained and use the result as a basis for improvement. It can seed writing: there is no obligation to copy-paste and leave it as is.
> I've been thinking lately that one could probably leverage the same technology to evaluate the quality of research, possibly as a post-hoc verification.
I doubt it. They'll be just as effective as ChatGPT-detectors are. Completely useless.
> If nothing else, LaTeX formatting is tedious.
LaTeX formatting is a very, very, very small part of publishing research. At least it was for me.
> However, you can always seed an LLM with the bits of information you want explained and use the result as a basis for improvement.
That's what I tried to do with the argument in my PhD thesis. It didn't work. Also, at the point where you have meticulously thought through how to structure your argument (and that's what I needed to do to get ChatGPT to produce anything of value at all) you did 95% of the work already. Actually producing the text isn't the hard part of writing.
My point still stands: LLMs help way more if you want to produce low-quality papers than if you want to produce high-quality papers, and as a consequence, the percentage of low-quality work will increase.
Hell, better that they DO include the prompt, since that (a) clearly discloses AI usage, and (b) means that someone attempting to duplicate the experiment (if relevant) can take their data (with the prompt input, or at least a subset showing all its structure, given in an appendix) and run it through the prompt.
One of my close friends (an Ivy League guy) did the same with his PhD thesis on Siamese networks for improving robotic perception. With the final presentation upcoming, in 2-3 days he had a whole polished thesis ready to go and even convinced his advisor that this was what he had been working on for the past 3 months. He was recently offered a doctorate.
I wonder how many people adeptly use GPTs and manage to navigate through their academic landscapes nowadays. I bet it's the majority.
But even then, I don't think this is the gotcha OP thinks it is: LLM generation and summarizing is a legit tool in the paper writing process, especially for conclusions.
In my academic experience, the paper introduction and conclusion is often written last-minute, close to the submission deadline. While a good intro and conclusion are important for the reader, very little time is made available for it since researchers spend maximal time on expanding material findings and experiments. No wonder that assistive writing tools are used to speed things up. As long as proper post-editing and proof-reading is applied.
> As long as proper post-editing and proof-reading is applied.
The whole point of the original article is that people aren't doing even that.
Plus in an academic context, it's important to acknowledge when parts of the paper are not your own work, especially in cases where the generated summary may not accurately reflect the actual paper. Saying that ChatGPT has been used to generate summaries would allow people to judge if the summary alone can be trusted to represent the content of the paper.
Exactly. A correct analysis would be to determine the baseline pre-chatGPT, examine if there is a trend (some language "clichés" evolve with time) and then check if there is a clear shift post-ChatGPT.