Journal on writing a scientific paper
I wrote an article about Clojure, the functional programming language (arXiv, github). It has never been published in a peer reviewed journal. Because this is a personal report I refer to “my paper” throughout although I am not its sole author.
My lessons learned
- I need to know which journals I have in mind when writing a paper
- I need to know people who already published in those journals
- I am prepared for slow academic feedback cycles in the order of weeks
- I make sure that only one single topic can be read into my paper
- I use Clojure on the JVM but hide its Lisp syntax
- I write about a data science topic, not about Lisp
- I write a series of papers
- I make sure that I cite my own papers
Why the paper
I wrote the paper because I wanted to raise awareness for Clojure at CERN, the organisation hosting the particle accelerator near Geneva. I saw an opportunity because the upcoming jank, a Clojure-C++ compiler, was using CERN’s own C++ interpreter.
As I witnessed jank’s author, Jeaye Wilkerson, in heavy discussion with CERN, I thought, well, academics need a paper to justify their work. Having experienced (in the 90’s) the vibrant cafeteria discussions, I was hoping to perhaps plant a seed for a chapter in some thesis going to be churned out between Lake Geneva and Mount Salève. Why not catch up with the US and its IRIS-HEP initiative for data innovation in high energy physics?
In December ‘23, in CERN’s Root forum, I eventually presented the paper which is based on a Clojure compiler called Ferret. At that time I considered Ferret a close predecessor of jank, not the only assumption that turned out to be wrong as by abandoning CERN’s C++ interpreter, jank moved to LLVM IR in October ‘24.
Trying to publish the paper
In the August preceding that forum post, I sent the paper manuscript to the journal Computing in Science and Engineering. For the response I had to idly wait three weeks, but the editors summarised my paper quite accurately:
The manuscript provides a detailed description of the work you performed to develop a functional programing interface to C++ library.
But, whereas my intention was to show off Lisp syntax (and Macros only as the special example case), to the outside reader the paper has two distinct topics: A) Clojure’s functional syntax is better than Python’s, B) C++ interop can be easily done with Macros.
Specifically, the response for topic A) was:
The premise that functional code is more accessible to scientist is a good starting point. However, the paper should better emphasize this point and it should include and discuss more references to support this point.
So in a paper praising Clojure, they wanted more citations of papers praising Lisp languages in general. I am not a computer scientist, so I cannot provide that in a sincere way. I also cannot praise the functional style more than I already did.
According to topic B), i.e. C++ interop, they asked back two questions, the first being:
It is not clear from the current version of the paper, whether this work can be used to generate a functional interface for any arbitrary C++ library. Is the proposal here that C++ library developers will use this capability to provide functional interfaces to their users?
The answer to this question is: yes, indeed Clojure Macros are a great tool for any C++ author to provide interop to his library. But I struggle to highlight this more, not least because the paper was intended about Lisp syntax in general, not only Macros.
The second question on interop was:
Or, if you have the capability of automating the process of creating functional interfaces, then make sure to present and discuss how it is done for different libraries.
And again, yes, in my (admittedly “ambitious”) outlook section of the paper, I propose to generate C++ interop layers in an automated way. However, automation would be a research project entirely on its own, indeed it seems to be the holy grail of C++ ecosystems.
Aside: while automation is out of my scope, I think the key is to provide a “datafied version of the library API”, as also advocated by clong. If the world could agree on a standard how to “datafy” a C++ header, this would be a huge step.
All in all, I figured that nobody seemed to be interested in the specific topic of “Clojure syntax is great” (my main topic), but in automation in general (regardless of the syntax used).
Choosing a journal
As I did not know any journals in the field of computer science, I saw no option but to scout for them in the internet. For example, in the submission requirements of the Journal of Open Source Software, it says: “no thin API clients”. So they would not accept my paper which describes a Clojure wrapper for a specific C++ library. But maybe there is a journal out there which accepts my paper. Who knows.
What I learned in the process is that a paper is not only written for an audience but also for editors. Thus it is good to know some people who have experience in publishing in the targeted journal, because they know for example how ambitious an “outlook” section should be written. It saves time and energy for all people involved.
YAMLScript
Also in writing my paper, in which I wanted (and failed) to show the merits of Lisp syntax, I found out that infix within mathematical formulas is a hard sell. Infix, at least for simple cases, is not essential. But explaining this fact takes up space that should rather be used to solve the problem at hand. Consequently I was not entirely happy with the text, after all I wanted to explain the notion of “code as data”, not explaining away infix.
In looking for a way to write a simpler paper, I discovered the newly proposed language YAMLScript. It looks like Python but is effectively Clojure which in its dynamical way opens the JVM to science.
To me it seems ideal to program in Clojure, but when citing code snippets in papers, it is better to show YAMLScript. This is because it is more readable for the average scientist, yet embraces “code as data” but needs no explaining away of infix syntax.
This necessarily lead me to one conclusion: using Clojure, write about a topic in data science but not, as I did, about Lisp per se.
Aftermath
My motivation for the paper was to magically spark a series of other Clojure papers, written by people of the FIROS-HEP community, standing for “findable, accessible, interoperable, reusable open science in high energy physics”. I since came to think that one needs to secure funding, target a specific journal and write one’s own series of papers.
While being on arXiv, the paper has not been cited anywhere. Note that, however, one cannot publish on arXiv without affiliation (thank you, my star-dust sprinkling co-author).
Concerning the forum post, I got some attention (500+ clicks), but no direct feedback was given.
I have never since visited the CERN cafeteria.
jank became part of the SciCloj mentoring program, so by now Jeaye is firmly embedded in the Clojure community.
Ferret continues to be a great Clojure-to-C++ compiler.
The author of YAMLScript mentioned my paper at the Perl and Raku conference 2024.
In discussions on SciCloj Zulip chat, a Bioinformatics Clojure paper from 2014 came up. Here I see a problem with citations in English language (the currency in academia). According to Google scholar, the paper is cited 5 times (2015 x1, 2016 x2, 2018 x2). It is never cited by the original authors. So, no matter what the paper says about how great Clojure is, this lack of self-citation means one thing: according to the publicly available data, it seems that Clojure has been abandoned by the very people who know it best. Whenever you write a paper about Clojure, make sure to keep citing it.
In closing, I’d like to mention that FAIROS-HEP is purely US based and (most importantly) funded. Europe plays no part whatsoever in this important initiative. This is bad because I think we Europeans, also facing a huge impact on our thinking from Silicon Valley, deserve a counterweight to commercial data-storage. European based CERN is one readily available opportunity in that endeavour.
Conclusion
See lessons learned above.