Enhanced Support for Citations on GitHub (github.blog)
80 points by chenzhekl on Aug 21, 2021 | 18 comments


> CITATION.cff files are plain text files with human- and machine-readable citation information. When we detect a CITATION.cff file in a repository, we use this information to create convenient APA or BibTeX style citation links that can be referenced by others.

https://schema.org/ScholarlyArticle RDFa and JSON-LD can be parsed with a standard Linked Data parser. Looks like YAML-LD requires quoting e.g. "@context": and "@id":

From https://docs.github.com/en/github/creating-cloning-and-archi... ; in your repo's /CITATION.cff:

  cff-version: 1.2.0
  message: "If you use this software, please cite it as below."
  authors:
  - family-names: "Lisa"
    given-names: "Mona"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Bot"
    given-names: "Hew"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  title: "My Research Software"
  version: 2.0.4
  doi: 10.5281/zenodo.1234
  date-released: 2017-12-18
  url: "https://github.com/github/linguist"

https://citation-file-format.github.io/
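
To make the CFF-to-BibTeX step concrete, here is a minimal Python sketch of how the fields in the example above could map onto a BibTeX entry. It is an illustration only, not GitHub's implementation: the function name `cff_to_bibtex`, the default citation key, the `@software` entry type, and the choice of fields are all assumptions. The dict stands in for the result of parsing the file with a YAML parser (for example PyYAML's `yaml.safe_load`), with values already normalized to strings.

```python
# Hypothetical parsed form of the CITATION.cff example above.
# In practice this mapping would come from a YAML parser.
cff = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "authors": [
        {"family-names": "Lisa", "given-names": "Mona",
         "orcid": "https://orcid.org/0000-0000-0000-0000"},
        {"family-names": "Bot", "given-names": "Hew",
         "orcid": "https://orcid.org/0000-0000-0000-0000"},
    ],
    "title": "My Research Software",
    "version": "2.0.4",
    "doi": "10.5281/zenodo.1234",
    "date-released": "2017-12-18",
    "url": "https://github.com/github/linguist",
}


def cff_to_bibtex(cff: dict, key: str = "software") -> str:
    """Render a minimal BibTeX-style entry from a parsed CITATION.cff mapping.

    The entry type and field selection are assumptions made for this sketch.
    """
    # BibTeX separates multiple authors with the word "and".
    authors = " and ".join(
        f"{a['family-names']}, {a['given-names']}" for a in cff.get("authors", [])
    )
    released = str(cff.get("date-released", ""))
    fields = {
        "author": authors,
        "title": cff.get("title", ""),
        "version": str(cff.get("version", "")),
        "doi": cff.get("doi", ""),
        "year": released[:4],  # first four characters of YYYY-MM-DD
        "url": cff.get("url", ""),
    }
    # Skip empty fields so missing CFF keys do not produce blank entries.
    body = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in fields.items() if value
    )
    return f"@software{{{key},\n{body}\n}}"


print(cff_to_bibtex(cff))
```

The sketch does no escaping of special characters in field values, which a real converter would need.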


Ah great! It does indeed seem to be enhanced now, compared with a few weeks ago when GitHub CEO Nat Friedman announced initial support (https://twitter.com/natfriedman/status/1420122675813441540). I had used the CFF initializer website https://citation-file-format.github.io/cff-initializer-javas... to create my CFF file, but GitHub couldn't parse it. Now it's working!


> YAML

Can't we let this format die? It's both hard for humans to format correctly, and for library creators to make a compliant parser.


> It's both hard for humans to format correctly, and for library creators to make a compliant parser.

It is actually very easy to format correctly, unless you don't understand the format. While it may be challenging to make a compliant parser, many exist for multiple languages.

JSON isn't a viable replacement. TOML doesn't support root-level lists, has its own set of challenges, and is a lot uglier and harder to understand for complex nested structures.

If you want to kill YAML then replace it with HCL.


TOML is atrocious. In trying to look INI-like, its tables and nested structures become completely inscrutable.

I only ever deal with it by writing what I need in YAML and running a converter.


Agreed. I think HCL would make a nice replacement though.


What’s HCL?


It seems to be the HashiCorp Configuration Language, HashiCorp being the creators/sponsors of Terraform (I don't know much about Terraform or HashiCorp).


I have the same problems as you. For some reason I can't work in that type of format (similarly with Python).

No beef against those that love it, my brain just can't handle it.


Meh. It’s ok. I’ve used it in many projects and languages without ever having to care about it, and I've always been able to find an OSS parser in every language.


Most people I know have had an issue where they misunderstood what an object in a YAML file is (and therefore did not apply a configuration) or something similar.

Bonus points for mis-editing a kube object (kubectl edit) that you save correctly only to realize that mis-formatted YAML can get thrown away without reporting an error back (looking at you Istio).

And yet people choose it again and again for new projects. Someone must love it given the popularity...


Why not use bibtex directly?


This generates the bibtex code for you.


Doesn't answer the question: why? BibTeX, and others, have been around forever. A lot of software already interconverts RIS, BibTeX, XML...

At least tell me what your new format is trying to solve, like PNG solved GIF's flaws. My bibliography DB is in BibTeX, my citable repos include a BibTeX file, and I'm not switching to this thing.

What would I do? Create a doc explaining how our service now parses a file called CITATION that could be a Ris or a Bibtex, with the following fields: A,B,C... and that's it.


Parser bugs are often a source of security vulnerabilities. A quick search reveals a torrent of BibTex CVEs [1].

Imagine writing a program that scanned through GitHub repos for citations in order to generate a bibliography. It'd be downloading and parsing files created by a bunch of strangers on the internet. You'd want to be pretty confident in the parser you employed.

Therefore, you should use a well-maintained BibTex parser, but that's not an option in every language. For instance, the most popular Node.js library hasn't been updated in four years [2].

CFF allows you to represent your citation as YAML, which is enormously popular and part of the toolchains of many projects. So, chances are that there is a YAML interpreter available in your language with an active community behind it.

As for your question about how to inform visitors to your repo of your citation file's format: the file doesn't have to be named CITATION, it can be CITATION.cff. Alternatively, you can have a CITATION.bib file. GitHub supports this [3], and you enjoy the same enhanced UI features.

[1] https://www.cvedetails.com/google-search-results.php?q=yaml&...

[2] https://www.npmjs.com/search?q=bibtex&ranking=popularity

[3] https://docs.github.com/en/github/creating-cloning-and-archi...
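
To illustrate the point about handling files written by strangers: even with a trusted parser, a bibliography scanner would probably want to check the shape of what it parsed before using it. Below is a minimal Python sketch under stated assumptions. The function name `validate_citation`, the length cap, and the DOI pattern are hypothetical choices for illustration and are not part of the CFF specification or any GitHub tooling. The input is assumed to be the already-parsed content of a CITATION.cff file (for example, the result of a safe YAML loader).

```python
import re

MAX_FIELD_LENGTH = 500  # arbitrary cap chosen for this sketch
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # rough shape check only


def validate_citation(data: object) -> dict:
    """Return a cleaned copy of an untrusted, already-parsed citation mapping.

    Raises ValueError when the structure is not what this sketch expects.
    """
    if not isinstance(data, dict):
        raise ValueError("top level must be a mapping")

    title = data.get("title")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("title must be a non-empty string")
    if len(title) > MAX_FIELD_LENGTH:
        raise ValueError("title is too long")

    authors = data.get("authors")
    if not isinstance(authors, list) or not authors:
        raise ValueError("authors must be a non-empty list")

    cleaned_authors = []
    for author in authors:
        if not isinstance(author, dict):
            raise ValueError("each author must be a mapping")
        family = author.get("family-names")
        given = author.get("given-names")
        if not isinstance(family, str) or not isinstance(given, str):
            raise ValueError("author names must be strings")
        cleaned_authors.append(
            {"family-names": family.strip(), "given-names": given.strip()}
        )

    cleaned = {"title": title.strip(), "authors": cleaned_authors}

    # The DOI is optional here, but if present it must look like a DOI.
    doi = data.get("doi")
    if doi is not None:
        if not isinstance(doi, str) or not DOI_PATTERN.match(doi):
            raise ValueError("doi is not in the expected form")
        cleaned["doi"] = doi
    return cleaned
```

Only the fields this sketch knows about are copied into the result, so unexpected keys in the input are dropped rather than passed along.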


> A quick search reveals a torrent of BibTex CVEs [1].

Is this a joke?


This seems like an attempt to make GitHub stickier, and should be avoided. Just give people BibTeX to copy and paste, and don’t let Microsoft railroad you into using their formats, yet again.


How about extending this feature to automatically add in the required licenses for the sources of Copilot-generated code?



