More

abought · 2025-03-27T15:47:26 1743090446

At various points in my career, I've had to oversee people creating data export features for research-focused apps. Eventually, I instituted a very simple rule:

As part of code review, the developer of the feature must be able to roundtrip export -> import a realistic test dataset using the same program and workflow that they expect a consumer of the data to use. They have up to one business day to accomplish this task, and are allowed to ask an end user for help. If they don't meet that goal, the PR is sent back to the developer.

What's fascinating about the exercise is that I've bounced as many "clever" hand-rolled CSV exporters (due to edge cases) as other more advanced file formats (due to total incompatibility with every COTS consuming program). All without having to say a word of judgment.

Data export is often a task anchored by humans at one end. Sometimes those humans can work with a better alternative, and it's always worth asking!

abought · on Aug 5, 2024

Recently, I attended an hour long meetup from a high-level AWS employee about CI pipelines using CodeCommit. Of several possible deprecation announcements, the latest of those was dated the day of the talk. (!!)

In all fairness to the speaker, except for having one of the most prominent icons across his slide deck deprecated in real time, it would have been a pretty decent talk. He even made an effort to promote the GitHub integrations as a path forward, and provide some guidance on current tooling. It was clear CodeCommit wasn't the path of most momentum, even if the degree was unclear.

abought · on March 19, 2024

A bit of an aside, but I really like "guides to things we otherwise take for granted". So few man pages are built around example use cases, but those are often what make the case for a tool!

A similar spirit to projects like https://github.com/tldr-pages/tldr/ , but this has a lot more useful detail.

The ripgrep author has a blog post on performance and benchmarking that is an interesting read in itself: https://blog.burntsushi.net/ripgrep/

abought · on March 12, 2024

Some departments are starting to hire programmers. There's an effort to define this as a (broad) job category under the heading "Research Software Engineer":

https://society-rse.org/ https://us-rse.org/

Institutional support varies widely; some projects or teams are rather well funded for big projects and senior talent, while at other schools, the cost structure is more aimed at "one off" projects staffed by more recent graduates.

A recent grant is trying to fund this work at several schools with a history of well organized services: https://www.schmidtfutures.org/our-work-old/virtual-institut...

mike_hearn · on March 12, 2024

That's been around for a while but the pay for these roles is very uncompetitive, and universities don't care if professors don't do it. It doesn't make any difference to whether you get published or not so it'll probably remain niche.

Now, if journals start rejecting papers because the code isn't provided or because it was provided but professional software engineers reviewed it and gave it a thumbs down, that'd change. But journals struggle to keep obviously AI generated material out of their pages, so they aren't going to do anything like that.

abought · on March 12, 2024

It depends. Some of the foundation-funded positions are competitive, and a few centers have surprisingly professional leadership. People are trying to organize, and- even if it's an uphill climb- there's been some improvement at the edges.

Anecdotally, some of the RSE leads I've spoken to are seeing more long-term demand than they predicted, which might lead to more room for senior roles. Currently quite a few teams (outside the big centers) seem to be priced way too low, usually explained as because they're testing the waters.... so "cheap student labor" and "one off project" is what they can afford.

Minor heretical aside: one thing I miss about old twitter is that academia was developing a real "second layer" on top of journals, where things like reproducibility could be discussed publicly. PubPeer is a partial solution, as are GitHub issues... if enough gatekeepy people really see value in code quality, norms will shift with or without mandates.

mike_hearn · on March 12, 2024

What's different about current Twitter/X in that regard? I see reproducibility and other science problems discussed there sometimes.

abought · on March 12, 2024

A lot of the audience fragmented for various reasons, and even within the same discipline, not everyone has converged in a new place yet. I'm told mastodon got more CS/phys science people, and Bluesky got more social scientists. That's a shifting landscape though.

There was one obvious place to look for these discussions; now there are many. Changes to search tools and API access didn't help discoverability either.

Some of the departures are for practical reasons: Twitter regularly changes the rules around logged-in viewing, direct links, and promotion/ordering of posts in ways that create friction for people trying to engage in public outreach. ("this method worked yesterday" should refer to the software, not the communication, thank you very much!)

DonHopkins · on March 15, 2024

Possibly something about the way women and blacks and jews and gays and trans people and other minorities are being regularly and systematically and reproducibly attacked and harassed by neo-nazis and MAGA trolls who Elon Musk retweets and supports instead of discourages and bans that turns academics off from Twitter/X.

jltsiren · on March 12, 2024

Research software engineers are paid what they are worth. Typically a bit more than a postdoc, but not that much more. The problem is that the academia is a cash-starved environment, and everyone's work is worth much less than similar work in the industry.

Another problem is the PI-centric model. Most of the funding goes to individual labs. If a typical grant is $200k/year, you are not going to pay competitive salaries. And you're probably not going to hire a software engineer, because then you won't have anyone doing the actual research.

abought · on March 12, 2024

I think that's the rationale behind organizing separate teams as service organizations (mini contractors who serve multiple PIs).

Other option is research "centers" with multiple PIs. Not an option for most fields, but a few, like bioinformatics, can justify both the cost and the shared employee.

abought · on March 3, 2024

Strange but true: I've been to a number of professional happy hours that offered free alcohol, but didn't provide other beverages. It got to the point that I started bringing my own water bottle to networking events, just in case.

I'm a big fan of providing other beverage types. Being able to sip a soda etc from the same kind of container as everyone else goes a long way towards blending in.

chmod775 · on March 3, 2024

>Strange but true: I've been to a number of professional happy hours that offered free alcohol, but didn't provide other beverages.

Failing to provide non-alcoholic drinks would be a faux pas even in Germany. It is indeed strange. I'd go as far as calling it worse than college-party-level planning.

Is nobody expected to drive home after the event?

abought · on March 4, 2024

At networking mixers I've attended, often there are a limited number of drink tickets per person and a set event duration. But the drink tickets only covered alcohol, not other beverages. In some cases, other beverages just weren't an option at all. ("I guess you could find a water fountain? But why?")

Also, one of those recurring events was hosted at a startup that was, to be frank, known for "worse than college" level planning all around...

Eventually events started getting better at providing options. Probably a mix of several factors: the move to another city (local culture), career growth/level of people around me, and changing social patterns. (increasing interest in non alcoholic options/ more people willing to speak up)

abought · on March 2, 2024

The local, in-person meetup scene was an absolute treasure! It didn't really recover after COVID; I'm still trying to find where those networking opportunities moved to.

OccamsMirror · on March 2, 2024

Second this. Local meetups really didn’t survive COVID.

em-bee · on March 2, 2024

give it time, or go start something. in many places local meetups saw a huge revival since last summer

isbvhodnvemrwvn · on March 3, 2024

Around me most attendees are people looking for a job, with zero engagement in any other way.

philomath_mn · on March 4, 2024

I did _one_ virtual meetup. After the presentation we were paired into small breakout rooms. Two of the four guys in my room just gave a quick summary of their background and said they were looking for work -- they had nothing else to say.

(a) it ruined the networking time for everyone else and (b) if I had an opening, I'd be less likely to interview them than before.

em-bee · on March 5, 2024

i haven't been in this particular situation, but people who don't know what to say are all to common. i am not the most sociable guy myself, but i learned to get people talking by asking lots of question, searching for common interests, or simply anything interesting about them. it is a lot of work though, and after such conversations i tend to be exhausted. the irony is that if they didn't want to talk, but get me talking instead, all they would need is to ask a few open questions and then i'd be able to fill the space.

philomath_mn · on March 5, 2024

The other participant and I had a pleasant conversation, but the dynamic was weird for the other two who basically said "give me work" without saying anything else.

em-bee · on March 3, 2024

that sounds frustrating.

abought · on March 2, 2024

Location: Michigan

Remote: Open to hybrid/remote

Willing to relocate: No, but travel ok.

Technologies: Fullstack web + DevOps. Data analysis and workflows. Python. Flask, Django, Django REST framework. Celery, RabbitMQ. Terraform, Packer, Docker, Amazon Web Services (AWS). SQL (MySQL, PostgreSQL, AWS RDS, SQLite). NoSQL (MongoDB). Vanilla JavaScript/ES6 and frameworks including JQuery, D3, Ember.js, Vue.js. PyData stack (Numpy, Scipy, Matplotlib, Pandas, Jupyter), Snakemake, Nextflow, some R, MATLAB (scripting and GUI development). Some Unix/Linux administration. Unit testing and TDD with common frameworks and CI platforms (Pytest, xUnit, Mocha, GitHub actions, etc).

Résumé/CV: https://www.linkedin.com/in/abought/ (PDF on request) GitHub: https://github.com/abought

Email: abought+hnhiring [-@-] gmail [-.-] com

--

Synopsis: Research software engineer and full stack web developer, with experience in a variety of startup and R&D environments (academia, industry, nonprofit). I specialize in building collaborative tools that enable teams to create and understand large, complex datasets. I have lead or made major contributions to a number of widely used tools across multiple R&D fields of study, including features for data harmonization, visualization, and sharing. Most recently I am shifting more of my focus to the backend and working on the next generation of a tool for running a complex data annotation pipeline for large datasets in AWS. I've contributed to every part of the stack, plus some work with DevOps tools. I also maintain human subjects research certifications and have spearheaded successful federal security processes for public tools.

I work hard to help other members of my team be productive and drive continuous improvement. I'm pragmatic and flexible in choice of technologies, and like working with smart, caring people to solve problems bigger than one person can do alone. There's always something new to learn!

abought · on Feb 15, 2024

"Anyone in the training dataset"?

A big unanswered question in the age of AI: how does a system of law work when breaking one law is bad, but the product of breaking many laws is totally exempt?

We're starting to see the milder form of this in debates around authorship and copyright. But when your AI model requires a shockingly large quantity of clearly verboten material as input, what is one to make of the output?

abought · on Dec 2, 2023

I've also had luck with grocery stores. Especially the produce boxes: they have thick walls and are good for moving dishes.

From their side, they are saving the hassle of breaking down and disposing boxes. So for best results, ask them the preferred pick up time and stick to it. (there's a fine line between "less work disposing of boxes" and "tripping over stuff taking up space all day". If people feel appreciated, they can be remarkably kind.)

Oftentimes, deliveries come on specific days- for any store you ask, plan ahead slightly to work with their schedule. Not every store has the room or space to help, but some will try if they can. (Trader Joe's, for example, tends not to waste space on frivolities like storage. Or parking.)

Moving to a new place is a lot! I hope you find good friends and neighbors when you get there. Taking time to build roots before the new job gets hectic can make a big difference to quality of life later.

abought · on May 12, 2021

This looks interesting!

In the past, I've been a big fan of automatic documentation generators (jsdoc, openapi, etc), because keeping a markdown file full of function names and arguments up to date by hand was painful- but I don't like that those systems have little room for prose content like guides or tutorials.

Does Docusaurus support both types of information? The examples I've browsed so far seem to involve hand-edited API docs (example: Babel - https://raw.githubusercontent.com/babel/website/main/docs/pa...). I'd love to see a system that supported building and showing API docs and prose guides in one site, or at least allowed automated cross-linking in a way that could be kept up to date.

vaughan · on May 12, 2021

Combining TypeScript's API Extractor (Microsoft) with Docusaurus is great.

You can mix guides/tutorials with generated API docs.

For example, I really like having simple API docs in the `readme.md` file, but then also in-depth docs elsewhere.

MDX + remark really let you do anything you want. It's the ultimate documentation stack.

[1]: https://api-extractor.com/pages/setup/generating_docs/

acemarke · on May 13, 2021

Do you have any good examples of this in action and showing configuration?

I've been wrestling with this as we work on finalizing the new "RTK Query" APIs for Redux Toolkit. I'd love to have some auto-generated TS API docs embedded in the hand-written Markdown pages. My biggest questions are things like how to present reasonable details on some of our typedefs, which can get insanely complex and readers don't need to see all the internal sub-types.

I recently opened up an RTK issue asking for suggestions on TS API ref integration:

https://github.com/reduxjs/redux-toolkit/issues/1046

slorber · on May 13, 2021

You can embed doc generated by other tools in a Docusaurus site, as a plain page or an iframe. Some people embed Javadoc, OpenAPI or Redoc in Docusaurus.

You can also generate md to make that doc native. I've seen people generating docs from a GraphQL schema for example