
I'm curious how a user of Singer is supposed to perform transformations on data (e.g. aggregating records).

Should they code something, to be plugged between taps and targets?

Do you intend to include such transforms in your solution?



We don't currently have use cases that require heavy transformations (see this blog post I wrote to explain why: https://blog.stitchdata.com/why-our-etl-tool-doesnt-do-trans...).

However, since Singer is built around piping data between applications, your suggestion - to code something that sits between taps and targets - makes perfect sense. The whole "flow" would look like:

$ tap-mydatasource | do-aggregations | target-mytarget
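To make that middle step concrete, here is a rough sketch of what a `do-aggregations` filter might look like in Python. It assumes the tap emits the usual newline-delimited JSON messages (`SCHEMA`, `RECORD`, `STATE`). The stream name `orders`, the fields `customer_id` and `amount`, and the output stream `orders_by_customer` are all made up for illustration; nothing here is an official Singer component.

```python
import json
import sys
from collections import defaultdict

# Hypothetical names, for illustration only; none of these come from Singer.
SOURCE_STREAM = "orders"
OUTPUT_STREAM = "orders_by_customer"
GROUP_KEY = "customer_id"
VALUE_FIELD = "amount"


def aggregate(lines):
    """Yield Singer messages, replacing SOURCE_STREAM records with per-key sums."""
    totals = defaultdict(float)
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message.get("type")
        stream = message.get("stream")
        if kind == "RECORD" and stream == SOURCE_STREAM:
            record = message["record"]
            totals[record[GROUP_KEY]] += record[VALUE_FIELD]
        elif kind == "SCHEMA" and stream == SOURCE_STREAM:
            # The tap's schema no longer describes what we emit, so drop it.
            continue
        elif kind == "STATE":
            # Hold state back until the aggregated records have been written.
            last_state = message
        else:
            # Messages for other streams pass through untouched.
            yield message

    # Describe the aggregated stream before emitting its records.
    yield {
        "type": "SCHEMA",
        "stream": OUTPUT_STREAM,
        "key_properties": [GROUP_KEY],
        "schema": {
            "type": "object",
            "properties": {
                GROUP_KEY: {"type": ["string", "integer"]},
                "total": {"type": "number"},
            },
        },
    }
    for key, total in totals.items():
        yield {
            "type": "RECORD",
            "stream": OUTPUT_STREAM,
            "record": {GROUP_KEY: key, "total": total},
        }
    if last_state is not None:
        yield last_state


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read Singer messages from stdin and write the aggregated ones to stdout."""
    for message in aggregate(stdin):
        stdout.write(json.dumps(message) + "\n")
```

To use it in a pipeline you would call `main()` under the usual `if __name__ == "__main__":` guard and put the script on your path as `do-aggregations`. Note that it buffers everything until the tap finishes, which is the simplest way to aggregate but would not suit a very large or never-ending stream.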

We'd be eager to hear from anyone who tries this approach!


The only thing I'd add to Chris's blog post is that, in the workflows we tend to see, most transformations are done after loading into the destination. For example, in Redshift the transformations could be defined in SQL or Python UDFs.



