
Power Query + Power Pivot + M. I don't use formulas in cells. The sheets are just a canvas for Pivot Tables, final tables, and charts connected to the data from Power Query and Pivot.

I deal with hundreds of API integrations involving various JSON, CSV, TSV, and XML files with mixed validity. My workflow: Notepad++ for a visual check -> Prototype everything in Excel. I give users a "visual", connect it to real data, and only then migrate the final logic to BI dashboards or databases.

Nothing else delivers results this fast. SQL, BI tools, and Python are too slow because they generally need "clean" data. Cleaning and validation take too much time there. In Excel, it's mostly just a few clicks.

PS: I spent 2 years (2022-2023) using LibreOffice Calc. I didn't touch Excel once, thinking I needed to break the habit. In the end, I did break the habit, but it was replaced by a pile of scripts and utilities because Calc couldn't do what I needed (or do it fast enough). The experience reminds me of testing Krita for 2 years (2018-2020) — I eventually returned to Adobe Photoshop (but that's another story).

PS2: About (Query + Pivot + BI): this lets you process millions of rows (bypassing the grid's row limit) and compress everything into an OLAP cube that takes up little space and queries the data quickly.





Interesting. I'm not experienced in data cleaning. About Python vs Excel: isn't manual cleaning of data in Excel prone to persistent errors? Because:

- it's hard to version control/diff

- it's done by a human fat fingering spreadsheet cells

- it's not reproducible. Like if you need to redo the cleaning of all the dates, in a Python script you could just fix the data parsing part and rerun the script to parse source again. And you can easily control changes with git

In practice I think the speed tradeoff could be worth the occasional mistake. But it would depend on the field, I guess.


> - it's hard to version control/diff

As I mentioned, this is only prototyping. After that, we move on to implementation in code, knowing what we want to see in the end and understanding the nuances of the data itself.

> - it's done by a human fat fingering spreadsheet cells

No one is entering anything into the cells; please reread the message.

> - it's not reproducible. Like if you need to redo the cleaning of all the dates, in a Python script you could just fix the data parsing part and rerun the script to parse source again. And you can easily control changes with git

That's what I said above: it takes longer. Why use git/Python when I can do it in a few clicks and quickly see a visual representation at every step?

> In practice I think the speed tradeoff could be worth the ocasional mistake. But it would depend on the field I guess.

Another sentence that shows, once again, that you haven't read what was written.



