I'd urge you to try both out again and verify the results. The comment is fairly dismissive, yet both do work even behind CGNAT (albeit not under all conditions). If you do find a general solution to the problem, please do share.
I'm super interested in this topic. Recently (and it's still ongoing) I started hashing out how to diff large datasets and what that even means.
I would love to understand how the HN crowd thinks diffing datasets (let's say >1GB in size) should work.
Are you more interested in a "patch"-quality diff of the data, which is more machine-tailored? Or is a change report/summary/highlights more interesting in that case?
Currently I'm leaning more towards the understanding/human consumption perspective which offers some interesting tradeoffs.
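To make that distinction concrete, here's a rough, hypothetical sketch of the two flavours, assuming two CSV snapshots keyed by an "id" column (the file and column names are made up, not from any real dataset):

  # Hypothetical sketch: two flavours of a dataset diff for CSV snapshots keyed by "id".
  import csv
  import hashlib

  def load_rows(path, key="id"):
      # Hash each full row so large files stay cheap to hold and compare.
      rows = {}
      with open(path, newline="") as f:
          for row in csv.DictReader(f):
              rows[row[key]] = hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
      return rows

  old, new = load_rows("snapshot_old.csv"), load_rows("snapshot_new.csv")
  added = new.keys() - old.keys()
  removed = old.keys() - new.keys()
  changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}

  # "Patch" flavour: list every affected key so a machine could replay the change.
  for k in sorted(added):
      print("+", k)
  for k in sorted(removed):
      print("-", k)
  for k in sorted(changed):
      print("~", k)

  # "Summary" flavour: a few numbers a human can read at a glance.
  print(f"{len(added)} added, {len(removed)} removed, {len(changed)} changed out of {len(old)} rows")

The patch output grows with the change set, while the summary stays tiny regardless of size, which is exactly the trade-off I keep going back and forth on.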
I went the boutique consultancy route just last week and brought a couple of people into the mix.
I pretty much agree that everybody faces the same questions and feelings. To be honest, I might have taken a small leap of faith, as I pulled the trigger before securing a client. The only reason I felt comfortable with it is that I've been in the space long enough to be pretty sure I can land a client in the first month or so (and it seems like that's happening).
The best advice you can get is 'talk to people'. Most people starting out think you're BSing them, but nearly everybody fails to leverage their network. You can't go indie fresh out of college, but after a couple of years and some projects it's easy enough.
Talk, talk and talk some more. Don't be shy about sending emails and chatting people up on LinkedIn. It works wonders once you put yourself out there. Look at your contacts list right now and you can surely find at least one person who could get you started with some work now or in the very near term.
The reason everyone keeps iterating on the same 'general/bland advice' is that it really is the bread and butter of it. Talk more, can't say it enough. Be honest, be respectful, don't spam, but don't be shy to talk to strangers in your line of work.
Not sure if this helps anyone, but I just wanted to say it's easier than most of you think. You need a marketable skill, a minimal network and to talk. If you're doing honest work, things pick up on their own.
Known pitfalls - there's more to running a consultancy than just talking and working; admin work takes a lot of time as well. Plan for the extras.
I'm curious how well this scales though. Are the kinds of clients you're reaching out to small to mid-size? I've spent a lot of time working at fairly large consulting agencies, and the amount of time and resources that goes into just pitching seems daunting to a newcomer.
I guess I'm in a good position, coming from a company that scaled from ~30 to currently ~220 before I exited. Nearly 70% of the business comes from the growing personal network of a single guy. Another 15% comes from direct referrals outside that network. The last 15% is a mix of random circumstances and some cold outreach.
Depending on how you look at it - building networks vs. just chatting people up - this can lead to very different outcomes. The above was mostly focused on the 'getting started' part, though the 200+ setup is successfully run by a team of 7 or 8.
Things change as you scale, and the approach that works at smaller numbers does not work at larger ones, and vice versa.
Feel free to ping me on email or otherwise if you'd like to chat.
I've had the wonderful opportunity to work on several projects where AI/ML was not just used as a buzzword and marketing gimmick.
The two types of applications I've seen generate real value so far (and thus have monetary value you can actually earn from) are:
- Automating existing processes, either to reduce the amount of work that needs to be done (feature extraction from images or audio, document parsing) or to introduce a higher level of resolution/response time, for example next-day forecasts issued every day, or live detection of audio/visual events.
- Building models to extract signals from massive or complex data. Usually, once you're done here, you can revert to traditional methods based on the newfound insights. Rarely are there "magic" solutions that optimize away your problems; in general it proves to be more of a tool in the analytics toolbox than a solution in itself.
Either of those creates new value you can put a price on. There's decent opportunity once you understand what the appropriate use cases are.
A fairly long time ago (3-4 years) I was tasked with doing something fairly similar (though running on Android as the end client). HLS was one of the better options but came with the same costs you describe here. However, it was fairly easy to reduce the segment size to favor responsiveness over resilience. Essentially you trade buffer size and bitrate-switching quality for more precise scrubbing through the video and faster start times.
I had to hack it quite severely to get fast loading with fair resilience for my use case, as the devices were constrained in performance and could have fairly low bandwidth. Since you're looking at a relatively fast connection, simply reducing the chunk size should get you to the target (rough sketch below).
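For what it's worth, if you happen to package with ffmpeg, shrinking the segments is roughly a one-flag affair. A hypothetical sketch (the 2-second figure is just an illustration, not what my setup actually used):

  # Hypothetical sketch: package a file as HLS with short segments via ffmpeg.
  # Assumes ffmpeg is installed and on PATH; file names and values are illustrative.
  import subprocess

  subprocess.run([
      "ffmpeg", "-i", "input.mp4",
      "-c:v", "libx264", "-c:a", "aac",
      "-hls_time", "2",        # shorter target segment length: faster starts, finer seeking
      "-hls_list_size", "0",   # keep every segment in the playlist (VOD-style)
      "-f", "hls", "index.m3u8",
  ], check=True)

  # The trade-off described above: smaller segments mean less buffered ahead and more
  # frequent requests, so resilience and bitrate switching suffer a bit in exchange.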
As a follow-up - I've spent a couple of years working on a video product based on WebRTC. It either works for a PoC where you just hack things together, or at a large scale where you have the time and resources to fight odd bugs and jump through a whole spectrum of logistical hoops in setting it up. So unless you plan to have a large-ish deployment with people taking care of it, I would stick to HLS or other simpler protocols.
I would suggest you go for it!
A very good colleague of mine. He also finished vet school, went on to get a master's degree in bioinformatics, and shaped up his DS/ML skills on his own time. He started at a local data analytics company and then transitioned to a full software development company, where together we brought in and started DS & ML efforts. His colourful experience definitely broadened his view; he is now what I would label a senior guy, with the perspective and understanding of the business side of things.
In my eyes, Google pioneered the commercial application of this approach. With BigTable and its underlying Colossus storage engine, they effectively pushed the advent of HDFS and all the BigData tooling from a decade ago.
Seems like this is a common theme amongst people working on side projects. My 5c is that you do three things to get to release:
1) Cut away all but the essential - think about what Minimum, Viable and Product really mean
2) Have a task board - it's good motivation to watch yourself burn down through it
3) Keep your eye on the higher goal - the boring or hard parts are a means to an end here
As far as personal experience goes - it took me 6 years to launch, with the bulk happening in the 6 months after I decided I WILL LAUNCH FFS.
Billing is a PITA and I'm actually dealing with it at this exact moment. It's crap and always will be, but my motivation is that this is what will bring in the money that keeps the project alive, so yeah, a means to an end.
Lastly, I will launch even if it kills me - I'm still kicking, but I've probably spent way too much time and nerves on some stuff.
I guess take it as a challenge and look at it like this:
Only the ones who go through it all get to the end; all the others just abandon it and fail. Be the one who sees it through.
I used to be very interested in these topics (mainly Pi and Mersenne primes) back when the cloud was not yet a thing and HPC meant building or renting large clusters. It got me into a lot of Beowulf cluster stuff, which was super fun, along with all the coding, tuning and fun algorithms.
An interesting story, though I've forgotten a lot of the details by now: back then the race was on to reach something like 1T digits, and a Japanese guy built some sort of super rig for $10-20k at the time and got to 1T, only to be beaten by Yahoo reaching 5T a couple of weeks later. That's where I saw the future of computing: large IT companies having the infrastructure to host large clusters cheaply, a space dominated until then by supercomputers and research institutions. I might have gotten some details wrong as it's been about 10 years since then, but the idea still stands.