With Salesforce specifically, the main challenge was getting a stable and reliable event-driven stream. That's why we ended up adopting a hybrid approach: we subscribe to Pub/Sub events, but also periodically poll the REST API in case Pub/Sub misses anything.
Salesforce's scattered docs don't make best practices super clear, so designing the syncs took us some time.
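A rough sketch of how a hybrid event-plus-polling setup can catch missed events. This is illustrative only and assumes that each Pub/Sub event and each REST poll row carries a record Id plus a modification timestamp (for example `SystemModstamp`). `HybridChangeTracker` and its methods are hypothetical names, and "applying" a change here only records its version.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


class HybridChangeTracker:
    """Pub/Sub events as the fast path, periodic REST polling as a safety net (hypothetical sketch)."""

    def __init__(self) -> None:
        # record Id -> latest modification timestamp we have already applied
        self.applied: Dict[str, datetime] = {}

    def on_event(self, record_id: str, modified_at: datetime) -> None:
        # Fast path: a change event arrived on the Pub/Sub subscription.
        self._apply(record_id, modified_at)

    def reconcile(self, polled: Iterable[Tuple[str, datetime]]) -> List[str]:
        # Slow path: compare a periodic REST poll (Id + timestamp) with what events delivered.
        # Anything newer than our last applied version was missed by the event stream.
        missed = []
        for record_id, modified_at in polled:
            if modified_at > self.applied.get(record_id, datetime.min):
                missed.append(record_id)
                self._apply(record_id, modified_at)
        return missed

    def _apply(self, record_id: str, modified_at: datetime) -> None:
        # A real sync would write the record to the database here; we only track versions.
        if modified_at > self.applied.get(record_id, datetime.min):
            self.applied[record_id] = modified_at
```

The poll acts purely as a backstop. If the event stream is healthy, `reconcile` returns an empty list and costs only the poll query itself.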
Good question. Our ultimate goal is to be the go-to solution for any two-way data sync.
We think the Salesforce syncing market alone is worth $500M to $1B. Once you include all the other tools we have our eyes on (HubSpot, Monday, ERPs), the market size comfortably reaches the billions.
Going after Heroku Connect makes sense as a starting point, but we've got our sights beyond that.
Nice! Great point about the round trip; we do something similar for formulas and auto-generated fields like `Id`. It's awesome that you built this in-house.
> Syncing data from Salesforce (to seed a database for example) is done via REST too. It works OK.
Have you thought about using the Bulk API for seeding? We started relying on it instead of REST, which helped us seed massive DBs much faster and more efficiently.
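A minimal sketch of a Bulk API 2.0 query job used for seeding: create the job, wait for it to complete, then page through the CSV results using the `Sforce-Locator` response header. The HTTP layer is abstracted behind a hypothetical `Transport` callable so the flow can be read (and tested) without a live org. The API version `v58.0` is an assumption; authentication, retries, and the `maxRecords` parameter are omitted.

```python
import csv
import io
import time
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, params, json_body) -> (response_headers, parsed_body).
# Plug in a real HTTP client with an authenticated Salesforce session here.
Transport = Callable[[str, str, dict, Optional[dict]], tuple]

API = "/services/data/v58.0"  # assumed API version; use one your org supports


def bulk_query(http: Transport, soql: str, poll_seconds: float = 5.0) -> Iterator[dict]:
    """Run a Bulk API 2.0 query job and yield each result row as a dict (sketch)."""
    # 1. Create the query job.
    _, job = http("POST", f"{API}/jobs/query", {}, {"operation": "query", "query": soql})
    job_id = job["id"]

    # 2. Wait until Salesforce has finished processing the job.
    while True:
        _, status = http("GET", f"{API}/jobs/query/{job_id}", {}, None)
        if status["state"] == "JobComplete":
            break
        if status["state"] in ("Failed", "Aborted"):
            raise RuntimeError(f"bulk query job {job_id} ended in state {status['state']}")
        time.sleep(poll_seconds)

    # 3. Page through CSV results; the last page reports Sforce-Locator as the string "null".
    locator = None
    while True:
        params = {"locator": locator} if locator else {}
        headers, body = http("GET", f"{API}/jobs/query/{job_id}/results", params, None)
        yield from csv.DictReader(io.StringIO(body))
        locator = headers.get("Sforce-Locator")
        if not locator or locator == "null":
            break
```

Because Salesforce assembles the whole result set server-side, seeding millions of rows takes a handful of job and result-page calls rather than one REST call per 2,000 records.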
REST isn't too painful yet, but it will be in the not-too-distant future.
As you know, the REST API delivers a maximum of 2,000 records per request, so beyond a certain scale it's not really tenable in terms of speed and API-call consumption.
So yes, the Bulk API will probably feature soon.
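To make the 2,000-record cap concrete, here is a small sketch of the arithmetic plus a REST query pagination loop that follows `nextRecordsUrl` until the response reports `done`. The `fetch` callable is a hypothetical authenticated GET returning parsed JSON; it is not part of any real client library.

```python
import math
from typing import Callable, Iterator

REST_BATCH = 2000  # max records per REST query response, as noted above


def rest_calls_needed(record_count: int, batch_size: int = REST_BATCH) -> int:
    """API calls to page through a full REST query (at least one, even for zero rows)."""
    return max(1, math.ceil(record_count / batch_size))


def rest_query_all(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Follow nextRecordsUrl until done is True (sketch; `fetch` is hypothetical)."""
    url = first_url
    while True:
        page = fetch(url)
        yield from page["records"]
        if page["done"]:
            break
        url = page["nextRecordsUrl"]
```

For example, `rest_calls_needed(10_000_000)` is 5,000 calls for a single full pass over 10 million records, all of which count against the org's daily API limit.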
1. Yes, it's on our roadmap for Q1! We're getting that request a lot.
2. We don't currently sync metadata
3. Our footprint on your Salesforce API depends first on whether you're using polling or streaming, and then on the cadence of your syncs (polling) or the frequency of changes (streaming). You can see some data on best- and worst-case scenarios here: https://docs.usebracket.com/connecting/salesforce_api
4. Yup, we price based on the amount of data kept in sync. You can see more here: https://www.usebracket.com/pricing
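To illustrate why the API footprint depends on the sync mode, here is a deliberately simplified, hypothetical cost model. It is not the vendor's actual accounting (see the linked docs for that). It assumes polling issues at least one query per synced object per poll, plus extra pages once changes exceed the 2,000-record batch, while streaming queries only when a burst of change events arrives.

```python
import math

SECONDS_PER_DAY = 86_400


def polling_calls_per_day(poll_interval_s: int, objects: int,
                          changes_per_poll: int = 0, batch: int = 2000) -> int:
    # Hypothetical model: every poll queries every synced object at least once,
    # whether or not anything changed, so cost scales with the poll cadence.
    polls = SECONDS_PER_DAY // poll_interval_s
    pages_per_object = max(1, math.ceil(changes_per_poll / batch))
    return polls * objects * pages_per_object


def streaming_calls_per_day(change_bursts_per_day: int, objects: int) -> int:
    # Hypothetical model: a poll runs only when change events arrive,
    # so cost scales with how often data changes rather than with a fixed cadence.
    return change_bursts_per_day * objects
```

Under this toy model, polling 5 objects every 60 seconds costs 7,200 calls per day even when nothing changes, while streaming with 100 change bursts per day costs 500.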
> How specific is your solution to Postgres? Could it be ported to another db engine?
Our polling approach is relatively database-agnostic. We just need to handle each DB's quirks with our transformers (e.g. dealing with MySQL's lack of a native boolean type, where BOOL is just an alias for TINYINT(1)).
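As an example of the kind of per-database transformer described above, here is a hypothetical pair of functions for MySQL booleans. The function names and the `IsActive__c` field are illustrative assumptions, not Bracket's code. Writing is simple, since `True`/`False` become `1`/`0`. Reading back needs schema knowledge, because a `TINYINT(1)` value is indistinguishable from any other small integer.

```python
from typing import Any, Dict, Set


def to_mysql_row(record: Dict[str, Any]) -> Dict[str, Any]:
    # BOOL in MySQL is TINYINT(1), so store Python/Salesforce booleans as 1/0.
    # The bool check matters: bool is a subclass of int, and other values pass through untouched.
    return {k: (int(v) if isinstance(v, bool) else v) for k, v in record.items()}


def from_mysql_row(row: Dict[str, Any], boolean_fields: Set[str]) -> Dict[str, Any]:
    # Reverse direction: only fields known to be checkboxes are turned back into bools;
    # NULLs stay None.
    return {
        k: (bool(v) if k in boolean_fields and v is not None else v)
        for k, v in row.items()
    }
```

A round trip looks like `from_mysql_row(to_mysql_row(rec), {"IsActive__c"})`, which returns the original record as long as the boolean field list matches the schema.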
Streaming is currently Postgres-specific. We're planning to roll out support for MySQL next, after we've finished our HubSpot integration. Do you have a specific DB in mind?
> (And, how are conflicts resolved? In a huge system with millions of records coming from everywhere it can fast become nightmarish?)
The primary source wins any merge conflicts that happen within a sync period. With polling, it's pretty straightforward: at every poll, we see how each side has changed, and for any record pair that was edited on both sides, we prefer the primary source.
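The per-poll rule above can be sketched as a small merge function. This is a hypothetical illustration that assumes each side reports its changes since the last poll as a map from record Id to changed fields. It resolves at the record level: when a record was edited on both sides, the secondary side's edits for that record are dropped. A field-level merge that keeps non-overlapping edits would be an alternative design.

```python
from typing import Any, Dict, List, Tuple

Changes = Dict[str, Dict[str, Any]]  # record Id -> changed field values since the last poll


def resolve_poll(primary: Changes, secondary: Changes) -> Tuple[Changes, Changes, List[str]]:
    """Primary-source-wins merge for one poll (hypothetical sketch).

    Returns (writes_to_secondary, writes_to_primary, conflicting_ids).
    """
    conflicts = sorted(primary.keys() & secondary.keys())
    # Every primary-side change propagates, including those for conflicting records.
    to_secondary = dict(primary)
    # Secondary-side changes propagate only for records the primary side didn't touch.
    to_primary = {rid: fields for rid, fields in secondary.items() if rid not in primary}
    return to_secondary, to_primary, conflicts
```

Returning the conflicting Ids separately costs nothing and makes the "dropped" edits visible for logging or auditing.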
With streaming, we employ a hybrid method in which we only poll when events occur in either Salesforce or Postgres. If, at that poll, the same record has been edited on both sides since the previous poll, we still prioritize the primary source (Salesforce). You can read the step-by-step flow here: https://docs.usebracket.com/streaming#the-streaming-sync-met...
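A minimal sketch of "poll only when events occur", assuming a hypothetical `poll` callable that runs one full primary-wins poll. Events from either side only set a flag, so a burst of events collapses into a single poll. The flag is cleared before polling, so events that arrive during a poll will trigger the next one rather than being lost.

```python
import threading
from typing import Callable


class EventTriggeredPoller:
    """Run a poll only after a change event has arrived from either side (hypothetical sketch)."""

    def __init__(self, poll: Callable[[], None]) -> None:
        self._poll = poll
        self._pending = threading.Event()

    def on_event(self, source: str) -> None:
        # Called by the Salesforce subscriber or the Postgres listener.
        # `source` is unused here but could be logged.
        self._pending.set()

    def run_once(self, timeout: float = 0.0) -> bool:
        """Poll if any event arrived (waiting up to `timeout` seconds); return whether a poll ran."""
        if not self._pending.wait(timeout):
            return False
        # Clear before polling so events arriving mid-poll schedule another poll.
        self._pending.clear()
        self._poll()
        return True
```

A worker thread would call `run_once(timeout=...)` in a loop. With no events, it polls nothing and makes no API calls.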
> The primary source wins any merge conflicts that happen within a sync period.
This is a very fancy way of saying that you just drop conflicts and pretend they didn't happen. Syncing databases is very, very tricky. Conflicts are a big part of the trickiness.
Agreed on the trickiness! Our early users largely told us they preferred one source to take precedence in a conflict, and would rather set that general rule than review every conflict manually. But a handful have expressed interest in the latter approach, so it's on our roadmap to build.
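One way the two approaches mentioned above could coexist is as a per-sync policy. This is a hypothetical sketch, not a description of the roadmap feature. Under `PRIMARY_WINS` the primary side's fields are applied immediately. Under `MANUAL_REVIEW` the conflict is parked in a review queue with both versions, and nothing is written until someone decides.

```python
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple

Fields = Dict[str, Any]


class ConflictPolicy(Enum):
    PRIMARY_WINS = "primary_wins"
    MANUAL_REVIEW = "manual_review"


def handle_conflict(record_id: str, primary_fields: Fields, secondary_fields: Fields,
                    policy: ConflictPolicy,
                    review_queue: List[Tuple[str, Fields, Fields]]) -> Optional[Fields]:
    """Return the fields to apply now, or None if the conflict was queued for review."""
    if policy is ConflictPolicy.PRIMARY_WINS:
        # General rule set once by the user: the primary source always takes precedence.
        return primary_fields
    # Keep both versions so a human can choose; write nothing yet.
    review_queue.append((record_id, primary_fields, secondary_fields))
    return None
```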
Agreed! We're big fans of consolidating in Postgres. Our users also tell us about downstream benefits of having their data in Postgres (it makes reporting easier, lets the data team use SQL rather than hitting an API, etc.).
> Many clients I can think of this being most useful for would rather host it themselves, is that an option?
Right now you can self-host the associated datasets (like the Postgres event log table), but we're still working on allowing you to self-host the entire service. Stay tuned :)