
ianyanusko · on Dec 13, 2023

With Salesforce specifically, the tricky part was getting a stable and reliable event-driven stream. That's why we ended up adopting a hybrid approach: subscribing to pub/sub events, but also periodically polling the REST API in case pub/sub misses anything.

Salesforce's scattered docs don't make best practices super clear, so designing the syncs took us some time.
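A minimal sketch of the hybrid idea, assuming each change carries a monotonically increasing modification stamp. The `Change`, `HybridSync`, and `modstamp` names are hypothetical, not Bracket's or Salesforce's API. Events from the stream are applied as they arrive; a periodic poll re-reads everything modified since the last poll and applies only what the stream missed. Making the apply step idempotent is what lets both paths deliver the same change safely.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Change:
    record_id: str
    modstamp: int  # hypothetical monotonically increasing modification stamp
    data: dict


@dataclass
class HybridSync:
    """Applies streamed events immediately and uses periodic polls as a backstop."""

    replica: Dict[str, Change] = field(default_factory=dict)
    watermark: int = 0  # highest modstamp covered by a completed poll

    def apply(self, change: Change) -> bool:
        # Idempotent upsert: skip anything not newer than what we already hold,
        # so a change delivered by both the stream and the poll lands only once.
        current = self.replica.get(change.record_id)
        if current is not None and current.modstamp >= change.modstamp:
            return False
        self.replica[change.record_id] = change
        return True

    def on_event(self, change: Change) -> bool:
        # Fast path: called by the pub/sub subscriber for each event.
        return self.apply(change)

    def poll(self, source: Iterable[Change]) -> List[str]:
        """Re-read changes newer than the watermark; return ids the stream missed."""
        recovered: List[str] = []
        newest = self.watermark
        for change in source:
            if change.modstamp <= self.watermark:
                continue
            if self.apply(change):
                recovered.append(change.record_id)
            newest = max(newest, change.modstamp)
        self.watermark = newest
        return recovered


source = [Change("a", 1, {"Name": "Acme"}), Change("b", 2, {"Name": "Globex"})]
sync = HybridSync()
sync.on_event(source[0])  # the stream delivers "a"...
print(sync.poll(source))  # ...and the poll recovers "b": ['b']
```

In a real deployment the poll would ask the source only for records modified after `watermark` rather than filtering a full list locally, and would likely need some overlap window for records committed late with an older stamp.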

ianyanusko · on Dec 13, 2023

Thank you! It means a lot to see your comment here

ianyanusko · on Dec 13, 2023

Hubspot and MySQL are both next on our list to move out of beta! I'll shoot you a message when they're out :)

ianyanusko · on Dec 13, 2023

Good question. Our ultimate goal is to be the go-to for any two-way data syncs.

We think the Salesforce syncing market alone is worth $500M to $1B. If you include all of the tools we have our eyes on (Hubspot, Monday, ERPs), the market size comfortably reaches the billions.

Going after Heroku Connect makes sense as a starting point, but we've got our sights beyond that.

ianyanusko · on Dec 13, 2023

Nice! Great point about the round trip - we do something similar for formulas and auto-generated fields like `Id`. That's awesome you built this in-house.

> Syncing data from Salesforce (to seed a database for example) is done via REST too. It works OK.

Have you thought about using the Bulk API for seeding? We started relying on that instead of REST, which helped us seed massive DBs much faster / more efficiently.

motrm · on Dec 13, 2023

REST isn't too painful yet, but it will be in the not too distant future.

As you know, the REST API will deliver a maximum of 2000 records per request, so beyond a certain scale it's not really tenable in terms of speed & consumption of API calls.
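To make the scale concern concrete, here is the back-of-the-envelope arithmetic, assuming every page comes back full at 2000 records. Salesforce can return smaller batches, so this is a lower bound on real request counts; `rest_requests_needed` is a hypothetical helper.

```python
MAX_RECORDS_PER_REQUEST = 2000  # per-request ceiling mentioned above


def rest_requests_needed(total_records: int, page_size: int = MAX_RECORDS_PER_REQUEST) -> int:
    """Lower bound on REST calls needed to page through total_records."""
    if total_records < 0 or page_size <= 0:
        raise ValueError("total_records must be >= 0 and page_size > 0")
    # Integer ceiling division; an empty result still costs one query call.
    return max(1, -(-total_records // page_size))


for n in (2_000, 2_001, 1_000_000):
    print(n, rest_requests_needed(n))  # prints: 2000 1, 2001 2, 1000000 500
```

Seeding a million records this way costs at least 500 API calls, which is the kind of consumption that makes a bulk-oriented API attractive for initial loads.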

So yes, Bulk API is probably going to feature soon.

Cheers Ian!

ianyanusko · on Dec 13, 2023

1. Yes, it's on our roadmap for Q1! We're getting that request a lot.
2. We don't currently sync metadata.
3. Our footprint on your Salesforce API depends on whether you're using polling or streaming, and then on the cadence of your syncs or the frequency of changes. You can see some data on best/worst-case scenarios here: https://docs.usebracket.com/connecting/salesforce_api
4. Yup, we price based on the amount of data kept in sync. You can see more here: https://www.usebracket.com/pricing
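A rough way to reason about the footprint in point 3, with every parameter hypothetical (the real figures are in the docs linked above): fixed-cadence polling costs the same whether or not anything changed, while event-triggered syncs scale with how often data actually changes.

```python
SECONDS_PER_DAY = 86_400


def polling_calls_per_day(interval_seconds: int, calls_per_poll: int) -> int:
    """Fixed-cadence polling: cost depends only on the interval, not on activity."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (SECONDS_PER_DAY // interval_seconds) * calls_per_poll


def triggered_calls_per_day(triggered_polls: int, calls_per_poll: int) -> int:
    """Event-triggered syncing: cost scales with how many change bursts occur."""
    return triggered_polls * calls_per_poll


# Hypothetical example: a 5-minute cadence with 2 API calls per poll,
# versus a quiet org that triggers 40 event-driven polls per day.
print(polling_calls_per_day(300, 2))   # 576
print(triggered_calls_per_day(40, 2))  # 80
```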

_bry-guy · on Dec 13, 2023

Great, thank you! Syncing metadata may be a dealbreaker for me, but I'll be thinking of Bracket as we build solutions going forward. Cheers.

ianyanusko · on Dec 12, 2023

Nice, I hear you on "fun and aggravating" :)

Sounds similar to the use cases we're seeing, where Postgres is not only easier to process and build on, but also saves you Salesforce API calls.

ianyanusko · on Dec 12, 2023

> How specific is your solution to Postgres? Could it be ported to another db engine?

Our polling approach is relatively database-agnostic. We just need to handle each DB's quirks with our transformers (e.g. dealing with MySQL's lack of a native boolean column type, since `BOOL` is just an alias for `TINYINT(1)`).
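A sketch of the kind of per-database transformer described here, assuming booleans are stored as 0/1 in a `TINYINT(1)` column on the MySQL side. The function names and field names are hypothetical; this is not Bracket's transformer code.

```python
from typing import Any, Dict, Set

Row = Dict[str, Any]


def to_mysql_row(record: Row, bool_fields: Set[str]) -> Row:
    """Coerce booleans to 0/1 for columns stored as TINYINT(1) in MySQL."""
    return {
        key: (int(bool(value)) if key in bool_fields and value is not None else value)
        for key, value in record.items()
    }


def from_mysql_row(row: Row, bool_fields: Set[str]) -> Row:
    """Turn 0/1 back into booleans so the other side sees real boolean values."""
    return {
        key: (bool(value) if key in bool_fields and value is not None else value)
        for key, value in row.items()
    }


record = {"IsActive": True, "Name": "Acme", "IsPartner": None}
row = to_mysql_row(record, {"IsActive", "IsPartner"})
print(row)  # {'IsActive': 1, 'Name': 'Acme', 'IsPartner': None}
```

Keeping the conversion symmetric matters for a two-way sync: a value that round-trips as `1` instead of `True` would look like an edit on the next comparison.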

Streaming is currently Postgres-specific. We're planning on rolling out support for MySQL next, after we've finished our Hubspot integration. Do you have a specific DB in mind?

> (And, how are conflicts resolved? In a huge system with millions of records coming from everywhere it can fast become nightmarish?)

The primary source wins any merge conflicts that happen within a sync period. With polling, it's pretty straightforward: at every poll, we see how each side has changed, and for any record pairings for which there were edits on both sides, we prefer the primary source.
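A minimal sketch of the "primary source wins" rule at record granularity, as described for polling: each side's changes are the records edited since the previous poll, keyed by record id. `resolve` is a hypothetical helper, and it does not attempt field-level merging.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Changes = Dict[str, Record]  # record id -> new field values since the last poll


def resolve(primary: Changes, secondary: Changes) -> Tuple[Changes, Changes, List[str]]:
    """Record-level "primary wins" merge for one sync period.

    Returns (writes to apply to the secondary side,
             writes to apply to the primary side,
             ids that were edited on both sides).
    """
    conflicts = sorted(primary.keys() & secondary.keys())
    # Every primary edit flows across, including the winning side of each conflict.
    to_secondary = dict(primary)
    # Secondary edits flow back only for records the primary did not also touch.
    to_primary = {rid: rec for rid, rec in secondary.items() if rid not in primary}
    return to_secondary, to_primary, conflicts


sf_changes = {"001": {"Stage": "Won"}, "002": {"Stage": "Lost"}}
pg_changes = {"002": {"Stage": "Open"}, "003": {"Stage": "New"}}
to_pg, to_sf, conflicted = resolve(sf_changes, pg_changes)
print(conflicted)  # ['002']
```

The secondary side's edit to a conflicted record (here Postgres's edit to `002`) is overwritten, not merged.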

With streaming, we employ a hybrid method, where we only poll when events occur in either Salesforce or Postgres. If at that poll, the same record has been edited on both sides since the previous poll, we still prioritize the primary source (Salesforce). You can read the step-by-step flow here: https://docs.usebracket.com/streaming#the-streaming-sync-met...
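A sketch of the event-triggered part of the streaming mode, assuming some listener on each side (a Salesforce subscriber and a Postgres event-log reader, both hypothetical here) calls `notify()`. Instead of polling on a fixed timer regardless of activity, the reconciliation poll runs only when at least one event has arrived since the last run, and a burst of events collapses into a single poll.

```python
import threading
from typing import Callable


class EventTriggeredPoller:
    """Runs a reconciliation poll only after an event has arrived from either side."""

    def __init__(self, poll_fn: Callable[[], None]) -> None:
        self._poll_fn = poll_fn
        self._pending = threading.Event()  # thread-safe flag set by listeners
        self.polls_run = 0

    def notify(self) -> None:
        # Called by either side's listener whenever a change event arrives.
        self._pending.set()

    def tick(self) -> bool:
        """Call periodically; runs at most one poll, and only if something changed."""
        if not self._pending.is_set():
            return False
        # Clear before polling so an event arriving mid-poll schedules another run.
        self._pending.clear()
        self._poll_fn()
        self.polls_run += 1
        return True


calls = []
poller = EventTriggeredPoller(lambda: calls.append("poll"))
poller.tick()    # returns False: nothing happened, no API calls spent
poller.notify()  # e.g. a Salesforce change event
poller.notify()  # e.g. a Postgres change event
poller.tick()    # returns True: both events handled by one poll
```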

ranting-moth · on Dec 13, 2023

> The primary source wins any merge conflicts that happen within a sync period.

This is a very fancy way of saying that you just drop conflicts and pretend they didn't happen. Syncing databases is very, very tricky. Conflicts are a big part of the trickiness.

ianyanusko · on Dec 13, 2023

Agreed on the trickiness! Our early users largely told us they preferred one source to take precedence in a conflict, and would rather set that general rule than review every conflict manually. But a handful have expressed interest in the latter approach, so it's on our roadmap to build.

ianyanusko · on Dec 12, 2023

Agreed! We’re big fans of consolidating in Postgres. Our users are also telling us about downstream benefits of having their data in Postgres (it makes reporting easier, lets the data team use SQL rather than hitting an API, etc.).

ianyanusko · on Dec 12, 2023

Sorry, forgot to respond to one piece:

> Many clients I can think of this being most useful for would rather host it themselves, is that an option?

Right now you can self-host the associated datasets (like the Postgres event log table), but we're still working on allowing you to self-host the entire service. Stay tuned :)
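For readers unfamiliar with the pattern, a change log like the event log table mentioned here is commonly fed by triggers, and the sync engine reads entries past its last-seen sequence number. The sketch below uses SQLite from Python's standard library as a runnable stand-in for Postgres (where you would write a PL/pgSQL trigger function); the `accounts` and `event_log` schemas are hypothetical and not Bracket's actual table.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE event_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- ordering for readers
            table_name TEXT NOT NULL,
            record_id TEXT NOT NULL,
            op TEXT NOT NULL
        );
        -- One trigger per operation appends a row to the log.
        CREATE TRIGGER accounts_ins AFTER INSERT ON accounts BEGIN
            INSERT INTO event_log (table_name, record_id, op)
            VALUES ('accounts', NEW.id, 'INSERT');
        END;
        CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts BEGIN
            INSERT INTO event_log (table_name, record_id, op)
            VALUES ('accounts', NEW.id, 'UPDATE');
        END;
        CREATE TRIGGER accounts_del AFTER DELETE ON accounts BEGIN
            INSERT INTO event_log (table_name, record_id, op)
            VALUES ('accounts', OLD.id, 'DELETE');
        END;
    """)
    return conn


def events_since(conn: sqlite3.Connection, last_seq: int) -> List[Tuple[int, str, str]]:
    """Return log entries the sync engine has not processed yet, oldest first."""
    return conn.execute(
        "SELECT seq, record_id, op FROM event_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


conn = make_db()
conn.execute("INSERT INTO accounts VALUES ('001', 'Acme')")
conn.execute("UPDATE accounts SET name = 'Acme Corp' WHERE id = '001'")
for seq, record_id, op in events_since(conn, last_seq=0):
    print(seq, record_id, op)  # 1 001 INSERT, then 2 001 UPDATE
```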