Hey I'm Victor!
My team and I recently built CentralMind Gateway, an open-source tool that auto-generates secure, AI-optimized APIs from structured data. It's for those who don't want to expose direct SQL access to their databases to AI agents, and who don't want to spend time building these APIs manually.
What it does:
- Auto-generates APIs from your database schema & sample data using AI
- Filters out PII and sensitive data for compliance (GDPR, SOC 2, etc.)
- Optimized for AI with extra meta information for REST and MCP support
- Works with PostgreSQL, MySQL, ClickHouse, Snowflake, BigQuery
- Bunch of plugins: Telemetry, Auth, Caching, RLS and others
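To make the PII point a bit more concrete, here is a minimal sketch of the general idea of redacting sensitive fields before a row leaves an API. This is not how the tool itself does it; the pattern list and the names `is_pii_column` and `redact_row` are hypothetical and only for illustration.

```python
import re

# Hypothetical column-name patterns; real detection would likely also
# inspect sample values, not just names.
PII_COLUMN_PATTERNS = [r"email", r"phone", r"ssn", r"passport", r"birth", r"address"]


def is_pii_column(name: str) -> bool:
    """Return True if the column name matches any of the assumed PII patterns."""
    return any(re.search(p, name, re.IGNORECASE) for p in PII_COLUMN_PATTERNS)


def redact_row(row: dict) -> dict:
    """Replace values of PII-looking columns with a placeholder."""
    return {k: ("[REDACTED]" if is_pii_column(k) else v) for k, v in row.items()}


row = {"id": 1, "email": "user@example.com", "city": "Berlin"}
print(redact_row(row))
```

Name-based matching like this is deliberately crude (it would also flag something like `ip_address`), so treat it as a starting point rather than a compliance mechanism.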
Using ChatGPT and AI assistants over the past year, here are my best use cases:
- Generating wrappers and simple CRUD APIs on top of database tables, provided only with a DDL of the tables.
- Optimizing SQL queries and schemas, especially for less familiar SQL dialects. Extremely effective.
- Generating Swagger comments for API methods. A joy.
- Re-creating classes or components based on similar classes, especially with Next.js, where the component mechanics often make this necessary.
- Creating utility methods for data conversion or mapping between different formats or structures.
- Assisting with CSS and the intricacies of HTML for styling.
- GPT4 o1 is significantly better at handling more complex scenarios in creation and refactoring.
Current challenges based on my experience:
- LLMs lack critical thinking; they tend to accommodate the user's input even if the question is flawed or lacks a valid answer.
- There’s a substantial lack of context in most cases. LLMs should integrate deeper with data sampling capabilities or, ideally, support real-time debugging context.
- Challenging to use in large projects due to limited awareness of project structure and dependencies.
We're curious about your thoughts on Snowflake and the idea of an open-source alternative. Developing such a solution would require significant resources, but there might be an existing in-house project somewhere that could be open-sourced, who knows.
Could you spare a few minutes to fill out a short 10-question survey and share your experiences and insights about Snowflake? As a thank you, we have a few $50 Amazon gift cards that we will randomly share with those who complete the survey.
As for next steps: we plan to test multimodal ChatGPT with image data, perhaps passing a full screenshot of a dashboard with different charts, to improve the model's contextual understanding. The main constraint when working with raw data is prompt length, and data displayed in a visual format may be more condensed and compact.
It is related to the "max_threads" setting of ClickHouse. By default, it is the number of physical CPU cores, which is half the number of vCPUs.
For example, the c6a.4xlarge instance type in AWS has 16 vCPUs and 8 cores, so "max_threads" in ClickHouse will be 8.
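The arithmetic can be sketched in a few lines. This assumes two hardware threads per physical core, as in the example above; the function name `default_max_threads` is made up for illustration and is not part of ClickHouse.

```python
def default_max_threads(vcpus: int, threads_per_core: int = 2) -> int:
    """Estimate the default max_threads as the physical core count.

    Assumes each physical core exposes `threads_per_core` vCPUs.
    """
    return max(1, vcpus // threads_per_core)


# c6a.4xlarge: 16 vCPUs with 2 threads per core
print(default_max_threads(16))  # 8
```

If you want queries to use every vCPU, the setting can be raised explicitly, for example with `SET max_threads = 16` in a session, though whether that helps depends on the workload.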
We built a Managed ClickHouse service to help with exactly these difficulties. We handle sharding, clustering, ZooKeeper, patching, updates without downtime, and hybrid storage based on S3. https://double.cloud
- Firebolt (hard fork of ClickHouse)
- Altinity
- Gigapipe
- Hydrolix
- Bytehouse.cloud
- https://clickhouse.com/ ("coming soon")
- TiDB (their columnstore is a fork of ClickHouse)
I stopped tracking after this. I saw a few press releases go by announcing a few others as well, which I have lost track of now.
The official ClickHouse Inc. is surely going to be under pressure to pull features out of their open source offering over time to differentiate themselves.