I have a slightly different view of this. Here are the key reasons why Indians are succeeding:
1. Plain speaking of facts does not exist in India; everyone sugarcoats. The successful see through the sugarcoating. They are able to read the true intentions of their peers, which allows them to be better leaders.
2. Indians practice Jugaad from birth. It is a necessity, as many don't even have stable electricity to light a study lamp, so they find creative ways to live their lives. Practicing Jugaad from an early age gives them the "Power of Unlimited Thinking", a critical component of being a visionary leader: finding solutions where no one thought they were possible.
This is a groundbreaking achievement by humankind. Even greater than sending people to the moon. Imagine: every time you ate pork, someone received a kidney transplant.
I have worked on building a database myself, and I must say that querying 100TB fast, let alone storing 100TB of data, is a genuinely hard problem. A few companies have no choice but to use a DB that works at 100TB. If your data is small, you have a lot of options; if your data is large, you have very few. So it is reasonable for a database to compete on how fast it can query 100TB of data while being slow on a mere 10GB: some databases are designed only for large data and should not be used when your data is small.
The larger your data, the more that building and maintaining indexes hurts you. This is why these systems do much better on large datasets than on small ones. It's all about trade-offs.
To overcome this, they make use of a cache: if the small data is frequently accessed, performance is generally pretty good and acceptable for most use cases.
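As a sketch of that caching idea (illustrative only, not any particular database's implementation): a small read-through LRU cache in front of a slow storage lookup absorbs repeated reads of hot small data, so only the first access pays the expensive path.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny read-through LRU cache, sketching how a big-data DB
    can keep frequently accessed small data fast."""

    def __init__(self, capacity, backend_lookup):
        self.capacity = capacity
        self.backend_lookup = backend_lookup  # slow path, e.g. a storage scan
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backend_lookup(key)  # expensive: goes to storage
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

# Hot keys are served from memory after the first miss.
cache = LRUCache(capacity=2, backend_lookup=lambda k: k * 10)
for key in [1, 2, 1, 1, 2]:
    cache.get(key)
print(cache.hits, cache.misses)  # → 3 2
```

Real engines layer this differently (block caches, buffer pools), but the trade-off is the same: memory spent on hot data in exchange for skipping the large-scan path.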
Thanks Mark. Please do share your experience. Very keen on hearing how you find our project. We are still in beta, so lots of scope for improvement. We are open to any suggestions you might have.
Thanks for your question. Yes, we did research the space a lot before making AutoAI. Here is what we found:
PyCaret: Semi-automatic. You do the first run, then figure out the next set of runs yourself. Ensemble models require manual configuration.
TPOT: Does a great job, and generates 4-5 lines of Python code too. But it does not support Neural Networks / DNNs, so it works only for problems where GOFAI works.
H2O.ai: They have an open-source flavor, but the best way to use it is the enterprise version on the H2O cloud. The interface is confusing, and the final output is black-box.
Now there are many in the enterprise category, such as DataRobot, AWS SageMaker, Azure, etc. Most are unaffordable to Data Scientists unless their employer is sponsoring the platform.
AutoAI: This is 100% automated. Uses GOFAI, Neural Networks, and DNNs, all in one box. It is 100% white-box, and it is the only AutoML framework that generates high-quality Jupyter Notebook code (thousands of lines). You can check some example code here: https://cloud.blobcity.com
Your list excludes most well-known open-source AutoML tools, such as auto-sklearn, AutoGluon, LightAutoML, MLJarSupervised, etc. These tools have been extensively benchmarked by the OpenML AutoML Benchmark (https://github.com/openml/automlbenchmark) and have published papers, so they are well known to the AutoML community.
Regarding H2O.ai: Frankly, you don't seem to understand H2O.ai's AutoML offerings.
H2O AutoML, part of the open-source h2o-3 package, is fully open source. H2O.ai also develops a second AutoML tool called Driverless AI, which is proprietary. You might be conflating the two. Neither tool needs to be used on the H2O AI Cloud; both pre-date our cloud by many years and can easily be run on a user's own laptop or server.
Lastly, I thought I would mention that there's already an AutoML tool called "AutoAI" by IBM. Generally, it's not a good idea to have name collisions in a small space like the AutoML community. https://www.ibm.com/support/producthub/icpdata/docs/content/...
AzureML and its AutoML are not unaffordable. It's literally a free service: you only pay for any compute you consume, and for that you pay the bare VM price. But you don't even have to; you can also use your local compute for training.
That is actually not true. You must purchase an Azure VM to train a model; only deploying trained models on your own infrastructure is permitted. Reasonably speaking, we would have to call this an enterprise solution, as there is no way to get a trained model without paying Azure fees. Most models require a GPU, and it is not as if we can run AzureML on the free GPU offered by Google Colab.
Not sure where you get your information from. It says so right here: you can use a "local computer" for training your model, and that includes AutoML. There is no cost to using Azure ML.
So what we understand from our study is the following:
1. You must have an Azure account with an API key to use their AutoML. An AutoML environment won't be created without a valid key; both local and cloud runs mandate this.
2. Getting an Azure account requires a credit card on file and comes with only a limited-time free trial. That is a big NO-NO for software claimed to be free to use.
3. The free-for-life services do not mention AutoML anywhere, and nothing claims the AutoML environment (a required step) is free in any form. Check this: https://azure.microsoft.com/en-gb/free/
4. When they say "local", what are they referring to? Running locally on an Azure Notebook, or running locally on my laptop? We have tried and never managed to get this to run on a laptop, so it is not clear whether that is even possible or whether the term "local" is misleading.
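On point 1, for context: in the Azure ML Python SDK, `Workspace.from_config()` reads a `config.json` that ties every run back to an Azure subscription. A minimal example of that file (all values are placeholders) looks like:

```json
{
  "subscription_id": "<your-azure-subscription-id>",
  "resource_group": "<your-resource-group>",
  "workspace_name": "<your-workspace-name>"
}
```

Without a valid subscription behind those values, the SDK cannot create the workspace handle that AutoML runs require, which is consistent with the observation that even "local" runs need a connected account.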
Have you managed to run Azure ML locally on a laptop, without requiring a connected Azure account?
Yes, but if we run the AutoML from our laptop, it uses the API key to create a cloud instance: the data gets uploaded to the cloud, training runs on the cloud, and the results are returned to the local code. We would not call this a local run.
The question is: have you actually managed to use your computer's local resources for training? If so, please do share how this was possible; we would like to know how it was achieved.
Hi HN, we have seen a lot of AutoML frameworks out there. As a Data Scientist myself, I have refrained from using them because, at the end of the day, you have to deliver complete source code to your clients, not just a functioning model. That is why we created AutoAI. Given data and a target (the value to predict), it automatically discovers and fully trains the best-performing AI solution and, most importantly, goes on to produce high-quality Jupyter Notebook code. AutoAI does white-box AutoML, a much-needed feature for Data Scientists. Do give it a try and let me know what you think.
The first thing I would do is check whether your GitHub repository shows the traffic sources. If your primary documentation is on GitHub, you would expect most people to visit the repo in order to figure out how to use your software.
I think the increased traffic is due to a more popular project now depending on OP's library. Its users won't come to the library's repository for documentation.
Yeah, that was my initial thought too. But then why was it only a short spike? A dev branch that got merged into master by accident? Also, none of the projects that depend on my repo (as listed on GitHub) have much traction.
Yes, absolutely. Jupyter Notebooks are a classic example of success in this space. Many Data Scientists, including me, use Jupyter over the browser for day-to-day work.
Advantages: You can use server resources rather than being limited by your laptop's compute. Being able to log in from anywhere means I don't have to carry my laptop around; as long as I have a browser, I get the same environment to work in. And there is the peace of mind that my work and data are always safe: I could drop my laptop in a swimming pool and still not lose even the last character I typed into that presentation I was making.
Disadvantages: Yeah, I cannot access my system while on an aeroplane. Honestly, I can't think of anything else, unless you live somewhere the internet is as bad as on aeroplanes.
All in all, I prefer the browser-based virtual machine any day. The advantages outweigh the single disadvantage: needing a stable internet connection.
Got it. I see a classic launch problem coming your way. How do you attract the first set of content creators, when there aren't any content consumers? Have you given this a thought?