I have a slightly different view of this. Here are the key reasons why Indians are succeeding:
1. Plain speaking of facts does not exist in India; everyone sugarcoats. The successful see through the sugarcoating. They are able to read the true intentions of their peers, which allows them to be better leaders.
2. Indians practice Jugaad from birth. It is a necessity, as many don't even have stable electricity to light a study lamp, so they find creative ways to live their lives. Practicing Jugaad from an early age gives them the "Power of Unlimited Thinking", a critical component of being a visionary leader: finding solutions where no one thought they were possible.
This is a groundbreaking achievement by humankind. Even greater than sending people to the moon. Imagine: every time you ate pork, someone received a kidney transplant.
I have worked on building a database myself, and I must say that querying 100TB fast, let alone storing 100TB of data, is a genuinely hard problem. A few companies have no choice but to use a DB that works at 100TB. If your data is small, you have a lot of options; if your data is large, you have very few. So it is reasonable for a database to compete on how fast it can query 100TB of data while being slow on a mere 10GB: some databases are designed only for large data and should not be used when your data is small.
The larger your data, the more that building and maintaining indexes hurts you. This is why these systems do much better on large datasets than on small ones. It's all about trade-offs.
To overcome this, they make use of a cache: if the small data is frequently accessed, performance is generally pretty good and acceptable for most use cases.
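As a sketch of that caching idea (illustrative only, not any particular database's implementation): a small read-through LRU cache in front of a slow storage lookup absorbs repeated reads of hot small data, so only the first access pays the expensive path.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny read-through LRU cache, sketching how a big-data DB
    can keep frequently accessed small data fast."""

    def __init__(self, capacity, backend_lookup):
        self.capacity = capacity
        self.backend_lookup = backend_lookup  # slow path, e.g. a storage scan
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backend_lookup(key)  # expensive: goes to storage
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

# Hot keys are served from memory after the first miss.
cache = LRUCache(capacity=2, backend_lookup=lambda k: k * 10)
for key in [1, 2, 1, 1, 2]:
    cache.get(key)
print(cache.hits, cache.misses)  # → 3 2
```

Real engines layer this differently (block caches, buffer pools), but the trade-off is the same: memory spent on hot data in exchange for skipping the large-scan path.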
Thanks Mark. Please do share your experience. Very keen on hearing how you find our project. We are still in beta, so lots of scope for improvement. We are open to any suggestions you might have.
Thanks for your question. Yes, we did research the space a lot before making AutoAI. Here is what we found:
PyCaret: Semi-automatic. You do the first run, then figure out the next set of runs yourself. Ensemble models require manual configuration.
TPOT: Does a great job, and generates 4-5 lines of Python code too. But it does not support Neural Networks / DNNs, so it works only for problems where GOFAI works.
H2O.ai: They have an open-source flavor, but the best way to use it is the enterprise version on the H2O cloud. The interface is confusing, and the final output is black-box.
Now there are many in the enterprise category, such as DataRobot, AWS SageMaker, Azure, etc. Most are unaffordable to Data Scientists unless their employer is sponsoring the platform.
AutoAI: This is 100% automated. Uses GOFAI, Neural Networks, and DNNs, all in one box. It is 100% white-box, and it is the only AutoML framework that generates high-quality Jupyter Notebook code (thousands of lines). You can check some example code here: https://cloud.blobcity.com
Your list excludes most well-known open-source AutoML tools, such as auto-sklearn, AutoGluon, LightAutoML, MLJarSupervised, etc. These tools have been extensively benchmarked by the OpenML AutoML Benchmark (https://github.com/openml/automlbenchmark) and have published papers, so they are well known to the AutoML community.
Regarding H2O.ai: Frankly, you don't seem to understand H2O.ai's AutoML offerings.
H2O AutoML, part of the open-source h2o-3 package, is fully open source. H2O.ai also develops a second AutoML tool called Driverless AI, which is proprietary. You might be conflating the two. Neither tool needs to be used on the H2O AI Cloud; both pre-date our cloud by many years and can easily be run on a user's own laptop or server.
Lastly, I thought I would mention that there's already an AutoML tool called "AutoAI" by IBM. Generally, it's not a good idea to have name collisions in a small space like the AutoML community. https://www.ibm.com/support/producthub/icpdata/docs/content/...
AzureML and its AutoML are not unaffordable. It's literally a free service: you only pay for any compute you consume, and for that you pay the bare VM price. But you don't even have to; you can also use your local compute for training.
That is actually not true. You must purchase an Azure VM to train a model; only deploying trained models on your own infrastructure is permitted. Reasonably speaking, we would have to call this an enterprise solution, as there is no way to get a trained model without paying Azure fees. Most models require a GPU, and it is not as if we can run AzureML on the free GPU offered by Google Colab.
Not sure where you get your information from. It says so right here: you can use a "local computer" for training your model, and that includes AutoML. There is no cost to using Azure ML.
So what we understand from our study is the following:
1. You must have an Azure account with an API key to use their AutoML. An AutoML environment won't be created without a valid key; both local and cloud runs mandate this.
2. Getting an Azure account requires a credit card on file and comes with only a limited-time free trial. That is a big NO-NO for software claimed to be free to use.
3. The free-for-life services do not mention AutoML anywhere, and nothing claims the AutoML environment (a required step) is free in any form. Check this: https://azure.microsoft.com/en-gb/free/
4. When they say "local", what are they referring to? Running locally on an Azure Notebook, or running locally on my laptop? We have tried and never managed to get this to run on a laptop, so it is not clear whether that is even possible or whether the term "local" is misleading.
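On point 1, for context: in the Azure ML Python SDK, `Workspace.from_config()` reads a `config.json` that ties every run back to an Azure subscription. A minimal example of that file (all values are placeholders) looks like:

```json
{
  "subscription_id": "<your-azure-subscription-id>",
  "resource_group": "<your-resource-group>",
  "workspace_name": "<your-workspace-name>"
}
```

Without a valid subscription behind those values, the SDK cannot create the workspace handle that AutoML runs require, which is consistent with the observation that even "local" runs need a connected account.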
Have you managed to run Azure ML locally on a laptop, without requiring a connected Azure account?
Yes, but if we run the AutoML from our laptop, it uses the API key to create a cloud instance: the data gets uploaded to the cloud, training runs on the cloud, and the results are returned to the local code. We would not call this a local run.
The question is: have you actually managed to use your computer's local resources for training? If so, please do share how this was possible; we would like to know how it was achieved.
Hi HN, we have seen a lot of AutoML frameworks out there. As a Data Scientist myself, I have refrained from using them because, at the end of the day, you have to deliver complete source code to your clients, not just a functioning model. That is why we created AutoAI. Given data and a target (the value to predict), it automatically discovers and fully trains the best-performing AI solution and, most importantly, goes on to produce high-quality Jupyter Notebook code. AutoAI does white-box AutoML, a much-needed feature for Data Scientists. Do give it a try and let me know what you think.
The first thing I would do is check whether your GitHub repository shows the traffic sources. If your primary documentation is on GitHub, you would expect most people to visit the repo in order to figure out how to use your software.
I think the increased traffic is due to a more popular project now depending on OP's library. Its users won't come to the library's repository for documentation.
Yeah, that was my initial thought too. But then why was it only a short spike? A dev branch that got merged into master by accident? Also, none of the projects that depend on my repo (as listed on GitHub) have much traction.
Yes, absolutely. Jupyter Notebooks are a classic example of success in this space. Many Data Scientists, including me, use Jupyter over the browser for day-to-day work.
Advantages: You can use server resources rather than being limited by your laptop's compute. Being able to log in from anywhere means I don't have to carry my laptop around; as long as I have a browser, I get the same environment to work in. And there is the peace of mind that my work and data are always safe: I could drop my laptop in a swimming pool and still not lose even the last character I typed into that presentation I was making.
Disadvantages: Yeah, I cannot access my system while on an aeroplane. Honestly, I can't think of anything else, unless you live somewhere the internet is as bad as on aeroplanes.
All in all, I prefer the browser-based virtual machine any day. The advantages outweigh the single disadvantage: needing a stable internet connection.
Got it. I see a classic launch problem coming your way. How do you attract the first set of content creators, when there aren't any content consumers? Have you given this a thought?