kartoolOz's comments | Hacker News

It's very hyper-parameter dependent, and in my testing didn't provide comparable performance to maxsim.
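For context, by maxsim I mean ColBERT-style late-interaction scoring; a minimal sketch (the function name and shapes are mine, not from any particular library):

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction: for each query token, take the best-matching
    doc token, then sum those maxima. Shapes: query_emb [n_q, d], doc_emb [n_d, d]."""
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    sim = q @ d.T                        # cosine similarity, [n_q, n_d]
    return sim.max(dim=-1).values.sum()  # max over doc tokens, sum over query tokens
```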


"कर्मण्येवाधिकारस्ते मा फलेषु कदाचन | मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि" - Bhagvad gita, chapter 2, verse 47.

You have a right to perform your prescribed duties, but you are not entitled to the fruits of your actions. Never consider yourself to be the cause of the results of your activities, nor be attached to inaction.


WebArena does this really well with what it calls the "accessibility_tree": https://github.com/web-arena-x/webarena/blob/main/browser_en...
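As a rough illustration of the idea (not WebArena's actual code, which builds its own tree), here is a minimal sketch using Playwright's built-in accessibility snapshot, a simpler cousin of what the linked file produces:

```python
from playwright.sync_api import sync_playwright

def dump_ax_tree(node, depth=0, lines=None):
    """Flatten an accessibility snapshot into indented 'role "name"' lines,
    roughly the kind of text representation an agent can consume."""
    if lines is None:
        lines = []
    lines.append("  " * depth + f'{node.get("role", "")} "{node.get("name", "")}"')
    for child in node.get("children", []):
        dump_ax_tree(child, depth + 1, lines)
    return lines

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://news.ycombinator.com")
    snapshot = page.accessibility.snapshot()  # Playwright's built-in AX snapshot (illustration only)
    if snapshot:
        print("\n".join(dump_ax_tree(snapshot)))
    browser.close()
```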


The information-extraction and question-answering metrics are far worse than the Transformer baselines', though.

They also say as much in the blog post: "However, both Based and Mamba still underperform the strongest Transformer baseline, sometimes by large margins. This is consistent with our “no free lunch” observation above."


The AlphaCode 2 technical report claims to solve 43% of problems (77 problems from 12 Codeforces contests), placing around the 85th percentile of human participants.

The caveat is buried deep in the technical report:

1) Generates ~1M candidates from N different models (fine-tuned on past Codeforces problems)

2) Throws away ~95% of candidates (those that don't compile or fail the public test cases)

3) Groups semantically similar candidates

4) Scores candidates from each group with another scoring model (probably based on latency, problem descriptions, etc.)

5) Picks top 10

Makes 10 submissions and finally gets the score (roughly the pipeline sketched below).
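In rough Python pseudocode, the pipeline reads something like this; every name and callable here is a placeholder of mine, not DeepMind's code:

```python
from collections import defaultdict

def alphacode2_pipeline(problem, sample_fns, run_on_inputs, passes_public_tests,
                        score, n_samples=1_000_000, max_submissions=10):
    """Paraphrase of the AlphaCode 2 selection pipeline as described in the report.
    problem is a placeholder object with a `generated_inputs` attribute;
    run_on_inputs(code, inputs) must return a hashable summary of the program's outputs."""
    # 1) Sample ~1M candidates from the ensemble of fine-tuned models
    candidates = [fn(problem)
                  for _ in range(n_samples // len(sample_fns))
                  for fn in sample_fns]

    # 2) Filter out the ~95% that don't compile or fail the public example tests
    candidates = [c for c in candidates if passes_public_tests(c, problem)]

    # 3) Cluster by behaviour: programs producing identical outputs on generated inputs group together
    clusters = defaultdict(list)
    for c in candidates:
        clusters[run_on_inputs(c, problem.generated_inputs)].append(c)

    # 4) Score one representative per cluster with a separate scoring model, 5) submit the top 10
    representatives = [group[0] for group in clusters.values()]
    return sorted(representatives, key=score, reverse=True)[:max_submissions]
```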

Sure, this is how humans solve problems... totally awed by AGI /s


10 submissions is rather arbitrary. Why not just do 1 billion submissions? AI magic!


Technical report: https://storage.googleapis.com/deepmind-media/gemini/gemini_... Nano-2 is 3.25B parameters, and per Figure 3, Nano-2 scores roughly 0.6-0.8x as well as Pro, while Ultra scores roughly 1.05-1.3x as well as Pro.

Roughly, that should put Gemini Ultra in the sub-100B range?
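The back-of-envelope behind that guess, with the big caveat that score is almost certainly not linear (or even cleanly log-linear) in parameter count, and that Pro's size is an assumed input, not a published number; 0.7 and 1.2 are roughly the midpoints of the Figure 3 ranges:

```python
import math

def implied_ultra_size(pro_params_b, nano2_params_b=3.25,
                       nano2_rel_score=0.7, ultra_rel_score=1.2):
    """Crude extrapolation: assume relative score is linear in log(params),
    anchored by Nano-2 (3.25B, ~0.7x Pro) and Pro itself (1.0x by definition).
    pro_params_b is an assumption; Google has not published Pro's size."""
    slope = (1.0 - nano2_rel_score) / math.log(pro_params_b / nano2_params_b)
    return pro_params_b * math.exp((ultra_rel_score - 1.0) / slope)

for pro in (15, 30, 60):  # hypothetical Pro sizes in billions of parameters
    print(f"Pro = {pro}B  ->  implied Ultra ~ {implied_ultra_size(pro):.0f}B")
```

The implied size swings a lot with the assumed Pro size and scaling shape, so treat sub-100B as a guess.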


Those calculations definitely do not scale linearly


Hi, thanks for open-sourcing the code! I was trying to reuse it, especially the per-channel dynamic quantization (int8 on GPU), but couldn't get it to work. I also checked out the torchao package, but it looks like it depends on the nightly channel, and SAM's dynamic implementation with Triton has other issues. Is there any clean implementation of int8 dynamic post-training quantization that you can point to?
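For reference, this is roughly what I'm after: a minimal plain-PyTorch sketch of per-channel int8 weights plus per-token dynamic activation quantization (not SAM's or torchao's code; the matmul here dequantizes back to fp, so a real speedup still needs a fused int8 GEMM kernel, e.g. in Triton):

```python
import torch

def quantize_per_channel(w: torch.Tensor):
    """Symmetric int8 quantization of a weight matrix [out_features, in_features],
    one scale per output channel."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return w_int8, scale

def dynamic_int8_linear(x: torch.Tensor, w_int8: torch.Tensor, w_scale: torch.Tensor, bias=None):
    """Dynamic quantization: activations are quantized per row at call time.
    The matmul below runs in float after casting; a real implementation would
    dispatch to an int8 GEMM kernel to actually get the speedup."""
    x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / x_scale), -127, 127).to(torch.int8)
    acc = x_int8.float() @ w_int8.float().T   # placeholder for an int8 GEMM
    out = acc * x_scale * w_scale.T           # fold both scales back in
    return out + bias if bias is not None else out

# usage: quantize a linear layer's weights once, quantize activations on the fly
lin = torch.nn.Linear(512, 512)
w_int8, w_scale = quantize_per_channel(lin.weight.detach())
y = dynamic_int8_linear(torch.randn(4, 512), w_int8, w_scale, lin.bias.detach())
```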


What’s the issue with getting int8 dynamic quantization to work? As in, you’re unable to get it to quantize or to run with speedups?


You would likely see big improvements in retrieval accuracy by fine-tuning e5-base-v2 or the newer leaders on the MTEB benchmark.
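Something like this with sentence-transformers, as a sketch; the pairs and hyperparameters are illustrative, and e5 models expect the "query: " / "passage: " prefixes at training and inference time:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("intfloat/e5-base-v2")

# (query, relevant passage) pairs from your own domain; illustrative examples only
train_examples = [
    InputExample(texts=["query: reset api key",
                        "passage: To rotate your API key, go to Settings..."]),
    InputExample(texts=["query: rate limits",
                        "passage: The public API allows 100 requests per minute..."]),
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=32)
# In-batch negatives: every other passage in the batch acts as a negative
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_loader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("e5-base-v2-finetuned")
```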


Definitely. I prefer the sentence-transformers ones since they have been fine-tuned on CodeSearchNet. I'm also really excited about the latest gte models by Alibaba; their smallest model is the size of MiniLM-L6 but beats MPNet.


Vespa.ai is pretty crazy too, and a bit unknown. We run a huge Vespa cluster serving 1k+ queries at <100ms latency ...


A Hindu is someone who practices Hinduism and identifies as such; it's not a language, that's Hindi.


Sorry, my bad. I should have double checked. Thanks for pointing it out.

