supermdguy's Hacker News comments

Really like the philosophy, and the UI looks clean. You mentioned grammar briefly, but I’m curious if you think that’s also a component that could be learned through the app? One thing that’s nice about Duolingo (despite its flaws) is that it progressively introduces new grammar concepts and uses them in the lessons. Would be cool to have something similar here.


Hey thanks! I do hope to add some additional basic grammar instruction in the future, similar to what's in the grammar guides I mention in the whitepaper.


That corresponds to a 10/15, which is actually really good (the median is around 6).

https://artofproblemsolving.com/wiki/index.php/AMC_historica...


Isn't the test taken only by students under the age of 12?

Meanwhile the model is trained on these specific types of problems, does not have an apparent time or resource limit, and does not have to take the test in a proctored environment.

It's D- work. Compared to a 12 year old, okay, maybe it's B+. Is this really the point you wanted to make?


Interesting work. Not super familiar with neural architecture search, but how do they ensure they’re not overfitting to the test set? Seems like they’re evaluating each model on the test set, and using that to direct future evolution. I get that human teams will often do the same, but wouldn’t the overfitting issues be magnified a lot by doing thousands of iterations of this?
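To make the worry concrete: if candidates are selected using the same scores you report, the winner's score is inflated just by chance, and thousands of iterations make the inflation worse. A toy simulation (my illustration, not their setup; every "model" here has identical true quality, so all score differences are noise):

```python
import random

random.seed(0)

def select_best(select_scores, report_scores):
    """Pick the candidate with the best selection score, then report its
    score on the reporting set."""
    best = max(range(len(select_scores)), key=select_scores.__getitem__)
    return select_scores[best], report_scores[best]

n = 5000
# Scores are pure noise, drawn independently for the two evaluation sets.
val = [random.gauss(0, 1) for _ in range(n)]
test = [random.gauss(0, 1) for _ in range(n)]

leaked, _ = select_best(test, test)   # selecting on the set you report
_, honest = select_best(val, test)    # selecting on val, reporting on test

print(f"score when selecting on the test set: {leaked:.2f}")  # inflated
print(f"held-out test score:                  {honest:.2f}")  # near zero
```

Selecting on the reported set returns the maximum of thousands of noise draws; holding out a separate test set keeps the reported number honest.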


Just guessing, but the new Opus was probably RL tuned to work better with Claude Code's tool calls


How does your tool library work? Who organizes it? Sounds really interesting.


We have one near my place that I'm a member of; it's run by volunteers. They have stuff beyond tools too (camping/cooking gear). You can view their inventory before you join: https://toolsnthingslibraryperthwa.myturn.com/library/

The main downside for me is having to return items during the window when they're open.


Great question! Patio isn't a traditional tool library; it's a peer-to-peer platform where anyone can list and rent tools directly from people nearby, similar to Airbnb. So instead of being run by an organization, it's the community itself that powers it. We're just making it easy, safe, and fast to share tools locally.


I wonder which is more efficient: managing the tools or managing the need. Rather than putting up a yard sign saying "I have a hammer, guys," put up one that says "hey guys, I need a hammer."


Great point, and thanks for sharing it. We're actually exploring ways to let people post requests, not just listings, so it's easy to say "I need a hammer" and connect with someone nearby. It's all about making those timely, local connections simple.


Yes fellow human


These are really good ideas, thanks so much for sharing!


Most providers will just end the chat if it reaches the max context window.


clippings.io has a browser extension that scrapes all your highlights from Amazon's website and lets you download them in various formats.


Nice, hadn't heard of them. I'll check it out!


Code is available! They have a few different discovered solutions.

https://github.com/google-deepmind/funsearch


Though there are a couple of caveats as to what code is available. Quoting from the GitHub README:

> [This repository] contains an implementation of the evolutionary algorithm, code manipulation routines, and a single-threaded implementation of the FunSearch pipeline. It does not contain language models for generating new programs, the sandbox for executing untrusted code, nor the infrastructure for running FunSearch on our distributed system. This directory is intended to be useful for understanding the details of our method, and for adapting it for use with any available language models, sandboxes, and distributed systems.


I don’t care about their sandbox or distributed system. They are irrelevant to the method. The missing language model for program generation is disappointing but I imagine anyone interested in replication, myself included, would prefer to roll their own.
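The core loop is simple enough to sketch from the paper's description: sample a good program from a database, ask for a variant, score it with the evaluator, and keep the best candidates. A toy single-threaded version (my code, not DeepMind's) with the LLM proposal step replaced by a random `mutate` stub:

```python
import random

random.seed(1)

def evaluate(program):
    """Problem-specific scorer. Toy stand-in: 'programs' are just integer
    parameters, and the score peaks at 42."""
    return -abs(program - 42)

def mutate(parent):
    """Placeholder for the LLM step: real FunSearch prompts a code model
    with the best programs so far and asks for a new variant."""
    return parent + random.choice([-3, -1, 1, 3])

def funsearch(iterations=2000, pool_size=10):
    pool = [(evaluate(0), 0)]  # (score, program) database
    for _ in range(iterations):
        # Tournament selection: take the better of two sampled programs.
        _, parent = max(random.sample(pool, min(2, len(pool))))
        child = mutate(parent)
        pool.append((evaluate(child), child))
        pool = sorted(pool, reverse=True)[:pool_size]  # keep the best
    return max(pool)

score, best = funsearch()
print(score, best)
```

Swapping `mutate` for a call to any code-generating model, and `evaluate` for a real scoring function run in a sandbox, recovers the shape of the published method.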


"Mathematical discoveries from program search with large language models" (2023) https://www.nature.com/articles/s41586-023-06924-6 :

> Abstract: Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements [1,2]. This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in important problems, pushing the boundary of existing LLM-based approaches [3]. Applying FunSearch to a central problem in extremal combinatorics — the cap set problem — we discover new constructions of large cap sets going beyond the best known ones, both in finite dimensional and asymptotic cases. This represents the first discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.

"DeepMind AI outdoes human mathematicians on unsolved problem" (2023) https://www.nature.com/articles/d41586-023-04043-w :

> Large language model improves on efforts to solve combinatorics problems inspired by the card game Set.
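For context on the bin-packing result: the "widely used baselines" are simple online heuristics such as first fit, which FunSearch's discovered programs improve on. A minimal illustration of the baseline (my code, not the paper's):

```python
def first_fit(items, capacity=1.0):
    """Online first-fit: place each arriving item into the first bin with
    room, opening a new bin only when none fits."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity + 1e-9:  # epsilon for float sums
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

packed = first_fit([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5])
print(len(packed))  # 4 bins for these items
```

FunSearch evolves the placement rule itself (the inner decision of which bin to pick), searching for heuristics that use fewer bins than this on the benchmark distributions.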


I love using Anki to stay on top of what I'm learning, but hate staring at a screen for a long time. So I made a system that reads the cards out loud and automatically grades my answers, using Anki for card storage/scheduling.

Tech Stack:

Text-To-Speech: ElevenLabs/OpenAI

Speech-To-Text: Faster-Whisper (I used the tiny.en model, about 0.5s latency on my M1 Mac)

Language Model: ChatGPT (I used gpt-3.5-turbo-1106)

Others: PyWebView, Silero VAD
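Roughly, each review works like this (heavily simplified: the real grading step calls gpt-3.5-turbo, stubbed here with string matching, and `speak`/`listen` stand in for the TTS/STT backends above):

```python
import re

def normalize(text):
    """Lowercase and strip punctuation so spoken transcripts compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def grade(transcript, expected):
    """Stand-in for the LLM grading step: map the spoken answer to an Anki
    ease button (1 = Again, 3 = Good). The real system asks the LLM to
    judge semantic equivalence instead of exact matching."""
    return 3 if normalize(transcript) == normalize(expected) else 1

def review(card, speak, listen):
    """One review: read the front aloud, listen for an answer, grade it.

    speak/listen are injected so the TTS/STT backends can be swapped
    without touching the loop."""
    speak(card["front"])
    transcript = listen()
    return grade(transcript, card["back"])

# Toy run with canned I/O in place of real audio:
card = {"front": "Capital of France?", "back": "Paris"}
ease = review(card, speak=print, listen=lambda: "paris.")
print(ease)  # 3 -> answered correctly
```

The ease value then goes back into Anki's scheduler, so the normal spaced-repetition intervals still apply.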


The direct prompt comparison isn't quite fair due to the instruction tuning on GPT-3.5 and 4. It'd be interesting to see examples with prompts that would work better for the raw language models.


Yeah, it's hard to compare across models; I'm interested in suggestions here.

We give all models a bunch of few-shot examples, which improves GPT-3 (davinci)'s question answering substantially. GPT-2 sometimes generates something that answers the question, sometimes it's just confused. Click "See full prompt" to see the few-shot examples that the models get.

Our goal was to exercise the full capabilities of each model.
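Concretely, few-shot prompting just means prepending worked Q/A pairs so a base model can continue the pattern; a generic sketch (not our exact prompt, which is under "See full prompt"):

```python
def build_prompt(examples, question):
    """Concatenate few-shot Q/A pairs, then the real question with an
    empty answer slot for the model to complete."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"

prompt = build_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is 2 + 2?", "4")],
    "What color is the sky?",
)
print(prompt)
```

Raw models like GPT-2 and davinci have no instruction tuning, so this pattern-completion framing is what lets them answer at all.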


I also found the riddle rather odd. I cannot say that 2 is actually the correct answer.

A problem with riddles is that they often have a hidden or secret context. Especially in our digital age, I think this one is closer to Bilbo's "What have I got in my pocket?" "riddle". Here are some other possible solutions to 11 + 2 = 1: sum the digits, 1 + 1 + 2 = 4, take mod 3, and we get 1; so 9 + 5 = 13, mod 3, and we get 1. We could also replace the addition sign with equality and propose a digit summation, so 1 + 1 == 2? True (1). 9 == 5? False (0). There are a hundred solutions to this riddle when it has no context. In fact, I stumbled into the right answer thinking about mod 12 without ever considering a clock until I saw the answer. Maybe I'm just dumb, though; I am known to overthink.
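For reference, the intended reading is just clock (mod-12) arithmetic, with 12 in place of 0:

```python
def clock_add(a, b):
    """Add hours on a 12-hour clock face; results land in 1..12."""
    return (a + b - 1) % 12 + 1

print(clock_add(11, 2))  # 1, the example given in the riddle
print(clock_add(9, 5))   # 2, the 'intended' answer
```

The shift by 1 before and after the modulo is what maps 12 o'clock to 12 rather than 0.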

