Looks like Santa's going to be too busy to deliver presents this Christmas:
def all_prime_elements(sequence):
    """Returns every prime element of the sequence."""
    return [i for i in sequence if is_prime(i)]

def is_prime(i):
    """Returns True if i is prime, False otherwise."""
    if i == 2:
        return True
    if i == 3:
        return True
    if i == 5:
        return True
    if i == 7:
        return True
    if i == 11:
        return True
    if i == 13:
        return True
    if i == 17:
        return True
    if i == 19:
        return True
    if i == 23:
        return True
    if i == 29:
        return True
    if i == 31:

def all_prime_elements(sequence):
    """Returns every prime element of the sequence."""
    return [x for x in sequence if is_prime(x)]

def all_prime_factors(sequence):
    """Returns every prime factor of the sequence."""
    return [x for x in sequence if is_prime(x) and x != 1]

def all_prime_factors_with_limit(sequence, limit):
    """Returns every prime factor of the sequence up to the given limit."""
    return [x for x in sequence if is_prime(x) and x != 1]

def all_prime_factors_with_limit_and_step(sequence, limit, step):
    """Returns every prime factor of the sequence up to the given limit."""
    return [x for x in sequence if is_prime(x) and x != 1]

def all_prime_factors_with_limit_and_step_and_start(sequence, limit, step, start):
    """Returns every prime factor of the sequence up to the given limit."""
    return [x for x in sequence if is_prime(x) and x != 1]

def all_prime_factors_with_limit_and_step_and_start_and_end(sequence, limit, step, start, end):
    """Returns every prime factor of the sequence up to the given limit."""
    return [x for x in sequence if is_prime(x) and x != 1]

def all_prime_factors_with_limit_and_step_and_start_and_end_and_step(sequence, limit, step, start, end, step):
    """Returns every prime factor of the sequence up to the given limit."""
    return [x for x in sequence if is_prime(x) and x != 1]
If I prompt it, it actually comes up with a decent function:
import math

def is_prime(element):
    """Returns whether a number is prime."""
    if element < 2:
        return False
    if element == 2:
        return True
    if element % 2 == 0:
        return False
    # Trial division by odd candidates up to sqrt(element)
    for i in range(3, int(math.sqrt(element)) + 1, 2):
        if element % i == 0:
            return False
    return True
Of course, if you ask it to keep generating past that there's the usual slow descent into madness.
def is_prime(n):
    """ Use sieve of erasthotenes to check if n is prime. """
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(n**0.5) + 1, 2):
        if n % i == 0:
            return False
    return True
SantaCoder's impressive, but that's probably misleading. It's reported that InCoder doesn't generate as diverse a set of solutions, but it does better on the ones it generates. This means it performs well with a lower number of tries compared to other similar models, which is what matters in practice. The numbers reported here required many trials.
With a fuller context and just a handful of tries, it's unlikely that the 6.7B version of InCoder will be outperformed by SantaCoder.
The amount of context is dictated by the benchmark, but I agree it would be good to see what the pass@1 and pass@10 numbers are – if the raw data is available somewhere, that can easily be computed.
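For reference, pass@k is usually computed with the unbiased estimator from the Codex paper (Chen et al., 2021): generate n ≥ k samples per problem, count the c that pass the tests, and average 1 - C(n-c, k)/C(n, k) over problems. A minimal sketch, assuming you already have per-problem (n, c) counts from the raw data (the numbers below are made up):

from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator for one problem: n samples, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem (samples, correct) counts
results = [(200, 37), (200, 0), (200, 183)]
for k in (1, 10):
    score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    print(f"pass@{k} = {score:.3f}")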
Based on the reverse engineering done by Parth Thakkar [1], the model used by Copilot is probably about 10x as large (12B parameters), so I would expect Copilot to still win pretty handily (especially since the Codex models are generally a lot better trained than Salesforce CodeGen or InCoder). It's also a little bit hard to compare directly because as Parth documents, there are a lot of extra smarts that go into Copilot on the client side.
The SantaCoder paper does have some benchmarks on MultiPL-E though, so you could compare them to the Codex results on that benchmark reported here (but keep in mind that code-davinci-002 is probably even larger than the model used by Copilot): https://arxiv.org/abs/2208.08227
OpenAI hasn't said exactly how they trained code-davinci-002, so this is speculative, but I'm reasonably sure it was trained on more data and languages than CodeGen, and for longer. It was also trained using fill-in-the-middle [1].
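Roughly, fill-in-the-middle training takes an ordinary document, cuts it into prefix/middle/suffix, and rearranges it with sentinel tokens so a left-to-right model learns to infill. A minimal sketch of that data transform; the sentinel strings here are placeholders, not necessarily the exact tokens any given model uses:

import random

# Placeholder sentinel strings; real models use dedicated special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim_example(document: str) -> str:
    """Split a document at two random points and rearrange it into prefix-suffix-middle order."""
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # The model is then trained to generate `middle` after seeing prefix and suffix.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(to_fim_example("def add(a, b):\n    return a + b\n"))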
If you haven't noticed, bigcode has also released "The Stack", a 3TB (!) dataset of code (https://huggingface.co/datasets/bigcode/the-stack). Also, they have a special policy where "The Stack" only contains permissively-licensed code, and anyone can see if their data is included and opt-out.
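If you want to poke at it, the dataset streams fine with the Hugging Face datasets library (you may need to accept the dataset's terms on the Hub first). A quick sketch; the per-language data_dir layout is my reading of the dataset card, so treat it as an assumption:

from itertools import islice
from datasets import load_dataset

# Stream one language subset of The Stack instead of downloading all 3TB.
ds = load_dataset(
    "bigcode/the-stack",
    data_dir="data/python",  # assumed per-language directory layout
    split="train",
    streaming=True,
)

for example in islice(ds, 3):
    print(sorted(example.keys()))
    print(example["content"][:80])  # "content" holds the raw file text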
It's true they haven't actually trained a model on the stack, and this is... not Copilot. But I like what they're doing and I think it should be appreciated. Honestly, I might even say they're doing for code what stability.ai is doing for images.
> It's true they haven't actually trained a model on the stack
What do you mean? SantaCoder is trained on The Stack:
> Dataset
> The base training dataset for the experiments in this paper contains 268 GB of Python, Java and JavaScript files from The Stack v1.1 (Kocetkov et al., 2022) after removing data from opt-out requests, near-deduplication, PII-redaction (see Section 4), and filtering based on line-length and percentage of alphanumeric characters. This dataset was also decontaminated by removing files that contained test-samples from the following benchmarks: HumanEval (Chen et al., 2021), APPS (Hendrycks et al., 2021), MBPP (Austin et al., 2021) and MultiPL-E (Cassano et al., 2022).
It's definitely not on par with Copilot yet, but SantaCoder is a trial run for a larger & better model that they're planning to train in 2023. Stay tuned! :)
def all_odd_prime_elements(sequence):
    """Returns every odd prime element of the sequence."""
    return [x for x in sequence if x % 2 == 1]

def all_even_prime_elements(sequence):
    """Returns every even prime element of the sequence."""
    return [x for x in
Is anyone else here building AI programming services based on models like this? I see a lot of comments saying the models can't do much programming. But I just suspect there must be a silent contingent that is also working on services like that. And maybe less likely to promote the abilities of these models because it encourages competition.
We are at Codeium (codeium.com)! Not the SantaCoder model specifically, but the same types of LLM architectures. We've started with AI-based code autocomplete, but we think there is a lot more we can do.
What I would really like is something I saw someone talking about here: I'd like the editor to brighten text it finds "unexpected", which could immediately alert me to bugs, or to the fact that the code I'm writing looks weird in some way and might either be restructured or accompanied by a comment.
Yep, these kinds of applications are on our mind! We consider autocomplete to be the "baseline" task, since there are plenty of benchmarks and research to compare our model's performance against, but there are lots of things, like highlighting code or upgrading to new libraries/conventions, that we can do with a good base model.
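One way to prototype that kind of "unexpectedness" highlighting is to score each token of a file with a code LM and flag the low-probability ones. A rough sketch with Hugging Face transformers; the model name and threshold are placeholders, not anyone's production setup:

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "bigcode/santacoder"  # placeholder; any causal code LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True)

def surprising_tokens(code: str, threshold: float = 8.0):
    """Return (token, surprisal in bits) pairs the model finds unexpected."""
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=ids).logits
    # Surprisal of token t is -log2 P(t | preceding tokens).
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    surprisal = -token_lp / math.log(2)
    tokens = tokenizer.convert_ids_to_tokens(ids[0, 1:])
    return [(tok, s.item()) for tok, s in zip(tokens, surprisal) if s > threshold]

print(surprising_tokens("for i in range(10):\n    print(j)\n"))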
my unsolicited advice: pick an X. what is the one best use case for this other than code? law? finance? focus on that vertical. if you have no idea what that could be or if that market is too small, you're already in trouble.
I don't use autocomplete at all. What I would like is something that can take my current, bad code and style-transfer it into proper, modern code. Best case, it would take code as I write it naturally and conform it to the style guide of my organization.
These kinds of models are particularly good at repetitive, boring work like refactoring legacy code and completing framework migrations. Unlike Copilot, we've specialized specifically in these areas and completing them end-to-end (instead of just sitting in the IDE, we open already-verified PRs).
We use a few depending on the task (Codex, fine-tuned T5, Bert models, etc.). Constantly experimenting with different variations. Since we focus on solving narrower problems in more depth, it leaves more room for optimizing accuracy.
I've been pretty impressed with ChatGPT generating working implementations of various algorithms in different languages. Crucially, it actually knows about algorithms. I was trying to get it to generate an algorithm for calculating concave hulls the other day and ended up learning a thing or two about the various algorithms in this space. It almost but not quite worked for my use case. It seems limited in the amount of code it can generate in one go. But otherwise, I was pretty impressed.
So we're not that far off from basically pair programming with an AI that will do most of the boring/tedious work we currently do manually. Something like ChatGPT integrated into an IDE could be useful right now.
Yes, we are incorporating it into Graphistry as part of how we help sec/fraud/misinfo/crime/etc. analyst teams investigate their data. Our platform does all sorts of GPU visual graph analytics & graph AI once data gets loaded in, and as part of our visual playbooks automation layer, this helps users make automations and fancier queries. Think Splunk, Spark, Neo4j, ...
IMO tough question of who can do codegen as a scalable standalone startup, but that's ok. Pretty darn easy & useful for many productivity platforms like ours where it's just a super nice feature as part of delivering a broader magical experience.
Related: we are hiring a k8s/pydata person, ideally someone who has been a user & builder of investigation platforms, as we are working with companies like Nvidia to bring this kind of thing to some pretty major enterprise & gov teams. See the gdoc linked on our careers page.
There's Replit, constantly announcing new features around such models. They introduced "Ghostwriter" a while back, and yesterday or so they announced Ghostwriter Chat.
Yes, we are building something that is somewhat like ILP/IFP (and other tried-and-tested but non-scalable techniques) with the search space reduced by using modern ML language models. And indeed, the thing that works best in our system has not been done in the open yet. Of course we have no idea if it's viable for the masses; maybe if people see how well it works.
I've been messing around some. Flan-T5 occasionally generates surprisingly close stuff for simple prompts like "# square x" or "# sum the elements in the list".
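For anyone who wants to try the same thing, a quick sketch with the transformers text2text pipeline; the model size and prompt wording are just illustrative:

from transformers import pipeline

# Flan-T5 is a text-to-text model, so it goes through the text2text pipeline.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

for prompt in ["# square x", "# sum the elements in the list"]:
    out = generator(prompt, max_new_tokens=64)[0]["generated_text"]
    print(prompt, "->", out)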
There are a bunch of really good ideas used to train this model: multi-query attention, infilling, near-deduplication, and dataset cleaning.
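Multi-query attention, for instance, keeps a separate query projection per head but shares a single key/value head across all of them, which shrinks the KV cache and speeds up incremental decoding. A toy sketch of the shapes involved (no causal mask, and not the model's actual implementation):

import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    """Toy multi-query attention: per-head queries, one shared key/value head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)           # one query per head
        self.kv_proj = nn.Linear(d_model, 2 * self.d_head)  # shared K and V
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.d_head, dim=-1)    # (b, t, d_head) each
        # Broadcast the single K/V head across all query heads.
        att = (q @ k.unsqueeze(1).transpose(-2, -1)) / self.d_head ** 0.5
        att = att.softmax(dim=-1)
        out = (att @ v.unsqueeze(1)).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)

x = torch.randn(2, 16, 256)
print(MultiQueryAttention(d_model=256, n_heads=8)(x).shape)  # torch.Size([2, 16, 256])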
I do wish that the demo was a little more interactive (not needing to click buttons to create a generation) since it makes it hard to see the full power of the model.
One of the things we tried at Codeium for our in-browser playground was to make it super clear how well the model performs by making the experience interactive - https://www.codeium.com/playground
> We investigate the impact of 4 preprocessing methods on the training data: filtering files from repositories with 5+ GitHub stars, filtering files with a high comments-to-code ratio, more aggressive filtering of near-duplicates, and filtering files with a low character-to-token ratio. We observe modest impact of the new filters except for the stars filter, which deteriorates performance on text2code benchmarks significantly.
This is an interesting result, given that previous work has explicitly filtered for GitHub stars as a proxy for data quality.
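For a sense of what those filters actually look like, here's a rough sketch of two of the mechanical ones, the comments-to-code ratio and the character-to-token ratio. The thresholds are made up for illustration; the paper documents the real ones.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder")

def comment_ratio(code: str) -> float:
    """Fraction of non-empty lines that are (Python-style) comments."""
    lines = [l.strip() for l in code.splitlines() if l.strip()]
    if not lines:
        return 0.0
    return sum(l.startswith("#") for l in lines) / len(lines)

def char_to_token_ratio(code: str) -> float:
    """Characters per token; very low values suggest unusual or noisy text."""
    n_tokens = len(tokenizer(code).input_ids)
    return len(code) / max(n_tokens, 1)

def keep_file(code: str) -> bool:
    # Illustrative thresholds only, not the ones used in the paper.
    return comment_ratio(code) < 0.8 and char_to_token_ratio(code) > 2.5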
As a software engineer what is the use-case for these kind of 'code generation' tools? Are they good enough to generate different scripts for OS tasks? Can they automate CRUD APIs? What level of detail is required to use them? Like, do I basically have to describe an algorithm in English or can I go up to a higher level and talk about features and what the software ought to do? Are these tools good enough to improve my productivity in any way or is this more for demos?
I've been using chatgpt with my side projects. Its ability to generate boilerplate for APIs just from what would normally be a google search prompt means you can often go straight from idea to the part where you're adding the interesting features for your app.
Its ability to generate what are essentially highly specialized tutorials that match exactly your use cases is also a really big deal.
Overall it's really extended what I'm capable of doing. Not because I couldn't do the things before but because I can skip over the boring part in the beginning and save my emotional energy for the part that actually matters.
>do I basically have to describe an algorithm in English or can I go up to a higher level and talk about features and what the software ought to do?
It understands any "well known" algorithm, API, paradigm, or pattern that was written about before 2022. Even pretty obscure stuff. One thing I tried was copying and pasting some of my code into it and having it generate unit tests.
It only works somewhat well for very simple, well-known tasks. Anything mildly more complex and it fails. It also seems to have no understanding of imports: it barfs out a dozen one-line functions for common tasks, all of which are just a call to some library function, and half of those aren't even in the Python standard library.
It's also kind of strange that at some point it drifts away from the requested task, or just ends on unfinished code if the token count is too small. For example, I asked for some code related to XML parsing and handling, and after a few XML functions it moved on to JSON and YAML.
I guess with some optimization and integration there might be some benefit here, to replace the common Stack Overflow copy'n'paste. But I don't see it adding significant value to actual work yet.
For the following example, it just goes on generating a never-ending sequence of calls:
def point_line_projection(line, point):
    """Returns the perpendicular projection of the point on the line."""
    return line.point_projection(point)

def line_intersection(line1, line2):
    """Returns the intersection point of two lines."""
    return line1.intersection(line2)

def line_intersection_point(line1, line2):
    """Returns the intersection point of two lines."""
    return line1.intersection_point(line2)

def line_intersection_point_line(line1, line2):
    """Returns the intersection point of two lines."""
    return line1.intersection_point_line(line2)

def line_intersection_point_line_parallel(line1, line2):
    """Returns the intersection point of two lines."""
    return line1.intersection_point_line_parallel(line2)

def line_intersection_point_line_parallel_point(line1, line2, point):
    """Returns the intersection point of two lines."""
    return line1.intersection_point_line_parallel
How would a model get trained on that? You'd have to pass in the entire repository for each sample. It's prohibitively difficult to create that sort of model.
If you want that, you'll have to build tooling on top of a text model (i.e. an application that calls the model repeatedly) that takes a prompt, breaks it up into per-file prompts, and then incrementally generates the files while passing in the context of the previously generated files. Even then the 'context' would be too large, so you'd get large-scale consistency errors.
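Very roughly, that tooling layer is just a loop that plans the files first and then generates them one at a time, feeding the already-generated files back in as context. A hand-wavy sketch against a generic complete(prompt) function standing in for whatever model API you'd actually call; everything here is hypothetical:

def complete(prompt: str) -> str:
    """Placeholder for a call to whatever code LLM you're using."""
    raise NotImplementedError

def generate_project(spec: str) -> dict[str, str]:
    # 1. Ask the model for a file plan.
    plan = complete(f"List the files needed for this project, one per line:\n{spec}\n")
    filenames = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Generate each file, passing previously generated files as context.
    files: dict[str, str] = {}
    for name in filenames:
        context = "\n\n".join(f"# {n}\n{src}" for n, src in files.items())
        files[name] = complete(
            f"Project spec:\n{spec}\n\nExisting files:\n{context}\n\n"
            f"Write the full contents of {name}:\n"
        )
    return files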
Broadly speaking, the number of tokens = the size of the text it can generate.
With small models, the number is trivial (a code fragment), so generally speaking, 'generate an entire application' one-step models currently don't exist.
That said, Stable Diffusion has proved that you can iterate in latent space and use a VAE to upscale to larger sizes, reducing the overall model size while still producing output that is ~an order of magnitude larger than the latent space.
...so it's not totally out of the question that's coming.