Ask HN: Could you suggest a fast library for converting documents into a sparse matrix representation (e.g., COO or CSR) in any programming language? I'm guessing C beats most other implementations, but there is also the issue of efficient n-gram hashing/indexing.
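Not a definitive answer, but as a baseline sketch (assuming the Python/SciPy stack is acceptable): scikit-learn's HashingVectorizer hashes word n-grams straight into a SciPy CSR matrix, so there is no vocabulary to build or index.

    from sklearn.feature_extraction.text import HashingVectorizer

    # Hash unigrams and bigrams into a fixed-size sparse feature space; output is CSR.
    vectorizer = HashingVectorizer(ngram_range=(1, 2), n_features=2**20)
    X = vectorizer.transform(["the cat sat on the mat", "dogs bark loudly"])
    print(X.shape, X.format)  # (2, 1048576) csr

Because the features are hashed, the memory footprint is fixed and the transform can be run independently per document.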
Personally I normally work in Cython when it needs to be fast. I find this more productive and more readable than trying to guess what numpy operations will be fast. So I would be doing:
    cdef void get_tokens(uint64_t* content, Doc doc) nogil:
        cdef int i
        cdef const TokenC* token
        for i in range(doc.length):
            token = &doc.c[i]
            # skip stop words; store the hash of the lower-cased form
            if not Lexeme.c_check_flag(token.lex, IS_STOP):
                content[i] = token.lex.lower
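To connect that back to the sparse-matrix question above: once each document is reduced to an array of token hashes, assembling the CSR structure is just bookkeeping over (data, indices, indptr). A minimal plain-Python/SciPy sketch (the function and variable names here are mine, not spaCy's):

    import numpy as np
    from scipy.sparse import csr_matrix

    def hashes_to_csr(doc_hashes, n_features=2**18):
        """doc_hashes: one array of uint64 token hashes per document (e.g. filled by get_tokens)."""
        data, indices, indptr = [], [], [0]
        for hashes in doc_hashes:
            cols = np.asarray(hashes, dtype=np.uint64) % n_features  # hash -> column index
            indices.extend(cols.astype(np.int64))
            data.extend([1.0] * len(cols))
            indptr.append(len(indices))
        X = csr_matrix((data, indices, indptr), shape=(len(doc_hashes), n_features))
        X.sum_duplicates()  # merge repeated tokens within a document into counts
        return X

    X = hashes_to_csr([[12, 99, 12], [7]])
    print(X.shape, X.nnz)  # (2, 262144) 3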
A more interesting question: if some of those "digits" are so hard to recognize, even for humans, then how can we ever label them with the one true "correct" answer?
Do we track down the person who drew those digits and ask what the artist had in mind when making this masterpiece? And even then, someone might have been trying to draw a "2" but the end result looks more like a "3".
I think some of the test cases simply don't have a definitive answer, and trying to reach 100% accuracy is a misguided effort.
Many points raised here are valid concerns and should be addressed by the data scientists when designing the system.
Let's just not forget about the positives here: there is a company that openly uses the term "A.I." and has raised a lot of money. Am I the only one here waiting for the AI Winter to be over [1]? Maybe it's not the best A.I. yet, but they got the funding. Personally, I think that's great, and I hope for others to come out of the closet.
The AI winter has been over for a couple years, we're in the middle of a torrid summer right now (triggered by the deep learning craze). There is a rush in the VC world to throw millions at any new AI startup, especially if it has "deep learning" in its pitch deck.
So far it has paid off, because large tech companies (Google, Facebook, Baidu, etc.) have been on a deep learning startup acqui-hiring spree. It might not last, though. Any company that bases its marketing on "AI" and subsequently fails to deliver is bringing us closer to the next AI winter.
Brendangregg, do you know if anyone has tried to use this platform to predict failures or DDoS attacks ahead of time? Or is that not feasible with this API? Thanks.
We've used it to predict failures, as well as a data source for predicting scale-up and scale-down events. It hasn't been used for DDoS prediction, but I see no reason why it couldn't be.
Not Brendan, but ... we do a bunch of outlier and anomaly detection using Atlas to notice slow degradation in cluster performance based on outlier nodes and auto-execute them.
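Not Atlas-specific (their detectors are surely more sophisticated), but a toy sketch of the "outlier node" idea, assuming you can pull one latency number per node from your telemetry (the data here is made up):

    import statistics

    def outlier_nodes(node_latencies, z_threshold=3.0):
        """Flag nodes whose metric sits far from the cluster median (robust z-score via MAD)."""
        values = list(node_latencies.values())
        median = statistics.median(values)
        mad = statistics.median(abs(v - median) for v in values) or 1e-9
        return [node for node, v in node_latencies.items()
                if abs(v - median) / (1.4826 * mad) > z_threshold]

    # Hypothetical per-node p99 latencies in ms; i-4 is degrading relative to its peers.
    print(outlier_nodes({"i-1": 40, "i-2": 42, "i-3": 41, "i-4": 180}))  # ['i-4']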
To Ryan Kirkman and Thomas Davis: could you make some of the data freely available to the Natural Language Processing community?
If this is something that interests people, then it will be worth the time to automate this process. I suspect there are blog entries, tweets, and Facebook updates that already answer these questions. Now we have to mine them. A "gold standard" set of Q/A would make it easier.
EDIT: But I hope there will be a way of getting the Q/A in bulk, right? Thanks.
NOTE TO SELF: This could be a data mining or text mining web app. Check whether the interest grows over time.
I think no one is like Ed. There may be 25% of programmers who have long-standing experience, even with the new tools, but they don't share wisdom like Ed does.
Ed, why can't you have your own company? Why can't you be the one who is hiring, or at least teach others how to hire? I bet you can smell talent from 1000 miles away, even if someone did not finish a good school; the world of software engineering would make so much more sense.
[EDIT:] I imagine a job interview; an interviewee says: "I didn't program this but I did program something more difficult"; the interviewer makes a frowny face; Ed comes round the corner, puts his hand on the interviewer's shoulder and says: "He did not call you stupid; he just said he is ready for the challenge - he has the experience".
1. He doesn't take a job unless it's challenging. People like him don't post on HN or go to conferences because they are busy with so much work and life (hence no wisdom sharing).
2. If he is a freelancer (which his post makes him sound like), then he does own his own company. It's just not a software startup looking for VC or articles in big magazines.
3. He's not hiring because he wants to program, not do HR. Also, Michael Jordan can't pick players for shit and he's the best BB player in history. Skill in a job doesn't mean you can find talent, and the level that he would jump and say a person has talent is probably the level where said person doesn't need recognition to know their skill.
4. If a team does a good job outlining job requirements and interviewing, they too can smell talent from 1000 miles away. But interviewing every applicant with a 2.0 GPA from a bottom-tier school takes a lot of time and most often does not reveal a hidden gem, regardless of what their potential is.
I'm like Ed! Except I do engines, libraries, network and backend code, including embedded stuff. But I don't have a resume; don't apply for jobs; don't need to look for work because it looks for me.
I've owned a company, but it's lonely, so now I earn less but work as a peer in somebody else's company.
I own my own company too, and the loneliness is a very serious issue. I miss having someone else to bounce ideas off of. And often it feels like I'm the only one who cares whether I'm working on any given day or not, which is very difficult for me sometimes, particularly when the work isn't very challenging.
I think it's interesting that people would assume so. Good programmers don't always make good managers or business owners. It's the same reason not every good carpenter runs a business and manages a team of people. Highly skilled experts are probably disciplined enough to freelance or work as contractors. But the skills necessary to grow an enterprise are often a very different set of experiences, outside of what they do for a living.
Or they simply aren't interested. They are happy with what they do.
On the other hand, seasoned programmers do make great interviewers, and a good interview process often includes them. Their insights are very valuable.
Also, as a side note: if there is no senior/lead developer involved in your interview process, or the people do not appear to have a clue, run. Unless they are going to hire you as the senior/lead and they admit they lack the knowledge.
S.A. is talking about general-purpose AI (position 2 in the RFS). This means processing natural language. There is a lot of progress but it's just slow so it's almost invisible.
Also it's a very difficult field of science. Now you need to be proficient in AI, machine learning, computational linguistics, linguistic corpora research, cognitive sciences, statistics, and sometimes physics if the text changes over time. Of course, you also need to be a good programmer. This combination of skills is very rare. Thus, very slow progress.
I suggest starting with well-defined practical problems. For example, no one seems to do much with user-generated reviews. There is some sentiment analysis, but that is just a binary text categorization problem - not even close to general-purpose AI.
It would be much more interesting to show a seller a time-ordered stream of clustered reviews that surfaces only the most representative review for each cluster. This way a seller can see how his/her fixes and changes impact user reviews. It would also be a great source of feature and bug-fix requests. This is an ideal testing bed for clustering, novelty detection, categorization, and mild inference. The inference is required because of the sparseness of the data.
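A rough sketch of the "one representative review per cluster" step, using TF-IDF plus k-means as a stand-in (my choice, not a claim that it's the best method; the reviews below are invented):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    reviews = [
        "battery dies after a few hours",
        "battery drains overnight even in standby",
        "love the new camera, photos are sharp",
        "camera quality is a big step up",
    ]
    X = TfidfVectorizer().fit_transform(reviews)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # For each cluster, surface the review closest to its centroid as the "representative".
    dist = km.transform(X)  # distance of every review to every centroid
    for c in range(km.n_clusters):
        members = np.where(km.labels_ == c)[0]
        rep = members[dist[members, c].argmin()]
        print(f"cluster {c}: {reviews[rep]}")

Ordering the clusters by the timestamps of their member reviews would give the time-ordered stream described above.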
This would create a good data set for a more general-purpose AI. We would have reviews, plus text documenting the changes and improvements in each new version of a product. Now the computer could start learning the dialogue between users and product developers. Then we are just one more step from a question-answering system based on statistical inference - not a brute-force system like "Watson" or a hand-crafted rule-based system like "Siri".
[EDIT:] I was thinking more about a decision support system that can recommend product changes, but in a way that maximizes customer satisfaction and minimizes the cost of implementation. The dialogue between past changes and customer reactions would give us the surface that needs to be optimized. This would generalize well to other domains where there is a text for the request and a text for the response - just to name one: clinical text in healthcare (position 5 in the RFS).
I have said this previously to the AGI community, and I think QA recommendation engines will be the first killer app for AGI. Not recommendations like the ones you see now ("others who bought..."), but ones that look more like "concierge" QA services.
From what I understand from speaking with Selmer Bringsjord, Bloomberg has an outstanding internal QA system, so there is progress; the trouble is that it's all behind corporate firewalls.
There was a silly little online game that came out a few years ago called Akinator [1] that would "guess" a public personality by "learning" from user inputs - a very naive implementation of CTL, but it gets across the gist of how you can implement a mock AI and get damn good results.
If you did a little Delphi to stack the initial deck of results, say for a car-buying QA recommendation service, I think you could have a pretty powerful tool that could be replicated across services.
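This isn't how Akinator actually works internally; it's just a toy sketch of the "guess by asking questions" loop: keep a candidate list and always ask the unasked question that splits it most evenly (the knowledge base and names below are made up):

    # Toy knowledge base: candidate -> {question: the answer that fits that candidate}
    CANDIDATES = {
        "Einstein": {"is_scientist": True,  "is_alive": False, "plays_sport": False},
        "Curie":    {"is_scientist": True,  "is_alive": False, "plays_sport": False},
        "Messi":    {"is_scientist": False, "is_alive": True,  "plays_sport": True},
    }

    def best_question(candidates, asked):
        """Pick the unasked question whose yes/no split over the candidates is closest to 50/50."""
        questions = {q for attrs in candidates.values() for q in attrs} - asked
        def imbalance(q):
            yes = sum(1 for attrs in candidates.values() if attrs[q])
            return abs(yes - len(candidates) / 2)
        return min(questions, key=imbalance, default=None)

    def play(answers):
        """`answers` maps question -> the user's True/False reply (stands in for real input)."""
        candidates, asked = dict(CANDIDATES), set()
        while len(candidates) > 1:
            q = best_question(candidates, asked)
            if q is None:
                break
            asked.add(q)
            candidates = {n: attrs for n, attrs in candidates.items() if attrs[q] == answers[q]}
        return next(iter(candidates), "no idea")

    print(play({"is_scientist": False, "is_alive": True, "plays_sport": True}))  # Messi

Stacking the initial deck (the Delphi step mentioned above) would just mean seeding CANDIDATES with expert-curated data before letting user answers refine it.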
Maybe it's also a good idea to revisit the reason why these systems are not widely in use. They all operate under the so-called "closed-world" assumption, which means that their knowledge about the world is very limited.
Once that limitation is lifted, the system has to deal with non-monotonic reasoning (http://en.wikipedia.org/wiki/Non-monotonic_logic), and that leads to the multiple inheritance problem (http://en.wikipedia.org/wiki/Multiple_inheritance). Unfortunately, the multiple inheritance problem has, so far, only NP-hard solutions (http://ijcai.org/Past%20Proceedings/IJCAI-89-VOL-2/PDF/047.p...). [EDIT: think of the command "eggs are in the fridge"; to find eggs in a fridge you need to know that they are inside a container; you need to know its shape and how to open it; if your software knows that and inherits this knowledge each time it is asked for eggs, then it will break when the eggs are not in the box.]
We are in need of software that deals with large networks of human knowledge. Then we can take SHRDLU to the next level. Otherwise we are stuck with the Will Smith movie (http://www.imdb.com/title/tt0343818/).
[EDIT2: there are many theoretical "suggestions" but very little software or practical internet applications]
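A tiny illustration (mine, not from the linked paper) of why inherited default knowledge is non-monotonic: a conclusion reached from defaults has to be retracted as soon as more specific information arrives, which is exactly the "eggs are not in the box" failure above.

    # Default knowledge inherited by every query about eggs.
    DEFAULTS = {"egg": {"location": "carton inside the fridge"}}

    def where_is(thing, facts):
        """Specific facts override inherited defaults; adding a fact can retract an earlier conclusion."""
        specific = facts.get(thing, {})
        return specific.get("location", DEFAULTS.get(thing, {}).get("location", "unknown"))

    print(where_is("egg", {}))                                   # carton inside the fridge (default)
    print(where_is("egg", {"egg": {"location": "countertop"}}))  # countertop (default retracted)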
Why are closed-world assumptions bad for all cases? There might be cases, limited to a domain, where the computer does not need to know much about the world. Consider the language of geometry problems given in high-school math books. Surely, to understand that, one doesn't need world knowledge, only ideas about what "intersect" means, and so on. This way, SHRDLU can be modified to understand geometry problem statements. In the same way, it could also work for other domain-limited applications. A general solution might have to move to probabilistic systems, as current research suggests.
Although it's not a discussion of the specific limitations of a closed-world system, Jeff Hawkins's 2004 book "On Intelligence" touches on many of the shortcomings of such systems, especially with respect to general problem solving and understanding. Even if you don't agree with all of his ideas, I would still recommend reading it if you have an interest in computer understanding.
Why is the fact that it is NP-hard a fundamental problem for real-world systems? Sure, it is hard to solve in the mathematical sense, but in the real world it appears as though approximate algorithms for practically all well-known NP-hard problems exist (check wiki pages), and are very effective and efficient.
Why can't that problem be solved by simply creating multiple closed worlds, each with a set of initial conditions, similar to a text adventure (e.g. "You are standing in a room. Some eggs are resting on the countertop.")?
This should help you keep your strengths intact regardless of your current situation... who is to say that the next endeavor will be better than the previous one?
Here is a suicide prevention lifeline for those in the US:
1-800-273-8255
[EDIT: suicide is a very complex disease that is far from being understood. I'm sure that Prof. Yoshiki Sasai had to deal with much, much more than just two retracted articles.]
Relatedly, if a friend approaches you with suicidal feelings, never ever ever try to talk them straight out of it. It is quite likely that they will consider you as simply not understanding them, and shut you out - possibly pretending that all is fine, when it really isn't.
Always try to genuinely understand their feelings and reasoning. Treating suicide as something that is always and inherently wrong is more likely to hurt them than to help them.
Make sure that they understand that you're willing to listen without judging, even if you might not always immediately understand their reasoning, and even if you might not agree with them.
In a few cases I knew, it seemed the warning signs were that the person mentioned things like "yeah I would never kill myself" seemingly out of the blue. And then one went and registered to buy a gun. (Which was eventually used for the act). You are right that these are described more as "feelings" not just direct verbal statements of the intent to commit suicide.
* have you thought of a plan for how you would do it?
* have you taken any actions on that plan?
The farther down that list they get, the more concerned you should be. Seek emergency intervention if they answer yes to all four.
Many people have dark thoughts but have fear/self-preservation instincts that protect them from rash inclinations. But once someone starts acting on their thoughts, they need close attention from someone quick and strong enough to intervene. As noted, though, knowing someone's state requires their willingness to honestly share it, so caregivers need to be trustworthy and be trusted.
Giving someone hope and a way forward can help them walk back from the edge, but being manipulative or deceptive or coercive can push them over.
I took an emergency suicide first aid course a couple of years back, and to this day I still carry around the pocket-sized "workflow" in my wallet in the hope I never need it.
That being said, one of the most important things I took from the course was the need to get the person having suicidal thoughts to agree to stay safe. That is, to agree not to harm themselves for a pre-allotted timeframe. You're not convincing them not to do it, but instead delaying their feeling of having to do it now. This time gives you both the ability to seek resources to help each other through the crisis.
As said above, be supportive of the person. Hear them out. Pushing them away or making light of the situation is not something that will benefit anyone. You may not agree with them, but you must understand them in order to help them through this period in their lives.
Be honest and tell them you don't know how to help - but agree to seek out help with them. Perhaps they just need someone to show that they are cared for...
tl;dr get the individual who is experiencing suicidal thoughts to agree not to harm themselves for six hours (a completely arbitrary number) so that you both can seek resources for help. DO NOT try intervention unless you are trained to do so - you could make things much much worse...