Hacker News | BLP4YC's comments

This post shows how textual information about technological inventions can be used to visualize those inventions (what they can do, how they are related, etc.).

It is based on an app that I am working on (see this post: https://news.ycombinator.com/item?id=22847429)


Please, tell me. Kidding, but also not. Honestly, I am still unsure. I started working on it because I came across those features in different papers (by accident, if I remember correctly) and thought they were cool.

But "creators and inventors" goes in the direction I was thinking. Additionally: innovation consultants, tech advisors, "policy makers" or people who have to decide on technology but lack technical expertise in certain fields, engineers...


If I had to guess, legal due diligence (acquisitions, filings, processing-filings), competitive intelligence, large corporations with established R&D or IP efforts, financial intelligence, etc. would all be markets worth looking into.

Creators & inventors tend to want a lot of freebies in my experience.


Thanks! I have been experimenting with NLP for two or three years now and this is finally something somewhat useful. So your comment makes me really happy!

Thanks for your feedback regarding the main page. I also like the design, but not because I made it; the developers behind the templates did such a great job: https://startbootstrap.com/themes/sb-admin-2/ + https://blackrockdigital.github.io/startbootstrap-landing-pa...

I probably should give them credit more prominently (right now, I do so only in the source code).

Regarding the "try now": I never thought that this would be an issue. On the contrary, I thought that different data across different features would enable people to better understand the variety of the product. But what you are saying makes absolute sense. I have added your points to my todo.

Again, thanks a lot for your feedback - it really helped.


Agree with the feedback on being able to take raw inputs, such as a manually selected set of documents (wiki pages, patents, research papers, etc.), and see both the high-level workflow and even the step-by-step transformations and related code.

Clearly, that's asking a lot in terms of both work and IP disclosure, but I am guessing that is what the average user would want: they have a specific need based on existing documents that they want ingested for analysis. Maybe I am wrong, but I agree that when doing a quick look, aside from the raw data & code, that is what I looked for to try to understand what was really there.


To avoid any confusion: I did not come up with most of these features. They are based on these papers (hopefully I have not forgotten any):

* Exploring technological opportunities by linking technology and products: Application of morphology analysis and text mining (Byungun Yoon, Inchae Park, Byoung-youl Coh)

* Technology opportunity discovery (TOD) from existing technologies and products: A function-based TOD framework (Janghyeok Yoon, Hyunseok Park, Wonchul Seo, Jae-Min Lee, Byoung-youl Coh, Jonghwa Kim)

* Investigating technology opportunities: the use of SAOx analysis (Kyuwoong Kim, Kyeongmin Park, Sungjoo Lee)

* Identification and monitoring of possible disruptive technologies by patent-development paths and topic modeling (Abdolreza Momeni, Katja Rost)

* A New Product Growth for Model Consumer Durables (Frank M. Bass)

* Identifying rapidly evolving technological trends for R&D planning using SAO-based semantic patent networks (Janghyeok Yoon, Kwangsoo Kim)

* Innovation hotspots in food waste treatment, biogas, and anaerobic digestion technology: A natural language processing approach (Djavan De Clercq, Zongguo Wen, Qingbin Song)

* TrendPerceptor: A property–function based technology intelligence system for identifying technology trends from patents (Janghyeok Yoon, Kwangsoo Kim)


When looking at a paper, how would you decide if it was of use to you?


It is a combination of:

How interesting I find the paper's ideas.

Whether what the paper proposes would work with large data sets and could be (fully) automated.

Whether I could implement it (do I understand it well enough, do I have access to the data?).


Curious: it appears you used the same system for technical analysis of crypto ICO valuations; if true, what, if anything, did you learn that was most surprising to you as it relates to emerging-tech valuations?


Yes, absolutely right. The core tech behind crypto valuations and tech valuations stayed the same. It was basically: Python + spaCy (plus a lot of different supporting libraries).

Some surprising aspects:

For tech analysis, you can accomplish a lot by using rule-based parsing because the source texts (e.g. patents) share the same sentence structure. In fact, several researchers have shown that patent texts follow a certain structure (e.g. SAO, i.e. Subject-Action-Object).
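As a toy illustration of that structural regularity (far simpler than the spaCy dependency parsing used in the cited papers, and with a made-up verb list and example sentence), a single-pattern SAO extractor could look like:

```python
import re

# Toy SAO (Subject-Action-Object) extractor. Real systems run a
# dependency parser (e.g. spaCy) over patent claims; this sketch only
# handles a flat "<subject> <verb> <object>" pattern.
SAO_PATTERN = re.compile(
    r"(?P<subject>[\w\s]+?)\s+"
    r"(?P<action>comprises|includes|transmits|converts|stores)\s+"
    r"(?P<object>[\w\s]+)"
)

def extract_sao(sentence):
    """Return an (S, A, O) triple, or None if the pattern does not match."""
    match = SAO_PATTERN.search(sentence.rstrip("."))
    if not match:
        return None
    return tuple(part.strip() for part in match.group("subject", "action", "object"))

print(extract_sao("The battery module stores electrical energy."))
# ('The battery module', 'stores', 'electrical energy')
```

The same idea scales up by replacing the regex with part-of-speech and dependency rules over parsed sentences.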

For crypto, this was far more difficult as the text structures were all over the place.

Also, crypto analysis ("back then") was very messy because it was difficult to find a trustworthy data set. With technologies you can confine it to Wikipedia, patents, and scientific papers. There is still a lot to analyze, but at least you have a somewhat official data set.

Also with crypto you have far less data points per company/token/coin which makes it hard for a machine to not disregard it as noise.

Similarly, with tech evaluation it seems that, because you get so much data from one document (e.g. one patent), you can often disregard a (big?) portion of it and still end up with good results.

Additionally, it seems to me that crypto analysis was supposed to be far more numbers-heavy (how much funding etc.) and thus the tolerance for error was relatively small. E.g. if you miss one funding round (out of three), the company's valuation can be off by up to 50%. This happened to me basically all the time, which was super frustrating.
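To make that error tolerance concrete (hypothetical round sizes, not figures from any actual data set):

```python
# Hypothetical example: a company actually raised three $10M rounds,
# but the scraper only found two of them.
rounds_found = [10_000_000, 10_000_000]
true_total = 30_000_000

reported = sum(rounds_found)
understatement = (true_total - reported) / true_total   # ~33% too low
relative_error = (true_total - reported) / reported     # truth is 50% above the model's view
print(f"reported ${reported:,}, true ${true_total:,}, off by {relative_error:.0%}")
```

So a single missed round does not just shave a few percent off; any valuation derived from the funding total inherits the full gap.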

The last surprising fact was how difficult and complicated keyword extraction is. For crypto evaluation I just went with relative word frequency (the more a word appears in a text, the more important it becomes, assuming it does not appear in all the documents). However, as I have learnt with tech evaluation, there are maybe four or five strategies for keyword extraction. And this is still an area where I have not found a solid solution for my NLP use case.
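That relative-word-frequency heuristic is essentially TF-IDF; a minimal sketch (toy two-document corpus, naive whitespace tokenization):

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each word per document: high when frequent in the document
    but rare in the corpus. A word in every document scores zero."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return scores

scores = tfidf([
    "patent describes a battery system",
    "patent describes a neural network",
])
# "battery" is unique to the first document, so it outscores the
# shared word "patent" (whose score is exactly zero).
```

The other extraction strategies the papers use (graph-based ranking, SAO filtering, etc.) trade this simplicity for better handling of multi-word terms.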

Finally, after all the reading that went into building researchly (which is relatively little), I have realized that I know significantly less about NLP than I initially thought. It still fascinates me what kinds of strategies/algorithms people come up with.


For keyword extraction, since you're already using spaCy, you might take a look at textacy, specifically:

textacy.ke.utils.most_discriminating_terms

Documentation is here:

https://chartbeat-labs.github.io/textacy/build/html/api_refe...

Code is based on this research:

King, Gary, Patrick Lam, and Margaret Roberts. “Computer-Assisted Keyword and Document Set Discovery from Unstructured Text.” (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458...
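The gist of "discriminating terms" can be reduced to a few lines; this is only a crude smoothed-frequency-difference stand-in, not textacy's or King et al.'s actual method, and the document groups are made up:

```python
from collections import Counter

def discriminating_terms(group1, group2, top_n=3):
    """Rank words by how much more frequent they are in group1 than in
    group2, with add-one smoothing to avoid zero counts."""
    counts1 = Counter(w for doc in group1 for w in doc.lower().split())
    counts2 = Counter(w for doc in group2 for w in doc.lower().split())
    total1, total2 = sum(counts1.values()), sum(counts2.values())
    vocab = set(counts1) | set(counts2)
    score = {
        w: (counts1[w] + 1) / (total1 + 1) - (counts2[w] + 1) / (total2 + 1)
        for w in vocab
    }
    return sorted(vocab, key=score.get, reverse=True)[:top_n]

solar_docs = ["solar panel efficiency", "solar cell"]
wind_docs = ["wind turbine blade", "wind power"]
print(discriminating_terms(solar_docs, wind_docs, top_n=1))  # ['solar']
```

The point is the contrastive setup: instead of scoring words within one corpus, you ask which terms separate one document set from another.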

——

Thanks for the depth of your replies. I love mapping information, finding patterns, "seeing the future"... though more often than not, even if you knew some aspect of the future for sure, it is still very hard to make use of it. I mainly enjoy the topic as an info geek more than anything else.

Again, thanks for sharing!


What it does: uses NLP to extract technology-related information from patents, Wikipedia, etc. and then analyzes these technologies.

Technology-related information includes: functions (what the technology can do) and properties/components (what the technology is made up of).


Care to ELI5?


Analytics JavaScript can assign you a Unique Identifier string that can be passed along in a form where you fill out your phone number. They then tie your "session" or "user" (depending on how they have it set up) to a cookie they have placed on your machine.

They then leverage a partnership with a large ad network to see where they found that cookie, as well as other cookies from other sites in their network. They then correlate where they have seen essentially your user footprint (collection of cookies and other identifiers from your machine) throughout their network, to put you in some kind of demographic or intent-based bucket.

When you include data from mobile apps that are tracking your location, this can get very scary. There is a small leap required in tying a mobile ID (your phone's unique identifier) to web traffic, but it happens. Obviously all depending on the level of data you "share."

They then run every user in these buckets through a machine learning algo that tries to predict things: income, gender, spending habits, life events, etc. They can easily determine where you live, zip codes, compare that to census data, etc. It's all part of predictive modeling.

That is the tracking portion.
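A bare-bones sketch of the correlation step described above (all site names and IDs hypothetical; real ad networks join far more signals than this):

```python
# An ad network observes the same cookie ID on different partner sites
# and joins those observations into a single cross-site "footprint".
observations = [
    ("site-a.example", "cookie-123", "filled out phone-number form"),
    ("site-b.example", "cookie-123", "browsed product pages"),
    ("site-c.example", "cookie-456", "read an article"),
]

footprints = {}
for site, cookie_id, activity in observations:
    footprints.setdefault(cookie_id, []).append((site, activity))

# cookie-123's activity on two unrelated sites is now one profile,
# ready to be bucketed and fed to a predictive model.
print(len(footprints["cookie-123"]))  # 2
```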

The other portion is just tying a simple customer service rating to your phone number.

So imagine you piss off several customer service reps, live in a low-income part of town, and rarely spend money. You might be on hold for quite some time...


Thank you for this, that's a very nice summary, pitched at a very understandable level.


"Coinbase sits on a razor blade of questionable ethical practices." Nice coincidence: just on Tuesday (31.07.2018), Coinbase reported that they have hired a chief compliance officer (https://ca.reuters.com/article/technologyNews/idCAKBN1KL1BH-...)

"The regular response from people is 'it is crypto money, what do you expect?'" Although we should point out that it is general investing "knowledge" that one can lose everything, this must _never_, as you correctly pointed out, happen due to technical issues. I am just now realizing that it is actually truly crazy how OK people are with losing money due to such incidents.


definitely better than my "leaving behind".


yeah, what use is 100% uptime if you cannot use it reasonably? sometimes I am unsure if such tweets originate from not knowing or not _wanting_ to know.


I get your point, although (as I touched upon briefly in the article) several crypto-based donation systems exist and some companies are already working on crypto-based lending.

BTW: the only term I can think of is "leaving behind"


Ideally, we would be able to provide modern financial services backed by first world institutions to underdeveloped countries. An example of this came through a podcast I listen to [1].

For the most part, I am skeptical of cryptocurrencies since they need to solve stability and identity first before tackling credit. An additional problem might be usability; private-public key pairs should be abstracted away from the end user. As for donation systems, I don't think they have the same network effect as credit systems.

[1] https://www.npr.org/sections/money/2016/09/07/492988779/epis...


"need to solve stability and identity first before tackling credit." that makes sense, yes.

"private-public key pairs should be abstracted away to the end user" what do you mean by that?

"I don't think they have the same network effect as credit systems" What kind of network effects do you see with a credit system?


1. Having something equivalent to an account with a username and a recoverable password.

2. Credit systems, where you loan money to people based on others' deposits, have a multiplier effect [1].

[1] https://en.wikipedia.org/wiki/Money_multiplier
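For a concrete sense of the multiplier linked above: with reserve ratio r, the textbook upper bound on money creation is the geometric series 1 + (1-r) + (1-r)^2 + ... = 1/r.

```python
def money_multiplier(reserve_ratio):
    """Textbook upper bound on money creation from fractional-reserve
    lending: 1 / reserve_ratio (the geometric-series limit)."""
    return 1 / reserve_ratio

# With a 10% reserve requirement, $1 of deposits can support up to $10
# of broad money.
print(money_multiplier(0.10))  # 10.0
```

In practice the realized multiplier is lower, since banks hold excess reserves and people hold cash.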


Now I understand your first point.

Did not know about the money multiplier. Cool info, thanks!

