
So storytime! I worked at Twitter as a contractor in 2008 (my job was to make internal hockey-stick graphs of usage to impress investors) during the Fail Whale era. The site would go down pretty much daily, and every time the ops team brought it back up, Twitter's VCs would send over a few bottles of really fancy imported Belgian beer (the kind with elaborate wire bottle caps that tell you it's expensive).

I would intercept these rewards and put them in my backpack for the bus ride home, in order to avoid creating perverse incentives for the operations team. But did anyone call me 'hero'?

Also at that time, I remember asking the head DB guy about a specific metric, and he ran a live query against the database in front of me. It took a while to return, so he used the time to explain how, in an ordinary setup, the query would have locked all the tables and brought down the entire site, but he was using special SQL-fu to make it run transparently.

We got so engrossed in the details of this topic that half an hour passed before we noticed that everyone had stopped working and was running around in a frenzy. Someone finally ran over and asked him if he was doing a query, he hit Control-C, and Twitter came back up.


> Could somebody explain why so much effort is being put into quant strategies, when it seems that real-world information gathering would be a much easier way to gain an edge over others?

I used to be part of a research group that sold the so-called "alternative data" you're describing to 30 or so hedge funds in the NYC area, including several of the largest. The example I like to give is that we knew well ahead of time that Tesla would miss on the Model 3 because we knew every vehicle they were selling by model, year, configuration, date and price with >99% accuracy. I still occasionally sell forecasts like this and the methodology is straightforward enough that even a solo investor can consistently beat the market if they know how to source the data. But I've mostly lost faith in this technique as the sole differentiator of a fund's alpha.

Some funds, like Two Sigma, have large divisions with a very sophisticated pipeline for this kind of analysis. They do exactly what you describe. For the most part it works, but there are several obstacles that keep this from being the holy grail of successful trading:

1. First and foremost, this analysis is fundamentally incomplete. You are not forecasting market movements, you're forecasting singular features of market movements. What I mean by that is that you aren't predicting the future state of a price; if the price of a security is a vector representing many dimensions of inputs, you're predicting one dimension. As a simple example, if I know precisely how many vehicles Tesla has sold, I don't know how the market will react to this information, which means I have some nontrivial amount of error to account for.

2. This analysis doesn't generalize well. If I have a bunch of information about the number of cars in Walmart parking lots, the number of vehicles sold by Tesla (with configurations), the number of online orders sold by Chipotle, etc. how should I design a data ingestion and processing pipeline to deal with all of this in a unified way? In other words, my analysis is dependent upon the kind of data I'm looking at, and I'll be doing a lot of different munging to get what I need. Each new hypothesis will require a lot of manual effort. This is fundamentally antagonistic to classification, automation and risk management.

3. It's slow. Under this paradigm you're coming up with hypotheses and seeking out unique and exclusive data to test those hypotheses. That means you're missing a lot of unknown unknowns and increasing the likelihood of finding things that other funds will also be able to find pretty easily. You are only likely to develop strategies which can have somewhat straightforward and intuitive explanations for their relationship with the data.

This is not to say the system doesn't work - it very clearly works. But it's also easy to hit relatively low capacity constraints, and it's imperfect for the reasons I've outlined. You might think exclusive data gives you an edge, but for the most part it does not (except for relatively short horizons). It's actually extremely difficult to have data which no other market participant has, and information diffusion happens very quickly. Ironically, in one of the very few times my colleagues and I had truly exclusive data (Tesla), the market did not react in a way that could be predicted by our analysis.

The most successful quantitative hedge funds focus on the math, because most data has a relatively short half-life for secrecy. They don't rely on the exclusivity of the data, they rely on superior methods for efficiently classifying and processing truly staggering amounts of it. They hire people who are extraordinarily talented at the fundamentals of mathematics and computer science because they mostly don't need or want people to come up with unique hypotheses for new trading strategies. They look to hire people who can scale up their research infrastructure even more, so that hypothesis testing and generation is automated almost entirely.

This is why I've said before that the easiest way to be hired by RenTech, DE Shaw, etc. is to be on the verge of re-discovering and publishing one of their trade secrets. People like Simons never really cared about how unique or informative any particular dataset is. They cared about how many diverse sets of data they could get and how efficiently they could find useful correlations between them. The more seemingly disconnected and inexplicable, the better.

Now with all of that said, I would still wholeheartedly recommend this paradigm for anyone with technical ability who wants to beat the market on $10 million or less (as a solo investor). A single creative and competent software engineer can reproduce much of this strategy for equities with only one or two revenue streams. You can pour into earnings positions for which your forecast predicts an outcome significantly at odds with the analyst consensus. You can also use your data to forecast volatility on a per-equity basis and sell options on those which do not indicate much volatility in the near term. Both of these are competitive for holding times ranging from days to months and, with the exception of some very real risk management complexity, do not require a large investment in research infrastructure.


Do you know why they batch the pledges into several payments and not just one big payment?

Of course it's a bold statement. But if I wasn't bold, I wouldn't have started university at age 13, set three world records for calculating pi (a stunt, I admit), ranked in the top six mathematics undergraduates in North America, received a $100k+ scholarship to Oxford University (not the Rhodes, unfortunately -- their mistake), received a doctorate in computer science from said university, and become the security officer for the FreeBSD operating system.

There is a very fine line between authorized data, technically public but implicitly unauthorized data, and illegally obtained, unauthorized data. Here’s an example of each in the financial sector, from my personal experience:

1. Financial account aggregators and “budget apps” like Mint monetize their business, in part, by selling huge amounts of data to the financial sector. Sometimes companies like Second Measure take raw data from companies like Yodlee and clean it, then resell it. Nowadays there is an entire industry of alternative market research that has had all sorts of participants, from Foursquare (locations) to Spark (email enhancement). This is technically authorized, because it’s in the TOS. The users effectively contribute their own data.

2. I developed an extremely accurate, reasonably generalizable method of forecasting vehicle production at several companies that relies on implementing a VIN searching algorithm in conjunction with legally required NHTSA recall lookup portals hosted by each manufacturer. This data is what you’d call unauthorized, because no entity explicitly endorses your use of it. For example, several colleagues and I knew well ahead of time that Tesla would miss on production of the Model 3s because they were utterly unrepresented in our data. But this data is public, so it’s fine to use from a legal and compliance standpoint. It was lucrative data specifically because it had a high signal for revenue, yet was hitherto unused and unidentified.

3. I once found, in the course of looking for legally usable data, an actual security vulnerability disclosing all users of a publicly traded QSR’s online delivery service, along with their phone numbers, email addresses and last four digits of credit cards. This is both unauthorized and illegal, because the data is contaminated with personally identifiable information and it clearly requires a vulnerability (not just scraping) to acquire.
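As an aside on item 2: the comment does not spell out the VIN searching algorithm, so the following is only a sketch of one ingredient such a search would plausibly need, not the author's method. North American VINs carry a check digit in position 9, so candidate VINs built from a known prefix and a guessed serial number can be completed locally before any lookup. The helper names `vin_check_digit` and `candidate_vin` are hypothetical.

```python
# Character values used by the VIN check-digit scheme (I, O and Q never appear in a VIN).
TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit(vin: str) -> str:
    """Return the check digit implied by the other 16 characters of a 17-character VIN."""
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    total = sum(TRANSLIT[ch] * w for ch, w in zip(vin.upper(), WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def candidate_vin(first8: str, last8: str) -> str:
    """Build a full VIN from positions 1-8 and 10-17, filling in position 9."""
    placeholder = first8 + "0" + last8  # position 9 has weight 0, so any filler works
    return first8 + vin_check_digit(placeholder) + last8


# A commonly cited valid example VIN, used here only to exercise the functions.
example = candidate_vin("1M8GDM9A", "KP042788")
```

Stepping the serial portion of `last8` through a range and completing each candidate this way yields only well-formed VINs; what one then does with them is outside this sketch.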

I’ve seen overzealous data vendors accidentally slip from #2 into #3, which is really bad for all concerned. It’s not a great look for the vendor, who will likely be fired, and it represents a breach for the company who owns the data and its users. Any firm that has purchased the data will likely be contaminated and be forced into a trading lockdown of that security for a period of time by compliance.

My real concern is that illicit data like this is used in machine learning research. Machine learning is already pretty frustrating - it’s common for me to find research from a conference that I’m simply unable to replicate because the training or experiment data is not available (this is annoyingly the case with A/B experiment optimization research put out by giant companies in particular). I worry that this trend of accepting machine learning research without any requirement for total data transparency will incentivize researchers to conduct their experiments using illicit data that doesn’t need to be sourced.


Been working on accounting systems in RPG and COBOL since ~1992. I also know C/X86ASM/Pascal/Delphi/VB/Fortran. Never bothered with C++ that much; played with Java a bit but Oracle irritates my bowels so moved away from that.

As mentioned in the article it's good work; but it is also not easy work. You tend to go through cycles of being pushed out and then brought back under extreme emergency, at any cost, to get stuff working, only for the cycle to repeat. Companies never think of the old guys as the ones to implement the new system - that's a job for the "enterprise experts" - I can't even keep track of how many "rewrites" I've seen in my life fail because of this.

We are the dinosaur club; but it's a club that pays extremely well (high 6 figures a year without working too hard if you are talented and have a good client base and reputation), but like fossil fuel one day it will all be gone ;)


I have intimate personal experience with the FCRA. Sadly I don't have an hour to talk about it at the moment, but ping me any time. Short version: it's one of the most absurdly customer-friendly pieces of legislation in the US, assuming you know how to work it. There exist Internet communities where they basically do nothing but assist each other with using the FCRA to get legitimate debts removed from their credit report, which, when combined with the Fair Debt Collection Practices Act, means you can essentially unilaterally absolve yourself of many debts if the party currently owning it is not on the ball for compliance.

The brief version, with the exact search queries you'll want bracketed: you send a [debt validation letter] under the FCRA to the CRAs. This starts a 30 day clock, during which time they have to get to the reporter and receive evidence from the reporter that you actually owe the debt. If that clock expires, the CRAs must remove that tradeline from your report and never reinstate it. Roughly simultaneously with that letter, you send the collection agency a [FDCPA dispute letter], and allege specifically that you have "No recollection of the particulars of the debt" (this stops short of saying "It isn't mine"), request documentation of it, and -- this is the magic part -- remind them that the FDCPA means they have to stop collection activities until they've produced docs for you. Collection activities include responding to inquiries from the CRAs. If the CRA comes back to you with a "We validated the debt with the reporter." prior to you hearing from the reporter directly, you've got documentary evidence of a per-se violation of the FDCPA, which you can use to get the debt discharged and statutory damages (if you sue) or just threaten to do that in return for the reporter agreeing to tell the CRA to delete the tradeline.

No response from the CRA? You watch your mail box like a hawk for the next 30 days. Odds are, you'll get nothing back from the reporter in that timeframe, because most debt collection agencies are poorly organized and can't find the original documentation for the debt in their files quickly enough. Many simply won't have original documentation -- they just have a CSV file from the original lender listing people and amounts.

If you get nothing back from the reporter in 30 days, game over, you win. The CRA is now legally required to delete the tradeline and never put it back. Sometimes you have to send a few pieces of mail to get this to stick. You will probably follow-up on this with a second letter to the reporter, asserting the FDCPA right to not receive any communication from them which is inconvenient, and you'll tell them that all communication is inconvenient. (This letter is sometimes referred to as a [FOAD letter], for eff-off-and-die.) The reporter's only possible choices at that point are to abandon collection attempts entirely or sue you. If they sue you prior to sending validation, that was a very bad move, because that is a per-se FDCPA violation and means your debt will be voided. (That assumes you owe it in the first place. Lots of the people doing these mechanics actually did owe the debt at one point, but are betting that it can't be conveniently demonstrated that they owe the debt.)

If the reporter sends a letter: "Uh, we have you in a CSV file." you wait patiently until day 31 then say "You've failed to produce documentary evidence of this debt under the FDCPA. Accordingly, you're barred from attempting to collect on it. If you dispute that this is how the FDCPA works, meet me in any court of competent jurisdiction because I have the certified mail return receipt from the letter I sent you and every judge in the United States can count to 30." and then you file that with the CRA alleging "This debt on my credit report is invalid." The CRA will get in touch with the debt collection company, have their attempt timeout, and nuke the trade line. You now still technically speaking owe money but you owe it to someone who can't collect on the debt, (licitly [+]) sell it, or report it against your credit.

I just outlined the semi-abusive use of those two laws, but the perfectly legitimate use (for resolving situations like mine, where my credit report was alleging that I owed $X00,000 in debts dating to before I was born) is structurally similar. My dropbox still has 30 PDFs for letters I sent to the 3 CRAs, several banks, and a few debt collection companies disputing the information on my report and taking polite professional notice that there was an easy way out of this predicament for them but that if they weren't willing to play ball on that I was well aware of the mechanics of the hard way.

[+] Owing more to disorganization and incompetence than malice, many debt collection companies will in fact sell debts which they're no longer legally entitled to. This happened to me twice. I sent out two "intent to sue" letters and they fixed the problem within a week.

[Edit: I last did this in 2006 and my recollection on some of the steps I took was faulty, so I've corrected them above and made it a little more flow-charty.]


Part I

My offer (from Art Bass, then head of Flight Operations in part because he was, as the FAA required, a pilot) and offer letter said that (A) there would be a stock plan, (B) I would be part of the stock plan, (C) the plan would be based on salary in which case I would be quite high up, (D) the Board was considering the stock plan now and results were expected in two weeks, (E) if the plan were not equitable then the first plane out of Memphis would be full of ex-Federal Express employees.

With that, I joined, kept teaching my courses in computer science at Georgetown until the courses were over, at home got a time sharing terminal, a CP67/CMS account, etc., and dug into writing the software to schedule the fleet.

Some Board Members, including one with a lot of experience at American Airlines, doubted there could be a schedule. So, the Board wanted to see a schedule, say, for the full, planned fleet of 33 planes serving the full, planned list of 90 US cities. Some crucial funding, some equity, some loans on the planes, were being held up waiting for the schedule. The company was at risk.

I wrote the software, finished my teaching, drove to Memphis, and rented a room.

So, with the Board having doubts and the company at risk, one evening Roger Frock and I used my software to develop a schedule for the 33 planes and 90 cities. We printed out the schedule, had copies made, and handed them around.

Board Member General Dynamics had sent two representatives, one an aeronautical engineer and one a finance guy, to provide, say, adult supervision; those two guys went over the schedule fairly carefully and announced "It's a little tight in a few places but it's flyable" (pretty good from just a little fast work from Roger and me); CEO Fred Smith's reaction at the next senior staff meeting was "Amazing document. Solves the most important problem facing the start of Federal Express." The funding was enabled. FedEx was saved. Pretty good from typing in 6000 lines of PL/I in six weeks while also teaching two courses!

PL/I is a nice language -- good on data types, data manipulations, data conversions, data structures, scope of names, exceptional condition handling, storage management, debugging, etc. E.g., its based structures can serve as a poor man's classes in object oriented programming.

Later the Board wanted to see some revenue projections. I didn't want to get involved, but no one had more than wishes, hopes, dreams, intentions, etc. So, I started with the common high school question, what do we know? Well, we knew the present revenue or, if you will, number of packages per day. From our initial long term planning, we knew what revenue we expected with 33 full airplanes serving 90 US cities. So, in at least a somewhat meaningful sense, the desired projections were an interpolation between those two facts we did know.

Then the question was, how will the interpolation go? Well, why will the revenue grow? Sure: The revenue will grow as it has been so far, customers to be hearing about FedEx from current happy FedEx customers. E.g., maybe a customer to be gets a package via FedEx. So, we're talking word of mouth advertising or viral growth.

So, the rate of growth in revenue per day or packages per day should be directly proportional to (A) the number of current customers and (B) the number of customers to be. That is, the rate of growth should be proportional to both (A) and (B), that is, to their product.

So, all downhill from there: At time t, let y(t) be the revenue, say, per day, at time t. Let t = 0 correspond to the present. So, we know y(0). Let b be the revenue per day with a full system, that is, 33 full airplanes and 90 US cities. That is, we know both y(0) and b.

So, from freshman calculus, the rate of growth is the first derivative of y(t), that is, d/dt y(t) = y'(t). So, from the proportionality, we have that there must exist some constant k so that

y'(t) = k y(t) ( b - y(t) )

So, we have an initial value problem (we know y(0)) for a first order ordinary differential equation (nonlinear in y, but separable).

Okay, how to get a solution? Easy, just need freshman calculus, not even a course in differential equations. And, yes, there is a closed form solution, right, with some exponentials.

Right, the solution is the famous logistic curve sometimes seen as doing well tracking, say, the growth of TV set ownership in the early years of TV. So, my derivation, as just above, can be seen as an axiomatic derivation (maybe rediscovery, maybe original) of the logistic curve. The solution may remain an okay, first-cut approach to understanding viral growth.
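For anyone who wants to check the closed form: separating variables in y'(t) = k y(t) ( b - y(t) ) and using the known y(0) gives

y(t) = b y(0) / ( y(0) + ( b - y(0) ) exp( - k b t ) )

Here is a minimal Python sketch that evaluates it. The values of y(0), b, and k are made up for illustration; they are not the FedEx figures, which the story does not give.

```python
import math


def logistic(t: float, y0: float, b: float, k: float) -> float:
    """Closed-form solution of y'(t) = k * y(t) * (b - y(t)) with y(0) = y0."""
    return b * y0 / (y0 + (b - y0) * math.exp(-k * b * t))


# Hypothetical inputs: current revenue 1, full-system revenue 100, candidate k.
y0, b, k = 1.0, 100.0, 0.001

# Tabulate the projection at a few points in time, as one would for a graph.
projection = [(t, logistic(t, y0, b, k)) for t in (0, 25, 50, 75, 100)]

# Sanity check: a central-difference slope should match k * y * (b - y).
t, h = 50.0, 1e-5
slope = (logistic(t + h, y0, b, k) - logistic(t - h, y0, b, k)) / (2 * h)
rhs = k * logistic(t, y0, b, k) * (b - logistic(t, y0, b, k))
```

Trying several values of k and redrawing the curve is the same exercise as picking candidate constants on a calculator: k only sets how fast the curve moves from y(0) toward b.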

So, I showed my work to Senior Vice President Mike Basch, likely the one most responsible for getting the projections for the Board, and he liked my work. So, on a Friday afternoon we picked several candidate values for the constant k and drew the corresponding graphs of the revenue projections. We used my HP calculator, reverse Polish notation, stack machine, etc. -- HP should run an ad! We picked a value of k that gave what seemed to be reasonable projections and declared the problem solved.

The HP? It's still in my center desk drawer. Checking, right, it's an HP-35. My wife and I paid $400 for it.

The next day, a Saturday, at about noon, I was in my office likely working on fleet scheduling and got a call from Roger Frock asking if I knew anything about the revenue projections. Saying I did, he asked if I could come over to the HQ and explain.

So, I got into my Camaro hot rod (396 big block, etc.), and drove over. Yes, I brought my HP-35.

As I arrived, at one of the old WWII wooden hangar buildings, people were standing around and not happy. Our two guys from General Dynamics were standing in the hall with their bags packed and not happy.

Roger led me to a table with the graph, picked a point in time, and asked me to calculate the value on the graph. So, with my HP-35, I punched the buttons, stopped, slowed down, cleared the HP-35, started again, slowly and carefully punched the buttons again, and got the value on the graph. I did that for several points for the graph, and then everyone started to get happy.

It turned out that the Board meeting had been that morning; Mike Basch was traveling; I'd not been invited to the Board meeting; the graph had been presented; the two guys from General Dynamics (GD) had asked how the graph had been calculated; and everyone else at the meeting dug in trying to answer. They worked for hours with no results. Finally the two guys from GD lost patience with FedEx, returned to their rented rooms, packed their bags, got plane reservations back to Texas, and as a last chance returned, with their packed bags, to the FedEx HQ to see if anyone could explain the projections.

Somehow Roger Frock had guessed that I'd done the projections, called me, and got me to the Board meeting just in time.

It was close, but I'd saved FedEx a second time.

Right: Some people in FedEx would rather have seen FedEx go under than invite me to the Board meeting. We're talking some severe cases of jealousy, bureaucratic infighting, attacking the guy down the hall instead of the competition outside of the building, goal subordination as in organizational behavior, etc., right? Bummer.

Right: Apparently I was the only person at FedEx who still understood freshman calculus. Gads. And I never even took freshman calculus, taught it to myself from a book, and started with sophomore calculus.

I never got any thanks for saving the company the second time.


Part II

I'd been at FedEx for over a year. I had been commuting every few weeks home to Maryland for a few days at a time -- not good. There had been no more about the stock that had been supposed to come in "two weeks". The company had some problems, e.g., had nearly gone out of business due to not inviting me to the Board meeting. Also the basic planning was way off -- the planning said that we could fly the planes around half full and nearly print money, but we were flying the planes packed solid, had doubled the rates, and still were losing money -- bummer.

So that I could be a good bread winner in my marriage and for our kids if we could have some, as we hoped, I wanted something valuable no one could take away from me, a Ph.D. for my career and/or stock.

So, I'd gotten accepted for an appropriate Ph.D. at Brown (Division of Applied Mathematics), Cornell, Princeton, and Johns Hopkins.

The oil crisis hit. Saving money, especially on jet fuel, became a biggie. So, I was working on that. I was getting a lot of flak from others, especially my manager.

Finally I called a meeting to explain what I was working on, three projects. My manager said I couldn't do that because he was busy and couldn't come. I told him, fine, then don't come.

He came. So did Fred, Roger Frock, Art Bass, the top 15 or so people in FedEx. My manager was sitting next to Fred and kept objecting to what I was saying until Fred told him to cool it.

One of my problems was to use deterministic optimal control theory to say how to climb, cruise, and descend the planes.

A second problem was to use 0-1 integer linear programming set covering to develop schedules that would save on OpEx and maybe also CapEx.
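To make the set covering idea concrete: with one 0-1 variable per candidate route, the problem is to minimize total cost subject to every city being covered by at least one chosen route. The toy instance below is entirely hypothetical (made-up routes, cities, and costs) and is solved by brute-force enumeration, which only works at this tiny size; a real schedule would need an integer programming solver.

```python
from itertools import combinations


def min_cost_cover(universe, candidates):
    """Return (cost, chosen names) of the cheapest set of candidates covering universe.

    candidates maps a name to (cost, set of items covered). Every subset is
    enumerated, so this is only suitable for a handful of candidates.
    """
    best = None
    names = sorted(candidates)
    for r in range(1, len(names) + 1):
        for chosen in combinations(names, r):
            covered = set().union(*(candidates[n][1] for n in chosen))
            if covered >= universe:  # every item is covered at least once
                cost = sum(candidates[n][0] for n in chosen)
                if best is None or cost < best[0]:
                    best = (cost, chosen)
    return best


# Hypothetical data: five cities and five candidate routes with costs.
cities = {"A", "B", "C", "D", "E"}
routes = {
    "r1": (5, {"A", "B", "C"}),
    "r2": (4, {"C", "D"}),
    "r3": (3, {"D", "E"}),
    "r4": (6, {"A", "E"}),
    "r5": (2, {"B"}),
}
best_cost, best_routes = min_cost_cover(cities, routes)
```

On this instance the cheapest cover uses routes r1 and r3 at a total cost of 8.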

A third problem was how to buy fuel during a trip from Memphis and back. So, broadly the idea was to buy extra fuel where it was cheap and carry it to the next stop or two where fuel was more expensive. We were getting fuel for $0.16 a gallon in Memphis but paying up to $0.55 a gallon (in Nashville). So, that's a case of what has long been known as fuel tankering. But doing that interacts with how to climb, cruise, and descend the airplane, not being late in the schedule, loads, weather, air traffic control, etc. And typically a lot of the cheap fuel gets burned off just from trying to carry it, and how much gets burned off has a lot to do with the flight plan. And any such decision to buy extra fuel is a bet on the future of the trip back to Memphis, that is, a bet against the random package loads, weather, air traffic, etc.
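A back-of-the-envelope version of the tankering trade-off, as a sketch only: it uses the two prices quoted above, treats the burn-off as a single fixed fraction, and ignores everything stochastic (loads, weather, air traffic) that makes the real problem hard. The 10% burn-off figure is a made-up assumption, not a number from the story.

```python
def tankering_saving_per_gallon(cheap_price, dear_price, burn_fraction):
    """Net saving per gallon loaded at the cheap stop.

    Of each extra gallon loaded, (1 - burn_fraction) arrives and replaces
    fuel that would otherwise be bought at the expensive stop.
    """
    return (1.0 - burn_fraction) * dear_price - cheap_price


def break_even_burn_fraction(cheap_price, dear_price):
    """Burn-off fraction at which carrying extra fuel stops paying."""
    return 1.0 - cheap_price / dear_price


memphis, nashville = 0.16, 0.55  # dollars per gallon, from the text above
assumed_burn = 0.10              # hypothetical burn-off fraction

saving = tankering_saving_per_gallon(memphis, nashville, assumed_burn)
limit = break_even_burn_fraction(memphis, nashville)
```

With these prices the break-even burn-off is about 71%, so under this simplified model carrying the cheap fuel pays unless most of it is burned in transit. The deterministic margin says nothing about the bet on the return trip, which is the part the text describes as the real difficulty.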

So, how the heck to solve that? And, for various reasons, couldn't get a solution from carrying a computer on the plane and, really, not even from using a computer on the ground after landing. I'd found a way!

So, Fred put me under Senior VP of Planning Mike Basch and, thus, made me Director of Operations Research.

But the fall came, and I had to decide actually to leave for graduate school or not. With no stock, not a lot of thanks, with a lot of scars from being attacked, still away from my wife, the company still at risk, I decided to go to graduate school. I liked FedEx, the challenges, the work, etc., but making the stockholders rich, with me not one of the stockholders, while wrecking my marriage and passing up the chance for a Ph.D. that might help my career and that no one could take away from me looked not good. If I couldn't get stock with the company still at risk and worked to make the company valuable, then what hope would I have of getting stock in the company I'd helped make valuable before getting any stock?

I went home to Maryland. At the last moment, Fred wanted me back in Memphis. He and I met with Mike Basch, and Fred said, "You know, if you stay, then you are in line for $500,000 in Federal Express stock?". Heck no; I didn't "know" any such thing; I had had and accepted such promises before, "two weeks", and after 18 months, saving the company twice, and with three projects to do much more for the company, all there was were more such promises, not on paper that a lawyer could do something with, no thanks -- "Fool me once, shame on you. Fool me twice, shame on me.".

Sure, that $500,000 would be ballpark $50 million to $500 million today. And apparently some people did get some stock. But there that last day, Fred still was just not putting it down on paper.

Since then I ran all this past a lawyer who concluded, "Legally FedEx owes you nothing. Morally they owe you everything.".

So, here on HN, maybe I definitely should tell this story as I have so that others can benefit so that more promises of stock can become ownership of stock.

Of course, there is a lot more to getting wealthy from stock in a startup than what I've outlined here.

Broad Lesson: The broad lesson for people in startups with promises of stock, become very well informed and be very careful.

My reaction: Do my own startup. Doing it. Need to get back to it. It'd be fun to make more money than Fred! I have a shot! Back to it!

