Twitter thread:

While I was at Google X, I worked on an effort to build a robot hivemind: a network of robots with collective intelligence. Our work was part of a broader strategy to gain a data advantage. Here’s what I learned about “data network effects”.

About a month ago, my partners @martin_casado and @peter_lauten argued in a great blogpost that data network effects are mostly bullshit. I could not agree more! Every one of their insights maps directly to some aspect of my experience at X.

First, some context: The ultimate goal of our robotics project was to leverage the combination of Machine Learning and Distributed Systems to enable robots to learn from each other’s experience in order to effectively operate in unstructured environments with limited sensor data.

One important sub-problem my team and I had to solve was teaching a robot how to open a door using this approach. This may sound easy (and it is, if you’re doing it for just a single door), but it’s a hard problem to solve under all circumstances.

This is because there is a very long tail of variations in lighting and resultant shadows; the color, texture, and reflectivity of the door and handle; the weight of the door; the type of the handle and the specific forces required to operate it; and so on.

There are even slight variations between different “identical” robots that are running the same software. They inevitably have different sensor/camera calibrations and motor characteristics. A true solution to the problem has to be general enough to contend with all of this.

Our approach was to depart from traditional techniques in robotics that favor modularity (e.g. one subsystem determines the pose of the handle, a completely independent one controls the arm of the robot, and yet another system then operates the gripper, etc).

Instead, we set out to train a single deep neural network to do everything, end-to-end. The goal was to get the network to go directly from the robot’s raw sensor data all the way to the final output: 7 torques to apply to the robot’s 7 joints for 20 ms.

After 20 milliseconds, the neural network runs again (this time with slightly different sensor data, for the robot has since moved). It thus produces seven new torques. And so, in this manner, the network guides the robot toward opening the door 20 ms at a time.
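To make the shape of this concrete, here’s a minimal sketch of such an end-to-end policy and its 20 ms control loop. The network size, sensor shapes, and helper functions are illustrative stand-ins of mine, not our actual architecture:

```python
# A minimal sketch of an end-to-end visuomotor policy and its 20 ms control
# loop. Shapes, sizes, and helper names are illustrative stand-ins, NOT the
# actual X architecture.
import torch
import torch.nn as nn

class DoorOpeningPolicy(nn.Module):
    def __init__(self, num_joints: int = 7):
        super().__init__()
        # A convolutional trunk consumes the raw camera image...
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # ...its features are fused with proprioception (joint angles and
        # velocities) and mapped straight to one torque per joint. No
        # separate pose estimator, arm controller, or gripper module.
        self.head = nn.Sequential(nn.LazyLinear(256), nn.ReLU(),
                                  nn.Linear(256, num_joints))

    def forward(self, image, proprio):
        return self.head(torch.cat([self.vision(image), proprio], dim=-1))

def read_sensors():
    """Stand-in for the robot's real sensor interface (hypothetical)."""
    return torch.rand(1, 3, 64, 64), torch.rand(1, 14)

policy = DoorOpeningPolicy()
for step in range(5):                     # in reality: until the door opens
    image, proprio = read_sensors()       # slightly different every step
    torques = policy(image, proprio)[0]   # 7 torques for the next 20 ms
    # apply_torques(torques, duration_ms=20)   # hypothetical robot API
```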

To do this, it must internalize a model of everything from the high-level appearance of a door (and its handle) in different conditions, down to the proprioceptive “feeling” of interacting with the handle and applying the right forces to it so that the door opens.

And, to make this more fun, we wanted to get the same neural network (with the same weights) to work across different kinds of doors with different handles, and across distinct “identical” robots, each with diverging camera and sensor calibrations.

This required a lot of data. And, collecting it required running multiple robots simultaneously, each learning from trial and error on a different door with a unique handle.

We wanted to get the robots to learn from each other’s experience in real time and to use the data produced from each individual robot’s experience to train a single, global neural network that is shared across all robots. We wanted to build a hivemind :)
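Schematically, the collective loop might look something like this (the infrastructure and the training objective below are stand-ins of mine, not the actual X stack): robots stream experience into one shared buffer, a single trainer updates one global policy, and each robot periodically pulls the freshest weights.

```python
# Schematic of collective training (hypothetical infrastructure): every
# robot contributes experience to a shared buffer, ONE global network is
# trained on it, and robots periodically pull the latest weights.
import queue
import torch
import torch.nn as nn

experience = queue.Queue()                      # experience from all robots
global_policy = nn.Sequential(                  # one network for the fleet
    nn.Linear(14, 256), nn.ReLU(), nn.Linear(256, 7))
optimizer = torch.optim.Adam(global_policy.parameters())

def robot_step(local_policy):
    """One 20 ms step of trial and error on a robot's own door."""
    state = torch.rand(14)                               # fake sensor reading
    action = local_policy(state) + 0.1 * torch.randn(7)  # explore
    experience.put((state, action.detach()))             # contribute to the hive

def trainer_step(batch_size=32):
    """Fold everyone's experience into the single shared policy."""
    batch = [experience.get() for _ in range(batch_size)]
    states = torch.stack([s for s, _ in batch])
    actions = torch.stack([a for _, a in batch])
    # Placeholder objective; the real system would use an RL loss here.
    loss = nn.functional.mse_loss(global_policy(states), actions)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# A robot joins by pulling the hivemind's current weights...
local = nn.Sequential(nn.Linear(14, 256), nn.ReLU(), nn.Linear(256, 7))
local.load_state_dict(global_policy.state_dict())
for _ in range(32):
    robot_step(local)       # ...acts on its own door...
trainer_step()              # ...and its experience updates everyone's model.
```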

We called this Collective Reinforcement Learning. The proof of concept was a collective effort with Adrian Li, @mkalakrishnan, @YevgenChebotar, and @svlevine. Check out the full paper here!

But, what does all of this have to do with data network effects and product defensibility?

Well, on the surface it might seem like this one case (of all cases!) should benefit from strong data network effects, because each robot that is added to the network learns (immediately and continually) from the experience of every robot that precedes it.

If you believe, as we did, that each robot would contribute to the collective a differentiated set of experiences, then the marginal utility offered by the network to a new user bringing her robot online should scale superlinearly with the number of robots in the network.

That right there is a network that gets more valuable for the next guy who joins, the more people are already on it, right? That’s a network effect by definition, no? It’s Metcalfe’s Law! (:
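For reference, the textbook form of Metcalfe’s law values a network by its possible pairwise connections:

```latex
V(n) \;\propto\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;=\; \Theta(n^2)
```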

And indeed, all of that is partly true. The nuance here, as Martin and Peter would observe, is that this effect can actually be better explained by the scale of the collected data than by the fact that the robots are incidentally connected to one another in a network.

If you could somehow collect the exact same data but without the network, the result would be the same. The network is an implementation detail — it’s just the mechanism by which the data is collected and updates to the hivemind model are distributed.

This is different from the dynamics of a social network where every edge connecting one user to another really does play a functional role in increasing the network’s value. Our robot hivemind benefits far more from data scale effects than it does from data network effects.

And, as it turns out, scale effects tend to be less powerful (and less defensible) than network effects. Why?

Well, at least in principle, the utility curve that best models a network effect is a true superlinear curve. Social networks do indeed continue to get more valuable with each additional user that joins. And each new user brings to the network more value than the last.

This superlinear effect tends to hold indefinitely regardless of how big these social networks get. The same cannot be said for scale effects. The utility curve that best models a scale effect tends to be an S-curve. At some point marginal returns to scale flatten out.
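To put rough functional forms on the contrast (illustrative curves of my choosing, not from the original thread):

```latex
\underbrace{V_{\mathrm{network}}(n) \;\propto\; n^2}_{\text{keeps compounding}}
\qquad \text{vs.} \qquad
\underbrace{V_{\mathrm{scale}}(n) \;=\; \frac{L}{1 + e^{-k(n - n_0)}}}_{\text{logistic S-curve}}
```

The logistic curve accelerates only up to its inflection point $n_0$; beyond it, marginal returns decay toward zero, while the network curve’s marginal value keeps growing with $n$.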

This is because scaling any one part of a system tends to cause other parts to have to scale disproportionately. At some point, some physical limit begins to challenge the assumptions that originally held the system together and causes returns to scale to slow down.

The reality of scaling our hivemind is that, as the network grows, the new data that is contributed by each additional robot becomes ever more redundant with the data that has already been collected by other robots. It thus no longer offers as much useful signal.

And so, at scale, the amount of additional data that must be collected in order to yield the next useful datapoint begins to increase. This slows down returns to scale and allows trailing competitors to narrow the gap.
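Here’s a toy simulation of that redundancy effect (entirely my construction, with made-up numbers): when scenarios follow a long-tailed distribution, each new batch of trials mostly re-samples the common cases and surfaces ever fewer novel ones.

```python
# Toy model: 10,000 possible door-opening "scenarios" with Zipf-like
# (long-tailed) probabilities. Count how many NOVEL scenarios each
# successive batch of 5,000 trials contributes.
import numpy as np

rng = np.random.default_rng(0)
num_scenarios = 10_000
probs = 1.0 / np.arange(1, num_scenarios + 1)   # long tail of rare cases
probs /= probs.sum()

seen = set()
for batch in range(1, 11):
    trials = rng.choice(num_scenarios, size=5_000, p=probs)
    novel = len(set(trials) - seen)
    seen.update(trials)
    print(f"batch {batch:2d}: {novel:4d} novel scenarios in 5000 trials")
# The novel-scenario count falls batch after batch: ever more data is
# needed to buy the next useful datapoint.
```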

A related problem that is especially important when dealing with the physical world is that systematic errors (biases) in how data is collected might have no effect at a small scale. But, at a larger scale, they often accumulate and begin having a real impact on models’ performance.
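The standard bias-variance decomposition captures this: for a quantity estimated from $n$ samples with per-sample noise $\sigma^2$ and a fixed collection bias $b$,

```latex
\mathbb{E}\big[(\hat{\theta}_n - \theta)^2\big]
\;=\; \underbrace{b^2}_{\text{collection bias}}
\;+\; \underbrace{\frac{\sigma^2}{n}}_{\text{noise, washes out}}
```

The noise term shrinks as data accumulates, which is exactly why the bias is invisible at small scale but sets the error floor at large scale.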

This is only compounded by the fact that scaling a robotic fleet tends to coincide with the loosening of quality controls in data collection (or with an increase in infrastructure and QA costs). This creates an additional source of bias (or cost) that slows down returns to scale even more.

As Martin and Peter point out, there is nothing about data that inherently confers defensibility. Any data advantages must be connected to a more holistic product, technology, or business story.

Scale effects, on the other hand, can and do confer some defensibility but only during the accelerating portion of the S-curve. In the world of robotics, how significant this effect is comes down to the shape of the data distribution.

The shape of the distribution (along with how important it is for you to cover it for product success) dictates the size and quality of your minimum viable corpus. It also determines the likely timespan of the S-curve. So what did this look like for us?

Well, unstructured environments in robotics tend to make for long-tailed distributions. And it’s especially true in robotics that success 95% of the time is disproportionately more useful than success only 85% of the time. In fact, 85% is often just as good as 0%.
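One way to make that concrete (my illustration): a chore that chains $k$ such manipulations succeeds with probability $p^k$, so for a 20-step task,

```latex
0.95^{20} \approx 0.36
\qquad \text{vs.} \qquad
0.85^{20} \approx 0.04
```

At 85% per step, the end-to-end task fails roughly 24 times out of 25.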

As a point of comparison, our proof of concept hivemind required four robots training simultaneously on different doors for two hours to achieve a 95%+ success rate across all robots under variations in handle, lighting, and door positioning.

A final and all important factor for defensibility, which Martin and Peter conclude with, is whether you are able to secure proprietary data sources. Doing so ensures that your competitors cannot readily follow your rise through the S-curve.

A big part of our effort was devoted to leveraging data obtained from robots in simulation. This is a hard research problem because deep learning algorithms have a knack for exploiting simulator flaws to “cheat” in simulation in ways that would be infeasible in the real world.
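One standard mitigation is domain randomization, sketched generically below (the parameter names and ranges are invented, and the thread doesn’t say this is the specific technique our team used). By resampling the simulator’s physics and rendering every episode, the only strategies that survive training are ones that don’t depend on any one simulator’s quirks.

```python
# Domain randomization sketch: resample simulator parameters each episode so
# the policy cannot latch onto any single simulator's flaws. All names and
# ranges below are invented for illustration.
import random

def randomized_sim_config():
    """Draw a fresh physics/rendering configuration for one episode."""
    return {
        "door_mass_kg":     random.uniform(5.0, 40.0),
        "handle_friction":  random.uniform(0.1, 1.0),
        "light_intensity":  random.uniform(0.3, 1.5),
        "camera_offset_mm": random.gauss(0.0, 2.0),  # mimic calibration drift
    }

# Per-episode usage against a (hypothetical) simulator API:
#   sim.reset(**randomized_sim_config())
#   run_episode(policy, sim)
print(randomized_sim_config())
```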

In the end, a data advantage in robotics will tend to confer defensibility only to the extent that you’re able to secure an (ideally proprietary) minimum viable corpus of data that allows you to begin riding the accelerating portion of the S-curve before everyone else.

A fully productized version of our robot hivemind would be no exception. It would temporarily benefit from data scale effects conferred by superior data collection infrastructure. Long term defensibility, however, would come down to more than just a data advantage.

Once that S-curve runs its course, defensibility would depend on differentiated tech (maybe learning from simulation), go-to-market strategy (perhaps targeting developers), and an excellent user experience. But not “data network effects”. Those are mostly bullshit.

Join the discussion on Twitter here!