
What's your black triangle?

There is a great post from 1994 about black triangles in the context of building large and complex systems. You might think that 1994 sounds like the dark ages, but the principles outlined in that post do stand the test of time. Here is my favorite excerpt from that post:

What she later came to realize (and explain to others) was that the black triangle was a pioneer. It wasn’t just that we’d managed to get a triangle onto the screen. That could be done in about a day. It was the journey the triangle had taken to get up on the screen. It had passed through our new modeling tools, through two different intermediate converter programs, had been loaded up as a complete database, and been rendered through a fairly complex scene hierarchy, fully textured and lit (though there were no lights, so the triangle came out looking black). The black triangle demonstrated that the foundation was finally complete; the core of a fairly complex system was completed, and we were now ready to put it to work doing cool stuff. By the end of the day, we had complete models on the screen, manipulating them with the controllers. Within a week, we had an environment to move the model through.


When I build large systems with lots of moving pieces, I figure out the black triangle analog for that specific problem. I ask myself questions like:
  • What is the smallest amount of work I would need to do that validates that the core pieces of the architecture will solve the problem? 
  • Is there a way to order the work I am doing, so that the most risky and unknown parts are validated first? 
  • Are there parts of the system that I can omit in the first version, but add very easily later? 

All this sounds very abstract, so let's look at a concrete example from one of my recent projects, Sloth.

The core purpose of Sloth is to simulate failure conditions in order to verify large and complex distributed systems. We started with latency as the first failure condition to simulate. That led me to think that the first thing I needed to do was figure out a way to inject slowness between a back-end system and the clients that talk to it.


My company had a homegrown RPC framework that was widely used by all teams. I considered adding support for injecting slowness through code changes in that framework. Doing that would have made for a pretty solid MVP, but it would have restricted the scope to that one service framework. It wouldn't have applied to other use cases, like adding latency between an application and a database, or to other backends like RabbitMQ or Kafka that are not microservices.


The first black triangle lesson here is to invalidate early. It is okay to discard approaches that seem reasonable to try, but wouldn't let you prove the whole idea. So, I didn't spend time trying to change the RPC framework to support latency injection.

After looking around some more and talking to some co-workers, I learned about tc. tc is an arcane Linux command for traffic control that lets you manipulate network packets, for example by adding packet loss or latency as they leave a network interface. Pretty much all the backend systems we used, like databases, queues (RabbitMQ or Kafka), HTTP services, and RPC frameworks, are built on top of TCP for remote communication. It became evident that building something on top of tc would work, and that I would be able to show it working for all sorts of backends.

The first thing I did, before writing any real code, was to play around with tc on the command line. I used tc to add latency to various ports on my machine. After that, I wrote some bash scripts around tc that parameterized the port, network interface, and latency value. I used existing services that were easy to spin up, like memcached or MongoDB. I started these services on my machine, and then used the shell scripts to add latency.

Through the first set of shell scripts, I could add different amounts of latency to outbound traffic on different ports associated with the services I started up. I also wrote scripts to remove latency rules that were added. This was important to do because we wanted to be able to reverse any latency rules added without needing restarts of the clients or servers.
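
To make the idea concrete, here is a minimal sketch of what such parameterized scripts might generate, written in Python rather than bash. It is not the actual Sloth code. The function names are hypothetical, and the specific recipe (a prio qdisc, a netem delay on band 1:3, and a u32 filter on the destination port) is an assumption based on a commonly used tc pattern. By default the sketch only prints the commands; actually running them needs root on a Linux machine.

```python
import shlex
import subprocess


def add_latency_commands(interface, port, delay_ms):
    """Build tc commands that delay outbound packets destined for `port`.

    Hypothetical helper: the qdisc layout below is one common recipe,
    not necessarily the one the original scripts used.
    """
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if delay_ms <= 0:
        raise ValueError(f"delay must be positive: {delay_ms}")
    return [
        # Root prio qdisc: creates bands 1:1, 1:2 and 1:3 to attach to.
        ["tc", "qdisc", "add", "dev", interface, "root", "handle", "1:", "prio"],
        # netem qdisc on band 1:3 adds a fixed delay to packets in that band.
        ["tc", "qdisc", "add", "dev", interface, "parent", "1:3", "handle", "30:",
         "netem", "delay", f"{delay_ms}ms"],
        # u32 filter: steer packets with this destination port into band 1:3.
        ["tc", "filter", "add", "dev", interface, "protocol", "ip", "parent", "1:0",
         "prio", "3", "u32", "match", "ip", "dport", str(port), "0xffff",
         "flowid", "1:3"],
    ]


def remove_latency_commands(interface):
    """Build the tc command that deletes the root qdisc, undoing all rules."""
    return [["tc", "qdisc", "del", "dev", interface, "root"]]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False (needs root)."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Example: 200 ms of latency for memcached's default port on loopback.
    run(add_latency_commands("lo", 11211, 200))
    run(remove_latency_commands("lo"))
```

Deleting the root qdisc removes every rule on that interface in one step, which is what makes the change reversible without restarting clients or servers. One caveat with this recipe: with the default prio settings, some unrelated traffic may also land in band 1:3 and pick up the delay, so a real tool would likely need a more careful layout.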

Those few shell scripts were the first full realization of the black triangle for Sloth. I didn't have a fully working system yet, but I had validated the hardest part of the architecture first. From that point on, it took about two weeks to get to an MVP. I spent that time on details like:
  • Making a daemon in Go
  • Storing rule configuration in Consul
  • Calling out to various tc commands from the daemon to add or remove latency rules
  • Error handling and recovery
  • Adding a REST API

The next time you have to build a new complex system, think about the black triangle. What is the equivalent of the black triangle for your problem?






