Screencast: Riak Client Multi-node Connections

The Riak client for Ruby (riak-client) was released a few weeks ago and it includes some really useful features for working with Riak from your Ruby applications.

This second screencast demonstrates the multi-node or “cluster” connection feature in the client, and the effect that has on performance and reliability. Co-starring in this video is Riak Control, Riak 1.1’s new web-based administration tool.

Riak Ruby Client 1.0: Multi-node connections from Sean Cribbs on Vimeo.

Screencast: Riak Client Serializers

The Riak client for Ruby (riak-client) was released a few weeks ago and it includes some really useful features for working with Riak from your Ruby applications. Here’s my first screencast about those features, which describes how to use custom serializers. Enjoy! (Watch on Vimeo for the best experience.)

Riak Ruby Client 1.0: Serializers from Sean Cribbs on Vimeo.
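
For readers who prefer text to video, here is a minimal sketch of the idea behind custom serializers: a registry maps a content type to any object that responds to dump and load. The SerializerRegistry module, the TagListSerializer and the content type below are hypothetical stand-ins written for illustration; this is plain Ruby, not the riak-client API itself.

```ruby
# Toy registry, assumed for illustration only (not riak-client's own code).
module SerializerRegistry
  @serializers = {}

  class << self
    # Register any object that responds to #dump and #load.
    def []=(content_type, serializer)
      @serializers[content_type] = serializer
    end

    def [](content_type)
      @serializers.fetch(content_type) do
        raise ArgumentError, "no serializer for #{content_type}"
      end
    end

    def serialize(content_type, object)
      self[content_type].dump(object)
    end

    def deserialize(content_type, string)
      self[content_type].load(string)
    end
  end
end

# Hypothetical serializer: stores a list of tags as one comma-separated line.
module TagListSerializer
  def self.dump(tags)
    tags.join(",")
  end

  def self.load(string)
    string.split(",")
  end
end

SerializerRegistry["application/x-my-content-type"] = TagListSerializer

raw  = SerializerRegistry.serialize("application/x-my-content-type", %w[riak ruby])
tags = SerializerRegistry.deserialize("application/x-my-content-type", raw)
```

The point of the decoupling is that the object holding the data never needs to know how a given content type is encoded; it only looks up whatever is registered for that type.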

Webmachine vs. Grape

Back in December, I gave my Resources, For Real This Time talk for the third time, this time at NYC.rb. After the talk, I got into a very emphatic discussion with Daniel Doubrovkine and John “JJB” Bachir about the differences between Webmachine’s approach and Grape’s approach and their relative strengths. Daniel followed it up with an interesting blog post titled Grape vs. Webmachine. I’ve had some time to think it all over and so I figured it was about time I wrote a response.

Daniel poses the question “Should you build your next RESTful API with Grape or Webmachine?” Before I address his question (and the inherent assumptions therein), I want to tell you a bit more about Webmachine and why it is fundamentally different from the prevailing approaches.

Protocols are contracts

If you Google “define: protocol”, two definitions appear:

  1. The official procedure governing affairs of state or diplomatic occasions.
  2. The established code of procedure or behavior in any group, organization, or situation.

Merriam-Webster gives some additionally detailed definitions:

  1. a code prescribing strict adherence to correct etiquette and precedence (as in diplomatic exchange and in the military services) <a breach of protocol>
  2. a set of conventions governing the treatment and especially the formatting of data in an electronic communications system <network protocols>

Another way of saying this is that protocols are contracts, or conventional manners of speech and behavior. To violate that contract is to be misunderstood or, worse, to offend or cause unintended actions. Granted, computer protocols may have lesser social consequences than social protocols, but if we don’t speak them properly, our programs won’t work.

Protocols are FSMs

The classical way to implement a protocol participant (that is, a client, server or peer) is a finite state machine (FSM). Why? Protocols are usually defined in terms of “in this situation, do that” or “react to this condition by doing that”. Many of those assertions are dependent on one another, meaning that they are not even relevant if other assertions have not been made previously. To illustrate this better, imagine the protocol of two heads of state meeting. Their meeting might go through these steps:

  1. Arrive at the same location.
  2. Shake hands and introduce other participants.
  3. Enter the meeting space.
  4. Negotiate an issue.
  5. Leave the meeting space.
  6. Arrive and speak at the press conference.
  7. Shake hands again.
  8. Depart the press conference.

First, this is a discrete set of steps that must be followed in the order given. It wouldn’t make much sense to negotiate the issue (which might have its own internal protocol) before you shake hands and enter the meeting space, or to discuss the negotiations at the press conference before you’ve done any negotiation. Second, if one part of the protocol fails, other steps in the protocol may never occur! Imagine that upon arrival, the other head of state refuses to shake your hand or even look at you; you might abort the meeting altogether.

Like protocols, in finite state machines, there are also discrete steps (states), and conditions that allow transition from one state to another. A transition may lead to another internal state, or an end state in which processing is terminated. Finite state machines are the essential way to implement protocols.
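
To make that concrete, here is a minimal sketch of the heads-of-state meeting as a finite state machine. The state and event names are my own invention for this example; the only point is that each state accepts a limited set of events, and that an end state stops all further processing.

```ruby
# Each state maps an allowed event to the next state.
# :aborted and :departed are end states.
TRANSITIONS = {
  arrived:          { handshake: :introduced, snub: :aborted },
  introduced:       { enter_room: :in_meeting },
  in_meeting:       { negotiate: :negotiated },
  negotiated:       { leave_room: :left_meeting },
  left_meeting:     { speak_to_press: :press_conference },
  press_conference: { handshake: :farewell },
  farewell:         { depart: :departed }
}.freeze

END_STATES = [:aborted, :departed].freeze

# Feed events to the machine one at a time and reject any event the
# current state does not allow.
def run_meeting(events, state = :arrived)
  events.each do |event|
    break if END_STATES.include?(state)
    state = TRANSITIONS.fetch(state).fetch(event) do
      raise ArgumentError, "#{event} is not allowed in state #{state}"
    end
  end
  state
end

happy_path = [:handshake, :enter_room, :negotiate, :leave_room,
              :speak_to_press, :handshake, :depart]

completed = run_meeting(happy_path)            # every step in order
snubbed   = run_meeting([:snub, :enter_room])  # aborted, later steps never run
```

Trying to negotiate before shaking hands, as in run_meeting([:negotiate]), raises an error, because the arrived state has no transition for that event.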

An interesting side effect of this coherence between protocol and FSM is that they are duals of each other. The FSM is an implementation of the protocol, and the protocol’s states and assertions can be derived from the FSM. It’s the kind of thing that researchers interested in provability and mathematical formulations of software get really excited about.

So what does this have to do with Webmachine and Grape?

HTTP happens to be a protocol with a simple syntax but very rich semantic possibilities. If your application “misspeaks” HTTP, it might still be partially understood (the syntax may still be grasped), but the other party might miss out on some crucial subtlety your application wants to convey or might take an unexpected or undesirable action as a result.

Despite HTTP’s flexibility (laxness?), it’s still important to speak the protocol as fluently as possible. Building a better Web is just as much about the brick and mortar (the HTTP protocol) as the paint and trim (“Web Standards” in the browser).

Webmachine tries to do just that. Its core is an FSM of the server side of HTTP. The end states are response status codes (e.g. 200 OK or 404 Not Found). The transition conditions come from the “MAY”, “MUST”, “SHOULD” language in the HTTP/1.1 RFC 2616 as well as the less formal aspects of the specification. The FSM determines which transitions to take based on facts about the request and facts about the resource being requested. Because the FSM is a dual of the HTTP protocol, we at Basho have taken to calling Webmachine “an executable model of HTTP.”

This is where Webmachine fundamentally differs from Grape and other existing frameworks:

  • It implements an FSM that is a dual of the protocol, not an ever-varying stack of middleware.
  • It focuses on determining facts about the resource, not performing actions.

This is what I mean when I say that Webmachine is declarative (functional?) rather than imperative. By being declarative and focusing on the facts about your resource rather than “what do I do when I get a request”, a whole lot of complex and error-prone aspects of the protocol are hidden from the developer, and more importantly, done in a deterministic way every time.

In contrast, Grape and most other Rack-based frameworks encourage you to (perhaps unwittingly) redefine HTTP semantics for every application. In my opinion, this is not just error-prone, it is wasteful. Why should you have to define what GET means every time? You want to focus on the resources your application exposes, not implementing the protocol all over again. This is why Webmachine encapsulates those decisions (FSM!) and includes sensible defaults so that you only have to focus on the decisions and behaviors (transitions!) that your resources need to modify. You focus on what your resources are, rather than what they do.
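
Here is a rough sketch of what “facts with sensible defaults” looks like. ToyResource, ArticleResource and respond_to_request are hypothetical names made up for this example, the callback names are only modeled loosely on Webmachine’s, and the decision sequence is drastically shorter than the real one.

```ruby
# Hypothetical base class: every fact about a resource has a default.
class ToyResource
  def service_available?; true; end
  def allowed_methods;    %w[GET HEAD]; end
  def authorized?;        true; end
  def resource_exists?;   true; end
end

# The "framework" side: a fixed sequence of decisions, written once
# and shared by every resource.
def respond_to_request(resource, http_method)
  return 503 unless resource.service_available?
  return 405 unless resource.allowed_methods.include?(http_method)
  return 401 unless resource.authorized?
  return 404 unless resource.resource_exists?
  200
end

# An application resource overrides only the facts that differ.
class ArticleResource < ToyResource
  ARTICLES = { "1" => "Protocols are contracts" }.freeze

  def initialize(id)
    @id = id
  end

  def resource_exists?
    ARTICLES.key?(@id)
  end
end

found       = respond_to_request(ArticleResource.new("1"), "GET")
missing     = respond_to_request(ArticleResource.new("2"), "GET")
not_allowed = respond_to_request(ArticleResource.new("1"), "DELETE")
```

Notice that ArticleResource never mentions a status code. It states one fact, and the shared decision sequence works out that a missing article means 404 and an unsupported method means 405.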

REST, For Real This Time

Daniel is by no means the only or greatest offender, but I take strong objection to his use of “REST”. He says,

Grape is a DSL for RESTful APIs.

Simply exposing your service over HTTP and not treating it like RPC is not sufficient to be called “RESTful”; you must also satisfy the “Hypermedia Constraint”. Daniel admits

…you have to be disciplined about those API methods - they should represent resources, not RPC service endpoints.

…but does not address Hypermedia. I could go into great detail about why the typical HTTP-based API is not REST, but that has been done by some really great people who have said it much better: Roy Fielding, Jon Moore and Nick Sutterer. Do check out their presentations and blogs.

A note on “DSLs”

Rubyists, we have a fetish for so-called “DSLs”. It’s time for an intervention.

In reality, what we call DSLs in Ruby tend to be thin wrappers around the fluent-builder pattern with a dash of instance_eval and class_eval to remove block arguments and necessary uses of self. (One lightning talk at RubyConf humorously called gratuitous use of the pattern “Playskool MyFirstDSL”.) Grape, and its elder cousin Sinatra, follow this pattern. On the surface, it seems to promote clean, concise, readable code. But at what cost? What complexity is hidden? Does it actually help you write better code, faster and more reliably, or are you in the end working around the DSL to do what you want?
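
To show what I mean, here is a toy version of the pattern. RouteBuilder, routes_with_argument, routes_dsl and Caller are hypothetical names invented for this sketch; they are not taken from Grape or Sinatra.

```ruby
# Hypothetical route builder, written only to show the pattern.
class RouteBuilder
  attr_reader :routes

  def initialize
    @routes = []
  end

  def get(path)
    @routes << [:get, path]
    self # returning self is what makes the builder "fluent"
  end

  def post(path)
    @routes << [:post, path]
    self
  end
end

# Plain builder: the block receives the builder explicitly.
def routes_with_argument
  builder = RouteBuilder.new
  yield builder
  builder.routes
end

# "DSL" flavor: instance_eval swaps self, so the block needs no argument.
def routes_dsl(&block)
  builder = RouteBuilder.new
  builder.instance_eval(&block)
  builder.routes
end

explicit = routes_with_argument { |r| r.get("/users").post("/users") }
implicit = routes_dsl do
  get "/users"
  post "/users"
end

# The hidden cost: inside the DSL block, self is the builder, so the
# caller's instance variables are no longer visible.
class Caller
  def initialize
    @prefix = "/api"
  end

  def build
    routes_dsl { get "#{@prefix}/users" }
  end
end

surprise = Caller.new.build
```

Both styles produce the same list of routes. In Caller#build, however, @prefix is looked up on the builder rather than on the Caller, so it is nil and the route silently comes out as "/users" instead of "/api/users".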

So this is where I take big issue with Daniel’s argument:

I would grant Grape an advantage over favoring the API consumer, since it focuses on the expressiveness of the API.

That warm fuzzy the developer gets when writing an application with Grape is not correlated to the experience of the consumer of the API. It is indeed a strength that Grape can generate API consumer documentation from the code, but as Moore and Sutterer demonstrate, a truly RESTful service is mostly self-documenting.

Maybe it’s the fact that Webmachine(-Ruby) is a fairly faithful port of the original Erlang version, but when authoring it I felt disillusioned with metaprogramming magic. Instead of including a module and executing some class methods to decorate your Resource class, you use simple inheritance and override methods. Internally, modules only exist as namespaces and to separate functional concerns of the same class (see Webmachine::Decision::Conneg or Webmachine::Resource::Callbacks); they are never used to decorate or modify the behavior of the class they are included in. Webmachine::Decision::FSM uses a loop to walk the decision graph, where individual state methods either return a Symbol for the next state or a Fixnum that is the response status code.
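
The shape of that loop is simple enough to sketch in a few lines. ToyFlow and its three states are hypothetical and far smaller than the real decision graph, and the sketch checks for Integer, which is what Fixnum was folded into as of Ruby 2.4.

```ruby
# Hypothetical, heavily simplified decision flow. State methods return
# a Symbol (the next state) or an Integer (the final HTTP status code).
class ToyFlow
  def initialize(facts)
    @facts = facts
  end

  def run
    state = :check_available
    # Keep calling state methods until one of them yields a status code.
    state = send(state) while state.is_a?(Symbol)
    state
  end

  private

  def check_available
    @facts.fetch(:available, true) ? :check_exists : 503
  end

  def check_exists
    @facts.fetch(:exists, true) ? :check_modified : 404
  end

  def check_modified
    @facts.fetch(:modified, true) ? 200 : 304
  end
end

ok           = ToyFlow.new({}).run
not_found    = ToyFlow.new({ exists: false }).run
not_modified = ToyFlow.new({ modified: false }).run
```

There is no metaprogramming here beyond send: each state is an ordinary method, and the graph is whatever those methods return.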

That said, others have been working on higher level abstractions on top of Webmachine, ones that include “DSLs”. Whether they will provide more value or simplicity over the existing abstractions Webmachine provides has yet to shake out.

So which should you use?

I think if I were still doing web APIs via Rails or Sinatra, Grape would be an extremely attractive alternative to those, having a lower barrier to entry than Webmachine. It’s a great library and very well written. For an application that exposes very simple semantics, the amount of code you need to write in Grape is small, and you don’t need to have any awareness or understanding of Webmachine’s decision flow, and you can get consumer documentation nearly for free.

On the other hand, I have been just as productive in Webmachine (both Ruby and Erlang) and now that I think more in terms of resources instead of actions, it feels more natural. I want to be able to add those extra semantics just by declaring a few methods, without worrying as much about whether I did it right. I want to avoid the cross-cutting, double-blind mentality of the middleware pattern promoted by Rack.

What next?

What Webmachine has done for the server side, I think we can also do for the client side and for intermediaries (which act as both clients and servers). We can encapsulate the client side of HTTP into an FSM and expose its decisions in a clean way to applications. We can build client and server-side libraries that make working with Hypermedia APIs simpler (Nick’s Roar project is a good start).

MongoDB and Riak, In Context (and an apology)

There has been quite a bit of furor and excitement on the Internet this week regarding some very public criticisms (and defenses) of MongoDB and its creators, 10gen. Unfortunately, a ghost from my recent past also resurfaced as a result. Let me begin by apologizing to 10gen and its engineers for what I said at JSConf, and then I will reframe my comments in a more constructive form.

Mea culpa. It’s far easier in our industry to set up and knock down strawmen, as I did, than to convey messages of objective and constructive criticism. It’s also too easy, when you are passionate about what you believe in, to ignore the feelings and efforts of others, which I did. I have great respect for the engineers I have met from 10gen, Mathias Stern and Kyle Banker. They are friendly, approachable, helpful and fun to socialize with at conferences. Thanks for being stand-up guys.

Also, whether we like it or not, these kinds of public embarrassments have rippling effects across the whole NoSQL ecosystem. While Basho has tried to distance itself from other players in the NoSQL field, we cannot deny our origins, and the ecosystem as a “thing” is only about 3 years old. Are developers, technical managers and CTOs more wary of new database technologies as a result of these embarrassments? Probably. Should we continue to work hard to develop and promote alternative data-storage solutions? Absolutely.

Making it constructive

For better or worse, many people consider MongoDB and Riak to be competitors. In reality, there are very few similarities between the products. Then why are they in competition? I personally believe this is because we have largely targeted our products at the same group of developers, those who work on web applications. So let’s take a moment and clarify the primary differences — both for understanding the technologies themselves and for unmuddying the current hoopla.

If I were asked why someone would use MongoDB, there are two clear reasons in my mind:

  1. MongoDB is fast. Say what you will about its durability (the context of my comment from JSConf) and the global write-lock (a consequence of its design, unfortunately), both writes and reads tend to be of low latency. Why? They are mostly in memory (via mmap).
  2. MongoDB has very friendly APIs for developers. This is its biggest strength in my mind. Despite other things you would want to address before going to production, developers love to think of their data as lightly-structured documents. It just makes sense.

In contrast, Riak’s strengths appeal more to operations folk, and developers who are cognizant or experienced in production operations:

  1. Riak is distributed and replicated at its core. There are no special nodes or services to run to scale out; every node you start and join acts as an equal member of the cluster.
  2. Riak has a strong focus on availability and durability in the face of failure. It will gladly sacrifice raw speed and consistency for the sake of staying available to your write load and making sure your writes get to disk.

These differences are fundamental design decisions and have associated trade-offs. Because MongoDB’s design focus is to be a fast single-system database, other elements of its scale-out story (sharding, replica sets, etc.) are necessarily more complex. Because Riak’s focus is on distributed fault-tolerance and reliability, it necessarily sacrifices raw single-system performance. That’s not to say that MongoDB can’t scale out to large clusters well, or that Riak performs poorly in production; it is simply a recognition of the sacrifices necessary when designing a database system that addresses specific needs.

Could Urban Airship have used Riak instead of MongoDB for their bounded, in-memory dataset? Maybe. Would it have worked better for them than MongoDB? That is really difficult to tell.

Bringing it back around

Now, if I’m so buddy-buddy with the 10gen guys, why did I say such an inflammatory thing in the first place? At Basho, we spend a decent amount of time evaluating and comparing other technologies so that we can understand where we stand in the market, to learn from others’ perspectives, and to address the concerns and demands of potential customers. Naturally, this means we have examined MongoDB closely. MongoDB’s visibility, popularity, and developer-friendliness are things to be respected, even if we criticize the engineering decisions made by 10gen.

Shortly before JSConf, I had personally spent some time finding ways to demonstrate that MongoDB will lose writes in the face of failure, to be used in a competitive comparison. Let’s just say that I was successful in doing so, despite recent improvements that 10gen has made. Unfortunately, I am not at liberty to share the results, nor do I think it would be constructive to this discussion. I’m sure 10gen has its own collection of competitive comparisons that are designed to shed a positive light on their product in contrast to Riak; it’s just how business works.

We also both know our systems’ weaknesses and are working hard to fix them. 10gen’s most recent releases have demonstrated this fact, as I believe Basho’s recent releases have as well. (Have you tried out Riak 1.0? It’s awesome.)

So what now?

The honeymoon phase of NoSQL is over. Will 10gen make the hard decisions it needs to make MongoDB easier to scale out and more durable, while maintaining its reputation for snappy performance? I believe they will. Will Basho improve Riak’s developer-friendliness and raw performance, while maintaining its reputation for simplicity and reliability in operations? I have no doubt.

So instead of gloating over each other’s failures, let’s toast to the challenges and all become stronger, more proficient, and more successful as a result.

Ripple Hackathon - Day 3

I started writing this last night, but just got time to publish it.

The Ripple Hackathon has finished! Today felt even more on-focus than yesterday, but also less frantic/stressed. We were in “the zone” for sure. Here’s what we accomplished today:

  • Duff finished off the “stored key” associations, including incorporating conflict resolution on them! (I’ll be writing up a discussion of the new conflict resolution API later.) He also fixed a small inconsistency in an internal API around associations.
  • Nathaniel teased apart some specs that were invoking save or create on Document models but weren’t in the integration suite, and he also lent me a lot of help on improving the build on Travis.
  • Myron implemented an awesome decoupled serialization API for Riak::RObject. You can now register your own serializers for Ruby objects that you assign to the data on an RObject. So if you want to implement a custom serialization routine for application/x-my-content-type, you can do it now.
  • Kyle continued work on refactoring the Riak::Client code so that you can specify multiple hosts among which to multiplex requests at runtime. This will include configurable load-balancing strategies (starting with a default round-robin strategy).
  • I reviewed a number of works-in-progress from the other developers but spent most of my time bringing our Travis build toward green. There are still some outstanding runtime/build environment/worker-related issues on Travis, but in the last few builds we’ve had at least one Ruby version passing in the build.
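
To give a feel for the multi-host work, here is a minimal sketch of a round-robin strategy. RoundRobinPool, its methods and the IP addresses are hypothetical and are not the code that landed in Riak::Client; the retry-on-failure behavior in with_host is my own assumption about what a client would want, added for illustration.

```ruby
# Hypothetical sketch; class and method names are not from riak-client.
class RoundRobinPool
  def initialize(hosts)
    raise ArgumentError, "need at least one host" if hosts.empty?
    @hosts = hosts
    @next_index = 0
  end

  # Hand out hosts in rotation.
  def next_host
    host = @hosts[@next_index]
    @next_index = (@next_index + 1) % @hosts.size
    host
  end

  # Try the block against each host in turn, moving on when one fails.
  # Assumption: only I/O and system-call errors count as host failures.
  def with_host
    last_error = nil
    @hosts.size.times do
      host = next_host
      begin
        return yield(host)
      rescue IOError, SystemCallError => e
        last_error = e
      end
    end
    raise last_error
  end
end

pool   = RoundRobinPool.new(%w[10.0.0.1 10.0.0.2 10.0.0.3])
picked = 4.times.map { pool.next_host }
```

Calling pool.with_host { |host| ... } runs the block against the next host in the rotation and falls through to the following one if the block raises a connection error, re-raising only after every host has been tried once.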

Retrospective

This was a phenomenal week; I can tell because we kept a kanban on one of the whiteboards and the list in the “Done” column grew quite long in the three days (or you can just look at the commit list and changes). More than just the things completed, I believe we developed a rapport and understanding that can only be built in person. We also have a clearer way forward, and some momentum behind key features that have been waiting to be started.

Thanks to all the committers who attended, and a very special thanks to the people at Basho who helped make this happen:

  • Mark wrote the proposal and finances for the event, handled ordering lunch each day, and took notes and pictures of the event.
  • Maureen, our office manager, handled booking hotels and other necessary logistics on very short notice.
  • Marisa, our VP Finance, ran the numbers faster than should be humanly possible to meet our accelerated timeframe.
  • John and Justin let me take 3 days off the normal schedule to come to San Francisco and run this.
  • Tony made everyone feel welcome and arranged an awesome dinner on Tuesday.