Collaborative Editing via OT (GitHub)

TLDR

Implementing realtime collaborative editing with operational transforms (OT) was fun. However, the difficulty of the solution is pushing me towards learning about alternate solutions like CRDTs and WOOT.

I'm excited about applying realtime collaboration to domains other than text, so the relative complexity of OT is a serious detractor.

Why operational transforms?

Operational transform (OT) is one of the core technologies enabling realtime collaborative text editing in Google Docs. It is the most well-known and implemented solution for realtime collaboration.

In addition to enabling smooth real-time interaction, OT makes it possible for apps to seamlessly operate in high-latency or even offline situations. Actions that a user takes while the app is offline can be optimistically applied and later synchronized with the server -- all with the guarantee of eventual consistency.

Perhaps the most exciting benefit is the fact that OT is essentially a version control system. Continuous integration and tools like git have revolutionized the software industry. Realtime collaborative systems can bring those same kinds of benefits to other domains.

For a deep description of how operational transforms work, check out Daniel Spiewak's description of the algorithm.

Difficulties

"Unfortunately, implementing OT sucks. The algorithms are really hard and time consuming to implement correctly. Wave took 2 years to write and if we rewrote it today, it would take almost as long to write a second time."

Joseph Gentle, a developer on Google Wave

While OT isn't prohibitively difficult to implement for a simple text system it can get complicated fast: you have to model your domain in terms of operations (i.e. insert or delete) and describe in very precise terms how each kind of operation affects related operations. Implementation is further complicated by the fact that ordering matters so much.

Other algorithms like Conflict Free Replicated Data Type (CRDT) and WithOut Operational Transform (WOOT) aim to alleviate these complications while retaining the overall benefit of collaborative realtime systems. CRDTs rely on operations which are necessarily commutative, dramatically simplifying the ordering of operations. Another method involves using a position map to describe how the positions in a document are shifted between states.

Implementation

js/ot/text_operations.js

In OT, a document is represented as a series of operations (i.e. insert text here, delete text there).

When conflicting operations arise, each operation is transformed in a method similar to git merge: a new operation is created which can be applied directly to a client's state without that client needing to change its history.

In my implementation (and in most practical implementations), the algorithm is centralized: the server dictates which operations should take precedence during a conflict. While OT intends to be a distributed algorithm, it is much easier to implement/understand when centralized.

js/ot/orchestrator.js

The most complicated part of OT (and the most difficult part to test) is the orchestration/communication between the clients and servers. What should they communicate and how much state does each need to track?

Part of the difficulty is that the client and server do not have full knowledge of each other; it would incredibly expensive for the server to track the history & state of each client. In addition, the clients/servers do not share the same history. Because OT acts like a git merge rather than a git rebase, operation histories are never re-written and thus can diverge wildly.

In general, the solution here is to make everything as easy as possible for the server. The client optimistically applies local operations and buffers them. It then flushes that buffer (to the server) only after it's been synchronized with the state of the server. This way, the client only ever sends operations which are easy for the server to understand. For a more complete explanation, Daniel Spedwak's write-up is fantastic .

All this, combined with the strictly in-order nature of operations, leads to a complicated, finicky system. Luckily, once this portion of the algorithm is written, it's very generic.

export type Client<O,S> = {
  uid: SiteUid,
  state: S,

  // The client ops not yet sent to the server.
  buffer: ?ChildedOperation<O>,

  // The client op which has been sent to the server, but not yet ACKd.
  prebuffer: ?ParentedOperation<O>,

  // The prebuffer + buffer form a 'bridge' between the server state
  // and the client state.

  requestQueue: Array<ServerRequest<O>>,

  ...
}

export type Server<O,S> = {
  uid: SiteUid,

  state: S,
  history: Array<FullOperation<O>>
}

js/ot/entry.js

This demo runs entirely in the browser: the network is simulated. Requests from the clients & servers are manually broadcasted (with random delay) with simple function calls. Simulating the system in this way makes the demo & testing dramatically easier.

async function asyncBroadcastServerRequest (serverRequest: ServerRequest<O>) {
  // take the request from the server, send it to each client
  let clientPromises: Promise<*>[] = clients.map(async (client) => {
    await asyncDelay()

    // take the responses from each client, send them back to the server
    let clientResponse = orchestrator.clientProcessRemoteRequest(client, serverRequest)
    for (let clientResponse of clientResponse) {
      await asyncDelay()
      await asyncBroadcastClientResponse(clientResponse)
    }
  })

  await Promise.all(clientPromises)
}

Porting this simulated system onto an actual node.js server would be relatively simple.

PS: I'm looking for a new job! I like collaborative systems, (typed) Javascript, interaction/UI, and graphics. Check out my portfolio and send me an email if you're hiring! (kenrick dot rilee at gmail)