Spatial interfaces for conversations and software

18.37, Thursday 23 Jul 2020

Zoom* is pretty good for 5 people because it works as a single conversation, this being the canonical conversation group size with associated psycho-physical limits. And it’s pretty good at 150 because it works like a presentation. But it’s pretty poor for 25.

* I reckon I’ll start using zoom as a generic for all group video calls, doing double-duty noun and verb, like hoover for vacuum cleaner/cleaning.

So what about 25 people? I’m excited about this new software MakeSpace because it tackles that problem in a fundamental way. As a participant, you place yourself on a 2D canvas, and then the sound is spatialised: if you’re near someone, you’re loud to each other; you get quieter when you’re further away. This allows for multiple simultaneous conversations and moving between them.
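Here is a rough sketch of what "spatialised" means for one pair of people. MakeSpace doesn't say what falloff curve it uses, so this assumes a linear fade between two made-up radii, `NEAR` and `FAR` (both hypothetical, in canvas units). Because distance is symmetric, you're exactly as loud to someone as they are to you.

```python
import math

# Hypothetical radii, in canvas units: full volume up close, silence far away.
NEAR = 50.0
FAR = 400.0


def gain(listener: tuple[float, float], speaker: tuple[float, float],
         near: float = NEAR, far: float = FAR) -> float:
    """Volume multiplier (0.0 to 1.0) for a speaker as heard by a listener on a 2D canvas."""
    d = math.dist(listener, speaker)
    if d <= near:
        return 1.0
    if d >= far:
        return 0.0
    # Linear fade between the two radii.
    return 1.0 - (d - near) / (far - near)
```

In a real app, each listener would mix every other participant's audio, scaled by `gain(...)`. A linear fade is the simplest choice. An inverse-distance curve sounds more like physical sound, but it never quite reaches zero, which is why a hard cut-off at `FAR` is handy for letting separate conversations actually separate.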

MakeSpace also has some other powerful primitives, like:

  • being able to drag documents and web browsers onto the canvas, so you can collaborate by gathering around an object.
  • rooms are simply boxes drawn on the Flatland that keep the sound in, which allows for labelling different rooms for different conversations, like a real-world school or office.
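One way the rooms rule might work, sketched under assumptions rather than taken from MakeSpace: rooms are axis-aligned rectangles (the `Room` class and its fields are hypothetical), and two people can hear each other only if they're in the same room, or both out on the open canvas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Room:
    """A hypothetical rectangular room drawn on the canvas, from (x0, y0) to (x1, y1)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, point: tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def room_of(point: tuple[float, float], rooms: list[Room]) -> Optional[Room]:
    """The first room containing the point, or None if it's out on the open canvas."""
    for room in rooms:
        if room.contains(point):
            return room
    return None


def can_hear(listener: tuple[float, float], speaker: tuple[float, float],
             rooms: list[Room]) -> bool:
    """Walls keep the sound in: audible only when both people share a room,
    or are both outside every room."""
    return room_of(listener, rooms) is room_of(speaker, rooms)
```

To combine this with distance-based volume, multiply the two: zero across a wall, the usual falloff within a room. Overlapping rooms are resolved by taking the first match, which is an arbitrary choice.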

The website has a ton of examples, clearly illustrated.

MakeSpace isn’t yet open for general access. But if you want to give spatial interfaces a go as a way to socialise, both Online Town and Rambly are video chat webapps modeled on top-down old-school computer games with spatial sound.

And don’t forget spreadsheet parties, as previously discussed.

Back in August 2019, John Palmer wrote an illustrated review + concept paper on this topic: Spatial Interfaces. It is smart, idea RICH, and worth digging into:

Suppose I work at a company and I want to find out, “Who is everyone at my company meeting with right now?” With only Google Calendar at my disposal, this task is a nightmare.

Now try to answer the same question with [2D virtual office software]. “Who is everyone at my company meeting with right now?” All of a sudden, it’s extremely easy. You just look at the rooms.
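Part of why it's so easy: in a 2D office, "where is everyone" is a single piece of shared state, so the question reduces to grouping people by room. A toy sketch, with made-up names and rooms:

```python
from collections import defaultdict


def who_is_with_whom(locations: dict[str, str]) -> dict[str, set[str]]:
    """Invert 'person -> room they're standing in' into 'room -> who's in it'."""
    occupants = defaultdict(set)
    for person, room in locations.items():
        occupants[room].add(person)
    return dict(occupants)


# Hypothetical snapshot of the office right now.
locations = {"ana": "Design review", "ben": "Design review",
             "cal": "Kitchen", "dee": "Design review"}
meetings = who_is_with_whom(locations)
```

With only calendars, you'd have to fetch every person's schedule and match up whichever events happen to overlap the current time, and that still wouldn't capture unscheduled chats.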

Palmer’s follow-up piece, Spatial Software, in April 2020 has a ton of examples of real software. I’m especially intrigued by the spatial metaphor not as a way to socialise, but as a way of hacking memory and psychology.

Nototo is a spatial note-taking app. It lets you build an ever-expanding, topographical map containing your notes and writing. The app is designed this way to take advantage of another aspect of spatial interfaces: our brains remember spaces better than raw information. In this regard, Nototo is like a software manifestation of a memory palace.

The physical world is baked deep into human cognition. It always amazes me that passing through doorways genuinely causes a memory lapse – but it shouldn’t amaze me because of course it does: Entering or exiting through a doorway serves as an ‘event boundary’ in the mind, which separates episodes of activity and files them away.

Which is only natural. Of course your brain wants to be fully prepared to absorb new information when you enter a new context, so it flushes everything that came before.

And I mean, why not take advantage of our hard-wired physics of information to make software easier to use?

ANALOGY FOLLOWS +++ Our brains are similarly hard-wired to assume light comes from above, which is why shadows “underneath” cause 2D shaded shapes to pop (see #20 Fool Yourself into Seeing 3D in Mind Hacks). And this is why Susan Kare’s 3D button design in Windows 3.0 - in 1990! using only 16 colours! - was such genius. You don’t need to learn that’s a thing you can push. It just looks like a thing you can push!
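For the curious, here is the general trick behind that kind of raised button, sketched as code. This is the standard bevel technique, not a reconstruction of Kare's actual pixels, and the colour names are placeholders. Edges facing the imagined light get a highlight, edges facing away get a shadow, and swapping the two makes the button look pressed in. The light is assumed to come from the top-left, the convention classic Windows controls used.

```python
# Placeholder colour names (hypothetical), standing in for palette entries.
HIGHLIGHT = "white"
FACE = "light grey"
SHADOW = "dark grey"


def bevel_pixel(x: int, y: int, width: int, height: int, pressed: bool = False) -> str:
    """Colour of pixel (x, y) on a width x height button with a one-pixel bevel.

    Light is assumed to come from the top-left, so the top and left edges catch
    it and the bottom and right edges fall into shadow. Pressing the button
    swaps the two, so it reads as pushed in rather than raised.
    """
    lit, unlit = (SHADOW, HIGHLIGHT) if pressed else (HIGHLIGHT, SHADOW)
    if x == width - 1 or y == height - 1:  # bottom and right edges face away from the light
        return unlit
    if x == 0 or y == 0:  # top and left edges face the light
        return lit
    return FACE


def render(width: int, height: int, pressed: bool = False) -> str:
    """Draw the button as text: '#' highlight, '.' face, '@' shadow."""
    glyph = {HIGHLIGHT: "#", FACE: ".", SHADOW: "@"}
    return "\n".join(
        "".join(glyph[bevel_pixel(x, y, width, height, pressed)] for x in range(width))
        for y in range(height)
    )
```

Nothing in the pixels says "3D". The brain's light-from-above assumption does all the work.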

So yeah. Dunno. Still thinking about space as an interface metaphor.


I’m intrigued by personal spatial interfaces, like that note-taking app, but I’m not convinced. I’d like to try it.

I don’t think I’d enjoy organising my notes on a map. I’m a highly associative thinker, but that doesn’t seem to me to happen visually. I mean – thinking hard appears to make use of my visual system: when I’m thinking hard about how to organise an essay, for example, I can’t see what’s right in front of my eyes, so the two processes must be rivals for the same underlying grey matter. But the subjective experience of it isn’t visual.

Generally: I don’t see pictures behind my eyes unless I’m trying hard to imagine something, in the same way that I don’t have an internal monologue unless I’m planning how to write a sentence. So I would find an on-screen map-like organisation of my notes to be an interruption to my thinking somehow.

BUT, I do seem to think spatially in at least some ways. When I’m writing a talk, I pull a half dozen books from my shelves and stack them next to me on the table or sofa. I might never consult them, but the proximity creates a kind of gravity of thought somehow? Maybe a self-imposed psychic style transfer?

I guess my equiv for this in software is the way I paste loads of notes into the bottom of a doc before I start writing at the top? Proximity again. Abstract spatiality.

Mad Hatter:

Back to having multiple simultaneous conversations, and picking up again on audio – this time only barely spatialised.

I remember running across a paper in 2003 about a prototype which did this automatically for telephone conference calls. In the following, “floor” is the jargon for a conversational group.

In face-to-face interactions in such social groups, conversational floors change frequently, e.g., two participants split off to form a new conversational floor, a participant moves from one conversational floor to another, etc. To date, audio spaces have provided little support for such dynamic regroupings of participants, either requiring that the participants explicitly specify with whom they wish to talk or simply presenting all participants as though they are in a single floor. By contrast, the audio space described here monitors participant behavior to identify conversational floors as they emerge.

The Mad Hatter system monitors the speech of all participants:

  • if people are turn-taking, it assumes they’re in conversation, and mutually ups the volume;
  • if people are speaking over each other, it assumes they’re not in conversation with each other, and mutually cuts the volume to 20%;
  • it updates this assessment every 30 milliseconds.
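Here's a toy version of those three rules, to make the mechanism concrete. It is emphatically not the paper's method. It assumes each person's voice activity arrives as one True/False value per 30 ms frame, counts "handovers" (one person stops, another starts within a short gap) against stretches of overlapping speech, and picks whichever dominates. The gap length, the decision rule and the function names are all made up for illustration.

```python
from itertools import combinations

FRAME_MS = 30        # from the post: the assessment is refreshed every 30 ms
LOUD_GAIN = 1.0      # hypothetical: full volume for inferred conversation partners
QUIET_GAIN = 0.2     # from the post: cut to 20% for people talking over each other
MAX_GAP_FRAMES = 10  # hypothetical: a handover within ~300 ms counts as turn-taking


def handovers(a: list[bool], b: list[bool], max_gap: int = MAX_GAP_FRAMES) -> int:
    """Count turn-taking events: one person stops and the other starts soon after."""
    count = 0
    for first, second in ((a, b), (b, a)):
        for t in range(1, len(first)):
            if first[t - 1] and not first[t]:  # `first` has just stopped talking
                for u in range(t, min(t + max_gap, len(second))):
                    if second[u] and not second[u - 1]:  # `second` starts afresh
                        count += 1
                        break
    return count


def overlap_onsets(a: list[bool], b: list[bool]) -> int:
    """Count separate stretches where both people are talking at once."""
    both = [x and y for x, y in zip(a, b)]
    return sum(1 for t, on in enumerate(both) if on and (t == 0 or not both[t - 1]))


def pair_gain(a: list[bool], b: list[bool], previous: float) -> float:
    """Hypothetical decision rule: whichever kind of evidence dominates wins."""
    turns, overlaps = handovers(a, b), overlap_onsets(a, b)
    if turns > overlaps:
        return LOUD_GAIN
    if overlaps > turns:
        return QUIET_GAIN
    return previous  # no clear evidence either way: leave the volume alone


def update_gains(activity: dict[str, list[bool]],
                 gains: dict[frozenset[str], float]) -> dict[frozenset[str], float]:
    """Re-assess every pair; intended to run once per FRAME_MS on a sliding window."""
    new_gains = {}
    for p, q in combinations(sorted(activity), 2):
        pair = frozenset((p, q))
        new_gains[pair] = pair_gain(activity[p], activity[q], gains.get(pair, LOUD_GAIN))
    return new_gains
```

Leaving the volume alone when the evidence is balanced gives a bit of hysteresis, so a single ambiguous moment doesn't flip someone in or out of a conversation.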

The result is that multiple conversations can occur in parallel, and participants can move between them, on the exact same audio-only telephone conference call.
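The rules above are pairwise, but what the paper describes is whole floors forming and splitting. One plausible way to get from one to the other is to assume that being in conversation is transitive (if A is talking with B, and B with C, all three share a floor) and run union-find over the pairs judged to be turn-taking. This grouping step is an illustrative assumption, not a description of how Mad Hatter is implemented, and `floors` and `floor_gain` are hypothetical names.

```python
def floors(people: list[str], partners: list[tuple[str, str]]) -> list[set[str]]:
    """Group people into floors: connected components of the 'talking with' relation."""
    parent = {p: p for p in people}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps the trees shallow
            p = parent[p]
        return p

    for p, q in partners:
        parent[find(p)] = find(q)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for p in people:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=sorted)


def floor_gain(listener: str, speaker: str, groups: list[set[str]], quiet: float = 0.2) -> float:
    """Full volume within your own floor, attenuated for everyone else."""
    same_floor = any(listener in g and speaker in g for g in groups)
    return 1.0 if same_floor else quiet
```

Someone who stays silent ends up on a floor of their own, hearing everyone at the quieter level until they start taking turns with somebody.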

It would be intriguing to revisit this work in light of the popularity of group video calls in 2020.

Here’s the paper:

Paul M. Aoki, et al. The mad hatter’s cocktail party: a social mobile audio space supporting multiple simultaneous conversations. CHI ‘03: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 425-432 (2003)

If you enjoyed this post, please consider sharing it by email or on social media. Thanks, Matt.