On conversational UIs
17.02, Tuesday 16 Jun 2015
When trumpets were mellow
And every gal only had one fellow
No need to remember when
‘Cause everything old is new again
Stand back folks. I’ve not spent any time editing and now I’m going out. This is stream of consciousness, and it’s long.
There’s that bit in the great article on Chinese mobile UI trends about how there are no websites, there’s just messaging. And not only that, some weird mish-mash of talking robots and customer service people:
Many institutions that otherwise would have native apps or mobile sites have opted instead for official accounts. You can send any kind of message (text, image, voice, etc), and they’ll reply, either in an automated fashion or by routing it to a human somewhere. The interface is exactly the same as for chatting with your friends
You know, and why the hell not. I have one language to use with apps (pointing, tapping, swiping) and another with my friends (chatting). Why not chat with my apps too?
So as Benedict Evans - mobile and technology analyst extraordinaire - points out, messaging is the new app platform:
[In WeChat, in China] You can send money, order a cab, book a restaurant or track and manage an ecommerce order, all within one social app. So, like the web, you don’t need to install new apps to access these services, but, unlike the web, they can also use push and messaging and social to spread.
The other piece of the puzzle here, Evans continues, is the smartphone notifications panel:
That pull-down panel aggregates activity from everything on your phone, and Google and Apple have made notifications actionable and given them payloads. … More and more, one’s primary interaction with any app, social messaging or otherwise, is a little pop-up with a button or two.
So I’ve long been interested in the idea that “next actions” should float away from their apps and come together in a single place… SNAP was my 2008 take on this.
But I guess the 2015 twist is that everything old is new again, and we’re dealing not just with actionable notifications, but robot-generated text that we can have an actual conversation with.
Which is Twitter’s fault.
Now imagine it wasn’t just an activity feed, but you could talk back.
A big bit of the current excitement is the rise of Slack for workplace comms and its embrace of bots. Which takes us to Ben Brown’s insanely incredible insight: What happens when you start automating workplace processes?
What if there was a meeting runner bot that automatically sent out an agenda to all attendees before the meeting, then collected, collated and delivered updates to team members? It could make meetings shorter and more productive by reducing the time needed to bring everyone up to speed.
We’ve just been through an era where management has been regarded as the essential scarce resource of a business, and operations and technology are functions to be outsourced to fungible workers like so many cogs. But what if the core business resource is human ingenuity, and it’s management that can be turned into software… automated and optimised?
Digit is an automated savings tool:
Every few days, Digit checks your spending habits and removes a few dollars from your checking account if you can afford it.
The kicker: You communicate with it via text message (“Great, I’ve moved $10.00 to digit”), and they have no plans for an app. And what’s interesting to me is that it has adaptive behaviour… and maybe because of the text message interface, this Digit review semi-anthropomorphises the software:
At first, Digit was really cautious with my money … But over the next couple weeks, as my balance recovered from holiday spending, it got a bit more ambitious
Software isn’t “cautious” or “ambitious”, those are qualities of living beings. But maybe it serves us to think so.
Walkadoo is a walking game that encourages activity; you communicate with it by text message. Related: Autom is a robot weight-loss coach with big blue eyes. You lose more weight because you regard it as having a mind.
Rhombus is an e-commerce platform for shops to chat with customers by text message… and also accept payments, within the conversation. Related: Twitter’s in-feed Buy Now button which is a game changer if widely rolled out.
One of the problems with text interfaces is text entry. Keyboards suck, especially on mobile devices. Typing also introduces a discoverability question: How do you know what words are valid right now, or the right grammar to use? How do you make complex statements?
In the game Lifeline (iPhone/Apple Watch) you’re texting an astronaut called Taylor who is marooned on a moon. It works in real-time… when Taylor hikes to a location an hour away, you won’t hear from her till she gets there. You text back, but it’s not free text entry, you only get two options at a time. Works well on the smartwatch too.
Despite the constrained responses, it still feels conversational. Enough that the first time I killed Taylor - she froze to death on advice I’d given - well, ooof.
Lifeline was prototyped in Twine, the visual programming language for writing interactive fiction. See also Inform 7 where the code-behind-the-fiction reads like a book but every word has its code-meaning too, like casting a spell in a stupid universe, like talking to a golem. e.g.:
The dark doorway is an open door. “The doorway looks ordinary enough, but it’s so difficult to be sure with the unaided eye.” The dark doorway is scenery. The dark doorway is not openable. The dark doorway is west of Longwall Street and east of Turret Roundhouse. The dark doorway is trapped.
Another take on text input:
The web-based game A Dark Room (also available for iPhone) is astounding. Half text adventure, half point-and-click. There’s something about clicking “stoke fire” and the words turning into a progress bar while the fire burns down… The communication of the element of time.
Hangkeys is a neat hack – it’s a custom iPhone keyboard that makes it super easy to play Hangman over SMS, Whatsapp, or other text message services.
Meet by Sunrise is a custom smartphone keyboard that integrates with your calendar: Instead of typing times and locations, you tap them instead.
Matt Galligan imagines tap-able buttons in text messages:
What if instead of installing an app, we might instead allow a service to chat with us via iMessage?
Writing code… Swift allows emoji for variable names. There’s something interesting about this. Variable names like “a” or “theCounter” or “dimensions” are meaningful… but what about underlying feelings they carry? The “ii” counter of a tight inner loop always has a zing for me, it’s the twang of high-tension power lines.
So can emoji carry more meaning, or meaning along a different axis? What could we use this for – instead of “houseAddress” just have a picture of a house; instead of saying “errorDescription” just use the smiling pile of poo emoji.
I was noodling with conversational UIs in 2002, back when AIM was a thing. What preoccupied me then - and what interests me most now - is how to make an automated conversation with a bot not boring, and (more importantly) not shit.
So one of the things I found before: The more the bot acts like a human, the more it will be treated like a human. And the more that happens, the more likely the bot will have to say “I don’t know what you mean”… which is lame.
Our guiding light is The Jack Principles from the 1995 quiz show video game You Don’t Know Jack. In short, how to direct the conversation so a user will never be in a position to ask a question the machine can’t answer.
(Tom Armitage collected a link to this and more when he ran a chatbot project at Berg back in 2010.)
Something else surprised me about authoring conversations, something almost contradictory…
- Leading a user through a conversation by the hand is great… once. It’s like a phone tree: a branching tree of questions and multiple choice answers where nothing can go wrong. But if the user wants to change their answer, compare different possibilities, or run through the conversation a second time: It’s tedious and frustrating.
- But the alternative is a wide open space: The user can say anything, and it’s up to the bot to interpret and respond. However that introduces a discoverability problem. You can be chatting to your bot about what’s on TV tonight, completely unaware that it also knows what movies are showing nearby. Or let’s say you do ask about movies, and then you say “well how do I get to the theatre?” and suddenly it’s dumb again. Using Siri is often like that.
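A rough sketch of the phone-tree style, in Python. The questions and node names here are invented for illustration, not taken from any real bot. It shows why the tree is safe (every answer is checked against a fixed list) and also why it gets tedious: comparing two channels means walking the whole path from the top twice.

```python
# Hypothetical phone tree: each node is a question plus its allowed answers.
# Each answer names the next node; "end" means the conversation is finished.
TREE = {
    "start": ("TV or movies?", {"tv": "tv", "movies": "movies"}),
    "tv": ("Which channel?", {"bbc1": "end", "channel4": "end"}),
    "movies": ("Tonight or tomorrow?", {"tonight": "end", "tomorrow": "end"}),
}


def walk(answers):
    """Follow the answers from the start node and return the prompts shown."""
    node, prompts = "start", []
    for answer in answers:
        question, options = TREE[node]
        prompts.append(question)
        if answer not in options:
            # Nothing can go wrong: the user is simply steered back to the menu.
            prompts.append("Please choose one of: " + ", ".join(options))
            return prompts
        node = options[answer]
        if node == "end":
            break
    return prompts
```

Calling `walk(["tv", "bbc1"])` and then `walk(["tv", "channel4"])` asks “TV or movies?” both times, which is the frustration in miniature.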
I guess what I’m asking is how does a user have a theory of mind about a bot - a conception of its stance, intentions, domain of knowledge, etc - and how is that communicated.
My take back in the day was to organise knowledge into domains, within which a tree structure would be possible but avoided. To summarise how it worked:
- I built an AIM bot called FingertipTV. You could traverse it like a tree: “now” (reply: “a list of channels and what’s on now”), “bbc1” (reply: now and next for BBC1), “9pm” (reply: what’s on BBC1 at 9pm and the following show)
- Resolution varied. “now” would result in a list of every channel but just the show titles (and you could hit a number to get the full description). Getting more precise with the channel and “9pm” would show a full show description.
- Once you’d learnt the vocabulary, everything allowed shortcuts: The user could gradually become virtuoso. So the previous exchange could be replaced with “bbc1 9pm”
- There was a limited amount of history. So saying “bbc1” then “later” would show what was on BBC1 later… but saying “bbc1”, “9pm”, “channel4” would show Channel 4 right now. The 9pm itself was forgotten. Use any excuse to limit the memory of the bot, because that constrains what the user has to hold in mind
- I added functionality to search movie listings… but encapsulated that in another AIM bot called Cinebot. If you asked about movies to FingertipTV, you’d get the reply “I can’t help you but my friend can,” and Cinebot would reach out and start a new conversation. Now you’d met Cinebot, and could add their name to your buddy list. Small, understandable, almost-stateless bots that talk to you and to each other
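To make the list above concrete, here is a minimal Python sketch of the same behaviour. It is a reconstruction under assumptions, not the original FingertipTV code: the listings, the fixed “now” of 8pm, the class name and the fallback messages are all made up. The only thing the bot remembers between messages is the last channel mentioned; the time is forgotten every turn.

```python
# Made-up listings and clock, purely for illustration.
SLOTS = ["8pm", "9pm", "10pm"]
NOW = "8pm"  # assumed current slot (the first one, so "later" always exists)
SCHEDULE = {
    "bbc1": {"8pm": "Show A", "9pm": "Show B", "10pm": "Show C"},
    "channel4": {"8pm": "Show D", "9pm": "Show E", "10pm": "Show F"},
}


class FingertipTV:
    def __init__(self):
        self.channel = None  # the bot's entire memory

    def reply(self, message):
        words = message.lower().split()
        if "movies" in words:
            # Out of domain: hand over to the (hypothetical) Cinebot.
            return "I can't help you but my friend can"
        if words == ["now"]:
            # Low resolution: every channel, titles only.
            return "; ".join(f"{ch}: {shows[NOW]}" for ch, shows in SCHEDULE.items())
        time = NOW  # reset on every message, so "9pm" is never remembered
        for word in words:
            if word in SCHEDULE:
                self.channel = word
            elif word in SLOTS:
                time = word
            elif word == "later":
                time = SLOTS[SLOTS.index(NOW) + 1]
            else:
                return "Try: now, a channel name, a time, or later"
        if self.channel is None:
            return "Which channel?"
        # Higher resolution: the show at that time and the one after it.
        start = SLOTS.index(time)
        shows = SCHEDULE[self.channel]
        return f"{self.channel}: " + " then ".join(
            f"{t} {shows[t]}" for t in SLOTS[start:start + 2]
        )
```

Saying “bbc1” then “9pm” gives the same answer as the shortcut “bbc1 9pm”, and following up with “channel4” drops back to what is on now, because only the channel survives between messages.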
My point, I guess, is that a new medium needs a new grammar and conversational UIs are definitely a new medium.
For one – they’re intrinsically social. If I’m chatting with a bot in iMessage about what movies are on nearby, shouldn’t I be able to turn that into a group chat with my partner? And does the bot conduct two separate conversations, one with each of us, or assume we’re both searching for the same movie?
We’ll need app frameworks to help author these bots, and the frameworks will make assumptions about how conversations work in small groups.
And you know what, we’re still in the PC era, the era of personal computing. We don’t really know how to use computers in small groups, how to use interfaces collaboratively.
Another big question for me is what happens when we have many of these bots…
How do we traverse multiple knowledge domains, discovering features and adapting how we speak as we go? Is it going to be like FingertipTV and Cinebot, loosely coupled domains of differing knowledge and vocabulary? Or more like Siri where the same voice can tell you everything from “directions home” to “how long do dogs live” (to pick two examples that it’s giving me right now)?
Maybe it will be like Twitter, where everyone I follow is in a single stream – but I miss what they’re saying? Or like my apps on my phone, where many apps have their own activity stream… and they fight so hard to get heard that they constantly spam my notifications panel?
My head goes somewhere quite speculative, and that’s text adventures.
Dan’s story, Text Only:
What now? >
You go North
The batteries on your Discman are almost depleted.
Or Julian Dibbell’s memoir My Tiny Life, a tale of LambdaMOO, a collaboratively built text environment inhabited by objects, bots, and people.
There’s a strength in spatialising information – of arranging these bots - these domains of knowledge and differing patterns of interaction - into a web, or on a map. Some bots are closer to other bots.
So part of me wonders… what if I saw the activity feed from my smart home all together in one place, and then when I went north, say, that would be the activity feed from my social networks. Or instead, in my smart home I’d find my TV, and inside that we could have a chat about what’s being tivo’d tonight.
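Taken at face value, that could be as simple as a text-adventure map where each place holds a feed and some exits. A toy Python sketch, with place names, feed items and directions all invented for the example:

```python
# Hypothetical map: each place has a feed and named exits to neighbouring places.
PLACES = {
    "smart home": {
        "feed": ["Heating switched on"],
        "exits": {"north": "social networks", "tv": "tv"},
    },
    "social networks": {
        "feed": ["A friend posted a photo"],
        "exits": {"south": "smart home"},
    },
    "tv": {
        "feed": ["Recording tonight: a film"],
        "exits": {"out": "smart home"},
    },
}


def go(place, direction):
    """Return the neighbouring place, or stay put if there is no such exit."""
    return PLACES[place]["exits"].get(direction, place)


def look(place):
    """Describe a place: its activity feed and where you can go from here."""
    feed = "; ".join(PLACES[place]["feed"])
    exits = ", ".join(PLACES[place]["exits"])
    return f"{place}: {feed} (exits: {exits})"
```

Here `go("smart home", "north")` lands in the social networks feed, and `look("tv")` is where the chat about tonight’s recordings would start.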
I don’t want to be too literal. But maybe we need an architecture to arrange all these bots and feeds and conversations, etc. And while our experience of the conversations will vary (we’ll be friends with different bots) the architecture will be shared: Arbitrary, but shared. A cyberspace:
A consensual hallucination experienced daily by billions of legitimate operators … [a] representation of data abstracted from banks of every computer in the human system.
And don’t throw the past away
You might need it some other rainy day
Dreams can come true again
When everything old is new again
When everything old is new again
I might fall in love with you again