The latest beta of iOS 13 came out, and there’s a feature called FaceTime Attention Correction which, on video calls, silently manipulates the image of your face so that you’re looking the other person directly in the eye. At first blush that sounded cool to me (eye contact is good! Maybe?) but on further thought it made me do a weird face.
(Currently the camera and the screen are slightly offset, so even when you’re looking at the picture of the other person, the camera sees your eyes as looking slightly away — and so they see you as looking slightly away.)
So I tweeted about the new feature with some hyperbole:
Whoa. iOS 13 will ARTIFICIALLY RE-POINT YOUR EYEBALLS in video calls so you're looking right at the other person instead of where you're actually looking, which is the screen. Hey Apple, so long as we're doing this, how about fixing my hair and maybe also the bags under my eyes
Which is how you have to talk now to get RTs. (As at this moment: 140 retweets and 383 likes.)
Some responses and my thoughts follow:
This is kind of amazing and I think it is really well done, but as an autistic person I also find it discomfiting. One of the reasons I like video calls is that there is no expectation to meet the other person’s eye.
This was one of several responses from an autistic perspective, and the concern really resonates with me. Phones have become pretty much mandatory at this point to participate in society, and for them to subtly prefer a particular model of self — that’s all kinds of problematic.
I very much do not want to live in a world which discriminates against or erases different ways of being.
From an autistic perspective, this is just a whole deeply visceral world of “nope”.
Please do not edit my online communication style to make it more neurotypical, I already have to do that enough in meatspace, thanks
Consent is another issue: sure, “Attention Correction” is a setting you can turn on and off, but if everyone turns it on, will leaving it off really be an option?
And what about the consent of the other? Is there an icon to show that they’re speaking with an “attention corrected” person, or one that has their hair computationally styled, or their voice enhanced to sound more persuasive? Etc.
You know Zoom has a pretty filter? Your skin will look dewy fresh.
(Zoom being the business world’s new hotness in terms of video conferencing. Which is fair because it’s great.)
It’s important to remember that Attention Correction exists on a spectrum of image correction. But the Zoom pretty filter came as a surprise to me — I’m pretty sure I knew about it once, but it hadn’t seemed important enough to remember.
So perhaps what’s happened is that I had mentally categorised video calls as a whole as “unmediated”, and Attention Correction is reminding me that they are very much mediated (more fool me for forgetting, I guess), and that we will have to develop personal skills and social norms to tell authentic and inauthentic apart?
We’ve gone through this process in, for example, email: “real” emails are text only, from our friends, don’t have a sig. “Unreal” emails use placeholder names, sales-y language, graphics, have an unsubscribe footer, etc. Our expectations for “real” include polite correspondence, turn-taking, no hidden agenda, for example. When these categories are violated, such as the recent fuss regarding the highly funded Superhuman email client which includes hidden tracking images, i.e. applying standard “unreal” email norms to “real” email conversations, outrage results.
We have similar tells — some enforced by regulation, and some that we develop through critical thinking — with TV. There’s a difference between programming and adverts, for example. In programming, there’s a difference between fiction and reality TV. And even with reality TV, we have language to discuss and understand exactly how real it is. What’s that phrase? Structured reality.
So from this perspective, maybe what Attention Correction represents is that this kind of mediation of realtime video is inevitable, and what we need is enough cues and tells and shared language to build up our categorisation instincts.
Prediction: within 3 years you won't even need the camera to make video calls.
Enough training to match my intonation to the expressions of my Memoji, and yes — all the pieces are there.
In case you’re interested, gaze correction has been a long-running project for Microsoft Research, e.g. link
I hadn’t seen the particular research Tom points out, but because of my digging around Glancing back in the day, I have folders full of papers about computers and gaze…
One paper that particularly comes to mind is Ishii and Kobayashi (1992), ClearBoard: A Seamless Medium for Shared Drawing and Conversation with Eye Contact, CHI ’92. In this work, two collaborators were linked over a shared screen and a video conference. The video call was presented, translucent, overlaid on the shared desktop screen and applications, and reflected.
The result is that you can see where the other person is looking on the desktop, and they can see where you’re looking too: that is, when they look at a picture on their version of the shared desktop, their gaze on your desktop points at the very same picture. And in the study, this greatly improved the collaborators’ ability to work together.
From the paper:
The importance of eye-contact is often discussed in the design of face-to-face communication tools. However, we believe the concept of gaze awareness is more generalized and is a more important notion. Gaze awareness lets a user know what the partner is looking at, the user’s face or anything else on the shared workspace. If the partner is looking at you, you can know it. If the partner is gazing at an object in the shared workspace, you can know what the object is. Eye contact can be seen as just a special case of gaze awareness.
We think the notion of gaze awareness will be an important goal of the next generation of shared drawing tools. It can not be easily obtained in conventional meeting environments, and only CSCW [Computer Supported Cooperative Work] technology can provide it.
There is a ton of research into the gaze from the time, and — like the term CSCW itself — we’ve lost momentum bringing this into the user interface. We’re still in the era of the Personal Computer. The “collaborative” aspect of computing remains (to me) only a thin veneer on the PC. And the challenges we face in the future will only be met by working together…
It’s not just work, it’s all kinds of communication. In real world groups, gaze is used to request priority or give way. Visibility of the gaze of others directs group attention (another recently under-studied area).
So I’m excited because it feels like we’re opening up collaboration, gaze awareness, and group attention once more.
However: I’m uncomfortable with the re-writing of the gaze as performed by the Attention Correction feature. I would feel considerably happier about it if there was a camera behind the screen so the result was meaningful gaze awareness without the post-truth undermining of “real” video.
Despite my discomfort… the possibilities of eye contact in video! I would love to see a simplified reimplementation of ClearBoard from that paper, only using FaceTime. For example, could two people have a shared space as if we were both drawing on the glass window of the screen? This would work incredibly well on the screen of the iPad.
Or… Could we make a translucent FaceTime call, to allow for gaze awareness, and overlay on it a Google Doc, so we could discuss paragraphs with the non-verbal cues of the gaze, and avoid stepping on each other’s toes with those multiple edit cursors by watching each other’s eyes? Would collaboration be more effective? I bet it would. Apple, Google, give me a call…
Unpopular opinion: Every little hack like this is getting us a bit closer to the (long-predicted, now largely derided) Death of Distance - which will have enormously positive effects on the economy and society when it finally happens.
Positive statements like this were relatively rare in the responses to my tweet. And while I share the sentiment… the implementation and context give me equal concern.
This feels like a nope. Why should my phone decide where I should be looking? Auto-correct for facial expressions is a whole new weird world of darkness. (And maybe where the animoji training data has been going?)
Oh ... I mean, is this actually deep fake as a product?
Quite a few people (men FWIW) have replied to this to say it doesn’t seem creepy to them, but the first rule of “is this creepy?” is not “Does this seem creepy to ME?” but “Does this seem creepy to someone with less power or status or more vulnerability than me?”
Rachel Coldicutt’s response sums it up for me.
Auto-correct for facial expressions is Attention Correction in a nutshell. Not only because auto-correct has both positive and negative consequences, but also because, in this case, an idea of “correctness” in face-to-face communication is invented, and the idea that there is or should be “correctness” here is something I would push back on very strongly.
Coldicutt’s final point, which is to bring in power, is the most important point in all of this: looking through the lens of power is where discussion of this feature should begin and end.
And so my question is this:
since the category of “unreal” (deep fake, fictional, mediated) video is here to stay, and only going to grow, and knowing that gaze awareness is important and, yes, something that should be available to design with; listening to the many concerns and always sensitive to the dynamics of power and vulnerability; how could Apple present this Attention Correction feature differently today (it may be nothing more than displaying an icon on the receiving end) in order to help us develop the best cues and social norms to not only minimise damage, but to best position us for an inclusive, collaborative, technology-positive future?