Morgan Ames
Distributed Mentor Project 2003

Research Journal

... wherein I document the progress I make on my project, and write about nifty talks and school-related things. (Entries are in chronological order, with newest posts at the bottom.)

  • 6/4: a plan
  • 6/6: research methods
  • 6/11: politics
  • 6/17: snags
  • 6/24: Phoenix
  • 7/2: lull
  • 7/12: poster
  • 7/22: polished
  • 7/28: testing
  • 8/1: next

  • 6/4: a plan

    After much indecision in the last week as I struggled to choose between the half-dozen or so interesting projects that A.J. suggested, I've decided to do a study on remote usability this summer. It was between that and an intriguing but nebulous study on perceptions of simulated urban environments, which may have depended too much on outside sources. Anyway, today I wrote up an ambitious timeline for myself, as part of a "research plan":

    Timeline (as of June 4):

    Soon: logistics
      1. make sure this is covered under a human subjects protocol already, or get one in
      2. find out who I'll be testing on
      3. fill out this document more: interview questions, the study, data analysis, research questions (esp. for UrbanSim interface)
      4. decide on compensation for participants

    3-4 weeks, as interface is coded: prepare for study
      1. skim (at least) the rest of the related papers
      2. write "related work" section of paper
      3. write out consent forms, interview questions (as surveys?), protocol cues for studies, etc.
      4. do mock-up or roleplaying pilots to polish user study process and questions for interviews
      5. contact potential participants
      6. do other tasks: transcribe for A.J., maybe help Tasha with traffic light study, maybe code for interface, etc.

    1 week: run pilots
      1. schedule tests with participants
      2. run a few pilots with UrbanSim developers
      3. study results of pilots, adjust the study and questions as needed from these results

    1-2 weeks: run study
      1. run introductions and first round of tests with participants
      2. do preliminary write-up of data
      3. run second round of tests and conclusions with participants
      4. do preliminary write-up of data
      5. write "description" section of paper (describing experiment, participants, etc.)

    1-2 weeks: analyze results
      1. do transcriptions as needed
      2. collate interview results
      3. correlate between "hard" data and interview results
      4. write "results" section of paper

    Rest of summer:
      1. schedule tests with remote-only participants (if time)
      2. run tests with remote-only participants (if time)
      3. analyze results of remote-only tests (if time), add to "description" and "results" sections above
      4. write "discussion" section of paper
      5. present results somewhere?

  • 6/6: research methods

    I had a long meeting with A.J. and Alan, a professor in CSE, yesterday, and slightly revised my timeline but left most of it intact. It'll be interesting to see how the incremental paper-writing approach goes. I haven't seen it done before - papers often seem to be written in a rush in the weeks or days before the deadline. Hopefully it'll discipline me and focus my research.

    I like A.J.'s approach of "working backwards" - thinking about the end goals of the research (what conferences? what contributions?), then mapping out how to get there. I also think it's fantastic to write a research plan in as much detail as possible, before the research starts. Mine is an impressive 8 pages (without related paper summaries) and covers:
  • Goals
  • High-level research questions
  • Motivation
  • Background research
  • Participants
  • Study design
  • Data analysis
  • Paper outline ideas
  • Timeline

    Today was full of talks, meetings, and parties. First, Michael Batty of University College London talked about modeling crowds - fractal growth, constrained random walks, pheromone trails, etc. - then presented a case study of the Notting Hill Carnival. Then I went to the HCI meeting, which was mostly spent discussing the direction of various UrbanSim projects, all of the different fields involved in UrbanSim, and whether the research projects counted as "basic" or "applied" in their respective fields. Lunch followed, and the debates continued. After that, I had a scant two hours to ponder research plans and revise my DMP site before the annual CS undergrad picnic, where I talked with other women in the program about summer brown-bags for women in CS and watched others play volleyball, frisbee, four-square, and croquet. As I was leaving, Batya invited me to an end-of-year Information School party, where I met her daughter and Melody Ivory's son. Fun stuff. Maybe on Monday I'll actually get some work done. :~)

    I'm excited to join the computing in developing countries reading seminar this summer! I've missed the Berkeley reading group I'm in. More about this later, I'm sure.

    It's great to see how other research groups interact. Every Tuesday I've been going to a "coding" meeting (in the Value-Sensitive Design sense, not the writing-source-code sense) for the Room with a View project, with Batya and Peter and various others. This group seems very "seasoned" and philosophical. Last meeting we discussed which distinctions in the data were most important psychologically, and ended up talking about Lakoff and Searle and all sorts of other stuff. And at the HCI meeting I mentioned above, they debated for a long time about which category the projects fall into - a discussion that would probably be met with impatience in other research groups I've interacted with.

  • 6/11: politics

    I've had to deal a lot with CS politics this week, while looking for a screen-sharing program for the remote usability studies - I got lots of practice writing diplomatic e-mails and such. :~)

    First, I was interested in NetRaker since I used it last fall for a survey, but NetRaker staff quoted me a $15,000 price tag for the whole system ... so I composed a carefully worded message for Alan to send to Prof. James Landay at Berkeley, one of the main people at NetRaker, explaining that we were a research group and couldn't afford the full package, and asking if we could just use the screen-sharing capability, and how much that would be. Alan added in "and if you want to work with us on the project, that'd be great!", which I had mixed feelings about - James is a bit intimidating. :~) But I figured it'd be good to get over that anyway. Well, the "academic" price of NetRaker is still $1000 for 6 months, and includes much more functionality than just the screen-sharing we were looking for. Moreover, if we used NetRaker we'd have to give NetRaker access to our data. So I investigated other screen-sharing programs, and wrote a message to James explaining why we decided to use Glance.net - at $20/month - instead.

    Meanwhile, the VP of Engineering at Glance.net wrote to me, saying he has connections with someone who works on UrbanSim, that he's excited we're using his product, and that he'd be happy to field questions or hear feedback on how it worked for us. So I wrote a thank-you e-mail back to him.

    Other than all of that, I've finished off the background reading for remote usability. There was one paper by a pair of guys that was repeated almost verbatim 5 times (with minor variations in title and form), and incorporated 2 more times into bigger papers! There's a name for this that escapes me - anyway, it's generally looked down upon, from what I've heard. :~) I just found it amusing that they took it to such extremes. I mean, two copies can be overlooked, three is questionable, but seven?

    Next tasks on the docket, which I can hopefully complete before this weekend (when David's visiting!), are writing up a "Related Work" section based on all of these papers, and an "Introduction" to my research, pretending that both are going into a paper. Then I'll write out interaction scripts and make consent forms for the pre-pilot usability studies I'll run next week. I'm still finalizing exactly how the study should be run. For example, should we have a local observer during the remote study, to see how much the participant fails to communicate (giving us another metric for comparison), or would that distort the interaction? And should the follow-up questions be given as an interview, allowing more open-ended feedback, or as a survey, making the results easy to collate (and avoiding transcription)? Hopefully the pre-pilot will iron these out, even though I'll have to run it on some arbitrary interface, since the UrbanSim one isn't coded yet.

  • 6/17: snags

    I've been a bit worried all weekend about a preliminary study, which I found while writing the "related work" section, that seems pretty close to what we want to do this summer. It never hurts to repeat a study - that's what science is all about - and it'd certainly be interesting if our results were different. But ... how close is too close? What aspects could we explore that they didn't, to make our study more compelling? This other study had only eight participants - four local, four remote - and measured the number of usability problems found in a web browser the participants hadn't used before, as well as "participant satisfaction" with the experiment. We can probably expand on that, but how? Maybe explore the trust issue, or a more explicit "think-aloud" method, or expert interfaces ... I discussed the issue with Alan, Peter, Batya, David, and Nathan over lunch today, and have some good ideas for how to frame the study in a way that will be "novel" enough. I'll have to discuss it more with A.J. when she returns on Wednesday.

    I'm also worried that I won't have much of an interface to test by the end of the summer. With the current timeline for development, the "maps" section of the interface, which is the part we'd like to test, won't be finished until July 21, leaving a scant two weeks for the bulk of the study before my summer is over. This is the usual problem of having to rely on external forces for progress. :~) Maybe with A.J. and Janet and Tasha and me and the other undergrads in the lab all coding, we can speed it up, but ... well, we'll see.

    Meanwhile, I've booked a flight to Phoenix, where I'll attend a conference workshop on remote usability with A.J. next Monday, and now I have to write the workshop position paper. My "introduction" and "related work" sections, and the conversations over lunch, will help, but I need to flesh out the study design part. I'm a little intimidated writing the position paper, but excited to go to the workshop - I've never been to a conference workshop before. A.J. mentioned that it may be interesting to question the workshop attendees on their experiences with remote usability, for another facet of the research.

  • 6/24: Phoenix

    I attended a workshop on remote usability methods with A.J. The workshop was part of the annual conference of the Usability Professionals' Association, this year titled "Ubiquitous Usability," and took place in Scottsdale, Arizona (a suburb of Phoenix). The goals of the workshop were to identify best practices and areas for further investigation (that's where we come in :~)) in remote usability. The workshop was divided into four sections, punctuated by coffee and lunch breaks: introductions, methods of setting up and running tests, pros and cons of remote usability, and summarizing the results of the previous three sessions. Ten usability professionals from as far away as Germany and Switzerland attended the workshop, including Jakob Nielsen. One participant was going to do a study similar to ours with business partners this summer, and may be interested in sharing data.

    The results of the workshop were encouraging - it seems that we are on the right track with our plans for the remote studies. Remote studies are good because you can reach more people more cheaply, you can overcome language barriers more easily by having a translator on the line, and the setup better matches real-world conditions. However, you do lack contextual information, you rely on good English and communication skills from the participant, there's more setup involved for the participant, and it's harder to recover from crashes. For open groups of users, there are the additional issues of trust, deception, privacy, security, and authentication. It was the general consensus that you can pay the participant less for remote studies, but you are buying less of their attention.

    The workshop participants gave lots of tips for doing remote usability studies. Telephone plus screen-sharing seemed to be the most popular and most reliable technique for doing synchronous remote studies without a remote satellite lab; the feeling was that it gives the most information for the least effort - and easy setup is important. Popular screen-sharing programs were WebEx and NetMeeting (with some complaints about the latter). Some shared their own desktop with the participant instead of vice versa, when data was sensitive, setup was difficult, and/or high-resolution graphics weren't required.

    All participants agreed that it's important to rehearse the study well and run pilots, and that it helps to have run similar local studies in the past, so you can proceed and recover gracefully. One workshop participant included a picture of himself with the online consent form, so study participants could put a face to the voice. It's important to set the participant's expectations explicitly, even more explicitly than is usually done with local studies, and to work to establish and maintain a feeling of rapport with chatting and such. Testing should last 1 to 1 1/2 hours at most. To control the flow of information to the participant, send tasks via email, chat, or URL links. After the study, give instructions for uninstalling any software that was used on the participant's computer, and make sure to send any reward yourself.
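
    The "send tasks one at a time" tip is easy to picture as code. Here's a toy sketch (made up for illustration - it's not the workshop's tool or anything we actually use): a tiny web server where each URL shows exactly one task, so the moderator controls the flow of information by sending links one at a time. The task text, port, and URLs are all invented.

        #!/usr/bin/env python3
        # Toy task pager: /0, /1, /2, ... each show exactly one task, so the
        # moderator controls what the participant sees by sending one link
        # at a time. Task text and port number are invented for the example.
        from http.server import BaseHTTPRequestHandler, HTTPServer

        TASKS = [
            "Install the application using the link we e-mailed you.",
            "Open the sample project and describe what you see.",
            "Find the online help and look up how to save your work.",
        ]

        class TaskPager(BaseHTTPRequestHandler):
            def do_GET(self):
                number = self.path.strip("/")
                if number.isdigit() and int(number) < len(TASKS):
                    index = int(number)
                    body = "Task %d of %d:\n\n%s" % (index + 1, len(TASKS), TASKS[index])
                else:
                    body = "No task here - wait for the moderator to send the next link."
                data = body.encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; charset=utf-8")
                self.send_header("Content-Length", str(len(data)))
                self.end_headers()
                self.wfile.write(data)

        if __name__ == "__main__":
            HTTPServer(("", 8000), TaskPager).serve_forever()  # send http://localhost:8000/0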

    All agreed that the field needs more validation - more studies that compare local and remote studies, of all types. There were many more specific questions and areas for investigation, such as how to test hardware remotely, how to do low-fidelity prototyping remotely, or how to identify deception from the study participants.

  • 7/2: lull

    After returning from Phoenix, I spent the last week making preparations for the remote studies - writing scripts and webpages to display tasks, email consent forms, and input surveys - but I don't feel like I've accomplished much. I think I'm in the research doldrums right now, where I'm not sure - or am worried and procrastinating - about what's coming next. I'm not sure what A.J. thinks about the project right now - worthwhile, questionable, going well, shaky? I should be better able to validate my work in my own mind, but as usual it's difficult! I was thinking about doing a pilot today, but the people I'd recruit were busy so it didn't work out. I'll see if I can get David to do one tomorrow, since he's flying in tonight. He's good at being brutally honest about my designs, for better or for worse! :~)

    Anind, my research advisor at Berkeley, is in town, and I talked with him this afternoon. It's too bad I'm in the doldrums right now - it would have been good for him to see me excited about my research, since during the last school year there were many periods of uncertainty when I couldn't envision where my research should be going ... so he's used to that. :~) But it was good to talk about my progress this summer and discuss possible projects next fall. We also decided to submit a poster on Healthy Cities, my research project last semester, to the UbiComp conference. He was interested in my comments about how Batya's and Peter's group was different from other research groups, and wanted me to give a rundown on value-sensitive design next fall. It'd be great to get some VSD going at Berkeley.

    I think the main difference between Batya's and Peter's group and other research groups I've seen is that Batya's and Peter's group has an overarching concern for the effects of technologies on people, and how technologies integrate with their "values." It seems that many computer scientists doing research in ubiquitous computing or in human-computer interaction in general like the whiz-bang aspects of the field and aren't as interested in the methodical study of people's interactions with (and perceptions of) technologies, beyond making the interface usable. I think I'd like to explore more of this in graduate school.

  • 7/12: poster

    Whew, I got the extended abstract for my UbiComp poster in yesterday! It's the first paper I've written mostly on my own, and it's so ... surprising, in a way, to have produced something that looks so professional, and so much like the papers I frequently study! I finished a first draft last Wednesday, heard back from my advisors yesterday afternoon, and spent several hours furiously editing and talking with my advisors over IM (unfortunately missing the sustainability and development meeting this week) before turning it in around 5pm. What an adrenaline rush ... I think it's not the way I'll generally want to finish paper edits, but sometimes necessity prevails.

    In other news, we're still waiting for the interface to be finished before scheduling user tests. The current estimate is July 21 or thereabouts. I've read a few more papers and have done lots of revising of the study documents, and next week I'll probably run some more pilots - I've done two, but they were on a different task, since I can't test UrbanSim yet - and polish the study method. Next Monday A.J., Alan, and I will meet with James Landay at last, to discuss the study, and hopefully we'll get to meet with Judy Ramey soon, also - she wrote a paper on the "think-aloud" protocol we'll be using, and has done lots of studies.

  • 7/22: polished

    The meeting with James last Tuesday went very well! It was nice to make the change from "James's student" to "fellow researcher," and to get over the intimidation factor associated with being a student. James gave us some good feedback on what to record, what to report, and how to make the study more credible. Last Friday we had another meeting for feedback - this time with Judy Ramey, where we discussed mostly how to do think-aloud and how our study might feed into one of Judy's projects. This morning we had another meeting with Judy, this time to practice our think-aloud protocol on her.

    I've tried to specify the protocol in as much detail as possible, down to what to use as "continuers" in the conversation ("mm hmm"), what to use as reminders and how long to wait before using them ("and now?" after 3 seconds of silence, "go ahead" after another 3 seconds), what to use as "more information" cues (e.g. the participant says, "That was weird," and I say, "Weird?"), and what to say if something catastrophic happens, like a system crash. Everything is so spelled out that I feel a bit like a computer program when I use think-aloud, working very methodically and checking my constraints before any action. In fact, everything about the study that can be scripted has been scripted - it helps maintain consistency from one session to the next.
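
    Fittingly, the cue rules are mechanical enough to write down as an actual little program. A toy sketch, purely for illustration - the real protocol is a paper script, not code, and the list of "vague" words is invented; only the phrases and timings come from my protocol:

        # Given what the participant just said (or None if they're silent) and
        # how long they've been silent, return what the moderator says next,
        # or None to stay quiet. Phrases and timings are from my protocol;
        # the VAGUE_WORDS list is invented for the example.
        VAGUE_WORDS = {"weird", "strange", "odd", "confusing"}

        def echo_cue(utterance):
            """Echo a vague word back as a question: "That was weird." -> "Weird?" """
            for word in utterance.lower().split():
                word = word.strip(".,!?")
                if word in VAGUE_WORDS:
                    return word.capitalize() + "?"
            return None

        def cue(utterance, seconds_silent):
            if utterance is None:            # silence: escalate the reminders
                if seconds_silent >= 6:      # another 3 seconds after the first reminder
                    return "go ahead"
                if seconds_silent >= 3:      # first reminder
                    return "and now?"
                return None
            return echo_cue(utterance) or "mm hmm"  # "more information" cue, else continuer

        # e.g. cue("That was weird.", 0) -> "Weird?"; cue(None, 4) -> "and now?"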

    Speaking of the study, our first participant comes tomorrow! We've had to change our plans for the study a bit: though UrbanSim is now installable and runnable (thanks to a "brownie challenge" A.J. put forth on Friday, which was met this afternoon), it still doesn't do any of the interesting things we wanted to test with urban planners, such as generating maps. So this first round of tests will be on the UrbanSim install and on software related to UrbanSim, and instead of using "real" urban planners, we've recruited computer science undergrads and grads and asked them in the task description to pretend that they're "technical interns" for an urban planner. So the participant coming tomorrow is a computer science student. We'll run tests on the real UrbanSim interface, with actual urban planners, in September, in order to do data analysis and writing by October 6, the CHI deadline. I may run the remote tests with Seattle and Utah folks from Berkeley (truly remote!), maybe with A.J. on the phone also, and will probably fly up to Seattle for a few days of intensive local tests.

    We've had to deal a bit with credibility in this part of the study, especially when the recruiting email meant just for CS grads and undergrads was sent around to other lists, with subject lines varying from the one I put - "UrbanSim Usability study - earn $20" - to "FREE MONEY!" I had to turn a few non-computer-science students away, because we wanted to recruit computer-savvy participants who are comfortable installing software, and we also didn't want to introduce so much variation within the user group that interesting trends are masked.

    Last week I set up a "usability lab," and yesterday and today A.J. and I frantically installed and tested all of the software needed. Tomorrow morning we'll run a pilot on Tasha, and the first user study is at 1:00!

  • 7/28: testing

    For all of the careful planning and meticulous practice that went into the user studies, it feels like they're turning out to be rather unscientific! I've written scripts of what to say, details about what data to collect and why and how we're going to analyze it, and all of the hypotheses we want to test, but things just seem to fall apart when it comes to actually running the studies. For one study I forgot to uninstall a piece of software that the previous participant had installed; for another I didn't clear out the cache; for yet another I forgot to do the debriefing at the end of the tasks ... of the eight studies that I've done so far, fewer than half have run smoothly. They're very taxing, too - today I did three and took a nap from exhaustion when I got home. A.J. is helping me take notes for them, and I think they wear her out as well. Hopefully something interesting will come out of all this, messy as it is - before I go I'll copy the data onto my laptop so I can play around with it in SPSS in the next few weeks, before we run the next set of studies.

  • 8/1: next

    You do eighteen studies, and what d'ya get? Another screen capture and deeper in debt ... I must still be a bit giddy, having just finished the last study yesterday. I didn't realize how stressful the studies were until they were over, and the weight has lifted. Of course, the real tasks - data analysis and write-up - have just begun. Today, my last day here at UrbanSim, I'll copy all of the study data onto my laptop, so I can format it in Excel and crunch it in SPSS from Michigan (where I'll be at my mom's annual family reunion) and from Berkeley, before school starts on August 25. Then we'll start planning for the next round of studies, depending on the state of the interface and what interesting data comes out of this round. Currently the plan is for me to fly up to Seattle for a few days around September 8th and run lots of local studies, and either do remote studies up here too, or do REALLY remote studies from Berkeley. Then there'll be mad data analysis and writing for October 6, the CHI deadline.

    It's been such a fabulous summer - even though I'm looking forward to seeing David, my aunt, and my friends in Berkeley, I'm also sad to go. My outlook on research has changed, and I feel much more sure that I want to go to graduate school - before, I wasn't sure how capable I was. It's also given me a taste of what it takes to sustain a long-distance relationship, which I may have to do more of if I go out of the Bay Area for graduate school. It was so great to have a mentor like A.J., who spent so much time and effort helping me, and it was wonderful to meet so many enthusiastic, interesting people in computer science and the Information School. I'll miss it, in my busy, busy semester ahead!