Google Health & Home · 2024–2025 · Mixed Methods · Design Systems · Quant + Qual · Google-first study

Redesigning Google Health apps:
de-risking a rebrand at scale

A phased 9-week mixed-methods program that answered the question VP stakeholders were nervous to ask — will users accept a major UI redesign across Fitbit and Pixel Watch? The answer required Google's first quantitative design systems study across mobile and smartwatch.

Role
Senior Mixed-Methods UX Researcher
Timeline
9 weeks · June 2024
Team
2 UXDs, 1 UXE, 1 PM, 1 UXR Director
Tools
Dscout, UserTesting, Qualtrics, Figma
Domain
Design systems · Health UI · Mobile + Wearable
Methods
Remote moderated IDIs (n=16) + unmoderated usability + Qualtrics survey (n=206 per study)
Key deliverables
Semantic mental model framework, usability benchmarks, statistical significance report, stakeholder narrative
Impact
Unblocked Director-level support for Fitbit rebrand; established Google's first cross-device Material Design benchmark
Context

Google wanted to update its health UI — and needed evidence to move forward

In June 2024, the Google Health & Home team set out to update the UI across its health applications — mobile and smartwatch — to align with Material Design and expand business impact. The goal was a rebrand: new color profiles, updated visual language, a more coherent experience across Pixel Watch and the Fitbit mobile app.

But health applications carry a specific risk that other apps don't. Color in health UIs carries meaning — red signals warning, green signals success, yellow signals caution. Changing color profiles isn't just an aesthetic decision; it can affect whether users correctly understand their health data at a glance. VP-level stakeholders were nervous about change, and that nervousness needed to be resolved by research — not opinion.

The organizational stakes: This wasn't a small UI update. It was a cross-device rebrand that would touch Pixel Watch, the Fitbit mobile app, and health dashboards used by millions of users. Director-level support was blocked until research could de-risk the change.


The challenge

Two compounding problems: depth and scale

The research challenge wasn't simple. The team needed two different types of answers at two different levels of evidence. First, they needed depth — a qualitative understanding of how users make sense of color in relation to health data, what mental models they hold, and which data types carry semantic meaning that must be protected. Second, they needed scale — statistically valid evidence that the new design would be well received, that it wouldn't harm usability, and that it would positively impact brand perception.

Neither question could answer the other. Qualitative research alone wouldn't give VP stakeholders the statistical confidence they needed. Quantitative research alone wouldn't explain why certain color profiles worked or failed.

I designed a three-phase program to answer both: qualitative depth first, quantitative validation second, and an in-person lab study planned as a third phase for contextual device testing.

Phase 01
Empathize
2 weeks
Remote moderated in-depth interviews
Understand how color profiles affect usability, glanceability, and health data comprehension. Surface user mental models around color semantics and identify which data types need protection.
Phase 02
Validate
4 weeks
Remote unmoderated usability + Qualtrics survey
Statistically measure the impact of new UI at scale. Test coherence, familiarity, trust, desirability, and behavioral intent. Run regression to identify which design factors drive business outcomes.
Phase 03
Contextualize
Planned
In-person lab usability (on-device)
Planned next step: contextualize quantitative findings with physical on-device testing, examining glanceability and comprehension in real interaction contexts.

Study 1 — Empathize

In-depth interviews: mapping color semantics and health data mental models

Before running scaled testing, I needed to build a qualitative foundation. The team had multiple new color profiles to evaluate — across mobile and watch — but no framework for understanding how users would interpret them in a health context. I ran 16 remote moderated in-depth interviews designed to surface that framework before a single line of quantitative survey code was written.

Study design
60-minute remote IDI via Dscout. Within-subjects design with counterbalanced presentation of color profiles and devices: each participant evaluated both mobile and watch variants, with presentation order varied across participants to control for order effects.
n=16 · within-subjects · counterbalanced
Participants
U.S. participants, balanced for age and gender. All current platform users, split 50/50 between short-tenure (0–1 year) and established (3+ year) users. Proprietary segmentation applied.
Balanced age · gender · tenure
Planning (1 week)
Conducted stakeholder interviews to align on research questions. Reviewed previous research and design mocks. Worked with 2 UXDs and 1 UXE to build Figma prototypes specifically calibrated for study needs — not off-the-shelf design files.
Iterative prototype co-design
Fieldwork (1 week)
Developed discussion guide with Likert comprehension metrics. Built a matrixed spreadsheet to track participant responses across color profiles. Live-streamed sessions for team observation. Daily recaps via dedicated chat space kept stakeholders engaged without waiting for a final report.
Live-streamed · daily debriefs
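The counterbalancing in the study design can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling used in the study: the number of color profiles, the profile labels, the participant IDs, and the function names are all assumptions made for the example. It rotates color profiles with a cyclic Latin square and alternates which device comes first.

```python
def latin_square(items):
    """Cyclic Latin square: each item appears once in every position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(participants, profiles, devices):
    """Give each participant a device order and a color-profile order.

    Device order alternates between participants; profile order steps
    through the rows of the Latin square every len(device_orders) people.
    """
    profile_orders = latin_square(profiles)
    device_orders = [list(devices), list(reversed(devices))]
    plan = {}
    for i, pid in enumerate(participants):
        plan[pid] = {
            "devices": device_orders[i % len(device_orders)],
            "profiles": profile_orders[(i // len(device_orders)) % len(profile_orders)],
        }
    return plan


# Hypothetical inputs: 16 participants, four placeholder profiles, two devices.
participants = [f"P{i:02d}" for i in range(1, 17)]
plan = assign_orders(participants, ["A", "B", "C", "D"], ["mobile", "watch"])
for pid in participants[:4]:
    print(pid, plan[pid]["devices"], plan[pid]["profiles"])
```

With 16 participants, four profiles, and two devices, each of the eight order combinations is used twice. A cyclic square balances position only; it does not balance which profile immediately follows which.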
Research questions
Color & usability
How do UI color profiles affect comprehension and glanceability — can users quickly and accurately read their health data across different color variants?
Mental models
What associations do users bring to color in a health context: emotion, temperature, goals, warning states? Which can new branding safely override, and which are too stable to change?
Semantic protection
Which health data types — heart rate, sleep stages, activity, alerts — carry color-coded meaning that must remain consistent to avoid misinterpretation?
Preference
Which color profiles do users prefer across mobile and watch — and does preference align with comprehension, or diverge from it?
Analysis & deliverables
Semantic protection map
Identified which health data types were amenable to new UI colors and which required consistent color usage to protect semantic meaning — directly informing Material Design color profile constraints.
Mental model frameworks
Created visualized assets describing how participants understand color in relation to health data — organized by emotional associations, temperature metaphors, and goal-tracking conventions.
Usability benchmarks
Measured comprehension and friction for each Material Design color profile across mobile and watch, surfacing which profiles introduced errors and which were well understood.
Recommendations + quotes
Report organized with per-insight recommendations and participant quotes to help the team maintain empathy during design sprint decisions. Each recommendation traced to its evidence.

Study 2 — Validate

Unmoderated usability + survey: statistical validation at scale

The qualitative study told us how users experience the new color profiles. Study 2 answered the executive question: will this redesign hurt the brand, damage trust, or reduce usability at scale? This required a different approach: brand-blind, between-subjects, statistically powered, and designed so the findings could be shared across the organization.

I also had to solve a cross-functional alignment challenge before fieldwork could begin. To make these findings credible across Google's product areas, I aligned with the Material Design team to adopt published and validated measurement scales — ensuring that results from this study could be compared against, and socialized to, the broader Material Design ecosystem.

Study design
Remote brand-blind unmoderated. Between-subjects design — participants saw either the baseline or new UI, not both, eliminating contrast bias. Two separate studies: one for mobile, one for watch.
n=206 per study · between-subjects · brand-blind
Instruments
Task-based usability testing via Figma prototypes on UserTesting (URL task success rate). Qualtrics survey measuring: coherence, familiarity, trust, desirability, behavioral intent, and brand perception. Validated scales adopted from published literature and aligned with Material Design team.
Validated scales · cross-PA alignment
Participants
U.S. participants, balanced for age and gender. All current platform users, split 50/50 by tenure. Proprietary segmentation applied. Faced recruitment challenges that required timeline management and stakeholder communication.
412 total participants across 2 studies
Analysis
Statistical significance testing for each question and metric group (two-tailed t-test). Regression analysis (R²) to identify which design and brand metrics drive business outcomes — giving the team prioritization levers, not just pass/fail results.
t-test · regression · driver analysis
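For readers who want to see the mechanics of the significance test, here is a minimal standard-library sketch of a two-tailed, two-sample t-test comparing a baseline group against a new-UI group. Two things are assumptions on my part: the Welch (unequal-variance) form of the test, since the write-up only specifies "two-tailed t-test", and the scores themselves, which are invented for illustration and are not study data.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def welch_t_test(a, b, steps=2000):
    """Return (t, df, two-tailed p) for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Simpson's rule for the area under the density between 0 and |t|
    # (steps must be even).
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # Both tails together are whatever lies outside [-|t|, |t|].
    p = max(0.0, 1.0 - 2.0 * area)
    return t, df, p


# Invented 1-7 ratings for one metric; not data from the study.
baseline = [4, 5, 3, 5, 4, 6, 4, 5]
new_ui = [5, 6, 5, 6, 4, 6, 5, 7]
t, df, p = welch_t_test(baseline, new_ui)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.3f}")
```

In practice a statistics package would replace the hand-rolled integration; the sketch only shows what the reported statistic and p-value are made of.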

As the research findings would be socialized across the org, I needed to provide both statistics and extract findings into layperson terms — "what does it mean" for designers, PMs, and marketing.

— From the case study presentation, December 2024
Key analysis outputs
Usability verification
Unmoderated task-based testing gave the team confidence that the usability impact was neutral: the new UI did not harm users' ability to complete core health tracking tasks.
Statistical significance
Graphs provided the team clear direction on relative differences between baseline and new designs — showing specifically which areas improved, which stayed neutral, and which required attention.
Business outcome drivers
Regression modeling identified branding, design aesthetic, familiarity, and trust as the four key drivers of business outcomes — giving product and marketing concrete levers for the redesign strategy.
Stakeholder narrative
Synthesized statistical findings into a plain-language narrative for VP audiences — translating R² values and t-test results into strategic recommendations about which color profiles to adopt and which to retire.
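The driver analysis can be illustrated with a small ordinary least squares sketch that returns coefficients and R². The model form is an assumption: the write-up names regression and R² but not the exact specification. The function names are hypothetical and the ratings are invented; only the metric names (trust, familiarity, behavioral intent) come from the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... and return (coefficients, R squared)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Invented 1-7 ratings: (trust, familiarity) predicting behavioral intent.
predictors = [(5, 4), (6, 5), (3, 4), (7, 6), (4, 3), (6, 6), (5, 5), (2, 3)]
intent = [5, 6, 3, 7, 4, 6, 5, 2]
beta, r2 = fit_ols(predictors, intent)
print("intercept, trust, familiarity:", [round(b, 2) for b in beta])
print("R squared:", round(r2, 2))
```

Larger coefficients on comparably scaled predictors point to stronger drivers, and R² indicates how much of the variation in the outcome the drivers explain together.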

Impact

Research outcomes

This program delivered two types of impact. The first was organizational: the research unblocked a stalled decision and gave senior stakeholders the evidence they needed to move forward confidently. The second was structural: it established a measurement baseline that Google had not previously had for cross-device health UI research.

Unblocked
Director-level support for Fitbit / Pixel Watch rebrand across the health apps ecosystem
First
Google's first quantitative study of Material Design impacts across mobile and smartwatch
412
Participants across 2 statistically powered studies validating UI change at scale
Product roadmap
Influenced Material Design adoption across the Google health apps ecosystem — with specific color profiles prioritized or retired based on comprehension, glanceability, and accessibility impacts.
Mental model guidelines
Developed user mental model guidelines to inform future health data visualizations — a reusable artifact that extended the value of this study beyond the immediate redesign.
Org-wide benchmark
Provided Google with new organizational insights into product coherence across mobile and watch devices — a baseline that future studies in any product area could build on.
VP stakeholder alignment
Assuaged senior stakeholder fears about change — converting a stalled decision into forward momentum with specific, evidence-backed design recommendations.

Note: Specific design artifacts, statistical outputs, and prototype details are available under NDA. Reach out to discuss the full study.


Reflection

What this project sharpened for me

The hardest part of this project wasn't the research methodology — it was the organizational challenge of running rigorous phased research at a pace that matched a product sprint cadence, while keeping VP stakeholders informed without overwhelming them with methodology.

What I learned: the best research communication for executive audiences isn't a summary of the methods — it's a reframing of the decision. The stakeholders weren't nervous about color profiles. They were nervous about making a public commitment to change that users might reject. Research's job was to either confirm that fear or dissolve it — and the way to do that was to use their language (brand, trust, behavioral intent) rather than ours (comprehension, semantic protection, t-test significance).

This is something I now build into every study from the start: who will make a decision based on this research, what they're afraid of, and how the findings will be framed to actually move them.