Documentation

PulseClean v1.0 — Wearable Data Preprocessing Methodology

Reference this page when documenting how PulseClean processed wearable data.

Suggested Reference

PulseClean. (2026). Wearable Data Preprocessing Pipeline v1.0. https://pulseclean.vercel.app/docs

01

Cleaning

Missing Values

Gaps of 3 or fewer consecutive missing rows are forward-filled with the last observed value, and the filled rows receive the missing_interpolated flag. Gaps of 4 or more consecutive rows are left null and receive the long_gap flag. Future values are never used for imputation, which reduces the risk of future-data leakage in ML workflows.
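The gap rule above can be sketched in a few lines of Python. This is an illustrative sketch, not PulseClean's actual implementation: the function name fill_short_gaps and the list-based interface are hypothetical. The documentation does not say what happens to a short gap at the very start of a series, where there is no earlier value to carry forward; the sketch assumes such rows stay null and reuse the long_gap flag.

```python
from typing import List, Optional, Tuple

MAX_FILL_GAP = 3  # gaps of 3 or fewer rows are forward-filled (from the rule above)


def fill_short_gaps(
    values: List[Optional[float]],
) -> Tuple[List[Optional[float]], List[str]]:
    """Forward-fill short gaps; leave long gaps null. Hypothetical helper."""
    filled = list(values)
    flags = ["ok"] * len(values)
    last_seen: Optional[float] = None  # last observed (non-null) value
    i = 0
    while i < len(values):
        if values[i] is not None:
            last_seen = values[i]
            i += 1
            continue
        # Measure the whole run of consecutive missing rows starting at i,
        # so a long gap is never partially filled.
        j = i
        while j < len(values) and values[j] is None:
            j += 1
        # Assumption: a short gap with no earlier value cannot be filled.
        fillable = (j - i) <= MAX_FILL_GAP and last_seen is not None
        for k in range(i, j):
            if fillable:
                filled[k] = last_seen  # only past values are used
                flags[k] = "missing_interpolated"
            else:
                flags[k] = "long_gap"
        i = j
    return filled, flags
```

Measuring each gap before filling it matters: a plain forward-fill with a limit of 3 would also fill the first three rows of a longer gap, which the rule above does not allow.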

Outlier Thresholds

PulseClean applies conservative plausibility thresholds for data cleaning only. These thresholds are not diagnostic criteria.

Vital           Range
heart_rate      0 – 250 bpm
blood_oxygen    85 – 100 %
body_temp       34 – 42 °C
resp_rate       4 – 60 breaths/min
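As a rough illustration, the range check might look like the following Python sketch. The names PLAUSIBLE_RANGES and screen_value are hypothetical, and treating the bounds as inclusive is an assumption, since the table does not say whether the endpoints themselves are accepted. Out-of-range values are returned as null, matching the outlier_clamped description in the Quality Flag Reference.

```python
from typing import Dict, Optional, Tuple

# Ranges copied from the table above. Inclusive bounds are an assumption.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "heart_rate": (0.0, 250.0),    # bpm
    "blood_oxygen": (85.0, 100.0),  # %
    "body_temp": (34.0, 42.0),     # °C
    "resp_rate": (4.0, 60.0),      # breaths/min
}


def screen_value(vital: str, value: Optional[float]) -> Tuple[Optional[float], bool]:
    """Return (cleaned_value, was_outlier). Hypothetical helper."""
    if value is None:
        # Missing values are handled by the gap rules, not by this check.
        return None, False
    low, high = PLAUSIBLE_RANGES[vital]
    if low <= value <= high:
        return value, False
    # Out-of-range values are nulled rather than clipped to the bound.
    return None, True
```

A caller would set quality_flag to outlier_clamped for any row where was_outlier is true.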

Duplicate Rows

When identical timestamps are found, the first occurrence is retained and later duplicates are removed before export.
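A keep-first deduplication pass could be sketched as below. The function name and the dictionary-per-row representation are assumptions made for illustration; only the keep-first behavior comes from the text above.

```python
from typing import Dict, List, Tuple


def drop_duplicate_timestamps(
    rows: List[Dict[str, object]],
) -> Tuple[List[Dict[str, object]], int]:
    """Keep the first row per timestamp; return (kept_rows, removed_count)."""
    seen = set()
    kept: List[Dict[str, object]] = []
    for row in rows:
        ts = row["timestamp"]
        if ts in seen:
            continue  # a later duplicate: dropped before export
        seen.add(ts)
        kept.append(row)
    return kept, len(rows) - len(kept)
```

Input order is preserved, so "first occurrence" here means first in file order.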

02

Normalization

Timestamps

All timestamps are converted to ISO 8601 UTC regardless of input format or source timezone.

2026-05-21T09:00:00Z
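For ISO-like inputs that carry a UTC offset, the conversion can be sketched with the Python standard library. The function name to_utc_iso is hypothetical, and treating a timestamp without an offset as already being in UTC is an assumption; the documentation does not describe how such inputs are resolved. Other source formats would need their own parsing step before this one.

```python
from datetime import datetime, timezone


def to_utc_iso(raw: str) -> str:
    """Convert an ISO-like timestamp string to ISO 8601 UTC with a Z suffix."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: timestamps with no offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc_iso("2026-05-21T11:00:00+02:00"))  # 2026-05-21T09:00:00Z
```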

Sample Rate

The original sampling rate is preserved and reported in the sample_rate_seconds and sample_rate_consistent metadata columns.
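The documentation does not state how these two values are derived. One plausible approach, shown here purely as an assumption, takes the lower median of the intervals between consecutive timestamps as the rate and calls the series consistent only when every interval equals it. The function name describe_sample_rate is hypothetical.

```python
from datetime import datetime
from statistics import median_low
from typing import List, Optional, Tuple


def describe_sample_rate(timestamps: List[str]) -> Tuple[Optional[int], Optional[bool]]:
    """Return (sample_rate_seconds, sample_rate_consistent) for UTC timestamps."""
    if len(timestamps) < 2:
        # No interval can be measured; both columns are nullable in the schema.
        return None, None
    parsed = [datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ") for ts in timestamps]
    deltas = [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]
    # median_low always returns an interval that actually occurs in the data.
    rate = median_low(deltas)
    consistent = all(d == rate for d in deltas)
    return rate, consistent
```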

Units

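All measurements are normalized to PulseClean standard units where applicable: bpm for heart rate, % for blood oxygen saturation, °C for body temperature, and breaths/min for respiratory rate.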

03

Quality Flag Reference

Flag                  Description
ok                    Row passes all validation checks
outlier_clamped       Value fell outside its plausibility range; set to null in the output row
missing_interpolated  Short gap (≤3 rows) filled by forward-fill from the last observed value
long_gap              Long gap (≥4 consecutive rows); value remains null, not estimated
duplicate_removed     Duplicate timestamp row removed before export; not present as a row in the output CSV
low_confidence        Row could not be validated with activity-specific confidence rules

04

Standard Schema

All output files conform to this 11-column schema.

Column                  Type    Nullable  Description
timestamp               string  N         UTC timestamp (ISO 8601)
heart_rate              float   Y         Heart rate in bpm
blood_oxygen            float   Y         Blood oxygen saturation (%)
body_temp               float   Y         Body temperature (°C)
resp_rate               float   Y         Respiratory rate (breaths/min)
step_count              int     Y         Step count per interval
activity                string  Y         Activity type label
quality_flag            string  N         Quality classification (see §3)
data_source             string  N         Source format identifier
sample_rate_seconds     int     Y         Seconds between samples
sample_rate_consistent  bool    Y         Whether sample rate is consistent
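When consuming an exported file, it can be useful to confirm that the header matches the schema before further analysis. The sketch below is one way to do that with the standard csv module. The helper name check_header is hypothetical, and the sketch only checks column names, not their order or types.

```python
import csv
import io
from typing import List, Tuple

# Column names copied from the schema table above.
EXPECTED_COLUMNS = [
    "timestamp", "heart_rate", "blood_oxygen", "body_temp", "resp_rate",
    "step_count", "activity", "quality_flag", "data_source",
    "sample_rate_seconds", "sample_rate_consistent",
]


def check_header(csv_text: str) -> Tuple[List[str], List[str]]:
    """Return (missing_columns, unexpected_columns) for a CSV's header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected
```

Two empty lists mean the file carries exactly the 11 expected column names.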
05

Version History

v1.0

v1.0 (2026-05) — Initial release. Cleaning, Normalization, Quality Flag pipeline. HealthKit XML + Generic CSV support.

PulseClean is still improving. If a preprocessing rule does not fit your workflow, or if you need support for another wearable format, reach out — we review requests quickly and prioritize updates based on real user needs.
