PulseClean v1.0 — Wearable Data Preprocessing Methodology
Reference this page when documenting how PulseClean processed wearable data.
Suggested Reference
PulseClean. (2026). Wearable Data Preprocessing Pipeline v1.0. https://pulseclean.vercel.app/docs
Cleaning
Missing Values
Gaps of 3 or fewer consecutive missing rows are forward-filled with the last observed value and receive the missing_interpolated flag. Gaps of 4 or more consecutive missing rows are left null and receive the long_gap flag. Imputation uses only past observations, never future values, which reduces the risk of future-data leakage in ML workflows.
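The gap rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not PulseClean's implementation. It assumes a single metric stored as a list, where None marks a missing row. The function name impute_gaps is hypothetical. The page does not say how missing rows at the very start of a series are handled, since those rows have no earlier value to carry forward. This sketch leaves them null and flags them long_gap as an assumption.

```python
from typing import List, Optional, Tuple

MAX_FILL_GAP = 3  # gaps of this many rows or fewer are forward-filled


def impute_gaps(
    values: List[Optional[float]], max_gap: int = MAX_FILL_GAP
) -> Tuple[List[Optional[float]], List[Optional[str]]]:
    """Forward-fill short gaps and flag every missing row (hypothetical helper)."""
    filled = list(values)
    flags: List[Optional[str]] = [None] * len(values)
    last_seen: Optional[float] = None  # only past values are ever used
    i = 0
    while i < len(values):
        if values[i] is not None:
            last_seen = values[i]
            i += 1
            continue
        # Measure the full run of consecutive missing rows before deciding.
        j = i
        while j < len(values) and values[j] is None:
            j += 1
        short_gap = (j - i) <= max_gap and last_seen is not None
        for k in range(i, j):
            if short_gap:
                filled[k] = last_seen
                flags[k] = "missing_interpolated"
            else:
                # Long gaps (and, by assumption, leading gaps) stay null.
                flags[k] = "long_gap"
        i = j
    return filled, flags


values = [72.0, None, None, 75.0, None, None, None, None, 80.0]
filled, flags = impute_gaps(values)
print(filled)  # [72.0, 72.0, 72.0, 75.0, None, None, None, None, 80.0]
```

The sketch measures the whole run before filling, and this step matters. A per-row fill limit such as pandas' ffill(limit=3) would partially fill a longer gap by filling its first three rows. The rule described above instead leaves the entire 4+ row gap null.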
Outlier Thresholds
PulseClean applies conservative plausibility thresholds for data cleaning only. These thresholds are not diagnostic criteria.
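This page does not list the threshold values. The sketch below therefore shows only the general shape of a plausibility check: it marks observed values that fall outside a caller-supplied lower/upper range. Several details are assumptions: the function name implausible_mask, the inclusive bounds, and the choice to mark values rather than delete them. No PulseClean threshold is implied.

```python
from typing import List, Optional


def implausible_mask(
    values: List[Optional[float]], lower: float, upper: float
) -> List[bool]:
    """Return True where an observed value lies outside [lower, upper].

    Hypothetical helper: bounds are inclusive and must be supplied by the
    caller; missing values (None) are never marked as implausible.
    """
    return [v is not None and not (lower <= v <= upper) for v in values]
```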
Duplicate Rows
When identical timestamps are found, the first occurrence is retained and later duplicates are removed before export.
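Below is a minimal sketch of keep-first deduplication. It assumes each row is a dict with a timestamp key, which is a hypothetical representation. The page also does not say whether timestamps are compared before or after UTC normalization.

```python
from typing import Any, Dict, List


def drop_duplicate_timestamps(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first row for each timestamp, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        ts = row["timestamp"]
        if ts in seen:
            continue  # a later duplicate: drop it
        seen.add(ts)
        kept.append(row)
    return kept
```

For tabular data in pandas, DataFrame.drop_duplicates(subset="timestamp", keep="first") expresses the same keep-first rule.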
Normalization
Timestamps
All timestamps are converted to ISO 8601 UTC regardless of input format or source timezone.
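The conversion can be sketched with the Python standard library as follows. The function name to_utc_iso is hypothetical, and the following behaviours are assumptions for illustration:

- the list of accepted input formats
- rejecting naive timestamps when no source timezone is given
- the trailing Z suffix
- not handling fractional seconds

The first format targets offset-bearing strings such as 2026-05-01 09:30:00 +0200, the style assumed here for HealthKit XML exports.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Assumed input layouts; extend as needed.
_FORMATS = (
    "%Y-%m-%d %H:%M:%S %z",  # e.g. 2026-05-01 09:30:00 +0200
    "%Y-%m-%dT%H:%M:%S%z",   # e.g. 2026-05-01T09:30:00+02:00
    "%Y-%m-%dT%H:%M:%S",     # naive ISO 8601
    "%Y-%m-%d %H:%M:%S",     # naive, space-separated
)


def to_utc_iso(raw: str, source_tz: Optional[tzinfo] = None) -> str:
    """Parse a timestamp and return it as ISO 8601 UTC (hypothetical helper)."""
    for fmt in _FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized timestamp: {raw!r}")
    if dt.tzinfo is None:
        # A naive timestamp is ambiguous; require the source timezone.
        if source_tz is None:
            raise ValueError(f"naive timestamp needs source_tz: {raw!r}")
        dt = dt.replace(tzinfo=source_tz)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc_iso("2026-05-01 09:30:00 +0200"))  # 2026-05-01T07:30:00Z
print(to_utc_iso("2026-05-01 12:00:00", timezone(timedelta(hours=9))))  # 2026-05-01T03:00:00Z
```

Raising on naive input, rather than silently assuming UTC, keeps a missing source timezone from turning into a shifted timeline.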
Sample Rate
The original sampling rate is preserved and reported in the sample_rate_seconds and sample_rate_consistent metadata columns.
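One plausible way to derive these two columns is sketched below. The sketch assumes the timestamps are already sorted. It takes the median interval between consecutive samples as sample_rate_seconds. It marks the series consistent when every interval lies within a relative tolerance of that median. The median, the 10% tolerance, and the function name sample_rate_metadata are assumptions, because the page does not say how PulseClean computes either value.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional, Tuple


def sample_rate_metadata(
    timestamps: List[datetime], tolerance: float = 0.1
) -> Tuple[Optional[float], Optional[bool]]:
    """Return (sample_rate_seconds, sample_rate_consistent) for sorted timestamps."""
    if len(timestamps) < 2:
        return None, None  # not enough samples to estimate an interval
    deltas = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    rate = median(deltas)
    consistent = all(abs(d - rate) <= tolerance * rate for d in deltas)
    return rate, consistent


start = datetime(2026, 5, 1)
regular = [start + timedelta(seconds=60 * i) for i in range(4)]
print(sample_rate_metadata(regular))  # (60.0, True)
```

The sketch uses the median rather than the mean, so a single long gap does not shift the reported rate. Such a gap still makes the series inconsistent.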
Units
All measurements are normalized to PulseClean standard units where applicable.
Quality Flag Reference
Standard Schema
All output files conform to this 11-column schema.
Version History
v1.0 (2026-05) — Initial release. Cleaning, Normalization, Quality Flag pipeline. HealthKit XML + Generic CSV support.
PulseClean is still improving. If a preprocessing rule does not fit your workflow, or if you need support for another wearable format, reach out — we review requests quickly and prioritize updates based on real user needs.