Drop in reported Hive usage metrics
Resolved
Jan 29, 2025 at 3:00pm UTC
A customer sent malformed usage data (negative duration values), which caused ClickHouse inserts to fail.
Because asynchronous inserts were not monitored, the errors went undetected until a customer reported the drop.
We took the following measures to resolve this issue:
- Implemented stricter input validation on the usage endpoint.
- Deployed fixes, restoring expected metric levels.
- Added integration tests and alerts for asynchronous insert errors.
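
As an illustration of the stricter input validation mentioned above, here is a minimal sketch and not the actual Hive implementation. It assumes a hypothetical `UsageRecord` shape and hypothetical helpers `validateRecord` and `partitionBatch`. The idea is to reject invalid records (such as negative durations) before they join a batch, so that one bad record cannot cause the whole batch insert to fail.

```typescript
// Hypothetical shape of a single usage record; the field names are assumptions.
interface UsageRecord {
  operation: string;
  timestamp: number; // Unix epoch in milliseconds
  duration: number; // execution time, expected to be non-negative
}

interface RejectedRecord {
  index: number;
  reason: string;
}

interface PartitionResult {
  valid: UsageRecord[];
  rejected: RejectedRecord[];
}

// Returns null when the record is acceptable, otherwise a rejection reason.
function validateRecord(record: UsageRecord): string | null {
  if (typeof record.operation !== "string" || record.operation.length === 0) {
    return "operation must be a non-empty string";
  }
  if (!Number.isFinite(record.timestamp) || record.timestamp < 0) {
    return "timestamp must be a finite, non-negative number";
  }
  if (!Number.isFinite(record.duration) || record.duration < 0) {
    return "duration must be a finite, non-negative number";
  }
  return null;
}

// Splits an incoming batch so only valid records are forwarded to storage.
function partitionBatch(records: UsageRecord[]): PartitionResult {
  const result: PartitionResult = { valid: [], rejected: [] };
  records.forEach((record, index) => {
    const reason = validateRecord(record);
    if (reason === null) {
      result.valid.push(record);
    } else {
      result.rejected.push({ index, reason });
    }
  });
  return result;
}

// Example: the second record has a negative duration and is rejected.
const example = partitionBatch([
  { operation: "GetUser", timestamp: 1737846000000, duration: 12 },
  { operation: "GetUser", timestamp: 1737846000500, duration: -3 },
]);
console.log(example.valid.length, example.rejected);
```

Reporting the rejected records back to the sender, rather than dropping them silently, would also give the customer a signal that their data is malformed.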
Affected services
Updated
Jan 29, 2025 at 7:00am UTC
We have completed the post-mortem. Please reach out to us for more information.
Created
Jan 25, 2025 at 11:00pm UTC
A customer reported a drop in usage metrics. Our investigation revealed that invalid data (negative duration values) was causing insert failures in ClickHouse. Because usage metrics are written in batches, the issue affected all customers, not only the one sending the invalid data.