Drop in reported Hive usage metrics
Resolved
Jan 29 at 04:00pm CET
A customer sent malformed data, leading to unnoticed ClickHouse insert errors.
Lack of monitoring for asynchronous inserts prevented early detection.
We took the following measures to resolve this issue:
- Implemented stricter input validation on the usage endpoint.
- Deployed fixes, restoring expected metric levels.
- Added integration tests and alerts for asynchronous insert errors.
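
As an illustration of the stricter input validation mentioned above, here is a minimal sketch of how invalid records (for example, negative durations) could be separated from a batch before it is written. The `UsageRecord` shape, its field names, and both function names are assumptions made for this example, not the actual Hive implementation.

```typescript
// Hypothetical shape of a single usage record; field names are assumptions.
interface UsageRecord {
  operationName: string;
  durationNs: number;
  timestamp: number;
}

interface RejectedRecord {
  record: UsageRecord;
  problems: string[];
}

// Returns a list of problems; an empty list means the record passed validation.
function validateUsageRecord(record: UsageRecord): string[] {
  const problems: string[] = [];
  if (!Number.isFinite(record.durationNs) || record.durationNs < 0) {
    problems.push("durationNs must be a finite, non-negative number");
  }
  if (!Number.isInteger(record.timestamp) || record.timestamp <= 0) {
    problems.push("timestamp must be a positive integer");
  }
  if (record.operationName.trim().length === 0) {
    problems.push("operationName must not be empty");
  }
  return problems;
}

// Splits a batch so that one malformed record cannot fail the whole insert.
function partitionBatch(records: UsageRecord[]): {
  valid: UsageRecord[];
  rejected: RejectedRecord[];
} {
  const valid: UsageRecord[] = [];
  const rejected: RejectedRecord[] = [];
  for (const record of records) {
    const problems = validateUsageRecord(record);
    if (problems.length === 0) {
      valid.push(record);
    } else {
      rejected.push({ record, problems });
    }
  }
  return { valid, rejected };
}
```

In a setup like this, only the `valid` records would be passed on to the database insert, while the `rejected` list could be logged or returned to the sender so that malformed input is visible instead of silently failing.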
Affected services
Usage reports processing
Updated
Jan 29 at 08:00am CET
We completed the post-mortem. Please reach out to us for more information.
Affected services
Usage reports processing
Created
Jan 26 at 12:00am CET
A customer reported a drop in usage metrics. Our investigation revealed that invalid data (negative duration values) caused insert failures in ClickHouse. Because metrics are written in batches, the failed inserts affected all customers, not only the one that sent the invalid data.
Affected services
Usage reports processing