On 32-bit platforms (ESP32 Xtensa), 64-bit shifts in varint parsing
compile to __ashldi3 library calls. Since the vast majority of protobuf
varint fields (message types, sizes, enum values, sensor readings) fit
in 4 bytes, the 64-bit arithmetic is unnecessary overhead on the common
path.
Split parse() into two phases:
- Bytes 0-3: uint32_t loop with native 32-bit shifts (0, 7, 14, 21)
- Bytes 4-9: noinline parse_wide_() with uint64_t, only for BLE
addresses and other 64-bit fields
The code generator auto-detects which proto messages use int64/uint64/
sint64 fields and emits USE_API_VARINT64 conditionally. On non-BLE
configs, parse_wide_() and the 64-bit accessors (as_uint64, as_int64,
as_sint64) are compiled out entirely.
Saves ~40 bytes flash on non-BLE configs. Benchmark shows 25-50%
faster parsing for 1-4 byte varints (the common case).