messageInfo() looks like this:

  func (ms *messageState) messageInfo() *MessageInfo {
    mi := ms.LoadMessageInfo()
    if mi == nil {
      panic("invalid nil message info; this suggests memory corruption due to a race or shallow copy on the message struct")
    }
    return mi
  }

  func (ms *messageState) LoadMessageInfo() *MessageInfo {
    return (*MessageInfo)(atomic.LoadPointer((*unsafe.Pointer)(unsafe.Pointer(&ms.atomicMessageInfo))))
  }

This is an atomic load and a predictable branch. On x86, the 64-bit
load is just a MOV. On other platforms, such as ARM64, it requires an
actual atomic instruction (LDAR).
So it is cheap, but not free. Eliminate redundant calls by calling it
once and reusing the result (common subexpression elimination).
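To illustrate the kind of rewrite described above, here is a minimal sketch. The types info and state, their fields, and the loads counter are hypothetical stand-ins, and the sketch uses atomic.Pointer (Go 1.19+) in place of the unsafe.Pointer cast in the real code. It only shows the pattern of loading once and reusing the local.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// info is a hypothetical stand-in for MessageInfo.
type info struct {
	name   string
	fields int
}

// state is a hypothetical stand-in for messageState.
type state struct {
	mi    atomic.Pointer[info]
	loads int // counts messageInfo calls; for illustration only, not thread-safe
}

// messageInfo performs one atomic load plus a nil check per call.
func (s *state) messageInfo() *info {
	s.loads++
	mi := s.mi.Load()
	if mi == nil {
		panic("nil message info")
	}
	return mi
}

// describeBefore calls messageInfo once per use: two atomic loads.
func describeBefore(s *state) string {
	return fmt.Sprintf("%s/%d", s.messageInfo().name, s.messageInfo().fields)
}

// describeAfter loads once and reuses the local: one atomic load.
func describeAfter(s *state) string {
	mi := s.messageInfo()
	return fmt.Sprintf("%s/%d", mi.name, mi.fields)
}

func main() {
	s := &state{}
	s.mi.Store(&info{name: "example.Message", fields: 3})
	fmt.Println(describeBefore(s), s.loads)
	s.loads = 0
	fmt.Println(describeAfter(s), s.loads)
}
```

Both functions return the same string. Only the number of loads differs, which is why the saving grows with the cost of an atomic load on the target platform.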
The newly added benchmarks improve by about 2.4% (geomean):

  $ benchstat pre post | head -10
  goarch: amd64
  cpu: AMD Ryzen Threadripper PRO 3995WX 64-Cores
                        │     pre     │                post                │
                        │   sec/op    │   sec/op     vs base               │
  Extension/Has/None-12   106.4n ± 2%   104.0n ± 2%  -2.21% (p=0.020 n=10)
  Extension/Has/Set-12    116.4n ± 1%   114.4n ± 2%  -1.76% (p=0.017 n=10)
  Extension/Get/None-12   184.2n ± 1%   181.0n ± 1%  -1.68% (p=0.003 n=10)
  Extension/Get/Set-12    144.5n ± 3%   140.7n ± 2%  -2.63% (p=0.041 n=10)
  Extension/Set-12        227.2n ± 2%   218.6n ± 2%  -3.81% (p=0.000 n=10)
  geomean                 149.6n        145.9n       -2.42%

I didn't test on ARM64, but the improvement should be larger there,
since each eliminated load is an actual atomic instruction.
Change-Id: I8eebeb6f753425b743368a7f5c7be4d48537e5c3
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/575036
Reviewed-by: Michael Stapelberg <stapelberg@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Commit-Queue: Nicolas Hillegeer <aktau@google.com>
Auto-Submit: Nicolas Hillegeer <aktau@google.com>
Calling the ProtoReflect method of the newly-constructed
message avoids an allocation in MessageInfo.MessageOf in
the common case of a generated message with an optimized
ProtoReflect method.
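The fast path described above can be sketched roughly as follows. All names here (message, reflecter, generated, legacy, wrapper, messageOf, reflectOf) are hypothetical and only model the idea: prefer the value's own reflection method, which can hand back state it already owns, and fall back to a generic path that allocates a wrapper.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a reflective message view.
type message interface{ Kind() string }

// reflecter is implemented by types with their own reflection method,
// analogous to generated messages with an optimized ProtoReflect.
type reflecter interface{ Reflect() message }

// genState is reflective state embedded in the message itself.
type genState struct{ ready bool }

func (*genState) Kind() string { return "fast" }

// generated models a generated message. Reflect returns a pointer to the
// embedded state, so it needs no separate wrapper value.
type generated struct{ state genState }

func (g *generated) Reflect() message { return &g.state }

// legacy models a message type without its own reflection method.
type legacy struct{}

// wrapper is the separately allocated view used by the generic path.
type wrapper struct{ v any }

func (*wrapper) Kind() string { return "wrapped" }

// messageOf is the generic path: it always builds a new wrapper.
func messageOf(v any) message { return &wrapper{v: v} }

// reflectOf prefers the value's own method and falls back to messageOf.
func reflectOf(v any) message {
	if r, ok := v.(reflecter); ok {
		return r.Reflect()
	}
	return messageOf(v)
}

func main() {
	fmt.Println(reflectOf(&generated{}).Kind())
	fmt.Println(reflectOf(&legacy{}).Kind())
}
```

In this model the fallback keeps working for types without the method, so the change is purely an optimization for the common case.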
Benchmark for creating an empty message, darwin/arm64 M1 laptop:

  name                 old time/op    new time/op    delta
  EmptyMessage/New-10    32.1ns ± 2%    23.7ns ± 2%  -26.06%  (p=0.000 n=10+9)

  name                 old alloc/op   new alloc/op   delta
  EmptyMessage/New-10     64.0B ± 0%     48.0B ± 0%  -25.00%  (p=0.000 n=10+10)

  name                 old allocs/op  new allocs/op  delta
  EmptyMessage/New-10      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)

Change-Id: Ifa3c3ffa8edc76f78399306d0f4964eae4aacd28
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/418677
Reviewed-by: Michael Stapelberg <stapelberg@google.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
Move all fast-path inputs and outputs into the Input/Output structs.
Collapse all booleans into bitfields.
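As a rough illustration of the shape this gives the fast path, here is a minimal sketch. The flag names, the input and output types, and the logic in consume are hypothetical placeholders, not the actual Input/Output structs.

```go
package main

import "fmt"

// flags is a hypothetical bitfield replacing several separate bool fields.
type flags uint8

const (
	flagDiscardUnknown flags = 1 << iota
	flagAllowPartial
	flagInitialized
)

// has reports whether bit is set in f.
func (f flags) has(bit flags) bool { return f&bit != 0 }

// input groups everything the fast path reads.
type input struct {
	buf   []byte
	flags flags
}

// output groups everything the fast path returns.
type output struct {
	n     int
	flags flags
}

// consume is placeholder logic: it "reads" the whole buffer and marks the
// result initialized when the buffer is non-empty or partial input is allowed.
func consume(in input) output {
	out := output{n: len(in.buf)}
	if len(in.buf) > 0 || in.flags.has(flagAllowPartial) {
		out.flags |= flagInitialized
	}
	return out
}

func main() {
	out := consume(input{buf: []byte("abc"), flags: flagDiscardUnknown})
	fmt.Println(out.n, out.flags.has(flagInitialized))
}
```

With one struct in and one struct out, a new option becomes a new bit rather than a new parameter or return value on every fast-path function.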
Change-Id: I79ebfbac9cd1d8ef5ec17c4f955311db007391ca
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/219505
Reviewed-by: Joe Tsai <joetsai@google.com>
Return the size of the field read from the validator, permitting us to
avoid an extra parse when skipping over groups.
Return an UnmarshalOutput from the validator, since it already combines
two of the validator outputs: bytes read and initialization status.
Remove initialization status from the ValidationStatus enum, since it's
covered by the UnmarshalOutput.
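A minimal sketch of the resulting interface, under assumptions: the types validationStatus and unmarshalOutput below are simplified stand-ins, and the toy wire format (one length byte followed by the payload) is invented for illustration. It shows a caller skipping fields with the size the validator already computed, and the status enum carrying no initialization value.

```go
package main

import "fmt"

// validationStatus is a hypothetical stand-in for the ValidationStatus enum.
// It has no "valid but uninitialized" value; that lives in unmarshalOutput.
type validationStatus int

const (
	validationUnknown validationStatus = iota
	validationInvalid
	validationValid
)

// unmarshalOutput is a hypothetical stand-in carrying both validator outputs.
type unmarshalOutput struct {
	n           int  // bytes consumed by the field
	initialized bool // whether the field counts as initialized
}

// validateField checks one field in the toy format: a length byte followed
// by that many payload bytes. An empty payload counts as uninitialized.
func validateField(b []byte) (unmarshalOutput, validationStatus) {
	if len(b) == 0 {
		return unmarshalOutput{}, validationInvalid
	}
	size := int(b[0])
	if len(b) < 1+size {
		return unmarshalOutput{}, validationInvalid
	}
	return unmarshalOutput{n: 1 + size, initialized: size > 0}, validationValid
}

// skipFields walks consecutive fields using the size the validator returned,
// so no field is parsed a second time just to find where it ends.
func skipFields(b []byte) (count int, ok bool) {
	for len(b) > 0 {
		out, st := validateField(b)
		if st != validationValid {
			return count, false
		}
		b = b[out.n:]
		count++
	}
	return count, true
}

func main() {
	buf := []byte{2, 'h', 'i', 0, 1, 'x'}
	out, st := validateField(buf)
	fmt.Println(out.n, out.initialized, st == validationValid)
	fmt.Println(skipFields(buf))
}
```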
Change-Id: I3e684c45d15aa1992d8dc3bde0f608880d34a94b
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/217763
Reviewed-by: Joe Tsai <joetsai@google.com>
Add a place to put microbenchmarks used to justify performance-related changes.
Change-Id: I6e90a3500594b3f6297cee0b8e321a50d0a556ca
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/216480
Reviewed-by: Joe Tsai <joetsai@google.com>