[eigen-commits] commit/eigen: sameeragarwal: Speed up row-major matrix-vector product on ARM

To: eigen-commits@xxxxxxxxxxxxxxxxxxx
Subject: [eigen-commits] commit/eigen: sameeragarwal: Speed up row-major matrix-vector product on ARM
From: Bitbucket <commits-noreply@xxxxxxxxxxxxx>
Date: Sat, 02 Feb 2019 00:14:21 +0000 (UTC)

1 new commit in eigen:

https://bitbucket.org/eigen/eigen/commits/b0947001de65/
Changeset:   b0947001de65
User:        sameeragarwal
Date:        2019-02-01 23:23:53+00:00
Summary:     Speed up row-major matrix-vector product on ARM

The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.

This change introduces two modifications to this logic.

1. A smaller threshold for ARM and ARM64 devices.

The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on
big and little cores with frequency pinning.

On big (out-of-order) cores, this change has little to no impact. On
the small (in-order) cores, however, the matrix-vector products are up
to 700% faster, especially on large matrices.

The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.

With the current change, Eigen handily beats that code.

2. Make the logic for choosing the number of simultaneous rows apply
uniformly to 8, 4 and 2 rows instead of just 8 rows.

Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
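
The row-selection logic described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code
from the changeset: the helper rows_per_pass, its parameters and the
strict "greater than" comparison are all hypothetical, and it follows
one plausible reading of item 2, namely that a single stride test gates
the 8-, 4- and 2-row paths alike. Only the 32 * 1024 default comes from
the message above. The smaller ARM threshold is not stated there, so it
is left as a parameter.

```cpp
#include <cstddef>

// Default (non-ARM) threshold in bytes, taken from the commit message.
// The ARM/ARM64 value is not given there, so callers pass it explicitly.
const std::size_t kDefaultThresholdBytes = 32 * 1024;

// Hypothetical helper, not part of Eigen's API.
// Returns how many rows of a row-major matrix to process in one pass.
//   rows_left       rows that still need processing
//   stride_bytes    distance between consecutive rows, in bytes
//   threshold_bytes stride above which multi-row blocks are assumed
//                   to thrash the cache
std::size_t rows_per_pass(std::size_t rows_left,
                          std::size_t stride_bytes,
                          std::size_t threshold_bytes)
{
    if (rows_left == 0) return 0;

    // Assumption: the same test gates 8, 4 and 2 rows, instead of
    // gating only the 8-row path.
    const bool allow_blocks = stride_bytes <= threshold_bytes;
    if (allow_blocks) {
        if (rows_left >= 8) return 8;
        if (rows_left >= 4) return 4;
        if (rows_left >= 2) return 2;
    }
    return 1;  // fall back to one row at a time
}
```

For a matrix of doubles with 4096 columns the stride is 4096 * 8 = 32768
bytes, so rows_per_pass(16, 32768, kDefaultThresholdBytes) picks 8 rows
under this sketch, whereas a 5000-column matrix (40000 bytes per row)
falls back to 1. Passing a smaller threshold, as an ARM build would,
makes the fallback kick in for narrower matrices.
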
Affected #:  1 file

Repository URL: https://bitbucket.org/eigen/eigen/
