Re: [eigen] Help on solving a race condition
• To: eigen@xxxxxxxxxxxxxxxxxxx
• Subject: Re: [eigen] Help on solving a race condition
• From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
• Date: Fri, 8 Jun 2012 17:22:41 +0200

```
Ok, after some benchmarking, a critical section is indeed a no-go:
a 10x slowdown.

On the other hand, it seems that the simple solution of making them

has nearly no impact on the performance. Compared to the cost of
creating a thread, querying the cache size is a no-op. So I will
probably go with that simple solution.

thanks for all the suggestions!

gael.

On Fri, Jun 8, 2012 at 4:57 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Fri, Jun 8, 2012 at 4:35 PM, Hauke Heibel
> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>
>> #pragma omp critical
>> {
>>  static std::ptrdiff_t m_l1CacheSize =
>>    manage_caching_sizes_helper(queryL1CacheSize(), 8 * 1024);
>>  static std::ptrdiff_t m_l2CacheSize =
>>    manage_caching_sizes_helper(queryTopLevelCacheSize(), 1 * 1024 * 1024);
>> }
>
> Yes, that would work, but a mutex introduces too much overhead. This
> function is called for every matrix product!
>
> It seems OpenMP's support for atomics is not very good. I managed to
> make helgrind happy with the following:
>
>  static tbb::atomic<std::ptrdiff_t> m_l1CacheSize;
>  static tbb::atomic<std::ptrdiff_t> m_l2CacheSize;
>  if(!m_l1CacheSize)
>  {
>    std::ptrdiff_t l1 =
>      manage_caching_sizes_helper(queryL1CacheSize(), 8 * 1024);
>    m_l1CacheSize.fetch_and_store(l1);
>  }
>  if(!m_l2CacheSize)
>  {
>    std::ptrdiff_t l2 =
>      manage_caching_sizes_helper(queryTopLevelCacheSize(), 1 * 1024 * 1024);
>    m_l2CacheSize.fetch_and_store(l2);
>  }
>
> but 1) this requires Intel's TBB, 2) I still have to measure the