Re: [eigen] Help on solving a race condition
- To: eigen@xxxxxxxxxxxxxxxxxxx
 
- Subject: Re: [eigen] Help on solving a race condition
 
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
 
- Date: Fri, 8 Jun 2012 17:22:41 +0200
 
 
OK, after some benchmarking, a critical section is indeed a no-go: it
causes a 10x slowdown.
On the other hand, the simple solution of making the two static variables
thread-private:
#pragma omp threadprivate(m_l1CacheSize,m_l2CacheSize)
has almost no impact on performance: each thread queries the cache sizes
once, and compared to the cost of creating a thread, that query is
negligible. So I will probably go with that simple solution.
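
A minimal sketch of the thread-private approach, for illustration only. The
two query functions are hypothetical stand-ins that return fixed values, the
helper is assumed to fall back to a default when the query reports nothing,
and cacheSizes() is an invented wrapper name; none of this is the actual
Eigen code. Without OpenMP enabled the pragma is ignored and the code still
runs serially.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the real hardware queries (fixed values here).
std::ptrdiff_t queryL1CacheSize() { return 32 * 1024; }
std::ptrdiff_t queryTopLevelCacheSize() { return 2 * 1024 * 1024; }

// Assumed behaviour: use the queried size, or the fallback if the query failed.
std::ptrdiff_t manage_caching_sizes_helper(std::ptrdiff_t queried,
                                           std::ptrdiff_t fallback) {
  return queried > 0 ? queried : fallback;
}

// Invented wrapper: returns the cached sizes, querying them lazily.
void cacheSizes(std::ptrdiff_t& l1, std::ptrdiff_t& l2) {
  // Constant-initialized statics; 0 means "not queried yet".
  static std::ptrdiff_t m_l1CacheSize = 0;
  static std::ptrdiff_t m_l2CacheSize = 0;
  // Each OpenMP thread gets its own copy, so no thread ever writes a
  // variable that another thread reads: no race and no lock.
  #pragma omp threadprivate(m_l1CacheSize, m_l2CacheSize)

  if (m_l1CacheSize == 0) {
    // Runs once per thread instead of once per process.
    m_l1CacheSize = manage_caching_sizes_helper(queryL1CacheSize(), 8 * 1024);
    m_l2CacheSize =
        manage_caching_sizes_helper(queryTopLevelCacheSize(), 1 * 1024 * 1024);
  }
  assert(m_l1CacheSize > 0 && m_l2CacheSize > 0);
  l1 = m_l1CacheSize;
  l2 = m_l2CacheSize;
}
```

The trade-off is one redundant query per thread in exchange for having no
synchronization at all on the hot path.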
Thanks for all the suggestions!
gael.
On Fri, Jun 8, 2012 at 4:57 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Fri, Jun 8, 2012 at 4:35 PM, Hauke Heibel
> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>
>> #pragma omp critical
>> {
>>  static std::ptrdiff_t m_l1CacheSize =
>> manage_caching_sizes_helper(queryL1CacheSize(),8 * 1024);
>>  static std::ptrdiff_t m_l2CacheSize =
>> manage_caching_sizes_helper(queryTopLevelCacheSize(),1*1024*1024);
>> }
>
> Yes, that would work, but a mutex introduces too much overhead. This
> function is called for every matrix product!
>
> It seems OpenMP's support for atomics is not very good. I managed to
> make helgrind happy with the following:
>
>  static tbb::atomic<std::ptrdiff_t> m_l1CacheSize;
>  static tbb::atomic<std::ptrdiff_t> m_l2CacheSize;
>  if(!m_l1CacheSize)
>  {
>    std::ptrdiff_t l1 =
> manage_caching_sizes_helper(queryL1CacheSize(),8 * 1024);
>    m_l1CacheSize.fetch_and_store(l1);
>  }
>  if(!m_l2CacheSize)
>  {
>    std::ptrdiff_t l2 =
> manage_caching_sizes_helper(queryTopLevelCacheSize(),1*1024 * 1024);
>    m_l2CacheSize.fetch_and_store(l2);
>  }
>
> but 1) this requires Intel's TBB, and 2) I still have to measure the
> overhead of this solution.
>
>
> best,
> Gael.
>
> gael.
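
For comparison, the atomic lazy-initialization pattern quoted above can be
written without TBB if a C++11 compiler is available, which the thread does
not assume. This is only a sketch: queryL1CacheSizeStub(), queryCount and
cachedL1Size() are hypothetical names, and the stub returns a fixed value
instead of querying the hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Counts how often the (stubbed) query runs, for demonstration only.
std::atomic<int> queryCount(0);

// Hypothetical stand-in for the real L1 cache query.
std::ptrdiff_t queryL1CacheSizeStub() {
  queryCount.fetch_add(1);
  return 32 * 1024;
}

// Invented wrapper: returns the cached L1 size, querying it on first use.
std::ptrdiff_t cachedL1Size() {
  // Constant-initialized atomic; 0 means "not queried yet".
  static std::atomic<std::ptrdiff_t> cached(0);

  // Relaxed ordering is enough here: the stored value is the only data
  // being published, and every thread would compute the same value.
  std::ptrdiff_t v = cached.load(std::memory_order_relaxed);
  if (v == 0) {
    v = queryL1CacheSizeStub();
    if (v <= 0) v = 8 * 1024;  // same fallback value as in the thread
    cached.store(v, std::memory_order_relaxed);
  }
  assert(v > 0);
  return v;
}
```

Several threads may run the query concurrently on first use, but they all
store the same value, so the duplicated work is harmless. On common
platforms a relaxed atomic load of a pointer-sized integer compiles to a
plain load, though that would need measuring, as noted above.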