Re: [eigen] Help on solving a race condition
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Help on solving a race condition
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Fri, 8 Jun 2012 17:22:41 +0200
Ok, after running some benchmarks, a critical section is indeed a no-go:
10x slowdown.
On the other hand, it seems that the simple solution of making the two
static variables thread-private:
#pragma omp threadprivate(m_l1CacheSize,m_l2CacheSize)
has almost no impact on performance. Compared to the cost of
creating a thread, querying the cache size is a no-op. So I will
probably go with that simple solution.
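
For illustration, a minimal sketch of the thread-private approach. The query functions and their return values are hypothetical stand-ins for Eigen's internal helpers, and the statics are placed at file scope here for simplicity. Without OpenMP enabled, the pragma is ignored and the variables are ordinary statics.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the real cache queries; the values are placeholders.
static std::ptrdiff_t query_l1_cache_size() { return 32 * 1024; }
static std::ptrdiff_t query_top_level_cache_size() { return 0; }  // pretend detection failed

// Use the detected size if it is positive, otherwise the fallback.
static std::ptrdiff_t cache_size_or_default(std::ptrdiff_t detected,
                                            std::ptrdiff_t fallback) {
    return detected > 0 ? detected : fallback;
}

// With OpenMP enabled, each thread gets its own copy of these variables,
// so lazy initialisation never writes to memory shared between threads.
static std::ptrdiff_t m_l1CacheSize = 0;
static std::ptrdiff_t m_l2CacheSize = 0;
#pragma omp threadprivate(m_l1CacheSize, m_l2CacheSize)

std::ptrdiff_t l1_cache_size() {
    if (m_l1CacheSize == 0)  // first call on this thread
        m_l1CacheSize = cache_size_or_default(query_l1_cache_size(), 8 * 1024);
    return m_l1CacheSize;
}

std::ptrdiff_t l2_cache_size() {
    if (m_l2CacheSize == 0)  // first call on this thread
        m_l2CacheSize = cache_size_or_default(query_top_level_cache_size(),
                                              1 * 1024 * 1024);
    return m_l2CacheSize;
}
```

Each thread pays for the query once, on its first call; later calls on that thread only read its private copy.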
thanks for all the suggestions!
gael.
On Fri, Jun 8, 2012 at 4:57 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Fri, Jun 8, 2012 at 4:35 PM, Hauke Heibel
> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>
>> #pragma omp critical
>> {
>>   static std::ptrdiff_t m_l1CacheSize =
>>     manage_caching_sizes_helper(queryL1CacheSize(),8 * 1024);
>>   static std::ptrdiff_t m_l2CacheSize =
>>     manage_caching_sizes_helper(queryTopLevelCacheSize(),1*1024*1024);
>> }
>
> Yes, that would work, but a mutex introduces too much overhead. This
> function is called for every matrix product!
>
> It seems OpenMP's support for atomics is not very good. I managed to
> make helgrind happy with the following:
>
> static tbb::atomic<std::ptrdiff_t> m_l1CacheSize;
> static tbb::atomic<std::ptrdiff_t> m_l2CacheSize;
> if(!m_l1CacheSize)
> {
>   std::ptrdiff_t l1 =
>     manage_caching_sizes_helper(queryL1CacheSize(),8 * 1024);
>   m_l1CacheSize.fetch_and_store(l1);
> }
> if(!m_l2CacheSize)
> {
>   std::ptrdiff_t l2 =
>     manage_caching_sizes_helper(queryTopLevelCacheSize(),1*1024 * 1024);
>   m_l2CacheSize.fetch_and_store(l2);
> }
>
> but 1) this requires Intel's TBB, 2) I still have to measure the
> overhead of this solution.
>
>
> best,
> Gael.
>
> gael.
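
For comparison, the lazy initialisation quoted above can be sketched without TBB, assuming a C++11 compiler and using std::atomic in place of tbb::atomic. This is a swapped-in technique, not what the thread settled on. The detection function and variable names are hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for the real L1 query; returns 0 to mimic failed detection.
static std::ptrdiff_t detect_l1_cache_size() { return 0; }

// Shared between all threads; 0 means "not initialised yet".
static std::atomic<std::ptrdiff_t> g_l1CacheSize(0);

std::ptrdiff_t l1_cache_size_atomic() {
    std::ptrdiff_t size = g_l1CacheSize.load(std::memory_order_relaxed);
    if (size == 0) {
        // Several threads may get here at once. They all compute the same
        // value, so whichever store lands last changes nothing.
        std::ptrdiff_t detected = detect_l1_cache_size();
        size = detected > 0 ? detected : 8 * 1024;
        g_l1CacheSize.store(size, std::memory_order_relaxed);
    }
    return size;
}
```

Relaxed ordering is enough in this sketch because no other data is published through the variable; only the integer itself is read.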