Re: [eigen] Problem with ei_cache_friendly_product?
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Problem with ei_cache_friendly_product?
- From: "Benoit Jacob" <jacob.benoit.1@xxxxxxxxx>
- Date: Wed, 24 Dec 2008 15:03:44 +0100
Hi Frank,
You really found a new bug in Eigen. Please SVN up and retry, as I
think I fixed it (r901151)... but I can't test on Windows. Read on for
the explanation...
What happens is that ei_cache_friendly_product allocates some internal
buffer 'block' at line 98:
block = ei_alloc_stack(Scalar, allocBlockSize);
Here it is really important that 'block' be aligned, for at line 182 we have:
const Scalar* EIGEN_RESTRICT localB = &block[offsetblock];
and this 'localB' is the pointer that we use in this crashing ei_pload
call at line 203:
A0 = ei_pload(localB + k*MaxBlockRows);
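To make the alignment requirement concrete, here is a minimal sketch
(not Eigen code; the helper names are made up for illustration) of the
check you did by eye in your log. It assumes the 16-byte alignment
that your "ends with 0" observation implies for the packet load:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Eigen: true if 'addr' is a multiple of
// 'alignment', which must be a power of two.
inline bool is_aligned_address(std::uintptr_t addr, std::size_t alignment = 16) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // For a power-of-two alignment, the low bits must all be zero.
    return (addr & (alignment - 1)) == 0;
}

// Same check for a pointer, e.g. the 'localB + k*MaxBlockRows' expression.
inline bool is_aligned_pointer(const void* ptr, std::size_t alignment = 16) {
    return is_aligned_address(reinterpret_cast<std::uintptr_t>(ptr), alignment);
}
```

With the addresses from your log, is_aligned_address(0x003214D0) is
true, while is_aligned_address(0x00126FE8) is false.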
Now, what is ei_alloc_stack? On Linux it is a wrapper around alloca(),
which returns strongly aligned pointers (typically aligned to 4096
bytes), but on all other platforms we don't assume the presence of a
usable alloca(), so... looking at util/Memory.h line 155:
#else
#define ei_alloc_stack(TYPE,SIZE) new TYPE[SIZE]
#define ei_free_stack(PTR,TYPE,SIZE) delete[] PTR
#endif
That's the bug, obviously: operator new gives no guarantee that the
returned pointer is 16-byte aligned, so we must use ei_aligned_malloc
there instead.
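For illustration, one common way to get an aligned heap buffer out of
a plain malloc() is to over-allocate, round the pointer up, and save
the original pointer just before the aligned block so that it can be
freed later. The following is only a sketch under that assumption:
sketch_aligned_malloc and sketch_aligned_free are hypothetical names,
and this is not necessarily how ei_aligned_malloc is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, not Eigen's ei_aligned_malloc: returns 'size' bytes
// whose address is a multiple of 'alignment' (a power of two, at least
// sizeof(void*)), or a null pointer if the allocation fails.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment = 16) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    // Room for the payload, worst-case padding, and the saved original pointer.
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;
    // Leave space for one pointer, then round up to the next aligned address.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;
    // Remember what malloc() really returned, just before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Releases a block obtained from sketch_aligned_malloc.
inline void sketch_aligned_free(void* ptr) {
    if (ptr != nullptr) std::free(static_cast<void**>(ptr)[-1]);
}
```

Memory obtained this way has to be released with the matching free
function, not with delete[], so ei_free_stack would have to change
together with ei_alloc_stack.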
As to why it happens equally with fixed-size or dynamic-size
matrices: all that matters is the size of the matrix. When it is large
enough, ei_cache_friendly_product is called, and it always uses
ei_alloc_stack.
As to why it sometimes seems to depend on the environment: the memory
allocation routines are provided by dynamic libraries, and it simply
seems that on your Windows setup the dynamic library that gets used
differs depending on whether you run your program from MSVC or from
the command line.
Cheers,
Benoit
2008/12/24 FMDSPAM <fmdspam@xxxxxxxxx>:
> Hi Benoit,
>
> The situation is even weirder. The crash depends not only on the code
> itself and on whether it is a release or debug build.
> It also depends on the calling environment.
> Started from within MSVC, some tests run through without problems but
> crash when started from the command prompt.
> See my log below.
>
> A) Original testcase "Matrix<double,16,16>": started in MSVC or from command
> prompt:
> CacheFriendlyProduct.h:
> 202: PacketType A0, A1, A2, A3, A4, A5;
> +++: std::cout << "k: " << k << ", addr: " << localB + k*MaxBlockRows <<
> std::endl;
> 203: A0 = ei_pload(localB + k*MaxBlockRows);
>
> gives me
> at first (caused by "line b)" of my testcase):
> k: 0, addr: 003214D0
> ... (many lines skipped, see out.txt)
> k: 14, addr: 00321C50
> (all end with 0 => no alignment issue)
>
> but then (caused by "line d)" of my testcase):
> k: 0, addr: 00126FE8
> (does not end with 0 => alignment issue)
>
> B.1) Your suggested test case: "MatrixXd" started from MSVC:
> everything is fine, no crash at all.
>
> B.2) Your suggested test case: "MatrixXd" started from command prompt:
> gives me: "k: 0, addr: 00187F58" at the very first time ( "line b)" )
> and consequently crashes.
>
> Unfortunately, most of the inner details of your code are beyond my
> knowledge, but I will run test cases whenever you tell me to :-) .
>
> Maybe something in my setup is broken?
> Could another MSVC user please run my testcase?
>
> Cheers and merry christmas, Joyeux Noël
> Frank
>
>
> k: 0, addr: 003214D0
> k: 2, addr: 00321550
> k: 4, addr: 003215D0
> k: 6, addr: 00321650
> k: 8, addr: 003216D0
> k: 10, addr: 00321750
> k: 12, addr: 003217D0
> k: 14, addr: 00321850
> ... (the 8 lines above repeat 15 more times)
> k: 0, addr: 003218D0
> k: 2, addr: 00321950
> k: 4, addr: 003219D0
> k: 6, addr: 00321A50
> k: 8, addr: 00321AD0
> k: 10, addr: 00321B50
> k: 12, addr: 00321BD0
> k: 14, addr: 00321C50
> ... (the 8 lines above repeat 15 more times)
> 10.92 25.29 40.66 56.02 71.39 86.76 102.1 117.5 132.9 148.2 163.6 179 194.3
> 209.7 225.1 240.4
> 9.981 26.48 40.97 56.47 71.96 87.46 103 118.5 133.9 149.4 164.9 180.4 195.9
> 211.4 226.9 242.4
> 10.04 25.66 42.29 56.91 72.54 88.16 103.8 119.4 135 150.7 166.3 181.9 197.5
> 213.2 228.8 244.4
> 10.1 25.85 41.61 58.36 73.11 88.86 104.6 120.4 136.1 151.9 167.6 183.4 199.1
> 214.9 230.6 246.4
> 10.16 26.04 41.92 57.8 74.68 89.56 105.4 121.3 137.2 153.1 169 184.8 200.7
> 216.6 232.5 248.4
> 10.22 26.23 42.24 58.25 74.25 91.26 106.3 122.3 138.3 154.3 170.3 186.3
> 202.3 218.3 234.3 250.3
> 10.28 26.42 42.56 58.69 74.83 90.96 108.1 123.2 139.4 155.5 171.6 187.8
> 203.9 220.1 236.2 252.3
> 10.34 26.61 42.87 59.14 75.4 91.66 107.9 125.2 140.5 156.7 173 189.2 205.5
> 221.8 238 254.3
> 10.4 26.8 43.19 59.58 75.97 92.36 108.8 125.1 142.5 157.9 174.3 190.7 207.1
> 223.5 239.9 256.3
> 10.46 26.98 43.5 60.02 76.54 93.06 109.6 126.1 142.6 160.1 175.7 192.2 208.7
> 225.2 241.7 258.3
> 10.53 27.17 43.82 60.47 77.12 93.77 110.4 127.1 143.7 160.4 178 193.7 210.3
> 226.9 243.6 260.2
> 10.59 27.36 44.14 60.91 77.69 94.47 111.2 128 144.8 161.6 178.3 196.1 211.9
> 228.7 245.4 262.2
> 10.65 27.55 44.45 61.36 78.26 95.17 112.1 129 145.9 162.8 179.7 196.6 214.5
> 230.4 247.3 264.2
> 10.71 27.74 44.77 61.8 78.83 95.87 112.9 129.9 147 164 181 198.1 215.1 233.1
> 249.2 266.2
> 10.77 27.93 45.09 62.25 79.41 96.57 113.7 130.9 148 165.2 182.4 199.5 216.7
> 233.8 252 268.2
> 10.83 28.12 45.4 62.69 79.98 97.27 114.6 131.8 149.1 166.4 183.7 201 218.3
> 235.6 252.9 271.1
> k: 0, addr: 00126FE8
>
>