I've got a quick question that's been bugging me for a long, long time: Let's say I've got a member variable (in a heavily threaded app): private int _firstTime = 0; To make changes to this is easy enough: InterlockedExchange(ref _firstTime, 1); .... or I can use the InterlockedCompareExchange: int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0); No problems here. .... but, if I just want to read the value, I CANNOT do: int myValue = _firstTime; or if (_fistTime==0) Doing these requires a memory barrier of some type (volatile variable, monitor, etc) to have any degree of reliability. Now, reading the value out using InterlockedCompareExchange always seemed a bit silly to me. I don't want to always write: int myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue, Int32.MinValue); I've never found (although not for lack of looking): int myValue = InterlockedRead(ref _firstTime); Of the code that I write, reading the value, and branching based on it's value, is much more common than writing to the value. I have, in the past, used volitile variables for this - but I've stopped due to the various issues regarding volatiles (non atomic reads & writes). Is there a concensus on the best way to do this? It seems like all the solutions are.... incomplete. Essentially I'm looking for a volitile varaiable that is atomically read and written. Unfortuantly, these don't exist in any language I've yet used. -- Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP http://www.coversant.com/blogs/cmullins
On Mon, 30 Apr 2007 13:13:47 -0700, Chris Mullins [MVP] [quoted text, click to view] <cmullins@yahoo.com> wrote: > [...] > Of the code that I write, reading the value, and branching based on it's > value, is much more common than writing to the value. > > I have, in the past, used volitile variables for this - but I've stopped > due to the various issues regarding volatiles (non atomic reads & > writes).
Jon's reply matches what I believed to be true. Can you explain why it is you believe that a non-atomic read or write is possible for a 32-bit value? If that's true, it's a pretty significant error in the way I've approached multi-threaded programming. Which is not out of the question by any means, but if that's the case I really want to know about it and know why it's the case.
[quoted text, click to view] Chris Mullins [MVP] <cmullins@yahoo.com> wrote:
<snip> [quoted text, click to view] > Essentially I'm looking for a volitile varaiable that is atomically > read and written. Unfortuantly, these don't exist in any language > I've yet used.
I'm not sure what you're after that a simple volatile *wouldn't* give you here, given that *any* Int32 variable will be accessed atomically, whether it's volatile or not. You can't do atomic increments without something like Interlocked, but simple reads and writes are atomic. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] Chris Mullins [MVP] wrote: > I've got a quick question that's been bugging me for a long, long time: > > Let's say I've got a member variable (in a heavily threaded app): > private int _firstTime = 0; > Doing these requires a memory barrier of some type (volatile variable, > monitor, etc) to have any degree of reliability.
Yes. [quoted text, click to view] > I've never found (although not for lack of looking): > int myValue = InterlockedRead(ref _firstTime);
Perhaps Thread.VolatileRead() is what you're looking for? [quoted text, click to view] > I have, in the past, used volitile variables for this - but I've stopped due > to the various issues regarding volatiles (non atomic reads & writes).
Can you expand on your objection to volatile variables? [quoted text, click to view] > Is there a concensus on the best way to do this? It seems like all the > solutions are.... incomplete. > > Essentially I'm looking for a volitile varaiable that is atomically read and > written. Unfortuantly, these don't exist in any language I've yet used.
I'm not aware of any circumstances where normally aligned volatile variables of the machine word size or smaller are read or written non-atomically on the CLR. Can you explain more? -- Barry --
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > | Doing these requires a memory barrier of some type (volatile variable, > | monitor, etc) to have any degree of reliability. > > you can. The Interlocked write ensures this across all cpus and cache. Any > thread reading the var after the Interlocked write will see the correct var > so it is a memory barrier without a volatile modifier.
But surely that's only guaranteed if the read *actually* happens after the Interlocked write occurs. If the variable isn't volatile, there's nothing to stop the reading thread from having reordered the reads. For instance: int x=0, y=0; .... Thread 1: Interlocked.Increment (ref x, 1); Interlocked.Increment (ref y, 1); Thread 2: int a = x; int b = y; The JIT *could* reorder the reads of x and y in thread 2, such that you get a=0, b=1, despite the fact that x is incremented before y. That wouldn't be the case with volatile variables. At least, that's how I understand it. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] "Chris Mullins [MVP]" <cmullins@yahoo.com> wrote > I've got a quick question that's been bugging me for a long, long time: > > Let's say I've got a member variable (in a heavily threaded app): > private int _firstTime = 0; > > To make changes to this is easy enough: > InterlockedExchange(ref _firstTime, 1); > > ... or I can use the InterlockedCompareExchange: > int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0); > > No problems here. > > ... but, if I just want to read the value, I CANNOT do: > > int myValue = _firstTime; > or > if (_fistTime==0) >
Why? [quoted text, click to view] > Doing these requires a memory barrier of some type (volatile variable, > monitor, etc) to have any degree of reliability. >
I don't see the need for a memory barrier, unless you need memory ordering guarantees. Volatile access would make sense since you really want to read from memory. However, this only affects how the Jitter generates code. [quoted text, click to view] > Now, reading the value out using InterlockedCompareExchange always seemed > a bit silly to me. I don't want to always write: > > int myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue, > Int32.MinValue);
This looks quite pointless. If you don't need memory ordering guarantees a volatile read is good enough. I don't find the spec now, but I suspect InterlockedCompareExchange has probably both acquire and release semantics. [quoted text, click to view] > > I've never found (although not for lack of looking): > int myValue = InterlockedRead(ref _firstTime); >
This is just a memory fence plus a volatile read. And it's not clear that you actually need the former. [quoted text, click to view] > Of the code that I write, reading the value, and branching based on it's > value, is much more common than writing to the value. >
Again volatile read is what you want. [quoted text, click to view] > I have, in the past, used volitile variables for this - but I've stopped > due to the various issues regarding volatiles (non atomic reads & writes). >
What makes you believe that either reads or writes are non-atomic? They certainly are at CLR instruction level. [quoted text, click to view] > Is there a concensus on the best way to do this? It seems like all the > solutions are.... incomplete.
For a one-time init thing you will probably need atomic compare exchange for acquiring the construction lock and release semantics when signaling successful construction. If you need to wait for construction you'd probably have three states: - uninitialized - under_construction - constructed Checking whether the shared object is constructed is done by a volatile read from the guard variable. If the result is "constructed" you're done. Just a few cycles - maybe satisified from cache. If not, use atomic compare exchange to acquire the "under-construction" lock. If an other thread was faster wait. If not construct object and use store with release semantics to signal object constructed. I'm not sure if there is a really efficient Wait method (in this particular context) in the CLR. Anyway, the standard case (object already constructed) is very fast. [quoted text, click to view] > > Essentially I'm looking for a volitile varaiable that is atomically read > and written. Unfortuantly, these don't exist in any language I've yet > used. >
I think the CLI standard requires that to a certain extent. That said, the text is vague. Bottom line I think you can rely on reads or writes being volatile, but not more complex operations that both read and write, e.g.: volatile int i; do_something_with(i); // ok i = some_value(); // ok i++; // bad no guarantees here But one might argue that this is really a language thing. -hg
| To make changes to this is easy enough: | InterlockedExchange(ref _firstTime, 1); | | ... or I can use the InterlockedCompareExchange: | int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0); | | No problems here. | | ... but, if I just want to read the value, I CANNOT do: | | int myValue = _firstTime; | or | if (_fistTime==0) | | Doing these requires a memory barrier of some type (volatile variable, | monitor, etc) to have any degree of reliability. you can. The Interlocked write ensures this across all cpus and cache. Any thread reading the var after the Interlocked write will see the correct var so it is a memory barrier without a volatile modifier. All locks are ultimatley built on interlocked at the lowest level to get this behavior. So for reads, you can just go "if (var == 2){}". Naturally, this works for a single var. If you have multiple invariants you need to protect, you probabaly need a lock.
| But surely that's only guaranteed if the read *actually* happens after | the Interlocked write occurs. If the variable isn't volatile, there's | nothing to stop the reading thread from having reordered the reads. For | instance: Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a reader will never see x==0 (until it wraps). So a read on x or y is volatile using an interlocked. Naturally, in either method, a reader could read x and y after x was incremented, but before y was incremented because there is no lock here. But this is a different problem (i.e race). If we need both vars to be atomic in respect to each other and handled as an invariant "set", then we need a lock of some type protecting the set. As I said, maybe I am not following your intent and will slap my head. --wjs
So are you saying the JIT can reorder these two lines: int a = x; int b = y; Like so: int b = y; int a = x; ? tia -- William Stacey [C# MVP] [quoted text, click to view] "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message news:MPG.20a86b07bb54e61e7d@msnews.microsoft.com... | William Stacey [C# MVP] <william.stacey@gmail.com> wrote:
| > | But surely that's only guaranteed if the read *actually* happens after | > | the Interlocked write occurs. If the variable isn't volatile, there's | > | nothing to stop the reading thread from having reordered the reads. For | > | instance: | > | > Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a | > reader will never see x==0 (until it wraps). So a read on x or y is volatile | > using an interlocked. Naturally, in either method, a reader could read x and | > y after x was incremented, but before y was incremented because there is no | > lock here. | | I still believe that it can read y before either of them are | incremented, and x after both of them are incremented. There's nothing | to stop the JIT from reordering the reads to read y before x, because | they're not volatile. I wouldn't be surprised to learn that the | interlocked operation makes sure that when it *does* try to read x and | y, those values are really the latest ones, but I don't think there's | any guarantee about the order of those reads unless at least one of | them is volatile. (It's nearly midnight, so I can't be bothered to work | out which it should be - y, I think.) | | I definitely *wasn't* talking about the atomicity of the two operations | - just the ordering. | | -- | Jon Skeet - <skeet@pobox.com> | http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet | If replying to the group, please do not mail me too
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > | But surely that's only guaranteed if the read *actually* happens after > | the Interlocked write occurs. If the variable isn't volatile, there's > | nothing to stop the reading thread from having reordered the reads. For > | instance: > > Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a > reader will never see x==0 (until it wraps). So a read on x or y is volatile > using an interlocked. Naturally, in either method, a reader could read x and > y after x was incremented, but before y was incremented because there is no > lock here.
I still believe that it can read y before either of them are incremented, and x after both of them are incremented. There's nothing to stop the JIT from reordering the reads to read y before x, because they're not volatile. I wouldn't be surprised to learn that the interlocked operation makes sure that when it *does* try to read x and y, those values are really the latest ones, but I don't think there's any guarantee about the order of those reads unless at least one of them is volatile. (It's nearly midnight, so I can't be bothered to work out which it should be - y, I think.) I definitely *wasn't* talking about the atomicity of the two operations - just the ordering. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] William Stacey [C# MVP] wrote: > So are you saying the JIT can reorder these two lines: > int a = x; > int b = y; > > Like so: > int b = y; > int a = x;
Yes, if x and y are not volatile locations. The language of the spec is that reads and writes to volatile locations are the only observable side effects of memory access; memory accesses may be arbitrarily reordered if they aren't volatile. For example, if the value of y has been loaded to a register because it was used somewhere before the line 'int a = x;', then the value of y used in the line 'int b = y;' may predate the actual in-memory value of x, because the value may be used from the register, rather than freshly loading it from memory. So, volatile can inhibit these kinds of optimizations (avoiding redundant loads). There's more to it than that, volatile reads have acquire semantics, volatile writes release semantics. That has stronger guarantees than just inhibiting optimization, as it affects e.g. write buffer reordering in some processors. -- Barry --
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > So are you saying the JIT can reorder these two lines: > int a = x; > int b = y; > > Like so: > int b = y; > int a = x;
Exactly, so long as there aren't any memory barriers in the way. Memory barriers (which include operations with volatile locations) stop the JIT from moving memory operations beyond the barrier. It's unfortunate that the CLI spec isn't nearly as clear as it might be about all this, but Vance Morrison wrote a great MSDN article about the ..NET 2.0 model (which is stricter than the CLI spec): http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/ -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] William Stacey [C# MVP] wrote: > Here is some more food for the party: > http://msdn2.microsoft.com/en-us/library/ms686355.aspx > > Here he states the compiler will not re-order volatile variable access, > However these operations still *could be re-ordered by the processor. Volatile read has acquire semantics. Loads won't move before an acquire, similarly, writes won't move after a release. These should be implemented by CPU instructions, as explained in the article you linked to, and aren't restricted to just compiler optimizations. This means that volatile read on .NET is sufficient, because the CPU won't reorder such that the second load moves before the first load, because of the acquire semantics (aka read fence). [quoted text, click to view] > In > fact, he shows correcting this issue with an Interlocked operation. This > article also state that Interlocked functions *ensure appropriat barriers > for memory ordering.
The have full fence semantics (except the Interlocked*Acquire and Interlocked*Release). [quoted text, click to view] > So (at least for this example) it would seem > Interlocked provides as good or better symatics as the volatile example (and > also prevents CPU re-order).
Volatile also prevents CPU re-order, but asymmetrically: read causes acquire, write causes release. The write side release means that e.g. "publishing" a new instance to a static volatile variable won't have a problem such that some fields in the instance aren't fully retired before other threads could read the instance reference - writes won't move ahead of the release / write fence. -- Barry --
Here is some more food for the party: http://msdn2.microsoft.com/en-us/library/ms686355.aspx Here he states the compiler will not re-order volatile variable access, However these operations still *could be re-ordered by the processor. In fact, he shows correcting this issue with an Interlocked operation. This article also state that Interlocked functions *ensure appropriat barriers for memory ordering. So (at least for this example) it would seem Interlocked provides as good or better symatics as the volatile example (and also prevents CPU re-order). However, I am not sure any this matters for this example, as in both cases you still have the race issue. R1 can read x, then W1 writes x and y, then R1 reads y - or various combinations of same. So some form of critical section around both x and y would seem to be the proper behavior, unless we don't care about x or y in respect to each other (i.e. simple counters that are not treated as a set). -- William Stacey [C# MVP] PCR concurrency library: www.codeplex.com/pcr PSH Scripts Project www.codeplex.com/psobject [quoted text, click to view] "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message news:MPG.20a8e1115b0c872180@msnews.microsoft.com... | William Stacey [C# MVP] <william.stacey@gmail.com> wrote:
| > So are you saying the JIT can reorder these two lines: | > int a = x; | > int b = y; | > | > Like so: | > int b = y; | > int a = x; | | Exactly, so long as there aren't any memory barriers in the way. Memory | barriers (which include operations with volatile locations) stop the | JIT from moving memory operations beyond the barrier. | | It's unfortunate that the CLI spec isn't nearly as clear as it might be | about all this, but Vance Morrison wrote a great MSDN article about the | .NET 2.0 model (which is stricter than the CLI spec): | http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/ | | -- | Jon Skeet - <skeet@pobox.com> | http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet | If replying to the group, please do not mail me too
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > Here is some more food for the party: > http://msdn2.microsoft.com/en-us/library/ms686355.aspx > > Here he states the compiler will not re-order volatile variable access, > However these operations still *could be re-ordered by the processor. Hmm. That sounds like it's a broken implementation then. Basically, if the .NET CLR executes code in a way which violates the spec, the implementation is flawed. If the processor could reorder things, the CLR should make sure that that is invisible to the user, beyond what is possible within the CLI spec. I'd be interested to hear Joe Duffy's opinion on that part of the article. [quoted text, click to view] > In fact, he shows correcting this issue with an Interlocked operation. This > article also state that Interlocked functions *ensure appropriat barriers > for memory ordering. So (at least for this example) it would seem > Interlocked provides as good or better symatics as the volatile example (and > also prevents CPU re-order).
No - because only the thread which *calls* the Interlocked method knows that interlocked is involved, whereas with volatile both the reading thread *and* the writing thread know to use memory barriers. Of course, if you use Interlocked in both threads, to both read *and* write the value, then everything will be okay. [quoted text, click to view] > However, I am not sure any this matters for > this example, as in both cases you still have the race issue. R1 can read > x, then W1 writes x and y, then R1 reads y - or various combinations of > same. So some form of critical section around both x and y would seem to be > the proper behavior, unless we don't care about x or y in respect to each > other (i.e. simple counters that are not treated as a set).
If we care about atomicity, there needs to be some kind of locking. If we only care that the change to y is seen after the change to x (i.e. you can't see x=0, y=1) then volatile will do the job but I don't believe that changing the variable with Interlocked and then reading it directly in another thread is guaranteed to work. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] "Barry Kelly" <barry.j.kelly@gmail.com> wrote > There's more to it than that, volatile reads have acquire semantics, > volatile writes release semantics. That has stronger guarantees than > just inhibiting optimization, as it affects e.g. write buffer reordering > in some processors.
I see that in CLI spec, but have you actually verified it? It is not what I would expect and not what I have seen. Oddly, I see that the Jitter even removes dead loads altogher. Anyway, given the rationale in the CLI spec (hardware register access), I don't see how this could possibly work in a reasonably cheap way for the P4 memory model. -hg
| Hmm. That sounds like it's a broken implementation then. Basically, if | the .NET CLR executes code in a way which violates the spec, the | implementation is flawed. If the processor could reorder things, the | CLR should make sure that that is invisible to the user, beyond what is | possible within the CLI spec. | | I'd be interested to hear Joe Duffy's opinion on that part of the | article. I would like to see Joe comment on that as well. If that article is correct, it sounds like "volatile" may not work as people expect it to be working. | No - because only the thread which *calls* the Interlocked method knows | that interlocked is involved, whereas with volatile both the reading | thread *and* the writing thread know to use memory barriers. Of course, | if you use Interlocked in both threads, to both read *and* write the | value, then everything will be okay. The reader does not need to know as it gets the read fence automatically (at the hardware layer) because the Interlocked buts up a *full memory barrier fence so reads can not be reordered at the instruction level. vista has some other versions with seperate read aquire/write aquire methods, but the ones in the framework use the full fence. I am not sure how volatile is implemented under the covers, but it would not suprise me if it also uses an interlocked operation. | If we care about atomicity, there needs to be some kind of locking. | If we only care that the change to y is seen after the change to x | (i.e. you can't see x=0, y=1) then volatile will do the job but I don't | believe that changing the variable with Interlocked and then reading it | directly in another thread is guaranteed to work. hmm. We have seem to cover the same ground. Everything I have seen (more then that link) says it does. I would be interested if you find something to contrary as that would be good info. Also, most (if not all) locks are based on Interlocked in their guts, so if it does not work, then everything in the history of the world is broke. Ok, that is bit drastic... tia -- William Stacey [C# MVP]
Thanks Barry. My assertion has been that Interlocked works here, not that volatile does not. One thing outstanding, however, is that little "blurb" about the cpu being able to reordering reads even with volatile. That is a bit unsettling. -- William Stacey [C# MVP] [quoted text, click to view] "Barry Kelly" <barry.j.kelly@gmail.com> wrote in message news:t4nu339s34tpgovpblna8ilkf2rb2r194e@4ax.com... | William Stacey [C# MVP] wrote:
| | > Here is some more food for the party: | > http://msdn2.microsoft.com/en-us/library/ms686355.aspx | > | > Here he states the compiler will not re-order volatile variable access, | > However these operations still *could be re-ordered by the processor. | | Volatile read has acquire semantics. Loads won't move before an acquire, | similarly, writes won't move after a release. These should be | implemented by CPU instructions, as explained in the article you linked | to, and aren't restricted to just compiler optimizations. This means | that volatile read on .NET is sufficient, because the CPU won't reorder | such that the second load moves before the first load, because of the | acquire semantics (aka read fence). | | > In | > fact, he shows correcting this issue with an Interlocked operation. This | > article also state that Interlocked functions *ensure appropriat barriers | > for memory ordering. | | The have full fence semantics (except the Interlocked*Acquire and | Interlocked*Release). | | > So (at least for this example) it would seem | > Interlocked provides as good or better symatics as the volatile example (and | > also prevents CPU re-order). | | Volatile also prevents CPU re-order, but asymmetrically: read causes | acquire, write causes release. The write side release means that e.g. | "publishing" a new instance to a static volatile variable won't have a | problem such that some fields in the instance aren't fully retired | before other threads could read the instance reference - writes won't | move ahead of the release / write fence. | | -- Barry | | -- | http://barrkel.blogspot.com/
After re-reading, I think I found it. He was talking about VS2005 volatile -vs- VS2003 volatile - they seem to have implemented them better in 2005 using proper acquire and release semantics. That is info I never saw before. The volatile code he shows works correctly with vs2005 compiler, but may not with 2003 compiler, hence the alternative Interlocked example given. Makes more sense now. -- William Stacey [C# MVP] [quoted text, click to view] "William Stacey [C# MVP]" <william.stacey@gmail.com> wrote in message news:epyqE0QkHHA.4516@TK2MSFTNGP03.phx.gbl...
| Thanks Barry. My assertion has been that Interlocked works here, not that | volatile does not. One thing outstanding, however, is that little "blurb" | about the cpu being able to reordering reads even with volatile. That is a | bit unsettling. | | -- | William Stacey [C# MVP] | | [quoted text, click to view] | "Barry Kelly" <barry.j.kelly@gmail.com> wrote in message | news:t4nu339s34tpgovpblna8ilkf2rb2r194e@4ax.com... || William Stacey [C# MVP] wrote:
|| || > Here is some more food for the party: || > http://msdn2.microsoft.com/en-us/library/ms686355.aspx || > || > Here he states the compiler will not re-order volatile variable access, || > However these operations still *could be re-ordered by the processor. || || Volatile read has acquire semantics. Loads won't move before an acquire, || similarly, writes won't move after a release. These should be || implemented by CPU instructions, as explained in the article you linked || to, and aren't restricted to just compiler optimizations. This means || that volatile read on .NET is sufficient, because the CPU won't reorder || such that the second load moves before the first load, because of the || acquire semantics (aka read fence). || || > In || > fact, he shows correcting this issue with an Interlocked operation. | This || > article also state that Interlocked functions *ensure appropriat | barriers || > for memory ordering. || || The have full fence semantics (except the Interlocked*Acquire and || Interlocked*Release). || || > So (at least for this example) it would seem || > Interlocked provides as good or better symatics as the volatile example | (and || > also prevents CPU re-order). || || Volatile also prevents CPU re-order, but asymmetrically: read causes || acquire, write causes release. The write side release means that e.g. || "publishing" a new instance to a static volatile variable won't have a || problem such that some fields in the instance aren't fully retired || before other threads could read the instance reference - writes won't || move ahead of the release / write fence. || || -- Barry || || -- || http://barrkel.blogspot.com/ | |
|| I'd be interested to hear Joe Duffy's opinion on that part of the || article. I posted correction in thread above. It seems the issue with volatile was with vs2003, not vs2005. --wjs
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > > | Hmm. That sounds like it's a broken implementation then. Basically, if > | the .NET CLR executes code in a way which violates the spec, the > | implementation is flawed. If the processor could reorder things, the > | CLR should make sure that that is invisible to the user, beyond what is > | possible within the CLI spec. > | > | I'd be interested to hear Joe Duffy's opinion on that part of the > | article. > > I would like to see Joe comment on that as well. If that article is > correct, it sounds like "volatile" may not work as people expect it to be > working.
Well, not just how people expect it to work, but how it's *specified* to work. I've now seen your post about it basically being a potential bug in the ..NET 1.1 CLR. [quoted text, click to view] > | No - because only the thread which *calls* the Interlocked method knows > | that interlocked is involved, whereas with volatile both the reading > | thread *and* the writing thread know to use memory barriers. Of course, > | if you use Interlocked in both threads, to both read *and* write the > | value, then everything will be okay. > > The reader does not need to know as it gets the read fence automatically (at > the hardware layer) because the Interlocked buts up a *full memory barrier > fence so reads can not be reordered at the instruction level.
But the read could *already* have been reordered by the JIT, ages before this occurs. How could the JIT know whether a variable would *ever* be used with Interlocked, and avoid reordering? [quoted text, click to view] > vista has > some other versions with seperate read aquire/write aquire methods, but the > ones in the framework use the full fence. I am not sure how volatile is > implemented under the covers, but it would not suprise me if it also uses an > interlocked operation.
But there's more to it than that - there's the restriction on JIT reordering, and *that* can't be known about ahead of time. [quoted text, click to view] > | If we care about atomicity, there needs to be some kind of locking. > | If we only care that the change to y is seen after the change to x > | (i.e. you can't see x=0, y=1) then volatile will do the job but I don't > | believe that changing the variable with Interlocked and then reading it > | directly in another thread is guaranteed to work. > > hmm. We have seem to cover the same ground. Everything I have seen (more > then that link) says it does. I would be interested if you find something > to contrary as that would be good info. Also, most (if not all) locks are > based on Interlocked in their guts, so if it does not work, then everything > in the history of the world is broke. Ok, that is bit drastic... tia
I'm not saying that Interlocked doesn't work. I'm saying that it only works if you use it consistently across all threads. I'll try to get round to mailing Joe about it... -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
| But the read could *already* have been reordered by the JIT, ages | before this occurs. How could the JIT know whether a variable would | *ever* be used with Interlocked, and avoid reordering? I assume it knows the same way it knows a var is volatile - by inspecting the code as it compiles and keeping some state if it sees a var wrapped with interlocked or volatile, it does not reordering. Not sure. I am not sure when and where is actually does these optimizations. I think I read something about tight local loops, but not sure. It would be interesting to see a good article on the subject. | I'll try to get round to mailing Joe about it... Thanks. That would be great to see what he says. -- William Stacey [C# MVP] PCR concurrency library: www.codeplex.com/pcr PSH Scripts Project www.codeplex.com/psobject
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > | But the read could *already* have been reordered by the JIT, ages > | before this occurs. How could the JIT know whether a variable would > | *ever* be used with Interlocked, and avoid reordering? > > I assume it knows the same way it knows a var is volatile - by inspecting > the code as it compiles and keeping some state if it sees a var wrapped with > interlocked or volatile, it does not reordering. Not sure. I am not sure > when and where is actually does these optimizations. I think I read > something about tight local loops, but not sure. It would be interesting to > see a good article on the subject.
When the JIT has to create native code for one method, it might not even have loaded all the assemblies which will manipulate the same variables as that method does. I can't see that it could *possibly* know whether or not some code, somewhere, is going to use it with the Interlocked class. Compare that with finding out whether or not the variable is volatile - you look in a single place, and that's it. [quoted text, click to view] > | I'll try to get round to mailing Joe about it... > > Thanks. That would be great to see what he says.
Yeah. This relies on me finding the time, of course :) -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] Jon Skeet [C# MVP] <skeet@pobox.com> wrote:
<snip> [quoted text, click to view] > > | I'll try to get round to mailing Joe about it... > > > > Thanks. That would be great to see what he says. > > Yeah. This relies on me finding the time, of course :)
I've found the time: http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile- reads-and-interlocked-and-how-they-interact.aspx All comments welcome. I'm sorry that it's clearly written from my viewpoint rather than that of an impartial observer - I hope I haven't misrepresented any views. I'm happy to edit where appropriate. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
"Ecma-335:2006 .... 12.6.5 5. Explicit atomic operations... These operations (e.g. Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations. ..." So by my read, this contract would require the CLI to respect the same optimization rules as volatile. System.Thread.MemoryBarrier would seem to have to respect proper order too even if the vars are not decorated with volatile. Do you read it different? Thanks. -- William Stacey [C# MVP] [quoted text, click to view] "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message news:MPG.20abb72ef1c3a592b6@msnews.microsoft.com... | Jon Skeet [C# MVP] <skeet@pobox.com> wrote:
| | <snip> | | > > | I'll try to get round to mailing Joe about it... | > > | > > Thanks. That would be great to see what he says. | > | > Yeah. This relies on me finding the time, of course :) | | I've found the time: | | http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile- | reads-and-interlocked-and-how-they-interact.aspx | | All comments welcome. I'm sorry that it's clearly written from my | viewpoint rather than that of an impartial observer - I hope I haven't | misrepresented any views. I'm happy to edit where appropriate. | | -- | Jon Skeet - <skeet@pobox.com> | http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet | If replying to the group, please do not mail me too
[quoted text, click to view] William Stacey [C# MVP] wrote:
I think we're starting to lose some context. Here's your original reply to Chris: [quoted text, click to view] William wrote: > Chris wrote: > | ... but, if I just want to read the value, I CANNOT do: > | > | int myValue = _firstTime; > | or > | if (_fistTime==0) > | > | Doing these requires a memory barrier of some type (volatile variable, > | monitor, etc) to have any degree of reliability. > > you can. The Interlocked write ensures this across all cpus and cache.
This is the basic point that Jon (and I, and the standard, and Chris Brumme's post that you quote) disagree with you on. An interlocked operation is a full fence (read and write barrier, both acquire and release) on the thread that it was *executed* on. It doesn't affect reads from *other* threads. [quoted text, click to view] > | Basically, both threads need a memory barrier in order to be effective. > | Using Interlocked puts a memory barrier on the calling thread, but > | that's all - IMO :) > > Hi Jon. Two issues here I think: > 1) Interlocked and barrier. Interlocked does provide a full memory barrier. > At least from the docs, this is clear. > 2) Does it prevent reads from being reordered by the CLI? This is not as > clear.
There is no way it could perform its advertised purpose if it didn't prevent related reads moving before it. The CLI needs to pass the location of the target to the interlocked operation, so it can't have related values cached in (e.g.) a register or it would be plainly faulty with only a *single* thread of operation. Since that discounts compiler optimizations that would move related reads before the interlocked operation, the only thing left is CPU semantics - and that's what the Interlocked operation guarantees, it guarantees a full fence for the calling thread. [quoted text, click to view] > http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx : > [extra quoted for clarity -- Barry] > > "We've spent a lot of time thinking about what the correct memory model for > > the CLR should be. If I had to guess, we're going to switch from the ECMA > > model to the following model. I think that we will try to persuade other > > CLI implementations to adopt this same model, and that we will try to change > > the ECMA specification to reflect this. > > > > 1.. Memory ordering only applies to locations which can be globally > > visible or locations that are marked volatile. Any locals that are not > > address exposed can be optimized without using memory ordering as a > > constraint since these locations cannot be touched by multiple threads in > > parallel. > > 2.. Non-volatile loads can be reordered freely. > > 3.. Every store (regardless of volatile marking) is considered a release. > > 4.. Volatile loads are considered acquire. > > 5.. Device oriented software may need special programmer care. Volatile > > stores are still required for any access of device memory. This is > > typically not a concern for the managed developer." > This was wrote back in May 2003, and don't know if this made it into 2.0 or > latter or at all. But he seems to touch on this very issue we are talking > about. Interlocked operations pass the address (reference to the location), so they are "address exposed" and therefore the read can't move before the interlocked operation. Volatile semantics aren't required for this; in fact, not even threads are required for this. An optimization that moved the read before the interlocked call (or any other call by reference) is wrong in every case. [quoted text, click to view] > Order changes can only be applied to locals that are not address > exposed and do not have some volatile symatics around them.
That's not what Chris Brumme wrote. Instead, he writes: [quoted text, click to view] > Chris Brumme (in post above) wrote: > > 1.. Memory ordering only applies to locations which can be globally > > visible or locations that are marked volatile. Any locals that are not > > address exposed can be optimized without using memory ordering as a > > constraint since these locations cannot be touched by multiple threads in > > parallel.
That is to say, order changes are irrelevant (and in fact, can't even be detected, if the compiler has no bugs) for locals that are not address exposed, since only a single thread of execution can ever see their values. That is to say, he says almost exactly the *opposite* of what you said - that order changes can *only* be detected, are *only* relevant, for locations that are globally visible, address exposed or marked volatile. [quoted text, click to view] > As interlocked > and MemoryBarrier has implicit volatile contract, I would assume CLI > inspection would have to honor those same as with volatile keyword. Maybe > not. Would be nice to get a clearer writing on this.
All this is irrelevant to your original point at the root of the thread though, in that, contrary to what you claim, a memory barrier is still required for reads from threads *other* than the one that performed the interlocked operation, if you're going to guarantee that you read what's really there and not some stale value. -- Barry --
| Basically, both threads need a memory barrier in order to be effective. | Using Interlocked puts a memory barrier on the calling thread, but | that's all - IMO :) Hi Jon. Two issues here I think: 1) Interlocked and barrier. Interlocked does provide a full memory barrier. At least from the docs, this is clear. 2) Does it prevent reads from being reordered by the CLI? This is not as clear. It seems that if it implicitly expresses volatile read/write as it claims in ECMA, then it would also prevent such optimizations. Same way you would expect Thread.MemoryBarrier() to do. IIRC, these CLI optimization rules must be very localized and are very strict. They can't just do them all over the place. They can only do them them when they *know there can be no side effects. http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx "We've spent a lot of time thinking about what the correct memory model for the CLR should be. If I had to guess, we're going to switch from the ECMA model to the following model. I think that we will try to persuade other CLI implementations to adopt this same model, and that we will try to change the ECMA specification to reflect this. 1.. Memory ordering only applies to locations which can be globally visible or locations that are marked volatile. Any locals that are not address exposed can be optimized without using memory ordering as a constraint since these locations cannot be touched by multiple threads in parallel. 2.. Non-volatile loads can be reordered freely. 3.. Every store (regardless of volatile marking) is considered a release. 4.. Volatile loads are considered acquire. 5.. Device oriented software may need special programmer care. Volatile stores are still required for any access of device memory. This is typically not a concern for the managed developer." This was wrote back in May 2003, and don't know if this made it into 2.0 or latter or at all. But he seems to touch on this very issue we are talking about. Order changes can only be applied to locals that are not address exposed and do not have some volatile symatics around them. As interlocked and MemoryBarrier has implicit volatile contract, I would assume CLI inspection would have to honor those same as with volatile keyword. Maybe not. Would be nice to get a clearer writing on this. -- William Stacey [C# MVP]
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > "Ecma-335:2006 > ... > 12.6.5 > > 5. Explicit atomic operations... These operations (e.g. Increment, > Decrement, Exchange, and CompareExchange) perform implicit acquire/release > operations. ..." > > So by my read, this contract would require the CLI to respect the same > optimization rules as volatile. System.Thread.MemoryBarrier would seem to > have to respect proper order too even if the vars are not decorated with > volatile. Do you read it different? Thanks.
The operations on Interlocked *do* perform implicit acquire/release operations - but the *reading* operations don't. That's the problem. To put it an alternative way, it is (in some ways - not all) a bit like acquiring a lock while writing a value, but not acquiring the same lock when reading. Yes, you've made sure that the data is available to be read with the memory barrier on the writing thread, but you haven't made sure that the reading thread actually performs the read when you expect it to. Basically, both threads need a memory barrier in order to be effective. Using Interlocked puts a memory barrier on the calling thread, but that's all - IMO :) -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > | All this is irrelevant to your original point at the root of the thread > | though, in that, contrary to what you claim, a memory barrier is still > | required for reads from threads *other* than the one that performed the > | interlocked operation, if you're going to guarantee that you read what's > | really there and not some stale value. > > Hi Barry. I address this part first as it is key to everything else. There > *is a full fence on interlocked methods (save the newer win32 ones like > InterlockedIncrementAcquire that provide acquire and release semantics). It > is *gaureenteed* by the OS and hw. It is documented in many places (see > some below). There can never be stale data, as hw ensures this across all > cpus and caches. After an Interlock completes, any and all threads will see > only the last data. This is one reason why it is a relatively expensive > call. Where is it documented there is not a full barrier?
There *is* a full barrier, *within that thread*. Not within all others. Any thread which actually *performs* a read after the Interlock completes will indeed see the new value. The point is that with reordering in the reading thread, there's no guarantee that it will make any attempt to read the value after the Interlock completes - it could have read it before the Interlocked operation executes, even if a value is later read which looks like it should have been read beforehand. Suppose the source code is: Thread 1 int b = y; int a = x; Thread 2 Interlocked.Increment (ref x); Interlocked.Increment (ref y); The first thread's code can be reordered by the JIT, and then executed as: Thread 1 Thread 2 int a = x; Interlocked.Increment (ref x); Interlocked.Increment (ref y); int b = y; Note the reordering. The guarantee that threads reading after Interlocked.Increment see the new values is still intact - it's just that x has been read before y, which appears to be wrong according to the original source code. This is because there aren't any memory barriers in the reading thread. Could you point to where in the ECMA spec it says anything about a memory barrier on one thread affecting what is possible in terms of reordering on another thread? It's important to understand that this *isn't* just a case of hardware caches and memory fences etc - the JIT itself can reorder the reads, which is what I've been trying to show above. [quoted text, click to view] I don't see anything in those pages to prevent the JIT from reordering the reads on the reading thread. Half of them aren't even about .NET, which means they won't be addressing the JIT-reordering issue at all. I haven't heard back from Joe yet - I might mail Chris Brumme to see if he'd care to comment. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
| All this is irrelevant to your original point at the root of the thread | though, in that, contrary to what you claim, a memory barrier is still | required for reads from threads *other* than the one that performed the | interlocked operation, if you're going to guarantee that you read what's | really there and not some stale value. Hi Barry. I address this part first as it is key to everything else. There *is a full fence on interlocked methods (save the newer win32 ones like InterlockedIncrementAcquire that provide acquire and release semantics). It is *gaureenteed* by the OS and hw. It is documented in many places (see some below). There can never be stale data, as hw ensures this across all cpus and caches. After an Interlock completes, any and all threads will see only the last data. This is one reason why it is a relatively expensive call. Where is it documented there is not a full barrier? Here is some linq: http://msdn2.microsoft.com/en-us/library/ms684122.aspx http://www.gamasutra.com/features/20060630/paquet_01.shtml http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx -- William Stacey [C# MVP]
| There *is* a full barrier, *within that thread*. Not within all others. What would be the point of a full barrier if it did not work across threads? You would not need it for a single thread. The full barrier ensures the cpu can not reorder reads and all reads will see the last write. But, I think your talking about reorder by the CLI as below. | Any thread which actually *performs* a read after the Interlock | completes will indeed see the new value. The point is that with | reordering in the reading thread, there's no guarantee that it will | make any attempt to read the value after the Interlock completes - it | could have read it before the Interlocked operation executes, even if a | value is later read which looks like it should have been read | beforehand. Suppose the source code is: | | Thread 1 | int b = y; | int a = x; | | Thread 2 | Interlocked.Increment (ref x); | Interlocked.Increment (ref y); | | The first thread's code can be reordered by the JIT, and then executed | as: | | Thread 1 Thread 2 | int a = x; | Interlocked.Increment (ref x); | Interlocked.Increment (ref y); | int b = y; | | Note the reordering. The guarantee that threads reading after | Interlocked.Increment see the new values is still intact - it's just | that x has been read before y, which appears to be wrong according to | the original source code. | | This is because there aren't any memory barriers in the reading thread. I understand your point. I am playing devils advocate with myself trying to find proof either way. But, by my read (reading thru the sparse lines) the JIT is *not allowed to make this optimization here. It can only reorder when: 1) There is no volatile 2) The var is not address exposed (e.g. local). 3) There is no other explicit or implicit memory barriers protecting the var (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc) So this would fail test #3 and not allow reorder by JIT. The ecma I posted hints at this, but is far from detailed on this. Chris B talks more about this, but don't know when/if he implemented it in the CLR. Naturally, I could be nuts. It would not be the first time. Thank you. -- William Stacey [C# MVP]
There's quite a bit of dicusssion going on with regard to memory models and interlocked writes and standard reads. I figured I would write a little program and see if I can make the problem show up. The sample app I wrote below does, every once in a while, go "Boom". This means the memory model is indeed reordering the reads of test and test2. My environment for running this test is: Dual-Core AMD Athlon Windows XP x64 Compiled in Release Mode, using "Any CPU" Running without a debugger attached. I've run the test using "Any CPU", "x86" and "x64" as specific compile targets. Same result each time. Sometimes it runs for a long time without failing, other times I get a ton of failures. For example: .... Boom : 287884447, 287884446 Boom : 287885420, 287885419 Boom : 287885780, 287885779 Boom : 287886016, 287886015 Boom : 287887440, 287887439 Boom : 287887798, 287887797 using System; using System.Collections.Generic; using System.Text; using System.Threading; namespace ConsoleApplication4 { class Program { static long test = 0; static long test2 = 0; static void Main(string[] args) { Thread t1 = new Thread(ReadThreadProc); Thread t2 = new Thread(ReadThreadProc); Thread t3 = new Thread(WriteThreadProc); t1.IsBackground = true; t2.IsBackground = true; t3.IsBackground = true; t1.Start(); t2.Start(); t3.Start(); Console.WriteLine("Press any key to exit"); Console.ReadLine(); } private static void WriteThreadProc() { while (true) { Interlocked.Increment(ref test); Interlocked.Increment(ref test2); } } private static void ReadThreadProc() { while (true) { long v1 = test; long v2 = test2; if (v1 > v2) Console.WriteLine("Boom : {0}, {1}", v1, v2); } } } } -- Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP http://www.coversant.com/blogs/cmullins
There's a (very obvious) bug in my test. Please ignore that last post... -- Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP http://www.coversant.com/blogs/cmullins [quoted text, click to view] "Chris Mullins [MVP]" <cmullins@yahoo.com> wrote in message news:eJAdpbAlHHA.4852@TK2MSFTNGP03.phx.gbl... > There's quite a bit of dicusssion going on with regard to memory models > and interlocked writes and standard reads. > > I figured I would write a little program and see if I can make the problem > show up. > The sample app I wrote below does, every once in a while, go "Boom". > > This means the memory model is indeed reordering the reads of test and > test2. > > My environment for running this test is: > Dual-Core AMD Athlon > Windows XP x64 > Compiled in Release Mode, using "Any CPU" > Running without a debugger attached. > > I've run the test using "Any CPU", "x86" and "x64" as specific compile > targets. Same result each time. > > Sometimes it runs for a long time without failing, other times I get a ton > of failures. For example: > ... > Boom : 287884447, 287884446 > Boom : 287885420, 287885419 > Boom : 287885780, 287885779 > Boom : 287886016, 287886015 > Boom : 287887440, 287887439 > Boom : 287887798, 287887797 > > > > using System; > using System.Collections.Generic; > using System.Text; > using System.Threading; > > namespace ConsoleApplication4 > { > class Program > { > static long test = 0; > static long test2 = 0; > > static void Main(string[] args) > { > Thread t1 = new Thread(ReadThreadProc); > Thread t2 = new Thread(ReadThreadProc); > Thread t3 = new Thread(WriteThreadProc); > > t1.IsBackground = true; > t2.IsBackground = true; > t3.IsBackground = true; > > t1.Start(); > t2.Start(); > t3.Start(); > > Console.WriteLine("Press any key to exit"); > Console.ReadLine(); > > } > > private static void WriteThreadProc() > { > while (true) > { > Interlocked.Increment(ref test); > Interlocked.Increment(ref test2); > } > } > > private static void ReadThreadProc() > { > while (true) > { > long v1 = test; > long v2 = test2; > > if (v1 > v2) > Console.WriteLine("Boom : {0}, {1}", v1, v2); > } > } > } > } > > > -- > Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP > http://www.coversant.com/blogs/cmullins > > >
[quoted text, click to view] > cpus and caches. After an Interlock completes, any and all threads will > see > only the last data. This is one reason why it is a relatively expensive > call. Where is it documented there is not a full barrier?
It is a full memory barrier. That means that the new value is actually in RAM, coherent across all processors (some architectures might choose write snooping instead of invalidating caches in other cores) before the Interlocked call returns. This falls far short of forcing other threads to "see the last data", because another thread could have cached the value in a CPU register (hoisting a complex expression outside a loop, anyone?). Access to the memory address is guaranteed to yield the new value, but only volatile variables result in memory being addressed on every use of the variable name. Otherwise the compiler can perform alias analysis, determine that the reading thread can't change the variable in the interim, and thus uses the cached value.
[quoted text, click to view] William Stacey [C# MVP] wrote: > | There *is* a full barrier, *within that thread*. Not within all others. > > What would be the point of a full barrier if it did not work across threads?
A full barrier is a combination of both a read barrier and a write barrier. A write barrier makes sure that all pending writes are retired, and is useful just before publishing a value, so that when it is read on a different thread / CPU, there are no pending writes that could catch any readers by surprise if they were to end up being retired a little later (out of order). The hardware-level concept of a write barrier affects only the CPU executing the write barrier, ensuring that all pending writes are retired before the publishing the final value. It's an ordering primitive for that CPU. Yes it does affect other threads, but only in so far as what they can observe from memory, in conjunction with the *other* half of the equation - a read barrier. A read barrier makes sure that no reads / prefetches / what have you have snuck before retrieve a particular value from a location, a value that you expect to have been published by another thread. The read barrier only affects the CPU performing the read. It's directly complementary to a write barrier on the publishing thread, it ensures that you read what you think you read, and not what the CPU inferred you we're going to indirectly read by (e.g.) sneaking ahead and peeking at the instruction stream. Read and write barriers are complementary. Write barriers stop writes moving forward in time, while read barriers stop reads moving backwards in time. If you don't have balanced write and read barriers on threads that are communicating via shared memory without locks, then their reads and writes will overlap, and consequently linear, intuitive, von Neumann expectations of invariants will be broken [unless of course, a finite state machine model for all possible combinations have been worked out and tested, see e.g. Cliff Click's work on a lock-free hash table for Java: http://blogs.azulsystems.com/cliff/2007/05/hardware_visibi.html ..] Again, a read barrier only affects the thread / CPU that it's executed on. It doesn't affect any writers on other threads. And read barriers of some kind (even the implicit ones with 'lock') aren't optional even when the writing thread is using volatile / interlocked / write barriers / other mechanisms. [quoted text, click to view] > You would not need it for a single thread.
Barriers aren't needed for single threads because then there's only one thread context, one CPU, and the OS would have responsibility for any read / write barriers before context switching into / out of your application's thread of control, since there's no concurrency possible with only a single thread. [quoted text, click to view] > The full barrier ensures the cpu > can not reorder reads and all reads will see the last write. But, I think > your talking about reorder by the CLI as below.
Memory barriers / fences only affect ordering of reads and writes of the thread / CPU they were executed on, though. [quoted text, click to view] > I understand your point. I am playing devils advocate with myself trying to > find proof either way. But, by my read (reading thru the sparse lines) the > JIT is *not allowed to make this optimization here. It can only reorder > when: > 1) There is no volatile > 2) The var is not address exposed (e.g. local). >> 3) There is no other explicit or implicit memory barriers protecting the var > (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc)
I believe you're misreading something big. Quoting from Chris Brumme's post: http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx: [quoted text, click to view] > 1. Memory ordering only applies to locations which > can be globally visible or locations that are marked > volatile. Any locals that are not address exposed can be > optimized without using memory ordering as a constraint > since these locations cannot be touched by multiple threads > in parallel.
This implies that loads and stores can be reordered freely when its non-exposed locals being dealt with. ("Reordered freely" is always subject to correctness on a single-threaded basis, of course, so there are limits to this reordering.) [quoted text, click to view] > 2. Non-volatile loads can be reordered freely.
Now here's something that you seem to have missed: any and all reads (loads) that aren't volatile can be reordered *freely*, even though the location being loaded *isn't* local. [quoted text, click to view] > 3. Every store (regardless of volatile marking) is considered a release.
This was an upgrade in the CLR's memory model over the ECMA memory model, IIRC, though Intel x86 architecture keeps it ok, but it did need specifying for IA64. It implies that a write barrier isn't required when publishing a value, because the CLR (but not ECMA) will handle that situation automatically. [quoted text, click to view] > 4. Volatile loads are considered acquire.
And by contrast, from point 2, non-volatile loads are not considered acquire, so they can freely move backwards in time. This is what makes your original post to the thread incorrect. [quoted text, click to view] > 5. Device oriented software may need special programmer care.
I've little to say about devices in CLR :) [quoted text, click to view] > So this would fail test #3 and not allow reorder by JIT.
I believe the rules you write are mistaken and not supported by the standards or links you post, and in any case, interlocked operations only affect a *single* thread, not all threads. -- Barry --
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote:
<snip> [quoted text, click to view] > 2.. Non-volatile loads can be reordered freely.
And that's the absolutely key one. As far as the reading thread is concerned, it's performing non-volatile loads. It doesn't know that an Interlocked operation will occur anywhere else, and it's not loading a volatile variable - therefore it's a non-volatile load, and the operations can be reordered as shown in the post I've just made. <snip> -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] William Stacey [C# MVP] <william.stacey@gmail.com> wrote: > | There *is* a full barrier, *within that thread*. Not within all others. > > What would be the point of a full barrier if it did not work across threads?
The same point as locking has. Both threads need to try to acquire a lock before it becomes particularly useful, but I don't think you'd argue that it *isn't* useful. [quoted text, click to view] > You would not need it for a single thread. The full barrier ensures the cpu > can not reorder reads and all reads will see the last write.
.... if they perform a genuine read *after* that point, yes. [quoted text, click to view] > But, I think your talking about reorder by the CLI as below.
Yes - although from the memory model point of view (which doesn't know about CPU fences) it's irrelevant. The memory model makes certain guarantees without saying whether the optimisations which can be performed are due to the JIT or the CPU. <snip> [quoted text, click to view] > | Note the reordering. The guarantee that threads reading after > | Interlocked.Increment see the new values is still intact - it's just > | that x has been read before y, which appears to be wrong according to > | the original source code. > | > | This is because there aren't any memory barriers in the reading thread. > > I understand your point. I am playing devils advocate with myself trying to > find proof either way. But, by my read (reading thru the sparse lines) the > JIT is *not allowed to make this optimization here. It can only reorder > when: > 1) There is no volatile > 2) The var is not address exposed (e.g. local). > 3) There is no other explicit or implicit memory barriers protecting the var > (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc)
You seem to be treating these as an "and" whereas it's not - otherwise you could never have reordering of *any* non-local reads. I agree with 1. I believe you're misreading the spec on point 2 - it's not a condition, it's just that *any* reads/writes can be reordered. [quoted text, click to view] > So this would fail test #3 and not allow reorder by JIT. The ecma I posted > hints at this, but is far from detailed on this. Chris B talks more about > this, but don't know when/if he implemented it in the CLR. Naturally, I > could be nuts. It would not be the first time. Thank you.
I don't see where it hints at this at all. Out of the 5 rules you mentioned in the spec, the important one is this: "Non-volatile loads can be reordered freely." The statement: int a = x; is *not* a volatile load when x isn't a volatile variable. Just because some other thread may or may not perform an interlocked operation doesn't make loading it volatile. I think that's the crux of our disagreement. So, having reduced it to this (if you agree with me on that much) can you provide any indication that accessing a variable at any point on any thread with an Interlocked operation makes every load of that variable on every thread count as being volatile? I haven't seen any evidence of that. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] Chris Mullins [MVP] <cmullins@yahoo.com> wrote: > There's a (very obvious) bug in my test. > > Please ignore that last post...
If it's any consolation, I made the exact same mistake when I first made the blog post :) -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
Hi, One of the posts in this very interesting threat referred to this article : http://msdn2.microsoft.com:80/en-us/library/ms686355.aspx This article includes the following piece of code (to fix the race condition): volatile int iValue; volatile BOOL fValueHasBeenComputed = FALSE; extern int ComputeValue(); void CacheComputedValue() { if (!fValueHasBeenComputed) { iValue = ComputeValue(); fValueHasBeenComputed = TRUE; } } BOOL FetchComputedValue(int *piResult) { if (fValueHasBeenComputed) { *piResult = iValue; return TRUE; } else return FALSE; } My (complete ignorent) question is this: Must fValueHasBeenComputed really be declared volatile for the program to produce correct results ? Isn't it true that , because iValue is declared volatile, it is impossible that fValueHasBeenComputed = TRUE and the result of ComputeValue() is not stored in iValue ? Of course, because fValueHasBeenComputed is not declared volatile it may be that FetchComputedValue() returns FALSE when in fact iValue does contain the computed value but the other case can never occur. E.g. FetchComputedValue returns TRUE and piResult is incorrect. Are my assumptions correct ? Jos
not obvious to me. what is it? [quoted text, click to view] "Chris Mullins [MVP]" <cmullins@yahoo.com> wrote in message news:unrnxiAlHHA.4904@TK2MSFTNGP05.phx.gbl... > There's a (very obvious) bug in my test. > > Please ignore that last post... > > -- > Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP > http://www.coversant.com/blogs/cmullins > > "Chris Mullins [MVP]" <cmullins@yahoo.com> wrote in message > news:eJAdpbAlHHA.4852@TK2MSFTNGP03.phx.gbl... >> There's quite a bit of dicusssion going on with regard to memory models >> and interlocked writes and standard reads. >> >> I figured I would write a little program and see if I can make the >> problem show up. >> The sample app I wrote below does, every once in a while, go "Boom". >> >> This means the memory model is indeed reordering the reads of test and >> test2. >> >> My environment for running this test is: >> Dual-Core AMD Athlon >> Windows XP x64 >> Compiled in Release Mode, using "Any CPU" >> Running without a debugger attached. >> >> I've run the test using "Any CPU", "x86" and "x64" as specific compile >> targets. Same result each time. >> >> Sometimes it runs for a long time without failing, other times I get a >> ton of failures. For example: >> ... >> Boom : 287884447, 287884446 >> Boom : 287885420, 287885419 >> Boom : 287885780, 287885779 >> Boom : 287886016, 287886015 >> Boom : 287887440, 287887439 >> Boom : 287887798, 287887797 >> >> >> >> using System; >> using System.Collections.Generic; >> using System.Text; >> using System.Threading; >> >> namespace ConsoleApplication4 >> { >> class Program >> { >> static long test = 0; >> static long test2 = 0; >> >> static void Main(string[] args) >> { >> Thread t1 = new Thread(ReadThreadProc); >> Thread t2 = new Thread(ReadThreadProc); >> Thread t3 = new Thread(WriteThreadProc); >> >> t1.IsBackground = true; >> t2.IsBackground = true; >> t3.IsBackground = true; >> >> t1.Start(); >> t2.Start(); >> t3.Start(); >> >> Console.WriteLine("Press any key to exit"); >> Console.ReadLine(); >> >> } >> >> private static void WriteThreadProc() >> { >> while (true) >> { >> Interlocked.Increment(ref test); >> Interlocked.Increment(ref test2); >> } >> } >> >> private static void ReadThreadProc() >> { >> while (true) >> { >> long v1 = test; >> long v2 = test2; >> >> if (v1 > v2) >> Console.WriteLine("Boom : {0}, {1}", v1, v2); >> } >> } >> } >> } >> >> >> -- >> Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP >> http://www.coversant.com/blogs/cmullins >> >> >> > >
[quoted text, click to view] Stelrad Kypski <blah@blah.com> wrote: > not obvious to me. what is it?
The reads are done in the same order as the increments, which means that even without any reordering, it's quite possible to get any of the four possibilities. -- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
[quoted text, click to view] "Jos Scherders" <thrower@home.nl> wrote in message news:O0OWZdBlHHA.1388@TK2MSFTNGP05.phx.gbl... > Hi, > > One of the posts in this very interesting threat referred to this article > : > > http://msdn2.microsoft.com:80/en-us/library/ms686355.aspx > > This article includes the following piece of code (to fix the race > condition): > > volatile int iValue; > volatile BOOL fValueHasBeenComputed = FALSE; > extern int ComputeValue(); > > void CacheComputedValue() > { > if (!fValueHasBeenComputed) > { > iValue = ComputeValue(); > fValueHasBeenComputed = TRUE; > } > } > > BOOL FetchComputedValue(int *piResult) > { > if (fValueHasBeenComputed) > { > *piResult = iValue; > return TRUE; > } > > else return FALSE; > } > > My (complete ignorent) question is this: > > Must fValueHasBeenComputed really be declared volatile for the program to > produce correct results ? Isn't it true that , because iValue is declared > volatile, it is impossible that fValueHasBeenComputed = TRUE and the > result of ComputeValue() is not stored in iValue ? > Of course, because fValueHasBeenComputed is not declared volatile it may > be that FetchComputedValue() returns FALSE when in fact iValue does > contain the computed value but the other case can never occur. E.g. > FetchComputedValue returns TRUE and piResult is incorrect. > > Are my assumptions correct ? I think so, but it wouldn't be very useful. I assume that if FetchComputedValue() returns FALSE, you would sleep and try again? Only, the compiler is allowed to determine that FetchComputedValue doesn't depend on any volatile variables, so it optimizes by hoisting it outside the loop, and turns this: int buf; while (!FetchComputedValue(&buf)) Sleep(); into int buf; begin: BOOL result = FetchComputedValue(&buf); if (!result) goto end; Sleep(); goto begin; end: into int buf; BOOL result = FetchComputedValue(&buf); if (!result) goto end; begin: Sleep(); goto begin; end: or int buf; if (!FetchComputedValue(&buf)) { while (true) Sleep(); } Whole program optimization (cross-module inlining) makes this a virtual certainty. [quoted text, click to view] > > Jos >
| I believe the rules you write are mistaken and not supported by the | standards or links you post, and in any case, interlocked operations | only affect a *single* thread, not all threads. Thanks Barry. Is this documented somewhere? An Interlocked would hardly be useful if that was the case. They effect other threads by virtue of their atomic nature *and the barrier they impose. In the case of a shared variable, only changed in CPU cache, the execution of the normal interlocked instruction make this var (and all cache memory of current CPU) be committed into physical memory *before executing the interlocked instruction. So other threads will "see" changes from interlocked operations after seeing changes in another shared variable (i.e. release semantics). For example: a++; InterlockedIncrement (ref b); c++; Other threads will only see changes in "a" prior to changes in "b", and will see changes in "c" after changes in b. Critical section lock algorithms, such as the "Peterson's algorithm" rely on this behavior or would not work on SMP hardware. Maybe we are saying the same thing, and I am just not understanding your intent. -- William Stacey [C# MVP]
| Joe Duffy has replied to the blog post, by the way, confirming this. Thanks. Good to see his reply. So he is saying Interlocked *would prevent the clr re-order issue? So doing something like: int a = x; int b = Interlocked.Read(ref y); to your example would fix this example in terms of clr reorder issue? As he said, it is easier to use volatile, but just to finish the thread. Or would you need Interlocked.Read on both lines? -- wjs
|