Re: [Sawfish] Bugreport by gentoo



Hi,

> On Sat, 30 Jul 2011 00:14:06 +0200, fuchur wrote:
> > Can someone reproduce this?
> > https://bugs.gentoo.org/show_bug.cgi?id=341855#c11
>
> Anyone? It's the old issue of sawfish inside GNOME, perhaps
> not limited to Gentoo.
> (Fuchur, please state the subject if you want others to spend
> their time reading. :)
>
> Thanks in advance.
>
> Teika (Teika kazura)

I cannot reproduce exactly the same bug, but something very close to it.

I compiled sawfish-1.8.1 and the libraries it depends on (librep, rep-gtk)
on my Gentoo box.  To test it without the risk of losing my current
desktop, I first used the nested X server `Xephyr :1', and there sawfish
failed to work. Trying to solve that problem, I recompiled sawfish with
`-g' and _without_ `-O2', started `gdb sawfish', and then the bug
disappeared!

That was really confusing, and it took me some time to find out what had
happened. Here it is, to the best of my knowledge:

At the end of the function `acquire_manager_selection' in display.c are the lines
    XClientMessageEvent cm;
    ... // other stuff here
    cm.message_type = xa_manager;
    cm.format = 32;
    cm.data.l[0] = startup_time;
    cm.data.l[1] = xa_wm_sn;
    cm.data.l[2] = no_focus_window;
    XSendEvent (dpy, root_window, False, StructureNotifyMask, (XEvent *) &cm);
which seems to be a standard way of announcing a window manager.

However, as I see it the type of the event is never set; it should have been
    cm.type = ClientMessage;
somewhere before the `XSendEvent'.

As a result, the XEvent which is finally sent has a type which depends on
whatever garbage happens to be in memory at the location of `cm.type'.
Thus, depending on your current memory contents, your mood, bad luck and
the phase of the moon, this either results in an X11 error or it does not
(more details below).

In all of my tests, the `XSendEvent' above either does nothing or
later triggers the (wrong!) sawfish message
   "You may only run one window manager".

If (and only if) an error was triggered, it happened just a few statements
after that `XSendEvent'. In `display.c', function `sys_init', are the lines
        ...
        acquire_manager_selection (sel_owner);  // calls the XSendEvent above

        XSetErrorHandler (error_other_wm);
        XSelectInput (dpy, root_window, ROOT_EVENTS);
        XSync (dpy, False);  // HERE THE ERROR HAPPENS!
        XSetErrorHandler (error_handler);
        ...
where the `XSync' finally calls `error_other_wm', which, in this particular
case, gave me that somewhat misleading message from above.

------------------------------------------------------------------------

Now the tricky part: Why does it sometimes work and sometimes not?

The data structure `XClientMessageEvent cm;' lives on the program stack,
and in theory, it should be reproducible which value ends up in the
cm.type member. However, that stack slot is reused by many earlier
function calls: setting a watchpoint under gdb which just reports any
write access to `cm.type' and then continues gave me a zillion such
writes, the last one somewhere in librep.so.

Thus, for a given fixed setup of sawfish, librep and so on, `cm.type'
should always have the same value. But if that setup changes, e.g. by
recompiling with other flags, or maybe just by using other command line
args (I don't really know), that value might change.

I tested a few scenarios: with Xephyr as above, with the default display
:0, with and without `-O2', with and without `--replace'ing another wm.
In every scenario `cm.type' had a fixed value, but that value was
nevertheless garbage, and in no case did it equal the constant
`ClientMessage'.

So why does it work at all, at least sometimes?

Reading the man page for XSendEvent, I learned that XSendEvent may just
return a `False' status if it cannot convert its data, in which case no
event is sent anywhere; the request is simply dropped.

If XSendEvent returns `True', the data is indeed passed to the
X server, but AFAIK if no client has any interest in that event, it is
dropped, too, and again nothing happens.

Thus for a real X11 error to happen, the following is needed:
  - There must be just the right amount of garbage in the
    `XClientMessageEvent' data to make it still a valid event somehow
    (Note that even the data members which are supposed to be unused,
    like cm.data.l[3] etc. may count here.)
  - There must be an X11 client which is listening for that kind
    of event.
  - The event data must be wrong enough to have the X server
    (and not the listening client) generate an error.
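
To illustrate why the outcome depends so much on the particular garbage
value, here is a deliberately simplified model. It is not the actual check
performed by Xlib or by the X server, and it ignores extension events; the
`demo_' names are hypothetical. The numeric constants repeat the values
from X11/X.h so that the sketch compiles without the X headers.

```c
/* Values as defined in X11/X.h, repeated here for a standalone sketch. */
enum {
    DEMO_KeyPress      = 2,   /* first core event type */
    DEMO_ClientMessage = 33,
    DEMO_LASTEvent     = 36   /* one past the last core event type */
};

/* Simplified model: is this value a core event type at all? */
int
demo_is_core_event_type (int type)
{
    return type >= DEMO_KeyPress && type < DEMO_LASTEvent;
}

/* Describe what a given cm.type value means for the message. */
const char *
demo_describe_type (int type)
{
    if (type == DEMO_ClientMessage)
        return "ClientMessage, as intended";
    if (demo_is_core_event_type (type))
        return "some other core event: the payload gets misinterpreted";
    return "not a core event type";
}
```

Only one value out of the whole `int' range gives the intended event, so
an uninitialized `cm.type' is almost never right by accident.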

With my setup it just happens by coincidence that _without_ `-O2'
XSendEvent returned `False' and the statement was more or less ignored.
As a result, sawfish could run in all cases without `-O2'.

_With_ `-O2', XSendEvent returned `True', and the data was passed to the
X server. In the case of the server being `Xephyr', it generated an
error, and, by commenting out the `XSetErrorHandler' line in sawfish,
I really got
    X Error of failed request:  BadValue (integer parameter out of range for operation)
      Major opcode of failed request:  25 (X_SendEvent)
      ...

In the case of plain X on display :0, no one seems to listen for that
spurious event, and it vanished into nirvana. At least I guess so,
because even with `XSynchronize' added shortly before that `XSendEvent', I
found no trace of it. I have not debugged the real X server, though.

------------------------------------------------------------------------

Well, this mail has become somewhat lengthy; then again, last night's
debug session was lengthy too, so that is only fair.

Adding the above `cm.type = ClientMessage;' solved all issues for me,
and if the developers can approve that, we can maybe close the Gentoo bug
    https://bugs.gentoo.org/show_bug.cgi?id=341855#c11
(There I reported that _not_ using `-O2' fixed the bug from
comment #11; however, that was of course not the desired solution
for a version bump, and it might or might not work for others.)

Cheers!

