It looks like the problem is one of user interface.
Typically, rotations are expressed as a transform off identity. Whenever you change a widget, a brand new orientation is created, and the transformation matrix is recalculated from scratch.
It sounds like you are trying to calculate the transformation using increments, i.e., after you rotate X 50 and Y 30, you keep the result of that and apply X -50 on top of whatever the previous rotation was. This is not how modeling UIs work. Modeling UIs start with 0, and then re-create the rotation matrix from identity each time you update the input parameters. You will note that, if you do this, you will also end up at "identity" when X and Y are back to zero.
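To make the difference concrete, here is a minimal sketch in plain Python. The helper names (rot_x, rot_y, from_sliders) and the choice to apply X first and then Y are my own assumptions for illustration; your UI toolkit may compose the axes in a different order.

```python
import math

def rot_x(deg):
    """3x3 rotation about the X axis (column-vector convention)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(deg):
    """3x3 rotation about the Y axis (column-vector convention)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def max_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def from_sliders(x_deg, y_deg):
    # Rebuilt from identity every time; X is applied first, then Y
    # (an assumed order, pick whatever your application defines).
    return matmul(rot_y(y_deg), rot_x(x_deg))

# Absolute approach: the sliders went to (50, 30) and are now back at (0, 0).
# Only the current slider values matter, so the history is irrelevant.
absolute = from_sliders(0, 0)

# Incremental approach: every slider change is stacked on the previous result.
incremental = IDENTITY
for step in (rot_x(50), rot_y(30), rot_x(-50), rot_y(-30)):
    incremental = matmul(step, incremental)

print("absolute error vs identity:   ", max_diff(absolute, IDENTITY))
print("incremental error vs identity:", max_diff(incremental, IDENTITY))
```

The absolute version lands back on identity, while the incremental version does not, because rotations about different axes do not commute.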
In general, the only case where you need incremental rotation is in a physics engine, and when you keep incremental rotations, you have to be very careful to re-normalize the quaternion (or re-orthonormalize the matrix) between each step, or you will suffer build-up of skew.
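As a rough illustration of that build-up, the sketch below integrates a constant angular velocity with a simple explicit Euler step, once with and once without re-normalizing. The integrator, the function names, and the numbers (step size, angular velocity) are assumptions picked for the demo, not taken from any particular physics engine. For a quaternion the error shows up as the length drifting away from 1; for a matrix it shows up as skew and scale.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def length(q):
    return math.sqrt(sum(c * c for c in q))

def normalize(q):
    n = length(q)
    return tuple(c / n for c in q)

def step(q, omega, dt, renormalize):
    """One explicit Euler step of dq/dt = 0.5 * (0, omega) * q."""
    dq = quat_mul((0.0,) + tuple(omega), q)
    q = tuple(c + 0.5 * dt * d for c, d in zip(q, dq))
    return normalize(q) if renormalize else q

omega = (1.0, 2.0, 2.0)   # rad/s, arbitrary demo value
dt = 0.01                 # seconds per step, arbitrary demo value

drifted = fixed = (1.0, 0.0, 0.0, 0.0)
for _ in range(1000):
    drifted = step(drifted, omega, dt, renormalize=False)
    fixed = step(fixed, omega, dt, renormalize=True)

print("length without re-normalizing:", length(drifted))
print("length with re-normalizing:   ", length(fixed))
```

Without the normalize call the quaternion no longer represents a pure rotation after enough steps, which is exactly the kind of accumulated error that the rebuild-from-identity approach avoids.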
There's also a question of where the point of reference is. Either the point of reference is the local object, or the point of reference is the world. You will express the same rotation differently, depending on how you think about it. (This is why you have to invert most scene graph matrices to create the model matrix you use for rendering.)
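A small sketch of the two points of reference, assuming column vectors (so the matrix on the right is applied first). Under that assumption, multiplying the new rotation on the left turns the object about a world axis, and multiplying it on the right turns it about the object's own axis. The helper names are hypothetical.

```python
import math

def rot_x(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    # For a pure rotation matrix the transpose is also the inverse.
    return [[m[j][i] for j in range(3)] for i in range(3)]

def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

current = rot_y(90)   # the object's existing orientation
delta = rot_x(90)     # "rotate 90 degrees about X", but which X?

world = matmul(delta, current)   # about the world's X axis
local = matmul(current, delta)   # about the object's own X axis

# The local rotation can be re-expressed in world terms by conjugating
# it with the current orientation: current * delta * current^-1.
delta_in_world = matmul(matmul(current, delta), transpose(current))
local_again = matmul(delta_in_world, current)

print("world vs local differ by:", max_diff(world, local))
print("re-expressed local error:", max_diff(local_again, local))
```

The same physical rotation is written as delta in the object's frame and as delta_in_world in the world frame, which is the sense in which the expression depends on how you think about it.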