每當靜下心來好好的寫程式…就能感受到在邏輯的殿堂中自己意志力以及知識的力量,意志力經由手指的敲打化為真實的程式,就像是魔法師一般。看它順利的完成任務,心裏總是開心的滿足的。這單純的喜悅正是為什麼如此著迷於設計並實作軟體,即便是其它人不容易感受到內心的喜悅。
然而…如果想要改善台灣軟體環境的現況,我需要學會的…真的遠遠超過把軟體寫好的能力。夜深人靜時,總是不停的思索這個問題…好像看得到未來的方向,又好虛無飄渺。嗯,我必需學會另一種高度…
文章列表
2010年6月9日 星期三
Eclair Libgralloc Deadlock Problem
As we are developing 0xdroid beagle-eclair branch, we occasionally encounter screen flipping issue. This issue rarely happens however it bothers the user experience very much when running some resource eating applications. Last week in the Computex Taipei 2010 show, we demoed 0xdroid beagle-eclair connecting wireless modules and played games. I noticed that this issue happens very often while playing a game called "Frozen Bubble". (It's a good game, we all love this game a lot, and spent a lot of time on it. ;-) ) It's kind of embarrassing when the screen keeps flipping on the show. Therefore I decided to dig out this issue.
Beside of 0xdroid on beagleboard and Devkit8000, I tested some other platforms and find out actually almost all of them having this problem. Therefore I suspected it's not a hardware related problem. Maybe a framework or HAL issue. We noticed that when the screen is flipping, the logcat message will complain as following
Beside of 0xdroid on beagleboard and Devkit8000, I tested some other platforms and find out actually almost all of them having this problem. Therefore I suspected it's not a hardware related problem. Maybe a framework or HAL issue. We noticed that when the screen is flipping, the logcat message will complain as following
E/SurfaceFlinger( 768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS) E/gralloc ( 768): handle 0x13f8c0 not locked ~E/gralloc ( 768): handle 0x13f8c0 already locked for write E/libagl ( 768): eglSwapBuffers() failed to lock buffer 0x1368e0 (640x480) E/SurfaceFlinger( 768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS) E/gralloc ( 768): handle 0x13f8c0 not locked E/gralloc ( 768): handle 0x13f8c0 already locked for write E/libagl ( 768): eglSwapBuffers() failed to lock buffer 0x1368e0 (640x480) E/SurfaceFlinger( 768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS) E/gralloc ( 768): handle 0x13f8c0 not locked E/gralloc ( 768): handle 0x13f8c0 already locked for write E/libagl ( 768): eglSwapBuffers() failed to lock buffer 0x1368e0 (640x480) E/SurfaceFlinger( 768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS) E/gralloc ( 768): handle 0x13f8c0 not locked
Therefore I checked the libgralloc and adding some debug message. The libgralloc plugin 0xdroid used is branched from original eclair source tree. I took few hours created the omap3/libgralloc at the first day when I got eclair source code months ago. Since it works well for the most of time, I didn't pay too much attention to it, until I found the deadlock issue goes crazy on frozen bubble.
After noticing the lock log and swap error, I took a close look of the gralloc_lock and gralloc_unlock in hardware/omap3/libgralloc/mapper.c
After noticing the lock log and swap error, I took a close look of the gralloc_lock and gralloc_unlock in hardware/omap3/libgralloc/mapper.c
int gralloc_lock(gralloc_module_t const* module, buffer_handle_t handle, int usage, int l, int t, int w, int h, void** vaddr) { if (private_handle_t::validate(handle) < 0) return -EINVAL; int err = 0; private_handle_t* hnd = (private_handle_t*)handle; int32_t current_value, new_value; int retry; do { current_value = hnd->lockState; new_value = current_value; if (current_value & private_handle_t::LOCK_STATE_WRITE) { // already locked for write LOGE("handle %p already locked for write", handle); return -EBUSY; } else if (current_value & private_handle_t::LOCK_STATE_READ_MASK) { // already locked for read if (usage & (GRALLOC_USAGE_SW_WRITE_MASK | GRALLOC_USAGE_HW_RENDER)) { LOGE("handle %p already locked for read", handle); return -EBUSY; } else { // this is not an error //LOGD("%p already locked for read... count = %d", // handle, (current_value & ~(1<<31))); } } // not currently locked if (usage & (GRALLOC_USAGE_SW_WRITE_MASK | GRALLOC_USAGE_HW_RENDER)) { // locking for write new_value |= private_handle_t::LOCK_STATE_WRITE; } new_value++; retry = android_atomic_cmpxchg(current_value, new_value, } while (retry); if (new_value & private_handle_t::LOCK_STATE_WRITE) { // locking for write, store the tid hnd->writeOwner = gettid(); } if (usage & (GRALLOC_USAGE_SW_READ_MASK | GRALLOC_USAGE_SW_WRITE_MASK)) { if (!(current_value & private_handle_t::LOCK_STATE_MAPPED)) { // we need to map for real pthread_mutex_t* const lock = &sMapLock; pthread_mutex_lock(lock); if (!(hnd->lockState & private_handle_t::LOCK_STATE_MAPPED)) { err = gralloc_map(module, handle, vaddr); if (err == 0) { android_atomic_or(private_handle_t::LOCK_STATE_MAPPED, (volatile int32_t*)&(hnd->lockState)); } } pthread_mutex_unlock(lock); } *vaddr = (void*)hnd->base; } return err; } int gralloc_unlock(gralloc_module_t const* module, buffer_handle_t handle) { if (private_handle_t::validate(handle) < 0) return -EINVAL; private_handle_t* hnd = (private_handle_t*)handle; int32_t current_value, new_value; do { current_value = hnd->lockState; new_value = current_value; if (current_value & private_handle_t::LOCK_STATE_WRITE) { // locked for write if (hnd->writeOwner == gettid()) { hnd->writeOwner = 0; new_value &= ~private_handle_t::LOCK_STATE_WRITE; } } if ((new_value & private_handle_t::LOCK_STATE_READ_MASK) == 0) { LOGE("handle %p not locked", handle); return -EINVAL; } new_value--; } while (android_atomic_cmpxchg(current_value, new_value, (volatile int32_t*)&hnd->lockState)); return 0; }
The code looks reasonably for the first look. Lock and unlock pair looks good. However there is a very tricky part "android_atomic_cmpxchg may fail". Understanding this, it is not hard to see there is a potential bug in gralloc_unlock. If android_atomic_cmpxchg fails, it will run the do while loop for more than once. However for the first run, the hnd->writeOwner will be changed to 0. And then the new_value will not be changed anymore. This lock will goes crazy here after.
The patch solves this problem.
The patch solves this problem.
diff --git a/mapper.cpp b/mapper.cpp index 16ebcc2..1f3e722 100644 --- a/mapper.cpp +++ b/mapper.cpp @@ -267,13 +267,13 @@ int gralloc_unlock(gralloc_module_t const* module, if (current_value & private_handle_t::LOCK_STATE_WRITE) { // locked for write if (hnd->writeOwner == gettid()) { - hnd->writeOwner = 0; new_value &= ~private_handle_t::LOCK_STATE_WRITE; } } if ((new_value & private_handle_t::LOCK_STATE_READ_MASK) == 0) { LOGE("handle %p not locked", handle); + hnd->writeOwner = 0; return -EINVAL; } @@ -282,5 +282,6 @@ int gralloc_unlock(gralloc_module_t const* module, } while (android_atomic_cmpxchg(current_value, new_value, (volatile int32_t*)&hnd->lockState)); + hnd->writeOwner = 0; return 0; }
It make sure the value hnd->writeOwner is the same as the first loop, if android_atomic_cmpxchg fails.
This issue comes from the original eclair source tree, and it is still there, and had been inherited to many different platforms. If you encounter two frames crazily flipping and having the lock message, you may try to take a look of your libgralloc.
This issue comes from the original eclair source tree, and it is still there, and had been inherited to many different platforms. If you encounter two frames crazily flipping and having the lock message, you may try to take a look of your libgralloc.
訂閱:
文章 (Atom)