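Monday, November 29, 2010

Reflections

Whenever I calm down and really focus on writing code, I can feel the power of my own will and knowledge in the temple of logic. Willpower turns into real programs through the tapping of my fingers, like a magician casting a spell. Watching a program complete its task smoothly always leaves me happy and satisfied. This simple joy is exactly why I am so fascinated by designing and implementing software, even if other people cannot easily sense that inner joy.
However... if I want to improve the current state of Taiwan's software environment, what I need to learn really goes far beyond the ability to write software well. Late at night, I keep pondering this question... I seem to see the direction of the future, yet it feels so elusive. Hmm, I have to learn to see from another height...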

Wednesday, June 9, 2010

Eclair Libgralloc Deadlock Problem

While developing the 0xdroid beagle-eclair branch, we occasionally ran into a screen-flipping issue. It rarely happens, but it hurts the user experience badly when resource-hungry applications are running. Last week at Computex Taipei 2010, we demoed 0xdroid beagle-eclair connected to wireless modules and played games on it. I noticed that the issue shows up very often while playing a game called "Frozen Bubble". (It's a good game; we all love it and have spent a lot of time on it. ;-) ) It is rather embarrassing when the screen keeps flipping at a show, so I decided to dig into the issue.




Besides 0xdroid on the BeagleBoard and Devkit8000, I tested some other platforms and found that almost all of them have this problem. I therefore suspected that it is not hardware related, but rather a framework or HAL issue. We noticed that while the screen is flipping, logcat complains as follows:

E/SurfaceFlinger(  768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS)
E/gralloc (  768): handle 0x13f8c0 not locked
E/gralloc (  768): handle 0x13f8c0 already locked for write
E/libagl  (  768): eglSwapBuffers() failed to lock buffer 0x1368e0 (640x480)
E/SurfaceFlinger(  768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS)
E/gralloc (  768): handle 0x13f8c0 not locked
E/gralloc (  768): handle 0x13f8c0 already locked for write
E/libagl  (  768): eglSwapBuffers() failed to lock buffer 0x1368e0 (640x480)
E/SurfaceFlinger(  768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS)
E/gralloc (  768): handle 0x13f8c0 not locked
E/gralloc (  768): handle 0x13f8c0 already locked for write
E/libagl  (  768): eglSwapBuffers() failed to lock buffer 0x1368e0 (640x480)
E/SurfaceFlinger(  768): eglSwapBuffers: EGL error 0x3002 (EGL_BAD_ACCESS)
E/gralloc (  768): handle 0x13f8c0 not locked

So I checked libgralloc and added some debug messages. The libgralloc plugin that 0xdroid uses is branched from the original eclair source tree. I spent a few hours creating omap3/libgralloc on the first day I got the eclair source code, months ago. Since it works well most of the time, I didn't pay much attention to it until the deadlock issue went crazy on Frozen Bubble.
After noticing the lock messages and the swap error, I took a close look at gralloc_lock and gralloc_unlock in hardware/omap3/libgralloc/mapper.cpp:

int gralloc_lock(gralloc_module_t const* module,
        buffer_handle_t handle, int usage,
        int l, int t, int w, int h,
        void** vaddr)
{
    if (private_handle_t::validate(handle) < 0)
        return -EINVAL;

    int err = 0;
    private_handle_t* hnd = (private_handle_t*)handle;
    int32_t current_value, new_value;
    int retry;

    do {
        current_value = hnd->lockState;
        new_value = current_value;

        if (current_value & private_handle_t::LOCK_STATE_WRITE) {
            // already locked for write 
            LOGE("handle %p already locked for write", handle);
            return -EBUSY;
        } else if (current_value & private_handle_t::LOCK_STATE_READ_MASK) {
            // already locked for read
            if (usage & (GRALLOC_USAGE_SW_WRITE_MASK | GRALLOC_USAGE_HW_RENDER)) {
                LOGE("handle %p already locked for read", handle);
                return -EBUSY;
            } else {
                // this is not an error
                //LOGD("%p already locked for read... count = %d", 
                //        handle, (current_value & ~(1<<31)));
            }
        }

        // not currently locked
        if (usage & (GRALLOC_USAGE_SW_WRITE_MASK | GRALLOC_USAGE_HW_RENDER)) {
            // locking for write
            new_value |= private_handle_t::LOCK_STATE_WRITE;
        }
        new_value++;

        retry = android_atomic_cmpxchg(current_value, new_value,
                (volatile int32_t*)&hnd->lockState);
    } while (retry);

    if (new_value & private_handle_t::LOCK_STATE_WRITE) {
        // locking for write, store the tid
        hnd->writeOwner = gettid();
    }

    if (usage & (GRALLOC_USAGE_SW_READ_MASK | GRALLOC_USAGE_SW_WRITE_MASK)) {
        if (!(current_value & private_handle_t::LOCK_STATE_MAPPED)) {
            // we need to map for real
            pthread_mutex_t* const lock = &sMapLock;
            pthread_mutex_lock(lock);
            if (!(hnd->lockState & private_handle_t::LOCK_STATE_MAPPED)) {
                err = gralloc_map(module, handle, vaddr);
                if (err == 0) {
                    android_atomic_or(private_handle_t::LOCK_STATE_MAPPED,
                            (volatile int32_t*)&(hnd->lockState));
                }
            }
            pthread_mutex_unlock(lock);
        }
        *vaddr = (void*)hnd->base;
    }

    return err;
}

int gralloc_unlock(gralloc_module_t const* module, 
        buffer_handle_t handle)
{
    if (private_handle_t::validate(handle) < 0)
        return -EINVAL;

    private_handle_t* hnd = (private_handle_t*)handle;
    int32_t current_value, new_value;

    do {
        current_value = hnd->lockState;
        new_value = current_value;

        if (current_value & private_handle_t::LOCK_STATE_WRITE) {
            // locked for write
            if (hnd->writeOwner == gettid()) {
                hnd->writeOwner = 0;
                new_value &= ~private_handle_t::LOCK_STATE_WRITE;
            }
        }

        if ((new_value & private_handle_t::LOCK_STATE_READ_MASK) == 0) {
            LOGE("handle %p not locked", handle);
            return -EINVAL;
        }

        new_value--;

    } while (android_atomic_cmpxchg(current_value, new_value, 
            (volatile int32_t*)&hnd->lockState));

    return 0;
}

The code looks reasonable at first glance, and the lock and unlock pair looks fine. However, there is a very tricky part: android_atomic_cmpxchg may fail. Once you realize this, it is not hard to see a potential bug in gralloc_unlock. If android_atomic_cmpxchg fails, the do-while loop runs more than once. But the first pass has already set hnd->writeOwner to 0, so on the retry the owner check fails and the LOCK_STATE_WRITE bit is never cleared from new_value. From then on the lock state is corrupted: the handle stays locked for write while its count drops to zero, which matches the two gralloc messages in the log above.

The following patch solves this problem.

diff --git a/mapper.cpp b/mapper.cpp
index 16ebcc2..1f3e722 100644
--- a/mapper.cpp
+++ b/mapper.cpp
@@ -267,13 +267,13 @@ int gralloc_unlock(gralloc_module_t const* module,
         if (current_value & private_handle_t::LOCK_STATE_WRITE) {
             // locked for write
             if (hnd->writeOwner == gettid()) {
-                hnd->writeOwner = 0;
                 new_value &= ~private_handle_t::LOCK_STATE_WRITE;
             }
         }
 
         if ((new_value & private_handle_t::LOCK_STATE_READ_MASK) == 0) {
             LOGE("handle %p not locked", handle);
+            hnd->writeOwner = 0;
             return -EINVAL;
         }
 
@@ -282,5 +282,6 @@ int gralloc_unlock(gralloc_module_t const* module,
     } while (android_atomic_cmpxchg(current_value, new_value, 
             (volatile int32_t*)&hnd->lockState));
 
+    hnd->writeOwner = 0;
     return 0;
 }

It makes sure that hnd->writeOwner still holds the same value as on the first pass if android_atomic_cmpxchg fails and the loop retries.
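
To see why moving the assignment matters, here is a minimal single-threaded sketch of the same retry pattern. It is not the libgralloc code: Handle, unlock, writeBitStuckAfterUnlock, the bit constants, and the interfere flag (which stands in for another thread changing lockState between the read and the compare-and-swap) are all hypothetical names made up for illustration, and it uses std::atomic instead of android_atomic_cmpxchg.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified lock-state layout (not the real private_handle_t).
constexpr uint32_t kWriteBit  = 1u << 31;
constexpr uint32_t kMappedBit = 1u << 30;
constexpr uint32_t kCountMask = ~(kWriteBit | kMappedBit);

struct Handle {
    std::atomic<uint32_t> lockState{0};
    int writeOwner = 0;
};

// Unlock with a compare-and-swap retry loop.
// clearOwnerInsideLoop = true mimics the original code, false mimics the patch.
// interfere = true makes the first compare-and-swap fail by changing lockState
// once, the way a concurrent thread could.
bool unlock(Handle& h, int tid, bool clearOwnerInsideLoop, bool interfere) {
    uint32_t current, next;
    do {
        current = h.lockState.load();
        next = current;
        if ((current & kWriteBit) && h.writeOwner == tid) {
            if (clearOwnerInsideLoop) h.writeOwner = 0;  // side effect survives a failed CAS
            next &= ~kWriteBit;
        }
        if ((next & kCountMask) == 0) return false;  // "not locked"
        --next;
        if (interfere) {  // simulated concurrent update
            h.lockState.fetch_or(kMappedBit);
            interfere = false;
        }
    } while (!h.lockState.compare_exchange_strong(current, next));
    if (!clearOwnerInsideLoop) h.writeOwner = 0;  // patched: clear only after success
    return true;
}

// Returns true if the write bit is still set after the owner has unlocked.
bool writeBitStuckAfterUnlock(bool clearOwnerInsideLoop) {
    Handle h;
    h.lockState = kWriteBit | 1u;  // locked for write once
    h.writeOwner = 42;             // hypothetical thread id
    bool ok = unlock(h, 42, clearOwnerInsideLoop, /*interfere=*/true);
    assert(ok);  // the unlock call itself reports success in both variants
    return (h.lockState.load() & kWriteBit) != 0;
}
```

In this sketch, writeBitStuckAfterUnlock(true) leaves the write bit set with a count of zero, which is the state that would produce both the "already locked for write" and "not locked" messages, while writeBitStuckAfterUnlock(false) releases the lock cleanly.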

This issue comes from the original eclair source tree, where it is still present, and it has been inherited by many different platforms. If you see two frames flipping wildly together with these lock messages, take a look at your libgralloc.

Monday, December 14, 2009

[Memo] RGB565 To PNG/JPEG

I actually forgot this... writing it down here as a memo.

ffmpeg -vcodec rawvideo -f rawvideo -pix_fmt rgb565 -s 1024x720 -i input.raw -f image2 -vcodec png output.png
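
For reference, the per-pixel part of such a conversion can be sketched as below. This is only an illustration of how a 16-bit RGB565 value unpacks into 8-bit channels; Rgb888 and rgb565ToRgb888 are hypothetical names, the bit-replication scaling is one common choice and not necessarily what ffmpeg does internally, and the byte order of the 16-bit values in a raw dump is something you would still need to check.

```cpp
#include <cstdint>

struct Rgb888 {
    uint8_t r, g, b;
};

// Expand one RGB565 pixel (5 bits red, 6 bits green, 5 bits blue) to 8 bits
// per channel. The high bits are replicated into the low bits so that a
// full-scale 5-bit or 6-bit value maps to 0xFF.
Rgb888 rgb565ToRgb888(uint16_t pixel) {
    uint8_t r5 = (pixel >> 11) & 0x1F;
    uint8_t g6 = (pixel >> 5) & 0x3F;
    uint8_t b5 = pixel & 0x1F;
    return Rgb888{
        static_cast<uint8_t>((r5 << 3) | (r5 >> 2)),
        static_cast<uint8_t>((g6 << 2) | (g6 >> 4)),
        static_cast<uint8_t>((b5 << 3) | (b5 >> 2)),
    };
}
```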

Tuesday, October 13, 2009

Oprofile 0xdroid Android on Beagleboard

Android actually supports oprofile, and with some oprofile knowledge you can play with it happily on the G1. However, external/oprofile in Android does not support ARM_V7 for now. To play with it, apply the following patch, which adds the ARM_V7 CPU type and its default event:


diff --git a/libop/op_cpu_type.c b/libop/op_cpu_type.c
index b9d13de..737f63e 100644
--- a/libop/op_cpu_type.c
+++ b/libop/op_cpu_type.c
@@ -74,6 +74,7 @@ static struct cpu_descr const cpu_descrs[MAX_CPU_TYPE] = {
{ "ppc64 POWER5++", "ppc64/power5++", CPU_PPC64_POWER5pp, 6 },
{ "e300", "ppc/e300", CPU_PPC_E300, 4 },
{ "AVR32", "avr32", CPU_AVR32, 3 },
+ { "ARM V7 PMNC", "arm/armv7", CPU_ARM_V7, 5},
};

static size_t const nr_cpu_descrs = sizeof(cpu_descrs) / sizeof(struct cpu_descr);
diff --git a/libop/op_cpu_type.h b/libop/op_cpu_type.h
index be95ae2..f4db260 100644
--- a/libop/op_cpu_type.h
+++ b/libop/op_cpu_type.h
@@ -72,6 +72,7 @@ typedef enum {
CPU_PPC64_POWER5pp, /**< ppc64 Power5++ family */
CPU_PPC_E300, /**< e300 */
CPU_AVR32, /**< AVR32 */
+ CPU_ARM_V7, /**< ARM V7 */
MAX_CPU_TYPE
} op_cpu;

diff --git a/libop/op_events.c b/libop/op_events.c
index b4a10e7..7f0ed25 100644
--- a/libop/op_events.c
+++ b/libop/op_events.c
@@ -793,6 +793,7 @@ void op_default_event(op_cpu cpu_type, struct op_default_event_descr * descr)
case CPU_ARM_XSCALE2:
case CPU_ARM_MPCORE:
case CPU_ARM_V6:
+ case CPU_ARM_V7:
case CPU_AVR32:
descr->name = "CPU_CYCLES";
break;
diff --git a/opimport_pull b/opimport_pull
index 7dbac4a..bf1f19a 100755
--- a/opimport_pull
+++ b/opimport_pull
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.4 -E
+#!/usr/bin/python -E

import os
import re


Then add the event tables for ARMv7:

commit f129bca975b1704c06e07df7710d29de13a1e922
Author: Tick Chen <tick@0xlab.org>
Date: Sat Sep 26 22:56:44 2009 +0800

[oprofile] adding metadata of armv7

diff --git a/linux-x86/oprofile/arm/armv7/events b/linux-x86/oprofile/arm/armv7/events
new file mode 100644
index 0000000..2550e41
--- /dev/null
+++ b/linux-x86/oprofile/arm/armv7/events
@@ -0,0 +1,53 @@
+# ARM V7 events
+# From Cortex A8 DDI (ARM DDI 0344B, revision r1p1)
+#
+event:0x00 counters:1,2,3,4 um:zero minimum:500 name:PMNC_SW_INCR : Software increment of PMNC registers
+event:0x01 counters:1,2,3,4 um:zero minimum:500 name:IFETCH_MISS : Instruction fetch misses from cache or normal cacheable memory
+event:0x02 counters:1,2,3,4 um:zero minimum:500 name:ITLB_MISS : Instruction fetch misses from TLB
+event:0x03 counters:1,2,3,4 um:zero minimum:500 name:DCACHE_REFILL : Data R/W operation that causes a refill from cache or normal cacheable memory
+event:0x04 counters:1,2,3,4 um:zero minimum:500 name:DCACHE_ACCESS : Data R/W from cache
+event:0x05 counters:1,2,3,4 um:zero minimum:500 name:DTLB_REFILL : Data R/W that causes a TLB refill
+event:0x06 counters:1,2,3,4 um:zero minimum:500 name:DREAD : Data read architecturally executed (note: architecturally executed = for instructions that are unconditional or that pass the condition code)
+event:0x07 counters:1,2,3,4 um:zero minimum:500 name:DWRITE : Data write architecturally executed
+event:0x08 counters:1,2,3,4 um:zero minimum:500 name:INSTR_EXECUTED : All executed instructions
+event:0x09 counters:1,2,3,4 um:zero minimum:500 name:EXC_TAKEN : Exception taken
+event:0x0A counters:1,2,3,4 um:zero minimum:500 name:EXC_EXECUTED : Exception return architecturally executed
+event:0x0B counters:1,2,3,4 um:zero minimum:500 name:CID_WRITE : Instruction that writes to the Context ID Register architecturally executed
+event:0x0C counters:1,2,3,4 um:zero minimum:500 name:PC_WRITE : SW change of PC, architecturally executed (not by exceptions)
+event:0x0D counters:1,2,3,4 um:zero minimum:500 name:PC_IMM_BRANCH : Immediate branch instruction executed (taken or not)
+event:0x0E counters:1,2,3,4 um:zero minimum:500 name:PC_PROC_RETURN : Procedure return architecturally executed (not by exceptions)
+event:0x0F counters:1,2,3,4 um:zero minimum:500 name:UNALIGNED_ACCESS : Unaligned access architecturally executed
+event:0x10 counters:1,2,3,4 um:zero minimum:500 name:PC_BRANCH_MIS_PRED : Branch mispredicted or not predicted. Counts pipeline flushes because of misprediction
+event:0x12 counters:1,2,3,4 um:zero minimum:500 name:PC_BRANCH_MIS_USED : Branch or change in program flow that could have been predicted
+event:0x40 counters:1,2,3,4 um:zero minimum:500 name:WRITE_BUFFER_FULL : Any write buffer full cycle
+event:0x41 counters:1,2,3,4 um:zero minimum:500 name:L2_STORE_MERGED : Any store that is merged in L2 cache
+event:0x42 counters:1,2,3,4 um:zero minimum:500 name:L2_STORE_BUFF : Any bufferable store from load/store to L2 cache
+event:0x43 counters:1,2,3,4 um:zero minimum:500 name:L2_ACCESS : Any access to L2 cache
+event:0x44 counters:1,2,3,4 um:zero minimum:500 name:L2_CACH_MISS : Any cacheable miss in L2 cache
+event:0x45 counters:1,2,3,4 um:zero minimum:500 name:AXI_READ_CYCLES : Number of cycles for an active AXI read
+event:0x46 counters:1,2,3,4 um:zero minimum:500 name:AXI_WRITE_CYCLES : Number of cycles for an active AXI write
+event:0x47 counters:1,2,3,4 um:zero minimum:500 name:MEMORY_REPLAY : Any replay event in the memory subsystem
+event:0x48 counters:1,2,3,4 um:zero minimum:500 name:UNALIGNED_ACCESS_REPLAY : Unaligned access that causes a replay
+event:0x49 counters:1,2,3,4 um:zero minimum:500 name:L1_DATA_MISS : L1 data cache miss as a result of the hashing algorithm
+event:0x4A counters:1,2,3,4 um:zero minimum:500 name:L1_INST_MISS : L1 instruction cache miss as a result of the hashing algorithm
+event:0x4B counters:1,2,3,4 um:zero minimum:500 name:L1_DATA_COLORING : L1 data access in which a page coloring alias occurs
+event:0x4C counters:1,2,3,4 um:zero minimum:500 name:L1_NEON_DATA : NEON data access that hits L1 cache
+event:0x4D counters:1,2,3,4 um:zero minimum:500 name:L1_NEON_CACH_DATA : NEON cacheable data access that hits L1 cache
+event:0x4E counters:1,2,3,4 um:zero minimum:500 name:L2_NEON : L2 access as a result of NEON memory access
+event:0x4F counters:1,2,3,4 um:zero minimum:500 name:L2_NEON_HIT : Any NEON hit in L2 cache
+event:0x50 counters:1,2,3,4 um:zero minimum:500 name:L1_INST : Any L1 instruction cache access, excluding CP15 cache accesses
+event:0x51 counters:1,2,3,4 um:zero minimum:500 name:PC_RETURN_MIS_PRED : Return stack misprediction at return stack pop (incorrect target address)
+event:0x52 counters:1,2,3,4 um:zero minimum:500 name:PC_BRANCH_FAILED : Branch prediction misprediction
+event:0x53 counters:1,2,3,4 um:zero minimum:500 name:PC_BRANCH_TAKEN : Any predicted branch that is taken
+event:0x54 counters:1,2,3,4 um:zero minimum:500 name:PC_BRANCH_EXECUTED : Any taken branch that is executed
+event:0x55 counters:1,2,3,4 um:zero minimum:500 name:OP_EXECUTED : Number of operations executed (in instruction or mutli-cycle instruction)
+event:0x56 counters:1,2,3,4 um:zero minimum:500 name:CYCLES_INST_STALL : Cycles where no instruction available
+event:0x57 counters:1,2,3,4 um:zero minimum:500 name:CYCLES_INST : Number of instructions issued in a cycle
+event:0x58 counters:1,2,3,4 um:zero minimum:500 name:CYCLES_NEON_DATA_STALL : Number of cycles the processor waits on MRC data from NEON
+event:0x59 counters:1,2,3,4 um:zero minimum:500 name:CYCLES_NEON_INST_STALL : Number of cycles the processor waits on NEON instruction queue or NEON load queue
+event:0x5A counters:1,2,3,4 um:zero minimum:500 name:NEON_CYCLES : Number of cycles NEON and integer processors are not idle
+event:0x70 counters:1,2,3,4 um:zero minimum:500 name:PMU0_EVENTS : Number of events from external input source PMUEXTIN[0]
+event:0x71 counters:1,2,3,4 um:zero minimum:500 name:PMU1_EVENTS : Number of events from external input source PMUEXTIN[1]
+event:0x72 counters:1,2,3,4 um:zero minimum:500 name:PMU_EVENTS : Number of events from both external input sources PMUEXTIN[0] and PMUEXTIN[1]
+event:0xFF counters:0 um:zero minimum:500 name:CPU_CYCLES : Number of CPU cycles
+
diff --git a/linux-x86/oprofile/arm/armv7/unit_masks b/linux-x86/oprofile/arm/armv7/unit_masks
new file mode 100644
index 0000000..02464a3
--- /dev/null
+++ b/linux-x86/oprofile/arm/armv7/unit_masks
@@ -0,0 +1,4 @@
+# ARM V7 PMNC possible unit masks
+#
+name:zero type:mandatory default:0x00
+ 0x00 No unit mask


This way we can already run oprofile on the BeagleBoard, but we cannot analyze the data yet, because the prebuilt opreport does not support ARM_V7. I therefore downloaded and compiled oprofile 0.9.5 and replaced the prebuilt binaries with it; after that we can analyze the data happily.


All of this has already been done in 0xdroid, so you can use 0xdroid directly.
The default kernel released at http://downloads.0xlab.org/ currently does not enable the oprofile options, so you will need to set them and recompile it.


+ CONFIG_OPROFILE_ARMV7=y
+ CONFIG_OPROFILE=y
+ CONFIG_PROFILING=y
+ CONFIG_HAVE_OPROFILE=y
+ CONFIG_TRACEPOINTS=y


You can put the vmlinux on a USB storage device or an SD card whose first partition is VFAT.

After booting 0xdroid beagle-cupcake or beagle-donut, you can run


opcontrol --setup --event=CPU_CYCLES:15000:::1:1 \
--vmlinux=/sdcard/vmlinux \
--kernel-range=0xc0008000,0xcfffffff
echo 16 > /dev/oprofile/backtrace_depth


That sets up oprofiled to take a sample every 15000 clock cycles. The smaller the CPU_CYCLES count, the heavier the profiling load and the more detail you get; the larger the count, the less detail and the lower the profiling load.
While profiling the overhead of camera preview I found an interesting phenomenon. With 150000 as the CPU_CYCLES sampling count, sampling happens about 30 times per second, and I could not get anything meaningful at that rate. This confused me for a while before I realized that it is just about the same as the camera's frame rate: I was always sampling at the same point. So even if we collect a lot of samples, the sampling period must be much smaller than the period of what we want to profile. We may always be blind to some part of the workload. We should be aware of that, and we may need to profile the same topic with several different CPU_CYCLES values to gain more confidence in the result.
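
The effect can be illustrated with a small sketch. It is a toy model, not anything from oprofile: distinctPhases is a hypothetical helper, time is measured in arbitrary integer units, and the workload is assumed to repeat exactly every workloadPeriod units. It counts how many different points inside the repeating workload the sampler ever lands on.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Count how many different points (phases) of a periodic workload get sampled
// when one sample is taken every samplePeriod time units.
std::size_t distinctPhases(uint64_t samplePeriod, uint64_t workloadPeriod,
                           std::size_t sampleCount) {
    std::set<uint64_t> phases;
    for (std::size_t i = 0; i < sampleCount; ++i) {
        // Position of the i-th sample inside the repeating workload.
        phases.insert((i * samplePeriod) % workloadPeriod);
    }
    return phases.size();
}
```

With a workload period of 1000 units, sampling every 1000 units only ever sees one phase, sampling every 1500 units sees two, and sampling every 7 units eventually covers all 1000. That is the reason for trying several sampling counts, preferably ones that do not line up with the workload's own period.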

When you are ready to profile, just enter

opcontrol --start


And then do whatever you want to profile.
Stop oprofile with

opcontrol --stop


After stopping oprofile, you can use a mini USB cable to download all the samples to the host machine and analyze them.


On the device:
1. Plug in a USB cable between the laptop and the BeagleBoard (OTG port)
2. netcfg usb0 up
3. ifconfig usb0 192.168.0.202
On your host:
1. sudo ifconfig usb0 192.168.0.200 # beware: nm-applet may break it; you can set it up.
2. export ADBHOST=192.168.0.202
3. export PATH={Where you put 0xdroid}/out/host/linux-x86/bin:$PATH
4. pkill adb
5. adb devices # If you can see the device, go on to the next step; otherwise check what's wrong.


Then:


cd {Where you put 0xdroid}
. build/envsetup.sh
setpaths
export OPROFILE_EVENTS_DIR=${PWD}/linux-x86/oprofile/
cd external/oprofile
./opimport_pull /tmp/0xdroid-oprofile


Copy your vmlinux to ${OUT}/symbols

Then you can analyze all the symbols with

${OPROFILE_EVENTS_DIR}/bin/opreport --session-dir=/tmp/0xdroid-oprofile -p ${OUT}/symbols


After analyzing, you can use ooffice, Graphviz, gnuplot, or whatever you like to rework the data. For example:







Happy profiling. :-)

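Monday, August 31, 2009

murmur

Nothing much. I haven't written a post in too long, so I'm just making some noise to prove I'm still alive.

This year has been full of challenges: from preparing to founding 0xlab, taking on all kinds of challenges while facing problems on every front. Working with a group of excellent people and doing a pile of things nobody has done before is really thrilling.

The goal of these past few months has been to build a platform that everyone can work on together, from the equipment to the development process. With everyone's effort, we have gradually put things in place. Internally we ran quite a lot of experiments; externally we presented a software platform on which we can work together with everyone. In the first code drop, we tuned beagle-cupcake to the point where it is usable, supports fast collaborative development, and is easy to integrate. I am getting ready to finish the items on my mind slowly, one by one.

With the foundation laid, the real challenge is only about to begin. Cheering myself on, and cheering everyone on too.

Over these months I discovered my physical limits. I am deliberately slowing my pace; we are in this for the long run, and I must not burn myself out all at once. Be careful, and remember it well. As for some of my own FOSS projects, I really feel sorry toward the users; whenever I have the time and energy, I will come back to look at them. XD

Now that the lab is on track, the most important goal next is actually to tune myself: manage my emotions and make myself happier; give myself more time, leave work earlier, exercise more, and get healthier; train my ability to express myself so that I can convey my ideas more clearly.

I hope to give everyone, and myself, a better Tick. XD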

Monday, April 27, 2009

0xlab is opening

0xlab looks very much like 0x'1'ab, and 0x1ab is 427, so we chose this day (4/27) to announce our lab. :)
http://0xlab.org

We are a group of software engineers with a strong passion for Free and Open Source Software. We believe in the power of knowledge and creativity, and we think we can do something very interesting and valuable.

Sunday, March 29, 2009

Beagleboard demo

A demo of the last four days of development.