Add exynos video driver

Documentation is provided in README-exynos.
Tobias Jakobi 2013-11-24 17:28:21 +01:00
parent 9a3ac9c5fd
commit 7efa9def07
10 changed files with 1709 additions and 1 deletions


@@ -220,6 +220,12 @@ ifeq ($(HAVE_OMAP), 1)
OBJ += gfx/omap_gfx.o
endif
ifeq ($(HAVE_EXYNOS), 1)
OBJ += gfx/exynos_gfx.o memcpy-neon.o
LIBS += $(DRM_LIBS) $(EXYNOS_LIBS)
DEFINES += $(DRM_CFLAGS) $(EXYNOS_CFLAGS)
endif
ifeq ($(HAVE_OPENGL), 1)
OBJ += gfx/gl.o \
gfx/gfx_context.o \

60
README-exynos.md Normal file

@@ -0,0 +1,60 @@
# RetroArch Exynos-G2D video driver
The Exynos-G2D video driver for RetroArch uses the Exynos DRM layer for presentation and the Exynos G2D block to scale and blit the emulator framebuffer to the screen. The G2D subsystem is a separate functional block on modern Samsung Exynos SoCs (in particular Exynos4412 and Exynos5250) that accelerates various kinds of 2D blit operations. It can fill, copy, scale and blend pixel buffers and therefore provides adequate functionality for RetroArch's purposes.
## Reasons to use the driver
Hardware-accelerated rendering on devices based on an Exynos SoC is usually restricted to the use of the GPU block, which is either a Mali or PowerVR IP. Both GPU types have the problem that interfacing with them requires a proprietary driver stack consisting of kernel and userspace code. While the kernel code is open source, the userspace code is available to the end user only as a binary blob.
If you want to use such a device with an upstream kernel, the GPU block will most likely not work for you. The chances of Mali or PowerVR kernel code being accepted upstream are also very slim. Still, one might ask whether using the GPU block for such trivial operations (basically scale and blend) is the right approach in the first place.
Since the G2D block is present on all modern Exynos SoCs, the natural way to proceed is to use it instead of the GPU block. The G2D is still a dedicated piece of hardware, so all operations are offloaded from the CPU. Note, however, that using the G2D instead of the GPU rules out GPU shaders for enhancing the image quality of your emulator core of choice. Users who rely on these enhancements are advised to continue using the GPU, most likely through the EGL/GLES video driver.
The author uses a Hardkernel ODROID-X2, which is a developer board powered by an Exynos4412 SoC. The vendor-supplied kernel, a Linux tree based on the 3.8.y branch, currently offers no way to use the G2D because of issues related to clock setup. However, upstreaming work is in progress, and a tree based on 3.15.y, with some slight modifications, is available here:
[odroid-3.15.y repository](https://github.com/tobiasjakobi/linux-odroid)
Please refer to the minimalistic documentation in README-ODROID for setup.
## Performance analysis
Some simple benchmarking was done to evaluate the performance of the G2D block. The test run used the snes9x-next emulation core and a game title with a native resolution of 256x224 pixels. The output screen was configured to a 1280x720 mode. Scaling to the output screen kept the native aspect ratio, which results in an output rectangle of size 822x720.

    total memcpy calls: 18795
    total g2d calls: 18795
    total memcpy time: 8.978532 seconds
    total g2d time: 29.703944 seconds
    average time per memcpy call: 477.708540 microseconds
    average time per g2d call: 1580.417345 microseconds

The average time to display the emulator framebuffer on screen (one memcpy call plus one G2D call) is roughly 2058 microseconds, or around 486 frames per second. Assuming that the time consumption increases linearly with the number of pixels processed, which is usually a safe assumption, scaling to an output rectangle of size 1920x1080 would yield an average duration of 7207 microseconds, which is still 138 frames per second.
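The arithmetic behind these figures can be reproduced with a small C sketch. The helper names (`fit_width`, `fps_from_us`, `scale_time_us`) are illustrative only and are not part of the driver. The extrapolation follows the linear-scaling assumption stated above.

```c
#include <assert.h>

/* Width of the output rectangle when a src_w x src_h image is scaled to
 * the full screen height while keeping its aspect ratio. Integer division
 * truncates, which matches the 822x720 rectangle quoted in the text. */
static unsigned fit_width(unsigned src_w, unsigned src_h, unsigned screen_h)
{
   assert(src_h != 0);
   return src_w * screen_h / src_h;
}

/* Frames per second for a given per-frame duration in microseconds. */
static double fps_from_us(double frame_us)
{
   assert(frame_us > 0.0);
   return 1e6 / frame_us;
}

/* Linear extrapolation of a per-frame duration by the ratio of pixel
 * counts (the assumption made in the benchmark discussion). */
static double scale_time_us(double frame_us, double from_pixels,
      double to_pixels)
{
   assert(from_pixels > 0.0);
   return frame_us * to_pixels / from_pixels;
}
```

For example, `fit_width(256, 224, 720)` gives 822, and `fps_from_us(477.708540 + 1580.417345)` gives about 486. Extrapolating that duration from 822x720 to 1920x1080 pixels with `scale_time_us` lands a little over 7.2 milliseconds, in line with the estimate above.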
## Configuration
The video driver uses the libdrm API to interface with the DRM. Some patches are still missing from the upstream tree, so the user is advised to use the 'exynos' branch of the repository mentioned below.
[libdrm repository](https://github.com/tobiasjakobi/libdrm)
Make sure that the Exynos API support is enabled. If you're building libdrm from source, then use

    ./configure --enable-exynos-experimental-api

to enable it.
The video driver name is 'exynos'. It honors the following video settings:
- video\_monitor\_index
- video\_fullscreen\_x and video\_fullscreen\_y
The monitor index maps to the DRM connector index. If it is zero, the driver selects the first 'sane' connector, meaning one that is connected to a display device and provides at least one usable mode. If the value is non-zero, it forces the selection of this connector. For example, on the author's ODROID-X2, with an odroid-3.15.y kernel, the HDMI connector has index 1.
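The selection rule can be modelled in plain C as follows. This is a simplified sketch and not the driver's actual code: `struct connector_info` and `select_connector` are hypothetical names. The text does not say whether a forced index is zero- or one-based, so the sketch assumes the value is used directly as an array index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the data read from a DRM connector;
 * this is not a libdrm type. */
struct connector_info
{
   int connected;   /* non-zero if a display device is attached */
   int mode_count;  /* number of modes the connector reports */
};

/* Returns the index of the connector to use, or -1 if none qualifies.
 * monitor_index == 0 picks the first connected connector that offers at
 * least one mode. A non-zero value forces that index (assumption: it is
 * used directly as an array index, without further sanity checks). */
static int select_connector(const struct connector_info *conns,
      size_t count, unsigned monitor_index)
{
   size_t i;

   assert(conns != NULL || count == 0);

   if (monitor_index != 0)
      return monitor_index < count ? (int)monitor_index : -1;

   for (i = 0; i < count; i++)
   {
      if (conns[i].connected && conns[i].mode_count > 0)
         return (int)i;
   }

   return -1;
}
```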
The two fullscreen parameters choose the mode that the DRM should set. If they are zero, the native connector mode is selected. If they are non-zero, the DRM tries to select the requested mode. This might fail if the connector does not offer that mode.
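A minimal sketch of this mode lookup is shown below. The names `struct mode_info` and `select_mode` are hypothetical. The sketch also assumes that the native mode is the first entry of the connector's mode list, which the text above does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced description of a display mode. */
struct mode_info
{
   unsigned width;
   unsigned height;
};

/* Returns the index of the mode to set, or -1 on failure.
 * If either requested dimension is zero, the native mode is chosen
 * (assumption: the native mode is the first list entry). Otherwise the
 * list is searched for an exact match, which may fail. */
static int select_mode(const struct mode_info *modes, size_t count,
      unsigned want_w, unsigned want_h)
{
   size_t i;

   assert(modes != NULL || count == 0);

   if (count == 0)
      return -1;

   if (want_w == 0 || want_h == 0)
      return 0;

   for (i = 0; i < count; i++)
   {
      if (modes[i].width == want_w && modes[i].height == want_h)
         return (int)i;
   }

   return -1;
}
```

With a list holding 1920x1080 followed by 1280x720, a request of 0x0 selects entry 0, a request of 1280x720 selects entry 1, and a request of 800x600 fails.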
## Issues and TODOs
The driver still suffers from some issues.
- The aspect ratio computation can be improved. In particular, the user-supplied aspect ratio is currently unused.
- Font rendering and blitting are very inefficient since the backing buffer is cleared every frame. Introduce an invalidation rectangle which covers the region where font glyphs are drawn, and then only clear this region.
- Temporary GEM buffers are used as source for blitting operations. Support for the IOMMU has to be enabled, so that one can use the 'userptr' functionality.
- More TODOs are pointed out in the code itself.
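The invalidation rectangle suggested in the font rendering item could look roughly like the sketch below. It is not taken from the driver: all names are hypothetical, and a 16 bits per pixel backing buffer with a pitch given in pixels is assumed purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Half-open rectangle [x0, x1) x [y0, y1); empty when x0 >= x1 or y0 >= y1. */
struct dirty_rect
{
   unsigned x0, y0, x1, y1;
};

static void dirty_reset(struct dirty_rect *r)
{
   r->x0 = r->y0 = r->x1 = r->y1 = 0;
}

static int dirty_is_empty(const struct dirty_rect *r)
{
   return r->x0 >= r->x1 || r->y0 >= r->y1;
}

/* Grows the rectangle so that it also covers a glyph box drawn at (x, y). */
static void dirty_add(struct dirty_rect *r, unsigned x, unsigned y,
      unsigned w, unsigned h)
{
   if (w == 0 || h == 0)
      return;

   if (dirty_is_empty(r))
   {
      r->x0 = x;
      r->y0 = y;
      r->x1 = x + w;
      r->y1 = y + h;
      return;
   }

   if (x < r->x0) r->x0 = x;
   if (y < r->y0) r->y0 = y;
   if (x + w > r->x1) r->x1 = x + w;
   if (y + h > r->y1) r->y1 = y + h;
}

/* Clears only the dirty region of the buffer (clamped to its size) and
 * resets the rectangle for the next frame. */
static void dirty_clear(struct dirty_rect *r, uint16_t *buf, unsigned pitch,
      unsigned width, unsigned height)
{
   unsigned y;
   unsigned x1 = r->x1 < width ? r->x1 : width;
   unsigned y1 = r->y1 < height ? r->y1 : height;

   assert(buf != NULL && pitch >= width);

   if (!dirty_is_empty(r) && r->x0 < x1)
   {
      for (y = r->y0; y < y1; y++)
         memset(buf + (size_t)y * pitch + r->x0, 0,
               (size_t)(x1 - r->x0) * sizeof(*buf));
   }

   dirty_reset(r);
}
```

Each glyph blit would call `dirty_add`, and the per-frame clear would call `dirty_clear` instead of wiping the whole backing buffer.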


@@ -45,6 +45,7 @@ enum
VIDEO_VG,
VIDEO_NULL,
VIDEO_OMAP,
VIDEO_EXYNOS,
AUDIO_RSOUND,
AUDIO_OSS,


@@ -139,6 +139,9 @@ static const video_driver_t *video_drivers[] = {
#endif
#ifdef HAVE_OMAP
&video_omap,
#endif
#ifdef HAVE_EXYNOS
&video_exynos,
#endif
NULL,
};


@@ -626,6 +626,7 @@ extern const video_driver_t video_vg;
extern const video_driver_t video_null;
extern const video_driver_t video_lima;
extern const video_driver_t video_omap;
extern const video_driver_t video_exynos;
extern const input_driver_t input_android;
extern const input_driver_t input_sdl;
extern const input_driver_t input_dinput;

1489
gfx/exynos_gfx.c Normal file

File diff suppressed because it is too large

139
memcpy-neon.S Normal file

@@ -0,0 +1,139 @@
/*
* NEON code contributed by Siarhei Siamashka <siarhei.siamashka@nokia.com>.
* Origin: http://sourceware.org/ml/libc-ports/2009-07/msg00003.html
*
* The GNU C Library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License.
*
* Tweaked for Android by Jim Huang <jserv@0xlab.org>
*/
.arm
.fpu neon
.global memcpy_neon
/*
* ENABLE_UNALIGNED_MEM_ACCESSES macro can be defined to permit the use
* of unaligned load/store memory accesses supported since ARMv6. This
* will further improve performance, but can purely theoretically cause
* problems if somebody decides to set SCTLR.A bit in the OS kernel
* (to trap each unaligned memory access) or somehow mess with strongly
* ordered/device memory.
*/
#define ENABLE_UNALIGNED_MEM_ACCESSES 1
#define NEON_MAX_PREFETCH_DISTANCE 320
.align 4
memcpy_neon:
.fnstart
mov ip, r0
cmp r2, #16
blt 4f @ Have less than 16 bytes to copy
@ First ensure 16 byte alignment for the destination buffer
tst r0, #0xF
beq 2f
tst r0, #1
ldrneb r3, [r1], #1
strneb r3, [ip], #1
subne r2, r2, #1
tst ip, #2
#ifdef ENABLE_UNALIGNED_MEM_ACCESSES
ldrneh r3, [r1], #2
strneh r3, [ip], #2
#else
ldrneb r3, [r1], #1
strneb r3, [ip], #1
ldrneb r3, [r1], #1
strneb r3, [ip], #1
#endif
subne r2, r2, #2
tst ip, #4
beq 1f
vld4.8 {d0[0], d1[0], d2[0], d3[0]}, [r1]!
vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [ip, :32]!
sub r2, r2, #4
1:
tst ip, #8
beq 2f
vld1.8 {d0}, [r1]!
vst1.8 {d0}, [ip, :64]!
sub r2, r2, #8
2:
subs r2, r2, #32
blt 3f
mov r3, #32
@ Main copy loop, 32 bytes are processed per iteration.
@ ARM instructions are used for doing fine-grained prefetch,
@ increasing prefetch distance progressively up to
@ NEON_MAX_PREFETCH_DISTANCE at runtime
1:
vld1.8 {d0-d3}, [r1]!
cmp r3, #(NEON_MAX_PREFETCH_DISTANCE - 32)
pld [r1, r3]
addle r3, r3, #32
vst1.8 {d0-d3}, [ip, :128]!
sub r2, r2, #32
cmp r2, r3
bge 1b
cmp r2, #0
blt 3f
1: @ Copy the remaining part of the buffer (already prefetched)
vld1.8 {d0-d3}, [r1]!
subs r2, r2, #32
vst1.8 {d0-d3}, [ip, :128]!
bge 1b
3: @ Copy up to 31 remaining bytes
tst r2, #16
beq 4f
vld1.8 {d0, d1}, [r1]!
vst1.8 {d0, d1}, [ip, :128]!
4:
@ Use ARM instructions exclusively for the final trailing part
@ not fully fitting into full 16 byte aligned block in order
@ to avoid "ARM store after NEON store" hazard. Also NEON
@ pipeline will be (mostly) flushed by the time when the
@ control returns to the caller, making the use of NEON mostly
@ transparent (and avoiding hazards in the caller code)
#ifdef ENABLE_UNALIGNED_MEM_ACCESSES
movs r3, r2, lsl #29
ldrcs r3, [r1], #4
strcs r3, [ip], #4
ldrcs r3, [r1], #4
strcs r3, [ip], #4
ldrmi r3, [r1], #4
strmi r3, [ip], #4
movs r2, r2, lsl #31
ldrcsh r3, [r1], #2
strcsh r3, [ip], #2
ldrmib r3, [r1], #1
strmib r3, [ip], #1
#else
movs r3, r2, lsl #29
bcc 1f
.rept 8
ldrcsb r3, [r1], #1
strcsb r3, [ip], #1
.endr
1:
bpl 1f
.rept 4
ldrmib r3, [r1], #1
strmib r3, [ip], #1
.endr
1:
movs r2, r2, lsl #31
ldrcsb r3, [r1], #1
strcsb r3, [ip], #1
ldrcsb r3, [r1], #1
strcsb r3, [ip], #1
ldrmib r3, [r1], #1
strmib r3, [ip], #1
#endif
bx lr
.fnend


@@ -85,6 +85,11 @@ if [ "$HAVE_EGL" != "no" ]; then
fi
fi
if [ "$HAVE_EXYNOS" != "no" ]; then
check_pkgconf EXYNOS libdrm_exynos
check_pkgconf DRM libdrm
fi
if [ "$LIBRETRO" ]; then
echo "Explicit libretro used, disabling dynamic libretro loading ..."
HAVE_DYNAMIC='no'
@@ -276,6 +281,7 @@ add_define_make OS "$OS"
# Creates config.mk and config.h.
add_define_make GLOBAL_CONFIG_DIR "$GLOBAL_CONFIG_DIR"
VARS="RGUI LAKKA ALSA OSS OSS_BSD OSS_LIB AL RSOUND ROAR JACK COREAUDIO PULSE SDL OPENGL LIMA OMAP GLES GLES3 VG EGL KMS GBM DRM DYLIB GETOPT_LONG THREADS CG LIBXML2 ZLIB DYNAMIC FFMPEG AVCODEC AVFORMAT AVUTIL SWSCALE FREETYPE XKBCOMMON XVIDEO X11 XEXT XF86VM XINERAMA MALI_FBDEV NETPLAY NETWORK_CMD STDIN_CMD COMMAND SOCKET_LEGACY FBO STRL STRCASESTR MMAP PYTHON FFMPEG_ALLOC_CONTEXT3 FFMPEG_AVCODEC_OPEN2 FFMPEG_AVIO_OPEN FFMPEG_AVFORMAT_WRITE_HEADER FFMPEG_AVFORMAT_NEW_STREAM FFMPEG_AVCODEC_ENCODE_AUDIO2 FFMPEG_AVCODEC_ENCODE_VIDEO2 BSV_MOVIE VIDEOCORE NEON FLOATHARD FLOATSOFTFP UDEV V4L2 AV_CHANNEL_LAYOUT"
VARS="RGUI LAKKA ALSA OSS OSS_BSD OSS_LIB AL RSOUND ROAR JACK COREAUDIO PULSE SDL OPENGL LIMA OMAP GLES GLES3 VG EGL KMS EXYNOS GBM DRM DYLIB GETOPT_LONG THREADS CG LIBXML2 ZLIB DYNAMIC FFMPEG AVCODEC AVFORMAT AVUTIL SWSCALE FREETYPE XKBCOMMON XVIDEO X11 XEXT XF86VM XINERAMA MALI_FBDEV NETPLAY NETWORK_CMD STDIN_CMD COMMAND SOCKET_LEGACY FBO STRL STRCASESTR MMAP PYTHON FFMPEG_ALLOC_CONTEXT3 FFMPEG_AVCODEC_OPEN2 FFMPEG_AVIO_OPEN FFMPEG_AVFORMAT_WRITE_HEADER FFMPEG_AVFORMAT_NEW_STREAM FFMPEG_AVCODEC_ENCODE_AUDIO2 FFMPEG_AVCODEC_ENCODE_VIDEO2 BSV_MOVIE VIDEOCORE NEON FLOATHARD FLOATSOFTFP UDEV V4L2 AV_CHANNEL_LAYOUT"
create_config_make config.mk $VARS
create_config_header config.h $VARS


@@ -20,6 +20,7 @@ HAVE_LIMA=no # Enable Lima video support
HAVE_OMAP=no # Enable OMAP video support
HAVE_XINERAMA=auto # Disable Xinerama support.
HAVE_KMS=auto # Enable KMS context support
HAVE_EXYNOS=no # Enable Exynos video support
HAVE_EGL=auto # Enable EGL context support
HAVE_VG=auto # Enable OpenVG support
HAVE_CG=auto # Enable Cg shader support


@@ -109,6 +109,8 @@ const char *config_get_default_video(void)
return "null";
case VIDEO_OMAP:
return "omap";
case VIDEO_EXYNOS:
return "exynos";
default:
return NULL;
}