mirror of
https://github.com/libretro/RetroArch
synced 2025-03-28 08:37:41 +00:00
Add exynos video driver
Documentation is provided in README-exynos.
parent 9a3ac9c5fd
commit 7efa9def07
6
Makefile
@@ -220,6 +220,12 @@ ifeq ($(HAVE_OMAP), 1)
   OBJ += gfx/omap_gfx.o
endif

ifeq ($(HAVE_EXYNOS), 1)
   OBJ += gfx/exynos_gfx.o memcpy-neon.o
   LIBS += $(DRM_LIBS) $(EXYNOS_LIBS)
   DEFINES += $(DRM_CFLAGS) $(EXYNOS_CFLAGS)
endif

ifeq ($(HAVE_OPENGL), 1)
   OBJ += gfx/gl.o \
      gfx/gfx_context.o \
60
README-exynos.md
Normal file
@@ -0,0 +1,60 @@
# RetroArch Exynos-G2D video driver

The Exynos-G2D video driver for RetroArch uses the Exynos DRM layer for presentation and the Exynos G2D block to scale and blit the emulator framebuffer to the screen. The G2D subsystem is a separate functional block on modern Samsung Exynos SoCs (in particular Exynos4412 and Exynos5250) that accelerates various kinds of 2D blit operations. It can fill, copy, scale and blend pixel buffers and therefore provides adequate functionality for RetroArch's purposes.

## Reasons to use the driver

Hardware-accelerated rendering on devices based on an Exynos SoC is usually restricted to the GPU block, which is either a Mali or a PowerVR IP. Both GPU types have the problem that interfacing with them requires a proprietary driver stack consisting of kernel and userspace code. While the kernel code is open source, the userspace code is available to the end user only as a binary blob.

If you want to use such a device with an upstream kernel, the GPU block will most likely not work for you. The chances of Mali or PowerVR kernel code being accepted upstream are also very slim. Still, one might ask whether using the GPU block for such trivial operations (basically scale and blend) is the right approach in the first place.

Since the G2D block is present on all modern Exynos SoCs, the natural way to proceed is to use it instead of the GPU block. The G2D is still a dedicated piece of hardware, so all operations are offloaded from the CPU. It should be noted, though, that using the G2D instead of the GPU removes the possibility of using GPU shaders to enhance the image quality of your emulator core of choice. Users who rely on these enhancements are advised to continue using the GPU, most likely through the EGL/GLES video driver.

The author uses a Hardkernel ODROID-X2, which is a developer board powered by an Exynos4412 SoC. The vendor-supplied kernel, a Linux tree based on the 3.8.y branch, currently offers no way to use the G2D because of issues related to clock setup. However, upstreaming work is in progress, and a tree based on 3.15.y with some slight modifications is available here:

[odroid-3.15.y repository](https://github.com/tobiasjakobi/linux-odroid)

Please refer to the minimal documentation in README-ODROID for setup.

## Performance analysis

Some simple benchmarking was done to evaluate the performance of the G2D block. The test run used the snes9x-next emulation core and a game title with a native resolution of 256x224 pixels. The output screen was configured to a 1280x720 mode. Scaling to the output screen preserved the native aspect ratio, which in this case results in an output rectangle of size 822x720.
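
The 822x720 figure follows from fitting a 256x224 source into a 1280x720 mode while keeping the source aspect ratio. The following is a minimal sketch of that computation; `struct out_rect` and `fit_aspect` are hypothetical names chosen for illustration and are not taken from the driver source, which may compute the rectangle differently.

```c
#include <assert.h>

/* Hypothetical output rectangle, in pixels, relative to the screen. */
struct out_rect
{
   unsigned x, y, w, h;
};

/* Fit a src_w x src_h image into a dst_w x dst_h screen while keeping
 * the source aspect ratio, and center the result. Widths and heights
 * are rounded down to whole pixels. */
static struct out_rect fit_aspect(unsigned src_w, unsigned src_h,
      unsigned dst_w, unsigned dst_h)
{
   struct out_rect r;
   unsigned long long lhs, rhs;

   assert(src_w > 0 && src_h > 0);

   /* Compare src_w/src_h against dst_w/dst_h without floating point. */
   lhs = (unsigned long long)src_w * dst_h;
   rhs = (unsigned long long)dst_w * src_h;

   if (lhs <= rhs)
   {
      /* Source is narrower than the screen: use the full height. */
      r.h = dst_h;
      r.w = (unsigned)(lhs / src_h);
   }
   else
   {
      /* Source is wider than the screen: use the full width. */
      r.w = dst_w;
      r.h = (unsigned)(rhs / src_w);
   }

   r.x = (dst_w - r.w) / 2;
   r.y = (dst_h - r.h) / 2;
   return r;
}
```

For the benchmark setup, `fit_aspect(256, 224, 1280, 720)` gives a width of 256 * 720 / 224 = 822 (rounded down) at the full height of 720.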

    total memcpy calls: 18795
    total g2d calls: 18795
    total memcpy time: 8.978532 seconds
    total g2d time: 29.703944 seconds
    average time per memcpy call: 477.708540 microseconds
    average time per g2d call: 1580.417345 microseconds

The average time to display the emulator framebuffer on screen is roughly 2058 microseconds, or around 486 frames per second. Assuming that the time consumption increases linearly with the number of pixels processed, which is usually a safe assumption, scaling to an output rectangle of size 1920x1080 would yield an average duration of 7207 microseconds, which is still 138 frames per second.
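
The arithmetic behind these numbers can be sketched as follows. The helper names are hypothetical, and the linear-in-pixels scaling is the assumption stated above, applied to the whole frame time (memcpy plus G2D). Computed this way, the 1920x1080 estimate comes out at roughly 7.2 milliseconds per frame, in line with the figure quoted above.

```c
#include <assert.h>

/* Average microseconds per call, from a total in seconds and a call count. */
static double avg_us(double total_seconds, unsigned long calls)
{
   assert(calls > 0);
   return total_seconds * 1e6 / (double)calls;
}

/* Scale a per-frame time linearly with the number of output pixels.
 * This mirrors the assumption made in the text; it is an estimate,
 * not a measurement. */
static double scale_by_pixels(double time_us,
      unsigned from_w, unsigned from_h,
      unsigned to_w, unsigned to_h)
{
   assert(from_w > 0 && from_h > 0);
   return time_us * ((double)to_w * (double)to_h)
      / ((double)from_w * (double)from_h);
}

/* Frames per second that a given per-frame time would allow. */
static double fps_from_us(double frame_us)
{
   assert(frame_us > 0.0);
   return 1e6 / frame_us;
}
```

With the measured totals, `avg_us(8.978532, 18795) + avg_us(29.703944, 18795)` is the per-frame time at 822x720, and `scale_by_pixels(frame_us, 822, 720, 1920, 1080)` extrapolates it to a full 1080p rectangle.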

## Configuration

The video driver uses the libdrm API to interface with the DRM. Some patches are still missing in the upstream tree, so the user is advised to use the 'exynos' branch of the repository linked below.

[libdrm repository](https://github.com/tobiasjakobi/libdrm)

Make sure that Exynos API support is enabled. If you're building libdrm from source, use

    ./configure --enable-exynos-experimental-api

to enable it.

The video driver name is 'exynos'. It honors the following video settings:

- video\_monitor\_index
- video\_fullscreen\_x and video\_fullscreen\_y

The monitor index maps to the DRM connector index. If it is zero, the driver selects the first 'sane' connector, meaning one that is connected to a display device and provides at least one usable mode. If the value is non-zero, it forces the selection of this connector. For example, on the author's ODROID-X2, with an odroid-3.15.y kernel, the HDMI connector has index 1.

The two fullscreen parameters select the mode the DRM should use. If they are zero, the native connector mode is selected. If they are non-zero, the DRM tries to select the requested mode. This might fail if the mode is not available from the connector.
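
The selection rules above can be sketched with simplified stand-in types. Everything here is hypothetical: `struct mode`, `struct connector`, `pick_connector` and `pick_mode` are not libdrm or driver names. The sketch also assumes that a non-zero monitor index N refers to the N-th connector counting from one, and that the first mode in a connector's list is its native mode; neither detail is spelled out in the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for DRM mode and connector data (hypothetical). */
struct mode
{
   unsigned w, h;
};

struct connector
{
   int connected;              /* non-zero if a display is attached */
   const struct mode *modes;   /* assumption: modes[0] is the native mode */
   size_t num_modes;
};

/* Returns the array position of the chosen connector, or -1 on failure.
 * monitor_index == 0: first connected connector with at least one mode.
 * monitor_index != 0: forced selection (assumed to count from one). */
static int pick_connector(const struct connector *c, size_t n,
      unsigned monitor_index)
{
   size_t i;

   if (monitor_index != 0)
      return (monitor_index - 1 < n) ? (int)(monitor_index - 1) : -1;

   for (i = 0; i < n; i++)
      if (c[i].connected && c[i].num_modes > 0)
         return (int)i;

   return -1;
}

/* Returns the mode to use, or NULL if the requested mode is unavailable.
 * A zero width or height requests the native mode. */
static const struct mode *pick_mode(const struct connector *c,
      unsigned want_w, unsigned want_h)
{
   size_t i;

   assert(c != NULL);

   if (c->num_modes == 0)
      return NULL;
   if (want_w == 0 || want_h == 0)
      return &c->modes[0];

   for (i = 0; i < c->num_modes; i++)
      if (c->modes[i].w == want_w && c->modes[i].h == want_h)
         return &c->modes[i];

   return NULL;
}
```

The NULL return from `pick_mode` corresponds to the failure case described above, where the requested mode is not offered by the connector.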

## Issues and TODOs

The driver still suffers from some issues.

- The aspect ratio computation can be improved. In particular, the user-supplied aspect ratio is currently unused.
- Font rendering and blitting are very inefficient since the backing buffer is cleared every frame. Introduce an invalidation rectangle which covers the region where font glyphs are drawn, and then only clear this region.
- Temporary GEM buffers are used as source for blitting operations. Support for the IOMMU has to be enabled, so that one can use the 'userptr' functionality.
- More TODOs are pointed out in the code itself.
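
The invalidation rectangle proposed in the font rendering item could look roughly like this. It is only a sketch of the idea: the names are hypothetical, a 16 bits per pixel buffer is assumed, and the caller is assumed to keep the rectangle inside the buffer bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical invalidation rectangle, half-open: [x0, x1) x [y0, y1). */
struct dirty_rect
{
   unsigned x0, y0, x1, y1;
};

static int dirty_is_empty(const struct dirty_rect *d)
{
   return d->x0 >= d->x1 || d->y0 >= d->y1;
}

/* Grow the rectangle so that it also covers one glyph's bounding box. */
static void dirty_add(struct dirty_rect *d,
      unsigned x, unsigned y, unsigned w, unsigned h)
{
   if (w == 0 || h == 0)
      return;

   if (dirty_is_empty(d))
   {
      d->x0 = x;
      d->y0 = y;
      d->x1 = x + w;
      d->y1 = y + h;
      return;
   }

   if (x < d->x0) d->x0 = x;
   if (y < d->y0) d->y0 = y;
   if (x + w > d->x1) d->x1 = x + w;
   if (y + h > d->y1) d->y1 = y + h;
}

/* Clear only the invalidated region of a 16bpp buffer, then reset the
 * rectangle. pitch_px is the buffer width in pixels. */
static void dirty_clear(uint16_t *buf, unsigned pitch_px,
      struct dirty_rect *d)
{
   unsigned y;

   assert(buf != NULL);

   if (!dirty_is_empty(d))
      for (y = d->y0; y < d->y1; y++)
         memset(buf + (size_t)y * pitch_px + d->x0, 0,
               (d->x1 - d->x0) * sizeof(*buf));

   d->x0 = d->y0 = d->x1 = d->y1 = 0;
}
```

Each glyph drawn during a frame would call `dirty_add`, and the next frame would call `dirty_clear` instead of clearing the whole backing buffer.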
@@ -45,6 +45,7 @@ enum
   VIDEO_VG,
   VIDEO_NULL,
   VIDEO_OMAP,
   VIDEO_EXYNOS,

   AUDIO_RSOUND,
   AUDIO_OSS,
3
driver.c
@@ -139,6 +139,9 @@ static const video_driver_t *video_drivers[] = {
#endif
#ifdef HAVE_OMAP
   &video_omap,
#endif
#ifdef HAVE_EXYNOS
   &video_exynos,
#endif
   NULL,
};
1
driver.h
@@ -626,6 +626,7 @@ extern const video_driver_t video_vg;
extern const video_driver_t video_null;
extern const video_driver_t video_lima;
extern const video_driver_t video_omap;
extern const video_driver_t video_exynos;
extern const input_driver_t input_android;
extern const input_driver_t input_sdl;
extern const input_driver_t input_dinput;
1489
gfx/exynos_gfx.c
Normal file
File diff suppressed because it is too large
139
memcpy-neon.S
Normal file
@@ -0,0 +1,139 @@
/*
 * NEON code contributed by Siarhei Siamashka <siarhei.siamashka@nokia.com>.
 * Origin: http://sourceware.org/ml/libc-ports/2009-07/msg00003.html
 *
 * The GNU C Library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public License.
 *
 * Tweaked for Android by Jim Huang <jserv@0xlab.org>
 */

    .arm
    .fpu neon

    .global memcpy_neon

/*
 * ENABLE_UNALIGNED_MEM_ACCESSES macro can be defined to permit the use
 * of unaligned load/store memory accesses supported since ARMv6. This
 * will further improve performance, but can purely theoretically cause
 * problems if somebody decides to set SCTLR.A bit in the OS kernel
 * (to trap each unaligned memory access) or somehow mess with strongly
 * ordered/device memory.
 */
#define ENABLE_UNALIGNED_MEM_ACCESSES 1

#define NEON_MAX_PREFETCH_DISTANCE 320

    .align 4
memcpy_neon:
    .fnstart
    mov ip, r0
    cmp r2, #16
    blt 4f @ Have less than 16 bytes to copy

    @ First ensure 16 byte alignment for the destination buffer
    tst r0, #0xF
    beq 2f
    tst r0, #1
    ldrneb r3, [r1], #1
    strneb r3, [ip], #1
    subne r2, r2, #1
    tst ip, #2
#ifdef ENABLE_UNALIGNED_MEM_ACCESSES
    ldrneh r3, [r1], #2
    strneh r3, [ip], #2
#else
    ldrneb r3, [r1], #1
    strneb r3, [ip], #1
    ldrneb r3, [r1], #1
    strneb r3, [ip], #1
#endif
    subne r2, r2, #2

    tst ip, #4
    beq 1f
    vld4.8 {d0[0], d1[0], d2[0], d3[0]}, [r1]!
    vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [ip, :32]!
    sub r2, r2, #4
1:
    tst ip, #8
    beq 2f
    vld1.8 {d0}, [r1]!
    vst1.8 {d0}, [ip, :64]!
    sub r2, r2, #8
2:
    subs r2, r2, #32
    blt 3f
    mov r3, #32

    @ Main copy loop, 32 bytes are processed per iteration.
    @ ARM instructions are used for doing fine-grained prefetch,
    @ increasing prefetch distance progressively up to
    @ NEON_MAX_PREFETCH_DISTANCE at runtime
1:
    vld1.8 {d0-d3}, [r1]!
    cmp r3, #(NEON_MAX_PREFETCH_DISTANCE - 32)
    pld [r1, r3]
    addle r3, r3, #32
    vst1.8 {d0-d3}, [ip, :128]!
    sub r2, r2, #32
    cmp r2, r3
    bge 1b
    cmp r2, #0
    blt 3f
1:  @ Copy the remaining part of the buffer (already prefetched)
    vld1.8 {d0-d3}, [r1]!
    subs r2, r2, #32
    vst1.8 {d0-d3}, [ip, :128]!
    bge 1b
3:  @ Copy up to 31 remaining bytes
    tst r2, #16
    beq 4f
    vld1.8 {d0, d1}, [r1]!
    vst1.8 {d0, d1}, [ip, :128]!
4:
    @ Use ARM instructions exclusively for the final trailing part
    @ not fully fitting into full 16 byte aligned block in order
    @ to avoid "ARM store after NEON store" hazard. Also NEON
    @ pipeline will be (mostly) flushed by the time when the
    @ control returns to the caller, making the use of NEON mostly
    @ transparent (and avoiding hazards in the caller code)

#ifdef ENABLE_UNALIGNED_MEM_ACCESSES
    movs r3, r2, lsl #29
    ldrcs r3, [r1], #4
    strcs r3, [ip], #4
    ldrcs r3, [r1], #4
    strcs r3, [ip], #4
    ldrmi r3, [r1], #4
    strmi r3, [ip], #4
    movs r2, r2, lsl #31
    ldrcsh r3, [r1], #2
    strcsh r3, [ip], #2
    ldrmib r3, [r1], #1
    strmib r3, [ip], #1
#else
    movs r3, r2, lsl #29
    bcc 1f
    .rept 8
    ldrcsb r3, [r1], #1
    strcsb r3, [ip], #1
    .endr
1:
    bpl 1f
    .rept 4
    ldrmib r3, [r1], #1
    strmib r3, [ip], #1
    .endr
1:
    movs r2, r2, lsl #31
    ldrcsb r3, [r1], #1
    strcsb r3, [ip], #1
    ldrcsb r3, [r1], #1
    strcsb r3, [ip], #1
    ldrmib r3, [r1], #1
    strmib r3, [ip], #1
#endif
    bx lr
    .fnend
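
The trailing part of `memcpy_neon` (label `4:`) decides what to copy from the low four bits of the remaining byte count. `movs r3, r2, lsl #29` shifts bit 3 of the count into the carry flag and bit 2 into the sign flag, so the `cs` instructions copy 8 bytes and the `mi` instructions copy 4; the second `movs` with `lsl #31` does the same for bits 1 and 0. A plain C rendition of that step might look like the sketch below. `copy_tail` is a hypothetical name and the function is only meant to illustrate the bit tests, not to replace the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the final (count & 15) bytes, testing one bit of the count at a
 * time in the same order as the assembly: 8, 4, 2, then 1 byte. */
static void copy_tail(unsigned char *dst, const unsigned char *src,
      size_t count)
{
   assert(dst != NULL && src != NULL);

   if (count & 8) /* carry flag after "lsl #29" */
   {
      memcpy(dst, src, 8);
      dst += 8;
      src += 8;
   }
   if (count & 4) /* sign flag after "lsl #29" */
   {
      memcpy(dst, src, 4);
      dst += 4;
      src += 4;
   }
   if (count & 2) /* carry flag after "lsl #31" */
   {
      memcpy(dst, src, 2);
      dst += 2;
      src += 2;
   }
   if (count & 1) /* sign flag after "lsl #31" */
      *dst = *src;
}
```

Only the low four bits matter here, which is why the assembly can reuse `r2` even after the main loop has driven it negative by subtracting multiples of 32.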
@@ -85,6 +85,11 @@ if [ "$HAVE_EGL" != "no" ]; then
   fi
fi

if [ "$HAVE_EXYNOS" != "no" ]; then
   check_pkgconf EXYNOS libdrm_exynos
   check_pkgconf DRM libdrm
fi

if [ "$LIBRETRO" ]; then
   echo "Explicit libretro used, disabling dynamic libretro loading ..."
   HAVE_DYNAMIC='no'
@@ -276,6 +281,7 @@ add_define_make OS "$OS"

# Creates config.mk and config.h.
add_define_make GLOBAL_CONFIG_DIR "$GLOBAL_CONFIG_DIR"
VARS="RGUI LAKKA ALSA OSS OSS_BSD OSS_LIB AL RSOUND ROAR JACK COREAUDIO PULSE SDL OPENGL LIMA OMAP GLES GLES3 VG EGL KMS GBM DRM DYLIB GETOPT_LONG THREADS CG LIBXML2 ZLIB DYNAMIC FFMPEG AVCODEC AVFORMAT AVUTIL SWSCALE FREETYPE XKBCOMMON XVIDEO X11 XEXT XF86VM XINERAMA MALI_FBDEV NETPLAY NETWORK_CMD STDIN_CMD COMMAND SOCKET_LEGACY FBO STRL STRCASESTR MMAP PYTHON FFMPEG_ALLOC_CONTEXT3 FFMPEG_AVCODEC_OPEN2 FFMPEG_AVIO_OPEN FFMPEG_AVFORMAT_WRITE_HEADER FFMPEG_AVFORMAT_NEW_STREAM FFMPEG_AVCODEC_ENCODE_AUDIO2 FFMPEG_AVCODEC_ENCODE_VIDEO2 BSV_MOVIE VIDEOCORE NEON FLOATHARD FLOATSOFTFP UDEV V4L2 AV_CHANNEL_LAYOUT"
VARS="RGUI LAKKA ALSA OSS OSS_BSD OSS_LIB AL RSOUND ROAR JACK COREAUDIO PULSE SDL OPENGL LIMA OMAP GLES GLES3 VG EGL KMS EXYNOS GBM DRM DYLIB GETOPT_LONG THREADS CG LIBXML2 ZLIB DYNAMIC FFMPEG AVCODEC AVFORMAT AVUTIL SWSCALE FREETYPE XKBCOMMON XVIDEO X11 XEXT XF86VM XINERAMA MALI_FBDEV NETPLAY NETWORK_CMD STDIN_CMD COMMAND SOCKET_LEGACY FBO STRL STRCASESTR MMAP PYTHON FFMPEG_ALLOC_CONTEXT3 FFMPEG_AVCODEC_OPEN2 FFMPEG_AVIO_OPEN FFMPEG_AVFORMAT_WRITE_HEADER FFMPEG_AVFORMAT_NEW_STREAM FFMPEG_AVCODEC_ENCODE_AUDIO2 FFMPEG_AVCODEC_ENCODE_VIDEO2 BSV_MOVIE VIDEOCORE NEON FLOATHARD FLOATSOFTFP UDEV V4L2 AV_CHANNEL_LAYOUT"
>>>>>>> Add exynos video driver
create_config_make config.mk $VARS
create_config_header config.h $VARS
@@ -20,6 +20,7 @@ HAVE_LIMA=no # Enable Lima video support
HAVE_OMAP=no # Enable OMAP video support
HAVE_XINERAMA=auto # Disable Xinerama support.
HAVE_KMS=auto # Enable KMS context support
HAVE_EXYNOS=no # Enable Exynos video support
HAVE_EGL=auto # Enable EGL context support
HAVE_VG=auto # Enable OpenVG support
HAVE_CG=auto # Enable Cg shader support
@@ -109,6 +109,8 @@ const char *config_get_default_video(void)
         return "null";
      case VIDEO_OMAP:
         return "omap";
      case VIDEO_EXYNOS:
         return "exynos";
      default:
         return NULL;
   }