A modified UTF-8 transformation format of ISO 10646 for storage optimization

Article code	Journal code	Publication year	English article
453462	694863	2006	10-page PDF

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications

Abstract

The ISO 10646 Universal Character Set (UCS) covers symbols in most of the world's written languages. There are various UCS transformation formats (UTF), but UTF-8 is the most important one because of its compatibility with both software systems and communication systems that assume 8-bit characters. First, three properties that a UTF-8-like transformation format should satisfy are defined to preserve the main characteristics of UTF-8. Then, a 5-byte sequence with 31 free bits is derived and used to construct a UTF-8-like transformation format that is capable of resolving the dummy byte sequences locally. After that, we show that if the last-byte patterns of the 3-byte and 4-byte sequences in the UTF-8-like transformation format are replaced with the byte pattern 1xxxxxxx, two more free bits can be gained for the 3-byte and 4-byte sequences. The final version of the derived UTF-8-like transformation format, UTF-8M, is proved to have the minimal average storage for encoding a UCS-4 character, 16.3% less than what UTF-8 requires.
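For orientation, the baseline that the abstract compares against can be sketched in a few lines. The Python below is a minimal illustration of the original UTF-8 layout for 31-bit UCS-4 values (one to six bytes), not of UTF-8M, whose byte patterns are not given in the abstract. The names `utf8_length` and `utf8_encode_ucs4` are hypothetical and chosen only for this example.

```python
# Largest code point that fits in a sequence of 1..6 bytes under the
# original UTF-8 definition, which covers the full 31-bit UCS-4 range.
_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def utf8_length(cp: int) -> int:
    """Return the number of bytes baseline UTF-8 needs for code point cp."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for n, limit in enumerate(_LIMITS, start=1):
        if cp <= limit:
            return n
    raise ValueError("code point exceeds the 31-bit UCS-4 range")


def utf8_encode_ucs4(cp: int) -> bytes:
    """Encode a 31-bit UCS-4 value with the baseline UTF-8 byte patterns."""
    n = utf8_length(cp)
    if n == 1:
        return bytes([cp])  # 0xxxxxxx
    # Each trailing byte carries 6 payload bits under the pattern 10xxxxxx.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Lead byte: n one-bits, a zero bit, then the remaining payload bits.
    lead_prefix = (0xFF << (8 - n)) & 0xFF
    return bytes([lead_prefix | cp] + tail[::-1])
```

For code points up to U+10FFFF (excluding surrogates) this agrees with Python's built-in `str.encode("utf-8")`. The 5-byte and 6-byte forms only arise for larger UCS-4 values. Any average-storage figure depends on how characters are distributed over these length classes, which this sketch does not model.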

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Standards & Interfaces - Volume 28, Issue 6, September 2006, Pages 650–659

Authors

Cheng-Huang Tung, Ming-Chi Lee
