Regex tool: regexr.com
GBK (GB2312/GB18030)
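\x00-\xff full single-byte range; in a Unicode regex, [^\x00-\xff] is the common shortcut for double-byte (e.g. Chinese) characters
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312: both bytes fall in 0xA1-0xFE)
\x80-\xff Chinese (GBK: lead byte 0x81-0xFE)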
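The byte ranges above only make sense against raw encoded bytes, so this minimal Python sketch assumes the text is first encoded to GBK or GB2312 `bytes` and matched with a bytes pattern. The helper names `strip_gbk_chinese`, `is_gb2312_double_byte` and `count_wide` are hypothetical. The last helper works on a decoded `str` instead, where `[^\x00-\xff]` means "any code point above U+00FF".

```python
import re

# Bytes patterns see raw encoded bytes, so \xhh refers to byte values.
GBK_HIGH_BYTES = re.compile(rb'[\x80-\xff]')
# A GB2312 double-byte character: exactly two bytes, each in 0xA1-0xFE.
GB2312_CHAR = re.compile(rb'[\xa1-\xfe]{2}')


def strip_gbk_chinese(text: str) -> str:
    """Hypothetical helper: encode to GBK and delete every byte >= 0x80.

    Caveat: GBK trail bytes may also be 0x40-0x7E, so characters outside
    GB2312 can leave a stray ASCII byte behind; GB2312 characters are safe.
    """
    raw = text.encode('gbk')
    return GBK_HIGH_BYTES.sub(b'', raw).decode('gbk')


def is_gb2312_double_byte(char: str) -> bool:
    """Hypothetical helper: True if char encodes to two GB2312 bytes in 0xA1-0xFE."""
    try:
        raw = char.encode('gb2312')
    except UnicodeEncodeError:
        return False
    return GB2312_CHAR.fullmatch(raw) is not None


# On a decoded str, [^\x00-\xff] matches any code point above U+00FF.
def count_wide(text: str) -> int:
    return len(re.findall(r'[^\x00-\xff]', text))


print(strip_gbk_chinese('abc中文123'))  # abc123
print(is_gb2312_double_byte('中'), is_gb2312_double_byte('a'))  # True False
print(count_wide('abc中文123'))  # 2
```

Note that `is_gb2312_double_byte` also accepts GB2312 full-width punctuation (rows 0xA1-0xA9), not only Chinese characters.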
UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\u3130-\u318F (Korean, Hangul Compatibility Jamo)
\uAC00-\uD7A3 (Korean, Hangul Syllables)
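\u3040-\u30FF (Japanese kana: Hiragana and Katakana; kanji share the Chinese range. The often-quoted \u0800-\u4e00 is much broader and also matches many non-Japanese scripts)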
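\uff21-\uff5a full-width Latin letters Ａ-ｚ (as with ASCII A-z, this also includes ［＼］＾＿｀)
\uff01-\uff09 full-width ！＂＃＄％＆＇（）, loosely the shifted 1-9 keys of a US keyboard; \uff02 is a double quote and \uff06 an ampersand, while the Chinese ellipsis …… (Shift+6 in a Chinese IME) lies outside this range
\uff10-\uff19 full-width digits ０-９
\uff20 full-width ＠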
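Hangul syllables (\uAC00-\uD7A3) lie above \u9fa5, outside the Chinese range; the Hangul Jamo blocks (\u1100-\u11FF, \u3130-\u318F) lie below it.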
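A minimal Python sketch that classifies single characters with the Unicode ranges above. The category names and the `classify` helper are hypothetical. The kana class uses Hiragana plus Katakana (\u3040-\u30ff) rather than the broad \u0800-\u4e00. The full-width letter class deliberately skips the six punctuation marks between Ｚ and ａ.

```python
import re

# Character classes taken from the ranges above (Python str patterns).
SCRIPT_PATTERNS = {
    'chinese': re.compile(r'[\u4e00-\u9fa5]'),
    # Hiragana + Katakana only; kanji fall into the Chinese range instead.
    'japanese_kana': re.compile(r'[\u3040-\u30ff]'),
    'korean': re.compile(r'[\u3130-\u318f\uac00-\ud7a3]'),
    # Full-width letters only, skipping the punctuation at U+FF3B-U+FF40.
    'fullwidth_letter': re.compile(r'[\uff21-\uff3a\uff41-\uff5a]'),
    'fullwidth_digit': re.compile(r'[\uff10-\uff19]'),
}


def classify(char: str) -> str:
    """Hypothetical helper: name of the first class matching one character."""
    for name, pattern in SCRIPT_PATTERNS.items():
        if pattern.fullmatch(char):
            return name
    return 'other'


for ch in '中あ한Ａ５@':
    print(ch, classify(ch))

# Hangul syllables sit above the end of the Chinese range.
print(ord('한') > 0x9fa5)
```

Because the classes do not overlap, the order in which `classify` tries them does not change the result.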
Regex examples (PHP):
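$str = preg_replace('/[\x80-\xff]/', '', $str); // GBK: strip the high bytes of Chinese characters
$str = preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str); // UTF-8: PCRE needs \x{...} and the /u modifier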
Languages whose regex syntax uses \u escapes (for example JavaScript, Java and Python) match Chinese with [\u4e00-\u9fa5].
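In Python a `str` pattern accepts `\uXXXX` escapes directly, so the UTF-8 case from the PHP examples can be sketched as below. The sample text is illustrative only.

```python
import re

# Python str patterns understand \uXXXX, so no \x{...} syntax is needed.
CHINESE = re.compile(r'[\u4e00-\u9fa5]')

text = 'Hello 世界, 2024年'
print(CHINESE.sub('', text))            # Hello , 2024
print(''.join(CHINESE.findall(text)))   # 世界年
```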
# Unicode blocks for CJK scripts, as Python re character-class fragments.
# \u takes exactly 4 hex digits; code points above U+FFFF need \U plus 8 hex digits.
CJK_RANGES = [
    r'\u1100-\u11FF',  # Hangul Jamo
    r'\u3040-\u309F',  # Hiragana
    r'\u30A0-\u30FF',  # Katakana
    r'\u3130-\u318F',  # Hangul Compatibility Jamo
    r'\u3400-\u4DBF',  # CJK Unified Ideographs Extension A
    r'\u4E00-\u9FFF',  # CJK Unified Ideographs
    r'\uA960-\uA97F',  # Hangul Jamo Extended-A
    r'\uAC00-\uD7A3',  # Hangul Syllables
    r'\uD7B0-\uD7FF',  # Hangul Jamo Extended-B
    r'\uF900-\uFAFF',  # CJK Compatibility Ideographs
    r'\uFF65-\uFF9F',  # Halfwidth Katakana
    r'\uFFA0-\uFFDC',  # Halfwidth forms of Hangul Compatibility Jamo
    r'\U00020000-\U0002A6DF',  # CJK Unified Ideographs Extension B
    r'\U0002A700-\U0002B73F',  # CJK Unified Ideographs Extension C
    r'\U0002B740-\U0002B81F',  # CJK Unified Ideographs Extension D
    r'\U0002B820-\U0002CEAF',  # CJK Unified Ideographs Extension E
    r'\U0002F800-\U0002FA1F',  # CJK Compatibility Ideographs Supplement
]
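Range fragments like these only become a usable pattern once they sit inside a character class. The sketch below assumes Python 3.3+, where `re` accepts `\U` escapes, and uses an ideograph-only subset; the names `CJK_IDEOGRAPH_RANGES` and `count_ideographs` are hypothetical. It also shows why the supplementary-plane blocks matter: U+20BB7 (𠮷) is in Extension B, so the narrow \u4e00-\u9fa5 class misses it.

```python
import re

# Ideograph blocks only; astral blocks are written with \U and 8 hex digits.
CJK_IDEOGRAPH_RANGES = [
    r'\u3400-\u4DBF',          # CJK Unified Ideographs Extension A
    r'\u4E00-\u9FFF',          # CJK Unified Ideographs
    r'\uF900-\uFAFF',          # CJK Compatibility Ideographs
    r'\U00020000-\U0002A6DF',  # Extension B
    r'\U0002A700-\U0002B73F',  # Extension C
    r'\U0002B740-\U0002B81F',  # Extension D
    r'\U0002B820-\U0002CEAF',  # Extension E
    r'\U0002F800-\U0002FA1F',  # CJK Compatibility Ideographs Supplement
]

# Join all fragments inside one character class.
CJK_IDEOGRAPH = re.compile('[' + ''.join(CJK_IDEOGRAPH_RANGES) + ']')


def count_ideographs(text: str) -> int:
    """Hypothetical helper: number of CJK ideographs in text."""
    return len(CJK_IDEOGRAPH.findall(text))


sample = '吉野家 \U00020BB7野家 abc'
print(count_ideographs(sample))                    # 6
print(len(re.findall(r'[\u4e00-\u9fa5]', sample)))  # 5: misses U+20BB7
```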