Wednesday, January 17, 2007

Using Regular Expressions to Separate English and Chinese

The original article was written for commands used on Unix, but a comment on it pointed out that the same technique also works in PHP.

------------------------------------------------------------

First, this only works for UTF-8. If the text is encoded in GBK, the second byte of some Chinese characters falls in the ASCII range and so overlaps with English characters. (I have not tested GB2312 yet, but UTF-8 would be the better choice.)
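The overlap can be checked directly. The following is a minimal sketch in Python rather than PHP; the helper name `ascii_trail_bytes` and the ideograph range U+4E00 to U+9FA5 are assumptions chosen for illustration, not something taken from the original article. It encodes each ideograph in a given encoding and reports those whose encoded form contains a byte below 0x80.

```python
import re


def ascii_trail_bytes(encoding):
    """Return CJK ideographs whose encoded bytes include an ASCII-range byte.

    Hypothetical helper: scans U+4E00..U+9FA5 and skips characters the
    encoding cannot represent.
    """
    hits = []
    for code_point in range(0x4E00, 0x9FA6):
        ch = chr(code_point)
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this encoding
        if any(byte < 0x80 for byte in encoded):
            hits.append(ch)
    return hits


gbk_hits = ascii_trail_bytes("gbk")
gb2312_hits = ascii_trail_bytes("gb2312")
utf8_hits = ascii_trail_bytes("utf-8")

# A byte-level "English letter" pattern applied to raw GBK bytes:
# any ideograph listed here would be mistaken for containing a letter.
letters = re.compile(rb"[A-Za-z]")
false_matches = [ch for ch in gbk_hits if letters.search(ch.encode("gbk"))]

print(len(gbk_hits), len(gb2312_hits), len(utf8_hits), len(false_matches))
```

If the counts for GBK are non-zero while the count for UTF-8 is zero, that supports the point above: a byte-oriented pattern for English letters is only safe on UTF-8 input.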

Use [\x00-\xFF] to match. (Whether this matches the Chinese or the English part still needs further testing.)
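One way to run that test is sketched below in Python (not PHP, so the behaviour of a particular PHP setup may differ). The sample string is made up for illustration. The sketch applies the class once to raw UTF-8 bytes and once to the decoded text.

```python
import re

sample = "abc中文"  # hypothetical sample text

# Byte mode: every byte is within 0x00-0xFF, so the class matches
# each byte of the English AND the Chinese part.
byte_matches = re.findall(rb"[\x00-\xFF]", sample.encode("utf-8"))

# Character mode: only code points up to U+00FF match, so the
# Chinese characters are excluded.
char_matches = re.findall(r"[\x00-\xFF]", sample)

# The negated class picks out everything above U+00FF instead.
non_latin = re.findall(r"[^\x00-\xFF]", sample)

print(len(byte_matches), char_matches, non_latin)
```

In this sketch the class separates the two scripts only when the pattern works on characters rather than on bytes.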

Use ^[A-Za-z0-9]+$ to match a string made up entirely of English letters and digits, but only in UTF-8.
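As a rough illustration, the sketch below reads the pattern above as ^[A-Za-z0-9]+$ (an assumption about what was intended) and applies it in Python to already-decoded text. The Chinese range [\u4e00-\u9fff] and the function names `is_english` and `split_runs` are also assumptions added for the example; they do not come from the original article.

```python
import re

# Letters and digits only; fullmatch() plays the role of ^ ... $.
ENGLISH_ONLY = re.compile(r"[A-Za-z0-9]+")

# Alternating runs of English characters or CJK ideographs.
# The ideograph range is an assumed, commonly used block.
RUNS = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]+")


def is_english(text):
    """True if text consists only of English letters and digits."""
    return ENGLISH_ONLY.fullmatch(text) is not None


def split_runs(text):
    """Split text into consecutive English and Chinese runs."""
    return RUNS.findall(text)


print(is_english("abc123"))
print(is_english("abc中文"))
print(split_runs("Hello世界abc123中文"))
```

Anything that is neither an English letter, a digit, nor an ideograph in the assumed range (spaces, punctuation) is simply dropped by `split_runs`.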

P.S.: In PHP, a regular expression that refers to Unicode code points must have a pattern modifier added at the end of every pattern. For example, to match the Unicode character 005c, the pattern must be written as "/\x{005c}/u". The trailing "u" is the modifier.

------------------------------------------------------------

Related Links:

How to differentiate Chinese characters from English characters using regular expressions.
http://www.chedong.com/blog/archives/001261.html
