job.answiz.com

How do I grep for all non-ASCII characters?

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following:

grep -e "[\x{00FF}-\x{FFFF}]" file.xml

But this returns every line in the file, regardless of whether the line contains a character in the range specified.

Do I have the syntax wrong or am I doing something else wrong? I've also tried:

egrep "[\x{00FF}-\x{FFFF}]" file.xml

(with both single and double quotes surrounding the pattern).

You can use the command:

grep --color='auto' -P -n "[\x80-\xFF]" file.xml

This will give you the line numbers and highlight the non-ASCII characters in red.
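A minimal sketch of the first command run on a throwaway file. The file name sample.xml and its contents are made up for illustration, and LC_ALL=C is an assumption added here: in the C locale, grep -P compares raw bytes, so [\x80-\xFF] catches every byte of a multibyte UTF-8 character. --color is left out so the output is plain text.

```shell
# Hypothetical sample file: lines 2 and 3 contain non-ASCII characters.
printf '%s\n' '<root>' '  <name>café</name>' '  <price>5 €</price>' '</root>' > sample.xml

# -P: Perl-compatible regex; -n: prefix each matching line with its number.
# LC_ALL=C (an assumption, not part of the original answer) makes \x80-\xFF
# refer to raw bytes, and every byte of a UTF-8 multibyte character is >= 0x80.
LC_ALL=C grep -P -n '[\x80-\xFF]' sample.xml
# Expected output:
# 2:  <name>café</name>
# 3:  <price>5 €</price>
```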

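On some systems, depending on your locale settings, the above will not work. In a UTF-8 locale, for example, -P treats \x80-\xFF as the code points U+0080 to U+00FF, so characters such as € or CJK text are not matched. In that case you can grep for the inverse, i.e. any character outside the ASCII range \x00-\x7F: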

grep --color='auto' -P -n "[^\x00-\x7F]" file.xml
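A short sketch of the inverse pattern, using a made-up file sample2.xml. Because it matches anything outside 0x00 to 0x7F, it also catches characters far above U+00FF, such as CJK text, in both the C and UTF-8 locales. The second command adds -o (print only the matched part), which is an extra not mentioned in the answer.

```shell
# Hypothetical sample: only line 2 contains non-ASCII characters
# (U+65E5 and U+672C, both well above U+00FF).
printf '%s\n' '<root>' '  <title>日本</title>' '  <plain>ascii only</plain>' '</root>' > sample2.xml

# Match any character outside the ASCII range 0x00-0x7F; -n adds line numbers.
grep -P -n '[^\x00-\x7F]' sample2.xml
# Expected output: 2:  <title>日本</title>

# -o prints only the matching text; the trailing + groups each run of
# consecutive non-ASCII characters into a single match.
grep -P -n -o '[^\x00-\x7F]+' sample2.xml
# Expected output: 2:日本
```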

Note also that the important part is the -P flag, which is equivalent to --perl-regexp: it makes grep interpret your pattern as a Perl-compatible regular expression (PCRE). The grep man page also says that

this is highly experimental and grep -P may warn of unimplemented features.
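If your grep build does not support -P at all, one hedged alternative (not from the answer above) is to force the C locale, where each byte is one character, and build a bracket expression covering bytes 0x80 to 0xFF with printf octal escapes (\200 and \377). This sketch assumes a POSIX shell and a grep that treats every byte as a valid character in the C locale, as modern GNU grep does; the file name sample3.xml is hypothetical.

```shell
# Hypothetical sample file: line 1 contains the non-ASCII letters ü and ß.
printf '%s\n' '<a>Grüße</a>' '<b>plain</b>' > sample3.xml

# printf turns the octal escapes \200 and \377 into the raw bytes 0x80 and 0xFF,
# so the pattern becomes a bracket expression covering every non-ASCII byte.
pattern=$(printf '[\200-\377]')

# In the C locale the range is compared byte by byte; no -P needed.
LC_ALL=C grep -n "$pattern" sample3.xml
# Expected output: 1:<a>Grüße</a>
```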
