How to show non-ascii characters in python?

Question

I am using the Python shell like this:

>>> s = 'Ã'
>>> s
'\xc3'

How can I print the variable s so that it shows the character Ã? This is the first and easiest question. In fact, I am getting content from a web page that contains non-ASCII characters like the one above, as well as accented ones like á, é, í, ñ, etc. I am also trying to run a regex containing these characters, built from a template, against the content of the web page.

How can I solve this problem?

This is an example of one of the regexes:

u'<td[^>]*>\s*Definición\s*</td><td class="value"[^>]*>\s*(?P<data>[\w ,-:\.\(\)]+)\s*</td>'

If I test it in the Expresso app, it works great.

EDIT [05/26/2009 04:38 PM]: Sorry about my explanations. I will try to explain better.

I need to get text from a page. I have the URL of this page and a regex to extract the text. My first thought was that the regex was wrong, but I tested it with Expresso and it worked fine: I got the text I wanted. So next I printed the content of the page, and that was when I saw that it did not match what I see in the original source of the web page. The differences are in non-ASCII characters like á, é, í, etc. Now I don't know what to do, or whether the problem lies in the encoding of the page content or in the text of the regex pattern. One of the regexes I have defined is the one above.

So the question would be: is there a problem with using a regex that has non-ASCII characters in its template?

python urllib2

jaloplo May 26 '09 at 13:53

3 answers

How can I print the variable s to show the character Ã?

Use print:

>>> s = 'Ã'
>>> s
'\xc3'
>>> print s
Ã

Jason coon May 26 '09 at 14:06

I would use ord() to find out whether a character is ASCII or special:

if ord(c) > 127:
    pass  # c is a special (non-ASCII) character

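On a byte string in a multibyte encoding like UTF-8, this tests individual bytes rather than characters: a single character such as á spans several bytes, each of them above 127. To test whole characters, you have to convert to Unicode before testing.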
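As a minimal sketch of the difference between testing bytes and testing decoded characters, the snippet below applies the ord() check to text that has already been decoded. The helper name has_non_ascii and the sample bytes are made up for illustration; the code is written for Python 3 and uses explicit b'' and u'' prefixes so that the byte/text distinction stays visible.

```python
from __future__ import print_function


def has_non_ascii(text):
    """Return True if any character of the decoded text lies outside ASCII."""
    return any(ord(ch) > 127 for ch in text)


raw = b'\xc3\xa1'            # the two UTF-8 bytes that encode a-acute (U+00E1)
text = raw.decode('utf-8')   # one Unicode character after decoding

print(len(raw), len(text))   # two bytes, but a single character
print(has_non_ascii(text))
print(has_non_ascii(u'abc'))
```

Running the check on the decoded text means each accented letter is counted once, instead of once per byte.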

If you are getting special characters from a web page, you must know the page's encoding. Then decode the content with it; see the Unicode HOWTO.
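A rough sketch of that decode-then-match flow, applied to the regex from the question, might look like the following. Everything specific here is an assumption: page_bytes is a hypothetical fragment standing in for the bytes a real download would return, UTF-8 is assumed as the page's charset, and the hyphen in the character class has been moved to the end so that it is a literal hyphen rather than the range from "," to ":". Accented letters are written as escapes (\xf3 is ó, \xf1 is ñ) to keep the source file ASCII-only.

```python
import re

# Hypothetical page fragment as raw bytes (assumed to be UTF-8 encoded).
page_bytes = (
    u'<td>Definici\xf3n</td>'
    u'<td class="value">Se\xf1al de prueba</td>'
).encode('utf-8')

# Decode with the page's actual charset before doing any matching.
html = page_bytes.decode('utf-8')

# Unicode pattern; re.UNICODE lets \w match accented letters.
pattern = re.compile(
    u'<td[^>]*>\\s*Definici\xf3n\\s*</td>'
    u'<td class="value"[^>]*>\\s*(?P<data>[\\w ,:.()-]+)\\s*</td>',
    re.UNICODE,
)

match = pattern.search(html)
# data holds the captured cell text as Unicode, or None if nothing matched.
data = match.group('data') if match else None
```

The point of the sketch is that both sides of the match are Unicode: the pattern is a Unicode string and the page content is decoded before searching, so a non-ASCII character in the template is compared character by character rather than byte by byte.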

Edit: I definitely don't know what this question is about ... It might be a good idea to clarify this.

Bastien Léonard May 26 '09 at 14:07

odwl · Accepted Answer · 2009-05-26T15:41:53+0000

Let's say you want to print it as UTF-8. Before Python 3, it's best to encode it explicitly:

print u'Ã'.encode('utf-8')

If you get text from outside, then you need to decode it explicitly (here from 'utf-8'), like this:

f = open(my_file)              # my_file: path to a UTF-8 encoded file
a = f.next().decode('utf-8')   # you have a unicode line in a
print a.encode('utf-8')        # encode again explicitly for output
f.close()
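A variant of the same idea, offered only as a sketch and not as part of the original answer, is to let io.open do the decoding while reading instead of calling .decode() on each line. The file name sample.txt and its contents are made up so the example is self-contained; the code is written for Python 3.

```python
from __future__ import print_function

import io
import os
import tempfile

# Hypothetical sample file, created here only so the sketch can run as-is.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'Definici\xf3n\n')   # \xf3 is the letter o-acute

# io.open decodes while reading, so each line is already Unicode text.
with io.open(path, encoding='utf-8') as f:
    first_line = f.readline().rstrip(u'\n')

encoded = first_line.encode('utf-8')   # back to bytes for explicit output
print(len(first_line), len(encoded))   # characters versus UTF-8 bytes
```

Stating the encoding once at the point where the file is opened keeps all later code working with Unicode text, and encoding happens again only at the output boundary.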
