Fix encoding issues + IDNA support #735
I fixed many encoding issues in ZeroNet. The `os.path` module sometimes returns a unicode string and sometimes returns a byte string encoded with the file system encoding. To fix that inconsistency, I wrote a `self.fixFsEncoding` method. I also added support for IDNA Namecoin domains. Check out some Namecoin domains whose encoded form starts with `xn--`:
```
Emoji domains from ZeroNet domain registry:
🌔.bit
🌕.bit
🌝.bit
🌌.bit
ɥsıɯɐɥ.bit
κσ.bit
⛏.bit
```
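A minimal Python 3 sketch of what a helper like `self.fixFsEncoding` is for: normalizing paths so callers always get text, never a mix of text and file-system-encoded bytes. The names `fix_fs_encoding` and `join_site_path` are hypothetical, and the PR's own method targeted Python 2, so its actual logic may differ; here the standard library's `os.fsdecode` does the decoding.

```python
import os


def fix_fs_encoding(path):
    """Hypothetical helper: return a path as text (str), never as bytes.

    os.fsdecode() decodes bytes with the file system encoding and its
    error handler, and returns str input unchanged.
    """
    return os.fsdecode(path)


def join_site_path(data_dir, *parts):
    """Hypothetical helper: normalize every component before joining."""
    return os.path.join(fix_fs_encoding(data_dir),
                        *(fix_fs_encoding(part) for part in parts))


print(join_site_path(b"data", "site", b"content.json"))
```

Python 3 refuses to mix the two types (`os.path.join(b"data", "site")` raises `TypeError`), so normalizing up front keeps all path handling in a single type.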
```python
self.log.debug("-- path repr: " + repr(path))
path = path.decode('utf-8')
self.log.debug("-- path decoded repr: " + repr(path))
```
If Python 2's encoding issues are getting in your way, try calling `repr(...)` on strings when writing them to the debug log.
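The same debugging trick in Python 3, as an illustration rather than code from the PR: there, `repr()` leaves printable non-ASCII characters as they are, so `ascii()` is the closer equivalent of the Python 2 `repr()` suggested above. Decoding with `latin-1` stands in for any wrong codec guess.

```python
raw = "\u03ba\u03c3.bit".encode("utf-8")  # "κσ.bit" as UTF-8 bytes
good = raw.decode("utf-8")     # correct codec
bad = raw.decode("latin-1")    # wrong codec: one character per byte (mojibake)

# ascii() escapes every non-ASCII character, like Python 2's repr() did,
# so the log line is always printable and a bad decode stands out.
print("-- path decoded: " + ascii(good))     # -- path decoded: '\u03ba\u03c3.bit'
print("-- path mis-decoded: " + ascii(bad))  # -- path mis-decoded: '\xce\xba\xcf\x83.bit'
```

Seeing raw UTF-8 byte values (`\xce\xba...`) where `\u` code points were expected is a quick sign that bytes were decoded with the wrong codec.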
This solves issues #131 and #298, but it is vulnerable to homograph attacks. It may be a good idea to show a warning that displays the encoded (Punycode) domain name, the domain's public key, and whether any known homographs are found in the domain name. Lists of homographs:
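The warning proposed above could gather its three pieces of information roughly as follows. This is a sketch, not ZeroNet code: `domain_warning` is a hypothetical name, the tiny `LOOKALIKES` table is a hypothetical stand-in for a real homograph list, and `public_key` is simply whatever the caller resolved the domain to.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table for illustration only.
# A real check would load a complete homograph/confusables list.
LOOKALIKES = {
    "\u2609": "\u2299",  # SUN looks like CIRCLED DOT OPERATOR
    "\u2299": "\u2609",  # and vice versa
    "\u041e": "O",       # CYRILLIC CAPITAL LETTER O looks like Latin O
    "\u039f": "O",       # GREEK CAPITAL LETTER OMICRON looks like Latin O
}


def domain_warning(domain, public_key):
    """Collect what the proposed warning would show for a Unicode domain."""
    encoded = domain.encode("idna").decode("ascii")  # Punycode (xn--) form
    non_ascii = [(ch, unicodedata.name(ch, "UNKNOWN"))
                 for ch in domain if ord(ch) > 127]
    homographs = [ch for ch in domain if ch in LOOKALIKES]
    return {
        "encoded": encoded,
        "public_key": public_key,  # passed through unchanged
        "non_ascii": non_ascii,
        "homographs": homographs,
    }


# ascii() keeps the printed report ASCII-only on any terminal.
print(ascii(domain_warning("\u2609net.bit", "1PlaceholderAddress")))
```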
I decided not to make ZeroNet support IDNA domain names, simply because it would make browsing IDNA sites less convenient. Suppose someone registered a fancy domain name:
```python
>>> u'☉' == u'⊙'
False
>>> print repr(u'☉'),repr(u'⊙')
u'\u2609' u'\u2299'
```
Most fonts are not designed to tell these homographs apart, so we would want ZeroNet to display them in Punycode form:
```python
>>> u'⊙net.bit'.encode('idna'), u'☉net.bit'.encode('idna')
('xn--net-vr2a.bit', 'xn--net-gn5a.bit')
```
To avoid landing on a phishing site, one would have to memorize the full Punycode form of the legitimate site's domain name. At that point, people would rather use public keys to access these sites. I will create a separate pull request that only fixes the encoding issues.
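The "always display Punycode" rule discussed above can be sketched with Python 3's built-in `idna` codec, which implements IDNA 2003. `display_domain` and `original_domain` are hypothetical helper names; ZeroNet did not adopt IDNA support. The two look-alike names map to different ASCII forms, which is what makes them distinguishable and also what users would have to memorize.

```python
def display_domain(domain):
    """Return the ASCII (Punycode) form of a domain for display.

    Pure-ASCII domains come back unchanged; a label containing non-ASCII
    characters is shown in its xn-- form, so look-alikes stay distinct.
    """
    return domain.encode("idna").decode("ascii")


def original_domain(ascii_domain):
    """Decode an xn-- domain back to Unicode (e.g. for a tooltip)."""
    return ascii_domain.encode("ascii").decode("idna")


for name in ("\u2299net.bit", "\u2609net.bit", "example.bit"):
    shown = display_domain(name)
    # ascii() keeps the Unicode side printable on any terminal.
    print(shown, "<-", ascii(original_domain(shown)))
```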
Yeah, I'm also uncertain about UTF-8 domain names: no one can tell the difference between Оnet, ⵔnet, Onet, Οnet or Onet (each uses a different "O"-like character), or, for example, http://secret.ɢoogle.com
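Look-alike names like these can only be told apart by inspecting their code points. This Python 3 sketch assumes the variants are Cyrillic О (U+041E), Tifinagh ⵔ (U+2D54), Latin O and Greek Ο (U+039F), and that the URL uses U+0262 (LATIN LETTER SMALL CAPITAL G); the comment itself does not name the code points.

```python
import unicodedata

# Assumed code points for the look-alike names above (not stated in the thread).
variants = ["\u041enet", "\u2d54net", "Onet", "\u039fnet"]

for name in variants:
    first = name[0]
    print(ascii(name), "U+%04X" % ord(first), unicodedata.name(first, "UNKNOWN"))

# Same trick inside a longer host name (assumed to be U+0262).
spoof = "secret.\u0262oogle.com"
print(spoof.encode("idna"))  # the xn-- label exposes the non-ASCII character
```

These strings can look nearly identical on screen, yet they compare unequal and encode to different ASCII forms.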
Closed. IDNA domains will not be supported due to security concerns. Discussion of the string encoding bugs has moved to #765.
Please test it on your own operating system and with your own locale.