Fix encoding issues + IDNA support #735

Closed
MuxZeroNet wants to merge 5 commits into HelloZeroNet:master from MuxZeroNet:master

@MuxZeroNet MuxZeroNet commented Jan 4, 2017

Please test it on your own operating system and on your own locale.

I fixed many encoding issues in ZeroNet. The `os.path` module sometimes returns a unicode string and sometimes returns a byte string encoded with the file system encoding. To fix that inconsistency, I wrote a `self.fixFsEncoding` method.
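The kind of normalization described above could look roughly like the following. This is only a sketch, not the code from this pull request: it is written for Python 3 (the thread itself concerns Python 2), and `fix_fs_encoding` is a hypothetical stand-in name for `self.fixFsEncoding`.

```python
import os


def fix_fs_encoding(path):
    """Return `path` as text, whatever type the caller received.

    Hypothetical helper: byte strings are decoded with the file system
    encoding, text strings are passed through unchanged.
    """
    if isinstance(path, bytes):
        # os.fsdecode uses the file system encoding and error handler.
        return os.fsdecode(path)
    return path


# Both spellings of the same path end up as the same text string.
print(fix_fs_encoding(b"data/site"))
print(fix_fs_encoding("data/site"))
```

Funnelling every path through one helper like this means the rest of the code only ever has to deal with one string type.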

I also added support for IDNA Namecoin domains. Check out some Namecoin domains starting with `xn--`:
```
Emoji domains from ZeroNet domain registry:
🌔.bit
🌕.bit
🌝.bit
🌌.bit
ɥsıɯɐɥ.bit
κσ.bit
⛏.bit
```
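For readers unfamiliar with the `xn--` prefix: it marks a label stored in its ASCII (Punycode) form. A minimal sketch of converting between the two forms with Python's built-in `idna` codec follows. It assumes Python 3, and the helper names `to_ascii_domain` and `to_unicode_domain` are made up for illustration; they are not part of ZeroNet.

```python
def to_ascii_domain(domain):
    """Encode a Unicode domain name to its ASCII `xn--` form."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(domain):
    """Decode an ASCII `xn--` domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# One of the domains listed above, round-tripped through both forms.
ascii_form = to_ascii_domain("κσ.bit")
print(ascii_form)
print(to_unicode_domain(ascii_form))
```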
@MuxZeroNet MuxZeroNet changed the title Fix Spelling Fix Spelling + encoding issues + IDNA support Jan 4, 2017
Comment thread src/Ui/UiRequest.py
self.log.debug("-- path repr: " + repr(path))
path = path.decode('utf-8')
self.log.debug("-- path decoded repr: " + repr(path))

If you are annoyed by Python 2's encoding issues, try wrapping strings in `repr(...)` when writing them to the debug log.
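A small illustration of why this helps, as a sketch only. It is written for Python 3, where `ascii()` gives the escaped form that Python 2's `repr()` gave for unicode strings; the logger name and the sample path are assumptions chosen for the example.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("UiRequest")  # hypothetical logger name

# The same path as raw bytes and as decoded text.
raw_path = "/caf\u00e9/index.html".encode("utf-8")
text_path = raw_path.decode("utf-8")

# Escaped forms are pure ASCII, so logging them cannot raise an
# encoding error, and they show exactly which type and bytes you have.
raw_repr = repr(raw_path)      # b'/caf\xc3\xa9/index.html'
text_repr = ascii(text_path)   # '/caf\xe9/index.html'

log.debug("-- path repr: %s", raw_repr)
log.debug("-- path decoded repr: %s", text_repr)
```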

@MuxZeroNet MuxZeroNet changed the title Fix Spelling + encoding issues + IDNA support Fix encoding issues + IDNA support Jan 5, 2017

MuxZeroNet commented Jan 5, 2017

This solves issues #131 and #298 but opens the door to homograph attacks. It may be a good idea to show a warning that displays the encoded domain name, the domain's public key, and whether known homographs are found in the domain name.

Lists of homographs:
https://github.com/reinderien/mimic (The super evil)
https://github.com/adam-lynch/olc (Similar super evil)
https://github.com/MattOates/Text--Homoglyph/blob/master/lib/Text/Homoglyph.pm6
https://github.com/codebox/homoglyph
https://github.com/SoftwareAddictionShow/IDN-homograph-attack
https://en.wikipedia.org/wiki/IDN_homograph_attack (a few on Wikipedia page)
http://www.unicode.org/Public/security/latest/confusables.txt (Unicode confusables)
http://www.unicode.org/reports/tr36/ (Unicode security considerations)
https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/blns.json (Big list of naughty strings)
https://krebsonsecurity.com/2011/09/right-to-left-override-aids-email-attacks/ (Control characters and Right-to-Left override attack)
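A very rough sketch of what such a warning could be built from, assuming Python 3 and its built-in `idna` codec. The function name `homograph_report` is hypothetical, and flagging every non-ASCII character is a deliberately crude stand-in for a real lookup against a confusables list such as the ones above. The public key is left out because it would have to come from the domain registry.

```python
import unicodedata


def homograph_report(domain):
    """Return (ascii_form, suspects) for a Unicode domain name.

    `suspects` lists every non-ASCII character with its code point and
    Unicode name, so a user can see what is really in the name.
    """
    ascii_form = domain.encode("idna").decode("ascii")
    suspects = [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in domain
        if ord(ch) > 127
    ]
    return ascii_form, suspects


# Illustrative name: the second letter is a Cyrillic "a" (U+0430).
ascii_form, suspects = homograph_report("p\u0430ypal.bit")
print("Encoded domain:", ascii_form)
for ch, codepoint, name in suspects:
    print("Non-ASCII character:", codepoint, name)
```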

@MuxZeroNet

I decided not to make ZeroNet support IDNA domain names simply because it will make browsing IDNA sites more inconvenient.

Suppose someone registered a fancy domain name, ☉net.bit (the Sun symbol), and a bad guy registered the domain name ⊙net.bit (the circled dot operator) for phishing.

>>> u'☉' == u'⊙'
False
>>> print repr(u'☉'),repr(u'⊙')
u'\u2609' u'\u2299'

Most fonts are not optimized for differentiating these homographs, so we would want ZeroNet to display them in Punycode form.

>>> u'⊙net.bit'.encode('idna'), u'☉net.bit'.encode('idna')
('xn--net-vr2a.bit', 'xn--net-gn5a.bit')

To avoid landing on the phishing site, one has to memorize the full Punycode form of the innocent site's domain name. At that point, one may as well use public keys to access these sites instead.

I will create a separate pull request only to fix encoding issues.

HelloZeroNet commented Jan 9, 2017

Yeah, I'm also uncertain about UTF-8 domain names: no one can tell the difference between Оnet, ⵔnet, Onet, Οnet or Onet (each uses a different "O" character)

or for example: http://secret.ɢoogle.com

@MuxZeroNet

Closed. IDNA domains will not be supported due to security concerns. Discussion about string encoding bugs has been moved here: #765
