Fix encoding issues #765

Open
MuxZeroNet wants to merge 4 commits into HelloZeroNet:master from MuxZeroNet:patch-8

MuxZeroNet (Contributor) commented Jan 12, 2017

This pull request solves encoding issues only.

MuxZeroNet (Contributor, Author) commented

This pull request is a little bit outdated. Merging it seems to break UiRequest. However, the decoding error is still there, as of commit f74e939, rev 1861. Though we will not support IDNA domain names and non-ASCII path names, I believe it is better to return a 404 error rather than a 500 error.
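
The 404-instead-of-500 idea above could look roughly like the following. This is a minimal sketch in Python 3, not the code from this pull request: `status_for_path` is a hypothetical helper, and a real handler would go on to look the path up rather than return a fixed success status.

```python
from urllib.parse import unquote_to_bytes


def status_for_path(raw_path):
    """Hypothetical helper: choose an HTTP status line for a request path."""
    try:
        # Percent-decode to bytes, then insist that the result is plain ASCII.
        unquote_to_bytes(raw_path).decode("ascii")
    except UnicodeError:
        # Non-ASCII paths are unsupported, so report "not found"
        # instead of letting the exception surface as a 500 error.
        return "404 Not Found"
    return "200 OK"


print(status_for_path("/Tam%C3%A1s/Kocsis"))
print(status_for_path("/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D/index.html"))
```

Catching the decoding error at the point where the path is first interpreted keeps the rest of the request handling free of non-ASCII input.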

MuxZeroNet (Contributor, Author) commented Feb 6, 2017

General Guidelines

  • The path variable is URL-decoded and UTF-8 encoded.

  • The return values of os.path APIs can be either in Unicode or in file system encoding. Make sure to use Unicode parameters so that the APIs will always return Unicode strings.

  • There cannot be Unicode strings in HTTP headers. The Gevent API self.start_response does not accept Unicode strings either.

  • Use .decode('encoding') to convert a binary string to a Unicode string. Use .encode('encoding') to convert a Unicode string to a binary string. Use repr(a_string) to see if a string is decoded correctly.

  • When testing, you have to make sure both the wrapper and the inner frame work without crashing.
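
The decode/encode guideline above can be illustrated with a short round trip. This sketch assumes Python 3, where byte strings and Unicode strings are separate types; the guidelines themselves were written for Python 2. `decode_request_path` is a hypothetical name, and `ascii()` is used in place of Python 2's `repr()` because it escapes every non-ASCII character.

```python
from urllib.parse import unquote_to_bytes


def decode_request_path(raw_path):
    """Hypothetical helper: turn a percent-encoded path into Unicode text."""
    # URL-decode first; the result is still a UTF-8 encoded byte string.
    path_bytes = unquote_to_bytes(raw_path)
    # .decode() converts the byte string to a Unicode string.
    return path_bytes.decode("utf-8")


text = decode_request_path("/Tam%C3%A1s/Kocsis")
# .encode() converts the Unicode string back to a byte string.
encoded = text.encode("utf-8")

# Escaped output makes it easy to see whether decoding went as intended.
print(ascii(text))
print(ascii(encoded))
```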

Test cases

URLs:

http://127.0.0.1:43110/Tamás/Kocsis
http://127.0.0.1:43110/Tamás.bit/Kocsis/
http://127.0.0.1:43110/Tamás Kocsis/index.html
http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D/fake/path/Tamás/Kocsis.html
http://127.0.0.1:43110/实用工具存档区.html
http://127.0.0.1:43110/实用工具存档区.bit/实用工具存档区.html
http://127.0.0.1:43110/实用工具存档区/ZeroMux.bit/
http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D/?实用工具存档区
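
When these URLs are requested from a script rather than typed into a browser, the non-ASCII parts usually have to be percent-encoded first. A minimal Python 3 sketch follows; `encode_test_path` is a hypothetical helper, and the choice to leave `/` and `?` unescaped is an assumption made for these particular test cases.

```python
from urllib.parse import quote

BASE = "http://127.0.0.1:43110"


def encode_test_path(path):
    """Hypothetical helper: percent-encode a test path as UTF-8."""
    # Keep "/" and "?" literal so the path and query structure survive.
    return quote(path, safe="/?")


for path in ["/Tamás/Kocsis", "/Tamás Kocsis/index.html"]:
    print(BASE + encode_test_path(path))
```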

Path encodings:

Say, put ZeroNet in this directory:

/home/ubuntu/Tamás Kocsis/ZeroNet-master
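
With a directory like this, the guideline about passing Unicode parameters to os.path can be checked directly. The sketch below assumes Python 3, where the return type of `os.path.join` follows the type of its arguments and mixing the two raises `TypeError`; Python 2, which the guidelines target, converts implicitly and is therefore easier to get wrong. The `data` subdirectory name is only an illustration.

```python
import os.path

base_text = "/home/ubuntu/Tamás Kocsis/ZeroNet-master"
base_bytes = base_text.encode("utf-8")

# Unicode in, Unicode out.
joined_text = os.path.join(base_text, "data")
# Bytes in, bytes out.
joined_bytes = os.path.join(base_bytes, b"data")

print(type(joined_text).__name__, type(joined_bytes).__name__)
```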

HelloZeroNet (Owner) commented

Currently only ASCII filenames are supported (non-ASCII files will not be included in content.json), because I think UTF-8 filenames and URLs are unreliable: some browsers, forum engines, etc. URL-encode them, some leave them as they are, and in some places they do not work at all.
