|
|||
| Moderated by: Renate.Reinartz, Markus.Kreisel, Jaakko.Salmenius, Ilkka.Salmenius |
|
|||||||||||||
| Entities always encoded in XML? - Bugs and Quirks in Sisulizer - Technical Support (You need to be registered at the forum to write) - Localization Tool for VB, Delphi, .NET, C#, VB.NET, XML, Online Help, HTML ... | ||||||||||||||
| Author | Post | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
||||||||||||||
|
starr Member
|
According to the help, "When creating localized XML file Sisulizer always encodes & and <. Encoding of other predefined entities depends on their usage in the original file. If the original file has encoded them then the localized files created by Sisulizer use the same encoding." In practice, however, Sisulizer always encodes all the common entities, such as " and '. Is this a bug or by design? I think it would be better to preserve the original format as is. The attached file can be used for inspection. Attachment: English.zip
|
|||||||||||||
| ||||||||||||||
| ||||||||||||||
|
Janusz Grzybek Super Moderator
|
Hello, Sisulizer does not convert every " or ' character to an entity in every XML file. It depends on the XML structure of the original file: Sisulizer replaces these characters with entities only where leaving them as they are could violate XML syntax. Below is a screenshot of an output FeedDemon language file opened in a text editor. I typed the text "Sample" (with quotation marks) into two example cells of the Sisulizer translation sheet, and:
- Quotation marks in the CDATA element are not encoded (indicated by red arrows on the screenshot).
- Quotation marks in the control name attribute are encoded (indicated by blue arrows on the screenshot). In this case the " characters have to be changed, because otherwise the XML would contain a syntax error.
You can test this easily: type the same "Sample" text into the same string of the original file in a text editor, then re-scan the project. Sisulizer will probably not finish the scan, because the parser will find an error. BTW: the latest original English.fdlang2 file contains a syntax error in line 1641 (a standalone < character), but I think you have found it too. I would like to send information about it to Nick. Best regards, Janusz Attachment: screen.png
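The difference between the two cases can be reproduced outside Sisulizer. The following is only a minimal sketch using Python's standard library, not Sisulizer's own code; the element names `item` and `control` and the sample text are made up for illustration. It checks that a raw " is harmless inside CDATA but breaks a double-quoted attribute unless it is written as &quot;.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


text = 'Say "Sample"'  # hypothetical translation containing quotation marks

# Inside CDATA the quotation marks can stay as they are.
cdata_xml = f"<item><![CDATA[{text}]]></item>"

# Inside a double-quoted attribute the raw " ends the value too early.
raw_attr_xml = f'<control name="{text}"/>'

# escape() handles &, < and >; the extra mapping also encodes ".
encoded_attr_xml = f'<control name="{escape(text, {chr(34): "&quot;"})}"/>'

print(is_well_formed(cdata_xml))         # True
print(is_well_formed(raw_attr_xml))      # False
print(is_well_formed(encoded_attr_xml))  # True
```

After parsing, the attribute value of the encoded variant is again the original text with plain quotation marks, so the encoding is only a matter of how the file is written.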
____________________ http://www.sisulizer.com - Three simple steps to localize |
|||||||||||||
| ||||||||||||||
|
||||||||||||||
|
Janusz Grzybek Super Moderator
|
Below is a screenshot from another Sisulizer output XML file (same test). Attachment: screen1.png
|||||||||||||
| ||||||||||||||
| ||||||||||||||
|
starr Member
|
I'm glad to see that you also use FeedDemon. As for the issue I reported, I want to add that the original language file may not be fully valid. Take a look at line 1181. The original phrase reads Unsubscribe from "%s?" while Sisulizer's target file converts it to Unsubscribe from &quot;%s?&quot; (I use full-width characters here to prevent the forum's automatic conversion). This is where the problem actually lies: the dialog texts are not enclosed in CDATA, so Sisulizer encodes them. I think it should keep the original format here as well, although I believe the encoding will practically not cause any compatibility problem either. Last edited on Tue Nov 11th, 2008 05:12 am by starr |
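The compatibility point can be checked with any XML parser. This is a small sketch with Python's standard library, assuming a made-up `dialog` element around the phrase from line 1181; it shows that the plain and the encoded spelling parse to the same text.

```python
import xml.etree.ElementTree as ET

# Same phrase, once with plain quotation marks and once with &quot;.
plain = ET.fromstring('<dialog>Unsubscribe from "%s?"</dialog>')
encoded = ET.fromstring('<dialog>Unsubscribe from &quot;%s?&quot;</dialog>')

# A conforming parser decodes the entity, so both texts are identical.
print(plain.text == encoded.text)
```

The difference therefore only matters to tools or people reading the raw file, which is why keeping the original format is a cosmetic request rather than a correctness one.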
|||||||||||||
| ||||||||||||||
|
||||||||||||||
|
starr Member
|
Is there any update on this?
|
|||||||||||||
| ||||||||||||||
| ||||||||||||||
|
Janusz Grzybek Super Moderator
|
Hello Starr, I'll ask our developers about implementing the following encoding/decoding method for " and ':
- When the source contains " or ' characters instead of entities, Sisulizer will keep these characters, as long as this doesn't violate XML syntax.
- For entities in the source, for quotation marks/apostrophes that do not exist in the source, and for quotation marks/apostrophes that could violate XML syntax, the current method of encoding/decoding should be kept.
Last week our team was very busy at the Tech-Ed EMEA conference in Barcelona, and we have a long queue of other requests, so a potential change to the encoding/decoding of special characters could take some time, but I will let you know the answer from our R&D. Best regards, Janusz
|||||||||||||
| ||||||||||||||
|
||||||||||||||
|
starr Member
|
Dear Janusz Grzybek, how is this topic going? I updated to build 277 and the issue persists. The quotation marks enclosed in CDATA were still encoded in the target file even though those in the original weren't.
|
|||||||||||||
| ||||||||||||||
| ||||||||||||||
|
Jaakko.Salmenius Administrator
|
I am still working on this. Jaakko
|||||||||||||
| ||||||||||||||
|
||||||||||||||
|
Jaakko.Salmenius Administrator
|
I got your point. The current Sisulizer uses encoded characters if the original file uses that encoding anywhere. Let's look at some examples.
1) If we have an XML file that does not use the encoding
<?xml version="1.0" encoding="UTF-8"?>
<sample>
<plain>This is a "sample" too</plain>
</sample>
then Sisulizer won't use it when writing.
2) If we have an XML file that uses the encoding, then Sisulizer always uses it:
<?xml version="1.0" encoding="UTF-8"?>
<sample>
<plain>This is a "sample" too</plain>
<plain>This is a &quot;sample&quot; too</plain>
</sample>
So this would become
<?xml version="1.0" encoding="UTF-8"?>
<sample>
<plain>This is a &quot;sample&quot; too</plain>
<plain>This is a &quot;sample&quot; too</plain>
</sample>
This logic is OK: if you use the encoding in one place, it gives Sisulizer the right to use it in other places too. However, the current Sisulizer does not make any distinction between attribute and element data. " must always be encoded inside an attribute, because " is the attribute delimiter. So suppose we have the following code, where the attribute uses the encoding but the element does not:
<?xml version="1.0" encoding="UTF-8"?>
<sample>
<plain data="Really &quot;black&quot; slope">This is a sample</plain>
<plain>This is a "sample" too</plain>
</sample>
Currently Sisulizer creates
<?xml version="1.0" encoding="UTF-8"?>
<sample>
<plain data="Really &quot;black&quot; slope">This is a sample</plain>
<plain>This is a &quot;sample&quot; too</plain>
</sample>
I will fix this so that Sisulizer handles the encoding of attribute and element data separately. Thank you for pointing this out. Jaakko
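Handling attribute and element data separately could look roughly like the sketch below. It is an illustration in Python, not Sisulizer's implementation: the function names are invented, the regular expressions are a deliberate simplification (no CDATA, no comments, no > inside attribute values, double-quoted attributes only), and the translated strings are made up.

```python
import re

TAG = re.compile(r"<[^>]*>")            # simplification: any <...> run is a tag
ATTR_VALUE = re.compile(r'="([^"]*)"')  # double-quoted attribute values only


def quot_usage(raw_xml: str) -> dict:
    """Report where the raw source spells the quotation mark as &quot;."""
    tags = TAG.findall(raw_xml)
    in_attributes = any("&quot;" in value
                        for tag in tags
                        for value in ATTR_VALUE.findall(tag))
    # Whatever is left after removing the tags is element text.
    in_elements = "&quot;" in TAG.sub("", raw_xml)
    return {"attribute": in_attributes, "element": in_elements}


def encode_element_text(text: str, use_quot: bool) -> str:
    """Always encode & and <; encode " only if the source elements did."""
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    return text.replace('"', "&quot;") if use_quot else text


def encode_attribute_value(value: str) -> str:
    """Inside a double-quoted attribute, " must always be encoded."""
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    return value.replace('"', "&quot;")


source = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<sample>'
    '<plain data="Really &quot;black&quot; slope">This is a sample</plain>'
    '<plain>This is a "sample" too</plain>'
    '</sample>'
)

usage = quot_usage(source)
element_out = encode_element_text('Another "sample" text', usage["element"])
attribute_out = encode_attribute_value('Really "dark" slope')

print(usage)
print(element_out)
print(attribute_out)
```

With this split, the &quot; in the attribute no longer forces the encoding onto element text, which is the behaviour the third example above is missing.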
|||||||||||||
| ||||||||||||||
| ||||||||||||||
|
starr Member
|
I'm glad that you found the problem, and thanks in advance for the upcoming fix.
|
|||||||||||||
| ||||||||||||||
|
||||||||||||||
|
Jaakko.Salmenius Administrator
|
Fixed in 278. Sisulizer now always uses the same encoding as the original item. Build 278 will come out this week. Jaakko
|||||||||||||
| ||||||||||||||
| ||||||||||||||
|
starr Member
|
Thanks for the fix. Let's wait and see the magic.
|
|||||||||||||
| ||||||||||||||
|
||||||||||||||
|
Jaakko.Salmenius Administrator
|
One note: if the original item uses two or more encodings, the localized item uses only one. Example: <data>This is "new" and &quot;bold&quot; sample</data> The text is This is "new" and "bold" sample, and it uses two encodings: plain (") and named (&quot;). In this case SL will use only the named encoding in localized items. Finnish sample: <data>Tämä &quot;uusi&quot; ja &quot;rohkea&quot; esimerkki</data> Jaakko
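The per-item rule from this note can be written down in a few lines. This is a hedged sketch in Python, with an invented function name and under the assumption that "uses the named encoding" simply means the raw original item contains &quot; somewhere; it is not taken from Sisulizer itself.

```python
def localized_item_text(original_raw: str, translation: str) -> str:
    """Encode a translation following the encoding of its original item.

    original_raw is the item's text as written in the source file,
    translation is the plain translated text.
    """
    out = translation.replace("&", "&amp;").replace("<", "&lt;")
    # Mixed originals (plain " and &quot;) fall back to the named entity.
    if "&quot;" in original_raw:
        out = out.replace('"', "&quot;")
    return out


mixed = localized_item_text(
    'This is "new" and &quot;bold&quot; sample',
    'Tämä "uusi" ja "rohkea" esimerkki',
)
plain_only = localized_item_text(
    'This is "new" and "bold" sample',
    'Tämä "uusi" ja "rohkea" esimerkki',
)

print(mixed)
print(plain_only)
```

Choosing the named entity for mixed items is the safe direction, because an encoded quotation mark is valid everywhere a plain one is.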
|||||||||||||
| ||||||||||||||