Support forum of the software localization tool Sisulizer



Get in contact with the makers of Sisulizer.
Our forum is open for all questions around Sisulizer from customers and prospects.
Don't hesitate to register and ask. The Sisulizer team will answer ASAP.


 Moderated by: Renate.Reinartz, Markus.Kreisel, Jaakko.Salmenius, Ilkka.Salmenius
Entities always encoded in XML? - Bugs and Quirks in Sisulizer - Technical Support
 Posted: Mon Nov 10th, 2008 06:18 am
starr
Member
 

Joined: Sat Jan 6th, 2007
Location:  
Posts: 86
According to the help, "When creating localized XML file Sisulizer always encodes & and <. Encoding of other predefined entities depends on their usage in the original file. If the original file has encoded them then the localized files created by Sisulizer use the same encoding." However, the actual result is that Sisulizer ALWAYS encodes ALL common entities, such as " and '. So is this a bug or by design? I think it would be better to preserve the original format "as is".

You can take the attached file for inspection.

Attachment: English.zip (Downloaded 2 times)
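For comparison, the XML 1.0 specification requires & and < to be escaped in character data, but a literal " or ' only has to be escaped inside an attribute value delimited by that same quote character. The following minimal Python sketch uses the standard library's xml.sax.saxutils helpers to show the difference; it only illustrates the XML rules and is not how Sisulizer itself is implemented.

```python
from xml.sax.saxutils import escape, quoteattr

text = 'Say "hi" & it\'s <ok>'

# escape() replaces &, < and > only; quotes and apostrophes stay literal,
# which is valid in element content.
element_text = escape(text)

# quoteattr() returns a complete attribute value, delimiters included.
# The text contains both " and ', so it wraps the value in " and has to
# encode the inner double quotes as &quot;.
attr_value = quoteattr(text)

print(element_text)  # Say "hi" &amp; it's &lt;ok&gt;
print(attr_value)    # "Say &quot;hi&quot; &amp; it's &lt;ok&gt;"
```

If the text contained only double quotes, quoteattr() would instead switch to ' as the delimiter and leave the quotes literal, which is another valid way to satisfy the same rule.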


 Posted: Mon Nov 10th, 2008 01:17 pm
Janusz Grzybek
Super Moderator


Joined: Fri Dec 1st, 2006
Location: Zabrze, Poland
Posts: 561
Hello,

Sisulizer doesn't necessarily convert all " or ' characters to numerical entities in every XML file. It depends on the XML structure of the original file: Sisulizer changes these characters to numerical entities when leaving them as they are could violate XML syntax. Below is a screenshot of an output FeedDemon language file opened in a text editor. I typed the text “Sample” into two example cells of the Sisulizer translation sheet, and:
- Quotation marks in the cdata element aren't encoded (indicated by red arrows on the screenshot)
- Quotation marks in the control name attribute are encoded (indicated by blue arrows on the screenshot). In this case the “ characters must be changed, because otherwise the XML would contain a syntax error. You can test this easily: type the same “Sample” text into this string in the original file with a text editor, then re-scan the project. Sisulizer probably won't finish the scan, because the parser will find an error.
BTW: The latest original English.fdlang2 file contains a syntax error on line 1641 (a stray < character), but I think you found it too. I'd like to send information about it to Nick.
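The point about attributes (and the stray < on line 1641) can be checked with any conforming XML parser. A minimal Python sketch using the standard xml.etree.ElementTree module; the control element and its name attribute are hypothetical snippets modelled on the test above, not copied from the FeedDemon file:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets modelled on the test described above.
samples = {
    "quotes in element text": '<control name="x">Type "Sample" here</control>',
    "quotes in attribute": '<control name="Type "Sample" here"/>',
    "encoded quotes in attribute": '<control name="Type &quot;Sample&quot; here"/>',
    "stray < in text": '<control>a < b</control>',
}

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

results = {name: is_well_formed(xml) for name, xml in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'OK' if ok else 'syntax error'}")
```

Only the unencoded attribute and the stray < fail to parse, which matches the behaviour described for the re-scan test.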

Best regards,
Janusz

Attachment: screen.png (Downloaded 91 times)



____________________
http://www.sisulizer.com - Three simple steps to localize

 Posted: Mon Nov 10th, 2008 01:21 pm
Janusz Grzybek
Super Moderator


Below is a screenshot of another XML file output by Sisulizer (from the same test).

Attachment: screen1.png (Downloaded 91 times)




 Posted: Tue Nov 11th, 2008 01:49 am
starr
Member
 

Glad to see that you also use FeedDemon. But regarding the issue I reported, I want to add that the original language file may not be fully valid. Take a look at line 1181. The original phrase reads
Unsubscribe from "%s?"
while Sisulizer's target converts it to
Unsubscribe from &quot;%s?&quot; (I used full-width characters here to prevent the forum's automatic conversion)
This is where the problem actually lies: the dialog texts are not enclosed in CDATA, so Sisulizer encodes them. I think it should keep the original format here too, although I believe it won't cause any real compatibility problem either.
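The remark that this should not cause compatibility problems can be illustrated with a parser: CDATA, literal quotes and &quot; all yield the same character data. A minimal Python sketch with xml.etree.ElementTree; the dialog element name is made up for illustration:

```python
import xml.etree.ElementTree as ET

# The same dialog string written three ways (element name is made up).
variants = [
    '<dialog><![CDATA[Unsubscribe from "%s?"]]></dialog>',
    '<dialog>Unsubscribe from "%s?"</dialog>',
    '<dialog>Unsubscribe from &quot;%s?&quot;</dialog>',
]

texts = [ET.fromstring(v).text for v in variants]
# A conforming parser hands the application identical character data for
# all three, so the spelling mostly matters to people reading the file.
print(texts)
```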

Last edited on Tue Nov 11th, 2008 05:12 am by starr


 Posted: Sun Nov 16th, 2008 05:43 am
starr
Member
 

Is there any update on this?


 Posted: Sun Nov 16th, 2008 08:11 am
Janusz Grzybek
Super Moderator


Hello Starr,

I'll ask our developers about implementing the following encoding/decoding method for " and ':
- when the source contains literal " or ' characters instead of entities, Sisulizer will keep these characters, as long as this doesn't violate XML syntax
- for entities in the source, for quotation marks/apostrophes that don't exist in the source, and for quotation marks/apostrophes that could violate XML syntax, the current encoding/decoding method should be kept

Last week our team was very busy at the Tech-Ed EMEA conference in Barcelona, and we have a long queue of other requests, so a potential change to the encoding/decoding of special characters could take some time, but I'll let you know what our R&D team answers.

Best regards,
Janusz




 Posted: Tue Dec 2nd, 2008 05:35 am
starr
Member
 

So, dear Janusz Grzybek, how is this topic going? I updated to build 277 and the issue persists. The quotation marks enclosed in CDATA are still encoded in the target file even though those in the original weren't.


 Posted: Tue Dec 2nd, 2008 05:39 am
Jaakko.Salmenius
Administrator


Joined: Sat Apr 8th, 2006
Location: Tokyo, Japan
Posts: 1641
I am still working on this.

Jaakko




 Posted: Fri Dec 5th, 2008 12:35 am
Jaakko.Salmenius
Administrator


I see your point. Currently Sisulizer uses encoded characters if the original file uses that encoding anywhere. Let's look at some examples.

1) If we have an XML file that does not use entity encoding:

<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <plain>This is a "sample" too</plain>
</sample>


then Sisulizer won't use entities when writing.

2) If we have an XML file that does use entity encoding, then Sisulizer always uses it:

<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <plain>This is a "sample" too</plain>
  <plain>This is a &quot;sample&quot; too</plain>
</sample>


So the output would be:

<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <plain>This is a &quot;sample&quot; too</plain>
  <plain>This is a &quot;sample&quot; too</plain>
</sample>


This logic is OK: if the file uses entity encoding in one place, that gives Sisulizer the right to use it in other places too.

However, the current Sisulizer does not distinguish between attribute and element data. A " must always be encoded inside an attribute value delimited by ", because " is the attribute delimiter. So consider the following code, where the attribute uses encoding but the element does not:

<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <plain data="Really &quot;black&quot; slope">This is a sample</plain>
  <plain>This is a "sample" too</plain>
</sample>


Currently Sisulizer creates

<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <plain data="Really &quot;black&quot; slope">This is a sample</plain>
  <plain>This is a &quot;sample&quot; too</plain>
</sample>


I will fix this so that Sisulizer handles the encoding of attribute and element data separately.
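The difference between the current file-wide check and a check that treats attributes and element data separately can be sketched in a few lines of Python. This is a minimal sketch, assuming a crude regular expression in place of a real XML tokenizer; element_text_uses_quot and encode are hypothetical helpers, not Sisulizer's code.

```python
import re
from xml.sax.saxutils import escape

# Crude pattern for illustration only (not a real XML tokenizer):
# character data between a '>' and the next '<'.
ELEMENT_TEXT = re.compile(r'>([^<]*)<')

def element_text_uses_quot(original_xml):
    """True if &quot; appears in element text (attribute values ignored)."""
    return any("&quot;" in t for t in ELEMENT_TEXT.findall(original_xml))

def encode(translation, context, element_uses_quot):
    """Escape a translated string for 'attribute' or 'element' context."""
    text = escape(translation)  # &, < and > as usual
    # In a "-delimited attribute value the quote must always be encoded;
    # in element text, follow what the original element data did.
    if context == "attribute" or element_uses_quot:
        text = text.replace('"', "&quot;")
    return text

original = '''<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <plain data="Really &quot;black&quot; slope">This is a sample</plain>
  <plain>This is a "sample" too</plain>
</sample>'''

file_wide = "&quot;" in original             # current, file-wide check
per_context = element_text_uses_quot(original)
print(file_wide, per_context)                # True False
print(encode('This is a "sample" too', "element", file_wide))
print(encode('This is a "sample" too', "element", per_context))
print(encode('Really "black" slope', "attribute", per_context))
```

With the file-wide flag the element text gets &quot; (the output shown above), while the per-context flag keeps the literal quotes in element data and still encodes them in the attribute.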

Thank you for pointing this out.

Jaakko




 Posted: Fri Dec 5th, 2008 12:28 pm
starr
Member
 

I'm glad you found the problem, and thanks in advance for the upcoming fix.


 Posted: Mon Dec 8th, 2008 06:51 am
Jaakko.Salmenius
Administrator


Fixed in build 278: Sisulizer now always uses the same encoding as the original item. Build 278 will be released this week.

Jaakko




 Posted: Mon Dec 8th, 2008 06:56 am
starr
Member
 

Thanks for the fix. Let's wait and see the magic. :)


 Posted: Mon Dec 8th, 2008 07:03 am
Jaakko.Salmenius
Administrator


One note: if the original item uses two or more encodings, the localized item uses only one. Example:

<data>This is "new" and &quot;bold&quot; sample</data>

The text is

This is "new" and "bold" sample

and it uses two encodings: plain (") and named (&quot;). In this case SL uses only the named one in localized items. Finnish sample:

<data>Tämä &quot;uusi&quot; ja &quot;rohkea&quot; esimerkki</data>
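The per-item rule from build 278, including the mixed case, can be sketched as follows. This is a minimal Python sketch under the assumption that the item's raw source text is available; encode_like_item is a hypothetical helper that covers element text only, not Sisulizer's actual implementation.

```python
from xml.sax.saxutils import escape

def encode_like_item(original_raw, translation):
    """Hypothetical per-item rule: reuse the original item's spelling of ".

    original_raw is the item's text exactly as written in the source file.
    If it contains &quot; at all (alone or mixed with literal "), every
    quote in the translation is written as &quot;; otherwise literal
    quotes are kept. Element text only; attributes are not handled here.
    """
    text = escape(translation)  # &, < and > as usual
    if "&quot;" in original_raw:
        text = text.replace('"', "&quot;")
    return text

mixed = 'This is "new" and &quot;bold&quot; sample'
plain = 'This is a "sample" too'

print(encode_like_item(mixed, 'Tämä "uusi" ja "rohkea" esimerkki'))
print(encode_like_item(plain, 'Tämä on myös "esimerkki"'))
```

The first call reproduces the Finnish sample above; the second keeps literal quotes because that item never used &quot;.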

Jaakko


