Two problems with String::formatted (one a buffer overflow!)

TomSwirly

I know that Jules doesn't like String::formatted - but I have an internationalized(*) application which has a lot of strings like: "Unable to open file %s with error code %s" and there really isn't another way to do it.

First, there's a buffer overflow. The buffer used is 256 bytes long - if the results of String::formatted are greater than 256 bytes, it simply writes into "random memory". This 256 character limitation isn't documented, and as you know, buffer overflows are very dangerous...

So far, I haven't hit this limitation - I found it while debugging the next problem - but since I use String::formatted to make fairly long error messages that include full file paths it is a matter of certainty that if enough people use my program, someone will get an error with a file with a very long path and I'll have something completely unpredictable happen.
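For illustration, here is a minimal sketch of how a printf-style helper can avoid a fixed-size buffer by asking `vsnprintf` for the required length first. It uses only the standard library; `formatToString` is a hypothetical name and is not part of JUCE, and nothing here describes how JUCE implements or should implement String::formatted.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a JUCE function): printf-style formatting into a
// std::string that is sized to fit the output, instead of a fixed 256-byte buffer.
std::string formatToString(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);

    // First pass: with a null buffer of size 0, vsnprintf writes nothing and
    // returns the length the output needs (excluding the terminating '\0').
    va_list argsCopy;
    va_copy(argsCopy, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, argsCopy);
    va_end(argsCopy);

    std::string result;
    if (needed > 0)
    {
        // Second pass: format into a buffer of exactly the required size.
        result.resize(static_cast<std::size_t>(needed) + 1);
        const int written = std::vsnprintf(&result[0], result.size(), fmt, args);
        assert(written == needed);
        (void) written;
        result.resize(static_cast<std::size_t>(needed)); // drop the trailing '\0'
    }

    va_end(args);
    return result;
}
```

With this approach a very long file path just produces a longer string; there is no length the caller has to guess in advance.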

My second issue is that String::formatted seemed to work fine on the Mac to print strings, but I got non-UTF-8 ("garbage") characters generated on the PC.

After some debugging, I narrowed it down to the fact that Juce's String::formatted only seems to accept wide characters on the PC, and only narrow characters on the Mac!

The following code works fine, but requires different actual calls on Mac and on PC...

// Format string containing a single %s placeholder.
String MAIL_SUBJECT("Support Request: %s");
String title = "Some title here";

// The same text as a narrow (UTF-8) pointer and as a wide pointer.
const char* narrow = title.toUTF8().getAddress();
const wchar_t* wide = title.toWideCharPointer();

// Windows needs the wide pointer; the Mac needs the narrow one.
String res =
#if JUCE_WINDOWS
    String::formatted(MAIL_SUBJECT, wide);
#else
    String::formatted(MAIL_SUBJECT, narrow);
#endif

The code sample works fine on both Mac and PC. If I change the condition to !JUCE_WINDOWS, it works incorrectly on both Mac and PC, so I can't use just one of the two alternatives...

This doesn't seem right. Is there a better way to do this, or am I missing something obvious?

(* - admittedly, we haven't prepared a translation yet, but it's all set up to do so...)

jules

Damn, I hate that function.

TomSwirly

:-)

I knew that. But there really isn't an alternative for internationalized applications. You need SOME sort of templated output function - it's not just that word order changes from language to language, but if you try to split your messages up into tiny substrings and then put them together, it's almost impossible for the translator to do it correctly.

The only alternative is to use some full-scale templating language like Clearsilver - but that's a really heavy hammer for a problem that printf and its numerous inbred cousins do quite well.

I don't know how to fix the wide/narrow character problem, but the simple solution to the buffer overflow is to create another version where the first argument is a length, and then deprecate the original one.

Confess - you love C++, with all its warts. This is one of them - you should learn to love its expediency and its history, and accept its gnarliness.

jfitzpat

Really, formatted (like sprintf) kind of sucks for localization too.

Sure, you can take "%d nuns dancing on the head of %d pins" and translate it to some other western languages, but the order of the items inserted is fixed. That makes for really convoluted grammar in some languages depending on the subject. You generally want something like tokens: "%1 nuns dancing on the head of %2 pins". That way if the language is most natural with something like "On pins of %2, %1 nuns gyrate...", you can do it.

juce::String already has all the members you'd need to put together a pretty spiffy localized parameter string class.

TomSwirly

Yes, very good point. You can't be sure that the order of terms is the same. I'm planning to have six languages - English, French, German, Indonesian, Spanish, Italian - because I know the first five fairly well and the last one is dead easy. In these languages the order of nouns is basically the same, but Japanese would be very important and I'm fairly sure that its noun order can be different.

Hmm... makes me think here a bit. In particular, you only need to have the equivalent of %s - because for these messages, numbers are rare, and because you can just pre-format them as strings. So all you really need is %1, %2, %3 and nothing else.
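A minimal sketch of such a string-only positional formatter, written against the standard library rather than juce::String so that it stands alone. The name `formatPositional`, the `%%` escape, and the choice to leave unknown tokens untouched are all assumptions made for this illustration, not anything from JUCE.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces %1..%9 in `pattern` with args[0]..args[8].
// "%%" yields a literal '%'. Tokens with no matching argument are left as-is.
// Substituted text is not rescanned, so arguments may safely contain '%'.
std::string formatPositional(const std::string& pattern,
                             const std::vector<std::string>& args)
{
    std::string out;
    out.reserve(pattern.size());

    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size())
        {
            const char next = pattern[i + 1];
            if (next == '%')
            {
                out += '%';
                ++i; // consume the second '%'
                continue;
            }
            if (next >= '1' && next <= '9')
            {
                const std::size_t index = static_cast<std::size_t>(next - '1');
                if (index < args.size())
                {
                    out += args[index];
                    ++i; // consume the digit
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}
```

Because each token names its argument, a translator can reorder them freely: with the arguments {"12", "3"}, both "%1 nuns dancing on the head of %2 pins" and "On pins of %2, %1 nuns gyrate" pick up the right values.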

Didn't I send Jules something like this about a year ago?! But I can't find it.

I might whip something up this afternoon...

(* - that is, me)

TomSwirly

Well, interesting - I ran into a limitation of juce::String that's preventing me from doing a really good job on this.

The issue is that there's no efficient way to set a single character in a string! operator[] returns a const juce_wchar and there seems to be no setter method - so building a string by adding one character at a time is potentially quadratic in time, even if you have a good maximum bound in advance as to the length of the string.

Such a setter (not necessarily operator[]) should be added to juce::String. For parsing purposes, you often need to do this...

TheVinn

TomSwirly wrote:
Such a setter (not necessarily operator[]) should be added to juce::String. For parsing purposes, you often need to do this...

That's crazy talk... if the underlying juce::String is UTF8 or UTF16 encoded then there is no 1:1 mapping between logical characters and physical positions in the memory block used to store the string. Attempting to discover the physical index of a logical character would run in O(N) where N ~= logical index, and then actually changing the character would be either O(1) or O(N) where N ~= logical index depending on the difference between the original code point and the new code point.
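To make the O(N) lookup concrete, here is a small sketch that finds where the Nth logical character starts in a UTF-8 byte string. It uses std::string as a stand-in for the underlying storage and assumes the input is valid UTF-8; `byteOffsetOfCodePoint` is a hypothetical name, not a JUCE function.

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset at which the code point with logical index
// `charIndex` starts, or utf8.size() if the string has fewer code points.
// Assumes `utf8` holds valid UTF-8; no validation is attempted.
std::size_t byteOffsetOfCodePoint(const std::string& utf8, std::size_t charIndex)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        const bool startsCodePoint =
            (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80;

        if (startsCodePoint)
        {
            if (seen == charIndex)
                return i;
            ++seen;
        }
    }
    return utf8.size();
}
```

The scan has to walk every byte before the target, and overwriting a two-byte character with a one-byte one would additionally shift everything after it.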

However, a manly way to resolve this would be to provide a non-const operator[] ONLY for a juce::String which uses UTF32 (since there is a 1:1 mapping). I believe you can do this yourself by calling toUTF32(), removing the const, and doing the work using UTF32 code points. Then you could convert it back I suppose, and wrap this all up in a nice interface that hides the mess.
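The round trip through UTF-32 could look roughly like the following sketch. It is written with std::string and std::u32string instead of juce::String so it is self-contained; the function names are hypothetical, and the decoder assumes valid UTF-8 and does no error handling.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-32. Assumes valid input; no error handling.
std::u32string utf8ToUtf32(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();)
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if      (lead < 0x80) { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);

        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encodes UTF-32 back into UTF-8.
std::string utf32ToUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in)
    {
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical wrapper that hides the mess: the write itself is O(1) in
// UTF-32, but it is sandwiched between two O(N) conversions.
std::string setCodePoint(const std::string& utf8, std::size_t index, char32_t replacement)
{
    std::u32string wide = utf8ToUtf32(utf8);
    if (index < wide.size())
        wide[index] = replacement;
    return utf32ToUtf8(wide);
}
```

For many edits in a row it would make more sense to convert once, do all the work on the std::u32string, and convert back at the end.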

TomSwirly

> if the underlying juce::String is UTF8 or UTF16 encoded

DOH! I should have realized that on my own, having done an awful lot with UTF-8 (ultra-quibble - note that there's a dash in the official name).

As an aside, I have zero understanding of why anyone would use UTF-16 - it seems to have the worst of all worlds, not being backward compatible with ASCII, not having a predictable character size, and being twice as long as UTF-8 for coding "plain old ASCII" strings.
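A quick way to see the size and variable-width points is to compare storage for the same text with the standard string types. This is only an illustration; the helper names are made up, and it assumes the usual platforms where `char16_t` is two bytes.

```cpp
#include <cstddef>
#include <string>

// Storage in bytes for text held as UTF-8 (one byte per code unit).
std::size_t utf8Bytes(const std::string& s)
{
    return s.size() * sizeof(char);
}

// Storage in bytes for text held as UTF-16 (two bytes per code unit
// on the usual platforms).
std::size_t utf16Bytes(const std::u16string& s)
{
    return s.size() * sizeof(char16_t);
}

// A single character outside the Basic Multilingual Plane (U+1F600).
// In UTF-16 it is stored as a surrogate pair, so size() reports 2 code units.
const std::u16string oneCharOutsideBmp = u"\U0001F600";
```

Plain ASCII such as "hello" costs twice as many bytes in UTF-16 as in UTF-8, and `oneCharOutsideBmp` shows that one UTF-16 code unit is still not always one character.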

TheVinn

TomSwirly wrote:
I have zero understanding of why anyone would use UTF-16

It's a great choice if you want to call the Unicode version of the Win32 API functions.

In fact, it's the only choice.

TomSwirly

:-( Quite so. Yes, I vaguely knew that, but that doesn't mean it's rational!

Perhaps I should have spoken slightly differently and said, "What was going through the minds of the people who invented UTF-16 is beyond me."

jfitzpat

TomSwirly wrote:
but Japanese would be very important and I'm fairly sure that its noun order can be different.

Yes, you can see it for yourself with something like Google Translate. The number of pins would come first in Japanese, and in Hebrew too. German would be something like "10 Nonnen tanzen..." in this case, but I've run into grammatical problems with technical phrases before.
