StringArray::addTokens() bug!

Randy
Joined: 21 Jun 2005 - 21:28

Look at these examples:

StringArray tokens;

tokens.addTokens(T("two words"), false);
// tokens.size() is 2 (correct!)

tokens.clear();
tokens.addTokens(T("five words with trailing space "), false);
// tokens.size() is 6 (incorrect!)

tokens.clear();
tokens.addTokens(T("six words with two trailing spaces  "), false);
// tokens.size() is 8 (incorrect!)

tokens.clear();
tokens.addTokens(T(" five words with leading space"), false);
// tokens.size() is 6 (incorrect!)

tokens.clear();
tokens.addTokens(T("seven words with three   spaces in middle"), false);
// tokens.size() is 9 (incorrect!)

I don't think we want leading, trailing, and multiple spaces included when tokenizing, do we?

It's easily handled with a call to StringArray::removeEmptyStrings(true) when tokenizing whitespace, but not as easily remedied when tokenizing with other break characters.
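For comparison, here is a minimal standalone sketch of the whitespace behaviour described above, where runs of spaces never yield empty tokens. It uses only the C++ standard library; `splitOnWhitespace` is a hypothetical helper made up for illustration, not part of JUCE and not how addTokens() is implemented.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a JUCE function): splits on runs of whitespace.
// Stream extraction skips leading whitespace before each word, so leading,
// trailing and repeated spaces never produce empty tokens.
std::vector<std::string> splitOnWhitespace(const std::string& text)
{
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string word;

    while (stream >> word)
        tokens.push_back(word);

    return tokens;
}
```

With this sketch, the five test strings above give 2, 5, 6, 5 and 7 tokens respectively. It only works for whitespace, though, which is exactly the limitation noted above for other break characters.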

OvermindDL1
Joined: 3 Jun 2005 - 11:58

I'd say just use Boost.Spirit to parse it; that would be worlds easier.

Actually, Spirit is probably overkill. boost::tokenizer is made for *exactly* that kind of thing, though, and it was put through its paces years ago, so there are no known bugs. It would let you parse tokens based on whitespace or any other separator, and would work quite well inside addTokens() in the StringArray class.

jules
Joined: 29 Apr 2013 - 18:37
Re: StringArray::addTokens() bug!

Randy wrote:
I don't think we want leading, trailing, and multiple spaces included when tokenizing, do we?

It's easily handled with a call to StringArray::removeEmptyStrings(true) when tokenizing whitespace, but not as easily remedied when tokenizing with other break characters.

That's usually true for whitespace, but if I were tokenising with other separators, e.g.

"a,b,,c,,"

then I'd be quite annoyed if it didn't return 6 items, three of them empty. And of course removeEmptyStrings would clean this up too, if you just want the non-empty tokens.
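To make the comma case concrete, here is a rough standalone sketch in plain C++. Both `splitKeepingEmpties` and `withoutEmpties` are hypothetical names invented for this illustration; they only mimic the behaviour being discussed and are not the JUCE implementation of addTokens() or removeEmptyStrings().

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits on a single separator character and keeps
// every field, including empty ones between or after separators.
std::vector<std::string> splitKeepingEmpties(const std::string& text, char separator)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    for (;;)
    {
        const auto end = text.find(separator, start);

        if (end == std::string::npos)
        {
            // Last field: everything after the final separator (may be empty).
            tokens.push_back(text.substr(start));
            break;
        }

        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }

    return tokens;
}

// Hypothetical helper: returns a copy of the tokens with empty strings removed.
std::vector<std::string> withoutEmpties(std::vector<std::string> tokens)
{
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& s) { return s.empty(); }),
                 tokens.end());
    return tokens;
}
```

Here `splitKeepingEmpties("a,b,,c,,", ',')` yields the 6 items "a", "b", "", "c", "", "", and passing that result through `withoutEmpties` leaves just "a", "b", "c". Keeping the empties by default means the caller can still tell which fields were blank, and dropping them afterwards is a one-line step.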

Randy
Re: StringArray::addTokens() bug!

A very good point, and one I'd completely overlooked, as I'm only dealing with whitespace! I've just thrown a bunch of removeEmptyStrings() calls into my parsing routines to clean it up.