Split a large text string into variable length lines without breaking words and keeping line breaks and spaces
I am trying to split a long string of text into several shorter lines, where each line can have a different maximum length. For example:
"The quick brown fox jumped over the red fence.
The blue dog dug under the fence."
I would like code that breaks this into shorter lines, where the first line has at most 5 characters, the second at most 11, and every remaining line at most 20, resulting in:
Line 1: The
Line 2: quick brown
Line 3: fox jumped over the
Line 4: red fence.
Line 5: The blue dog
Line 6: dug under the fence.
Is this possible in C# or MSSQL?
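using System;
using System.Collections.Generic;
using System.Text;

public static List<string> SplitString(string text, int[] lengths)
{
    List<string> output = new List<string>();
    List<string> words = Split(text); // words plus single-character whitespace tokens
    int lineNum = 0;
    string s = string.Empty;
    foreach (string word in words)
    {
        // The last entry in lengths is reused for all remaining lines.
        int limit = lengths[Math.Min(lineNum, lengths.Length - 1)];
        // Start a new line on an explicit line break, or when the next token would not fit.
        // A single word longer than the limit is kept whole on its own line.
        if (word == "\n" || (s.Length > 0 && s.Length + word.Length > limit))
        {
            output.Add(s.TrimEnd());
            s = string.Empty;
            lineNum++;
        }
        // Do not start a line with whitespace.
        if (s.Length > 0 || !string.IsNullOrWhiteSpace(word))
            s += word;
    }
    // Add the final, partially filled line.
    if (s.Length > 0)
        output.Add(s.TrimEnd());
    return output;
}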
// Splits text into words, keeping each space, tab and newline as its own token.
public static List<string> Split(string text)
{
    List<string> result = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (var letter in text)
    {
        if (letter != ' ' && letter != '\t' && letter != '\n')
        {
            sb.Append(letter);
        }
        else
        {
            if (sb.Length > 0)
            {
                result.Add(sb.ToString());
            }
            result.Add(letter.ToString());
            sb.Clear();
        }
    }
    // Add the last word if the text does not end with whitespace.
    if (sb.Length > 0)
    {
        result.Add(sb.ToString());
    }
    return result;
}
This code is untested and has not been compiled, but you should get the idea.
I also think that you should be using StringBuilder, but I don't remember how to use it.
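For the StringBuilder remark, here is a minimal sketch of how each line could be accumulated in a StringBuilder instead of by string concatenation. It is a self-contained greedy wrapper, not the code from this answer: the names VariableWidthWrap and Wrap are hypothetical, it assumes words are separated by spaces or tabs, it treats each existing line break as a hard break, and it collapses runs of spaces into one.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class VariableWidthWrap
{
    // Hypothetical helper: widths[i] is the limit for line i;
    // the last entry is reused for all later lines.
    public static List<string> Wrap(string text, int[] widths)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        // Treat each existing line break as a hard break, then wrap within it.
        foreach (string paragraph in text.Split('\n'))
        {
            string[] words = paragraph.Split(
                new[] { ' ', '\t', '\r' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                // lines.Count is the index of the line currently being built.
                int limit = widths[Math.Min(lines.Count, widths.Length - 1)];
                // +1 accounts for the space that would join the word to the line.
                if (current.Length > 0 && current.Length + 1 + word.Length > limit)
                {
                    lines.Add(current.ToString());
                    current.Clear();
                }
                if (current.Length > 0) current.Append(' ');
                current.Append(word);
            }
            // End of a paragraph: flush whatever has been collected.
            lines.Add(current.ToString());
            current.Clear();
        }
        return lines;
    }

    public static void Main()
    {
        string text = "The quick brown fox jumped over the red fence.\n" +
                      "The blue dog dug under the fence.";
        foreach (string line in Wrap(text, new[] { 5, 11, 20 }))
            Console.WriteLine(line);
    }
}
```

Because the wrapping is greedy, the second sentence comes out as "The blue dog dug" and "under the fence." rather than the split shown in the question; both respect the 20-character limit.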
\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z
Up to 5 characters are captured in group 1, up to 11 in group 2, and chunks of up to 20 in group 3. Each chunk ends at a word boundary, so a word is never split in the middle. Spaces, line breaks, etc. count as characters and are preserved.
The trick is to get at the individual captures of a repeated group, which can only be done in .NET and Perl 6:
Regex paragraphs = new Regex(@"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z", RegexOptions.Singleline);
Match matchResults = paragraphs.Match(subjectString);
if (matchResults.Success) {
    String line1 = matchResults.Groups[1].Value;
    String line2 = matchResults.Groups[2].Value;
    // Captures is a CaptureCollection with one entry per repetition of group 3.
    CaptureCollection line3andup = matchResults.Groups[3].Captures;
    // you now need to iterate over line3andup, extracting the lines.
} else {
    // Match attempt failed
}
I don't know C# at all and am trying to build this from RegexBuddy templates and the VB code here, so please feel free to point out coding errors.
Note that spaces at the beginning of line two are captured at the end of the previous match.
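As a rough sketch of the remaining step, the captures of group 3 can be walked with a foreach and trimmed to drop the spaces mentioned above. The names RegexWrapDemo and ExtractLines are hypothetical. One caveat worth checking against your own data: every group must end at \b, and there is no word boundary after a final punctuation mark, so the pattern as written does not match text that ends in a period. The sample input below leaves the period off for that reason.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexWrapDemo
{
    // Hypothetical helper: returns the captured chunks in order,
    // or null if the pattern does not match the whole text.
    public static List<string> ExtractLines(string subject)
    {
        var paragraphs = new Regex(
            @"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z", RegexOptions.Singleline);
        Match m = paragraphs.Match(subject);
        if (!m.Success) return null;

        var lines = new List<string> { m.Groups[1].Value, m.Groups[2].Value };
        // Captures holds one entry per repetition of group 3, in match order.
        foreach (Capture c in m.Groups[3].Captures)
            lines.Add(c.Value);
        return lines;
    }

    public static void Main()
    {
        // No trailing period: see the caveat about \b at the end of the text.
        List<string> lines = ExtractLines("The quick brown fox jumped over the red fence");
        if (lines == null)
        {
            Console.WriteLine("no match");
            return;
        }
        foreach (string line in lines)
            Console.WriteLine("[" + line.Trim() + "]"); // Trim drops the captured spaces
    }
}
```

Since the groups are contiguous and anchored by \A and \Z, concatenating the untrimmed chunks reproduces the input exactly, which is a cheap sanity check before trimming.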