Convert plain text list to HTML
I have a plain text list:
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
And I would like to turn this into:
<ul>
  <li>I am the first top-level list-item
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here
    <ul>
      <li>His son</li>
      <li>His daughter
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>
How can I do this?
This code works as expected, but the item titles are printed on a new line.
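require "rubygems"
require "builder"

# Number of leading whitespace characters on a line (0 for nil or empty input).
def get_indent(line)
  line.to_s =~ /(\s*)(.*)/
  $1.size
end

# Consumes `lines` (an array of strings) and writes nested <ul>/<li> markup.
def create_list(lines, list_indent = -1,
                b = Builder::XmlMarkup.new(:indent => 2, :target => $stdout))
  while not lines.empty?
    line_indent = get_indent lines.first
    if line_indent == list_indent
      # Same level: emit an item, then any deeper lines as its nested list.
      b.li {
        b.text! lines.shift.strip + $/
        child_indent = get_indent(lines.first)
        if child_indent > line_indent
          # Open the nested <ul> here so the recursive call returns at the
          # next sibling instead of consuming it inside this <li>.
          b.ul {
            create_list(lines, child_indent, b)
          }
        end
      }
    elsif line_indent < list_indent
      # Dedent: this line belongs to an enclosing list.
      break
    else
      # Deeper than the current list: open a new <ul> at that level.
      b.ul {
        create_list(lines, line_indent, b)
      }
    end
  end
end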
I've never used Ruby, but the usual algorithm remains the same:
1. Create a data structure like this:
   Node => (Text => string, Children => array of nodes)
2. Read a line.
3. Check if the indent is greater than the current indent.
   - If so, add the line to the children of the current Node and call the method recursively with that Node as the active one. Continue with 2.
4. Check if the indent matches the current indent.
   - If so, add the line to the active Node. Continue with 2.
5. Check if the indent is less than the current indent.
   - If so, return from the method.
6. Repeat until EOF.
For output:
1. Print <ul>.
2. Take the first node and print <li> followed by node.Text.
3. If there are child nodes (count of node.Children > 0), recurse to 1.
4. Print </li>.
5. Take the next node and continue from 2.
6. Print </ul>.
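For what it's worth, the steps above could be sketched in Ruby roughly as follows. This is only an illustration under some assumptions: the names Node, indent_of, parse_outline and render_list are made up for this example, the parser uses an explicit stack in place of the recursive call described above, and CGI.escapeHTML from the standard library is used to escape the item text.

```ruby
require 'cgi'

# Hypothetical node type: the text of an item plus its child nodes.
Node = Struct.new(:text, :children)

# Number of leading whitespace characters on a line.
def indent_of(line)
  line[/\A\s*/].size
end

# Builds a tree from indented text. The stack holds [indent, node] pairs
# for the chain of currently open items; the root sits at indent -1.
def parse_outline(text)
  root = Node.new(nil, [])
  stack = [[-1, root]]
  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?
    indent = indent_of(line)
    # Close every open item that is not a parent of this line.
    stack.pop while stack.last[0] >= indent
    node = Node.new(line.strip, [])
    stack.last[1].children << node
    stack.push([indent, node])
  end
  root
end

# Renders a list of nodes as nested <ul>/<li> markup, with each nested
# <ul> placed inside the <li> of its parent item.
def render_list(nodes, depth = 0)
  pad = "  " * depth
  html = "#{pad}<ul>\n"
  nodes.each do |node|
    html << "#{pad}  <li>#{CGI.escapeHTML(node.text)}"
    unless node.children.empty?
      html << "\n" << render_list(node.children, depth + 2) << "#{pad}  "
    end
    html << "</li>\n"
  end
  html << "#{pad}</ul>\n"
end

if __FILE__ == $PROGRAM_NAME
  puts render_list(parse_outline("First\n  Child\nSecond\n").children)
end
```

Keeping the tree separate from the rendering means the same Node structure could be fed to a different output step if needed.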
Convert the input to Haml, then render it as HTML:

require 'haml'

def text_to_html(input)
  indent = -1
  # Prefix every line with "%li", and open a "%ul" whenever the indent grows.
  haml = input.gsub(/^( *)/) do |match|
    line_indent = $1.length
    repl = line_indent > indent ? "#{$1}%ul\n" : ''
    indent = line_indent
    # Two extra spaces place each %li one Haml level below its %ul.
    repl << "  #{$1}%li "
  end
  # Haml::Engine is available in Haml 5 and earlier; newer versions may need a different entry point.
  Haml::Engine.new(haml).render
end

puts text_to_html(<<END)
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
END
This produces:

<ul>
  <li>I am the first top-level list item</li>
  <ul>
    <li>I am his son</li>
    <li>Me too</li>
  </ul>
  <li>Second one here</li>
  <ul>
    <li>His son</li>
    <li>His daughter</li>
    <ul>
      <li>I am the son of the one above</li>
      <li>Me too because of the indentation</li>
    </ul>
    <li>Another one</li>
  </ul>
</ul>
An old topic, but it looks like I found a way to make Glenn Jackman's code produce valid HTML (avoiding a <ul> with a direct child <ul>).
I am using tab-indented lines.
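require 'haml'

class String
  # Converts a tab-indented outline into nested HTML lists via Haml.
  def text2htmllist
    tabs = -1      # indentation (in tabs) of the previous line
    topUL = true   # true until the outermost %ul has been emitted
    addme = ''     # extra tabs needed for the wrapper %li/%ul levels
    haml = self.gsub(/^([\t]*)/) do |match|
      line_tabs = match.length
      if ( line_tabs > tabs )
        if topUL
          repl = "#{match}#{addme}%ul\n"
          topUL = false
        else
          # Wrap the nested %ul in its own %li so it is not a direct child of a %ul.
          repl = "#{match}#{addme}%li\n"
          addme += "\t"
          repl += "#{match}#{addme}%ul\n"
        end
      else
        repl = ''
        addme = addme.gsub(/^[\t]/,'') if ( line_tabs < tabs ) #remove one \t
      end
      tabs = line_tabs
      repl << "\t#{match}#{addme}%li "
    end
    puts haml   # print the intermediate Haml for inspection
    Haml::Engine.new(haml).render
  end
end #String class

# The nested lines in this heredoc are indented with tab characters.
str = <<FIM
I am the first top-level list item
	I am his son
	Me too
Second one here
	His son
	His daughter
		I am the son of the one above
		Me too because of the indentation
	Another one
FIM

puts str.text2htmllist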
Outputs:
%ul
	%li I am the first top-level list item
	%li
		%ul
			%li I am his son
			%li Me too
	%li Second one here
	%li
		%ul
			%li His son
			%li His daughter
			%li
				%ul
					%li I am the son of the one above
					%li Me too because of the indentation
			%li Another one

<ul>
  <li>I am the first top-level list item</li>
  <li>
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here</li>
  <li>
    <ul>
      <li>His son</li>
      <li>His daughter</li>
      <li>
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>
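Since this version expects tab-indented lines, space-indented input like the one in the question would need converting first. A minimal sketch, assuming a fixed indent width of two spaces (the helper name spaces_to_tabs and its width parameter are made up for this example):

```ruby
# Hypothetical helper: replaces each run of leading spaces with one tab
# per `width` spaces. Leftover spaces (an indent that is not a multiple
# of `width`) are dropped by the integer division.
def spaces_to_tabs(text, width = 2)
  text.gsub(/^ +/) { |lead| "\t" * (lead.size / width) }
end
```

The result could then be passed on, for example as spaces_to_tabs(input).text2htmllist.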
Perhaps you could do this with a few simple find-and-replace passes. Editors like TextWrangler on Mac, Notepad++ on Windows, and possibly gedit on Linux (I'm not sure how well its find handles tricky patterns) can search for newlines and replace them with other text. Start with the top-level items (lines with no spaces in front) and work your way down. You may have to experiment a little to get the result you want. If you needed to do this on a regular basis, you could probably write a small script, but I doubt that is the case.