Convert plain text list to html

I have a plain text list:

I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one

And I would like to turn this into:

<ul>
  <li>I am the first top-level list-item
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here
    <ul>
      <li>His son</li>
      <li>His daughter
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>

      

How can I do this?

+2



5 answers


This code works as expected, except that the text of each item is printed on its own line.



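require "rubygems"
require "builder"

# Width of the leading whitespace of a line (0 for nil or an empty line).
def get_indent(line)
  line.to_s =~ /(\s*)(.*)/
  $1.size
end

# Consumes `lines` (an array of strings) and writes nested <ul>/<li> markup.
def create_list(lines, list_indent = -1, 
       b = Builder::XmlMarkup.new(:indent => 2, :target => $stdout))
  until lines.empty?
    line_indent = get_indent lines.first

    if line_indent == list_indent
      b.li {
        b.text! lines.shift.strip + $/
        child_indent = get_indent(lines.first)
        if child_indent > line_indent
          # Open the nested list here so that the recursive call returns to
          # this <li> on dedent instead of emitting siblings inside it.
          b.ul {
            create_list(lines, child_indent, b)
          }
        end
      }
    elsif line_indent < list_indent
      # Dedent: this level is finished, hand control back to the caller.
      break
    else
      # Deeper than the current list (or the very first call): open a new list.
      b.ul {
        create_list(lines, line_indent, b)
      }
    end
  end
end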

      

+1



I've never used Ruby, but the usual algorithm is the same in any language:

  1. Create a data structure like this:
     Node => (Text => string, Children => array of nodes)
  2. Read a line.
  3. Check whether its indent is greater than the current indent.
  4. If so, add the line to the children of the current node and call the method recursively with that node as the active one. Continue with 2.
  5. Check whether its indent equals the current indent.
  6. If so, add the line to the active node. Continue with 2.
  7. Check whether its indent is less than the current indent.
  8. If so, return from the method.
  9. Repeat until EOF.


For the output:

1. print <ul>
2. Take the first node, print <li>node.Text
3. If there are child nodes (count of node.Children > 0) recurse to 1.
4. print </li>
5. take next node, continue from 2.
6. print </ul>
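
The two procedures above could be sketched in Ruby roughly as follows. This is only an illustration under some assumptions: `Node`, `parse_list` and `render_list` are made-up names, indentation is done with spaces, the first line is not indented, and the item text is not HTML-escaped.

```ruby
# Node, parse_list and render_list are hypothetical names for this sketch.
Node = Struct.new(:text, :children)

# Consumes lines indented at `indent` or deeper and returns them as Nodes.
# Lines indented deeper than a node become that node's children.
def parse_list(lines, indent = 0)
  nodes = []
  until lines.empty?
    line_indent = lines.first[/\A */].size
    break if line_indent < indent           # dedent: return to the caller

    if line_indent > indent && nodes.any?   # deeper: children of the previous node
      nodes.last.children.concat(parse_list(lines, line_indent))
    else                                    # same level: a new sibling
      nodes << Node.new(lines.shift.strip, [])
    end
  end
  nodes
end

# Renders the nodes as a <ul>, recursing into children inside their <li>.
def render_list(nodes, depth = 0)
  pad = "  " * depth
  out = "#{pad}<ul>\n"
  nodes.each do |node|
    if node.children.empty?
      out << "#{pad}  <li>#{node.text}</li>\n"
    else
      out << "#{pad}  <li>#{node.text}\n"
      out << render_list(node.children, depth + 2)
      out << "#{pad}  </li>\n"
    end
  end
  out << "#{pad}</ul>\n"
end

input = <<~TEXT
  I am the first top-level list item
    I am his son
  Second one here
TEXT
puts render_list(parse_list(input.lines.map(&:chomp)))
```

Note that `parse_list` empties the array it is given, so pass a copy if the lines are needed afterwards.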

      

+5



Convert the input to Haml, then render it as HTML:

require 'haml'

def text_to_html(input)
  indent = -1
  haml = input.gsub(/^( *)/) do |match|
    line_indent = $1.length
    repl = line_indent > indent ? "#{$1}%ul\n" : ''
    indent = line_indent
    repl << "  #{$1}%li "
  end
  Haml::Engine.new(haml).render
end

puts text_to_html(<<END)
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
END

      

which produces:

<ul>
  <li>I am the first top-level list item</li>
  <ul>
    <li>I am his son</li>
    <li>Me too</li>
  </ul>
  <li>Second one here</li>
  <ul>
    <li>His son</li>
    <li>His daughter</li>
    <ul>
      <li>I am the son of the one above</li>
      <li>Me too because of the indentation</li>
    </ul>
    <li>Another one</li>
  </ul>
</ul>

      

+1



Old topic, but it looks like I found a way to make Glenn Jackman's output valid HTML (avoiding a <ul> with a direct child <ul>).
I am using tab-indented lines.

    require 'haml'
    class String
       def text2htmllist
         tabs = -1
         topUL=true
         addme=''

         haml = self.gsub(/^([\t]*)/) do |match|
           line_tabs = match.length

           if ( line_tabs > tabs )
                if topUL
                    repl = "#{match}#{addme}%ul\n"
                    topUL=false
                else
                    repl = "#{match}#{addme}%li\n"
                    addme += "\t"
                    repl += "#{match}#{addme}%ul\n"
                end
           else
              repl = ''
              addme = addme.gsub(/^[\t]/,'') if ( line_tabs < tabs ) #remove one \t 
           end
           tabs = line_tabs
           repl << "\t#{match}#{addme}%li "

         end
         puts haml
         Haml::Engine.new(haml).render
       end
    end #String class

    str = <<FIM
    I am the first top-level list item
        I am his son
        Me too
    Second one here
        His son
        His daughter
            I am the son of the one above
            Me too because of the indentation
        Another one
    FIM

    puts str.text2htmllist

      

Outputs (the intermediate Haml, followed by the rendered HTML):

%ul
    %li I am the first top-level list item
    %li
        %ul
            %li I am his son
            %li Me too
    %li Second one here
    %li
        %ul
            %li His son
            %li His daughter
            %li
                %ul
                    %li I am the son of the one above
                    %li Me too because of the indentation
            %li Another one
<ul>
  <li>I am the first top-level list item</li>
  <li>
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here</li>
  <li>
    <ul>
      <li>His son</li>
      <li>His daughter</li>
      <li>
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>

      

+1



Perhaps you could do this with a few simple find-and-replace operations. Programs like TextWrangler on Mac, Notepad++ on Windows, and possibly gedit on Linux (I am not sure how well its find handles tricky patterns) can search for newlines and replace them with other text. Start with the top-level items (the lines with no leading spaces) and work your way down to the deeper levels. You may have to experiment a little to get the result you want. If you needed to do this on a regular basis you could probably write a small script, but I doubt that is the case.
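
If a small script does turn out to be useful, one possible shape is a single pass over the lines that keeps a stack of the lists currently open. The sketch below is only an assumption of how that might look: `indented_text_to_html` is a made-up name, indentation is assumed to use spaces, the item text is not HTML-escaped, and the output is not pretty-printed.

```ruby
# Hypothetical helper for this sketch; not part of any library.
def indented_text_to_html(text)
  out = []
  stack = []   # indentation widths of the <ul> elements currently open

  text.each_line do |raw|
    next if raw.strip.empty?               # skip blank lines
    indent = raw[/\A */].size

    if stack.empty? || indent > stack.last
      stack.push(indent)                   # first item or deeper level:
      out << "<ul>"                        # open a list inside the current <li>
    else
      out << "</li>"                       # close the previous item
      while indent < stack.last && stack.size > 1
        stack.pop                          # dedent: close the nested list
        out << "</ul>" << "</li>"          # and the item that contained it
      end
    end
    out << "<li>#{raw.strip}"
  end

  # Every open level still has one <li> and one <ul> to close.
  stack.each { out << "</li>" << "</ul>" }
  out.join("\n")
end

puts indented_text_to_html("Second one here\n  His son\n  His daughter\n")
```

Each `<li>` is left open until the next line shows whether a nested list follows, which keeps every nested `<ul>` inside its parent item.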

0

