How do I convert an MS Word 2003 document to HTML in C#?

I would like to convert the content of an MS Word 2003 document to HTML in C#.

Any ideas?



2 answers


I think this is the easiest way to do it:

http://asptutorials.net/C-SHARP/convert-ms-word-docs-to-html/

The article uses Word's SaveAs method (http://msdn.microsoft.com/en-us/library/aa220734.aspx) to save the document in HTML format.

Like this:



    using System.IO;
    using Word = Microsoft.Office.Interop.Word;

    // Assumes wordApplication is a running Word.Application with the .doc already open.
    // Path.ChangeExtension swaps only the final extension (string.Replace would also hit "my.docs.doc").
    string newfilename = Path.Combine(folder_to_save_in,
        Path.ChangeExtension(FileUpload1.FileName, ".html"));
    object o_nullobject = System.Reflection.Missing.Value;
    object o_newfilename = newfilename;
    object o_format = Word.WdSaveFormat.wdFormatHTML;
    object o_encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
    object o_endings = Word.WdLineEndingType.wdCRLF;
    // SaveAs takes 16 optional parameters; pass Missing.Value for the ones we skip.
    // Encoding is the 12th parameter and LineEnding the 15th.
    wordApplication.ActiveDocument.SaveAs(ref o_newfilename, ref o_format, ref o_nullobject,
        ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
        ref o_nullobject, ref o_nullobject, ref o_encoding, ref o_nullobject,
        ref o_nullobject, ref o_endings, ref o_nullobject);

      

The Word types come from the Microsoft.Office.Interop.Word library (MsoEncoding lives in Microsoft.Office.Core, in office.dll).

If I recall correctly, Word must be installed on the machine where the code runs. For ASP.NET, that means on the server.
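On the output file name: building it with `string.Replace(".doc", ".html")` is fragile, because `Replace` rewrites every occurrence of `.doc` in the name and matches case-sensitively. `Path.ChangeExtension` only swaps the final extension. The comparison below is a small illustrative sketch; `HtmlFileName` and its two methods are hypothetical helpers, not part of the tutorial or of Word interop.

```csharp
using System.IO;

// Hypothetical helper comparing two ways of deriving the .html file name.
static class HtmlFileName
{
    // Replaces EVERY ".doc" substring (case-sensitive), not just the extension.
    public static string WithStringReplace(string fileName) =>
        fileName.Replace(".doc", ".html");

    // Replaces only the text after the last '.', whatever its case.
    public static string WithChangeExtension(string fileName) =>
        Path.ChangeExtension(fileName, ".html");
}
```

For `my.docs.doc`, `WithStringReplace` returns `my.htmls.html` while `WithChangeExtension` returns `my.docs.html`. For `REPORT.DOC`, `string.Replace` leaves the name unchanged because the match is case-sensitive, so the HTML would be written under a `.DOC` name.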



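There are three ways:

1. Save as HTML, as described by napster.
2. Convert the Open XML to HTML; an XSLT for this is available at http://www.codeplex.com/OpenXMLViewer. This works on .docx (Open XML) files, so a binary Word 2003 .doc has to be converted to .docx first.
3. For pure HTML, write code that converts each style in the document to a CSS rule and puts any direct formatting in the element's style attribute.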
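The third approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `NamedStyle`, `Paragraph` and `StyleToHtml` are hypothetical types standing in for whatever you read out of the document (through Word interop or another library); they are not the Word object model. Each named style becomes a CSS class in a `<style>` block, and a paragraph's direct formatting goes into its inline `style` attribute.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical, simplified document model (NOT Word's object model).
// Properties are CSS property/value pairs, e.g. "font-weight" -> "bold".
record NamedStyle(string Name, IReadOnlyDictionary<string, string> Properties);
record Paragraph(string StyleName, string Text, IReadOnlyDictionary<string, string> DirectFormatting);

static class StyleToHtml
{
    public static string Convert(IEnumerable<NamedStyle> styles, IEnumerable<Paragraph> paragraphs)
    {
        var sb = new StringBuilder();
        sb.Append("<html><head><style>\n");
        // One CSS class per named document style.
        foreach (var style in styles)
            sb.Append('.').Append(CssClass(style.Name)).Append(" { ")
              .Append(Declarations(style.Properties)).Append(" }\n");
        sb.Append("</style></head><body>\n");
        foreach (var p in paragraphs)
        {
            sb.Append("<p class=\"").Append(CssClass(p.StyleName)).Append('"');
            // Direct (non-style) formatting goes inline in the style attribute.
            if (p.DirectFormatting.Count > 0)
                sb.Append(" style=\"").Append(WebUtility.HtmlEncode(Declarations(p.DirectFormatting))).Append('"');
            sb.Append('>').Append(WebUtility.HtmlEncode(p.Text)).Append("</p>\n");
        }
        sb.Append("</body></html>");
        return sb.ToString();
    }

    // Style names such as "Heading 1" contain spaces; replace non-alphanumerics with '-'.
    static string CssClass(string styleName) =>
        new string(styleName.Select(c => char.IsLetterOrDigit(c) ? c : '-').ToArray());

    // Turns {"font-size": "16pt"} into "font-size: 16pt;".
    static string Declarations(IReadOnlyDictionary<string, string> props) =>
        string.Join(" ", props.Select(kv => $"{kv.Key}: {kv.Value};"));
}
```

Keeping named styles as classes keeps the output small and easy to restyle, with only the per-paragraph overrides inline. The sketch does not handle style names that would produce invalid class names (for example, ones starting with a digit), nor character-level runs inside a paragraph.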



Is Word installed on the machine where the C# code runs?
