I had to process a lot of Word .docx files into readable content for use in a searchable database.  Docx files are basically xml files in a zipfile container (as described by wikipedia).  Here is my solution, it’s pretty straight forward.  Just pass in the server file path to the read_docx() function and it will return the text from that file.

 

Leave a Reply

Your email address will not be published. Required fields are marked *