Word Filter
License
Title remains with the author. The licensee has the right to use
the software for personal purposes on a single computer at a time.
The licensee does not have the right to modify the software or create
derivative works. The licensee has the right to sublicense this
software provided these restrictions are included in the sublicense.
Introduction
The filter program was developed to clean up documents created
in Word for posting on the net.
Word is a great program for creating documents. However, not everyone
has Word, nor does every piece of software know how to open
and interpret Word documents.
The most universally understood format for reading text is ASCII
(American Standard Code for Information Interchange), in other
words, plain text.
The problem with Word documents is that in their *.doc form, they
contain a lot of "control" information that appears as
garbage when displayed by software which can handle ASCII only,
such as news readers, email programs, and browsers.
Even when you save Word documents as "Text only with line
breaks," you still have problems. About 95% of the text translates
OK, but some of the special coding remains behind. Things like trademark
symbols, copyright symbols, and smart (curly) quotes are left "dirty."
That is, they show up as vertical bars or boxes or some other symbol
on the screen.
Furthermore, most Word fonts are proportional. That is, an "i"
takes up less space than an "M" so the software closes
up the extra space. The end result is that you wind up with a variable
number of characters on different lines, even though the lines are
of equal length (same number of inches or centimeters). When displayed
by software that understands ASCII only, each character takes up
the same amount of space. So, some lines may run over the edge of
the page, or wrap inappropriately.
The filter program was written to filter out special characters
in Word documents that are saved as text-only with line breaks.
Filter eliminates "dirty" characters and replaces them
with their plain ASCII equivalents. It also formats lines of variable
length to 65 characters per line, so they read well on just about
any screen.
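As a rough illustration of the two jobs described above (character cleanup and re-wrapping to 65 columns), here is a minimal Python sketch. It is not the filter source. The byte values assume the text file uses the Windows-1252 code page, the ASCII stand-ins such as "(TM)" are hypothetical choices (the real table lives in word60.pat and may differ), and the function names clean and rewrap are invented for this example.

```python
import textwrap

# Hypothetical mapping of Windows-1252 byte values to plain ASCII.
# The real substitutions are defined in word60.pat and may differ.
REPLACEMENTS = {
    0x91: "'",     # left single curly quote
    0x92: "'",     # right single curly quote
    0x93: '"',     # left double curly quote
    0x94: '"',     # right double curly quote
    0x99: "(TM)",  # trademark symbol
    0xA9: "(C)",   # copyright symbol
    0xAE: "(R)",   # registered symbol
}


def clean(raw: bytes) -> str:
    """Replace known "dirty" bytes; drop any other byte above 0x7F."""
    out = []
    for b in raw:
        if b in REPLACEMENTS:
            out.append(REPLACEMENTS[b])
        elif b < 0x80:
            out.append(chr(b))
    return "".join(out)


def rewrap(text: str, width: int = 65) -> str:
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)
```

Calling rewrap(clean(data)) on the bytes of a "text only with line breaks" file would give output in the spirit of what filter produces, under the assumptions stated above.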
There are two versions of the program: one for Unix, one for DOS.
The DOS version has every bit of the functionality of the Unix version,
but is not as user friendly, because I am much better at shell
programming than I am at writing a *.bat file. If anyone
wants to write a friendlier "front end," perhaps even
a GUI, please be my guest, and share it with the community.
About the filter "set" of programs
The filter set of programs consists of the following files.
Unix | DOS | Purpose
filter.sh | filter.bat | the "overseer" program that kicks the others off with the right arguments
msed | msed.exe | a program that translates hexadecimal values into user-defined ASCII representations
mfmt | mfmt.exe | a program that formats variable-length lines to word wrap at the specified character length
mhyphen | mhyphen.exe | a program that removes middle-of-the-line hyphens that occur as the result of reformatting text
word60.pat | word60.pat | a file containing the hexadecimal tokens and their ASCII representations
Note: msed, mfmt, and mhyphen can be run "stand
alone" or as part of other applications. Manual pages are included
for these executables.
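To show the kind of cleanup mhyphen is described as doing, here is a small Python sketch. It is only a guess at the behavior: it assumes a leftover hyphen looks like a letter, a hyphen, a space, and a lowercase letter in the middle of a line (as when "refor-" ended one line and "matted" began the next before re-wrapping). The function name remove_midline_hyphens is invented, and the actual rules used by mhyphen are documented in its manual page.

```python
import re

# Assumption: a leftover hyphen appears as "letter, hyphen, space, lowercase letter".
_LEFTOVER = re.compile(r"([A-Za-z])- ([a-z])")


def remove_midline_hyphens(line: str) -> str:
    """Join word halves that were split by an end-of-line hyphen before re-wrapping."""
    return _LEFTOVER.sub(r"\1\2", line)
```

Ordinary hyphenated words such as "well-known" contain no space after the hyphen, so this pattern leaves them alone.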
Instructions for using filter
Create your document using MS-Word.
Save the Word file as text only with line breaks.
Move the text file to the directory in which you would like to
work.
Type "filter.sh <filename>" (Unix) or "filter
<filename>" (DOS).
The Unix version will rename the input file.
It will strip off any suffixes and replace them with ".in".
It will then propose an output file name with a ".txt"
extension. You can type a different output file name,
or accept the proposed one by pressing <ENTER>.
Note: If you supply an input file name with a ".txt"
extension, it will be translated to a ".in" extension
before processing so the original file will not be overwritten.
The DOS version is not as user friendly. It will leave the input
file name alone, and output everything to a file called filter.out.
Except for renaming the file (if required), you're done!
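The Unix renaming rules described above can be sketched in Python as follows. This is an illustration, not a translation of filter.sh: the function name plan_names is invented, and it assumes that "any suffixes" means every dot-separated extension on the file name.

```python
from pathlib import Path
from typing import Tuple


def plan_names(filename: str) -> Tuple[str, str]:
    """Return (input_name, default_output_name) for a given file name."""
    path = Path(filename)
    stem = path.name
    # Assumption: strip every extension, so "a.tar.gz" becomes "a".
    while Path(stem).suffix:
        stem = Path(stem).stem
    input_name = path.with_name(stem + ".in")
    output_name = path.with_name(stem + ".txt")
    return str(input_name), str(output_name)
```

For example, plan_names("report.txt") gives ("report.in", "report.txt"), which shows why the input is renamed first: the default output name would otherwise collide with the original file.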
Download here:
Note: Instructions for making, installing
and configuring the software are contained in the download packages.
Click here to download the DOS version.
(Unzip into a directory called C:\bin and save a step).
Click here to download the Unix sources.
(Unzip into a build directory).
This program has been compiled successfully under several flavors
of Unix and under Linux.