The Ten Commandments For Comparison Of A Legal Document in MS Word


1. Thou Shalt Follow "The Seven Laws Of Outline Numbering".
2. Thou Shalt Not Boast a Bad Binary File Header, or Use "Fast Saves".
3. Thou Shalt Thoroughly Cleanse Thy Scanned Documents.
4. Thou Shalt Properly Paginate Any Document To Be Compared.
5. Thou Shalt Unlink All Field Codes, Turning Them to Text – and While You’re At It: Remove Any {TA}/{TOA} Field Codes from Thy Content.
6. Thou Shalt Practice Safe Pasting From Other Windows Applications.
7. Thou Shalt Not Permit Large Populations of Directly-Formatted Paragraphs, Empty Paragraph Markers or "Creepy" Section Breaks to Remain in Thy Documents.
8. Thou Shalt Be Respectful When Comparing Large Tables and Linked or Embedded "Stuff".
9. Thou Shalt Be Mindful That Special Characters be Consistently Created, Especially in Footnotes.
10. Thou Shalt Unburden All Documents of Active Or Dormant Revision Tracking.

Punishment For Ignoring The Ten Commandments For Comparison Of A Legal Document In Word 97:
Abiding by "The Ten Commandments" is clearly a choice, not a mandate. If they are ignored, the least one might suffer is output that is undesirable, requires further editing, or does not log all changes; at worst, the application performs an illegal operation (i.e., crashes). When they are obeyed, however, the payback is more than a desirable document comparison moment: adherence to "The Ten Commandments" reinforces document production skills that bring us – and our documents – out of this century and, gratefully, into the next…

Definition Of Terms

*Binary File Header – What’s left of the document when you remove its content. In tangible terms, this refers to the style list, the properties of the document’s first section, the numbering defined in the document (e.g., the "ListTemplate" collection), and any information stuffed into the File/Properties fields – in short, the stuff held in the all-powerful final paragraph marker of a Word document.

*RTF File Format – This stands for "Rich Text Format" and it has been an evolving Microsoft standard since Word’s early days. Originally devised as an ASCII file format which provided a mechanism for transporting files across multiple platforms where binary file transfers were compromised by differences in operating and file systems, RTF grew into the standard interchange format used by application vendors to read and write Word documents. Interestingly, it is also the ‘standard’ text format used by the Windows Clipboard. With the release of Office 97, the RTF file format specification was updated to provide for the expanded attributes of the Word 97 binary file format.

*The ListTemplate – Word-speak for individual numbering schemes defined within any document.

* Directly Applied Paragraph Formats – Any paragraph format that differs from the underlying style. In the context of this article, the troublesome attributes are generally paragraph indents, line spacing choices or custom tab stops applied directly to the majority of paragraphs in the document; such formats would be more efficiently inherited by the text as properties of a paragraph style. The [Shift-F1] command reveals the formatting attributes of any text on which you click. To strip away directly-applied paragraph formatting, use [Ctrl-Q] ([Ctrl-SpaceBar] clears character formats), or simply modify the underlying style to include the directly-applied attributes. Either technique formats the document more efficiently. PLEASE NOTE: This in no way means that directly-applied attributes are forbidden in a document; rather, it invites consideration of the significant gain in productivity that styles offer. For example, a document where double-spacing has been directly applied to all paragraphs would behave more efficiently in Word if this attribute were added to the underlying paragraph style.

* Unlink Field Codes – Word’s field codes (those commands encased in the { and } curly braces) are displayed or hidden again with [Alt-F9], updated when selected with the [F9] key, and unlinked when selected with [Ctrl-Shift-F9]. When a field code is unlinked, it becomes text; its dynamic nature is removed.

1. Thou Shalt Follow "The Seven Laws Of Outline Numbering".
Without argument, the power of the outline numbering introduced in Word 97 has given our nation’s law firms compelling reasons to migrate from their legacy code-based word processors to the styles-based document production environment afforded by Microsoft Word. Sadly, our users – not to mention their supporting cast – are not adequately prepared to use or troubleshoot this powerful "new" feature. Microsoft uses a term, the "Futz Factor", to describe the inappropriate, unanticipated, and largely unrepeatable application of a new software feature. Indeed, in this context and in our community, Word 97’s numbering features have invited a fair amount of "Futz"!

As usability issues associated with Outline Numbering began to overwhelm the community late last year, Microsystems released a document titled "The Seven Laws of Word 97’s Outline Numbering Linked to Styles". Its intent was not only to explain the feature’s use, but to promote "Best Practices" for training, supporting, troubleshooting and applying Outline Numbering.

Disappearing numbers, unlinked styles and other nuisances occur in documents where an uninformed user has attempted to apply Outline Numbering, but the most debilitating consequences of the feature’s misuse turn up when the document is run through Lexis-Nexis’ CompareRite product. Those who are less document-inclined might be quick to point to CompareRite as the guilty application here, but a saucer-deep search through the document will, more often than not, reveal that one or more of "The Seven Laws" has been violated.

The most common outcomes of numbering gone awry in a document are:

Crashing of CompareRite.
Textual outline numbers in the resulting document, but inaccurate {LISTNUM} values.
A mixture of directly applied numbering and textual numbering in the resulting document, producing inaccurate numbering results.
Bulleted lists producing odd font formatting and/or inappropriately-mapped bullet characters.

A study of "The Seven Laws" might bring some insight to your numbering practices, and could indeed point out some of the compromises encountered when the document travels through the CompareRite process.

2. Thou Shalt Not Boast a Bad Binary File Header.
Concerns have been expressed community-wide about the lack of reliability of CompareRite in a Word 97 environment – especially in comparison to (pun intended!) the outstanding reliability we saw in the WordPerfect 5.1 era. To put this in perspective, the statistics say it all: the ‘weight’ of an empty binary file created in WordPerfect 5.1 was just 314 bytes, but in Word 97 it is nearly 20,000 bytes – almost double that of its own preceding version (Word 6/7), which weighed in at 11,000 bytes. The complexity of the document itself, before any content has been added, has increased nearly 63 to 1! This also explains the increased opportunities for the comparison – and a myriad of other conversion moments – to become ‘less reliable’ in the eyes of our users, and it shows clearly what has happened to document production requirements since the fondly recalled days of the ‘blue screen’. The intent of this commandment, then, is to raise awareness of how to identify a troubled document: one whose binary file header is of suspect origin and which, when compared, will undoubtedly undermine the application’s attempts to produce an output document.



To ‘do its thing’, CompareRite must spawn a conversion of your Word 97 document, using the updated RTF format Microsoft released in conjunction with Word 97. As such, the integrity of the document’s binary file header is of paramount importance. In layman’s terms, the binary file header comprises:

*The document’s style list. This must not include previously-converted styles that appear ‘suspect’. Most often, these come from WordPerfect documents or from abused, round-tripped documents. Suspect style names might include odd symbols, truncated names ("Default Para" rather than "Default Paragraph Font"), or no name at all. You might also verify that non-standard collections of styles do not appear in the document; these are most often generated by a scanner session gone weird, or by a further conversion of a scanner session gone awry ("OmniPage#1", "OmniPage#2", … or "P1", "P2", …).

* The document’s first section. Neither this section’s properties nor those of any other section, for that matter, should contain poorly-converted headers or footers filled with ill-placed Text Frames or a significant number of superfluous or aborted field codes (refer to the Fifth Commandment).

* Numbering. If numbering is inconsistently applied or incompletely defined, the RTF filter can fail to convert the document. Remember: in Word 97, numbering, whether bullets, numbers, or outline schemes, is a property of the document, not of the paragraph. Any incompleteness here will surface when the document is prepared for comparison.

* The bookmark list. A bookmark list piled full of entries (we’ve seen as many as 996!) makes for neither an efficient nor a successful CompareRite moment.
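The header checks listed above can be partly scripted. The following Python sketch is a hypothetical audit helper, not a feature of Word or CompareRite: it assumes you have already obtained the document’s style names and bookmark count by some other means, and its name patterns and bookmark threshold are illustrative assumptions drawn from the examples in this article.

```python
import re

# Illustrative heuristics only, taken from the examples in this article.
# They are assumptions, not rules published by Microsoft or Lexis-Nexis.
SCANNER_STYLE = re.compile(r"^(OmniPage#\d+|P\d+)$")
TRUNCATED_NAMES = {"Default Para": "Default Paragraph Font"}
BOOKMARK_LIMIT = 500  # hypothetical threshold; trouble was seen at 996


def audit_header(style_names, bookmark_count):
    """Return warnings about a document's style list and bookmark count."""
    warnings = []
    for name in style_names:
        if not name.strip():
            warnings.append("style with no name")
        elif SCANNER_STYLE.match(name):
            warnings.append(f"scanner-generated style: {name}")
        elif name in TRUNCATED_NAMES:
            warnings.append(
                f"truncated style name: {name!r} (expected {TRUNCATED_NAMES[name]!r})"
            )
        elif not name.isprintable():
            # Control characters are one kind of "odd symbol" in a style name.
            warnings.append(f"style name with unprintable characters: {name!r}")
    if bookmark_count > BOOKMARK_LIMIT:
        warnings.append(f"{bookmark_count} bookmarks (limit {BOOKMARK_LIMIT})")
    return warnings
```

Called with the styles Normal, Heading 1, OmniPage#1, an unnamed style and Default Para, plus 996 bookmarks, it returns four warnings; a clean list returns an empty one.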

One way to test whether the binary file header (i.e., all parts of the file, irrespective of its content) is the culprit is to delete all but a few words of the content of the original and revised files, save them under new names with SaveAs, and compare those. If these files crash CompareRite, you know the file itself, not its content, is the real problem. The most common causes are documents whose binary file headers were manufactured by an application other than Word 97, or – and we’ve seen a fair number of these – documents in which numbering has been incorrectly or incompletely defined.

One additional side note about Word’s own Compare feature: the feature was introduced way back in the Word 1.0/2.0 era. Since that time, few enhancements have been made to the utility, while many have been made to Word itself. Many folks report that the feature will often miss whole hunks of changes, or may even hang on large comparisons. Once again, the increased complexity of the document is at work here: Word 2.0 files were indeed simpler to deal with.

3. Thou Shalt Thoroughly Cleanse Thy Scanned Documents.
In the Word 6 and 7 era, we saw scanning utilities that created files containing unique style names applied to every paragraph: "P1", "P2", "P3", or "OmniPage#1", "OmniPage#2", etc. What we’re seeing is that when these documents are brought forward to Word 97 – which means they’ve undergone yet another internal binary file conversion – the result is a document that can behave oddly. These documents exhibit symptoms such as a Normal style that will not stick (it changes without reason) and numbering definitions that are troublesome to maintain. Indeed, when these documents are compared (that is, converted for comparison), the CompareRite application will crash.

A hybrid mutant file type is now evolving, in which newer scanning software produces either a binary Word 97 output format or an accelerated RTF file. In these documents, the software has shifted to using bookmarks in lieu of style placeholders. Bookmarks, in and of themselves, are no problem, nor, for that matter, is the series of style definitions produced by these applications. The problem, however, is that CompareRite must sift through all this ‘stuff’ in order to reach the real heart of a meaty legal document, and each such compromise increases the application’s difficulty in getting there! We’ve seen CompareRite crash on bad style names (e.g., "no name"), and we’ve seen it crash on a bookmark list 996 entries long. In both situations, the user was not at fault: some other application had manufactured the file, and the user simply was not aware that this ‘stuff’ needed to be removed.

It goes without saying that scanned document content should be cleaned up; the more important consideration, however, is to clean up beyond the content. SPECIAL ALERT: It is best to bring scanned documents into Word using the following procedure: FileNew, letting Word create the binary file header, then InsertFile to retrieve only the content of the scanned result. FileOpen invites too many opportunities for failure. Even this method does not preclude the need to perform thorough "document forensics" before cloning the result and editing it!

4. Thou Shalt Properly Paginate Any Document To Be Compared, OR Thou Shalt Make Sure the Document is Completely Saved, OR Thou Shalt Make Sure ‘Keep With Next’ Isn’t Assigned to More Paragraphs Than It Should Be.
These situations run hand in hand. We’ve heard lots of folks advise comparing Word 97 documents only after they have been saved back to a Word 6/95 format. While doing so does alleviate many situations, please know that the file resulting from such a conversion is not completely ‘reflowed’ (what we as users know as ‘paginated’), and CompareRite needs page numbers in order to do its stuff.

The other problem is that our documents are often not ‘completely’ saved. This means that the binary file CompareRite is trying to operate on consists of the document content followed by a series of pointers and the user edits accumulated since the last complete save. Can you imagine how unwieldy – not to mention huge – a document can get? This saving technique is Word’s "Fast Save" technology, introduced way back in the days of saving documents to floppy disks. While most installations turned this setting off upon migration to Word 97, and most of us know it is a ‘bad’ feature to implement, we don’t account for the documents we inherit from clients, or from our home PCs, where this setting (Tools/Options/Save) is turned on as an installation default! And once a document has been Fast Saved, it takes intervention to force a complete save that merges all the user edits with the remainder of the file.
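Whether a document was last Fast Saved is recorded in the file itself. The Python sketch below reads that record, assuming the field layout Microsoft later published for the Word 97-2003 binary format ([MS-DOC], "FibBase"): the fComplex bit (mask 0x0004 of the 16-bit word at offset 0x0A of the WordDocument stream) is set when the last save was incremental, and the four bits above it (cQuickSaves) count consecutive fast saves. The function name is hypothetical, and it expects the raw WordDocument stream rather than the .doc file, because a .doc is a compound file whose streams must be extracted first.

```python
import struct

WORD_IDENT = 0xA5EC  # wIdent: first two bytes of a WordDocument stream


def fast_save_info(word_document_stream):
    """Return (last_save_was_fast, quick_save_count) from a Word 97 FIB.

    Offsets follow the FibBase structure in Microsoft's [MS-DOC]
    specification; the caller supplies the raw WordDocument stream bytes.
    """
    if len(word_document_stream) < 12:
        raise ValueError("stream too short to contain a FIB")
    (ident,) = struct.unpack_from("<H", word_document_stream, 0)
    if ident != WORD_IDENT:
        raise ValueError("not a Word 97-format WordDocument stream")
    (flags,) = struct.unpack_from("<H", word_document_stream, 0x0A)
    was_fast_saved = bool(flags & 0x0004)  # fComplex bit
    quick_saves = (flags >> 4) & 0x0F      # cQuickSaves, a 4-bit counter
    return was_fast_saved, quick_saves
```

With the third-party olefile package, for example, `olefile.OleFileIO("brief.doc").openstream("WordDocument").read()` (the file name is hypothetical) yields the bytes to pass in; a result of `(True, n)` suggests the document needs a complete save before it is compared.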

Finally, any converted document opened with FileOpen is, by default, filled with ‘bloat’, because the converter needs to reserve work space in which to perform its revisions to the file format and content. Typically, a user does not force a save on this document with the cursor positioned at the end of the file; they continue to work on the document, saving only at the points at which they’ve made edits. The file then remains bloated for all of eternity.

To force a complete save, either of the following methods should work:

Move to the end of the document, enter a change on the last line, delete it, and save at the final paragraph marker. This forces the document to ‘reflow’, triggers a pagination pass, and forces a complete save.
If you have saved back to Word 6/7, open the result in that application and allow it to completely reflow (paginate) the document; then move your cursor to the last paragraph marker, make an edit, delete it, and save.

5. Thou Shalt Unlink All Field Codes, Turning Them to Text – and While You’re At It: Remove Any {TA}/{TOA} Field Codes from Thy Document Content.
Unlinking field codes is suggested for two primary reasons: a) to eliminate field codes the user may have created incorrectly (we’ve seen field codes embedded within field codes, and we’ve also seen users erroneously edit them so that they are open-ended); and b) to alleviate issues associated with Word 97’s {LISTNUM} field code.

CompareRite is most sensitive to erroneous field codes because it uses the RTF file format as both its source and its destination format. RTF, largely an ASCII file format, uses the "{" and "}" characters to delimit groups, such as those that apply bold or that define or apply styles. If a field code in the original document is unpaired, the RTF conversion itself can go awry, since the terminating "}" characters fall out of step. Most often, what occurs in this situation is that chunks of text go missing from your result.
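Because RTF group structure depends entirely on paired braces, a quick balance check can flag a file whose braces have fallen out of step. The Python sketch below is a hypothetical helper, not part of CompareRite or Word; it treats backslash escapes such as \{ and \} as literal text and, as a simplification, does not handle \bin binary data, which may legitimately contain raw brace bytes.

```python
def check_rtf_braces(rtf):
    """Return (ok, position) for the group braces in an RTF string.

    position is the index of the first '}' that closes nothing or, if
    groups are left open, of the outermost '{' never closed; None when ok.
    Simplification: \\bin binary data is not handled.
    """
    open_stack = []
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the escaped character (\{, \}, \\, or a control word's first letter)
            continue
        if ch == "{":
            open_stack.append(i)
        elif ch == "}":
            if not open_stack:
                return False, i  # this '}' closes a group that was never opened
            open_stack.pop()
        i += 1
    if open_stack:
        return False, open_stack[0]
    return True, None
```

For instance, `check_rtf_braces(r"{\rtf1 {\b bold text}")` reports `(False, 0)`: the outer group opened at index 0 is never closed.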

In the case of Word’s {LISTNUM} codes, the RTF result delivers this display field but, alas, the definition of the scheme (the very list the {LISTNUM} code is trying to display) is missing. What’s more, the {LISTNUM} value is incorrect, because the style-linked outline numbers it was sequenced with have all been turned to text! Its levels then become one continuous list, and the result is even more incorrect.

To unlink field codes in a document, work on SaveAs copies of the originals: press [Ctrl-A] to select all, then [Ctrl-Shift-F9] to turn all field codes in the main body of the document to text. Use these copies for your comparison, leaving the original documents with their dynamic fields intact.

{TOA}/{TA} field codes in a document can cause it to report incorrect page numbers in Full Authority, and can make CompareRite work harder as well. It would be wise to delete all {TA} field codes (if you’re using Full Authority, they’re needless!), and to unlink, or turn to text, the results of any generated TOA you want to preserve. To do so:

Making sure that field code results are displayed, locate the generated TOA and select it. Press [Ctrl-Shift-F9]. This unlinks the dynamic field code and turns it into formatted text.
Now show field codes with [Alt-F9].
Use Edit/Find to look for {^d TA} (some converted codes may or may not have the space, so you may need to perform two searches).
Delete all {TA} fields, or replace them with nothing.
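For readers who post-process RTF rather than work in Word’s interface, the same cleanup can be sketched in Python. This hypothetical helper swaps in an RTF-level technique for the keyboard and Edit/Find steps above: each {\field ...} group is replaced by the contents of its {\fldrslt ...} group, which is what unlinking amounts to, and any field whose instruction begins with TA is dropped entirely. It assumes simplified, well-formed RTF; real Word output carries more formatting inside field groups than this sketch tries to handle.

```python
import re

FIELD_START = re.compile(r"\{\\field(?![a-zA-Z])")
# Control words, the \* destination marker, and group braces.
RTF_MARKUP = re.compile(r"\\[a-zA-Z]+-?\d* ?|\\\*|[{}]")


def _group_end(rtf, start):
    """Index just past the '}' that closes the group opened at rtf[start]."""
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip escapes such as \{ \} \\ so they are not counted
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces in RTF")


def _subgroup(field, keyword):
    """Return the group whose '{' most closely precedes `keyword`, or ''."""
    pos = field.find(keyword)
    if pos == -1:
        return ""
    start = field.rfind("{", 0, pos)
    return field[start:_group_end(field, start)]


def _field_type(instruction_group):
    """First word of a field instruction, e.g. 'TA' or 'PAGE'."""
    words = RTF_MARKUP.sub("", instruction_group).split()
    return words[0].upper() if words else ""


def unlink_fields(rtf):
    """Replace each field with its result text; drop TA fields entirely."""
    out = []
    i = 0
    while True:
        match = FIELD_START.search(rtf, i)
        if match is None:
            out.append(rtf[i:])
            return "".join(out)
        out.append(rtf[i:match.start()])
        end = _group_end(rtf, match.start())
        field = rtf[match.start():end]
        if _field_type(_subgroup(field, "\\fldinst")) != "TA":
            result = _subgroup(field, "\\fldrslt")
            # Strip "{\fldrslt" and the closing "}", then unlink nested fields.
            inner = unlink_fields(result[len("{\\fldrslt"):-1]) if result else ""
            if inner.startswith(" "):
                inner = inner[1:]  # that space only delimited the \fldrslt word
            if inner:
                out.append("{" + inner + "}")
        i = end
```

Applied to a fragment such as `{\rtf1 See page {\field{\*\fldinst PAGE}{\fldrslt 3}}.}`, it leaves `{\rtf1 See page {3}.}`, keeping the page number as plain text in its own group.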

6. Thou Shalt Practice Safe Pasting From Other Documents And Windows Applications.

If you’re pasting text from word processing applications which perform formatting (WordPerfect for Windows, or previous versions of Word, for example), be mindful that the codes you carry into the target document can indeed impact it in a negative way.

Why does this happen? What many people do not realize is that the Clipboard – the location where copied or cut text is stored – actually performs a conversion on the data you’re pasting. To check it out: copy or cut something in Word 97, then choose Edit/PasteSpecial. You should see that, by default, it selects "Formatted Text (RTF)". Edit/PasteSpecial/Formatted Text and a regular Edit/Paste, then, are one and the same thing!

If you’re pasting from applications that contain formatting, consider using Edit/PasteSpecial as "Unformatted Text"; in fact, if you’re copying from a tortured Word 97 file, consider doing the same. At a minimum, be aware that the codes and/or formatting you’re moving into your documents may not be desired and should be averted or cleaned out.

7. Thou Shalt Not Permit Large Populations of Directly-Formatted Paragraph Attributes, Empty Paragraph Markers, or "Creepy" Section Breaks to Remain in Thy Documents.
Why? Word is a styles-based word processor. It is designed to work efficiently by managing pagination and scrolling based on the properties of the underlying style. If paragraph and font formats are applied primarily through direct formatting, Word has to work harder at these tasks. These sloppy habits, then, can add unnecessary overhead to working in Word 97. What’s more, this unnecessary formatting ‘bloats’ the saved version of the document, which is exactly the file CompareRite must convert and compare. Using styles and cleanly-formatted section breaks, and using only the section breaks the document requires, makes the document more efficient, and thus easier to convert, compare and edit.

8. Thou Shalt Be Respectful When Comparing Large Tables and Linked or Embedded "Stuff".
Indeed, CompareRite doesn’t compare large tabled documents well in Word 97; for that matter, it never did in WordPerfect 5.1, either. You might consider comparing these types of documents with the tables converted to text, or with table text marked to be ‘skipped’.

While our testing isn’t complete, we’re somewhat suspicious, too, of lots of ‘linked’ stuff: hyperlinks (which create hidden bookmarks), links to other file types (particularly graphics), or links to other documents. If your document contains these complex features, its RTF file may contain more binary data than ASCII, and thus can be more problematic. We’ll update you as we conclude our testing.

9. Thou Shalt Be Mindful That Special Characters be Consistently Created, Especially in Footnotes.
Word 97 introduced an exciting new way of handling the myriad special symbols and accented language characters required in word processing documents: Unicode. This emerging standard creates a grid of 65,536 characters, each with its own unique code. This allows applications to stop ‘sharing’ the same decimal values among different symbols, as they have in the past. If you think about it, our keyboards can’t possibly contain them all, so they have to be handled somehow!

The problem is that we bring text into our documents from applications that pre-date the Unicode grid, and one of the formats that’s had a hard time keeping up is RTF itself. Remember that RTF is used by the Clipboard, so pasting special symbols into a document may or may not produce the appropriate character. (There are also a whole lot of incorrectly mapped WordPerfect symbols lying around in our documents…) This problem will generally exhibit itself by displaying and/or printing a small, square graphic box ("□"), or by displaying the ‘correct’ symbol that, once the character formatting is stripped off ([Ctrl-SpaceBar]), becomes a square graphic box.

We’ve seen a few CompareRite results that, when run against documents whose special symbols were never correct to begin with, produced pages of square graphic boxes where alphabetic text should have been! To minimize this kind of problem, we can make sure that all characters in a document are true Unicode symbols, and not allow converted symbols to remain incorrect in our Word 97 documents. That, and remain patient until all the kinks associated with this significant change – and this is indeed a significant change – are worked out. Lexis and Microsoft have already resolved many symbol issues: early ones included the Copyright and Trademark symbols and open and closed quotes.
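Characters that only look right in one font can be located before a comparison. The Python sketch below flags code points in Unicode’s Private Use Area (U+E000 to U+F8FF) and the replacement character U+FFFD. It rests on an assumption we have not verified against every converter: that symbol-font characters (Symbol, Wingdings and the like) arrive in Word 97 as Private Use Area code points, so they display correctly only while their font is applied, which is consistent with the square-box symptom described above. The function name is hypothetical.

```python
import unicodedata


def suspect_characters(text):
    """Return (index, code point, name) for characters likely to show as boxes.

    Flags Private Use Area code points (U+E000 to U+F8FF), whose appearance
    depends entirely on the font applied to them, and U+FFFD, the
    replacement character some converters emit for unmappable input.
    """
    found = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if 0xE000 <= cp <= 0xF8FF or cp == 0xFFFD:
            # Private Use Area characters have no Unicode name.
            found.append((index, f"U+{cp:04X}", unicodedata.name(ch, "<private use>")))
    return found
```

Ordinary accented letters and true Unicode symbols such as © pass untouched; each hit is a position to inspect, and correct to a true Unicode character, before comparison.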

10. Thou Shalt Unburden All Documents of Active or Dormant Revision Tracking.
We can’t tell you how many Word documents – some of them of Word 6/7 lineage – cart around ‘hidden’ native Word revisions. What generally happens is that someone using the document turned on Revision Tracking, then turned it off without accepting or rejecting the revisions they had made. What’s more, they’ve gone so far as suppressing the changes from view! In the life cycle of the document, no one notices they are there, and they’re carted around in the file, and all of its derivatives, for life. Make sure all Tracked Revisions are "Accepted". Not only does this help avoid embarrassing moments with the client who discovers such buried treasure (!), but it also helps CompareRite by eliminating a whole lot of superfluous junk it has to plow through in order to compare the text of the document.
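Dormant revisions also survive into the RTF a comparison works from. The Python sketch below counts the revision-related control words defined in Microsoft’s RTF specification: \revisions (revision marking turned on for the document), \revised (inserted text) and \deleted (deleted text). It is a hypothetical helper that counts control-word occurrences rather than runs of revised text, and, following the RTF convention for toggle properties, it treats a parameter of 0 (for example \revised0) as switching the property off.

```python
import re

# A backslash followed by either a control word (letters plus an optional
# numeric parameter) or any single character (an escape such as \\ or \{).
RTF_TOKEN = re.compile(r"\\(?:([a-z]+)(-?\d+)?|.)", re.DOTALL)


def revision_report(rtf):
    """Count revision-tracking control words in an RTF string."""
    report = {"tracking_on": False, "insertions": 0, "deletions": 0}
    for match in RTF_TOKEN.finditer(rtf):
        word, param = match.group(1), match.group(2)
        if word is None or param == "0":
            continue  # an escaped character, or a toggle switched off
        if word == "revisions":
            report["tracking_on"] = True
        elif word == "revised":
            report["insertions"] += 1
        elif word == "deleted":
            report["deletions"] += 1
    return report
```

A report with tracking_on set, or with non-zero counts, suggests the document should have its revisions resolved and tracking turned off before it is compared.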





Acknowledgements

Dedicated to:
Those Who Have Painfully Ferreted Out These Answers,
And To All Those Who Must Troubleshoot, Defend, Minister,
Uphold, or Recover From Violation of The Ten Commandments

Reported By:
Sherry Kappel
Microsystems Engineering Company
sherryk@microsystems.com
February 12, 1999


With Affirmations From:
Amy Karnham, Lexis-Nexis (OH)
Doris Hibner, Rogers & Wells (NY)
Marc Harris, Winthrop Stimson (NY)
Bill Robertson, Softwise Consulting (NY)
Rolanda Carter, O’Melveny and Myers (LA)
Fred Feldt, Haythe Curley (NY)
Judy Caputo, Kelley Drye (NY)
Suellen Miller, Sonnenschein (Chicago)
Ellen Rosenstiel, Kutak Rock (Omaha)

and, last but not least:

Each of the 50 Million DocXchanged
Documents Created To Date

Copyright © 1999
Microsystems Engineering Company
All Rights Reserved

Microsoft, Word, Word97 and Windows95 are trademarks or registered trademarks of Microsoft Corporation. WordPerfect is a trademark or registered trademark of Corel Corporation. CompareRite is a trademark or registered trademark of Lexis-Nexis.
