Working with Text

To work effectively with text, it’s important to first understand a little about block-level elements like paragraphs and inline-level objects like runs.

Block-level vs. inline text objects

The paragraph is the primary block-level object in Word.

A block-level item flows the text it contains between its left and right edges, adding an additional line each time the text extends beyond its right boundary. For a paragraph, the boundaries are generally the page margins, but they can also be column boundaries if the page is laid out in columns, or cell boundaries if the paragraph occurs inside a table cell.

A table is also a block-level object.

An inline object is a portion of the content that occurs inside a block-level item. An example would be a word that appears in bold or a sentence in all-caps. The most common inline object is a run. All content within a block container is inside of an inline object. Typically, a paragraph contains one or more runs, each of which contains some part of the paragraph’s text.

The attributes of a block-level item specify its placement on the page, such as indentation and the space before and after a paragraph. The attributes of an inline item generally specify the font in which the content appears, such as typeface, font size, bold, and italic.

Paragraph properties

A paragraph has a variety of properties that specify its placement within its container (typically a page) and the way it divides its content into separate lines.

In general, it’s best to define a paragraph style that collects these attributes into a meaningful group and apply the appropriate style to each paragraph, rather than repeatedly applying those properties directly to each paragraph. This is analogous to how Cascading Style Sheets (CSS) work with HTML. All the paragraph properties described here can be set using a style as well as applied directly to a paragraph.

The formatting properties of a paragraph are accessed using the ParagraphFormat object available using the paragraph’s paragraph_format property.

Horizontal alignment (justification)

Also known as justification, the horizontal alignment of a paragraph can be set to left, centered, right, or fully justified (aligned on both the left and right sides) using values from the enumeration WD_PARAGRAPH_ALIGNMENT, which is also importable under the alias WD_ALIGN_PARAGRAPH:

>>> from docx import Document
>>> from docx.enum.text import WD_ALIGN_PARAGRAPH
>>> document = Document()
>>> paragraph = document.add_paragraph()
>>> paragraph_format = paragraph.paragraph_format
>>> paragraph_format.alignment
None  # indicating alignment is inherited from the style hierarchy
>>> paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
>>> paragraph_format.alignment
CENTER (1)

Indentation

Indentation is the horizontal space between a paragraph and the edge of its container, typically the page margin. A paragraph can be indented separately on the left and right side. The first line can also have a different indentation than the rest of the paragraph. A first line indented further than the rest of the paragraph has a first-line indent. A first line indented less has a hanging indent.

Indentation is specified using a Length value, such as Inches, Pt, or Cm. Negative values are valid and cause the paragraph to overlap the margin by the specified amount. A value of None indicates the indentation value is inherited from the style hierarchy. Assigning None to an indentation property removes any directly-applied indentation setting and restores inheritance from the style hierarchy:

>>> from docx.shared import Inches
>>> paragraph = document.add_paragraph()
>>> paragraph_format = paragraph.paragraph_format
>>> paragraph_format.left_indent
None  # indicating indentation is inherited from the style hierarchy
>>> paragraph_format.left_indent = Inches(0.5)
>>> paragraph_format.left_indent
457200
>>> paragraph_format.left_indent.inches
0.5

Right-side indent works in a similar way:

>>> from docx.shared import Pt
>>> paragraph_format.right_indent
None
>>> paragraph_format.right_indent = Pt(24)
>>> paragraph_format.right_indent
304800
>>> paragraph_format.right_indent.pt
24.0
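
The integers echoed in these sessions (457200 and 304800) are English Metric Units (EMU), the integer unit in which python-docx stores Length values. The arithmetic can be checked in plain Python. The helper names below are hypothetical and are not part of python-docx; they only mirror the conversions that Inches, Pt, and the .inches and .pt properties perform.

```python
# Conversion factors for English Metric Units (EMU).
EMU_PER_INCH = 914400
EMU_PER_PT = 12700  # 914400 / 72, since there are 72 points per inch


def inches_to_emu(inches):
    """Hypothetical helper: convert inches to whole EMU."""
    return int(inches * EMU_PER_INCH)


def pt_to_emu(points):
    """Hypothetical helper: convert points to whole EMU."""
    return int(points * EMU_PER_PT)


def emu_to_inches(emu):
    """Hypothetical helper: convert EMU back to inches."""
    return emu / EMU_PER_INCH


def emu_to_pt(emu):
    """Hypothetical helper: convert EMU back to points."""
    return emu / EMU_PER_PT


left_indent = inches_to_emu(0.5)      # 457200
right_indent = pt_to_emu(24)          # 304800
hanging_indent = inches_to_emu(-0.25)  # -228600, negative for a hanging indent
```

Because Length is an integer count of EMU, values built from different units can be compared and combined directly.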

First-line indent is specified using the first_line_indent property and is interpreted relative to the left indent. A negative value indicates a hanging indent:

>>> paragraph_format.first_line_indent
None
>>> paragraph_format.first_line_indent = Inches(-0.25)
>>> paragraph_format.first_line_indent
-228600
>>> paragraph_format.first_line_indent.inches
-0.25

Tab stops

A tab stop determines the rendering of a tab character in the text of a paragraph. In particular, it specifies the position where the text following the tab character will start, how it will be aligned to that position, and an optional leader character that will fill the horizontal space spanned by the tab.

The tab stops for a paragraph or style are contained in a TabStops object accessed using the tab_stops property on ParagraphFormat:

>>> tab_stops = paragraph_format.tab_stops
>>> tab_stops
<docx.text.tabstops.TabStops object at 0x...>

A new tab stop is added using the add_tab_stop() method:

>>> tab_stop = tab_stops.add_tab_stop(Inches(1.5))
>>> tab_stop.position
1371600
>>> tab_stop.position.inches
1.5

Alignment defaults to left, but may be specified by providing a member of the WD_TAB_ALIGNMENT enumeration. The leader character defaults to spaces, but may be specified by providing a member of the WD_TAB_LEADER enumeration:

>>> from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
>>> tab_stop = tab_stops.add_tab_stop(Inches(1.5), WD_TAB_ALIGNMENT.RIGHT, WD_TAB_LEADER.DOTS)
>>> print(tab_stop.alignment)
RIGHT (2)
>>> print(tab_stop.leader)
DOTS (1)

Existing tab stops are accessed using sequence semantics on TabStops:

>>> tab_stops[0]
<docx.text.tabstops.TabStop object at 0x...>

More details are available in the TabStops and TabStop API documentation.

Paragraph spacing

The space_before and space_after properties control the spacing before and after a paragraph, respectively. Inter-paragraph spacing is collapsed during page layout, meaning the spacing between two paragraphs is the maximum of the space_after of the first paragraph and the space_before of the second paragraph. Paragraph spacing is specified as a Length value, often using Pt:

>>> paragraph_format.space_before, paragraph_format.space_after
(None, None)  # inherited by default
>>> paragraph_format.space_before = Pt(18)
>>> paragraph_format.space_before.pt
18.0
>>> paragraph_format.space_after = Pt(12)
>>> paragraph_format.space_after.pt
12.0

Line spacing

Line spacing is the distance between subsequent baselines in the lines of a paragraph. Line spacing can be specified either as an absolute distance or relative to the line height (essentially the point size of the font used). A typical absolute measure would be 18 points. A typical relative measure would be double-spaced (2.0 line heights). The default line spacing is single-spaced (1.0 line heights).

Line spacing is controlled by the interaction of the line_spacing and line_spacing_rule properties. line_spacing is either a Length value, a (small-ish) float, or None. A Length value indicates an absolute distance. A float indicates a number of line heights. None indicates line spacing is inherited. line_spacing_rule is a member of the WD_LINE_SPACING enumeration or None:

>>> from docx.shared import Length
>>> paragraph_format.line_spacing
None
>>> paragraph_format.line_spacing_rule
None
>>> paragraph_format.line_spacing = Pt(18)
>>> isinstance(paragraph_format.line_spacing, Length)
True
>>> paragraph_format.line_spacing.pt
18.0
>>> paragraph_format.line_spacing_rule
EXACTLY (4)
>>> paragraph_format.line_spacing = 1.75
>>> paragraph_format.line_spacing
1.75
>>> paragraph_format.line_spacing_rule
MULTIPLE (5)

Pagination properties

Four paragraph properties, keep_together, keep_with_next, page_break_before, and widow_control, control aspects of how the paragraph behaves near page boundaries.

keep_together causes the entire paragraph to appear on the same page, issuing a page break before the paragraph if it would otherwise be broken across two pages.

keep_with_next keeps a paragraph on the same page as the subsequent paragraph. This can be used, for example, to keep a section heading on the same page as the first paragraph of the section.

page_break_before causes a paragraph to be placed at the top of a new page. This could be used on a chapter heading to ensure chapters start on a new page.

widow_control breaks a page to avoid placing the first or last line of the paragraph on a separate page from the rest of the paragraph.

All four of these properties are tri-state, meaning they can take the value True, False, or None. None indicates the property value is inherited from the style hierarchy. True means “on” and False means “off”:

>>> paragraph_format.keep_together
None  # all four inherit by default
>>> paragraph_format.keep_with_next = True
>>> paragraph_format.keep_with_next
True
>>> paragraph_format.page_break_before = False
>>> paragraph_format.page_break_before
False

Apply character formatting

Character formatting is applied at the Run level. Examples include font typeface and size, bold, italic, and underline.

A Run object has a read-only font property providing access to a Font object. A run’s Font object provides properties for getting and setting the character formatting for that run.

Several examples are provided here. For a complete set of the available properties, see the Font API documentation.

The font for a run can be accessed like this:

>>> from docx import Document
>>> document = Document()
>>> run = document.add_paragraph().add_run()
>>> font = run.font

Typeface and size are set like this:

>>> from docx.shared import Pt
>>> font.name = 'Calibri'
>>> font.size = Pt(12)

Many font properties are tri-state, meaning they can take the values True, False, and None. True means the property is “on”, False means it is “off”. Conceptually, the None value means “inherit”. A run exists in the style inheritance hierarchy and by default inherits its character formatting from that hierarchy. Any character formatting directly applied using the Font object overrides the inherited values.
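
The “inherit” idea can be pictured with a small pure-Python model. This is a deliberately simplified conceptual sketch, not how python-docx or Word resolves formatting internally; the function name and the ordering of levels are assumptions made for illustration.

```python
def resolve_tri_state(*levels):
    """Return the first value that is not None.

    `levels` is ordered from most specific (direct run formatting) to
    least specific (for example, a base style). If every level is None,
    fall back to False, meaning "off".
    """
    for value in levels:
        if value is not None:
            return value
    return False


# Run says nothing, its style says bold: the style's value is inherited.
inherited = resolve_tri_state(None, True, None)

# Run says "off" explicitly: direct formatting overrides the style.
overridden = resolve_tri_state(False, True)

# Nothing is specified anywhere: the property ends up off.
unset = resolve_tri_state(None, None, None)
```

The model shows why None and False differ: False stops the search with “off”, while None defers to the next level up.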

Bold and italic are tri-state properties, as are all-caps, strikethrough, superscript, and many others. See the Font API documentation for a full list:

>>> font.bold, font.italic
(None, None)
>>> font.italic = True
>>> font.italic
True
>>> font.italic = False
>>> font.italic
False
>>> font.italic = None
>>> font.italic
None

Underline is a bit of a special case. It is a hybrid of a tri-state property and an enumerated value property. True means single underline, by far the most common. False means no underline, but more often None is the right choice if no underlining is wanted. The other forms of underlining, such as double or dashed, are specified with a member of the WD_UNDERLINE enumeration:

>>> from docx.enum.text import WD_UNDERLINE
>>> font.underline
None
>>> font.underline = True
>>> # or perhaps
>>> font.underline = WD_UNDERLINE.DOT_DASH

Font color

Each Font object has a ColorFormat object that provides access to its color, accessed via its read-only color property.

Apply a specific RGB color to a font:

>>> from docx.shared import RGBColor
>>> font.color.rgb = RGBColor(0x42, 0x24, 0xE9)

A font can also be set to a theme color by assigning a member of the MSO_THEME_COLOR_INDEX enumeration, imported here under its alias MSO_THEME_COLOR:

>>> from docx.enum.dml import MSO_THEME_COLOR
>>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_1

A font’s color can be restored to its default (inherited) value by assigning None to either the rgb or theme_color attribute of ColorFormat:

>>> font.color.rgb = None

Determining the color of a font begins with determining its color type. For a font that has an RGB color applied:

>>> font.color.type
RGB (1)

The value of the type property can be a member of the MSO_COLOR_TYPE enumeration or None. MSO_COLOR_TYPE.RGB indicates it is an RGB color. MSO_COLOR_TYPE.THEME indicates a theme color. MSO_COLOR_TYPE.AUTO indicates its value is determined automatically by the application, usually set to black. (This value is relatively rare.) None indicates no color is applied and the color is inherited from the style hierarchy; this is the most common case.

When the color type is MSO_COLOR_TYPE.RGB, the rgb property will be an RGBColor value indicating the RGB color:

>>> font.color.rgb
RGBColor(0x42, 0x24, 0xe9)

When the color type is MSO_COLOR_TYPE.THEME, the theme_color property will be a member of MSO_THEME_COLOR_INDEX indicating the theme color:

>>> font.color.theme_color
ACCENT_1 (5)