Generate and send mail with Python: tutorial

The code and tips here are all included in the new pyzmail library. More samples and tips can be found in the API documentation.

This article follows two other articles (1, 2) about how to parse emails in Python.
Those articles describe mail usage and rules in detail and can be helpful for non-Python developers too.

The goal here is to generate a valid email including internationalized content, attachments and even inline images, and send it to an SMTP server.

The procedure can be achieved in 3 steps:

  • compose the body of the email
  • compose the header of the email
  • send the email to an SMTP server

At the end you will find the sources.

The first rule to keep in mind at all times is that emails are 7-bit only: they can contain us-ascii characters only! Several RFCs define how to encode non us-ascii characters when they are required: RFC2047 for headers, RFC2045 for the body using the MIME format, and RFC2231 for MIME parameters such as the attachment filename.

The mail header

The header is composed of lines having a name and a value. RFC2047 lets you use the quoted ("Q", close to quoted-printable) or the binary ("B", base64) encoding to encode non us-ascii characters in it.
For example:

Subject: Courrier électronique en Français
becomes, using the quoted format:
Subject: =?iso-8859-1?q?Courrier_=E9lectronique_en_Fran=E7ais?=
or, using the binary one:
Subject: =?iso-8859-1?b?Q291cnJpZXIg6WxlY3Ryb25pcXVlIGVuIEZyYW7nYWlz?=

Here the quoted format is readable and shorter. The Python email.header.Header object generates user-friendly headers by choosing the quoted format when the encoding shares characters with us-ascii, and the binary one for encodings like the Chinese big5 that require the encoding of all characters.

>>> str(Header(u'Courrier \xe9lectronique en Fran\xe7ais', 'iso-8859-1'))
'=?iso-8859-1?q?Courrier_=E9lectronique_en_Fran=E7ais?='

Header values have different types (text, date, address, ...) that all require different encoding rules. For example, in an address like:

Sender: Alain Spineux <alain.spineux@gmail.com>

The email part <alain.spineux@gmail.com> has to be in us-ascii and cannot be encoded. The name part cannot contain certain special characters without quoting: []\()<>@,:;".
We can easily understand why "<>" are in the list; the others each have their own story.

For example, using the "Dr." prefix requires quoting the name because of the '.':

Sender: "Dr. Alain Spineux" <alain.spineux@gmail.com>

For a name with non us-ascii characters (look at the "ï" in Alaïn), the name must be encoded.

Sender: Dr. Alaïn Spineux <alain.spineux@gmail.com>
must be written:
Sender: =?iso-8859-1?q?Dr=2E_Ala=EFn_Spineux?= <alain.spineux@gmail.com>

Notice that the '.' in the prefix is replaced by "=2E", because Header preventively encodes all characters that are not letters or digits to match the most restrictive header rules.

The Python function email.utils.formataddr() quotes the special characters but doesn't encode non us-ascii characters. On the other hand, email.header.Header can encode non us-ascii characters but ignores all the special rules about address encoding.
Let's see how both work:

>>> email.utils.formataddr(('Alain Spineux', 'alain.spineux@gmail.com'))
'Alain Spineux <alain.spineux@gmail.com>'
This is a valid header value for a To: field.
>>> str(Header('Dr. Alain Spineux <alain.spineux@gmail.com>'))
'Dr. Alain Spineux <alain.spineux@gmail.com>'
Here the '.' should have been quoted, as should any of these 13 characters: []\()<>@,:;".
>>> email.utils.formataddr(('"Alain" Spineux', 'alain.spineux@gmail.com'))
'"\\"Alain\\" Spineux" <alain.spineux@gmail.com>'
Here '"' is escaped using '\', this is fine.
>>> email.utils.formataddr((u'Ala\xefn Spineux', 'alain.spineux@gmail.com'))
u'Ala\xefn Spineux <alain.spineux@gmail.com>'
formataddr() doesn't handle non us-ascii strings; this must be done by the Header object.
>>> str(Header(email.utils.formataddr((u'Ala\xefn Spineux', 'alain.spineux@gmail.com'))))
'=?utf-8?q?Ala=C3=AFn_Spineux_=3Calain=2Espineux=40gmail=2Ecom=3E?='

This is not valid because the address is also encoded, and old MUAs (and even some recent ones) won't handle this. The correct form here is:

=?utf-8?q?Ala=C3=AFn_Spineux?= <alain.spineux@gmail.com>
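For comparison, here is a minimal sketch for Python 3 (an assumption: the sessions above are Python 2). There, email.utils.formataddr() accepts a charset argument and encodes only the display name, leaving the address untouched. The ADDRESS constant is just a convenience for this example.

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr

ADDRESS = 'alain.spineux@gmail.com'

# Pure us-ascii name containing a special character ('.'): it gets quoted.
quoted = formataddr(('Dr. Alain Spineux', ADDRESS))

# Non us-ascii name: only the name is RFC 2047 encoded, not the address.
encoded = formataddr(('Ala\xefn Spineux', ADDRESS), charset='iso-8859-1')

# Reading the value back: split name and address, then decode the name.
raw_name, addr = parseaddr(encoded)
data, charset = decode_header(raw_name)[0]
name = data.decode(charset)

print(quoted)
print(encoded)
print(name, addr)
```

The address part stays readable in the encoded value, which is what mail clients expect.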

The function format_addresses(addresses, header_name=None, charset=None) carefully handles the encoding of the addresses.

>>> str(format_addresses([ (u'Ala\xefn Spineux', 'alain.spineux@gmail.com'), ('John', 'John@smith.com'), ], 'to', 'iso-8859-1'))
'=?iso-8859-1?q?Ala=EFn_Spineux?= <alain.spineux@gmail.com> ,\n John <John@smith.com>'

Byte and unicode strings can be mixed. Addresses must always be us-ascii. Byte strings must be encoded using charset or be us-ascii. Unicode strings that cannot be encoded using charset will be encoded using utf-8 automatically by Header.

For dates, use email.utils.formatdate() this way:

>>> email.utils.formatdate(time.time(), localtime=True)
'Wed, 10 Aug 2011 16:46:30 +0200'
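Because the output of formatdate() with localtime=True depends on the machine's timezone, a fixed timestamp makes the behaviour easier to check. The sketch below assumes Python 3 (parsedate_to_datetime() does not exist in Python 2). The timestamp is chosen here to correspond to the date shown above, assuming that output was produced at UTC+2.

```python
from email.utils import formatdate, parsedate_to_datetime

# 10 Aug 2011 14:46:30 UTC, expressed as seconds since the epoch.
TIMESTAMP = 1312987590

# Rendered in GMT: independent of the local timezone.
stamp = formatdate(TIMESTAMP, usegmt=True)
print(stamp)  # Wed, 10 Aug 2011 14:46:30 GMT

# Rendered with the UTC offset of the machine running the code.
local_stamp = formatdate(TIMESTAMP, localtime=True)
print(local_stamp)

# Both strings describe the same instant once parsed back.
same_instant = parsedate_to_datetime(stamp) == parsedate_to_datetime(local_stamp)
print(same_instant)
```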

The mail body

Depending on your needs:

  • text and/or html version of the message
  • related inline images
  • attached files

the structure of the MIME email may vary, but the general one is as follows:

multipart/mixed
 |
 +-- multipart/related
 |    |
 |    +-- multipart/alternative
 |    |    |
 |    |    +-- text/plain
 |    |    +-- text/html
 |    |     
 |    +-- image/gif
 |
 +-- application/msword         
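The tree above can be assembled by hand with the email.mime classes. The following is only a sketch for Python 3, not the gen_mail() function of this article: the function name build_full_message, the placeholder payloads, the Content-ID and the file name are all invented for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_full_message(text, html, image_bytes, doc_bytes):
    """Assemble mixed > related > alternative, as in the diagram."""
    alternative = MIMEMultipart('alternative')
    alternative.attach(MIMEText(text, 'plain', 'utf-8'))
    alternative.attach(MIMEText(html, 'html', 'utf-8'))

    related = MIMEMultipart('related')
    related.attach(alternative)
    image = MIMEImage(image_bytes, 'gif')  # subtype given, no detection
    image.add_header('Content-ID', '<smile.gif>')
    related.attach(image)

    mixed = MIMEMultipart('mixed')
    mixed.attach(related)
    doc = MIMEApplication(doc_bytes, 'msword')
    doc.add_header('Content-Disposition', 'attachment', filename='report.doc')
    mixed.attach(doc)
    return mixed

message = build_full_message(
    'Hello',
    '<html><body><img src="cid:smile.gif"></body></html>',
    b'GIF89a placeholder',
    b'placeholder document')

# walk() visits the parts depth first, in the order of the diagram.
structure = [part.get_content_type() for part in message.walk()]
print('\n'.join(structure))
```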

Unneeded parts are left out by the function gen_mail(text, html=None, attachments=[], relateds=[]) depending on your parameters.

>>> print gen_mail(text=(u'Bonne journ\xe8e', 'iso-8859-1'),
...                html=None,
...                attachments=[ (u'Text attach\xe8'.encode('iso-8859-1'), 'text', 'plain', 'filename.txt', 'iso-8859-1'), ] )
From nobody Thu Aug 11 08:05:14 2011
Content-Type: multipart/mixed; boundary="===============0992984520=="
MIME-Version: 1.0

--===============0992984520==
Content-Type: text/plain; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Bonne journ=E8e
--===============0992984520==
Content-Type: text/plain; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="filename.txt"

Text attach=E8
--===============0992984520==--

text and html are tuples of the form (content, encoding).
Items of the attachments list can be tuples of the form (content, maintype, subtype, filename, charset) or MIME objects inheriting from MIMEBase. If maintype is 'text', charset must match the encoding of the content or, if content is unicode, charset will be used to encode it. For other values of maintype, charset is not used.
relateds is similar to attachments, but the content is related to the HTML message to allow embedding of images or other content. filename is replaced by content_id, which must match the cid: references inside the HTML message.

Attachments can have non us-ascii filenames, but this is badly supported by some MUAs and is therefore discouraged. If you still want to use a non us-ascii filename, RFC2231 needs the encoding but also the language. Replace filename by a tuple of the form (encoding, language, filename); for example, use ('iso-8859-1', 'fr', u'r\xe9pertoir.png'.encode('iso-8859-1')) instead of 'filename.txt'.
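To see what such a tuple produces on the wire, here is a minimal sketch for Python 3 (an assumption: in Python 3 the filename in the tuple is a plain str, not an encoded byte string as in the Python 2 example above). The payload and the filename are placeholders.

```python
from email import encoders
from email.mime.base import MIMEBase

part = MIMEBase('image', 'png')
part.set_payload(b'placeholder image bytes')
encoders.encode_base64(part)

# (charset, language, filename): triggers the RFC 2231 "filename*=" form.
part.add_header('Content-Disposition', 'attachment',
                filename=('iso-8859-1', 'fr', 'r\xe9pertoire.png'))

disposition = part['Content-Disposition']
print(disposition)

# get_filename() undoes the RFC 2231 encoding.
decoded_name = part.get_filename()
print(decoded_name)
```

The header carries the charset and the language in front of the percent-encoded name, and get_filename() restores the original string.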

The attached source code provides more samples. Look carefully at how I encode (or not) the unicode strings and content before using them.

Send your email

The Python smtplib library handles SSL, STARTTLS, and authentication: all you need to connect to any SMTP server. The library also reports all errors carefully.

The send_mail(smtp_host, sender, recipients, subject, msg, default_charset, cc=[], bcc=[], smtp_port=25, smtp_mode='normal', smtp_login=None, smtp_password=None, message_id_string=None) function first fills in the sender and recipient headers using the format_addresses() function, then sends the email to the SMTP server using the protocol and credentials you have chosen with the smtp_* variables.
sender is a tuple, and recipients, cc and bcc are lists of tuples of the form [ (name, address), .. ] as expected by format_addresses().
default_charset will be used as the default encoding when generating the email (to encode unicode strings), and as the default encoding for byte strings.
subject is the subject of the email.
msg is a MIME object as returned by gen_mail().
smtp_mode can be 'normal', 'ssl' or 'tls'.
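The two mechanical parts of this step, collecting the envelope recipients and opening the connection for a given smtp_mode, can be sketched as follows for Python 3. This is not the send_mail() function of this article: the helper names envelope_recipients and open_smtp, the 30 second timeout and the example.org addresses are assumptions made for the illustration. open_smtp() is only defined here, not called, because it needs a reachable server.

```python
import smtplib

def envelope_recipients(recipients, cc=(), bcc=()):
    """Collect the bare addresses handed to the SMTP server (RCPT TO)."""
    everyone = list(recipients) + list(cc) + list(bcc)
    return [addr for _name, addr in everyone]

def open_smtp(host, port=25, mode='normal', login=None, password=None):
    """Open a connection according to mode: 'normal', 'ssl' or 'tls'."""
    if mode == 'ssl':
        # TLS from the first byte.
        smtp = smtplib.SMTP_SSL(host, port, timeout=30)
    else:
        smtp = smtplib.SMTP(host, port, timeout=30)
        if mode == 'tls':
            # Upgrade the plain connection with STARTTLS.
            smtp.starttls()
    if login and password:
        smtp.login(login, password)
    return smtp

rcpt_to = envelope_recipients(
    [('Alain Spineux', 'alain@example.org')],
    cc=[('Gama', 'gama@example.org')],
    bcc=[('Colombus', 'colombus@example.org')])
print(rcpt_to)
```

Bcc addresses appear only in this envelope list and never in a header, which is what keeps them hidden from the other recipients.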

For example, if you want to use your Gmail account to send your emails, use this setup:

smtp_host='smtp.gmail.com'
smtp_port=587
smtp_mode='tls'
smtp_login='your.address@gmail.com'
smtp_password='yourpassword'
sender=('Your Name', smtp_login)

Most of the time you will just need to specify smtp_host.

Source

#!/usr/bin/env python
# 
# sendmail.py
# (c) Alain Spineux <alain.spineux@gmail.com>
# http://blog.magiksys.net/sending-email-using-python-tutorial
# Released under GPL
#
import os, sys
import socket
import time
import base64
import smtplib
import email
import email.header
import email.utils
import email.mime
import email.mime.base
import email.mime.text
import email.mime.image
import email.mime.multipart

def format_addresses(addresses, header_name=None, charset=None):
    """This is an extension of email.utils.formataddr.
       The function expects a list of addresses [ ('name', 'name@domain'), ...].
       The len(header_name) is used to limit the first line length.
       The function mixes the use of Header(), formataddr() and a check for
       'us-ascii' strings to build a valid and friendly 'address' header.
       If a 'name' is not a unicode string, then it must be encoded using
       'charset'; Header will use 'charset' to decode it.
       Unicode strings will be encoded following the "Header" rules:
       (try first using ascii, then 'charset', then 'utf-8').
       'name@address' is supposed to be pure us-ascii; it can be a unicode
       string or not (but cannot contain non us-ascii characters).

       In short, Header() ignores the syntax rules about 'address' fields,
       and formataddr() ignores the encoding of non us-ascii chars.
    """
    header=email.header.Header(charset=charset, header_name=header_name)
    for i, (name, addr) in enumerate(addresses):
        if i!=0:
            # add separator between addresses
            header.append(',', charset='us-ascii') 
        # check if address name is a unicode or byte string in "pure" us-ascii 
        try:
            if isinstance(name, unicode):
                # convert name in byte string
                name=name.encode('us-ascii')
            else:
                # check if byte string contains only us-ascii chars
                name.decode('us-ascii')
        except UnicodeError:
            # Header will use "RFC2047" to encode the address name
            # if name is byte string, charset will be used to decode it first
            header.append(name)
            # here us-ascii must be used and not default 'charset'  
            header.append('<%s>' % (addr,), charset='us-ascii') 
        else:
            # name is a us-ascii byte string, so formataddr can be used
            formated_addr=email.utils.formataddr((name, addr))
            # us-ascii must be used and not default 'charset'  
            header.append(formated_addr, charset='us-ascii') 
            
    return header


def gen_mail(text, html=None, attachments=[], relateds=[]):
    """Generate the core of the email message.
    text=(encoded_content, encoding)
    html=(encoded_content, encoding)
    attachments=[(data, maintype, subtype, filename, charset), ..]
     if maintype is 'text', data must be encoded using charset
     filename can be us-ascii or (charset, lang, encoded_filename)
     where encoded_filename is encoded into charset; lang can be empty but
     is usually the language related to the charset.
    relateds=[(data, maintype, subtype, content_id, charset), ..]
     same as attachments above, but content_id is related to the "CID"
     reference in the html version of the message.
    """

    main=text_part=html_part=None
    if text:
        content, charset=text
        main=text_part=email.mime.text.MIMEText(content, 'plain', charset)
    
    if html:
        content, charset=html
        main=html_part=email.mime.text.MIMEText(content, 'html', charset)
        
    if not text_part and not html_part:
        main=text_part=email.mime.text.MIMEText('', 'plain', 'us-ascii')
    elif text_part and html_part:
        # need to create a multipart/alternative to include text and html version
        main=email.mime.multipart.MIMEMultipart('alternative', None, [text_part, html_part])
    
    if relateds:
        related=email.mime.multipart.MIMEMultipart('related')
        related.attach(main)
        for part in relateds:
            if not isinstance(part, email.mime.base.MIMEBase):
                data, maintype, subtype, content_id, charset=part
                if (maintype=='text'):
                    part=email.mime.text.MIMEText(data, subtype, charset)
                else:
                    part=email.mime.base.MIMEBase(maintype, subtype)
                    part.set_payload(data)
                    email.Encoders.encode_base64(part)
                part.add_header('Content-ID', '<'+content_id+'>')
                part.add_header('Content-Disposition', 'inline')
            related.attach(part)
        main=related
            
    if attachments:
        mixed=email.mime.multipart.MIMEMultipart('mixed')
        mixed.attach(main)
        for part in attachments:
            if not isinstance(part, email.mime.base.MIMEBase):
                data, maintype, subtype, filename, charset=part
                if (maintype=='text'):
                    part=email.mime.text.MIMEText(data, subtype, charset)
                else:
                    part=email.mime.base.MIMEBase(maintype, subtype)
                    part.set_payload(data)
                    email.Encoders.encode_base64(part)
                part.add_header('Content-Disposition', 'attachment', filename=filename)
            mixed.attach(part)
        main=mixed
        
    return main

def send_mail(smtp_host, sender, recipients, subject, msg, default_charset, cc=[], bcc=[], smtp_port=25, smtp_mode='normal', smtp_login=None, smtp_password=None, message_id_string=None):
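    """Fill in the From, To, Cc, Subject, Date and Message-Id headers of msg
    and send it through the SMTP server.
    Return the tuple (msg, errmsg, failed_addresses).
    """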

    mail_from=sender[1]
    rcpt_to=map(lambda x:x[1], recipients)
    rcpt_to.extend(map(lambda x:x[1], cc))
    rcpt_to.extend(map(lambda x:x[1], bcc))

    msg['From'] = format_addresses([ sender, ], header_name='from', charset=default_charset)
    msg['To'] = format_addresses(recipients, header_name='to', charset=default_charset)
    if cc:
        # do not generate an empty Cc: header
        msg['Cc'] = format_addresses(cc, header_name='cc', charset=default_charset)
    msg['Subject'] = email.header.Header(subject, default_charset)
    utc_from_epoch=time.time()
    msg['Date'] = email.utils.formatdate(utc_from_epoch, localtime=True)
    msg['Message-Id'] = email.utils.make_msgid(message_id_string)

    # Send the message
    errmsg=''
    failed_addresses=[]
    try:
        if smtp_mode=='ssl':
            smtp=smtplib.SMTP_SSL(smtp_host, smtp_port)
        else:
            smtp=smtplib.SMTP(smtp_host, smtp_port)
            if smtp_mode=='tls':
                smtp.starttls()
                
        if smtp_login and smtp_password:
            # login and password must be encoded 
            # because HMAC used in CRAM_MD5 require non unicode string
            smtp.login(smtp_login.encode('utf-8'), smtp_password.encode('utf-8'))

        ret=smtp.sendmail(mail_from, rcpt_to, msg.as_string())
        smtp.quit()
    except (socket.error, ), e:
        errmsg='server %s:%s not responding: %s' % (smtp_host, smtp_port, e)
    except smtplib.SMTPAuthenticationError, e:
        errmsg='authentication error: %s' % (e, )
    except smtplib.SMTPRecipientsRefused, e:
        # code, errmsg=e.recipients[recipient_addr]
        errmsg='recipients refused: '+', '.join(e.recipients.keys())
    except smtplib.SMTPSenderRefused, e:
        # e.sender, e.smtp_code, e.smtp_error
        errmsg='sender refused: %s' % (e.sender, )
    except smtplib.SMTPDataError, e:
        errmsg='SMTP protocol mismatch: %s' % (e, )
    except smtplib.SMTPHeloError, e:
        errmsg="server didn't reply properly to the HELO greeting: %s" % (e, )
    except smtplib.SMTPException, e:
        errmsg='SMTP error: %s' % (e, )
    except Exception, e:
        errmsg=str(e)
    else:
        if ret:
            failed_addresses=ret.keys()
            errmsg='recipients refused: '+', '.join(failed_addresses)
                    
    return msg, errmsg, failed_addresses

And how to use it:

smtp_host='max'
smtp_port=25

if False:
    smtp_host='smtp.gmail.com'
    smtp_port=587
    smtp_login='your.address@gmail.com'
    smtp_password='your.password'

sender=(u'Ala\xefn Spineux', 'alain.spineux@gmail.com')
sender=(u'Alain Spineux', u'alain.spineux@gmail.com')

root_addr='root@max.asxnet.loc'
recipients=[ ('Alain Spineux', root_addr),
#             (u'Alain Spineux', root_addr),
#             ('Dr. Alain Sp<i>neux', root_addr),
#             (u'Dr. Alain Sp<i>neux', root_addr),
#             (u'Dr. Ala\xefn Spineux', root_addr),
#             (u'Dr. Ala\xefn Spineux', root_addr),
#             ('us_ascii_name_with_a_space_some_where_in_the_middle to_allow_python_Header._split()_to_split_according_RFC2822', root_addr),
#             (u'This-is-a-very-long-unicode-name-with-one-non-ascii-char-\xf4-to-force-Header()-to-use-RFC2047-encoding-and-split-in-multi-line', root_addr),
#             ('this_line_is_too_long_and_dont_have_any_white_space_to_allow_Header._split()_to_split_according_RFC2822', root_addr),
#             ('Alain Spineux', root_addr),             
              ]

smile_png=base64.b64decode(
"""iVBORw0KGgoAAAANSUhEUgAAAA4AAAAOBAMAAADtZjDiAAAAMFBMVEUQEAhaUjlaWlp7e3uMezGU
hDGcnJy1lCnGvVretTnn5+/3pSn33mP355T39+//75SdwkyMAAAACXBIWXMAAA7EAAAOxAGVKw4b
AAAAB3RJTUUH2wcJDxEjgefAiQAAAAd0RVh0QXV0aG9yAKmuzEgAAAAMdEVYdERlc2NyaXB0aW9u
ABMJISMAAAAKdEVYdENvcHlyaWdodACsD8w6AAAADnRFWHRDcmVhdGlvbiB0aW1lADX3DwkAAAAJ
dEVYdFNvZnR3YXJlAF1w/zoAAAALdEVYdERpc2NsYWltZXIAt8C0jwAAAAh0RVh0V2FybmluZwDA
G+aHAAAAB3RFWHRTb3VyY2UA9f+D6wAAAAh0RVh0Q29tbWVudAD2zJa/AAAABnRFWHRUaXRsZQCo
7tInAAAAaElEQVR4nGNYsXv3zt27TzHcPup6XDBmDsOeBvYzLTynGfacuHfm/x8gfS7tbtobEM3w
n2E9kP5n9N/oPZA+//7PP5D8GSCYA6RPzjlzEkSfmTlz+xkgffbkzDlAuvsMWAHDmt0g0AUAmyNE
wLAIvcgAAAAASUVORK5CYII=
""")
angry_gif=base64.b64decode(
"""R0lGODlhDgAOALMAAAwMCYAAAACAAKaCIwAAgIAAgACAgPbTfoR/YP8AAAD/AAAA//rMUf8A/wD/
//Tw5CH5BAAAAAAALAAAAAAOAA4AgwwMCYAAAACAAKaCIwAAgIAAgACAgPbTfoR/YP8AAAD/AAAA
//rMUf8A/wD///Tw5AQ28B1Gqz3S6jop2sxnAYNGaghAHirQUZh6sEDGPQgy5/b9UI+eZkAkghhG
ZPLIbMKcDMwLhIkAADs=
""")
pingu_png=base64.b64decode(
"""iVBORw0KGgoAAAANSUhEUgAAABoAAAATBAMAAAB8awA1AAAAMFBMVEUQGCE5OUJKa3tSUlJSrdZj
xu9rWjl7e4SljDGlnHutnFK9vbXGxsbWrTHW1tbv7+88a/HUAAAACXBIWXMAAA7EAAAOxAGVKw4b
AAAAB3RJTUUH2wgJDw8mp5ycCAAAAAd0RVh0QXV0aG9yAKmuzEgAAAAMdEVYdERlc2NyaXB0aW9u
ABMJISMAAAAKdEVYdENvcHlyaWdodACsD8w6AAAADnRFWHRDcmVhdGlvbiB0aW1lADX3DwkAAAAJ
dEVYdFNvZnR3YXJlAF1w/zoAAAALdEVYdERpc2NsYWltZXIAt8C0jwAAAAh0RVh0V2FybmluZwDA
G+aHAAAAB3RFWHRTb3VyY2UA9f+D6wAAAAh0RVh0Q29tbWVudAD2zJa/AAAABnRFWHRUaXRsZQCo
7tInAAAA0klEQVR4nE2OsYrCUBBFJ42woOhuq43f4IfYmD5fsO2yWNhbCFZ21ovgFyQYG1FISCxt
Xi8k3KnFx8xOTON0Z86d4ZKKiNowM5QEUD6FU9uCSWwpYThrSQuj2fjLso2DqB9OBqqvJFiVllHa
usJedty1NFe2brRbs7ny5aIP8dSXukmyUABQ0CR9AU9d1IOO1EZgjg7n+XEBpOQP0E/rUz2Rlw09
Amte7bdVSs/s7py5i1vFRsnFuW+gdysSu4vzv9vm57faJuY0ywFhAFmupO/zDzDcxlhVE/gbAAAA
AElFTkSuQmCC
""")

text_utf8="""This is the text part.
With a related picture: cid:smile.png
and related document: cid:related.txt

Bonne journ\xc3\xa9e.
"""
utext=u"""This is the text part.
With a related picture: cid:smile.png
and related document: cid:related.txt
Bonne journ\xe9e.
"""
data_text=u'Text en Fran\xe7ais'
related_text=u'Document relatif en Fran\xe7ais'

html="""<html><body>
This is the html part with a related picture: <img src="cid:smile.png" />
and related document: <a href="cid:related.txt">here</a><br>
Bonne journ&eacute;e.
</body></html>
"""
relateds=[ (smile_png, 'image', 'png', 'smile.png', None),  
           (related_text.encode('iso-8859-1'), 'text', 'plain', 'related.txt', 'iso-8859-1'),  
          ]

pingu_att=email.mime.image.MIMEImage(pingu_png, 'png')
pingu_att.add_header('Content-Disposition', 'attachment', filename=('iso-8859-1', 'fr', u'ping\xfc.png'.encode('iso-8859-1')))

pingu_att2=email.mime.image.MIMEImage(pingu_png, 'png')
pingu_att2.add_header('Content-Disposition', 'attachment', filename='pingu.png')

attachments=[ (angry_gif, 'image', 'gif', ('iso-8859-1', 'fr', u'\xe4ngry.gif'.encode('iso-8859-1')), None),
              (angry_gif, 'image', 'gif', 'angry.gif', None),
             
              (data_text.encode('iso-8859-1'), 'text', 'plain', 'document.txt', 'iso-8859-1'),  
              pingu_att,
              pingu_att2]

mail=gen_mail((utext.encode('iso-8859-1'), 'iso-8859-1'), (html, 'us-ascii'), attachments, relateds)

msg, errmsg, failed_addresses=send_mail(smtp_host, \
          sender, \
          recipients, \
          u'My Subject', \
          mail, \
          default_charset='iso-8859-1',
          cc=[('Gama', root_addr), ],
          bcc=[('Colombus', root_addr), ],
          smtp_port=smtp_port,
          )

print msg
print errmsg
print failed_addresses
    