guile-email discussion
* [guile-email] parse-email-headers returns just “fields”
@ 2020-04-21 12:24 Ricardo Wurmus
  2020-04-23  1:26 ` Arun Isaac
From: Ricardo Wurmus @ 2020-04-21 12:24 UTC (permalink / raw)
  To: guile-email

I’m currently trying to parse the debbugs bug log files directly.  They
contain emails (and other information), so I cut out the email text and
feed it to parse-email.  In some cases the emails don’t seem to have a
content-transfer-encoding header, which leads to an error when trying to
decode the body.

So instead of using parse-email directly I use

   (string->bytevector content "utf-8"))

and run parse-email-headers over the first value, add a dummy
content-transfer-encoding header with value 'binary if it’s missing and
then parse the body.
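
The header-patching step described above can be sketched roughly as
follows.  This is only an illustration: `ensure-transfer-encoding' is a
hypothetical helper, not part of guile-email, and it assumes the parsed
headers are an alist keyed by symbols such as
`content-transfer-encoding'.

```scheme
;; Hypothetical helper (not a guile-email procedure): return HEADERS
;; unchanged if it already has a content-transfer-encoding entry,
;; otherwise prepend a fallback entry with the value 'binary.
;; Assumption: HEADERS is an alist whose keys are symbols.
(define (ensure-transfer-encoding headers)
  (if (assq 'content-transfer-encoding headers)
      headers
      (acons 'content-transfer-encoding 'binary headers)))

;; Headers without the field get the fallback prepended.
(ensure-transfer-encoding '((subject . "hello")))
;; => ((content-transfer-encoding . binary) (subject . "hello"))
```

Because the helper calls `assq' on its argument, it only makes sense
when the header parser really returned an alist; a result that is a bare
symbol would have to be caught before calling it.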

Now I noticed two odd things:

* I sometimes need to discard the first two lines of the raw email to
  get the headers to be fully parsed
* in some cases the result of parse-email-headers is a literal “fields”,
  not an alist.

Attached is one of these emails.


From Sat Sep 13 09:41:10 2008
X-Spam-Checker-Version: SpamAssassin 3.2.3-bugs.debian.org_2005_01_02
	(2007-08-08) on
X-Spam-Status: No, score=-6.7 required=4.0 tests=AWL,BAYES_00,
Received: (at submit) by; 13 Sep 2008 16:41:10 +0000
Received: from ( [])
	by (8.13.8/8.13.8/Debian-3) with ESMTP id m8DGf5S7011950
	for <>; Sat, 13 Sep 2008 09:41:07 -0700
Received: from ([]:57008
	by with esmtp (Exim 4.67)
	(envelope-from <>)
	id 1KeY9S-0000Yg-Bg
	for; Sat, 13 Sep 2008 12:39:14 -0400
Received: from Debian-exim by with spam-scanned (Exim 4.60)
	(envelope-from <>)
	id 1KeYBB-0004EU-3I
	for; Sat, 13 Sep 2008 12:41:04 -0400
Received: from ([]:17202)
	by with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60)
	(envelope-from <>)
	id 1KeYBA-0004Cs-4Y
	for; Sat, 13 Sep 2008 12:41:00 -0400
Received: from ( [])
	by (Switch-3.2.4/Switch-3.1.7) with ESMTP id m8DGeoWf024917
	for <>; Sat, 13 Sep 2008 11:40:50 -0500
Received: from ( [])
	by (Switch-3.2.0/Switch-3.2.0) with ESMTP id m8DGensx018593
	for <>; Sat, 13 Sep 2008 10:40:50 -0600
Received: from dradamslap1 (/
	by default (Oracle Beehive Gateway v4.0)
	with ESMTP ; Sat, 13 Sep 2008 09:40:49 -0700
From: "Drew Adams" <>
To: <>
Subject: 23.0.60; incorrect code for filesets-get-filelist
Date: Sat, 13 Sep 2008 09:40:59 -0700
Message-ID: <002901c915bf$811df210$>
MIME-Version: 1.0
Content-Type: text/plain;
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 11
Thread-Index: AckVv4Ci1hEH5okgSQ2EUXtBuurWgg==
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350
X-Brightmail-Tracker: AAAAAQAAAAI=
X-Brightmail-Tracker: AAAAAQAAAAI=
X-Whitelist: TRUE
X-Whitelist: TRUE
X-detected-operating-system: by GNU/Linux 2.4-2.6

The part that treats a :tree of the code defining
`filesets-get-filelist' is not correct and never could have been
correct. And it does not correspond to the (correct) code from the
filesets author.  One wonders if the GNU Emacs code was ever tested.
This is the `case' clause that treats :tree in the definition
of `filesets-get-filelist':
 (let ((dir  (nth 0 entry))
       (patt (nth 1 entry)))
   (filesets-directory-files dir patt ':files t)))
But `entry' here is a complete fileset, which is of the form
("my-fs" (:tree "/some/directory" "^.+\.suffix$"))
The above code thus tries to use "my-fs" as the directory, whereas it
should use "/some/directory".
This is the (correct) code in the latest version from the author
( (The comment is
from the author.)
 ;;well, the way trees are handled is a mess +++
 (let* ((dirpatt (if (consp (nth 1 entry))
                     (filesets-entry-get-tree entry)
                   entry))
        (dir     (nth 0 dirpatt))
        (patt    (nth 1 dirpatt)))
   (filesets-list-dir dir patt ':files t)))
However, I think the following would be sufficient:
 (let* ((dirpatt (filesets-entry-get-tree entry))
        (dir  (nth 0 dirpatt))
        (patt (nth 1 dirpatt)))
   (filesets-directory-files dir patt ':files t)))
I don't see why the author's more complex treatment would ever be
needed, since in order for the :tree clause of the `case' to be
reached (consp (nth 1 entry)) must be a cons, AFAICT.
At any rate, either the author's code or what I suggest immediately
above is needed. There is no way that the current GNU Emacs code can
work with a :tree fileset.

In GNU Emacs (i386-mingw-nt5.1.2600)
 of 2008-09-03 on LENNART-69DE564
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4) --no-opt --cflags -Ic:/g/include

2020-04-21 12:24 [guile-email] parse-email-headers returns just "fields" Ricardo Wurmus
2020-04-23  1:26 ` Arun Isaac
2020-04-23  1:26   ` Arun Isaac
2020-04-23  6:35   ` Ricardo Wurmus
2020-04-23 11:31     ` Arun Isaac
2020-04-23 11:31       ` Arun Isaac
2020-04-23 14:40       ` Ricardo Wurmus
2020-04-23 21:54         ` Arun Isaac
2020-04-23 21:54           ` Arun Isaac

