guile-email discussion
* [bug] Character display problem in mumi and my mail client
       [not found] ` <87zhlbg4aw.fsf@elephly.net>
@ 2019-07-18 10:16   ` Arun Isaac
  0 siblings, 0 replies; 8+ messages in thread
From: Arun Isaac @ 2019-07-18 10:16 UTC (permalink / raw)
  To: guile-email


[-- Attachment #1.1: Type: message/rfc822, Size: 4765 bytes --]

[-- Attachment #1.1.1.1: Type: text/plain, Size: 1620 bytes --]

Hi Guix,

It appears that mumi (or at least the instance of it running on 
issues.guix.gnu.org) has problems displaying some non-ASCII characters. I 
noticed it with '’'. Compare

https://issues.guix.gnu.org/issue/36207

with

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207

In the former, Ludo's name displays as 'Ludo¢', while in the latter it
displays correctly as 'Ludo’'.

However, in Ludo's reply the character is displayed correctly.

Does this indicate that my mail client (alpine on Debian Stretch) is doing
the wrong thing? It seems that mumi could handle this situation better,
since debbugs appears to handle it correctly.

Looking at the raw mail downloaded from debbugs, I see that I'm sending
mail with the following encoding:

```
Content-Type: text/plain; FORMAT=flowed; CHARSET=ISO-8859-7
Content-Transfer-Encoding: 8BIT
```

while Ludo's mail is:

```
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
```

Interestingly, when I download the mbox file for my mail from debbugs and
look at it in Emacs with my en_US.UTF-8 locale, the cent sign appears.
Using iconv to convert the file from ISO-8859-7 to UTF-8 causes the
correct character to display. So what looks to be happening is that mumi
is interpreting my messages using the wrong encoding.
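
The effect described above can be reproduced in a few lines. What follows
is a minimal sketch in Python (not part of the original report), assuming
the apostrophe was sent as the single byte 0xA2, which is the ISO-8859-7
code for '’'. The variable names are illustrative only.

```python
# The text as it would appear in the 8BIT, ISO-8859-7 encoded message
# (assumption: the apostrophe was sent as the single byte 0xA2).
raw = b"Ludo\xa2"

# Decoded with the charset declared in the Content-Type header.
as_greek = raw.decode("iso8859_7")

# Decoded as ISO 8859-1, where 0xA2 is the cent sign.
as_latin1 = raw.decode("iso8859_1")

# Rough equivalent of `iconv -f ISO-8859-7 -t UTF-8` for this string.
as_utf8_bytes = as_greek.encode("utf-8")

print(as_greek)       # Ludo’
print(as_latin1)      # Ludo¢
print(as_utf8_bytes)  # b'Ludo\xe2\x80\x99'
```

The same byte yields two different characters depending only on which
charset the reader assumes, which matches the 'Ludo¢' versus 'Ludo’'
difference between the two web pages.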

Thoughts? Is this something we want to fix?

This reminds me of rjbs's talk, "Email Hates the Living". [0]

Best,
Jack

[0] http://yapcasia.org/2011/talk/59

P.S. How are we tracking issues and patches for mumi? bug-guix@ and 
guix-patches?

[-- Attachment #1.2: Type: message/rfc822, Size: 6190 bytes --]

From: Ricardo Wurmus <rekado@elephly.net>
To: Jack Hill <jackhill@jackhill.us>
Cc: guix-devel@gnu.org
Subject: Re: Character display problem in mumi and my mail client
Date: Thu, 18 Jul 2019 10:00:55 +0200
Message-ID: <87zhlbg4aw.fsf@elephly.net>


Hi Jack,

thanks for the report.

> It appears that mumi (or at least the instance of it running on
> issues.guix.gnu.org) has problems displaying some non-ASCII
> characters. I noticed it with '’'. Compare
>
> https://issues.guix.gnu.org/issue/36207
>
> with
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207

The problem here is that the Debbugs SOAP service (which we use in
guile-debbugs and thus mumi) serves up a base64-encoded blob for the
email body without any information about encodings, so we just try UTF-8
and fall back to ISO 8859-1 if there’s an error.  If we could operate on
the actual email that would be different.
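
The fallback described above can be sketched as follows. This is a
hypothetical Python illustration, not the actual mumi or guile-debbugs
code; the helper name decode_with_fallback is made up for the example.

```python
def decode_with_fallback(body: bytes) -> str:
    """Try UTF-8 first; fall back to ISO 8859-1, which accepts any byte."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode("iso8859_1")


# The same visible text, as sent by two differently configured mail clients.
utf8_body = "Ludo’".encode("utf-8")
greek_body = "Ludo’".encode("iso8859_7")

print(decode_with_fallback(utf8_body))   # Ludo’
print(decode_with_fallback(greek_body))  # Ludo¢
```

Because ISO 8859-1 assigns a character to every byte value, the fallback
never fails, so a body in any other 8-bit charset is silently shown with
the wrong characters instead of raising an error.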

Unfortunately, the SOAP service does not provide access to the actual
emails.  That’s only available through … the official Debbugs web
interface.

I’m currently in the process of moving away from the SOAP service for
fetching message contents, because it’s just too painful.  There are too
many truncated or otherwise mangled messages, and in the end we are just
assembling them back to a good old email to parse them with Arun’s
guile-email.  Going forward mumi will only use the SOAP service to get
bug status information and pointers to messages.  The actual emails will
be fetched through the Debbugs web interface with much cursing.

> P.S. How are we tracking issues and patches for mumi? bug-guix@ and
> guix-patches?

Either of them would be fine.  Please add [mumi] to the subject line so
that it’s easier to distinguish them.

--
Ricardo



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [guile-email] Character display problem in mumi and my mail client
       [not found] <alpine.DEB.2.20.1907171636590.9756@marsh.hcoop.net>
       [not found] ` <87zhlbg4aw.fsf@elephly.net>
@ 2019-07-18 10:23 ` Arun Isaac
  2019-07-18 10:23   ` Arun Isaac
  2019-07-18 10:23   ` Arun Isaac
  2019-07-28  7:33 ` [guile-email] " Arun Isaac
  2 siblings, 2 replies; 8+ messages in thread
From: Arun Isaac @ 2019-07-18 10:23 UTC (permalink / raw)
  To: Jack Hill, guix-devel; +Cc: Ricardo Wurmus, guile-email


[-- Attachment #1.1: Type: text/plain, Size: 1746 bytes --]


Hi,

Thanks for the bug report!

> It appears that mumi (or at least the instance of it running on 
> issues.guix.gnu.org) has problems displaying some non-ASCII characters. I 
> noticed it with '’'. Compare
>
> https://issues.guix.gnu.org/issue/36207
>
> with
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207
>
> In the former, Ludo's name displays as 'Ludo¢', while in the latter it
> displays correctly as 'Ludo’'.
>
> However, in Ludo's reply the character is displayed correctly.
>
> Does this indicate that my mail client (alpine on Debian Stretch) is doing
> the wrong thing? It seems that mumi could handle this situation better,
> since debbugs appears to handle it correctly.
>
> Looking at the raw mail downloaded from debbugs, I see that I'm sending
> mail with the following encoding:
>
> ```
> Content-Type: text/plain; FORMAT=flowed; CHARSET=ISO-8859-7
> Content-Transfer-Encoding: 8BIT
> ```
>
> while Ludo's mail is:
>
> ```
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: quoted-printable
> ```
>
> Interestingly, when I download the mbox file for my mail from debbugs and
> look at it in Emacs with my en_US.UTF-8 locale, the cent sign appears.
> Using iconv to convert the file from ISO-8859-7 to UTF-8 causes the
> correct character to display. So what looks to be happening is that mumi
> is interpreting my messages using the wrong encoding.
>
> Thoughts? Is this something we want to fix?

I think this is a bug in guile-email. On brief examination, I found that
guile-email is assuming charset of UTF-8 when the
Content-Transfer-Encoding is 8bit. This is incorrect behaviour. I will
fix this soon.
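
To illustrate the distinction, here is a minimal sketch using Python's
standard email module rather than guile-email. It is not the guile-email
fix itself. The helper body_text and the two sample messages are
assumptions made for the example. The point is that the
Content-Transfer-Encoding only says how the bytes were wrapped for
transport, while the charset parameter of Content-Type says how to turn
those bytes into characters.

```python
import email


def body_text(raw_message: bytes) -> str:
    """Decode a single-part text message using its declared charset."""
    msg = email.message_from_bytes(raw_message)
    # Undo the transfer encoding (8bit is a no-op, quoted-printable is not).
    body = msg.get_payload(decode=True)
    # Take the charset from Content-Type; RFC 2045 defaults to US-ASCII.
    charset = msg.get_content_charset("us-ascii")
    return body.decode(charset)


# Modelled on the headers quoted above; the bodies are made up.
jack = (
    b"Content-Type: text/plain; FORMAT=flowed; CHARSET=ISO-8859-7\r\n"
    b"Content-Transfer-Encoding: 8BIT\r\n"
    b"\r\n"
    b"Ludo\xa2\r\n"
)
ludo = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Ludo=E2=80=99\r\n"
)

print(body_text(jack).rstrip())  # Ludo’
print(body_text(ludo).rstrip())  # Ludo’
```

Both messages decode to the same text even though one is 8bit and the
other quoted-printable, because the charset is read independently of the
transfer encoding.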

Regards,
Arun.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

[-- Attachment #2: Type: text/plain, Size: 110 bytes --]

-- 
guile-email mailing list
guile-email@systemreboot.net
https://lists.systemreboot.net/listinfo/guile-email

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [guile-email] Character display problem in mumi and my mail client
       [not found] <alpine.DEB.2.20.1907171636590.9756@marsh.hcoop.net>
       [not found] ` <87zhlbg4aw.fsf@elephly.net>
  2019-07-18 10:23 ` [guile-email] " Arun Isaac
@ 2019-07-28  7:33 ` Arun Isaac
  2019-07-28  7:33   ` Arun Isaac
                     ` (2 more replies)
  2 siblings, 3 replies; 8+ messages in thread
From: Arun Isaac @ 2019-07-28  7:33 UTC (permalink / raw)
  To: Jack Hill, Ricardo Wurmus; +Cc: guix-devel, guile-email


[-- Attachment #1.1: Type: text/plain, Size: 1631 bytes --]


> It appears that mumi (or at least the instance of it running on 
> issues.guix.gnu.org) has problems displaying some non-ASCII characters. I 
> noticed it with '’'. Compare
>
> https://issues.guix.gnu.org/issue/36207
>
> with
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207

I have fixed this bug. See
https://git.systemreboot.net/guile-email/commit/?id=ac83c2a00c13702bc365cd0f3074239fa63d743f
and
https://git.systemreboot.net/guile-email/commit/?id=1f7c45fa0b71bd137e4b661e0d473c3eb9c98f48

guile-email's parse-email and parse-email-body functions now prefer to
operate on bytevectors, rather than on strings. Likewise, mbox->emails
now returns a list of bytevectors, not a list of strings.
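
The reason for preferring bytes can be shown with a small sketch. It uses
Python's standard email module as a stand-in and does not reproduce the
guile-email API; the sample message is made up. If a whole message is
turned into a string before parsing, a charset has to be guessed before
the Content-Type header has been read, and bytes that do not fit the guess
are lost.

```python
import email

raw = (
    b"Content-Type: text/plain; charset=ISO-8859-7\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Ludo\xa2\r\n"
)

# String-first: the whole message is decoded up front with a guessed charset,
# so the 0xA2 byte is replaced before the headers have been looked at.
lossy = raw.decode("utf-8", errors="replace")

# Bytes-first: parse the raw bytes, then decode the body with its own charset.
msg = email.message_from_bytes(raw)
exact = msg.get_payload(decode=True).decode(msg.get_content_charset("us-ascii"))

print("\ufffd" in lossy)  # True
print("’" in exact)       # True
```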

I have updated the API documentation at
https://guile-email.systemreboot.net/manual/Parsing-e_002dmail.html#Parsing-e_002dmail

@Ricardo:

I think you are using a git checkout of guile-email for the mumi hosted
at issues.guix.info. Could you use the latest guile-email commit on
master (specifically, c85e6917ea21631857d93f58e60d910e07317131)? That
should fix this bug. No other changes are required in mumi. I will
release guile-email 0.2.0 in another week's time.

> This reminds me of rjbs's talk, "Email Hates the Living". [0]
>
> [0] http://yapcasia.org/2011/talk/59

Indeed, email drags along a lot of backward compatibility baggage from
the past. In the future, I'll add some of the pathological examples
mentioned in this talk as test cases.

Also, if you know of any corpus of email parser test cases, please let
me know. I will use them to expand guile-email's test suite.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Character display problem in mumi and my mail client
  2019-07-28  7:33 ` [guile-email] " Arun Isaac
  2019-07-28  7:33   ` Arun Isaac
  2019-07-28  7:33   ` Arun Isaac
@ 2019-07-28  8:36   ` Ricardo Wurmus
  2 siblings, 0 replies; 8+ messages in thread
From: Ricardo Wurmus @ 2019-07-28  8:36 UTC (permalink / raw)
  To: Arun Isaac; +Cc: Jack Hill, guile-email, guix-devel


Arun Isaac <arunisaac@systemreboot.net> writes:

> I have fixed this bug. See
> https://git.systemreboot.net/guile-email/commit/?id=ac83c2a00c13702bc365cd0f3074239fa63d743f
> and
> https://git.systemreboot.net/guile-email/commit/?id=1f7c45fa0b71bd137e4b661e0d473c3eb9c98f48
[…]
> @Ricardo:
>
> I think you are using a git checkout of guile-email for the mumi hosted
> at issues.guix.info. Could you use the latest guile-email commit on
> master (specifically, c85e6917ea21631857d93f58e60d910e07317131)? That
> should fix this bug. No other changes are required in mumi. I will
> release guile-email 0.2.0 in another week's time.

Thank you.  I’ve updated the variant of guile-email that’s used by mumi.

-- 
Ricardo


^ permalink raw reply	[flat|nested] 8+ messages in thread

Thread overview: 8+ messages
     [not found] <alpine.DEB.2.20.1907171636590.9756@marsh.hcoop.net>
     [not found] ` <87zhlbg4aw.fsf@elephly.net>
2019-07-18 10:16   ` [bug] Character display problem in mumi and my mail client Arun Isaac
2019-07-18 10:23 ` [guile-email] " Arun Isaac
2019-07-18 10:23   ` Arun Isaac
2019-07-18 10:23   ` Arun Isaac
2019-07-28  7:33 ` [guile-email] " Arun Isaac
2019-07-28  7:33   ` Arun Isaac
2019-07-28  7:33   ` Arun Isaac
2019-07-28  8:36   ` Ricardo Wurmus
