guile-email discussion
* [guile-email] slow parse of email with huge attachment
From: Ricardo Wurmus @ 2020-05-17 22:06 UTC
  To: guile-email

Hi,

Bug 35518 contains an email with an 11+ MB attachment.  Viewing this in
the classic Debbugs web interface takes around 7 seconds:

    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518

On mumi this times out.  I benchmarked just the parsing step for all
emails in that bug log and I see around 0.1s used by my pre-processing
step (splitting the log at control characters) and around 8 seconds for
parse-email.
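
A minimal sketch of such a pre-processing step in Guile Scheme, for anyone who wants to reproduce the measurement. It assumes, without having checked the Debbugs sources, that records in the log are delimited by the ASCII control characters 0x02, 0x03, 0x05, 0x06 and 0x07; the names `record-delimiters` and `split-debbugs-log` are hypothetical and not part of mumi or guile-email.

```scheme
;; ASSUMPTION: the set of control characters Debbugs uses as record
;; delimiters.  Adjust to the real log format before relying on this.
(define record-delimiters
  (list->char-set (map integer->char '(2 3 5 6 7))))

;; Split LOG (a string) into the chunks that lie between delimiter
;; characters.  string-tokenize returns maximal runs of characters that
;; are *in* the given set, so we pass the complement of the delimiters.
;; Empty chunks (two adjacent delimiters) are dropped.
(define (split-debbugs-log log)
  (string-tokenize log (char-set-complement record-delimiters)))
```

Because this is a single linear scan, its cost stays small next to parsing, which is consistent with the 0.1 s versus 8 s split reported above.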

With all the slow, unoptimized post-processing that I’m doing to render
things, we hit around 14 seconds for this issue.

I wonder if parse-email could possibly be made even faster.

-- 
Ricardo

-- 
guile-email mailing list
guile-email@systemreboot.net
https://lists.systemreboot.net/listinfo/guile-email

* Re: [guile-email] slow parse of email with huge attachment
From: Arun Isaac @ 2020-05-18  1:13 UTC
  To: Ricardo Wurmus, guile-email

Hi,

Yes, guile-email is quite slow. I think it is because it makes multiple
passes over each email to split MIME entities and whatnot. For a long
time, I've been meaning to rewrite guile-email to do only a single
pass. It's quite a bit of work, but I'll try to finish it in a week.
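
To illustrate the single-pass idea, here is a hypothetical sketch (not guile-email code) that finds every MIME boundary delimiter in one left-to-right scan, recording offsets, and then slices the parts out by offset instead of re-splitting the body repeatedly. It works on strings and ignores details such as CRLF handling and transport padding; `boundary-offsets` and `split-parts` are made-up names.

```scheme
;; Return the start offsets of every "--BOUNDARY" delimiter in BODY.
;; Each search resumes just past the previous match, so the body is
;; traversed once overall rather than once per part.
(define (boundary-offsets body boundary)
  (let ((delimiter (string-append "--" boundary)))
    (let loop ((start 0) (offsets '()))
      (let ((hit (string-contains body delimiter start)))
        (if hit
            (loop (+ hit (string-length delimiter)) (cons hit offsets))
            (reverse offsets))))))

;; Slice BODY into the parts between consecutive delimiters, using the
;; offsets collected above.  The preamble and anything after the last
;; delimiter (including the closing "--") are not returned.
(define (split-parts body boundary)
  (let ((delimiter-length (+ 2 (string-length boundary))))
    (let loop ((offs (boundary-offsets body boundary)) (parts '()))
      (if (or (null? offs) (null? (cdr offs)))
          (reverse parts)
          (loop (cdr offs)
                (cons (substring body
                                 (+ (car offs) delimiter-length)
                                 (cadr offs))
                      parts))))))
```

For example, `(split-parts "preamble\n--XYZ\nA\n--XYZ\nB\n--XYZ--\n" "XYZ")` yields the two part bodies `"\nA\n"` and `"\nB\n"`.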

Meanwhile, if you have any benchmarking scripts, do share those. It
would help me get a head start on the problem.

Thanks!

* Re: [guile-email] slow parse of email with huge attachment
From: Ricardo Wurmus @ 2020-05-18 20:50 UTC
  To: Arun Isaac; +Cc: guile-email


Hi Arun,

> Yes, guile-email is quite slow. I think it is because it makes multiple
> passes over each email to split MIME entities and whatnot. For a long
> time, I've been meaning to rewrite guile-email to do only a single
> pass. It's quite a bit of work, but I'll try to finish it in a week.

Wow, I appreciate your work!

> Meanwhile, if you have any benchmarking scripts, do share those. It
> would help me get a head start on the problem.

All I did was wrap the parse-email call for the email extracted from
that Debbugs log file in statprof to see what took the most time.  Since
that one email is pretty big, it took a few seconds to complete.
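
statprof shows where time goes per procedure; for plain wall-clock numbers like the seconds quoted in this thread, a small helper built on Guile's `get-internal-real-time` could be used instead. This is a sketch, and `time-thunk` is a hypothetical name, not part of guile-email or mumi.

```scheme
;; Run THUNK once and return two values: its result and the elapsed
;; wall-clock time in seconds, as an inexact number.
(define (time-thunk thunk)
  (let* ((start (get-internal-real-time))
         (result (thunk))
         (end (get-internal-real-time)))
    (values result
            (exact->inexact (/ (- end start)
                               internal-time-units-per-second)))))
```

For example, `(time-thunk (lambda () (parse-email email)))` would return the parsed email together with the elapsed seconds, where `email` stands for the raw message extracted from the log.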

You can get the mbox here:

    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518;mbox=yes

-- 
Ricardo

* Re: [guile-email] slow parse of email with huge attachment
From: Arun Isaac @ 2020-05-19  2:33 UTC
  To: Ricardo Wurmus; +Cc: guile-email


> All I did was wrap the parse-email call for the email extracted from
> that Debbugs log file in statprof to see what took the most time.  Since
> that one email is pretty big, it took a few seconds to complete.
>
> You can get the mbox here:
>
>     https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518;mbox=yes

Ok, thanks!

* Re: [guile-email] slow parse of email with huge attachment
From: Arun Isaac @ 2020-05-25 14:52 UTC
  To: Ricardo Wurmus; +Cc: guile-email

Hi,

I've made some improvements. I got the following snippet down from
around 16s to 6s.

--8<---------------cut here---------------start------------->8---
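(use-modules (statprof)       ; provides the statprof procedure
             (email email))   ; assumed to export parse-email and mbox->emails

(statprof (lambda ()
            (map parse-email
                 (call-with-input-file "large-base64-attachment.mbox"
                   mbox->emails))))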
--8<---------------cut here---------------end--------------->8---

Specifically, I made the following improvements to guile-email.

- I rewrote the base64 decoder from scratch to be a little
  faster. Earlier, I was using the decoder I had copied from Guix.
- I eliminated several unnecessary bytevector<->string conversions.
- I rewrote read-bytes-till in (email utils) to process multiple bytes
  at a time, instead of byte by byte.
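
As an illustration of the kind of bytevector-based decoding the first two items point to, here is a minimal base64 decoder sketch; it is not the guile-email decoder. It works directly on bytevectors (no string round trips), uses a 256-entry lookup table, skips bytes outside the alphabet such as line breaks and `=` padding, and assumes well-formed input without validating it. The names `base64-table` and `base64-decode` are hypothetical here.

```scheme
(use-modules (rnrs bytevectors))

(define base64-alphabet
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")

;; Map each byte value to its 6-bit base64 value, or -1 for bytes outside
;; the alphabet (line breaks, '=' padding and so on).
(define base64-table
  (let ((table (make-vector 256 -1)))
    (do ((i 0 (+ i 1)))
        ((= i 64) table)
      (vector-set! table (char->integer (string-ref base64-alphabet i)) i))))

;; Decode the base64 bytes in INPUT (a bytevector) into a new bytevector.
;; Bits are buffered in ACC; whenever 8 or more are buffered, one output
;; byte is emitted.  Leftover bits at the end (from padding) are dropped.
(define (base64-decode input)
  (let* ((len (bytevector-length input))
         ;; Upper bound on the output size: 3 bytes per 4 input bytes.
         (out (make-bytevector (* 3 (quotient (+ len 3) 4)))))
    (let loop ((i 0) (acc 0) (bits 0) (o 0))
      (if (= i len)
          ;; Copy the filled prefix of OUT into an exactly-sized result.
          (let ((result (make-bytevector o)))
            (bytevector-copy! out 0 result 0 o)
            result)
          (let ((v (vector-ref base64-table (bytevector-u8-ref input i))))
            (if (< v 0)
                (loop (+ i 1) acc bits o)     ; skip non-alphabet bytes
                (let ((acc (logior (ash acc 6) v))
                      (bits (+ bits 6)))
                  (if (>= bits 8)
                      ;; Emit the top 8 buffered bits, keep the remainder.
                      (let ((remaining (- bits 8)))
                        (bytevector-u8-set! out o
                                            (logand (ash acc (- remaining)) 255))
                        (loop (+ i 1)
                              (logand acc (- (ash 1 remaining) 1))
                              remaining
                              (+ o 1)))
                      (loop (+ i 1) acc bits o)))))))))
```

For example, `(utf8->string (base64-decode (string->utf8 "SGVsbG8sIHdvcmxk")))` gives `"Hello, world"`. Preallocating the output once and filling it in place avoids building intermediate lists or strings per byte.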

There is still scope for improvement, but do test and let me know if
this serves your purpose for now. Also, I am curious to see your new
benchmark of parse-email to compare with the 8s you reported earlier.

Cheers!

* Re: [guile-email] slow parse of email with huge attachment
From: Arun Isaac @ 2020-05-25 15:08 UTC
  To: Ricardo Wurmus; +Cc: guile-email

> --8<---------------cut here---------------start------------->8---
> (statprof (lambda ()
>             (map parse-email
>                  (call-with-input-file "large-base64-attachment.mbox"
>                    mbox->emails))))
> --8<---------------cut here---------------end--------------->8---

I forgot to mention: the "large-base64-attachment.mbox" in this snippet
is the mbox of bug 35518 that you mentioned earlier.
