* [guile-email] slow parse of email with huge attachment
@ 2020-05-17 22:06 Ricardo Wurmus
2020-05-18 1:13 ` Arun Isaac
0 siblings, 1 reply; 10+ messages in thread
From: Ricardo Wurmus @ 2020-05-17 22:06 UTC (permalink / raw)
To: guile-email
Hi,
Bug 35518 contains an email with an 11+MB attachment. Viewing this in
the classic Debbugs web interface takes around 7 seconds:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518
On mumi this times out. I benchmarked just the parsing step for all
emails in that bug log and I see around 0.1s used by my pre-processing
step (splitting the log at control characters) and around 8 seconds for
parse-email.
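A per-step measurement like the one described above could be taken
with a small timing wrapper. This is only a sketch: time-thunk is a
hypothetical helper, not part of guile-email or mumi, and the workload
below is a stand-in for the log-splitting step or the parse-email call.

```scheme
(use-modules (ice-9 format))

;; Hypothetical helper: call THUNK and return two values, its result
;; and the elapsed wall-clock time in seconds.
(define (time-thunk thunk)
  (let* ((start (get-internal-real-time))
         (result (thunk))
         (end (get-internal-real-time)))
    (values result
            (exact->inexact (/ (- end start)
                               internal-time-units-per-second)))))

;; Stand-in workload; replace the inner thunk with the step to measure.
(call-with-values
    (lambda () (time-thunk (lambda () (length (iota 100000)))))
  (lambda (result seconds)
    (format #t "result ~a in ~,3f s~%" result seconds)))
```

Timing each step separately this way makes it easier to see which one
dominates before reaching for a full profiler such as statprof.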
With all the slow, unoptimized post-processing that I’m doing to render
things, we hit around 14 seconds for this issue.
I wonder if parse-email could possibly be made even faster.
--
Ricardo
--
guile-email mailing list
guile-email@systemreboot.net
https://lists.systemreboot.net/listinfo/guile-email
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [guile-email] slow parse of email with huge attachment
2020-05-17 22:06 [guile-email] slow parse of email with huge attachment Ricardo Wurmus
@ 2020-05-18 1:13 ` Arun Isaac
2020-05-18 1:13 ` Arun Isaac
2020-05-18 20:50 ` Ricardo Wurmus
0 siblings, 2 replies; 10+ messages in thread
From: Arun Isaac @ 2020-05-18 1:13 UTC (permalink / raw)
To: Ricardo Wurmus, guile-email
[-- Attachment #1.1: Type: text/plain, Size: 414 bytes --]
Hi,
Yes, guile-email is quite slow. I think it is because it does multiple
passes of each email trying to split MIME entities and what not. For a
long time, I've been meaning to rewrite guile-email to only do a single
pass. It's quite a bit of work, but I'll try to finish it in a week.
Meanwhile, if you have any benchmarking scripts, do share those. It
would help me get a head start on the problem.
Thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [guile-email] slow parse of email with huge attachment
2020-05-18 1:13 ` Arun Isaac
2020-05-18 1:13 ` Arun Isaac
@ 2020-05-18 20:50 ` Ricardo Wurmus
2020-05-19 2:33 ` Arun Isaac
2020-05-25 14:52 ` Arun Isaac
1 sibling, 2 replies; 10+ messages in thread
From: Ricardo Wurmus @ 2020-05-18 20:50 UTC (permalink / raw)
To: Arun Isaac; +Cc: guile-email
Hi Arun,
> Yes, guile-email is quite slow. I think it is because it does multiple
> passes of each email trying to split MIME entities and what not. For a
> long time, I've been meaning to rewrite guile-email to only do a single
> pass. It's quite a bit of work, but I'll try to finish it in a week.
Wow, I appreciate your work!
> Meanwhile, if you have any benchmarking scripts, do share those. It
> would help me get a head start on the problem.
All I did was wrap the parse-email call for the extracted email from
that Debbugs log file in statprof to see what took the most time. Since
that one email is pretty big it took a few seconds to complete.
You can get the mbox here:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518;mbox=yes
--
Ricardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [guile-email] slow parse of email with huge attachment
2020-05-18 20:50 ` Ricardo Wurmus
@ 2020-05-19 2:33 ` Arun Isaac
2020-05-19 2:33 ` Arun Isaac
2020-05-25 14:52 ` Arun Isaac
1 sibling, 1 reply; 10+ messages in thread
From: Arun Isaac @ 2020-05-19 2:33 UTC (permalink / raw)
To: Ricardo Wurmus; +Cc: guile-email
[-- Attachment #1: Type: text/plain, Size: 330 bytes --]
> All I did was wrapping the parse-email call for the extracted email from
> that Debbugs log file in statprof to see what took the most time. Since
> that one email is pretty big it took a few seconds to complete.
>
> You can get the mbox here:
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35518;mbox=yes
Ok, thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [guile-email] slow parse of email with huge attachment
2020-05-18 20:50 ` Ricardo Wurmus
2020-05-19 2:33 ` Arun Isaac
@ 2020-05-25 14:52 ` Arun Isaac
2020-05-25 14:52 ` Arun Isaac
2020-05-25 15:08 ` Arun Isaac
1 sibling, 2 replies; 10+ messages in thread
From: Arun Isaac @ 2020-05-25 14:52 UTC (permalink / raw)
To: Ricardo Wurmus; +Cc: guile-email
[-- Attachment #1.1: Type: text/plain, Size: 968 bytes --]
Hi,
I've made some improvements. I got the following snippet down from
around 16s to 6s.
--8<---------------cut here---------------start------------->8---
(statprof (lambda ()
            (map parse-email
                 (call-with-input-file "large-base64-attachment.mbox"
                   mbox->emails))))
--8<---------------cut here---------------end--------------->8---
Specifically, I made the following improvements to guile-email.
- I rewrote the base64 decoder from scratch to be a little
faster. Earlier, I was using the decoder I had copied from Guix.
- I eliminated several unnecessary bytevector<->string conversions.
- I rewrote read-bytes-till in (email utils) to process multiple bytes
at a time, instead of byte by byte.
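The idea behind the last item can be pictured roughly as follows. This
is only a sketch of the general technique (fetch a chunk from the port,
scan it, and push back whatever was read past the delimiter), not the
actual read-bytes-till from (email utils), whose signature and
implementation are not shown in this thread. The names read-until-byte,
bytevector-index and bytevector-concatenate are hypothetical.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Hypothetical helper: index of the first BYTE in BV, or #f if absent.
(define (bytevector-index bv byte)
  (let loop ((i 0))
    (cond ((= i (bytevector-length bv)) #f)
          ((= (bytevector-u8-ref bv i) byte) i)
          (else (loop (+ i 1))))))

;; Hypothetical helper: join a list of bytevectors into a single one.
(define (bytevector-concatenate bvs)
  (let ((result (make-bytevector (apply + (map bytevector-length bvs)))))
    (let loop ((bvs bvs) (offset 0))
      (if (null? bvs)
          result
          (let ((len (bytevector-length (car bvs))))
            (bytevector-copy! (car bvs) 0 result offset len)
            (loop (cdr bvs) (+ offset len)))))))

;; Read from PORT up to, but not including, the first occurrence of
;; BYTE. The delimiter is consumed. Data is fetched CHUNK-SIZE bytes at
;; a time instead of one get-u8 call per byte.
(define* (read-until-byte port byte #:optional (chunk-size 4096))
  (let loop ((chunks '()))
    (let ((chunk (get-bytevector-n port chunk-size)))
      (if (eof-object? chunk)
          (bytevector-concatenate (reverse chunks))
          (let ((index (bytevector-index chunk byte)))
            (if index
                (let ((head (make-bytevector index))
                      (rest (- (bytevector-length chunk) index 1)))
                  (bytevector-copy! chunk 0 head 0 index)
                  ;; Push back the bytes that follow the delimiter so
                  ;; that the next read sees them.
                  (when (> rest 0)
                    (unget-bytevector port chunk (+ index 1) rest))
                  (bytevector-concatenate (reverse (cons head chunks))))
                (loop (cons chunk chunks))))))))

;; Usage: read two newline-terminated records (byte 10 is newline).
(define port (open-bytevector-input-port (string->utf8 "abc\ndef\n")))
(display (utf8->string (read-until-byte port 10))) (newline)
(display (utf8->string (read-until-byte port 10))) (newline)
```

The saving comes from replacing one port operation per byte with one
per chunk. The scan inside each chunk is still a plain loop over the
bytevector.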
There is still scope for improvement, but do test and let me know if
this serves your purpose for now. Also, I am curious to see your new
benchmark of parse-email to compare with the 8s you reported earlier.
Cheers!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [guile-email] slow parse of email with huge attachment
2020-05-25 14:52 ` Arun Isaac
2020-05-25 14:52 ` Arun Isaac
@ 2020-05-25 15:08 ` Arun Isaac
2020-05-25 15:08 ` Arun Isaac
1 sibling, 1 reply; 10+ messages in thread
From: Arun Isaac @ 2020-05-25 15:08 UTC (permalink / raw)
To: Ricardo Wurmus; +Cc: guile-email
[-- Attachment #1.1: Type: text/plain, Size: 427 bytes --]
> --8<---------------cut here---------------start------------->8---
> (statprof (lambda ()
>             (map parse-email
>                  (call-with-input-file "large-base64-attachment.mbox"
>                    mbox->emails))))
> --8<---------------cut here---------------end--------------->8---
I forgot to mention: the "large-base64-attachment.mbox" in this snippet
is the mbox of bug 35518 that you mentioned earlier.
^ permalink raw reply [flat|nested] 10+ messages in thread