From: Arun Isaac
To: Ricardo Wurmus
Cc: guile-email@systemreboot.net
Subject: Re: [guile-email] parse-email-headers returns just “fields”
Date: Fri, 24 Apr 2020 03:24:36 +0530
In-Reply-To: <874ktal0zb.fsf@elephly.net>
References: <87k129kowf.fsf@elephly.net> <875zdqlnfs.fsf@elephly.net> <874ktal0zb.fsf@elephly.net>

> Yeah, I figured as much, but I shied away from reading the file in
> bytevector chunks that would then need to be searched for control
> characters to split the parts of the log file.  I’ll probably do that
> later, but for the first pass I just decided to use read-line.

I understand. Binary read is a little painful.
Perhaps you could use the read-bytes-till function in email/utils.scm of
guile-email. It is kinda internal to guile-email, so if you're using it,
you should probably copy it into your source tree.

> The mbox begins at the first ^G and ends at the next ^C.

Ah, I see the problem. This is actually a bug on debbugs' part. The
mbox/email starting at line 194 is invalid. It is neither a valid email
nor a valid mbox. For it to be a valid mbox, the "From ..." line
(currently at line 195) should be the first line. It should not occur in
between the email headers as it does now. For it to be a valid email,
the "From ..." line should not occur at all.

I guess the only workaround is to find and delete the "From ..."
line. Here's one possible way to do it.

--8<---------------cut here---------------start------------->8---
(use-modules (email email)          ; parse-email
             (email utils)          ; read-bytes-till
             (ice-9 textual-ports)  ; get-line, unget-string
             (rnrs bytevectors))    ; make-bytevector

(parse-email
 (call-with-input-file "/path/to/40755.log"
   (lambda (port)
     ;; Skip ahead to the first ^G (#x07), then skip two lines.
     (read-bytes-till port (make-bytevector 1 #x07))
     (get-line port)
     (get-line port)
     ;; Drop the stray "From ..." line; push any other line back.
     (let ((possible-from-line (get-line port)))
       (unless (string-prefix? "From " possible-from-line)
         ;; get-line strips the newline, so restore it when ungetting.
         (unget-string port (string-append possible-from-line "\n")))
       ;; The email runs until the next ^C (#x03).
       (read-bytes-till port (make-bytevector 1 #x03))))))
--8<---------------cut here---------------end--------------->8---
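
If the message is already in memory as a string, the same idea can be sketched using only core Guile and SRFI-1. This is just a sketch under assumptions: drop-stray-from-lines is a hypothetical helper (not part of guile-email), it assumes LF line endings, and it treats any header line beginning with "From " as the stray separator. Lines after the first blank line are taken to be the body and are left alone.

```scheme
(use-modules (srfi srfi-1))   ; span, remove

;; Hypothetical helper, not part of guile-email.
;; Remove every line starting with "From " from the header section of
;; MESSAGE (a string with LF line endings).  The header section ends at
;; the first empty line; the body is returned unchanged.
(define (drop-stray-from-lines message)
  (call-with-values
      (lambda ()
        ;; Split the lines into headers (before the first empty line)
        ;; and the rest (the empty line plus the body).
        (span (lambda (line) (not (string-null? line)))
              (string-split message #\newline)))
    (lambda (headers rest)
      (string-join
       (append (remove (lambda (line) (string-prefix? "From " line))
                       headers)
               rest)
       "\n"))))

;; Example: the separator sits between two headers, as in the debbugs log.
(display
 (drop-stray-from-lines
  "Received: by example\nFrom mboxrd@z Thu Jan  1 00:00:00 1970\nSubject: test\n\nFrom here on is the body\n"))
```

The cleaned string could then be handed to parse-email. Restricting the removal to the header section matters because a body line may legitimately begin with "From ".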