[#] Sat Oct 24 2020 08:40:11 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Commenting in-place.

Fri Oct 23 2020 17:32:34 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 I added your feed.  The subject is present.. it's just Mojibake.

I am not following. Are you saying that on your system you have the Subject: headers? That would be pretty wild. Are ALL Subject headers present or just some?

Similar to what is happening to my From and Return-Path headers... although mysteriously, my list/header problem has cleared up for now.

I am not aware of your From and Return-Path headers problem. What was it?

I think we've got Gremlin.

Wooo. That's bad news.

The thing is, I originally did a fresh Citadel install (on a new node running Ubuntu 20.04) using the easyinstall procedure. So where did this problem come from in the first place, and why didn't it show up after I ran ctdlmigrate from 8.24 to 9.29, even though that should not have been done?

This kind of "rings a bell" in terms of my theory about some configuration parameter. Btw, I wanted to ask where the configuration parameters are stored. Is it in the database or in some config files? Is it possible that some config parameter is not available via the GUI, but can be configured by editing some file or making a call to citserver?

This is getting bewildering!

see below:

Return-Path: rss
Date: Fri, 23 Oct 2020 20:26:18 -0000
Subject: Спортсмены салютуют Великой Победе. Праздничное шоу состоялось в "Мегаспорте"
Message-ID: 
From: "rss" 
Content-type: text/html

Again, are ALL Subject headers present in your case?

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only while an article is being received and processed. Once the article is stored, the problem no longer occurs. This is evident from the fact that the headers of articles stored under Citadel 8.24 look fine and only newly received articles are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded Citadel 8.24 to 9.29 via easyinstall. Everything went OK, and after logging in, the first thing I noticed was that the RSS feed room, which looked fine under 8.24, now had incorrect Subject headers. Old messages that were received while it was still 8.24 looked correct, but new ones received after the upgrade had the Subject problem.

When I looked at the XML source of the RSS feed http://tass.ru/rss/v2.xml, all the <title> xxxx </title> fields were present, and those are the fields that become the Subject header. So... why would such a simple thing as a string copy of the title to the Subject: header not work?
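A minimal sketch of checking that the titles really are present in a feed, using Python's standard xml.etree.ElementTree. This is only an illustration and not Citadel's code; the inline sample feed, its links, and the second title are made up, and item_titles is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up two-item feed standing in for the real one; only the first title
# is taken from a header quoted in this thread.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Спортсмены салютуют Великой Победе</title>
      <link>https://example.org/1</link>
    </item>
    <item>
      <title>COVID-19: новые данные</title>
      <link>https://example.org/2</link>
    </item>
  </channel>
</rss>
"""


def item_titles(feed_xml):
    """Return the <title> text of every <item> in an RSS document (bytes)."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Pass bytes so the parser honours the encoding named in the XML declaration.
titles = item_titles(SAMPLE_FEED.encode("utf-8"))
for title in titles:
    print(title)
```

If the titles come out intact at this stage, the damage has to happen later, somewhere between parsing the feed and storing or displaying the Subject header.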

Furthermore, Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.
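The symptom can be modelled with a few lines of Python. This is a sketch of the observed behaviour only, assuming the cut happens at the first non-ASCII character; it says nothing about where or why Citadel does it, and ascii_prefix is a hypothetical helper name. The mixed-language title is made up.

```python
def ascii_prefix(title):
    """Return the leading run of ASCII characters, trailing whitespace stripped."""
    kept = []
    for ch in title:
        if ord(ch) > 127:  # first non-ASCII character: everything after is lost
            break
        kept.append(ch)
    return "".join(kept).rstrip()


examples = [
    "COVID-19: новые данные",              # made-up title starting with Latin text
    "Спортсмены салютуют Великой Победе",  # title quoted earlier in this thread
    "Plain English title",
]
for title in examples:
    print(repr(ascii_prefix(title)))
```

Under this model a title that starts with Cyrillic ends up as an empty Subject, which would match headers looking "absent" rather than garbled.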

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing a Subject header means losing about the only thing that people read in the majority of cases: the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment inline in your text.

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and very few headers begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

[#] Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So I removed those packages and installed the easyinstall version, and everything ran OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking. 

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions, and the places where they are used, are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

serv_rssclient.c (text/x-csrc, 15729 bytes)
[#] Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

[#] Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit directories under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there were no side effects or residue left over from my 8.24 version.

Then I created an account in Thunderbird for IMAP access to the Citadel account, so the RSS room would get copied to Thunderbird automatically.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when copied automatically via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are trimmed down to the pure English/Latin characters, if any, at the BEGINNING of the Subject header.
In Thunderbird, however, even though the Subject header looks fine, the article body is scrambled (see below).

It looks as if KOI-8 encoded characters (in the body) are being interpreted as UTF-8, which is the default for XML feeds.
So whatever decodes the article body does not recognize that the character set is not UTF-8, and we end up reading KOI-8 or Windows-1251 as UTF-8.
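One way to test this kind of hypothesis is to reproduce each possible mix-up and compare the result with what appears on screen. The sketch below uses only Python's built-in codecs; reinterpret is a hypothetical helper name, the sample phrase is taken from the article quoted below, and nothing here is Citadel or Thunderbird code.

```python
def reinterpret(text, actual, assumed):
    """Encode text with `actual`, then decode the bytes as if they were `assumed`.

    Undecodable bytes become U+FFFD so the result can always be inspected.
    """
    return text.encode(actual).decode(assumed, errors="replace")


sample = "Пока оно предназначено"

# Hypothesis from the post: legacy 8-bit bytes read as UTF-8.
koi8_as_utf8 = reinterpret(sample, "koi8_r", "utf-8")
cp1251_as_utf8 = reinterpret(sample, "cp1251", "utf-8")

# Opposite direction: UTF-8 bytes read as a legacy 8-bit charset.
utf8_as_koi8 = reinterpret(sample, "utf-8", "koi8_r")
utf8_as_cp1251 = reinterpret(sample, "utf-8", "cp1251")

for label, garbled in [
    ("KOI8-R read as UTF-8", koi8_as_utf8),
    ("Windows-1251 read as UTF-8", cp1251_as_utf8),
    ("UTF-8 read as KOI8-R", utf8_as_koi8),
    ("UTF-8 read as Windows-1251", utf8_as_cp1251),
]:
    print(label, "->", garbled)
```

The two directions look quite different: legacy bytes read as UTF-8 mostly collapse into replacement characters, while UTF-8 read as a legacy charset yields roughly two wrong letters per original letter. Comparing the scrambled body against these patterns may help narrow down which side is mislabeling the text.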

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

[#] Sun Oct 25 2020 12:58:08 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to see how Thunderbird handles this RSS feed (http://tass.ru/rss/v2.xml), we can add the feed directly to Thunderbird and look at it.

Taking one article as an example and saving it in Thunderbird as .eml (a plain-text file holding the message in standard Internet Message Format, RFC 5322):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and choose the Headers menu item in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is: how come?

Does anybody know?
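For comparison, here is a minimal sketch of what a message with a declared charset and an RFC 2047 encoded Subject looks like on the wire. It uses Python's standard email package purely as an illustration and is not Citadel's code; the address rss@example.org and the shortened body are made-up placeholders.

```python
from email import message_from_string
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText

subject = ("Песков заявил, что решение Нурмагомедова "
           "завершить карьеру является его выбором")
body = "<html><body>Всего на счету 32-летнего бойца 29 побед</body></html>"

# MIMEText adds charset="utf-8" to Content-Type and picks a transfer encoding.
msg = MIMEText(body, "html", "utf-8")
# A Header object makes the generator emit RFC 2047 encoded-words (=?utf-8?...?=).
msg["Subject"] = Header(subject, "utf-8")
msg["From"] = "rss <rss@example.org>"  # placeholder address

wire = msg.as_string()
print(wire)

# A receiver reverses the header encoding and learns the body charset
# from the Content-Type parameter instead of guessing.
parsed = message_from_string(wire)
decoded_subject = str(make_header(decode_header(parsed["Subject"])))
body_charset = parsed.get_content_charset()
```

The Citadel headers shown above carry neither a charset parameter nor encoded-words, so a reader of that message has to guess how to interpret every byte outside ASCII.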

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed /usr/local/ citadel, ctdlsupport and webcit dirs (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created the accounts in Thunderbird to IMAP access citadel account. So, the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewing from Citadel, are absent on TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine as automatically copied via IMAP to Thunderbird. That means to me that headers do exist and the reason we see them as absent is because they are being trimmed down to pure English/Latin characters if present in the BEGINNING of the subject header.
Except in Thunderbird even though the Subject header looks fine, but article body is scrambled. (see below).

It is like KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is a default for XML feeds.
So, the article body encoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, any Subject header that starts with English/Latin words shows only those leading English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.
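
To make that symptom concrete, here is a minimal Python sketch of a filter that would produce exactly this behaviour. It is purely hypothetical: the function name is made up, the mixed-script subject is an invented sample, and nothing here is taken from serv_rssclient.c or striplt(). It only shows the kind of logic worth looking for in the C code.

```python
def leading_ascii(subject: str) -> str:
    """Keep characters up to the first non-ASCII one, then trim whitespace.

    Hypothetical model of the observed symptom, not Citadel's actual code.
    """
    kept = []
    for ch in subject:
        if ord(ch) > 127:  # first non-ASCII character ends the subject
            break
        kept.append(ch)
    return "".join(kept).strip()


if __name__ == "__main__":
    # A title that starts in Cyrillic comes out empty.
    print(repr(leading_ascii("Песков заявил, что решение является его выбором")))
    # An invented mixed-script title keeps only its Latin prefix.
    print(repr(leading_ascii("COVID-19: новые данные")))
```

Anything in the receive path that walks the title byte by byte and stops (or terminates the string) at the first byte above 127 would behave like this.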

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing a Subject header is losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.


 



serv_rssclient.c (text/x-csrc, 15729 bytes)
[#] Sun Oct 25 2020 12:59:56 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


http://tass.ru/rss/v2.xml says it's sending utf-8, but it also uses a non-standard ISO language string.. ru-ru .. it should probably be ru_RU or just ru .. this might be confusing the XML parser; see rss_parse_feed() in serv_rssclient.c, around line 296. See the output below; try:

wget -q -S -O - http://tass.ru/rss/v2.xml
  HTTP/1.1 200 OK
  Server: nginx/1.19.0
  Date: Sun, 25 Oct 2020 15:50:26 GMT
  Content-Type: application/rss+xml, application/rdf+xml;q=0.8, application/atom+xml;q=0.6, application/xml;q=0.4, text/xml;q=0.4; charset=utf-8
  Transfer-Encoding: chunked
  Connection: keep-alive
  Set-Cookie: tass_uuid=4B9117AF-A95A-42C5-B5A5-ECDF06C230FB; Path=/; Expires=Mon, 25-Oct-21 15:50:26 GMT
  X-Frame-Options: SAMEORIGIN
  X-XSS-Protection: 1; mode=block
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="https://www.w3.org/2005/Atom"><channel><title>ТАСС</title>
<description>ИНФОРМАЦИОННОЕ АГЕНТСТВО РОССИИ ТАСС</description>
<language>ru-ru</language>
<link><![CDATA[https://tass.ru]]></link>

We should investigate XML_SetCharacterDataHandler() .. because it really does look like the utf-8 title is being converted back into KOI-8, which is the "mojibake" .. then it's probably stripped out somewhere else (but apparently only in the title/subject).

This is pretty much what we're looking at. You might try to log the entire xml body in rss_pull_one_feed() at about 366.. then log it again after it is parsed, and find out what is actually going on.
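
Before adding logging to the C code, the parser theory can be checked offline. The sketch below is Python, using its xml.parsers.expat module, which wraps the same Expat library. The sample feed is a trimmed-down stand-in for the real one, and extract_titles() is a made-up helper, so treat this as an assumption-laden test rig rather than a copy of what rss_parse_feed() does. As far as I know, Expat takes the encoding from the XML declaration and treats <language> as ordinary element text, and this lets you see that for yourself.

```python
import xml.parsers.expat

# Trimmed-down stand-in for the TASS feed, including the suspect
# <language>ru-ru</language> element. The item title is one quoted in this thread.
SAMPLE_FEED = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<rss version="2.0"><channel>\n'
    '<title>ТАСС</title>\n'
    '<language>ru-ru</language>\n'
    '<item><title>Песков заявил, что решение Нурмагомедова завершить '
    'карьеру является его выбором</title></item>\n'
    '</channel></rss>\n'
).encode("utf-8")


def extract_titles(xml_bytes: bytes) -> list:
    """Return the text of every <title> element as Expat delivers it."""
    titles = []
    buf = []
    in_title = False

    def start(name, attrs):
        nonlocal in_title
        if name == "title":
            in_title = True
            buf.clear()

    def end(name):
        nonlocal in_title
        if name == "title":
            titles.append("".join(buf))
            in_title = False

    def chars(data):
        # Expat may deliver one text node in several chunks, so accumulate.
        if in_title:
            buf.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_bytes, True)
    return titles


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_FEED):
        # Print the text and its UTF-8 bytes, i.e. "log it after it is parsed".
        print(title, title.encode("utf-8").hex())
```

If the titles come out intact here, the damage is more likely to happen after parsing, wherever the title is copied into the Subject field.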

Sun Oct 25 2020 11:04:26 AM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the /usr/local/ citadel, ctdlsupport and webcit dirs (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side effects or residue left over from my 8.24 version.

Then I created the accounts in Thunderbird for IMAP access to the Citadel account. So, the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. That tells me the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any, at the BEGINNING of the subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is as if KOI-8 encoded characters (in the body) are being interpreted as UTF-8, which is the default for XML feeds.
So, whatever decodes the article body does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

[#] Sun Oct 25 2020 13:05:38 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..
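
A rough Python sketch of such a wrapper follows. The function names are made up, and it assumes the only change wanted is swapping ru-ru for ru in the channel's <language> element; whether that actually changes what Citadel does is still just the theory above. The result would also have to be served from somewhere Citadel can fetch it, which is left out here.

```python
import re
import urllib.request

FEED_URL = "http://tass.ru/rss/v2.xml"  # the feed discussed in this thread

# Matches the channel-level language element as it appears in the feed.
LANGUAGE_RE = re.compile(rb"<language>\s*ru-ru\s*</language>", re.IGNORECASE)


def fix_language(xml_bytes: bytes, replacement: bytes = b"ru") -> bytes:
    """Rewrite <language>ru-ru</language> to use the given code instead."""
    return LANGUAGE_RE.sub(
        b"<language>" + replacement + b"</language>", xml_bytes, count=1
    )


def fetch_and_fix(url: str = FEED_URL) -> bytes:
    """Download the feed and return it with the language element rewritten."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return fix_language(resp.read())


if __name__ == "__main__":
    sample = b"<channel><title>TASS</title><language>ru-ru</language></channel>"
    print(fix_language(sample).decode("utf-8"))
```

The rewrite works on raw bytes on purpose, so the script never has to guess the feed's character set and cannot introduce a second encoding problem of its own.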

 

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (a plain RFC 5322 email message file):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and choose the Headers menu option in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
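
For comparison, this is roughly what a fully labelled message looks like when built with Python's standard email package: the body part carries an explicit charset parameter, and the Subject is wrapped in RFC 2047 encoded-words. It is only a sketch of the target format, with a made-up helper name and a placeholder sender address; it says nothing about how Citadel builds its messages internally.

```python
import email
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText


def build_message(subject: str, html_body: str) -> MIMEText:
    """Build an HTML message whose charset is declared explicitly."""
    msg = MIMEText(html_body, "html", "utf-8")  # Content-Type: text/html; charset="utf-8"
    msg["Subject"] = Header(subject, "utf-8")   # RFC 2047 encoded-word(s)
    msg["From"] = "rss@example.org"             # placeholder address
    return msg


def decode_subject(raw_value: str) -> str:
    """Turn an RFC 2047 encoded Subject value back into readable text."""
    return str(make_header(decode_header(raw_value)))


if __name__ == "__main__":
    msg = build_message("ТАСС", "<p>ТАСС</p>")
    wire = msg.as_string()
    print(wire)
    parsed = email.message_from_string(wire)
    print(parsed.get_content_charset())
    print(decode_subject(parsed["Subject"]))
```

A client that receives "Content-type: text/html" with no charset has to guess, which fits the picture of Thunderbird showing one thing and Citadel's own viewer another.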

[#] Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


"This is probably the problem....." got stripped out of my last message..

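<?php

// Reproduce the mojibake: take a UTF-8 title and misread its bytes as KOI8-U.
$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s);  // turns &quot; back into "
// Treat the UTF-8 bytes as if they were KOI8-U and convert them "to" UTF-8.
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # uses the mbstring extension
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀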
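
The same mix-up can be undone, which is a handy check that garbled text really is UTF-8 misread as KOI8-U and not something else. Here is a small Python sketch with a made-up helper name. It assumes the garbled text still contains every original byte; once anything has been stripped, the round trip no longer works.

```python
def repair_koi8_mojibake(garbled: str, wrong_codec: str = "koi8_u") -> str:
    """Reverse a 'UTF-8 bytes read as KOI8-U' mix-up.

    Re-encoding with the wrong codec recovers the original bytes,
    which are then decoded as the UTF-8 they always were.
    """
    return garbled.encode(wrong_codec).decode("utf-8")


if __name__ == "__main__":
    original = "Студентов университета"
    garbled = original.encode("utf-8").decode("koi8_u")  # same step as the PHP above
    print(garbled)
    print(repair_koi8_mojibake(garbled))
```

If the repair raises a decode error on text copied out of Citadel, then either bytes were lost along the way or a different character set is involved.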

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

Thunderbirds rss parser is probably advanced enough to figure out what the characters set should be, regardless of the language specified in the xml file..

Citadel probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

 

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml  (which is a standard SMTP article format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (doing view as email) and pushing headers menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed /usr/local/ citadel, ctdlsupport and webcit dirs (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created the accounts in Thunderbird to IMAP access citadel account. So, the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewing from Citadel, are absent on TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine as automatically copied via IMAP to Thunderbird. That means to me that headers do exist and the reason we see them as absent is because they are being trimmed down to pure English/Latin characters if present in the BEGINNING of the subject header.
Except in Thunderbird even though the Subject header looks fine, but article body is scrambled. (see below).

It is like KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is a default for XML feeds.
So, the article body encoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, I now have 2 Citadel installations: one is the original 64-bit 929, and the other is the one that originally had cit 824 on it, runs on Ubuntu 18.04, and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the exact same problem with the Subject: header. So, I am wondering what kind of miracle happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So, I removed those packages and did the easyinstall version, and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK, and after logging in, the first thing I noticed was that the RSS feed room, which looked fine under 8.24, now has problems with incorrect Subject headers. Old messages that were received while it was still 8.24 looked correct, but the new ones received after the cit upgrade had the Subject problem.

When I looked at the XML source for the RSS feed http://tass.ru/rss/v2.xml, all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... why would such a simple thing as a string copy of the title to the Subject: header not work?
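
One way to confirm that the <title> values decode cleanly before Citadel ever touches them is a small script. This is only a minimal Python sketch using the standard library: the sample document and the helper name `feed_titles` are made up for illustration (the second title is invented), and it says nothing about how serv_rssclient.c actually parses the feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed shaped like a typical RSS 2.0 document.
# The second title is invented to show a headline that starts with Latin text.
SAMPLE_RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample channel</title>
    <language>ru-ru</language>
    <item><title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title></item>
    <item><title>NASA запустила новый спутник</title></item>
  </channel>
</rss>
"""


def feed_titles(xml_bytes):
    """Return the text of the <title> child of every <item> element."""
    root = ET.fromstring(xml_bytes)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Pass bytes so the parser honours the encoding named in the XML declaration.
titles = feed_titles(SAMPLE_RSS.encode("utf-8"))
for title in titles:
    print(repr(title))
```

If the titles print intact here but arrive damaged in the room, the damage is happening after XML parsing rather than in the feed itself.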

Furthermore, any Subject header that starts with English/Latin words shows only those English/Latin characters; everything from the first European-language delimiter or non-English/Latin character onward gets chopped off.
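
The symptom described here is what you would see if something kept only the leading run of 7-bit ASCII characters in a title. The Python sketch below merely reproduces that symptom; `ascii_prefix` is a hypothetical helper and the sample titles are illustrative. Nothing in this thread confirms that Citadel does this.

```python
def ascii_prefix(title):
    """Keep characters up to (not including) the first non-ASCII one."""
    kept = []
    for ch in title:
        if ord(ch) > 127:
            break  # stop at the first character outside 7-bit ASCII
        kept.append(ch)
    return "".join(kept).rstrip()


# A title that starts with Cyrillic collapses to an empty string,
# while a title that starts with Latin text keeps only that prefix.
print(repr(ascii_prefix("Песков заявил")))
print(repr(ascii_prefix("NASA запустила спутник")))
```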

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line in your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to run /usr/local/citadel/ctdlmigrate to transfer the database from the 8.24 node to the 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # mbstring: treat the UTF-8 bytes as KOI8-U and convert to UTF-8
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
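
For anyone without PHP at hand, roughly the same experiment can be sketched in Python. This assumes, as the PHP test does, that the damage comes from UTF-8 bytes being read as KOI8-U; it reproduces the garbling but does not show where in Citadel it would happen.

```python
import html

s = "Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы"
decoded = html.unescape(s)  # turn &quot; into a real double quote

# Encode to UTF-8 bytes, then (wrongly) decode those bytes as KOI8-U.
moji = decoded.encode("utf-8").decode("koi8_u")
print(moji)

# The damage is reversible as long as no byte was dropped along the way.
restored = moji.encode("koi8_u").decode("utf-8")
print(restored == decoded)
```

The round trip at the end is the useful part: if garbled text in a room can be restored this way, the bytes were stored intact and only the charset label or interpretation is wrong.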

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (a plain-text email message file with RFC 5322-style headers):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and pick the headers menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
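
The difference between the two header sets can be checked mechanically. The following Python sketch uses only the standard `email` package; the message bodies are shortened placeholders, and it only shows how a missing charset parameter looks to a mail parser, not why Citadel omits it.

```python
from email import message_from_string
from email.message import EmailMessage

# Content-type as shown by Citadel in this thread: no charset parameter.
citadel_style = message_from_string("Content-type: text/html\n\nbody\n")
print(citadel_style.get_content_charset())  # None

# A message built with an explicit charset, for comparison.
msg = EmailMessage()
msg["Subject"] = "Песков заявил"
msg.set_content(
    "<html><body>Всего на счету</body></html>",
    subtype="html",
    charset="utf-8",
)
print(msg.get_content_type())
print(msg.get_content_charset())
```

When the charset is missing, a client has to guess, which would fit the observation that different clients render the same article differently.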

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
[#] Sun Oct 25 2020 14:08:23 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And what is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.
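
If "the encoding is specified as the first thing" refers to MIME encoded-words (RFC 2047), where a non-ASCII header travels as `=?charset?encoding?data?=`, then the Python sketch below shows what such a header looks like and how it decodes. That reading is an assumption; the raw header was not preserved in this thread.

```python
from email.header import Header, decode_header, make_header

subject = "Песков заявил"

# Encode as an RFC 2047 encoded-word; the charset name leads the word.
wire = Header(subject, "utf-8").encode()
print(wire)

# Decode the wire form back into readable text.
readable = str(make_header(decode_header(wire)))
print(readable == subject)
```

A display routine that does not decode encoded-words, or that mishandles the decoded bytes, could show an empty or truncated Subject even though the header is present in the stored message.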

So, if you look at an article with a truncated Subject where the Subject starts with English/Latin characters, they ARE displayed correctly, and if you scroll through that feed in Citadel, you will notice that a few articles begin with either English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8-R and Windows-1251 (Windows-1252 is the Western European code page, not a Cyrillic one). So, which one we are dealing with is not quite clear and probably not very relevant.
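
Even without knowing which legacy encoding is involved, it is cheap to test whether a byte string is valid UTF-8 at all. A minimal Python sketch follows; `is_valid_utf8` is a hypothetical helper and the sample text is taken from the article body above.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Всего на счету"
results = {}
for codec in ("utf-8", "koi8_r", "cp1251"):
    raw = text.encode(codec)  # the same text in three encodings
    results[codec] = is_valid_utf8(raw)
    print(codec, results[codec])
```

Russian text in KOI8-R or Windows-1251 almost never passes as valid UTF-8, so a check like this on the stored bytes would tell us whether the data itself is UTF-8 and only the labelling or display is off.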

[#] Sun Oct 25 2020 14:27:23 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
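
Step 3 above can be automated with a few lines of Python. This is only a sketch of the edit itself: the function name `normalize_language_tag` and the sample feed are made up for illustration, and nothing here claims that changing the language tag affects how Citadel decodes titles.

```python
import re


def normalize_language_tag(rss_xml: str, new_tag: str = "ru_RU") -> str:
    """Replace the text of the first <language> element, leaving the rest untouched."""
    return re.sub(
        r"(<language>)[^<]*(</language>)",
        lambda m: m.group(1) + new_tag + m.group(2),
        rss_xml,
        count=1,
    )


# Tiny stand-in for the downloaded feed (illustrative, not the real TASS XML).
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<rss version="2.0"><channel><title>ТАСС</title>'
    "<language>ru-ru</language></channel></rss>"
)

fixed = normalize_language_tag(sample)
print(fixed)
```

The edited text would then be saved somewhere the test room's feed URL can reach, as in step 4.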

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to nothing.

What is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject where the Subject starts with English/Latin characters, they ARE displayed correctly, and if you scroll through that feed in Citadel, you will notice that a few articles begin either with English chars or with delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8 and Windows-1251 (Windows-1252 is the Western European one). So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

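// Title as delivered by the feed: UTF-8 text with &quot; entities.
$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s);
// Deliberately misread the UTF-8 bytes as KOI8-U (mb_convert_encoding() is part of mbstring, not iconv).
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');
print $moji . PHP_EOL;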

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
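
The same effect can be reproduced in Python, which may be handier for poking at individual titles. This is a minimal sketch assuming the title arrives as UTF-8 and something downstream reads those bytes as KOI8-U. It says nothing about where in Citadel that would happen.

```python
title = 'Студентов университета "Сириус" в Сочи научили отражать кибератаки на банкоматы'

# Encode as UTF-8 (what the feed delivers), then misread those bytes as KOI8-U.
mojibake = title.encode("utf-8").decode("koi8_u")
print(mojibake)

# The damage is reversible as long as no bytes were dropped along the way.
recovered = mojibake.encode("koi8_u").decode("utf-8")
print(recovered == title)
```

If a garbled Subject can be turned back into readable Russian this way, the bytes were stored intact and only the interpretation is wrong.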

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml  (which is a standard SMTP article format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and push the [headers] menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
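
For comparison, here is roughly what a message with a declared charset looks like when built with Python's standard `email` package. It is a sketch of the expected wire format only, not of how Citadel assembles RSS articles. The helper name `build_feed_message` and the `rss@example.org` address are placeholders.

```python
from email.message import EmailMessage


def build_feed_message(title: str, html_body: str) -> EmailMessage:
    """Build an HTML message whose body charset is declared explicitly."""
    msg = EmailMessage()
    msg["From"] = "rss@example.org"  # placeholder address
    msg["Subject"] = title           # non-ASCII is RFC 2047 encoded on output
    msg.set_content(html_body, subtype="html", charset="utf-8")
    return msg


msg = build_feed_message(
    "Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором",
    "<html><body>Всего на счету 32-летнего бойца смешанных единоборств "
    "29 побед и ни одного поражения в ММА</body></html>",
)

wire = msg.as_string()
print(wire)
```

In the output the Content-Type header carries a charset parameter, and the Subject is written as encoded words that begin with the charset name, which is the "encoding specified as the first thing in the Subject header" pattern.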

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit dirs under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side effects or residue left over from my 8.24 version.

Then I created an account in Thunderbird to access the Citadel account over IMAP, so the RSS room would get copied to Thunderbird automatically.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any, at the BEGINNING of the Subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is as if KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is the default for XML feeds.
So, whatever decodes the article body does not recognize that it is not a UTF-8 character set, and we are reading KOI-8 or Windows-1251 as UTF-8.
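
A quick way to test the "legacy bytes read as UTF-8" hypothesis is to check whether the raw bytes decode as strict UTF-8 at all. Text in an 8-bit Cyrillic encoding almost never does. A minimal sketch, with `looks_like_utf8` as a made-up helper name:

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes are a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Пока оно предназначено для сотрудников"

# Encode the same text three ways and see which byte strings pass as UTF-8.
for codec in ("utf-8", "koi8_r", "cp1251"):
    raw = text.encode(codec)
    print(codec, looks_like_utf8(raw))
```

Running this against the bytes of a saved article body would show whether the body is really KOI-8 or Windows-1251, or is UTF-8 that is merely being displayed wrongly.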

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 Citadel installations: one is the original 64-bit 9.29, and the other is the one that originally had Citadel 8.24 on it, running on Ubuntu 18.04, and was upgraded in-place to 9.29 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So, I removed those packages and did the easyinstall version, and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. This RSS feed below is from the biggest Russian news agency and so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on Citadel 8.24 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why would such a simple thing as a string copy of the title to the Subject: header not work?
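
To confirm that the titles survive XML parsing intact, the feed can be run through a parser outside Citadel. The sketch below uses Python's standard `xml.etree.ElementTree` on a tiny made-up feed. The second item title is invented for illustration, and `item_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List


def item_titles(rss_bytes: bytes) -> List[str]:
    """Return the <title> text of every <item>, with surrounding whitespace removed."""
    root = ET.fromstring(rss_bytes)
    return [(item.findtext("title") or "").strip() for item in root.iter("item")]


# Stand-in for the downloaded feed; the real file would be read as bytes.
sample_feed = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<rss version="2.0"><channel><title>ТАСС</title>\n'
    "<item><title> Матыцин назвал Нурмагомедова выдающимся борцом современности </title></item>\n"
    "<item><title>COVID-19: новые данные</title></item>\n"
    "</channel></rss>"
).encode("utf-8")

for title in item_titles(sample_feed):
    print(title)
```

If the titles print correctly here, the feed itself is fine and the damage happens somewhere between parsing and display.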

Furthermore, all the Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.
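
Purely as an illustration of the symptom, and not taken from Citadel's source: a hypothetical routine that stops copying at the first non-ASCII character would produce exactly this pattern. The function `ascii_prefix` and the sample titles are assumptions made for the sketch.

```python
def ascii_prefix(subject: str) -> str:
    """Hypothetical truncation: keep characters up to the first non-ASCII one."""
    kept = []
    for ch in subject:
        if ord(ch) > 127:
            break
        kept.append(ch)
    return "".join(kept)


# A title starting with Latin text keeps only its Latin prefix.
print(repr(ascii_prefix("COVID-19: новые данные")))
# A title starting with Cyrillic or with a non-ASCII delimiter ends up empty.
print(repr(ascii_prefix("Песков заявил")))
print(repr(ascii_prefix("«Мегаспорт»")))
```

Comparing displayed subjects against such a prefix of the feed's titles would be one way to check whether the truncation really happens at the first non-ASCII character.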

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing a Subject header is losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

[#] Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

[#] Sun Oct 25 2020 15:49:24 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And, what is even more interesting and is proof is that we are dealing with character encoding problem is that the encoding is specified as the fist thing in the SMTP Subject header.

So, if you look at the article with trunkated Subject and Subject starts with English/Latin characters, the ARE displayed correctly and if you scroll that feed in citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI, Windows-1251, 1252. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # mbstring, not iconv: reads the UTF-8 bytes as if they were KOI8-U
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
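
For anyone without PHP handy, here is a minimal Python sketch of the same experiment. It assumes, as the PHP snippet does, that the damage comes from UTF-8 bytes being read as KOI8-U; the variable names are mine.

```python
import html

title = "Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы"
text = html.unescape(title)  # same role as html_entity_decode() in the PHP version

# Misread the UTF-8 bytes as KOI8-U, which is what the PHP call above does.
garbled = text.encode("utf-8").decode("koi8_u")
print(garbled)

# The damage is reversible as long as no bytes were dropped along the way.
restored = garbled.encode("koi8_u").decode("utf-8")
print(restored == text)
```

If a Subject really were mangled this way and nothing else, it could be repaired after the fact; a Subject that is cut down to nothing cannot, which suggests the truncation is a separate step.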

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the XML file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (which is the standard Internet message format, RFC 5322):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and push the headers menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
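
For comparison, here is a small sketch of what a message with a declared charset looks like when built with Python's standard email package. This is not how Citadel builds its messages; the From address and the body are placeholders I made up. It only shows the two things the Citadel headers above lack: a charset parameter on Content-Type and an RFC 2047 encoded Subject.

```python
from email.message import EmailMessage

subject = "Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором"

msg = EmailMessage()
msg["From"] = "rss@example.org"  # placeholder address
msg["Subject"] = subject
# Declare the body as HTML in UTF-8 so a reader does not have to guess the charset.
msg.set_content("<p>(article body goes here)</p>", subtype="html", charset="utf-8")

# Serialize to the on-the-wire form; the non-ASCII Subject comes out as
# RFC 2047 encoded words rather than raw 8-bit text.
wire = msg.as_string()
print(wire)
```

A client that sees a header like this knows exactly how to decode the Subject; with raw 8-bit text and no charset anywhere, every client has to guess.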

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit directories under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there were no side effects or leftovers from my 8.24 version.

Then I created an account in Thunderbird to access my Citadel account over IMAP, so the RSS room would automatically get copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to whatever plain English/Latin characters are present at the BEGINNING of the Subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is as if KOI-8 encoded characters (in the body) are being interpreted as UTF-8, which is the default for XML feeds.
So, whatever decodes the article body does not recognize that it is not a UTF-8 character set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, I now have two Citadel installations: one is the original 64-bit 9.29, and the other originally ran Citadel 8.24 on Ubuntu 18.04 and was upgraded in place to 9.29 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So, I removed those packages and did the easyinstall version, and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions, and the places where they are used, are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. This RSS feed below is from the biggest Russian news agency and so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the XML source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, any Subject header that starts with English/Latin words shows only those English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.
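
To make the symptom concrete, here is a purely hypothetical model of it: a filter that keeps characters until it meets the first non-ASCII one. This is NOT Citadel's code and I am not claiming the bug looks like this; it is only a sketch that reproduces what is visible in the room, and the function name is made up.

```python
def ascii_prefix(title: str) -> str:
    """Return the leading run of plain ASCII characters in a title."""
    kept = []
    for ch in title:
        if ord(ch) > 127:
            # First non-ASCII character: everything from here on is dropped.
            break
        kept.append(ch)
    return "".join(kept)


# A title starting with Latin text keeps only that part.
print(repr(ascii_prefix("COVID-19: новые данные")))
# A title starting with Cyrillic is cut down to nothing.
print(repr(ascii_prefix("Песков заявил")))
```

Running real feed titles through something like this and comparing against what Citadel displays would be a quick way to confirm that the cut always happens at the first non-ASCII character.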

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Sun Oct 25 2020 16:34:17 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Still trying some tests.. I've got "Network run frequency" set down to 10 minutes.. (as low as I can go safely.. ) and still trying to make sense of it. 

The problem does not affect the TASS English feed, https://tass.ru/en/rss/v2.xml, at all..

Feed Validator had a few complaints about the Russian feed ..  https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ftass.ru%2Frss%2Fv2.xml

There may be some other issues in the XML. 

I am trying some other Russian language feeds to see if I can get anything to parse.. do you know of ANY RUSSIAN LANGUAGE FEED that currently PARSES (successfully) in CITADEL?

It MAY be a problem (or, something requiring further configuration) in libexpat - the XML Parser..
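
One way to check the expat theory outside Citadel: Python's xml.parsers.expat module is a wrapper around the same expat library, so a few lines are enough to see what the parser hands back for a Cyrillic title. The tiny feed below is made up for the test (it is not the real TASS feed), and extract_titles is just a name I picked.

```python
import xml.parsers.expat

# A made-up two-title feed, encoded the way the XML declaration says.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<title>ТАСС</title>
<item><title>Пример заголовка</title></item>
</channel></rss>""".encode("utf-8")


def extract_titles(data: bytes) -> list:
    """Collect the text of every <title> element using expat callbacks."""
    titles = []
    state = {"in_title": False}

    def start(name, attrs):
        if name == "title":
            state["in_title"] = True
            titles.append("")

    def end(name):
        if name == "title":
            state["in_title"] = False

    def chars(data):
        # expat may deliver text in several chunks, so append rather than assign.
        if state["in_title"]:
            titles[-1] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return titles


print(extract_titles(FEED))
```

If expat returns the titles intact for the real feed as well, the parser itself is probably not the culprit and the damage would have to happen in whatever handles the title afterwards.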

Hopefully, if we can figure out the issue, you can recreate the feed with a wrapper script..

 

Sun Oct 25 2020 02:29:17 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
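
The manual steps above could be rolled into a small wrapper script along these lines. It is only a sketch: it assumes the feed is served as UTF-8, the function names are mine, and the follow-up in this thread reports that changing the language code alone did not fix the Subject, so treat it as a test harness for trying edits rather than a fix.

```python
import re
import urllib.request

FEED_URL = "http://tass.ru/rss/v2.xml"


def fetch_feed(url: str = FEED_URL) -> str:
    """Download the feed (step 2); assumes the bytes are UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def rewrite_language(xml_text: str, new_code: str = "ru_RU") -> str:
    """Replace the first <language> element's value (step 3)."""
    return re.sub(
        r"<language>[^<]*</language>",
        f"<language>{new_code}</language>",
        xml_text,
        count=1,
    )


# Offline demonstration on a tiny stand-in document.
sample = "<rss><channel><language>ru-ru</language></channel></rss>"
print(rewrite_language(sample))
```

To use it against the live feed, write rewrite_language(fetch_feed()) to a file the web server can serve and point the test room's Remote Retrieval entry at that URL (step 4).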

 

[#] Sun Oct 25 2020 16:35:43 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And, what is even more interesting and is proof is that we are dealing with character encoding problem is that the encoding is specified as the fist thing in the SMTP Subject header.

So, if you look at the article with trunkated Subject and Subject starts with English/Latin characters, the ARE displayed correctly and if you scroll that feed in citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI, Windows-1251, 1252. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is the probably the problem....." Got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji= mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # uses iconv
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbirds rss parser is probably advanced enough to figure out what the characters set should be, regardless of the language specified in the xml file..

Citadel probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (the standard Internet message format, RFC 5322):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and choose [Headers] from the article menu, we get this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
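For comparison, here is a rough Python sketch of a message that does declare its charset, in the way the Thunderbird copy above does. It uses only the standard email package; the address and the shortened texts are placeholders, and it says nothing about how Citadel builds its messages:

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

subject = "Песков заявил"

msg = EmailMessage()
msg["From"] = "rss <rss@example.org>"   # placeholder address
msg["Subject"] = subject
# Subtype and charset are given explicitly, so Content-Type carries charset="utf-8".
msg.set_content("<p>Всего на счету 29 побед</p>", subtype="html", charset="utf-8")

raw = msg.as_bytes()   # wire form: the Subject becomes an RFC 2047 encoded-word
parsed = message_from_bytes(raw, policy=policy.default)
print(parsed["Content-Type"])
print(parsed["Subject"])
```

With the charset stated in Content-Type and the Subject sent as an encoded-word, a reader does not have to guess how to interpret either of them.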

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit directories under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there were no side effects or leftovers from my 8.24 version.

Then I created an account in Thunderbird to access the Citadel account over IMAP, so that the RSS room would be copied to Thunderbird automatically.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when copied automatically via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to whatever pure English/Latin characters appear at the BEGINNING of the subject header.
In Thunderbird, however, the Subject header looks fine but the article body is scrambled (see below).

It is as if KOI-8 encoded characters in the body were interpreted as UTF-8, which is the default for XML feeds.
So the body is not recognized as a non-UTF-8 character set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So I removed those packages and did the easyinstall version, and everything went fine; it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions, and the places where they are used, are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the XML source for the RSS feed http://tass.ru/rss/v2.xml, all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... why would such a simple thing as a string copy of the title to the Subject: header not work?
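As a point of reference, this is roughly what an expat-based parser hands back for the titles of a UTF-8 feed. It is a Python sketch with a tiny made-up feed standing in for the real one; Python's ElementTree is backed by expat, but this is not Citadel's code path:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for the feed; the real one is at the URL quoted above.
feed = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <language>ru-ru</language>
  <item><title>Песков заявил о решении</title></item>
  <item><title>MMA: итоги боя</title></item>
</channel></rss>""".encode("utf-8")

root = ET.fromstring(feed)
# Each <title> would become one Subject: header.
subjects = [item.findtext("title") for item in root.iter("item")]
for s in subjects:
    print("Subject:", s)
```

Here the titles come out of the parser complete, which would suggest looking at what happens to the string after parsing rather than at the XML itself.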

Furthermore, any Subject header that starts with English/Latin words shows only those English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header is losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes)
[#] Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov.. although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting header-injected, which, as I expected, appears to be related, so it's going to take more research.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро


It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
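One way to picture the clean-up a feed field would need before it is used as a header value is the Python sketch below. The helper name is hypothetical and this is not how Citadel handles it; it only shows entities being decoded, markup dropped, and whitespace collapsed so that the value stays on one line:

```python
import html
import re

def clean_header_value(raw: str) -> str:
    """Turn a feed field into a single-line header value (hypothetical helper)."""
    text = html.unescape(raw)              # &quot; becomes "
    text = re.sub(r"<[^>]+>", "", text)    # drop markup such as <p>...</p>
    return " ".join(text.split())          # collapse whitespace, including CR/LF

print(clean_header_value("<p>          VTimes       </p>"))
print(clean_header_value("Студентов университета &quot;Сириус&quot; в Сочи"))
```

Applied to the Return-Path value above, this would leave just the name, and the Cyrillic text passes through untouched.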

 

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
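Steps 2 and 3 could be scripted along these lines. This is a Python sketch that assumes the feed is UTF-8 and that only the first <language> element matters; the function names are made up for illustration. Note that the follow-up message reports this particular experiment did not change anything:

```python
import re
import urllib.request

def fix_language(xml_text: str, new_code: str = "ru_RU") -> str:
    """Replace the content of the first <language> element with new_code."""
    return re.sub(r"<language>[^<]*</language>",
                  f"<language>{new_code}</language>", xml_text, count=1)

def fetch_and_fix(url: str) -> str:
    """Download a feed (needs network access) and rewrite its language tag."""
    with urllib.request.urlopen(url) as resp:
        return fix_language(resp.read().decode("utf-8"))

sample = "<rss><channel><language>ru-ru</language><title>ТАСС</title></channel></rss>"
print(fix_language(sample))
```

The output of fetch_and_fix() would then be saved somewhere the test room's feed URL can reach, as in step 4.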

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed-up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to nothing.

And what is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject that starts with English/Latin characters, they ARE displayed correctly, and if you scroll through that feed in Citadel you will notice that a few articles begin with either English characters or delimiters.
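An encoding named at the very start of the Subject value matches the RFC 2047 encoded-word form, =?charset?encoding?text?=. Assuming that is what the [headers] view shows (the dumps in this thread have already been rendered as plain text, so this is a guess), a Python sketch of the round trip looks like this:

```python
from email.header import Header, decode_header, make_header

subject = "Песков заявил"

# Produce the wire form the way a mail generator would (RFC 2047 encoded-word).
encoded = Header(subject, charset="utf-8").encode()
print(encoded)   # begins with the charset marker

# A reader has to decode it; one that only passes ASCII through
# would not show the Cyrillic text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == subject)
```

A display routine that skips the decoding step, or stops at the first character it does not expect, would produce exactly the kind of empty or Latin-only Subject lines described here.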

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8-R/KOI8-U and Windows-1251 (Windows-1252 is the Western European code page, not a Cyrillic one). So it is not quite clear which one we are dealing with, and it is probably not very relevant.

serv_rssclient.c (text/x-csrc, 15729 bytes)
[#] Sun Oct 25 2020 17:14:25 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 16:34:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Still trying some tests.. I've got "Network run frequency" set down to 10 minutes.. (as low as I can go safely.. ) and still trying to make sense of it. 

The problem does not affect TASS English (https://tass.ru/en/rss/v2.xml) at all.

Feed Validator had a few complaints about the russian feed ..  https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ftass.ru%2Frss%2Fv2.xml

This issue exists on ANY RSS feed, not only the Russian one. Take for example this one: http://feeds.feedburner.com/blacklistednews/hKxa and look at the shortened or absent Subject headers, then click the [headers] article menu choice. That one has a few very visibly damaged article Subjects.

There may be some other issues in the XML. 

I am trying some other Russian-language feeds to see if I can get anything to parse. Do you know of ANY RUSSIAN LANGUAGE FEED that currently PARSES (successfully) in CITADEL?

It MAY be a problem (or, something requiring further configuration) in libexpat - the XML Parser..

Well, I'd still like to know why the Content-type header in Citadel's version of the articles is set to Content-type: text/html and not to Content-Type: text/html; charset=UTF-8. What could be the reason for not specifying the charset? As far as I can see, UTF-8 is pretty much the default for everything nowadays. ASCII characters take a single byte each, which keeps string operations fast and simple.
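What a reading client can learn from each form of the header is easy to check. This is a small Python sketch using the standard email parser; the helper name is illustrative only:

```python
from email import message_from_string

def declared_charset(header_block: str):
    """Return the charset parameter of Content-Type, or None if absent."""
    return message_from_string(header_block + "\n\n").get_content_charset()

citadel_style = "Content-type: text/html"
thunderbird_style = "Content-Type: text/html; charset=UTF-8"

print(declared_charset(citadel_style))       # None
print(declared_charset(thunderbird_style))   # utf-8

# ASCII stays one byte per character in UTF-8; Cyrillic letters take two.
print(len("Subject".encode("utf-8")), len("Песков".encode("utf-8")))   # 7 12
```

With no charset declared, the client has to fall back on a default or guess, which is where a UTF-8 body can end up being read as something else.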

Hopefully, if we can figure out the issue, you can recreate the feed with a wrapper script..

Get yourself cscope, which is available in Linux and Windows. Should take a few minutes to build on Linux. It needs libncurses5, which you can install on Ubuntu:

sudo apt-get install libncurses5-dev libncursesw5-dev

Here's Cscope Tutorial: https://courses.cs.washington.edu/courses/cse451/12sp/tutorials/tutorial_cscope.html

 

Sun Oct 25 2020 02:29:17 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And, what is even more interesting and is proof is that we are dealing with character encoding problem is that the encoding is specified as the fist thing in the SMTP Subject header.

So, if you look at the article with trunkated Subject and Subject starts with English/Latin characters, the ARE displayed correctly and if you scroll that feed in citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI, Windows-1251, 1252. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is the probably the problem....." Got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji= mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # uses iconv
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (which is the standard RFC 822 message format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and push the [Headers] menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
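
A quick way to compare the two header sets is to ask a mail parser which charset each one declares. The sketch below uses Python's standard email package on two cut-down messages modelled on the headers above; the helper name declared_charset is invented for the example.

```python
from email import message_from_string

# Cut-down copies of the two header sets shown above.
citadel_message = (
    'Subject: test\n'
    'Content-type: text/html\n'
    '\n'
    'body\n'
)
thunderbird_message = (
    'Subject: test\n'
    'Content-Type: text/html; charset=UTF-8\n'
    '\n'
    'body\n'
)

def declared_charset(raw_message):
    """Return the charset parameter of Content-Type, or None if absent."""
    return message_from_string(raw_message).get_content_charset()

print(declared_charset(citadel_message))      # None
print(declared_charset(thunderbird_message))  # utf-8
```

With no charset parameter, a mail client has to guess how to decode the body, which would be consistent with the scrambled body seen in Thunderbird.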

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the /usr/local/ citadel, ctdlsupport and webcit dirs (rm -rf).

Then I did a fresh install of Citadel to make sure there were no side effects or residue left over from my 8.24 version.

Then I created an account in Thunderbird for IMAP access to the Citadel account, so the RSS room would get copied to Thunderbird automatically.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. That tells me the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any are present at the BEGINNING of the subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is as if KOI-8 encoded characters (in the body) are being interpreted as UTF-8, which is the default for XML feeds.
So whatever reads the article body does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873
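
The "legacy encoding read as UTF-8" theory can be checked, because Cyrillic text in KOI8-R or Windows-1251 is almost never valid UTF-8. A minimal sketch, assuming we have the raw bytes of a title or body in hand; the sample word and the helper name are made up for the example.

```python
sample = 'Привет'

def decodes_as_utf8(raw):
    """Return True if the byte string is valid UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# Encode the same word three ways and see which byte strings pass as UTF-8.
results = {}
for codec in ('utf-8', 'koi8_r', 'cp1251'):
    raw = sample.encode(codec)
    results[codec] = decodes_as_utf8(raw)
    print(codec, raw.hex(), results[codec])
```

If the bytes coming from the feed decode cleanly as UTF-8, the feed itself is probably fine and the damage is happening later in the pipeline.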

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So, I removed those packages and did the easyinstall version, and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. This RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK, and after logging in, the first thing I noticed was that the RSS feed room, which looked fine under 8.24, now has problems with incorrect Subject headers. Old messages that were received while it was still 8.24 looked correct, but the new ones received after the upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, all the Subject headers that start with English/Latin words show only the English/Latin characters, and everything else gets chopped off, starting with the first European-language delimiter or non-English/Latin character.
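
To make the symptom concrete, here is a sketch of a function that behaves the way the display seems to: it keeps the leading ASCII characters and drops everything from the first non-ASCII character on. This is purely an illustration of the observed behaviour. It is not Citadel code, ascii_prefix is a made-up name, and the first sample subject is invented.

```python
def ascii_prefix(subject):
    """Hypothetical: keep characters up to the first non-ASCII one."""
    kept = []
    for ch in subject:
        if ord(ch) > 127:
            break
        kept.append(ch)
    return ''.join(kept).strip()

# A subject that starts with Latin text keeps only that part.
print(repr(ascii_prefix('COVID-19: новые данные')))  # 'COVID-19:'

# A subject that starts with Cyrillic ends up empty.
print(repr(ascii_prefix('Песков заявил')))  # ''
```

Anything in the receive path that treats bytes above 127 as "end of string" or "not printable" would produce this pattern, which is why the trim and strip helpers are worth a look.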

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.
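
The "character set defined in the Subject header" is presumably the RFC 2047 encoded-word form, where a non-ASCII header value is written as =?charset?encoding?data?=. A minimal sketch using Python's standard email.header module; whether Citadel's RSS client encodes subjects this way is not established in this thread.

```python
from email.header import Header, decode_header, make_header

subject = 'Песков заявил'

# Encode the subject as an RFC 2047 encoded-word (pure ASCII on the wire).
encoded = Header(subject, 'utf-8').encode()
print(encoded)

# Decode it back the way a mail client would.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

A header stored this way survives any code that only handles ASCII, which is one reason raw 8-bit subjects and encoded-word subjects can behave differently in the same client.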

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know whether I can rely on cit929 given these RSS issues. Otherwise, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I commented out the rm -rf in easyinstall.sh and then tried to run ./configure. It complained about sieve2.h: "configure: error: sieve2.h was not found and is required". But I just did a fresh install of Citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which as I expected, appears to be related.. so it's  going to take more research.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
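
Whatever the root cause turns out to be, both symptoms here (markup in the author field, HTML entities in the title) are things a feed reader usually cleans up before building headers. A sketch of that cleanup in Python; clean_feed_field is a made-up helper for illustration, not something taken from serv_rssclient.c.

```python
import html
import re

# Matches anything that looks like an HTML/XML tag.
TAG_RE = re.compile(r'<[^>]+>')

def clean_feed_field(value):
    """Strip markup, decode entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(' ', value)
    decoded = html.unescape(without_tags)
    return ' '.join(decoded.split())

print(clean_feed_field('<p>          VTimes       </p>'))
print(clean_feed_field('университета &quot;Сириус&quot; в Сочи'))
```

Tags are removed before entities are decoded so that an escaped "&lt;" in a title is kept as a literal character instead of being mistaken for markup.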

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
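
Step 3 can be scripted instead of edited by hand. A rough sketch using Python's standard xml.etree.ElementTree; the helper name rewrite_language is invented, and the sample document is a made-up minimal channel, not the real TASS feed. Note that the follow-up message in this thread reports that changing the language code did not help.

```python
import xml.etree.ElementTree as ET

def rewrite_language(rss_xml, new_code):
    """Return the RSS document with every <language> element set to new_code."""
    root = ET.fromstring(rss_xml)
    for lang in root.iter('language'):
        lang.text = new_code
    return ET.tostring(root, encoding='unicode')

# Minimal stand-in for a downloaded feed.
sample = (
    '<rss version="2.0"><channel>'
    '<title>ТАСС</title><language>ru-ru</language>'
    '</channel></rss>'
)

patched = rewrite_language(sample, 'ru_RU')
print(patched)
```

The patched text would then be saved somewhere the test room's Remote Retrieval entry can fetch it, as in step 4.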

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Mon Oct 26 2020 09:05:29 EDT from HenkK @ Uncensored

Subject: IMAP Access via IOS

Hello,

I have Citadel running on a RPI 4.

I am not an IT specialist and used the "Easy Install script".

The RPI does receive mails destined for user@mydomain.com, I can access them via Webcit on my local network.

I was also able to pick up the messages via my MacBook Air (Catalina) and my iPhone and iPad outside my local network.

I used port 143 without SSL.

I never could get IMAP via SSL to work via port 993.

 

From one day to the next, IMAP via iPhone and iPad stopped working. My MacBook Air still works OK (IMAP and SMTP).

Deleting the IMAP account on the iPhone and iPad and setting it up again gives me the error "The IMAP server mydomain.com doesn't support Password authentication".

Trying other authentication methods like MD5 and NTLM does not work either.

 

My question: what can cause this, and how can I find out?

Why did it change, and why is it still working on my MacBook?
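
One way to find out what the server offers on port 143 is to look at the capabilities it advertises before login. In IMAP (RFC 3501), LOGINDISABLED means plain LOGIN is refused, typically until STARTTLS has been done, and AUTH=... entries list the SASL mechanisms on offer. Whether that is what is happening here is only a guess. The sketch below parses a capability list; the sample list and the host name in the comment are made up.

```python
def summarize_capabilities(capabilities):
    """Split an IMAP CAPABILITY list into auth mechanisms and notable flags."""
    caps = [c.upper() for c in capabilities]
    return {
        'auth_mechanisms': [c[len('AUTH='):] for c in caps if c.startswith('AUTH=')],
        'login_disabled': 'LOGINDISABLED' in caps,
        'starttls': 'STARTTLS' in caps,
    }

# Made-up capability list for illustration, not taken from a real Citadel server.
example = ('IMAP4REV1', 'STARTTLS', 'LOGINDISABLED', 'AUTH=PLAIN')
summary = summarize_capabilities(example)
print(summary)

# To check a real server (hypothetical host name), something like:
#   import imaplib
#   conn = imaplib.IMAP4('mail.example.com', 143)
#   print(summarize_capabilities(conn.capabilities))
#   conn.logout()
```

Comparing the output from inside and outside the local network may also show whether something in between is changing what the phone sees.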

 

The log files I found under /usr/local/citadel/data do not help.

 

My second question: how do I set up SSL so it will work from my Apple devices?

 

I am grateful for any suggestion; unfortunately I did not find a solution here or elsewhere on the internet.

Thanks, Henk



[#] Mon Oct 26 2020 09:10:35 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Just wanted to let you know that <p>VTimes</p> is in the XML source.

Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know whether I can rely on cit929 given these RSS issues. Otherwise, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I commented out the rm -rf in easyinstall.sh and then tried to run ./configure. It complained about sieve2.h: "configure: error: sieve2.h was not found and is required". But I just did a fresh install of Citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which as I expected, appears to be related.. so it's  going to take more research.

It is in the <author> field in XML source. So, it isn't really "injected", it was there to begin with.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
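
Steps 2 and 3 of the test above could be automated with a small wrapper. This is only a sketch: the function names are made up for illustration, the feed is assumed to be served as UTF-8, and it is an open question in this thread whether the <language> value has any effect on how Citadel parses the feed.

```python
import re
import urllib.request

FEED_URL = "http://tass.ru/rss/v2.xml"  # the feed discussed in this thread


def fix_language_tag(xml_text: str, new_code: str = "ru_RU") -> str:
    """Replace the content of the first <language> element with new_code."""
    return re.sub(
        r"<language>[^<]*</language>",
        lambda m: f"<language>{new_code}</language>",
        xml_text,
        count=1,
    )


def fetch_and_fix(url: str = FEED_URL) -> str:
    """Download the feed (assumed UTF-8) and return it with the language tag rewritten."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw = resp.read()
    return fix_language_tag(raw.decode("utf-8"))
```

Calling fetch_and_fix() and writing the result to a file on a web server would give a URL that can be added under Remote Retrieval, as in step 4.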

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And what is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject that starts with English/Latin characters, they ARE displayed correctly; and if you scroll through that feed in Citadel, you will notice that a few articles begin either with English characters or delimiters.
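
For reference, "the encoding specified as the first thing in the Subject header" is how RFC 2047 encoded-words work: the header value becomes plain ASCII chunks of the form =?charset?encoding?data?=. A minimal sketch using Python's standard email.header module (this only shows the wire format, not what Citadel does internally):

```python
from email.header import Header, decode_header, make_header

subject = ("Песков заявил, что решение Нурмагомедова завершить карьеру "
           "является его выбором")

# Encode the non-ASCII subject as RFC 2047 encoded-words.
# The result is pure ASCII and names its charset up front.
encoded = Header(subject, charset="utf-8").encode()

# A reader has to reverse this step; code that only understands
# 7-bit ASCII and skips the decoding cannot show the Cyrillic text.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)
```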

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8-R, KOI8-U and Windows-1251 (Windows-1252 is the Western European code page, not a Cyrillic one). So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # treats the UTF-8 bytes as if they were KOI8-U (mbstring, not iconv)
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
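
The same experiment can be sketched in Python, which also shows that this kind of mojibake is reversible as long as nothing has been stripped along the way. The variable names are just for illustration:

```python
import html

title = ('Студентов университета &quot;Сириус&quot; в Сочи научили '
         'отражать кибератаки на банкоматы')

# Decode the HTML entities first, as the PHP snippet does.
text = html.unescape(title)

# Misread the UTF-8 bytes as KOI8-U: this produces the box-drawing mojibake.
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("koi8_u")

# Undo the damage by reversing the two steps.
repaired = mojibake.encode("koi8_u").decode("utf-8")

print(mojibake)
print(repaired)
```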

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml  (which is a standard SMTP article format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and push the headers menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
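
For comparison, here is a minimal sketch of a message that does carry an explicit charset, built with Python's standard email package. The From address is a placeholder, and this says nothing about how Citadel constructs its messages; it only shows what the headers look like when the charset is declared:

```python
from email.message import EmailMessage

subject = ("Песков заявил, что решение Нурмагомедова завершить карьеру "
           "является его выбором")
body = ("<p>Всего на счету 32-летнего бойца смешанных единоборств "
        "29 побед и ни одного поражения в ММА</p>")

msg = EmailMessage()
msg["From"] = "rss@example.org"  # placeholder address, not from the thread
msg["Subject"] = subject

# Declare the body as HTML with an explicit UTF-8 charset.
msg.set_content(body, subtype="html", charset="utf-8")

# On the wire, the Subject is written as RFC 2047 encoded-words and the
# Content-Type header carries the charset parameter.
wire = msg.as_bytes()
print(wire.decode("utf-8"))
```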

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit dirs under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created the accounts in Thunderbird to IMAP access citadel account. So, the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewing from Citadel, are absent on TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine as automatically copied via IMAP to Thunderbird. That tells me the headers do exist, and the reason we see them as absent in Citadel is that they are being trimmed down to whatever English/Latin characters happen to be at the BEGINNING of the Subject header.
In Thunderbird, however, even though the Subject header looks fine, the article body is scrambled (see below).

It is like KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is a default for XML feeds.
So, the article body encoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.
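
One way to test this theory on the raw bytes is a strict UTF-8 decode. Text in KOI8-R or Windows-1251 almost never forms valid UTF-8, while real UTF-8 decodes cleanly. A minimal sketch (the helper name is made up for illustration):

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


sample = "Пока оно предназначено для сотрудников"

print(is_valid_utf8(sample.encode("utf-8")))   # real UTF-8 bytes
print(is_valid_utf8(sample.encode("koi8_r")))  # KOI8-R bytes
print(is_valid_utf8(sample.encode("cp1251")))  # Windows-1251 bytes
```

If the bytes of the feed and of the stored article both pass this check, the body is UTF-8 all the way through, and the scrambling is more likely a missing charset declaration than a KOI8 source.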

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems of starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I deinstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such a problem in the article body. This RSS feed below is from the biggest Russian news agency and so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why such a simple thing as string copy of title to Subject: header would not work?

Furthermore, all the Subject headers that would start with English/Latin words would only show up the English/Latin characters, but everything else would get chopped off, starting with european languages delimiters and non-English/Latin characters.
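
The symptom described above is what you would get from any step that keeps only the leading run of 7-bit ASCII characters. The sketch below is purely hypothetical: it is not Citadel's code, the function name is invented, and the second sample subject is made up. It only reproduces the observed behaviour so it can be compared against real headers:

```python
def ascii_prefix(subject: str) -> str:
    """Return the leading run of 7-bit ASCII characters (hypothetical illustration)."""
    kept = []
    for ch in subject:
        if ord(ch) > 127:
            break  # stop at the first non-ASCII character
        kept.append(ch)
    return "".join(kept)


# A subject that starts with Cyrillic is reduced to nothing.
print(repr(ascii_prefix("В Москве введут скидки на проезд на двух линиях метро")))

# A made-up subject that starts with Latin characters keeps only that prefix.
print(repr(ascii_prefix("UFC: Нурмагомедов завершил карьеру")))
```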

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing people read in the majority of cases: the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in vast majority of cases and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes)