[#] Sun Oct 25 2020 16:35:43 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
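
Step 3 above can be scripted rather than edited by hand. The following is only a minimal sketch: it assumes the feed declares its language in a single `<language>` element, and the helper name `rewrite_language` is made up for illustration. Whether changing the language code has any effect on Citadel is exactly what the test is trying to find out.

```python
import re

# Matches the first <language>...</language> element and captures its tags.
LANGUAGE_RE = re.compile(r"(<language>)\s*[^<]*?\s*(</language>)")


def rewrite_language(xml_text: str, new_code: str = "ru_RU") -> str:
    """Replace the content of the first <language> element with new_code."""
    return LANGUAGE_RE.sub(
        lambda m: m.group(1) + new_code + m.group(2), xml_text, count=1
    )


# Tiny stand-in for the downloaded feed (not the real TASS document).
sample = "<rss><channel><language>ru-ru</language><title>ТАСС</title></channel></rss>"
print(rewrite_language(sample))
```

The edited text can then be saved to a file on the VPS and used as the feed URL in step 4.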

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a broken article header in Citadel, click the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to nothing.

What is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the Subject header.

So, if you look at an article with a truncated Subject that starts with English/Latin characters, they ARE displayed correctly, and if you scroll through that feed in Citadel, you will notice that a few articles begin with either English characters or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229
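
The remark above about the encoding appearing first in the Subject header matches how RFC 2047 "encoded-words" look on the wire: `=?charset?encoding?data?=`. As a hedged illustration only (this uses Python's standard `email.header` module, not Citadel's code), here is how such a Subject is produced and decoded again:

```python
from email.header import Header, decode_header, make_header

subject = (
    "Песков заявил, что решение Нурмагомедова "
    "завершить карьеру является его выбором"
)

# Encode the non-ASCII subject as RFC 2047 encoded-words,
# each of the form =?charset?encoding?data?=
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# Decode the encoded-words back into a plain Unicode string.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

A reader that does not understand encoded-words, or that stops at the first byte it cannot handle, would show little or nothing of such a Subject.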

Well, there are several Cyrillic encodings, including KOI8-R, KOI8-U and Windows-1251 (Windows-1252 is Western European, not Cyrillic). So which one we are dealing with is not quite clear, and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # mbstring: treats the UTF-8 bytes as KOI8-U, which produces mojibake
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
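
For readers without PHP at hand, the same effect can be reproduced in Python. This is a sketch of the mechanism only (correct UTF-8 bytes decoded with the wrong codec); it does not claim that this is what Citadel does internally.

```python
title = 'Студентов университета "Сириус" в Сочи'

# Encode correctly as UTF-8, then decode the bytes with the wrong codec.
mojibake = title.encode("utf-8").decode("koi8_u")
print(mojibake)

# The damage is reversible as long as no bytes were dropped along the way.
restored = mojibake.encode("koi8_u").decode("utf-8")
print(restored == title)
```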

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the XML file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (a standard Internet message file, headers followed by body):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email and clicking the [headers] menu choice in the article), what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
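
For comparison, this is how a message with an explicit charset can be built with Python's standard `email` package. It is only a sketch of what a fully labelled message looks like; the sender address is a placeholder and nothing here is taken from Citadel's source.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "rss@example.org"  # placeholder address, not from the thread
msg["Subject"] = (
    "Песков заявил, что решение Нурмагомедова "
    "завершить карьеру является его выбором"
)

# Passing charset= makes the Content-Type header carry the charset parameter.
msg.set_content(
    "<html><body>Всего на счету 32-летнего бойца ...</body></html>",
    subtype="html",
    charset="utf-8",
)

print(msg["Content-Type"])
print(msg.get_content_charset())
```

Without the charset parameter, a client has to guess the body encoding, which is where clients can disagree with each other.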

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit directories under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there were no side effects or leftovers from my 8.24 version.

Then I created an account in Thunderbird for IMAP access to the Citadel account, so the RSS room would get copied to Thunderbird automatically.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any, at the BEGINNING of the Subject header.
In Thunderbird, however, even though the Subject header looks fine, the article body is scrambled (see below).

It is as if KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is the default for XML feeds.
So the article body is not recognized as being in a non-UTF-8 character set, and we are reading KOI-8 or Windows-1251 as UTF-8.
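
One way to test that theory on raw bytes is to check whether they are valid UTF-8 at all; KOI8-R or Windows-1251 Cyrillic text almost never is. A minimal sketch, with the helper name `is_valid_utf8` made up for illustration:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Всего на счету 32-летнего бойца"

print(is_valid_utf8(text.encode("utf-8")))   # genuine UTF-8
print(is_valid_utf8(text.encode("koi8_r")))  # KOI8-R bytes
print(is_valid_utf8(text.encode("cp1251")))  # Windows-1251 bytes
```

If the body bytes stored by Citadel pass this check, the feed content itself is UTF-8 and the scrambling comes from how a client interprets it.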

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, I now have 2 Citadel installations: one is the original 64-bit 929, and the other originally had cit 824 on it, runs on Ubuntu 18.04, and was upgraded in place to 929 via easyinstall.

Now, BOTH of these have the exact same problem with the Subject: header. So I am wondering what kind of miracle happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So I removed those packages and did the easyinstall version, and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. This RSS feed below is from the biggest Russian news agency and so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, the problem does not occur from then on. This is evident from the fact that the headers received under cit 824 still look fine and only newly received articles are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the XML source for the RSS feed http://tass.ru/rss/v2.xml, all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, all the Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.
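
The symptom described above can be modelled to predict which titles of a feed would survive. This is purely a sketch of the observed behaviour, not a claim about what Citadel's code does; the helper `ascii_prefix` and the "TASS:" title are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Small stand-in feed; the second title is a made-up example that starts in Latin.
RSS = (
    "<rss><channel>"
    "<item><title>В Москве введут скидки на проезд на двух линиях метро</title></item>"
    "<item><title>TASS: Песков заявил</title></item>"
    "</channel></rss>"
)


def ascii_prefix(text: str) -> str:
    """Return the leading run of ASCII characters, modelling the truncation."""
    kept = []
    for ch in text:
        if ord(ch) > 127:
            break
        kept.append(ch)
    return "".join(kept)


for title in ET.fromstring(RSS).iterfind("./channel/item/title"):
    print(repr(title.text), "->", repr(ascii_prefix(title.text)))
```

Titles that begin with a Cyrillic letter come out empty, and titles with a Latin lead-in keep only that lead-in, which is the pattern reported for the TASS room.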

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


I think you can take a break for now platonov.. although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting header injection.. which, as I expected, appears to be related.. so it's going to take more research.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро


It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
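
If HTML entities and stray markup in feed fields are indeed the trigger, a wrapper could normalise the text before it reaches a header. A minimal sketch under that assumption; the helper name `clean_feed_text` is hypothetical and the tag stripping is deliberately naive.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_feed_text(raw: str) -> str:
    """Decode HTML entities, drop markup, and collapse whitespace."""
    text = html.unescape(raw)      # &quot; becomes "
    text = TAG_RE.sub("", text)    # remove tags such as <p>...</p>
    return " ".join(text.split())  # newlines and padding would break a header line


print(clean_feed_text("<p>          VTimes       </p>"))
print(clean_feed_text("Студентов университета &quot;Сириус&quot;"))
```

Collapsing whitespace matters for headers in particular, because an embedded newline ends the header line and lets the rest of the text spill into the following headers.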

 

 

[#] Sun Oct 25 2020 17:14:25 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


 

Sun Oct 25 2020 16:34:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Still trying some tests.. I've got "Network run frequency" set down to 10 minutes.. (as low as I can go safely.. ) and still trying to make sense of it. 

The problem does not affect the TASS English feed, https://tass.ru/en/rss/v2.xml, at all..

Feed Validator had a few complaints about the russian feed ..  https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ftass.ru%2Frss%2Fv2.xml

This issue exists on ANY RSS feed, not only the Russian one. Take for example this one: http://feeds.feedburner.com/blacklistednews/hKxa and look at the shortened or absent Subject headers, then click the [headers] article menu choice. That one has a few very visibly damaged article Subjects.

There may be some other issues in the XML. 

I am trying some other Russian language feeds to see if I can get anything to parse.. do you know of ANY RUSSIAN LANGUAGE FEED that currently PARSES (successfully) in CITADEL?

It MAY be a problem (or, something requiring further configuration) in libexpat - the XML Parser..

Well, I'd still like to know how come the Content-type header in Citadel's version of the articles is set to Content-type: text/html and not to Content-Type: text/html; charset=UTF-8. What could be the reason for not specifying the charset? As far as I can see, UTF-8 is pretty much the default for everything nowadays. ASCII characters take a single byte in it, which keeps string operations fast and simple.
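
To make the charset question concrete, here is a minimal sketch in Python (not Citadel code) of what a message looks like when the charset is declared explicitly and the non-ASCII Subject is carried as an RFC 2047 encoded-word. The function name and the rss@example.org address are placeholders made up for illustration; the Subject is the one from the TASS example in this thread.

```python
from email import policy
from email.message import EmailMessage


def build_rss_article(subject: str, html_body: str) -> EmailMessage:
    """Build a message whose Content-Type names its charset explicitly."""
    msg = EmailMessage(policy=policy.default)
    msg["From"] = "rss@example.org"  # placeholder address, not from the thread
    msg["Subject"] = subject         # non-ASCII text is encoded on output
    # subtype="html" plus charset="utf-8" yields: text/html; charset="utf-8"
    msg.set_content(html_body, subtype="html", charset="utf-8")
    return msg


msg = build_rss_article(
    "Матыцин назвал Нурмагомедова выдающимся борцом современности",
    "<p>...</p>",
)
# Everything before the first blank line is the header block.
header_block = msg.as_bytes().split(b"\n\n", 1)[0]
print(msg["Content-Type"])
print(header_block.decode("ascii"))
```

With the charset stated and the Subject encoded this way, a reader does not have to guess how to interpret the bytes. Whether the missing charset is what actually breaks the Subject display in Citadel is still an open question here.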

Hopefully, if we can figure out the issue, you can recreate the feed with a wrapper script..

Get yourself cscope, which is available in Linux and Windows. Should take a few minutes to build on Linux. It needs libncurses5, which you can install on Ubuntu:

sudo apt-get install libncurses5-dev libncursesw5-dev

Here's Cscope Tutorial: https://courses.cs.washington.edu/courses/cse451/12sp/tutorials/tutorial_cscope.html

 

Sun Oct 25 2020 02:29:17 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to zero length.

And, what is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject whose Subject starts with English/Latin characters, they ARE displayed correctly, and if you scroll through that feed in Citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8-R, Windows-1251 and ISO-8859-5. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." Got stripped out of my last message..

<?php

# An RSS title that mixes HTML entities with UTF-8 Cyrillic text
$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s);  # &quot; becomes "
# mb_convert_encoding(string, to, from): treat the UTF-8 bytes as KOI8-U and convert them to UTF-8
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # mbstring, not iconv
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
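
The same effect can be reproduced in a few lines of Python, as a sketch for anyone without PHP at hand. It only demonstrates what happens when UTF-8 bytes are read as KOI8-U, and that the damage is reversible; it says nothing about where (or whether) Citadel does this. The function names are made up for illustration.

```python
def utf8_read_as_koi8u(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as KOI8-U."""
    return text.encode("utf-8").decode("koi8_u")


def repair(mojibake: str) -> str:
    """Undo the mistake: recover the raw bytes and decode them as UTF-8."""
    return mojibake.encode("koi8_u").decode("utf-8")


original = 'Студентов университета "Сириус"'
garbled = utf8_read_as_koi8u(original)
print(garbled)          # same kind of garbage as the PHP output above
print(repair(garbled))  # back to the readable title
```

Because KOI8-U assigns a character to every byte value, the round trip loses nothing, which is why text garbled this way can be fixed after the fact.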

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..
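
A rough sketch of the rewriting step such a wrapper script would need, in Python. It assumes the only change wanted is the one tried in this thread (ru-ru becoming ru_RU); the function name and the regular expression are illustrative, and fetching the feed and serving the edited copy are left out. Another message in this thread reports that making this edit by hand did not change the result, so treat it as an experiment helper rather than a fix.

```python
import re

# Matches e.g. <language>ru-ru</language> or <language>en_US</language>
LANG_RE = re.compile(
    r"<language>\s*([A-Za-z]{2,3})[-_]([A-Za-z]{2})\s*</language>"
)


def rewrite_language(xml_text: str, sep: str = "_") -> str:
    """Rewrite the first <language> tag as ll<sep>CC, e.g. ru_RU."""
    def fix(match: re.Match) -> str:
        lang = match.group(1).lower()
        region = match.group(2).upper()
        return f"<language>{lang}{sep}{region}</language>"

    # count=1: an RSS channel carries a single <language> element
    return LANG_RE.sub(fix, xml_text, count=1)


sample = "<rss><channel><language>ru-ru</language></channel></rss>"
print(rewrite_language(sample))
```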

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml  (which is a standard SMTP article format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (doing view as email) and pushing headers menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed /usr/local/ citadel, ctdlsupport and webcit dirs (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created the accounts in Thunderbird to IMAP access citadel account. So, the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to whatever pure English/Latin characters are present at the BEGINNING of the subject header.
In Thunderbird, however, even though the Subject header looks fine, the article body is scrambled. (see below).

It is as if KOI-8 encoded characters (in the body) were interpreted as UTF-8, which is the default for XML feeds.
So whatever renders the article body does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems of starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. This RSS feed below is from the biggest Russian news agency and so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, all the Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting at the first European-language delimiter or non-English/Latin character.
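
To predict which titles in a feed would be affected, the symptom itself can be modelled in a few lines of Python. This is NOT Citadel's code and makes no claim about the real cause; it only mimics the observed behaviour (keep the leading ASCII characters, drop everything from the first non-ASCII character on). The function name and the sample titles are made up for illustration.

```python
def ascii_prefix(subject: str) -> str:
    """Return the part of the subject before its first non-ASCII character."""
    kept = []
    for ch in subject:
        if ord(ch) > 127:  # first non-ASCII character: stop here
            break
        kept.append(ch)
    return "".join(kept).rstrip()


for title in ("UFC: Нурмагомедов завершил карьеру", "Песков заявил", "Plain title"):
    print(repr(title), "->", repr(ascii_prefix(title)))
```

A title that starts in Cyrillic comes out empty, which matches the "absent" Subjects, while one that starts with Latin text keeps only that part.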

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing a Subject header is losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in vast majority of cases and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

[#] Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know if I can rely on cit929 with these RSS issues. Else, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I commented out the rm -rf in easyinstall.sh and then tried to run ./configure. It stopped with "configure: error: sieve2.h was not found and is required". But I just did a fresh install of Citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which as I expected, appears to be related.. so it's  going to take more research.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
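
As a sketch of the kind of cleanup a feed title or author field like the one above would need before being used as a header, here is a small Python helper. It is an illustration only, not how Citadel processes feeds; the function name and the tag-stripping regular expression are assumptions made for the example.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive match for literal markup such as <p>


def clean_feed_text(raw: str) -> str:
    """Strip literal tags, decode HTML entities, collapse whitespace."""
    without_tags = TAG_RE.sub(" ", raw)
    # Entities are decoded after tag stripping, so an escaped &lt;p&gt;
    # survives as visible text instead of being removed as markup.
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


print(clean_feed_text("<p>          VTimes       </p>"))
print(clean_feed_text("Студентов университета &quot;Сириус&quot; в Сочи"))
```

The first call reduces the Return-Path value shown above to the bare name, and the second turns the entities in the TASS title into plain quotation marks while leaving the Cyrillic text untouched.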

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And, what is even more interesting and is proof is that we are dealing with character encoding problem is that the encoding is specified as the fist thing in the SMTP Subject header.

So, if you look at the article with trunkated Subject and Subject starts with English/Latin characters, the ARE displayed correctly and if you scroll that feed in citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI, Windows-1251, 1252. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is the probably the problem....." Got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji= mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # uses iconv
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbirds rss parser is probably advanced enough to figure out what the characters set should be, regardless of the language specified in the xml file..

Citadel probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml  (which is a standard SMTP article format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (doing view as email) and pushing headers menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed /usr/local/ citadel, ctdlsupport and webcit dirs (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created the accounts in Thunderbird to IMAP access citadel account. So, the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewing from Citadel, are absent on TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine as automatically copied via IMAP to Thunderbird. That means to me that headers do exist and the reason we see them as absent is because they are being trimmed down to pure English/Latin characters if present in the BEGINNING of the subject header.
Except in Thunderbird even though the Subject header looks fine, but article body is scrambled. (see below).

It is like KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is a default for XML feeds.
So, the article body encoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems of starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. May be something did not get completely removed when I deinstalled the package.

Since you are saying it works on your box, then what is the difference between your and mine.

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I's be even willing to do something drastic to verify that it is not a problem of dpkg version of install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used is something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such a problem in the article body. This RSS feed below is from the biggest Russian news agency and so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why such a simple thing as string copy of title to Subject: header would not work?

Furthermore, all the Subject headers that would start with English/Latin words would only show up the English/Latin characters, but everything else would get chopped off, starting with european languages delimiters and non-English/Latin characters.

So... Where is the problem in this case?

I really can't have this issue slide and it's a shame to loose some major news feed because of such an obvious issue. Loosing a Subject header is loosing about the only thing that people read in majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database

from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in vast majority of cases and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Mon Oct 26 2020 09:05:29 EDT from HenkK @ Uncensored

Subject: IMAP Access via IOS

[Reply] [ReplyQuoted] [Headers] [Print]

Hello,

I have Citadel running on a RPI 4.

I am not an IT specialist and used the "Easy Install script".

The RPI does receive mails destined for user@mydomain.com, I can access them via Webcit on my local network.

I was also able to pick up the messages via my MacBook Air (Catalina) and my iPhone and iPad outside my local network.

I used port 143 without SSL.

I never could get IMAP via SSL to work via port 993.

 

From one day to the next, IMAP via iPhone and iPad stopped working. My MacBook Air still works OK (IMAP and SMTP).

Deleting the IMAP account on iPhone and iPad and setting it up again gives me the error "The IMAP server mydomain.com doesn't support Password authentication".

Trying other authentication methods such as MD5 and NTLM does not work either.

 

My question: what can cause this, and how can I find out?

Why did it change and why is it still working on my MacBook?

 

The log files I found under /usr/local/citadel/data do not help.

 

My second question: how do I set up SSL so that it works from my Apple devices?

 

I am grateful for any suggestion; unfortunately I did not find a solution here or elsewhere on the internet.

Thanks, Henk



[#] Mon Oct 26 2020 09:10:35 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

Just wanted to let you know that <p>VTimes</p> is in the XML source.

Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know if I can rely on cit929 with these RSS issues. Else, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I did comment out the rm -rf in easyinstall.sh and then tried to do ./configure. It complained about sieve2.h: "configure: error: sieve2.h was not found and is required". But I just did a fresh install of citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting a header injected.. which, as I expected, appears to be related.. so it's going to take more research.

It is in the <author> field in XML source. So, it isn't really "injected", it was there to begin with.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
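
To make the two issues above concrete (markup in the author field, and HTML entities mixed with UTF-8 Russian in a title), here is a minimal Python sketch. It is not Citadel's code, and `clean_feed_text` is a hypothetical helper; it only shows one way a feed field could be cleaned before it is used as a header value.

```python
import html
import re


def clean_feed_text(raw: str) -> str:
    """Hypothetical helper: strip markup and decode HTML entities in a feed field."""
    no_tags = re.sub(r"<[^>]+>", "", raw)   # drop tags such as <p> ... </p>
    decoded = html.unescape(no_tags)        # &quot; becomes a literal double quote
    return " ".join(decoded.split())        # collapse runs of whitespace


# Values taken from the messages in this thread.
author = clean_feed_text("<p>          VTimes       </p>")
title = clean_feed_text("Студентов университета &quot;Сириус&quot; в Сочи")

print(author)
print(title)
```

The Cyrillic text passes through untouched, since `html.unescape` only rewrites entity references; whether Citadel's own path treats the two steps separately is something only the source can confirm.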

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
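
Steps 2 and 3 of that test could be scripted roughly like this. It's only a sketch: `rewrite_language` is a made-up helper name, the sample string stands in for the downloaded feed, and the follow-up message in this thread reports that the edit did not change the result, so treat it as a way to repeat the experiment rather than a fix.

```python
import re


def rewrite_language(xml_text: str, new_code: str = "ru_RU") -> str:
    """Hypothetical helper: replace the content of the first <language> element."""
    return re.sub(
        r"<language>[^<]*</language>",
        f"<language>{new_code}</language>",
        xml_text,
        count=1,
    )


# Stand-in for the downloaded feed; in real use the text could come from
# urllib.request.urlopen("http://tass.ru/rss/v2.xml").read().decode("utf-8").
sample = "<rss><channel><language>ru-ru</language><title>TASS</title></channel></rss>"
patched = rewrite_language(sample)
print(patched)
```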

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And what is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject, and the Subject starts with English/Latin characters, they ARE displayed correctly, and if you scroll that feed in Citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229
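
If "the encoding is specified as the first thing in the Subject header" refers to RFC 2047 encoded-words (that is my assumption, the raw bytes are not shown here), then the stored header would look like `=?utf-8?b?...?=` and any display path has to decode it before showing it. A small Python sketch of that round trip, using only the standard `email.header` module:

```python
from email.header import Header, decode_header, make_header

subject = "Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором"

# Encode the way a mail header carries non-ASCII text (RFC 2047 encoded-words).
wire = Header(subject, charset="utf-8").encode()
print(wire)

# Decode it back. A viewer that skips this step has nothing readable to show.
decoded = str(make_header(decode_header(wire)))
print(decoded)
```

This says nothing about where Citadel or WebCit does the decoding; it only shows what a correct decode of such a header gives back.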

Well, there are several Cyrillic encodings, including KOI8-R and Windows-1251. So, which one we are dealing with is not quite clear, and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s);  # turn &quot; into a literal double quote
$moji = mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # treat the UTF-8 bytes as KOI8-U and re-encode as UTF-8 (mbstring)
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
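
For anyone without PHP handy, the same experiment can be sketched in Python. This is my rendering of the snippet above (with a shortened title), not something from Citadel: it reinterprets UTF-8 bytes as KOI8-U, and also shows that the damage can be undone as long as no bytes were lost.

```python
import html

s = "Студентов университета &quot;Сириус&quot; в Сочи"
s = html.unescape(s)

# Reinterpret the UTF-8 bytes as KOI8-U: every byte becomes one wrong character.
moji = s.encode("utf-8").decode("koi8_u")
print(moji)

# The mix-up is reversible because KOI8-U assigns a character to every byte.
restored = moji.encode("koi8_u").decode("utf-8")
print(restored == s)
```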

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml  (which is a standard SMTP article format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and push the [Headers] menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
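
For comparison, this is what a message that does declare its charset looks like when built with Python's standard `email` package. It is only an illustration of the header being discussed, with a placeholder sender address; it says nothing about how serv_rssclient.c builds its messages.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "rss <rss@example.org>"  # placeholder address, not from the thread
msg["Subject"] = "Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором"

# set_content() with a str body records the charset in the Content-Type header.
msg.set_content(
    "Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА",
    subtype="html",
    charset="utf-8",
)

print(msg["Content-Type"])
```

A reader that receives `Content-type: text/html` with no charset parameter has to guess, which would fit the different results seen in WebCit and Thunderbird.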

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit dirs under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side effects or residue left over from my 8.24 version.

Then I created the accounts in Thunderbird for IMAP access to the Citadel account, so the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. That tells me the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any, at the BEGINNING of the subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is as if KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is the default for XML feeds.
So, the article body decoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.
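
One way to check the "is it really UTF-8?" question on raw bytes is to try a strict UTF-8 decode first and only then fall back to a legacy Cyrillic codec. A rough Python sketch follows; `guess_text` and its fallback order are my own assumptions. Note that it cannot tell Windows-1251 from KOI8 reliably, since almost any byte string decodes under both.

```python
def guess_text(raw, fallbacks=("cp1251", "koi8_r")):
    """Hypothetical helper: return (text, encoding), trying strict UTF-8 first."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for name in fallbacks:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Nothing fit cleanly: keep what can be kept and mark the rest.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


sample = "Пока оно предназначено для сотрудников"
print(guess_text(sample.encode("utf-8"))[1])
print(guess_text(sample.encode("cp1251"))[1])
```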

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems of starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, all the Subject headers that start with English/Latin words would show only the English/Latin characters; everything else would get chopped off, starting with European-language delimiters and non-English/Latin characters.

So... Where is the problem in this case?
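
To predict which titles in a feed would be hit, the symptom described above (only the leading English/Latin run survives) can be mimicked in a few lines of Python. This is NOT Citadel's code and makes no claim about where the bug is; `ascii_prefix` is a hypothetical helper, and the second title is a made-up example.

```python
def ascii_prefix(title: str) -> str:
    """Hypothetical helper: return the leading run of ASCII characters in a title."""
    out = []
    for ch in title:
        if ord(ch) > 127:  # first non-ASCII character ends the run
            break
        out.append(ch)
    return "".join(out)


titles = [
    "В Москве введут скидки на проезд на двух линиях метро",  # from this thread
    "UFC: Нурмагомедов завершил карьеру",                      # made-up example
]
for t in titles:
    print(repr(ascii_prefix(t)))
```

Running the titles of a feed through this and comparing against what WebCit shows would be a quick way to confirm or rule out that description of the symptom.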

[#] Mon Oct 26 2020 18:06:17 EDT from warbaby @ Uncensored

Subject: Re: IMAP Access via IOS

[Reply] [ReplyQuoted] [Headers] [Print]

The authentication method is 'plain' or 'plain text'.  The username should not be sent with the domain part .. it should be just the bare username (not user@host) and the password as plain text being sent for authentication.  This is the same for both IMAP and SMTP.

You'll have to determine if your client is able to do that. 

There are a variety of Android email clients I'm familiar with, not so much the iPhone stuff.. but I'm sure there are a few out there that will do it.
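
As a rough illustration of what "plain" authentication with a bare username means on the wire, here is a Python sketch. I'm assuming SASL PLAIN as defined in RFC 4616 (a client may equally use the IMAP LOGIN command, e.g. `imaplib.IMAP4.login(user, password)`); `bare_username` and `sasl_plain` are hypothetical helper names and the password is a placeholder.

```python
import base64


def bare_username(login: str) -> str:
    """Drop any @domain part: 'user@mydomain.com' becomes 'user'."""
    return login.split("@", 1)[0]


def sasl_plain(username: str, password: str) -> str:
    """Build the base64 token for SASL PLAIN: NUL, username, NUL, password."""
    raw = b"\0" + username.encode("utf-8") + b"\0" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


user = bare_username("user@mydomain.com")
token = sasl_plain(user, "secret")  # "secret" is a placeholder password
print(user, token)
```

Since base64 is not encryption, the password is readable by anyone who can see the traffic on port 143 without SSL.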

 

[#] Tue Oct 27 2020 04:34:28 EDT from HenkK @ Uncensored

Subject: Re: IMAP Access via IOS

[Reply] [ReplyQuoted] [Headers] [Print]

Thanks a lot for the tip.

Although I have a slightly different result than you describe.

I noticed that on my Mac the user name was set as user@mydomain.com

On my iOS devices, however, only "user".

I changed it and now it works again.

I have no idea whether this is specific to Apple or whether my provider does some filtering.

Again thanks. Now I will see if I can get SSL to work as well. Maybe that has something to do with the self-signed certificate.
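
On the self-signed certificate guess: a client normally refuses a certificate it cannot trace to a known authority, so one way to test the theory from a desktop is to tell a test client to trust that one certificate explicitly. A minimal Python sketch, where `context_for_self_signed` is a hypothetical helper and the certificate path is something you would export from the server yourself:

```python
import ssl


def context_for_self_signed(cert_path=None):
    """Hypothetical helper: TLS context that also trusts one given certificate file."""
    ctx = ssl.create_default_context()
    if cert_path is not None:
        # Trust the server's own certificate instead of switching verification off.
        ctx.load_verify_locations(cafile=cert_path)
    return ctx


ctx = context_for_self_signed()  # pass the exported .pem path in real use
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The context could then be handed to `imaplib.IMAP4_SSL("mydomain.com", 993, ssl_context=ctx)`. The host name in the certificate still has to match the name the client connects to, which is a common stumbling block with self-signed certificates.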

 

[#] Wed Oct 28 2020 02:27:20 EDT from HenkK @ Uncensored

Subject: Re: IMAP Access via IOS

[Reply] [ReplyQuoted] [Headers] [Print]

Thanks a lot for the tip.

Although I have a slightly different result than you describe.

I noticed that on my Mac the user name was set as user@mydomain.com

On my iOS devices, however, only "user".

I changed it and now it works again.

I have no idea whether this is specific to Apple or whether my provider does some filtering.

Again thanks. Now I will see if I can get SSL to work as well. Maybe that has something to do with the self-signed certificate.

 

[#] Wed Oct 28 2020 14:03:07 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

The Subject: header is shown correctly while viewing the articles using textclient. You can view the same article from webcit and text client and see the difference.

Mon Oct 26 2020 09:10:35 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Just wanted to let you know that <p>VTimes</p> is in the XML source.

Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know if I can rely on cit929 with these RSS issues. Else, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I did comment out the rm -rf in the easyinstall.sh and then tried to do ./configure. It complained about sieve2.h configure: error: sieve2.h was not found and is required. But I just did a fresh install on citadel yesterday, so... why is it getting stuck there and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which as I expected, appears to be related.. so it's  going to take more research.

It is in the <author> field in XML source. So, it isn't really "injected", it was there to begin with.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

And, what is even more interesting and is proof is that we are dealing with character encoding problem is that the encoding is specified as the fist thing in the SMTP Subject header.

So, if you look at the article with trunkated Subject and Subject starts with English/Latin characters, the ARE displayed correctly and if you scroll that feed in citadel, you will notice that a few articles begin either with English chars or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI, Windows-1251, 1252. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is the probably the problem....." Got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji= mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # uses iconv
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the XML file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)
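
To make the wrapper-script idea concrete, here is a rough Python sketch. It only automates the experiment (rewriting `<language>ru-ru</language>` to `<language>ru_RU</language>`); whether that changes anything in Citadel is exactly what is being tested, so treat it as a test aid, not a fix. The function names are hypothetical, and the sample feed is a stand-in, not the real TASS data.

```python
import re
import urllib.request

# Matches tags such as <language>ru-ru</language> (byte pattern, so the
# feed's own encoding is left untouched).
LANG_RE = re.compile(rb"<language>\s*([A-Za-z]{2,3})-([A-Za-z]{2})\s*</language>")


def fix_language_tag(feed: bytes) -> bytes:
    """Rewrite xx-yy language codes to the xx_YY form, leaving everything else as is."""
    def repl(match):
        lang = match.group(1).lower()
        region = match.group(2).upper()
        return b"<language>" + lang + b"_" + region + b"</language>"
    return LANG_RE.sub(repl, feed)


def fetch_and_fix(url: str) -> bytes:
    """Download a feed and return it with the language tag rewritten."""
    with urllib.request.urlopen(url) as resp:
        return fix_language_tag(resp.read())


sample = b"<rss><channel><language>ru-ru</language></channel></rss>"
print(fix_language_tag(sample).decode("ascii"))
```

The output of `fetch_and_fix("http://tass.ru/rss/v2.xml")` could then be saved to a file on a web server and added as a feed under Remote Retrieval.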

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (which is the standard Internet message format):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing it as email) and push the [Headers] menu choice in the article, this is what we get:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?
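
For comparison, here is a small Python sketch of what a message with the charset declared looks like when it is built with the standard `email` package. This is not how Citadel builds its messages; the address and the body text are placeholders, and `build_article` is a made-up helper name.

```python
from email.message import EmailMessage


def build_article(subject: str, html_body: str) -> EmailMessage:
    """Build an HTML message whose Content-Type carries an explicit charset."""
    msg = EmailMessage()
    msg["From"] = "rss <rss@example.org>"  # placeholder address
    msg["Subject"] = subject               # non-ASCII gets encoded when serialized
    msg.set_content(html_body, subtype="html", charset="utf-8")
    return msg


msg = build_article("Песков заявил", "<p>Всего на счету</p>")
print(msg["Content-Type"])  # includes the charset parameter
print(msg.as_string())      # Subject is serialized as an encoded-word
```

The two things to look for in the output are the `charset` parameter on `Content-Type` and the `=?utf-8?...?=` form of the Subject. Both are missing from the Citadel headers shown above.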

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit dirs under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side effects or leftovers from my 8.24 version.

Then I created an account in Thunderbird for IMAP access to the Citadel account, so the RSS room would get copied to Thunderbird automatically.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when copied automatically via IMAP to Thunderbird. That tells me the headers do exist, and the reason we see them as absent is that they are being trimmed down to just the English/Latin characters, if any are present at the BEGINNING of the Subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It looks as if KOI-8 encoded characters (in the body) are being interpreted as UTF-8, which is the default for XML feeds.
So the article body is not recognized as being in a non-UTF-8 character set, and we end up reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the exact same problem with the Subject: header. So, I am wondering what kind of miracle happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the XML source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why would such a simple thing as a string copy of the title to the Subject: header not work?
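
A quick way to confirm that the titles in the feed source are intact is to parse the XML outside Citadel and list them. The following is a minimal Python sketch using the standard library; the embedded sample feed is a made-up stand-in shaped like an RSS 2.0 document (only the first title is taken from this thread), and `item_titles` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up sample; a real test would read the downloaded feed file instead.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<title>Sample</title>
<language>ru-ru</language>
<item><title>Песков заявил</title><link>https://tass.ru/obschestvo/9812229</link></item>
<item><title>Latin start, затем кириллица</title><link>https://example.org/2</link></item>
</channel></rss>""".encode("utf-8")


def item_titles(feed: bytes) -> List[str]:
    """Return the <title> text of every <item>, as decoded by the XML parser."""
    root = ET.fromstring(feed)
    return [(item.findtext("title") or "").strip() for item in root.iter("item")]


def has_non_ascii(text: str) -> bool:
    """True if the text contains anything outside plain ASCII."""
    return any(ord(ch) > 127 for ch in text)


for title in item_titles(SAMPLE_FEED):
    print(has_non_ascii(title), title)
```

If the titles come out whole here but are cut short in the room, the loss happens after XML parsing, which is consistent with what the thread describes.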

Furthermore, all the Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting with the first European-language delimiter or non-English/Latin character.

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes) [ View | Download ]
[#] Wed Oct 28 2020 15:23:52 EDT from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

I'd like to ask whether this Subject header issue is being handled as an active issue, or whether it is being put on the back burner or labeled a "non-issue". In case it is being handled, here's some more info.

The message below with the Subject header correctly displayed was created with text client as a followup to an existing article and, even though the Subject header would not show up in webcit, it got "resurrected" when replying to the article.

The message header is a copy of [headers] article menu choice.
The Subject in the header below, even though it looks garbled, is actually displayed correctly as can be seen in the article during the article view (below).

Here is a copy of a complete article, preceded by the article headers.
Notice the =?UTF-8 at the start of the Subject: header. That is the encoded-word syntax from RFC 2047, which is the correct way of specifying the charset as far as I recall.

Return-Path: user
Date: Wed, 28 Oct 2020 20:22:30 +0200
References: 
Subject: =?UTF-8?B?UmU6INCi0LDQutGC0LDRgNC+0LI6INGD0LzQtdC90LjRjyDQk9GN0YLQttC4INCyINCx0L7RgNGM0LHQtSDQvtGH0LXQvdGMINC/0LXRgNC10L7RhtC10L3QuNC70LgsINC90L4g0Y3RgtC+INC90LUg0YPQvNCw0LvRj9C10YIg0LfQsNGB0LvRg9CzINCd0YPRgNC80LDQs9C+0LzQtdC00L7QstCw?=
Message-ID: 
From: "user" 

And this is the article itself. So, the Subject looks good indeed.

Wed Oct 28 2020 20:22:30 EET from adminski Subject: Re: Тактаров: умения Гэтжи в борьбе очень переоценили, но это не умаляет заслуг Нурмагомедова
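
For reference, the encoded Subject above can be decoded (and a new one produced) with Python's standard `email.header` module. This is only a sketch to show what the `=?UTF-8?B?...?=` form contains; the helper names are invented for the example, and nothing here reflects Citadel's own code.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value into a Unicode string."""
    return str(make_header(decode_header(raw)))


def encode_subject(text: str) -> str:
    """Encode a Unicode subject as RFC 2047 encoded-word(s) in UTF-8."""
    return Header(text, charset="utf-8").encode()


# The Subject: value copied from the [headers] view above.
RAW = ("=?UTF-8?B?UmU6INCi0LDQutGC0LDRgNC+0LI6INGD0LzQtdC90LjRjyDQk9GN0YLQttC4"
       "INCyINCx0L7RgNGM0LHQtSDQvtGH0LXQvdGMINC/0LXRgNC10L7RhtC10L3QuNC70LgsINC9"
       "0L4g0Y3RgtC+INC90LUg0YPQvNCw0LvRj9C10YIg0LfQsNGB0LvRg9CzINCd0YPRgNC80LDQ"
       "s9C+0LzQtdC00L7QstCw?=")

print(decode_subject(RAW))
print(encode_subject("Песков заявил"))
```

A header that carries raw UTF-8 bytes without this wrapping, like the RSS-generated ones earlier in the thread, gives the reader no charset to go by.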

Trying to type in Russian would garble the characters.
 
Wed Oct 28 2020 14:03:07 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

The Subject: header is shown correctly while viewing the articles using textclient. You can view the same article from webcit and text client and see the difference.

Mon Oct 26 2020 09:10:35 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Just wanted to let you know that <p>VTimes</p> is in the XML source.

Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know if I can rely on cit929 with these RSS issues. Else, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I commented out the rm -rf in easyinstall.sh and then tried to run ./configure. It complained about sieve2.h: "configure: error: sieve2.h was not found and is required". But I just did a fresh install of Citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which as I expected, appears to be related.. so it's  going to take more research.

It is in the <author> field in XML source. So, it isn't really "injected", it was there to begin with.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
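
As a rough illustration of the cleanup such fields would need before being used as header values, here is a small Python sketch that strips markup, decodes HTML entities and collapses whitespace. It is an assumption about what a sanitizing step could look like, not a description of what Citadel does; `clean_field` is a hypothetical name.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude tag matcher, good enough for <p>...</p>


def clean_field(raw: str) -> str:
    """Strip tags, decode HTML entities, and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return " ".join(text.split())


print(clean_field("<p>          VTimes       </p>"))
print(clean_field("Студентов университета &quot;Сириус&quot;"))
```

Applied to the `<author>` value above, this leaves just `VTimes`, and the Cyrillic text in a title passes through unchanged apart from the decoded quotes.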

 

[#] Sun Nov 01 2020 10:19:18 EST from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

[Reply] [ReplyQuoted] [Headers] [Print]

Yes, it has been logged in project management, and goes along with some other things we're working on..

but having said that, please continue your research. Any documentation we can get is helpful and saves time.

There will be some other things that go along with this.. namely getting expat errors to bubble up and explain why certain feeds are failing. 

 

Wed Oct 28 2020 03:23:52 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'd like to ask if this subject header issue is being handled as an active issue, or it is being placed in the "backburner" or labeled as "non-issue"? In case it is being handled, here's some more info.

The message below with the Subject header correctly displayed was created with text client as a followup to an existing article and, even though the Subject header would not show up in webcit, it got "resurrected" when replying to the article.

The message header is a copy of [headers] article menu choice.
The Subject in the header below, even though it looks garbled, is actually displayed correctly as can be seen in the article during the article view (below).

Here is a copy of a complete article, preceeded by the article headers.
Notice the =?UTF-8 at the start of the Subject: header. That is the correct way of specifying the charset as far as I recall.

Return-Path: user
Date: Wed, 28 Oct 2020 20:22:30 +0200
References: 
Subject: =?UTF-8?B?UmU6INCi0LDQutGC0LDRgNC+0LI6INGD0LzQtdC90LjRjyDQk9GN0YLQttC4INCyINCx0L7RgNGM0LHQtSDQvtGH0LXQvdGMINC/0LXRgNC10L7RhtC10L3QuNC70LgsINC90L4g0Y3RgtC+INC90LUg0YPQvNCw0LvRj9C10YIg0LfQsNGB0LvRg9CzINCd0YPRgNC80LDQs9C+0LzQtdC00L7QstCw?=
Message-ID: 
From: "user" 

And this is the article itself. So, the Subject looks good indeed.

Wed Oct 28 2020 20:22:30 EET from adminski Subject: Re: Тактаров: умения Гэтжи в борьбе очень переоценили, но это не умаляет заслуг Нурмагомедова

Trying to type in Russian would garble the characters.
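For anyone who wants to check such a header by hand, here is a minimal sketch using Python's standard email.header module. The helper names decode_subject and encode_subject are my own, not anything from Citadel, and the excerpt is only the first few bytes of the Subject quoted above, closed off as a complete encoded-word.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a raw Subject value to readable text."""
    return str(make_header(decode_header(raw)))


def encode_subject(text: str) -> str:
    """Encode a Unicode subject as one or more RFC 2047 encoded-words."""
    return Header(text, "utf-8").encode()


# Leading part of the Subject from the headers above (hypothetical excerpt:
# the base64 payload is cut after the first 21 bytes and re-terminated).
excerpt = "=?UTF-8?B?UmU6INCi0LDQutGC0LDRgNC+0LI6?="
print(decode_subject(excerpt))  # Re: Тактаров:
```

If the text client stores the Subject this way and webcit shows nothing, comparing the decoded value against what each client displays may help narrow down where the text gets lost.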
 
Wed Oct 28 2020 14:03:07 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

The Subject: header is shown correctly when viewing the articles with the text client. You can view the same article from webcit and from the text client and see the difference.

Mon Oct 26 2020 09:10:35 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Just wanted to let you know that <p>VTimes</p> is in the XML source.

Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know if I can rely on cit929 with these RSS issues. Else, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I commented out the rm -rf in easyinstall.sh and then tried to run ./configure. It complained about sieve2.h: "configure: error: sieve2.h was not found and is required". But I just did a fresh install of Citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which, as I expected, appears to be related.. so it's going to take more research.

It is in the <author> field in XML source. So, it isn't really "injected", it was there to begin with.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss
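To illustrate the Return-Path problem above: a minimal sketch of cleaning an author value before it goes into a header. This is not Citadel's code; clean_author is a hypothetical helper, and it simply assumes that stripping markup, decoding entities and collapsing whitespace is the behaviour we want.

```python
import html
import re

# Anything that looks like an HTML/XML tag, e.g. <p> or </p>.
TAG = re.compile(r"<[^>]*>")


def clean_author(raw: str) -> str:
    """Strip markup from an RSS <author> value and collapse whitespace.

    Collapsing whitespace also removes CR/LF, so the value cannot
    start a new header line.
    """
    without_tags = TAG.sub("", raw)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


print(clean_author("<p>          VTimes       </p>"))  # VTimes
```

With the value from the VTimes feed this gives a plain "VTimes", which is what one would expect to see in Return-Path.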

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..
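Step 3 can be scripted instead of edited by hand. A minimal sketch, assuming the feed text has already been downloaded into a string; rewrite_language_tag is a hypothetical name, and note that elsewhere in this thread the experiment is reported as not changing the result.

```python
import re

# Matches e.g. <language>ru-ru</language> and captures both halves.
LANG_TAG = re.compile(
    r"<language>\s*([A-Za-z]{2,3})-([A-Za-z]{2})\s*</language>"
)


def rewrite_language_tag(feed_xml: str) -> str:
    """Rewrite the first <language>xx-yy</language> tag as xx_YY."""

    def fix(match: re.Match) -> str:
        lang = match.group(1).lower()
        region = match.group(2).upper()
        return f"<language>{lang}_{region}</language>"

    return LANG_TAG.sub(fix, feed_xml, count=1)


sample = "<channel><language>ru-ru</language></channel>"
print(rewrite_language_tag(sample))
```

The edited text can then be saved somewhere the test room's Remote Retrieval entry can reach, as in step 4.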

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

What is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject whose Subject starts with English/Latin characters, they ARE displayed correctly. If you scroll through that feed in Citadel, you will notice that a few articles begin with either English characters or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8-R, Windows-1251 and ISO-8859-5. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji= mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # mbstring: misreads the UTF-8 bytes as KOI8-U
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀
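The same effect can be reproduced, and undone, in a few lines of Python. This is only a sketch of the misreading itself (UTF-8 bytes decoded as KOI8-U); the function names are made up, and nothing here claims that Citadel actually performs this conversion.

```python
def misread_as_koi8(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as KOI8-U."""
    return text.encode("utf-8").decode("koi8_u")


def undo_misread(garbled: str) -> str:
    """Reverse the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("koi8_u").decode("utf-8")


original = "Студентов"
garbled = misread_as_koi8(original)
print(garbled)                # two garbage characters per Cyrillic letter
print(undo_misread(garbled))  # the original word again
```

Because KOI8-U assigns a character to every byte value, the misreading never fails outright; it just silently produces text of the kind printed above, which is why it is easy to miss.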

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (the standard Internet message format, as sent over SMTP):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing as email) and push the [Headers] menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?
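The difference between the two header sets can be checked mechanically. A minimal sketch using Python's standard email package; declared_charset is a hypothetical helper, and the two inputs are just the Content-Type lines from the Thunderbird and Citadel headers above.

```python
from email import message_from_string


def declared_charset(raw_headers: str):
    """Return the charset parameter of Content-Type, or None if absent."""
    msg = message_from_string(raw_headers)
    return msg.get_content_charset()


thunderbird = "Content-Type: text/html; charset=UTF-8\n\n"
citadel = "Content-type: text/html\n\n"

print(declared_charset(thunderbird))  # utf-8
print(declared_charset(citadel))      # None
```

When no charset is declared, each client is left to guess, which would fit with different clients showing the same article differently.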

Does anybody know?

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit dirs under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created an account in Thunderbird to access the Citadel account over IMAP, so the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any are present at the BEGINNING of the subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is like KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is a default for XML feeds.
So, the article body encoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.
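One way to test that guess on a saved article body is to check whether its raw bytes are valid UTF-8 at all. A minimal sketch; looks_like_utf8 is a made-up name, and the sample word is mine, not taken from the feed.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "Привет"
print(looks_like_utf8(word.encode("utf-8")))   # True
print(looks_like_utf8(word.encode("koi8_r")))  # False
print(looks_like_utf8(word.encode("cp1251")))  # False
```

Russian text stored as KOI8-R or Windows-1251 almost never passes this check, so if the body bytes do pass, the body is most likely UTF-8 already and the scrambling happens at display time.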

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems of starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions and the places where they are used are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK and after logging-in, the first thing I noticed is that the RSS feed room, which looked fine under 8.24 now is having problems with incorrect Subject header. Old messages that were received while it was still 8.24 looked correct, but the new ones that were received after cit upgrade did have the Subject problem.

When I looked at the xml source for the RSS feed http://tass.ru/rss/v2.xml all the <title> xxxx </title> fields were present, and those are the fields that would become the Subject header. So... Why would such a simple thing as a string copy of the title to the Subject: header not work?

Furthermore, all the Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting with European-language delimiters and non-English/Latin characters.
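To make the symptom concrete, here is a tiny model of it. This is emphatically not Citadel's code and not a diagnosis; ascii_prefix is a hypothetical function that just reproduces what is seen on screen, namely the subject being cut at the first non-ASCII character.

```python
def ascii_prefix(subject: str) -> str:
    """Return the part of subject before its first non-ASCII character."""
    for index, char in enumerate(subject):
        if ord(char) > 127:
            return subject[:index]
    return subject


print(repr(ascii_prefix("Apple представила")))  # 'Apple '
print(repr(ascii_prefix("Песков заявил")))      # ''
```

Running feed titles through something like this and comparing against what webcit displays would confirm or rule out that the cut really happens at the first non-ASCII character.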

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases, the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes the RSS feed functionality could be classified as "not very useful", putting it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment in-line of your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to do /usr/local/citadel/ctdlmigrate to xfer database from 8.24 to 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in vast majority of cases and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes)
[#] Sun Nov 01 2020 11:52:36 EST from warbaby @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?


platonov, I replied to this, and the reply should be in your inbox here. I would like to move the conversation out of Support for now.

I think we're making progress, but it is probably not of interest to many of the people who are subscribed to Citadel Support.

Please continue to correspond with me here on Uncensored, by email. We can post an update here after we have a patch or some other solution.

Wed Oct 28 2020 03:23:52 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

The message below with the Subject header correctly displayed was created with text client as a followup to an existing article and, even though the Subject header would not show up in webcit, it got "resurrected" when replying to the article.

The message header is a copy of [headers] article menu choice.
The Subject in the header below, even though it looks garbled, is actually displayed correctly as can be seen in the article during the article view (below).

Here is a copy of a complete article, preceded by the article headers.
Notice the =?UTF-8 at the start of the Subject: header. As far as I recall, that is the correct way of specifying the charset (an RFC 2047 encoded-word).

Return-Path: user
Date: Wed, 28 Oct 2020 20:22:30 +0200
References: 
Subject: =?UTF-8?B?UmU6INCi0LDQutGC0LDRgNC+0LI6INGD0LzQtdC90LjRjyDQk9GN0YLQttC4INCyINCx0L7RgNGM0LHQtSDQvtGH0LXQvdGMINC/0LXRgNC10L7RhtC10L3QuNC70LgsINC90L4g0Y3RgtC+INC90LUg0YPQvNCw0LvRj9C10YIg0LfQsNGB0LvRg9CzINCd0YPRgNC80LDQs9C+0LzQtdC00L7QstCw?=
Message-ID: 
From: "user" 

And this is the article itself. So, the Subject looks good indeed.

Wed Oct 28 2020 20:22:30 EET from adminski Subject: Re: Тактаров: умения Гэтжи в борьбе очень переоценили, но это не умаляет заслуг Нурмагомедова

Trying to type in Russian would garble the characters.
 
Wed Oct 28 2020 14:03:07 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

The Subject: header is shown correctly when viewing the articles with the text client. You can view the same article from webcit and from the text client and see the difference.

Mon Oct 26 2020 09:10:35 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Just wanted to let you know that <p>VTimes</p> is in the XML source.

Sun Oct 25 2020 17:50:31 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Sun Oct 25 2020 17:04:37 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I think you can take a break for now platonov..

Well, I'd like to try some things to get this issue resolved. I need to know if I can rely on cit929 with these RSS issues. Else, I'd have to go back to my old 8.24. The way it is right now is not something I'd be willing to swallow.

But I need to be able to build citserver. I commented out the rm -rf in easyinstall.sh and then tried to run ./configure. It complained about sieve2.h: "configure: error: sieve2.h was not found and is required". But I just did a fresh install of Citadel yesterday, so... why is it getting stuck there, and what do I do to be able to build citserver? Is there any documentation on building and dependencies?

although it might be interesting to still create a list of Russian language feeds, and see if we can find any that parse..

Somehow we're getting Header Injected.. which, as I expected, appears to be related.. so it's going to take more research.

It is in the <author> field in XML source. So, it isn't really "injected", it was there to begin with.

I'll have to get in touch with the "Big Guy".. [Not Joe Biden in this case, but the "Big Guy" knows who he is...]

Return-Path: <p>          VTimes       </p>
Date: Fri, 23 Oct 2020 12:27:00 -0000
Subject: В Москве введут скидки на проезд на двух линиях метро

It looks like we are running into a problem having both HTML Entities and the UTF-8 Russian in the subject line..

See this xml at https://www.vtimes.io/rss

 

Sun Oct 25 2020 04:35:43 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'm using grep (ag actually) + my brain.

I'll have a look at cscope.. thanks for the tip!

 

Sun Oct 25 2020 03:49:24 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, are you using cscope?

- Highly recommended. Don't you even START thinking about finding things in citadel if you don't have its database in cscope!!!

Sun Oct 25 2020 14:29:17 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Okay, that did not work..trying something else..

Return-Path: rss
Date: Sun, 25 Oct 2020 13:30:17 -0000
Subject: Матыцин назвал Нурмагомедова выдающимся борцом современности
Message-ID: <5F95C063-0000163F@southernwork.org>
From: "rss" <rss@southernwork.org>
Content-type: text/html

Sun Oct 25 2020 02:27:23 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Whatever you sent probably got stripped away, because I don't see anything.

So, here's my test...

1) Log into a remote vps..

2) wget -O - http://tass.ru/rss/v2.xml

3) Edit <language>ru-ru</language> to be <language>ru_RU</language>

4) Add new test room in webcit.. add new feed under Remote Retrieval (link to the new edited file..)

5) Just waiting for the event to run to see what it looks like now..

 

Sun Oct 25 2020 02:08:23 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

 

Here's the complete article as viewed in Citadel: (header is obtained from Citadel [headers] article menu choice while viewing the article, and body is directly copied from Citadel article source)

Notice that the Subject header is all screwed up, but the article body is fine.

Now, when you look at a screwed up article header in Citadel, push the [headers] article menu choice and you will see that the Subject header is actually there, but its character encoding causes something in Citadel to trim it down to 0.

What is even more interesting, and is proof that we are dealing with a character encoding problem, is that the encoding is specified as the first thing in the SMTP Subject header.

So, if you look at an article with a truncated Subject whose Subject starts with English/Latin characters, they ARE displayed correctly. If you scroll through that feed in Citadel, you will notice that a few articles begin with either English characters or delimiters.

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: 
From: "rss" 
Content-type: text/html
Sun Oct 25 2020 17:25:28 EET from rss
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА

https://tass.ru/obschestvo/9812229

Well, there are several Cyrillic encodings, including KOI8-R, Windows-1251 and ISO-8859-5. So, which one we are dealing with is not quite clear and probably not very relevant.

Sun Oct 25 2020 13:39:59 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, you are way ahead of me on this and I am not quite following some things... see in-line.

Sun Oct 25 2020 13:12:51 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

"This is probably the problem....." got stripped out of my last message..

<?php

$s = 'Студентов университета &quot;Сириус&quot; в Сочи научили отражать кибератаки на банкоматы';
$s = html_entity_decode($s); 
$moji= mb_convert_encoding($s, 'UTF-8', 'KOI8-U');  # mbstring: misreads the UTF-8 bytes as KOI8-U
print $moji.PHP_EOL;

# prints п║я┌я┐пЄп╣пҐя┌п╬п╡ я┐пҐп╦п╡п╣я─я│п╦я┌п╣я┌п╟ "п║п╦я─п╦я┐я│" п╡ п║п╬я┤п╦ пҐп╟я┐я┤п╦п╩п╦ п╬я┌я─п╟пІп╟я┌я▄ п╨п╦п╠п╣я─п╟я┌п╟п╨п╦ пҐп╟ п╠п╟пҐп╨п╬п╪п╟я┌я▀

Sun Oct 25 2020 01:05:38 PM EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

You will probably want to read my last message, but a faster test would be to download the rss/xml file, then play around with editing the language string. 

I am not following this "play around with editing the language string".

Thunderbird's RSS parser is probably advanced enough to figure out what the character set should be, regardless of the language specified in the xml file..

Citadel is probably giving you exactly what is asked for (XML_Parser) .. UTF-8, converted to KOI-8 ..

I'll bet if you play around with the ISO language code, you can probably get it to parse.. so the fastest fix might be to write a wrapper script which grabs the xml and fixes the lang code..

Wooo! I have no intention to "play around with the ISO language code". I bet you can do a much better job at that :)

 

Sun Oct 25 2020 12:58:08 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

If we want to look and see how Thunderbird handles this RSS feed - http://tass.ru/rss/v2.xml we can add that feed directly to Thunderbird and then look at it.

Taking one article as an example and saving it in Thunderbird as .eml (the standard Internet message format, as sent over SMTP):

X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Received: by localhost; Sun, 25 Oct 2020 18:12:04 +0200
Date: Sun, 25 Oct 2020 18:25:28 +0300
Message-Id: <https://tass.ru/obschestvo/9812229@localhost.localdomain>
From: <ТАСС>
MIME-Version: 1.0
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Keywords: Общество, Спорт, Бой Нурмагомедов - Гэтжи
Content-Transfer-Encoding: 8bit
Content-Base: https://tass.ru/obschestvo/9812229
Content-Type: text/html; charset=UTF-8

<html>
<head>
<title>Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором</title>
<base href="https://tass.ru/obschestvo/9812229">
</head>
<body id="msgFeedSummaryBody" selected="false">
Всего на счету 32-летнего бойца смешанных единоборств 29 побед и ни одного поражения в ММА
</body>
</html>

When we look at the same article in Citadel (viewing as email) and push the [Headers] menu choice in the article, what we get is this:

Return-Path: rss
Date: Sun, 25 Oct 2020 17:25:28 +0200
Subject: Песков заявил, что решение Нурмагомедова завершить карьеру является его выбором
Message-ID: <5F959A4A-0000777E@preciseinfo.org>
From: "rss" <rss@preciseinfo.org>
Content-type: text/html

So, there is no charset specified. The question is "how come"?

Does anybody know?

Sun Oct 25 2020 16:56:40 EET from veeren Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Sun Oct 25 2020 11:04:26 EDT from platonov @ Uncensored
Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Well, this Subject header issue is quite something. I'll try to be brief...

I removed the citadel, ctdlsupport and webcit dirs under /usr/local/ (rm -rf).

Then I did a fresh install of Citadel to make sure there are no side-effects or residual something that remained from my 8.24 version.

Then I created an account in Thunderbird to access the Citadel account over IMAP, so the RSS room would get automatically copied to Thunderbird.

Now...

Subject header problem, as viewed in that context:

The Subject headers, as viewed from Citadel, are absent on the TASS RSS feed: http://tass.ru/rss/v2.xml
But they look fine when automatically copied via IMAP to Thunderbird. To me that means the headers do exist, and the reason we see them as absent is that they are being trimmed down to the pure English/Latin characters, if any are present at the BEGINNING of the subject header.
Except that in Thunderbird, even though the Subject header looks fine, the article body is scrambled (see below).

It is like KOI-8 encoded characters (in the body) are interpreted as UTF-8, which is a default for XML feeds.
So, the article body encoding does not recognize that it is not a UTF-8 char set, and we are reading KOI-8 or Windows-1251 as UTF-8.

Thunderbird version: (this is how it looks in Thunderbird)

Subject: В Ярославле разработали приложение, автоматически вызывающее скорую в случае опасности

Пока оно предназначено для сотрудников "Россетей", работающих с электрооборудованием, однако разработчики готовы модернизировать его и для других категорий граждан

https://tass.ru/obschestvo/9811873

Sat Oct 24 2020 09:31:09 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Actually, now I have 2 citadel installations, one is the original 64 bit 929 and the other is the one that originally had cit 824 on it, running on Ubuntu 18.04 and was upgraded in-place to 929 via easyinstall.

Now, BOTH of these have the same exact problem with Subject: header. So, I am wondering what kind of a miracle had happened on your version that makes Subject headers work correctly...

Sat Oct 24 2020 09:09:44 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, finding it would be great!

Btw, what version of Citadel are you running? When I did a fresh install it was 9.17, but I did apt-get install first and had some problems of starting it up. So, I removed those packages and did the easyinstall version and everything went fine and it was running OK from the very beginning.

I wonder if that had something to do with my Subject problem. Maybe something did not get completely removed when I uninstalled the package.

Since you are saying it works on your box, what is the difference between yours and mine?

Fri Oct 23 2020 17:42:03 EDT from warbaby Subject: Re: Is there a problem of incorrect displaying of Subject: header?

No, not really, but am learning. I encourage you to hang in there & keep looking.

We both really need to fix this and I am committed to finding it.

Well, that's good to hear. At this point I'd even be willing to do something drastic to verify that it is not a problem of the dpkg version of the install vs. easyinstall.

We need all the eyes we can get...

Fri Oct 23 2020 05:28:05 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I am thinking about looking at diff between 8.24 and 9.28 of serv_rssclient.c. Looks like lots of changes...

Well, those trim and strip functions, and the places where they are used, are something to look at.

Btw, do you know that code well enough?

Fri Oct 23 2020 16:23:52 EDT from warbaby @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

it's in modules/rssclient/serv_rssclient.c [attached]

took a look also at striplt() in libcitadel/lib/tools.c ..

kind of a trim() function.. didn't see anything obvious..

 

Fri Oct 23 2020 03:23:17 PM EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

What is interesting about this Subject problem is that there is no such problem in the article body. The RSS feed below is from the biggest Russian news agency, so the article bodies are in Russian.

Furthermore, the problem occurs only during article reception processing. Once the article is stored, from then on this problem does not occur. This is evident from the fact that these headers look fine on cit 824 and only the articles being received are affected.

Fri Oct 23 2020 15:10:29 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

This issue of Subject: header is getting more and more interesting. But there are several things involved.

First of all, I upgraded cit 8.24 to 9.29 via easyinstall. Everything went OK, and after logging in, the first thing I noticed was that the RSS feed room, which looked fine under 8.24, now has incorrect Subject headers. Old messages received while it was still 8.24 looked correct, but the new ones received after the upgrade had the Subject problem.

When I looked at the XML source for the RSS feed http://tass.ru/rss/v2.xml, all the <title> xxxx </title> fields were present, and those are the fields that become the Subject header. So... why would something as simple as a string copy of the title to the Subject: header not work?

Furthermore, Subject headers that start with English/Latin words show only the English/Latin characters; everything else gets chopped off, starting with the first European-language delimiter or non-English/Latin character.
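
To see which titles in a feed would be hit by this, a small check can report how much of a title is plain 7-bit ASCII before the first non-ASCII byte. This is only a sketch for reproducing the symptom, not a diagnosis of the cause; ascii_prefix_len() is a made-up helper name, not something from the Citadel source, and it assumes the title is UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Citadel): returns the number of
 * leading bytes in s that are 7-bit ASCII. In UTF-8, every byte of a
 * non-ASCII character has its high bit set, so the first such byte
 * marks where a Cyrillic (or other non-Latin) run begins.
 */
size_t ascii_prefix_len(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0' && (unsigned char)s[n] < 0x80) {
        n++;
    }
    return n;
}
```

For a title that is entirely Cyrillic the function returns 0, which corresponds to the cases where no Subject shows up at all; for a title with a Latin prefix it returns the length of the part that still shows up.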

So... Where is the problem in this case?

I really can't let this issue slide, and it's a shame to lose a major news feed because of such an obvious issue. Losing the Subject header means losing about the only thing that people read in the majority of cases: the title. In today's news, it is painful to read even the article titles.

This problem is present on all the RSS feeds. So, for all practical purposes, the RSS feed functionality could be classified as "not very useful", to put it in the mildest of terms.

Fri Oct 23 2020 09:53:56 EDT from platonov @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?

I'll comment inline in your text..

Thu Oct 22 2020 17:47:55 EDT from IGnatius T Foobar @ Uncensored Subject: Re: Is there a problem of incorrect displaying of Subject: header?
Well, yesterday I tried to use /usr/local/citadel/ctdlmigrate to transfer the database from the 8.24 node to the 9.29 node.

I don't think that's even possible. If it is, then we need to put some code into the migrate utility to prevent that from being done. Citadel has a lot of code that runs when you upgrade, to convert old data formats into new data formats. For those interested, you can look at the source of serv_upgrade.c to see a fascinating timeline of things we've changed over the years and how it handles all of the conversion behind the scenes without bothering you about it.

What it *can't* do is perform that conversion and export/import at the same time.

The correct procedure is to upgrade to the latest version in-place, and *then* migrate it.

OK. Unfortunately, if I upgrade 824 in-place, then I'll probably get the original problem with Subject: headers getting trimmed down or absent for the RSS rooms.

I wonder if it would be helpful to you to track down that problem if I tell you exactly which RSS feed is guaranteed to have a problem with Subject headers.

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in the vast majority of cases, and there are very few headers that begin with Latin words.

I vaguely recall some Subject: header weirdness, such as character sets for the article are defined in the Subject header or something like that.

I wonder if you have a clue of where any manipulations involving the Subject header take place in the source code. Would be interesting to look at that code.

Thanx in advance.

I apologize if this isn't clear in the documentation.

serv_rssclient.c (text/x-csrc, 15729 bytes)
[#] Fri Nov 06 2020 00:19:09 EST from IGnatius T Foobar @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

The feed is: http://tass.ru/rss/v2.xml
You'll notice there are no Subject headers in vast majority of cases and there are very few headers that begin with Latin words.

I have subscribed to this feed here at Uncensored in a hidden room called "tass.ru"

Can you please go to that room and tell me whether you are seeing the same problems here that you are on your own system, and if so, point them out?
Unfortunately I cannot read Russian.

[#] Fri Nov 06 2020 04:31:14 EST from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Yep, I am seeing exactly the same problem in your room. Subject: header is not showing up, even though the header is not empty as you can see if you look at it from [headers] choice in the article menu. And there is no UTF-8 specified in the Subject header.
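
A quick way to tell whether a stored Subject value was ever encoded is to look for an RFC 2047 encoded-word, which has the shape =?charset?encoding?text?= with encoding B or Q. The sketch below is only an illustration under those assumptions: has_encoded_word() is a hypothetical helper, not a libcitadel function, and it checks the shape only, not whether the charset or payload is valid.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libcitadel): returns 1 if value
 * contains something shaped like an RFC 2047 encoded-word,
 *     =?charset?B?text?=   or   =?charset?Q?text?=
 * and 0 otherwise. Raw UTF-8 with no such wrapper returns 0.
 */
int has_encoded_word(const char *value)
{
    const char *start, *q1, *q2;

    assert(value != NULL);
    for (start = strstr(value, "=?"); start != NULL;
         start = strstr(start + 2, "=?")) {
        q1 = strchr(start + 2, '?');            /* '?' that ends the charset */
        if (q1 == NULL || q1 == start + 2) {
            continue;                           /* no charset present */
        }
        if (q1[1] == '\0' || strchr("BbQq", q1[1]) == NULL) {
            continue;                           /* encoding must be B or Q */
        }
        q2 = q1 + 2;                            /* '?' that follows the encoding */
        if (*q2 != '?') {
            continue;
        }
        if (strstr(q2 + 1, "?=") != NULL) {
            return 1;                           /* found the closing "?=" */
        }
    }
    return 0;
}
```

Running this over the value shown under [headers] would return 0 for a Subject that was stored as raw Cyrillic bytes, which is the "no UTF-8 specified" situation described above.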

[#] Fri Nov 06 2020 09:16:14 EST from platonov @ Uncensored

Subject: Re: Is there a problem of incorrect displaying of Subject: header?

Btw, I'd like to show you the diffs between 929 and 824 in serv_rssclient.c

It looks pretty clear where the problem is when you compare the 824 and 929 versions of serv_rssclient.c

It seems to me that the issue is the missing call to StrBufRFC2047encode().

Two lines, just before CM_SetAsFieldSB(), are missing in 929

serv_rssclient_824.c Lines 355 - 358

    StrBufTrim(Encoded);
    StrBufRFC2047encode(&QPEncoded, Encoded);
    CM_SetAsFieldSB(&SaveMsg->Msg, eMsgSubject, &QPEncoded);

Compare it to:
serv_rssclient_929.c (around line 202, CM_SetField())

Lines 200 - 204

    else if (!strcasecmp(el, "title")) { // item subject (rss and atom)
        if ((r->msg != NULL) && (CM_IsEmpty(r->msg, eMsgSubject))) {
            CM_SetField(r->msg, eMsgSubject, ChrPtr(r->CData), StrLength(r->CData));
            striplt(r->msg->cm_fields[eMsgSubject]);
        }
    }

I'll include the c sources in attachments here.
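
For anyone wondering what that encoding step does: RFC 2047 lets a header carry non-ASCII text by wrapping it as =?charset?Q?...?=, where bytes outside a safe set are written as =XX and spaces as underscores. Below is a minimal standalone sketch of that idea. It is not the libcitadel StrBufRFC2047encode() implementation; rfc2047_q_encode() is a hypothetical name, the input is assumed to be UTF-8, and the 75-character limit on a single encoded-word is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only, NOT the libcitadel code.
 * Writes in to out as a single RFC 2047 "Q" encoded-word if it contains
 * any byte >= 0x80; pure-ASCII input is copied unchanged.
 * Returns the output length, or -1 if out is too small.
 */
int rfc2047_q_encode(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "=?UTF-8?Q?";
    const size_t plen = sizeof(prefix) - 1;
    const unsigned char *p;
    size_t inlen, n;
    int needs_encoding = 0;

    assert(in != NULL && out != NULL);
    inlen = strlen(in);

    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p >= 0x80) {
            needs_encoding = 1;
            break;
        }
    }
    if (!needs_encoding) {
        if (inlen + 1 > outsize) return -1;
        memcpy(out, in, inlen + 1);
        return (int)inlen;
    }

    /* Worst case: every byte becomes "=XX", plus prefix, "?=" and NUL. */
    if (plen + 3 * inlen + 2 + 1 > outsize) return -1;

    memcpy(out, prefix, plen);
    n = plen;
    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        unsigned char c = *p;
        if (c == ' ') {
            out[n++] = '_';                 /* space is written as underscore */
        } else if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                   (c >= 'a' && c <= 'z')) {
            out[n++] = (char)c;             /* letters and digits pass through */
        } else {
            out[n++] = '=';                 /* everything else becomes =XX */
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n++] = '?';
    out[n++] = '=';
    out[n] = '\0';
    return (int)n;
}
```

Calling it on a Cyrillic title gives a pure-ASCII string that a mail-style header can carry, while a plain ASCII title comes back unchanged.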

serv_rssclient_824.c (text/x-csrc, 30309 bytes)
serv_rssclient_929.c (text/x-csrc, 15729 bytes)
[#] Sat Nov 07 2020 12:01:28 EST from Smashbot @ Uncensored

Subject: https redirection

Please forgive me if I am missing something obvious. I have looked through the docs and forums. I was able to get my Citadel install to successfully serve a certificate issued by Let's Encrypt.

Would someone please tell me how to make webcit redirect plaintext requests on port 80 over to 443 so that they use the https connection?

Thanks in advance



[#] Sat Nov 07 2020 12:59:55 EST from johnfound @ Uncensored

Subject: Problems with EasyInstall on Manjaro/Arch Linux.

I have tried to install Citadel through EasyInstall, but got an error message: "multiple definition of `ThreadKey'" and LD failed to link the code.

Easy Install can't install the dependencies automatically on Arch Linux, so I installed them by hand (following the description on citadel.org), and it seems that the dependencies are in place.

So, how should I proceed?

Regards.



[#] Sat Nov 07 2020 15:49:00 EST from platonov @ Uncensored

Subject: Problem with incorrect displaying of Subject: header fixed

Well, I've got Subject headers working on my box. But there are 2 more headers that need to be fixed as well.

I did it mostly on intuition and with very little understanding of the code. So...

serv_rssclient_929.c (text/x-csrc, 15729 bytes)
serv_rssclient_824.c (text/x-csrc, 30309 bytes)
[#] Sat Nov 07 2020 20:22:18 EST from warbaby @ Uncensored

Subject: Re: https redirection

The package versions have an init.d script which you can edit to specify the ports for webcit http and https.

If that is your setup, just edit the webcit script in init.d and make sure it's not starting on port 80 by commenting out a few lines.

Then jump down to the nginx configuration (below).

Hopefully you have used easyinstall. There are two services for webcit that easyinstall puts in /etc/systemd/system

webcit-http.service

and

webcit-https.service

You can mv or rm webcit-http.service out of there. Just run webcit-https

service webcit-https start

If you want to redirect your http traffic to https, install nginx with one simple vhost

(delete the default virtual host in /etc/nginx/sites-enabled), and add a file in sites-enabled like

my-citadel.com.conf (substitute your own domain).

### NON-SSL

server {
    listen 80;
    listen [::]:80;
    server_name my-citadel.com;
    return 301 https://my-citadel.com$request_uri;
}

 

 
