SQL to fix old [IMG] tag ?

Random stuff about serendipity. Discussion, Questions, Paraphernalia.
3nd3r
Regular
Posts: 71
Joined: Tue Aug 19, 2008 12:58 pm
Location: NeoMadrid, Spain
Contact:

SQL to fix old [IMG] tag ?

Post by 3nd3r »

Hello!

Well, I'm porting from my old SMF with a great help from Garvin. Posts and comments are imported, but I need to fix a little issue with images, because the BBCode in SMF are not dumped properly in s9y.

So, I've wonder that the solution would be a SQL query in serendipity_entries and serendipity_comments tables to fix it...

The old img BBCode still present in my new web actually is like:

Code: Select all

[img (align=left,right,center) (width=xxx) (height=yyy) (alt=AAA)]http://sourceimageurl[/img]
That has to be turned in:

Code: Select all

<img src="http://sourceimageurl" (align=left,right,center) (width=xxx) (height=yyy) (title="AAA") (alt="AAA")>
Knowing that parameters in brackets '()' are optional, which means that they're not present on all img tags, but some can be; and 'title' tags will always be the same that 'alt' ones.

I'm not a SQL expert, so I've wonder if someone can help me if this is the right solution to fix this, knowing that serendipity_entries has 2,771 lines and serendipity_comments has 977.

Hope that this will be possible...

You can see my s9y site in http://www.pajareo.com/s9y/

Thanks in advance for your time.
Try not. Do, or do not. There is no try.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

I'm not sure this can be easily solved in SQL.

However, it's a cinch for any editor that supports regexp (regular expressions). Vim is one such editor.

Using your phpMyAdmin panel (or equivalent), get an SQL dump of your database. Copy it somewhere safe, just in case we mess things up and need to restore a working version.

I believe alt= BBCodes need to be enclosed in quotes, since they can include spaces.

Open a separate copy in your editor. Do search-and-replace with these regexps (displayed in Vim format; searches for everything between the first ## and replaces with everything between the second ##):

Code: Select all

:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)height=\([^ ]*\)\([^]]*\)]#[img \1height="\2"\3]#g
:%s#\[img \([^]]*\)alt="\([^"]*\)"\([^]]*\)]#[img \1alt="\2" title="\2"\3]#g
:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)]#<img \1>#g
:%s#\[/img]#</img>#g
Check to make sure that my regexps worked as expected; I did a quick check here, but I can't anticipate everything.

Then restore your modified SQL dump through phpMyAdmin (or equivalent) again.

If everything went as expected, you should have usable images. If not, tell me what happened and I'll fix it!
Judebert
---
Website | Wishlist | PayPal
sonichouse
Regular
Posts: 196
Joined: Sun May 11, 2008 2:53 am
Contact:

Post by sonichouse »

Excuse me for seeming dense here, but could we not extend the existing BB code plugin to cater for this ?
Steve is occasionally blogging here
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

I believe we could, but the regexps in that plugin are quite dense. Rather than trying to modify them myself, I'd rather just substitute them with canonical code from an approved source.
Judebert
---
Website | Wishlist | PayPal
sonichouse
Regular
Posts: 196
Joined: Sun May 11, 2008 2:53 am
Contact:

Post by sonichouse »

judebert wrote:I believe we could, but the regexps in that plugin are quite dense. Rather than trying to modify them myself, I'd rather just substitute them with canonical code from an approved source.
Roughly translated - "We can perform miracles, but the impossible just takes longer" :D
Steve is occasionally blogging here
3nd3r
Regular
Posts: 71
Joined: Tue Aug 19, 2008 12:58 pm
Location: NeoMadrid, Spain
Contact:

Post by 3nd3r »

judebert wrote:I'm not sure this can be easily solved in SQL.

However, it's a cinch for any editor that supports regexp (regular expressions). Vim is one such editor.

Using your phpMyAdmin panel (or equivalent), get an SQL dump of your database. Copy it somewhere safe, just in case we mess things up and need to restore a working version.

I believe alt= BBCodes need to be enclosed in quotes, since they can include spaces.

Open a separate copy in your editor. Do search-and-replace with these regexps (displayed in Vim format; searches for everything between the first ## and replaces with everything between the second ##):

Code: Select all

:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)height=\([^ ]*\)\([^]]*\)]#[img \1height="\2"\3]#g
:%s#\[img \([^]]*\)alt="\([^"]*\)"\([^]]*\)]#[img \1alt="\2" title="\2"\3]#g
:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)]#<img \1>#g
:%s#\[/img]#</img>#g
Check to make sure that my regexps worked as expected; I did a quick check here, but I can't anticipate everything.

Then restore your modified SQL dump through phpMyAdmin (or equivalent) again.

If everything went as expected, you should have usable images. If not, tell me what happened and I'll fix it!
Thanks for your answer!

Well, I've downloaded and intallled Vim 7.2. I've copy that code into a .vim text file. Then, I've made a SQL dump and load it in Vim.
Then I run the script .vim file with the code you posted, and this is the result (Sorry I don't know how to paste this message as text. I've just made a screen copy):
Image

And in the SQL modified file I've found that, for example, this line

Code: Select all

[img align=right width=300 alt=Poltergeist]http://www.channel4.com/film/media/images/Channel4/film/P/poltergeist_xl_01--film-B.jpg[/img]
is turned into this one

Code: Select all

<img align=right width="300" alt=Poltergeist>http://www.channel4.com/film/media/images/Channel4/film/P/poltergeist_xl_01--film-B.jpg</img>
That is almost the perfect code... I think this is done because the 'alt' attribute isn't between quotes in the source SQL... And then no proper 'alt' are dumped and no 'title' attribute are generated.

So... How can I remove the search for 'alt' attribute without quotes in your original source code?
Is this the way to solve it?

Thanks again for your time and help!
Try not. Do, or do not. There is no try.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

I was thinking of just typing those lines from within Vim! I assume you use :source to run the script, which is much better (a lot less typing, and fewer opportunities for mistakes).

The first two errors are a single problem: there's a \( without a matching \). I was missing a ]; I've put it back in.

The last error is no big deal; it just says it couldn't find alt="whatever" in the file. That's almost expected.

I left out the align attribute, too.

The script below should take care of all those problems. The only question remaining is: how can we tell when an alt= is finished? I assumed, since the alt can have spaces, that it would be enclosed in quotes. Since that's not the case, this time I'll assume that it's the last thing in line, and therefore it ends at the next ].

Code: Select all

:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)height=\([^ ]*\)\([^]]*\)]#[img \1height="\2"\3]#g
:%s#\[img \([^]]*\)alt=\([^]]*\)\([^]]*\)]#[img \1alt="\2" title="\2"\3]#g
:%s#\[img \([^]]*\)align=\([^ ]*\)\([^]]*\)]#[img \1align="\2"\3]#g
:%s#\[img \([^]]*\)]#<img \1>#g
:%s#\[/img]#</img>#g 
If you find any other problems, just let me know!
Judebert
---
Website | Wishlist | PayPal
3nd3r
Regular
Posts: 71
Joined: Tue Aug 19, 2008 12:58 pm
Location: NeoMadrid, Spain
Contact:

Post by 3nd3r »

judebert wrote:I was thinking of just typing those lines from within Vim! I assume you use :source to run the script, which is much better (a lot less typing, and fewer opportunities for mistakes).

The first two errors are a single problem: there's a \( without a matching \). I was missing a ]; I've put it back in.

The last error is no big deal; it just says it couldn't find alt="whatever" in the file. That's almost expected.

I left out the align attribute, too.

The script below should take care of all those problems. The only question remaining is: how can we tell when an alt= is finished? I assumed, since the alt can have spaces, that it would be enclosed in quotes. Since that's not the case, this time I'll assume that it's the last thing in line, and therefore it ends at the next ].

Code: Select all

:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)height=\([^ ]*\)\([^]]*\)]#[img \1height="\2"\3]#g
:%s#\[img \([^]]*\)alt=\([^]]*\)\([^]]*\)]#[img \1alt="\2" title="\2"\3]#g
:%s#\[img \([^]]*\)align=\([^ ]*\)\([^]]*\)]#[img \1align="\2"\3]#g
:%s#\[img \([^]]*\)]#<img \1>#g
:%s#\[/img]#</img>#g 
If you find any other problems, just let me know!
We're almost there! (and you're right with 'alt' attribute... it was always before the ']' )

Now this

Code: Select all

[img align=right width=300 alt=Poltergeist]http://www.channel4.com/film/media/images/Channel4/film/P/poltergeist_xl_01--film-B.jpg[/img]
is turned into this

Code: Select all

<img align="right" width="300" alt="Poltergeist" title="Poltergeist">http://www.channel4.com/film/media/images/Channel4/film/P/poltergeist_xl_01--film-B.jpg</img>
but should be turned into this

Code: Select all

<img align="right" width="300" alt="Poltergeist" title="Poltergeist" src="http://www.channel4.com/film/media/images/Channel4/film/P/poltergeist_xl_01--film-B.jpg">
The 'src' thing... :wink: Sorry, I did'nt see it before...
Thank you!
Try not. Do, or do not. There is no try.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

Well, duh. I knew that. Don't you think I knew that? Of course I knew that!

I just forgot. :oops:

We probably want a closing slash, too, just to keep XHTML happy. Something like <img src="whatever" otherstuff="othervalues" />.

Here's the code to do that, starting from a clean SQL dump:

Code: Select all

:%s#\[img \([^]]*\)width=\([^ ]*\)\([^]]*\)]#[img \1width="\2"\3]#g
:%s#\[img \([^]]*\)height=\([^ ]*\)\([^]]*\)]#[img \1height="\2"\3]#g
:%s#\[img \([^]]*\)alt=\([^]]*\)\([^]]*\)]#[img \1alt="\2" title="\2"\3]#g
:%s#\[img \([^]]*\)align=\([^ ]*\)\([^]]*\)]#[img \1align="\2"\3]#g
:%s#\[img \([^]]\{-}\)]\(.\{-}\)\[/img]#<img src="\2" \1 />#g
If you want to work with your already-modified version, this should do the trick:

Code: Select all

:%s#<img \(.\{-}\)>\(.\{-}\)</img>#<img src="\2" \1 />#g
I've tested both of them out, and they even work on lines with multiple images.

Let me know how it works!
Judebert
---
Website | Wishlist | PayPal
3nd3r
Regular
Posts: 71
Joined: Tue Aug 19, 2008 12:58 pm
Location: NeoMadrid, Spain
Contact:

Post by 3nd3r »

It woooooorksss!!!

Now, just for adjust a little more and to forget about this...

I've noticed that some images had a padding style attached... But only some of them from frontpage... It's a little strange...

Maybe front page are using a style not present in other pages?

I mean, for example, if I edit this entry I see this

Code: Select all

<img src="http://epigenome.eu/media/images/large/1028.jpg" align="right" alt="Sirtuins" title="Sirtuins" />
and that's correct, but if I see the source code from rendered (in browser, I mean) page I see this

Code: Select all

<img style="padding: 5px" src="http://epigenome.eu/media/images/large/1028.jpg" alt="" align="right" />
So:

1) Are images in front page treated in different way than in other pages (i.e, archives) ? Why 'alt' and 'title' attributes disappear? Where can I look for this?

2) How can I modify whatever to make all images posted in entries like that one? I mean with padding:5 px ?

I hope this two questions will be easy to answer...

If you find that this must be asked in other place, please tell me.

And thank you very much for helping me! This community is awesome! :wink:
Try not. Do, or do not. There is no try.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

We're happy to help!

Unfortunately, I'm not so sure about this particular problem. It's quite likely the Wiki Markup plugin would alter your HTML, so you should disable it (if it's installed) and see if that fixes the problem. After that, I suspect rogue Javascript.

If neither of those is the case, I'd recommend creating a new topic so other eyes can look at it.
Judebert
---
Website | Wishlist | PayPal
3nd3r
Regular
Posts: 71
Joined: Tue Aug 19, 2008 12:58 pm
Location: NeoMadrid, Spain
Contact:

Post by 3nd3r »

No wiki markup installed and not suspect Javascript.

In a new entry the image isn't padded...

So, I'm thinking in some CSS or .tpl thing...

What CSS attribute should I modify to fix any image posted in entries with img style="padding: 5px" ? Is this possible?
Try not. Do, or do not. There is no try.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

Weird. The only way to modify it in CSS would affect all images, and you don't want that. The problem is definitely in the entry; somehow the style has been added.

It could be in your template, I suppose; but then it should be on all the images. What template are you using?

I just did a quick search through the Serendipity code, and the only place I can find explicitly adding padding is in the WYSIWYG editor. Can you switch your user preferences to non-WYSIWYG, re-edit the entry, remove the padding if it shows up or make some other minor change, and save the entry?
Judebert
---
Website | Wishlist | PayPal
3nd3r
Regular
Posts: 71
Joined: Tue Aug 19, 2008 12:58 pm
Location: NeoMadrid, Spain
Contact:

Post by 3nd3r »

judebert wrote:Weird. The only way to modify it in CSS would affect all images, and you don't want that. The problem is definitely in the entry; somehow the style has been added.
I'm thinking I don't express myself correctly, because that's what I really want: Get all the images in entries with padding: 5 px...
judebert wrote:It could be in your template, I suppose; but then it should be on all the images. What template are you using?
I'm using Translucency blue, if that's what your asking...
judebert wrote:I just did a quick search through the Serendipity code, and the only place I can find explicitly adding padding is in the WYSIWYG editor. Can you switch your user preferences to non-WYSIWYG, re-edit the entry, remove the padding if it shows up or make some other minor change, and save the entry?
WYSIWYG has been always off, and no padding info was written in any entry...
So then, I've made a little experiment:
I've edited an entry with that padding "ghost" style applied. No padding style are shown on the editing entry page, so I manually add style="padding: 3px" before src= in image tag. Save the entry and yes, the padding style with 3px are applied.
Well, then I edited again this entry and remove the padding from the image tag, save it and... Ta-daah! The padding style has gone, and now this entry is like the others, with no padding style applied...

:shock:

So, my desire is to apply that style="padding: 5 px" to all images in entries... How can I get it?

Thank you for your time.
Try not. Do, or do not. There is no try.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL
Contact:

Post by judebert »

Ah! Sorry, I did not understand the question. You want all the images in your entries to have 5px padding.

You'll want to edit the style.css in your templates/translucency/ folder. Add this:

Code: Select all

.serendipity_entry_body img {
  padding: 5px;
}
That should do it.
Judebert
---
Website | Wishlist | PayPal
Post Reply