Search broken using UTF-8 and SQLite

Found a bug? Tell us!!
Post Reply
SvOlli
Regular
Posts: 9
Joined: Wed Mar 03, 2010 7:39 pm

Search broken using UTF-8 and SQLite

Post by SvOlli »

Hello,

I'm running s9y for almost five years now, using SQLite as database engine. On the migration 1.4.1 -> 1.5.2, I also changed the database format from Latin-1 to UTF-8. After that the search does not find anything if I search for an umlaut, which worked when the blog was run using Latin-1. Searching for "word1 word2" does fail also, even though an article contains the exact phrase. A search for "word1*word2" does fail also.

I've tried another blog that runs on UTF-8 since it's creation and the bugs appear there also. Any hints on where to start looking for problem causing this?
SvOlli
Regular
Posts: 9
Joined: Wed Mar 03, 2010 7:39 pm

Re: Search broken using UTF-8 and SQLite

Post by SvOlli »

After some more testing and by accident I found the real cause. It has nothing to do with umlauts and probably nothing with the database: you can't search anything that's the text of a link.

If your text includes:

Code: Select all

<a href="http://www.s9y.org">jump to s9y page</a>
a search for "jump" or "page" will not find the article containing that link.

Sorry if I caused any inconvenience,
SvOlli
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: Search broken using UTF-8 and SQLite

Post by garvinhicking »

Hi!

That's strange, because I believe the SQL search uses "LIKE %...%" which should also find something if you have "ajumper" and search for "jump"...?!

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
SvOlli
Regular
Posts: 9
Joined: Wed Mar 03, 2010 7:39 pm

Re: Search broken using UTF-8 and SQLite

Post by SvOlli »

Hmmm... I tried it again and it's not quite that either. Fact is, that it doesn't find obvious texts. Please try it yourself: go to http://blog.h8u.de/ and search for the terms "Qt 4.6" and "lighttpd". Both are available in the latest article "Das Ende meines Apachen...?", but cannot be found using the search.

I tried to dig up the search function by greping the the code for strings like "searchTerm", which is passed as an argument to search for, but could not dig up something. Could anyone point me out on where to look for the function in question?

And from another project, where I was using databases from Qt 4.x, I can tell that SQLite and MySQL are "more compatible" as SQLite and PostgreSQL for example. So, yes, SQLite uses the same placeholders as described in the SQL standard: "_" for one random character and "%" for zero or more random characters.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: Search broken using UTF-8 and SQLite

Post by garvinhicking »

Hi!

Do you have an external SQLite3 tool that you could use to execute SQL queries? If so, you could try:

Code: Select all

SELECT * FROM serendipity_entries WHERE body LIKE '%lighttpd%'
If this does not yield results, SQLite might not support it properly, and I wouldn'T know how to change that in the application, if the database query does not return expected results...

The function is in include/functions_entries.incphp, serendipity_searchEntries().

HTH,
GArvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
SvOlli
Regular
Posts: 9
Joined: Wed Mar 03, 2010 7:39 pm

Re: Search broken using UTF-8 and SQLite

Post by SvOlli »

garvinhicking wrote: Do you have an external SQLite3 tool that you could use to execute SQL queries? If so, you could try:

Code: Select all

SELECT * FROM serendipity_entries WHERE body LIKE '%lighttpd%'
Using the SQLite2 tool, it works as expected: it finds the article.

I tried it again on some other data, and it seems to me that everything that's in between tags, e.g. <span>...</span>, <table>...</table> can not be found?

I'll dig into function_entries as soon as possible.

Greetings,
SvOlli
SvOlli
Regular
Posts: 9
Joined: Wed Mar 03, 2010 7:39 pm

Re: Search broken using UTF-8 and SQLite

Post by SvOlli »

Who needs sleep anyway? ;-)

When the search didn't bring up enough matches, the search was restarted with an appended "*". SQLite treats "*" like any other character, so the second search didn't bring up anything at all.

Here's my suggestion for a fix / workaround:

Code: Select all

diff --git a/include/functions_entries.inc.php b/include/functions_entries.inc.p
index 5518e4a..fae7913 100644
--- a/include/functions_entries.inc.php
+++ b/include/functions_entries.inc.php
@@ -773,6 +773,7 @@ function &serendipity_searchEntries($term, $limit = '', $sea
     } elseif ($serendipity['dbType'] == 'sqlite' || $serendipity['dbType'] == '
         // Very extensive SQLite search. There currently seems no other way to 
         // But it's better than no search at all :-D
+        $term = str_replace( "*", "%", $term );
         $cond['group']     = 'GROUP BY e.id';
         $cond['distinct']  = '';
         $term              = serendipity_mb('strtolower', $term);
@@ -863,7 +864,7 @@ function &serendipity_searchEntries($term, $limit = '', $sea
 
     //if * wasn't already appended and if there are none or not enough
     //results, search again for entries containing the searchterm as a part  
-    if (strpos($term, '*') === false) {
+    if ((strpos($term, '*') === false) && (strpos($term, '%') === false)) {
         if (! is_array($search)) {
             return serendipity_searchEntries($term.'*', $orig_limit);
         }else if (count($search) < 4){
So it wasn't my imagination... just some bad and misinterpreted coincidences. :mrgreen:
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: Search broken using UTF-8 and SQLite

Post by garvinhicking »

Hi!

That's a very good catch, thanks a lot! I just committed your patch.

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
SvOlli
Regular
Posts: 9
Joined: Wed Mar 03, 2010 7:39 pm

Re: Search broken using UTF-8 and SQLite

Post by SvOlli »

You're welcome, and I couldn't have done it without you pointing me to include/functions_entries.inc.php, serendipity_searchEntries().

I've updated my patch, as my first attempt had one drawback: it searched twice just to get the same results, if there was none or less than four matches.

Code: Select all

diff --git a/include/functions_entries.inc.php b/include/functions_entries.inc.php
index 5518e4a..ef6643d 100644
--- a/include/functions_entries.inc.php
+++ b/include/functions_entries.inc.php
@@ -773,6 +773,7 @@ function &serendipity_searchEntries($term, $limit = '', $searchresults = '') {
     } elseif ($serendipity['dbType'] == 'sqlite' || $serendipity['dbType'] == 'sqlite3') {
         // Very extensive SQLite search. There currently seems no other way to perform fulltext search in SQLite
         // But it's better than no search at all :-D
+        $term = str_replace( "*", "%", $term );
         $cond['group']     = 'GROUP BY e.id';
         $cond['distinct']  = '';
         $term              = serendipity_mb('strtolower', $term);
@@ -863,7 +864,9 @@ function &serendipity_searchEntries($term, $limit = '', $searchresults = '') {
 
     //if * wasn't already appended and if there are none or not enough
     //results, search again for entries containing the searchterm as a part  
-    if (strpos($term, '*') === false) {
+    if ((strpos($term, '*') === false) &&
+        ($serendipity['dbType'] != 'sqlite') &&
+        ($serendipity['dbType'] != 'sqlite3')) {
         if (! is_array($search)) {
             return serendipity_searchEntries($term.'*', $orig_limit);
         }else if (count($search) < 4){
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: Search broken using UTF-8 and SQLite

Post by garvinhicking »

Hi!

Thanks a lot, committed your fix!

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
Post Reply