Page 1 of 1

Search broken using UTF-8 and SQLite

Posted: Sun Mar 07, 2010 7:23 pm
by SvOlli
Hello,

I'm running s9y for almost five years now, using SQLite as database engine. On the migration 1.4.1 -> 1.5.2, I also changed the database format from Latin-1 to UTF-8. After that the search does not find anything if I search for an umlaut, which worked when the blog was run using Latin-1. Searching for "word1 word2" does fail also, even though an article contains the exact phrase. A search for "word1*word2" does fail also.

I've tried another blog that runs on UTF-8 since it's creation and the bugs appear there also. Any hints on where to start looking for problem causing this?

Re: Search broken using UTF-8 and SQLite

Posted: Mon Mar 08, 2010 7:39 am
by SvOlli
After some more testing and by accident I found the real cause. It has nothing to do with umlauts and probably nothing with the database: you can't search anything that's the text of a link.

If your text includes:

Code: Select all

<a href="http://www.s9y.org">jump to s9y page</a>
a search for "jump" or "page" will not find the article containing that link.

Sorry if I caused any inconvenience,
SvOlli

Re: Search broken using UTF-8 and SQLite

Posted: Mon Mar 08, 2010 12:55 pm
by garvinhicking
Hi!

That's strange, because I believe the SQL search uses "LIKE %...%" which should also find something if you have "ajumper" and search for "jump"...?!

Regards,
Garvin

Re: Search broken using UTF-8 and SQLite

Posted: Mon Mar 08, 2010 2:30 pm
by kleinerChemiker
does sqlite use the same placeholder like mysql?

Re: Search broken using UTF-8 and SQLite

Posted: Mon Mar 08, 2010 10:05 pm
by SvOlli
Hmmm... I tried it again and it's not quite that either. Fact is, that it doesn't find obvious texts. Please try it yourself: go to http://blog.h8u.de/ and search for the terms "Qt 4.6" and "lighttpd". Both are available in the latest article "Das Ende meines Apachen...?", but cannot be found using the search.

I tried to dig up the search function by greping the the code for strings like "searchTerm", which is passed as an argument to search for, but could not dig up something. Could anyone point me out on where to look for the function in question?

And from another project, where I was using databases from Qt 4.x, I can tell that SQLite and MySQL are "more compatible" as SQLite and PostgreSQL for example. So, yes, SQLite uses the same placeholders as described in the SQL standard: "_" for one random character and "%" for zero or more random characters.

Re: Search broken using UTF-8 and SQLite

Posted: Tue Mar 09, 2010 2:15 pm
by garvinhicking
Hi!

Do you have an external SQLite3 tool that you could use to execute SQL queries? If so, you could try:

Code: Select all

SELECT * FROM serendipity_entries WHERE body LIKE '%lighttpd%'
If this does not yield results, SQLite might not support it properly, and I wouldn'T know how to change that in the application, if the database query does not return expected results...

The function is in include/functions_entries.incphp, serendipity_searchEntries().

HTH,
GArvin

Re: Search broken using UTF-8 and SQLite

Posted: Tue Mar 09, 2010 9:56 pm
by SvOlli
garvinhicking wrote: Do you have an external SQLite3 tool that you could use to execute SQL queries? If so, you could try:

Code: Select all

SELECT * FROM serendipity_entries WHERE body LIKE '%lighttpd%'
Using the SQLite2 tool, it works as expected: it finds the article.

I tried it again on some other data, and it seems to me that everything that's in between tags, e.g. <span>...</span>, <table>...</table> can not be found?

I'll dig into function_entries as soon as possible.

Greetings,
SvOlli

Re: Search broken using UTF-8 and SQLite

Posted: Tue Mar 09, 2010 10:44 pm
by SvOlli
Who needs sleep anyway? ;-)

When the search didn't bring up enough matches, the search was restarted with an appended "*". SQLite treats "*" like any other character, so the second search didn't bring up anything at all.

Here's my suggestion for a fix / workaround:

Code: Select all

diff --git a/include/functions_entries.inc.php b/include/functions_entries.inc.p
index 5518e4a..fae7913 100644
--- a/include/functions_entries.inc.php
+++ b/include/functions_entries.inc.php
@@ -773,6 +773,7 @@ function &serendipity_searchEntries($term, $limit = '', $sea
     } elseif ($serendipity['dbType'] == 'sqlite' || $serendipity['dbType'] == '
         // Very extensive SQLite search. There currently seems no other way to 
         // But it's better than no search at all :-D
+        $term = str_replace( "*", "%", $term );
         $cond['group']     = 'GROUP BY e.id';
         $cond['distinct']  = '';
         $term              = serendipity_mb('strtolower', $term);
@@ -863,7 +864,7 @@ function &serendipity_searchEntries($term, $limit = '', $sea
 
     //if * wasn't already appended and if there are none or not enough
     //results, search again for entries containing the searchterm as a part  
-    if (strpos($term, '*') === false) {
+    if ((strpos($term, '*') === false) && (strpos($term, '%') === false)) {
         if (! is_array($search)) {
             return serendipity_searchEntries($term.'*', $orig_limit);
         }else if (count($search) < 4){
So it wasn't my imagination... just some bad and misinterpreted coincidences. :mrgreen:

Re: Search broken using UTF-8 and SQLite

Posted: Wed Mar 10, 2010 1:37 pm
by garvinhicking
Hi!

That's a very good catch, thanks a lot! I just committed your patch.

Regards,
Garvin

Re: Search broken using UTF-8 and SQLite

Posted: Thu Mar 11, 2010 12:09 am
by SvOlli
You're welcome, and I couldn't have done it without you pointing me to include/functions_entries.inc.php, serendipity_searchEntries().

I've updated my patch, as my first attempt had one drawback: it searched twice just to get the same results, if there was none or less than four matches.

Code: Select all

diff --git a/include/functions_entries.inc.php b/include/functions_entries.inc.php
index 5518e4a..ef6643d 100644
--- a/include/functions_entries.inc.php
+++ b/include/functions_entries.inc.php
@@ -773,6 +773,7 @@ function &serendipity_searchEntries($term, $limit = '', $searchresults = '') {
     } elseif ($serendipity['dbType'] == 'sqlite' || $serendipity['dbType'] == 'sqlite3') {
         // Very extensive SQLite search. There currently seems no other way to perform fulltext search in SQLite
         // But it's better than no search at all :-D
+        $term = str_replace( "*", "%", $term );
         $cond['group']     = 'GROUP BY e.id';
         $cond['distinct']  = '';
         $term              = serendipity_mb('strtolower', $term);
@@ -863,7 +864,9 @@ function &serendipity_searchEntries($term, $limit = '', $searchresults = '') {
 
     //if * wasn't already appended and if there are none or not enough
     //results, search again for entries containing the searchterm as a part  
-    if (strpos($term, '*') === false) {
+    if ((strpos($term, '*') === false) &&
+        ($serendipity['dbType'] != 'sqlite') &&
+        ($serendipity['dbType'] != 'sqlite3')) {
         if (! is_array($search)) {
             return serendipity_searchEntries($term.'*', $orig_limit);
         }else if (count($search) < 4){

Re: Search broken using UTF-8 and SQLite

Posted: Thu Mar 11, 2010 10:14 am
by garvinhicking
Hi!

Thanks a lot, committed your fix!

Regards,
Garvin