Okay, let's see here... *cracks knuckles*
Holy carp!
So, knowing the href contains the .../{ASIN}/..., you want to pull the src="{weird stuff}" out of that HTML? And we're scraping the
entire Amazon product page to get that HTML?
Don't they provide tools for Amazon affiliates that make this much easier? Sheesh.
I think what we need here is:
if (preg_match('@id=([\'"])prodImageCell\1.*images/'.$asin.'/.*<\s*img.*?src=\s*([\'"])(http://[^/]*images(|-de\.|-jp\.|-eu\.|\-)amazon\.com/images/.+/.+?\.(png|jpg|gif))\2@i', $content, $matches))
That should find the element with the ID "prodImageCell", skip over everything to the URL with the ASIN (just to confirm it's there), skip some more to the <img> tag, and pull everything from the src= attribute. You'll have to change the $matches[2] to $matches[3] for the $image_url, and the $file_type will likewise need to be $matches[5] instead of $matches[4].
This contains some extra stuff to ensure we're not being sidetracked: we don't really *need* to check for the prodImageCell ID, and we don't need to ensure the src= occurs in an <img> tag. It could be made more specific, if necessary: we could make sure the id= were in a <td> tag, and even check for the id="prodImage" attribute after the src.
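To make the group numbering concrete, here's a rough sketch of the pattern run against a made-up scrap of HTML. The helper name find_product_image(), the ASIN, and the markup are all invented for illustration; I'm only guessing at what the real page looks like. The one change from the pattern above is wrapping the ASIN in preg_quote(), which is harmless for plain alphanumeric ASINs.

```php
<?php
// Hypothetical helper: returns array(image URL, file type), or null if nothing matches.
function find_product_image($content, $asin)
{
    $pattern = '@id=([\'"])prodImageCell\1.*images/' . preg_quote($asin, '@')
        . '/.*<\s*img.*?src=\s*([\'"])'
        . '(http://[^/]*images(|-de\.|-jp\.|-eu\.|\-)amazon\.com/images/.+/.+?\.(png|jpg|gif))\2@i';

    if (!preg_match($pattern, $content, $matches)) {
        return null;
    }

    // Groups: 1 = quote around the id, 2 = quote around the src,
    // 3 = full image URL, 4 = host-name variant, 5 = file extension.
    return array($matches[3], $matches[5]);
}

// Made-up ASIN and markup, for illustration only.
$asin = 'B000EXAMPL';
$content = '<td id="prodImageCell"><a href="http://www.amazon.com/gp/product/images/B000EXAMPL/">'
    . '<img src="http://ec1.images-amazon.com/images/I/41abcDEF.jpg" id="prodImage" /></a></td>';

$result = find_product_image($content, $asin);
if ($result !== null) {
    list($image_url, $file_type) = $result;
    echo $image_url, ' (', $file_type, ")\n";
}
```

One thing to watch: without the s modifier, "." won't match a newline, so if the real page puts the id, the link, and the <img> on separate lines, the flags may need to be @is instead of @i.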
I say "think" because I haven't tested it myself. Let me know if there are any problems.