[FX.php List] Security Concerns with FileMaker Website

Joel Shapiro jsfmp at earthlink.net
Thu Jan 25 11:26:06 MST 2007


Can anyone briefly explain the risks of these bots and how "nasty" (a  
term used in this thread) they can really be?

I've looked at the link Gjermund posted, and it looks like the  
biggest real problems are email harvesting and use of bandwidth.   
Ed's original post was prompted by the many email addresses and  
websites on his site, but if a site lists just one or two contact  
email addresses, what's the big risk?

Thanks for enlightening me.

-Joel


On Jan 25, 2007, at 10:11 AM, Edward L. Ford wrote:

> I didn't think about checking user agents.  I just did a cursory  
> investigation of this, and I may personally implement this system.   
> For the bots that say they're a real browser, I'll set up other  
> roadblocks to keep the info on my site (relatively) protected from  
> harvesting.
>
> For others interested in checking user agents, I found the  
> following to investigate:
> o) Look in the $_SERVER['HTTP_USER_AGENT'] string.  More info @  
> http://us3.php.net/reserved.variables   There's also a built-in  
> get_browser() function in PHP, which I never knew about.
>
> o) PEAR also has some tools:  Check out http://pear.php.net/package/ 
> Net_UserAgent_Detect
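A minimal sketch of the user-agent check Ed describes, assuming a simple substring blocklist (the pattern list below is illustrative, not a real or complete blocklist):

```php
<?php
// Sketch: match the User-Agent header against known bot substrings.
// The list is illustrative only -- real bots vary and many lie.
function is_known_bot($userAgent)
{
    $botPatterns = array('Googlebot', 'Slurp', 'msnbot', 'crawler', 'spider');
    foreach ($botPatterns as $pattern) {
        // stripos() does a case-insensitive substring search
        if (stripos($userAgent, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

// Example usage (HTTP_USER_AGENT may be absent for some clients):
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (is_known_bot($ua)) {
    // Serve a stripped-down page, or exit early.
}
```

As the thread notes, this only catches bots that identify themselves honestly.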
>
> --Ed
>
> -----------------------------------
> http://www.edwardford.net
>
> On Jan 25, 2007, at 11:41 AM, Gjermund Gusland Thorsen wrote:
>
>> Alternative 3 works in theory but still leaves some bots crawling,
>> as the worst bots tell your webserver that they're the most popular  
>> browser.
>>
>> ggt667
>>
>> On 1/24/07, Jason LEWIS <jasonlewis at weber.edu> wrote:
>>> Ed,
>>>
>>> There are a few things you can do:
>>>
>>> 1. Sue to get them to stop (I don't like this option because it  
>>> makes
>>> enemies and takes a VERY long time.)
>>> 2. Use a robots.txt (This option works for bots that actually  
>>> honor
>>> it.  Google honors robots.txt.)
>>> 3. Detect the type of browser before you give them anything and ship
>>> out bad information for unwanted browser types.  (This is only  
>>> good if
>>> the bot owner does not imitate browser variables.)
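For option 2, a robots.txt that admits only a named crawler and turns everyone else away might look like this (a sketch; every rule here depends entirely on the bot choosing to honor the file):

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```

An empty `Disallow:` allows everything for Googlebot; `Disallow: /` blocks all other honoring crawlers from the whole site.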
>>>
>>> As for my options, I prefer #3 as I can give the bots something that
>>> they are looking for: bad information.  In Perl, I use
>>> $ENV{HTTP_USER_AGENT}, but I am not sure how to call this in PHP.
>>> Anyone else know?  A quick search returned nothing on this.  Could
>>> this
>>> involve $HTTP_ENV_VARS?
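The PHP counterpart of Perl's `$ENV{HTTP_USER_AGENT}` is the `$_SERVER` superglobal (as Ed's later reply also notes); the older `$HTTP_ENV_VARS` / `$HTTP_SERVER_VARS` arrays are long deprecated. A one-line sketch:

```php
<?php
// Request headers arrive in $_SERVER; guard against it being unset,
// since not every client sends a User-Agent header.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
```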
>>>
>>> Jason
>>>
>>> >>> elford at cs.bu.edu 01/23/2007 10:18 PM >>>
>>> Hello everyone,
>>> In the past hour, I've done some analysis of various logs and
>>> emails, and I've come to a chilling realization that I've never had
>>> before about bots harvesting information from websites -- I knew it
>>> happened, but I never knew the scope of the problem until tonight --
>>> and this is a low-traffic website!
>>>
>>> So, I have a website with a public listing of email addresses and
>>> websites drawn from a FileMaker database, and I want to stop
>>> unknown bots from crawling it.  All of the data comes out of
>>> FileMaker, nicely formatted as links for the end user's clicking
>>> convenience.  I have a solution to keep email addresses from being
>>> harvested, but I was wondering if anyone knows of a way to prevent
>>> website addresses from being harvested while still keeping them
>>> clickable as hyperlinks.
>>>
>>> I thought maybe a PHP redirect link, like redirect.php?id=16, where
>>> redirect.php sends a user to the website listed in record 16, but
>>> once the PHP is all said and done, we're still at the linked
>>> website, so that doesn't really prevent anything from being
>>> harvested.
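The redirect.php idea above might be sketched like this. Here `get_url_for_id()` is a hypothetical stand-in for a FileMaker lookup (e.g. via FX.php); the hard-coded array is only for illustration:

```php
<?php
// Sketch of redirect.php?id=16.  get_url_for_id() is a hypothetical
// placeholder for a real FileMaker query; the array is illustrative.
function get_url_for_id($id)
{
    $urls = array(16 => 'http://www.example.com/');
    return isset($urls[$id]) ? $urls[$id] : null;
}

// Send a Location header for a known id; return false for unknown ids.
function redirect_to($id)
{
    $url = get_url_for_id($id);
    if ($url === null) {
        return false;
    }
    header('Location: ' . $url);
    return true;
}

// Usage in redirect.php:
// if (!redirect_to(isset($_GET['id']) ? (int) $_GET['id'] : 0)) {
//     header('HTTP/1.0 404 Not Found');
// }
```

As Ed observes, this hides the target URL from the page source but not from any bot willing to follow the redirect.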
>>>
>>> Is there a way to detect whether a link was actually clicked by a
>>> person, and not just followed by an automated bot?  PHP is
>>> preferable for such a solution -- JavaScript is too easy to turn
>>> off.  Or, is there a way to specify that only bots from places like
>>> Google, Live, and Yahoo are allowed to crawl the site?
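One common answer to the last question is reverse-DNS verification: Google documents that genuine Googlebot IPs reverse-resolve to googlebot.com (or google.com) hostnames, and a forward lookup on that hostname should return the same IP. A sketch (the DNS calls need network access at runtime):

```php
<?php
// Does $hostname end with ".$domain"?  Pure string check, no DNS.
function hostname_matches_domain($hostname, $domain)
{
    $suffix = '.' . $domain;
    return substr($hostname, -strlen($suffix)) === $suffix;
}

// Verify a claimed Googlebot: reverse lookup must land in a Google
// domain, and the forward lookup must resolve back to the same IP.
function is_verified_googlebot($ip)
{
    $host = gethostbyaddr($ip);               // reverse DNS lookup
    if ($host === false ||
        !(hostname_matches_domain($host, 'googlebot.com') ||
          hostname_matches_domain($host, 'google.com'))) {
        return false;
    }
    return gethostbyname($host) === $ip;      // forward lookup must match
}
```

Unlike a User-Agent check, this can't be spoofed by a bot simply lying about its name; Yahoo and Live/MSN published analogous verification domains.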
>>>
>>> Hopefully my predicament is clear.  I need to solve this ASAP...
>>>
>>> --Ed
>>> ---------------------
>>> http://www.edwardford.net
>>>
>>>
>>> _______________________________________________
>>> FX.php_List mailing list
>>> FX.php_List at mail.iviking.org
>>> http://www.iviking.org/mailman/listinfo/fx.php_list
>>>
>


