[FX.php List] Security Concerns with FileMaker Website

Joel Shapiro jsfmp at earthlink.net
Thu Jan 25 12:20:38 MST 2007


Thanks for your explanation, Ed.  And your efforts are certainly
commendable (I hope your higher-ups are aware!)

Just to clarify, then, if a site doesn't have confidential info or  
email addresses & phone numbers, etc., there's no huge concern over  
these bots.  Anybody disagree?

Thanks,
-Joel


On Jan 25, 2007, at 11:04 AM, Edward L. Ford wrote:

> In the case of the website I started this thread about, it's hosted  
> by a major academic institution (see my email address for a clue),  
> so bandwidth isn't really a problem -- in August 2006, the number  
> of pages on our servers was between 400k and 500k (not an  
> exaggeration!) -- I don't know the stats, but I can only imagine  
> our monthly data transfer is on the order of tens of terabytes, if  
> not more.
>
> My main concern is about information confidentiality.  The search
> engines can be told where and where not to go, but the nasty bots
> aren't so kind.  Since the information provided on this site is
> valid contact info, it's sellable info that can easily be picked up
> by automated bots.
>
> The purpose of my site is essentially to be a "Musician White  
> Pages", so that these people may be contacted by anyone within or  
> outside the University for music jobs.  Within this website,  
> contact info such as phone numbers, email addresses, personal  
> websites, and bios are provided, based on how much info people  
> provide when they sign up to be listed.
>
> Now, when someone signs up for the service, they must agree to our  
> privacy policy, which includes a statement:
> "By signing up for this service, you agree to release your contact  
> information publicly through this website.  Steps have been taken  
> to prevent your contact information from showing up in public  
> search engines, but the information is still listed on a public  
> website, accessible to anyone in the world.  By signing up for this  
> service, you acknowledge the risks associated with listing contact  
> information publicly."
>
> I have a robots.txt file that indicates that information about the  
> service should be indexed, but the actual pages with people's info  
> are not to be indexed.
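The robots.txt arrangement Ed describes can be modeled with Python's standard `urllib.robotparser`, used here as a language-neutral sketch. The path `/musicians/` is a hypothetical location for the per-person listing pages. The parser answers the question a well-behaved crawler asks before fetching a page; a harvesting bot simply never asks, which is why robots.txt alone does not protect the listings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: pages describing the service may be crawled,
# but the per-person listing pages (assumed to live under /musicians/) may not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /musicians/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # A compliant crawler checks before each fetch; nothing forces a harvester to.
    for path in ("/about.html", "/musicians/profile.php?id=16"):
        print(path, rp.can_fetch("Googlebot", path))
```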
>
> How nasty are these bots?  I got an eye-opener this past week.  A
> new entry on these white pages had been public for only 6 hours
> before its email address received a spam message -- and this
> address was brand new, never ever published anywhere before.  This
> spam message was sent to about 20 addresses, most of which were
> listed on this site, so I know this address was harvested from
> that site.
>
> A new concern of mine is also regarding phone numbers.  TXT spam  
> isn't big yet, but let's face it, it's coming -- and so by similar  
> harvesting methods, spammers can harvest phone numbers and start  
> sending spam TXTs to them -- it costs them nothing, but it costs  
> the users, and the cell companies have been slow so far to  
> implement controls since it's more money in their pocket.
>
> So while these folks have agreed to the "dangers" of listing info
> publicly online, and I could stop there and let things take their
> course, I feel it is my responsibility to do what I can to prevent
> abuse of the information these white pages provide, protecting the
> individuals using our service while still maintaining a
> user-friendly system for those who have legitimate needs for the
> contact information listed.
>
> I could go on about my position, but I think this is long enough :-)
> --Ed
> ------------------------------------
> http://www.edwardford.net
>
> On Jan 25, 2007, at 1:26 PM, Joel Shapiro wrote:
>
>> Can anyone briefly explain the risks of these bots and how  
>> "nasty" (a term used in this thread) they can really be?
>>
>> I've looked at the link Gjermund posted, and it looks like the  
>> biggest real problems are email harvesting and use of bandwidth.   
>> Ed's original post was prompted by the many email addresses and
>> websites on his site, but if there are just one or two contact
>> email addresses on a site, what's the big risk?
>>
>> Thanks for enlightening me.
>>
>> -Joel
>>
>>
>> On Jan 25, 2007, at 10:11 AM, Edward L. Ford wrote:
>>
>>> I didn't think about checking user agents.  I just did a cursory  
>>> investigation of this, and I may personally implement this  
>>> system.  For the bots that say they're a real browser, I'll set  
>>> up other roadblocks to keep the info on my site (relatively)  
>>> protected from harvesting.
>>>
>>> For others interested in checking user agents, I found the  
>>> following to investigate:
>>> o) Look in the $_SERVER['HTTP_USER_AGENT'] string.  More info @
>>> http://us3.php.net/reserved.variables   There's a built-in
>>> get_browser() function in PHP, which I never knew about.
>>>
>>> o) PEAR also has some tools:  Check out
>>> http://pear.php.net/package/Net_UserAgent_Detect
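The user-agent check described above boils down to inspecting one request header. A minimal sketch in Python, assuming the header value has already been read (in PHP it is `$_SERVER['HTTP_USER_AGENT']`); the token lists are hypothetical examples, not a maintained blocklist. Because the header is supplied by the client, a "search_engine" result only means the client *claims* to be one.

```python
from typing import Optional

# Hypothetical token lists; a real site would build these from its own logs.
SEARCH_ENGINE_TOKENS = ("googlebot", "msnbot", "bingbot", "yahoo! slurp")
SUSPECT_TOKENS = ("emailcollector", "emailsiphon", "emailwolf", "webzip",
                  "wget", "libwww-perl", "python-urllib")


def classify_user_agent(user_agent: Optional[str]) -> str:
    """Return 'search_engine', 'suspect', or 'unknown' for a User-Agent string."""
    ua = (user_agent or "").strip().lower()
    if not ua:
        # Mainstream browsers send a User-Agent header; a missing one is a red flag.
        return "suspect"
    if any(token in ua for token in SEARCH_ENGINE_TOKENS):
        # Claimed identity only: anyone can put "Googlebot" in this header.
        return "search_engine"
    if any(token in ua for token in SUSPECT_TOKENS):
        return "suspect"
    # Probably a browser, or a bot that imitates one.
    return "unknown"
```

Substring matching keeps the sketch short; it catches bots that announce themselves and nothing else.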
>>>
>>> --Ed
>>>
>>> -----------------------------------
>>> http://www.edwardford.net
>>>
>>> On Jan 25, 2007, at 11:41 AM, Gjermund Gusland Thorsen wrote:
>>>
>>>> Alternative 3 works in theory but still leaves some bots crawling,
>>>> as the worst bots tell your webserver that they're the most
>>>> popular browser.
>>>>
>>>> ggt667
>>>>
>>>> On 1/24/07, Jason LEWIS <jasonlewis at weber.edu> wrote:
>>>>> Ed,
>>>>>
>>>>> There are a few things you can do:
>>>>>
>>>>> 1. Sue to get them to stop (I don't like this option because it
>>>>>    makes enemies and takes a VERY long time.)
>>>>> 2. Use a robots.txt (This option is good for bots that actually
>>>>>    honor it.  Google honors the robots.txt.)
>>>>> 3. Detect the type of browser before you give them anything and
>>>>>    ship out bad information for unwanted browser types.  (This is
>>>>>    only good if the bot owner does not imitate browser variables.)
>>>>>
>>>>> Of these options, I prefer #3, as I can give the bots something
>>>>> they are looking for: bad information.  In Perl, I use
>>>>> $ENV{HTTP_USER_AGENT}, but I am not sure how to get this in PHP.
>>>>> Anyone else know?  A quick search returned nothing on this.
>>>>> Could this involve $HTTP_ENV_VARS?
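Option #3 can be sketched in Python as follows. Under CGI the header arrives in the environment as `HTTP_USER_AGENT`, the same name Perl reads via `$ENV{HTTP_USER_AGENT}`; PHP exposes it as `$_SERVER['HTTP_USER_AGENT']`, as noted elsewhere in this thread. The token list and the decoy address are hypothetical; the decoy uses the reserved `.invalid` top-level domain so that it can never be delivered to anyone.

```python
import os
from typing import Mapping

# Hypothetical markers of unwanted clients; spoofed browser strings pass straight through.
UNWANTED_TOKENS = ("emailcollector", "emailsiphon", "libwww-perl", "wget")
# Decoy on the reserved .invalid TLD, so mail sent to it cannot reach a real person.
DECOY_ADDRESS = "nobody@example.invalid"


def user_agent_from_environ(environ: Mapping[str, str]) -> str:
    """Read the User-Agent the way a CGI script sees it."""
    return environ.get("HTTP_USER_AGENT", "")


def contact_for_client(real_email: str, environ: Mapping[str, str]) -> str:
    """Return the real address for ordinary clients and a decoy for unwanted ones."""
    ua = user_agent_from_environ(environ).lower()
    if not ua or any(token in ua for token in UNWANTED_TOKENS):
        return DECOY_ADDRESS
    return real_email


if __name__ == "__main__":
    print(contact_for_client("player@example.edu", os.environ))
```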
>>>>>
>>>>> Jason
>>>>>
>>>>> >>> elford at cs.bu.edu 01/23/2007 10:18 PM >>>
>>>>> Hello everyone,
>>>>> In the past hour, I've done some analysis of various logs and
>>>>> emails, and I've come to a chilling realization that I've never
>>>>> had before about bots harvesting information from websites -- I
>>>>> knew it happened, but I never knew the scope of the problem until
>>>>> tonight -- and this is a low-traffic website!
>>>>>
>>>>> So, I have a website which contains a public listing of email
>>>>> addresses and websites from a FileMaker database.  I want to stop
>>>>> unknown bots from crawling the site.  All of the data comes out of
>>>>> FileMaker, nicely formatted as links for the end user's clicking
>>>>> convenience.  I have a solution to keep email addresses from being
>>>>> harvested, but I was wondering if anyone knows of a way to prevent
>>>>> website addresses from being harvested while keeping them
>>>>> clickable as hyperlinks.
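Ed does not say what his email-address solution is. One common technique (not necessarily his) is to write the `mailto:` link entirely as HTML character references: browsers decode them and display a normal clickable link, while a naive harvester scanning the raw HTML for `@` finds nothing. A minimal Python sketch; any bot that decodes entities before scanning defeats it.

```python
import html


def entity_encode(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_mailto(address: str) -> str:
    """Build a mailto link whose raw HTML contains no literal '@' or 'mailto:'."""
    encoded = entity_encode(address)
    return f'<a href="{entity_encode("mailto:")}{encoded}">{encoded}</a>'


if __name__ == "__main__":
    link = obfuscated_mailto("player@example.edu")
    print(link)                 # what a harvester sees in the page source
    print(html.unescape(link))  # roughly what the browser renders
```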
>>>>>
>>>>> I thought maybe a PHP redirect link, like redirect.php?id=16, where
>>>>> redirect.php sends the user to the website listed in record 16, but
>>>>> once the PHP is all said and done, we're still at the linked
>>>>> website, so that doesn't really prevent anything from being
>>>>> harvested.
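The redirect idea, sketched in Python as a WSGI handler, confirms Ed's worry. The in-memory `WEBSITES` table is a hypothetical stand-in for the FileMaker lookup. The redirect removes the URL from the listing page's HTML, but the target still travels in plain text in the `Location` header, so any client that requests `redirect.php?id=16` (bot or human) learns it.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs

# Hypothetical stand-in for the FileMaker lookup: record id -> personal website.
WEBSITES: Dict[int, str] = {16: "http://www.example.com/~musician16/"}


def redirect_response(query_string: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (status, headers) for a request such as redirect.php?id=16."""
    params = parse_qs(query_string)
    try:
        target = WEBSITES[int(params.get("id", [""])[0])]
    except (ValueError, KeyError):
        return "404 Not Found", [("Content-Type", "text/plain")]
    # The target is no longer in the listing page, but it is sent in clear text here.
    return "302 Found", [("Location", target)]


def redirect_app(environ, start_response):
    """Minimal WSGI wrapper around redirect_response."""
    status, headers = redirect_response(environ.get("QUERY_STRING", ""))
    start_response(status, headers)
    return [b""]
```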
>>>>>
>>>>> Is there a way to detect whether a link was actually clicked by a
>>>>> person, and not just followed by an automated bot?  PHP is
>>>>> preferable for such a solution -- JavaScript is too easy to turn
>>>>> off.  Or, is there a way to specify that only bots from places
>>>>> like Google, Live, and Yahoo are allowed to crawl the site?
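Ed's second question has a partial answer that does not rely on the spoofable User-Agent header: look up the requesting IP address with reverse DNS, check that the hostname belongs to the search engine, then do a forward lookup and confirm it maps back to the same IP. Google documents this procedure for verifying Googlebot. A minimal Python sketch; the hostname suffixes for the Microsoft and Yahoo crawlers are assumptions to check against each engine's current documentation. The resolvers are parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Assumed crawler hostname suffixes; confirm against each engine's current docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com", ".crawl.yahoo.net")


def is_verified_crawler(ip: str,
                        reverse: Resolver = socket.gethostbyaddr,
                        forward: Resolver = socket.gethostbyname_ex) -> bool:
    """Reverse-then-forward DNS check that an IP belongs to a trusted crawler."""
    try:
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # Anyone can set their own reverse DNS; only the forward lookup proves ownership.
    return ip in addresses
```

The lookups are slow, so a real site would cache the result per IP rather than resolve on every request.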
>>>>>
>>>>> Hopefully my predicament is clear.  I need to solve this ASAP...
>>>>>
>>>>> --Ed
>>>>> ---------------------
>>>>> http://www.edwardford.net
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> FX.php_List mailing list
>>>>> FX.php_List at mail.iviking.org
>>>>> http://www.iviking.org/mailman/listinfo/fx.php_list


